Stars
PyTorch code and models for the DINOv2 self-supervised learning method.
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Real-time face swap and one-click video deepfake with only a single image.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
A high-throughput and memory-efficient inference and serving engine for LLMs
Eclipse iceoryx™ - true zero-copy inter-process-communication
User-friendly Desktop Client App for AI Models/LLMs (GPT, Claude, Gemini, Ollama...)
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Video stabilization implementation of "A Non-Linear Filter for Gyroscope-Based Video Stabilization"
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Just 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)!
A filtfilt function written in Java whose output matches that of filtfilt in MATLAB.
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
🚀 Power Your World with AI - Explore, Extend, Empower.
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
Official Code for MotionCtrl [SIGGRAPH 2024]
This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!
This project aims to reproduce Sora (OpenAI's text-to-video model), and we hope the open-source community will contribute to it.
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
(IJCAI 2023) Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection