Starred repositories
Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.
RealSR super resolution implemented with ncnn library
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Trace liquid surface and liquid level in transparent vessels (python)
Implementation of Nougat Neural Optical Understanding for Academic Documents
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
A tutorial on writing a game engine from scratch
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
This is the third party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
ONNX models of YOLO-World (an open-vocabulary object detector).
A Python framework for high performance GPU simulation and graphics
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
This is an implementation of zero-shot instance segmentation using Segment Anything.
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
pix2pix3D: Generating 3D Objects from 2D User Inputs
Contrastive unpaired image-to-image translation, faster and lighter training than CycleGAN (ECCV 2020, in PyTorch)
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Image-to-Image Translation in PyTorch
Official PyTorch implementation of StyleGAN3