Stars
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Reaching LLaMA2 Performance with 0.1M Dollars
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Phi-3.5 Vision
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Focal-Unet: Unet-like Focal Modulation for Medical Image Segmentation
FocalNet / FocalNet-DINO
Forked from IDEA-Research/DINO
This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO
detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"