Stars
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
A monorepo for packages implementing the CAT protocol
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A powerful web scraper powered by LLM | OpenAI, Gemini & Ollama
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Orchestrate zero-shot computer vision models
Example applications, microservices, and code samples for the Internet Computer
Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution
Survey on Data-centric Large Language Models
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
An Open-source Toolkit for LLM Development
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
BlueLM (蓝心大模型): Open large language models developed by vivo AI Lab
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention