Starred repositories
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLMs: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLMs: Qwen2-VL, Qwen2-Audio, Llama3.2-V…
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Korean Sentence Embedding Model Performance Benchmark for RAG
Official repository for "AM-RADIO: Reduce All Domains Into One"
KoRean based SBERT pre-trained models (KR-SBERT) for PyTorch
Utilities intended for use with Llama models.
AnyLoc: Universal Visual Place Recognition (RA-L 2023)
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)
Doppelgangers: Learning to Disambiguate Images of Similar Structures
A personal list of papers and resources of image matching and pose estimation, including perspective images and panoramas.
Code release for CVPR'24 submission 'OmniGlue'
Implementation of XFeat (CVPR 2024). Do you need robust and fast local feature extraction? You are in the right place!
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Korean Sentence Embedding Repository
[CVPR 2024] RoMa: Robust Dense Feature Matching, a dense feature matcher that estimates pixel-dense warps and reliable certainties for almost any image pair.
MTEB: Massive Text Embedding Benchmark (see the embedding-evaluation sketch after this list)
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Democratization of RT-2, the model described in "RT-2: New model translates vision and language into action".
Efficient vision foundation models for high-resolution generation and perception.
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of the multilingual text perception and comprehension capabilities of multimodal large models across nine…
GLM-4 series: Open Multilingual Multimodal Chat LMs
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language models proposed by Alibaba Cloud.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
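Several of the starred repositories above cover sentence-embedding models and their evaluation (Korean Sentence Embedding, KR-SBERT, MTEB). The snippet below is a minimal sketch of scoring such a model with MTEB, assuming the `mteb` and `sentence-transformers` packages are installed; the model id and task name are illustrative placeholders, not prescribed by any of the repos listed.

```python
# Minimal sketch: evaluate a sentence-embedding model on an MTEB task.
# Assumptions: `pip install mteb sentence-transformers`; the model id below is the
# KR-SBERT checkpoint on the Hugging Face Hub, and the task name is illustrative only.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("snunlp/KR-SBERT-V40K-klueNLI-augSTS")  # KR-SBERT (listed above)
evaluation = MTEB(tasks=["STSBenchmark"])  # swap in the tasks/languages you care about
evaluation.run(model, output_folder="results/kr-sbert")  # writes per-task JSON scores
```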