Stars
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Tools for merging pretrained large language models.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Collection of open datasets in computer vision.
OCR, layout analysis, reading order, table recognition in 90+ languages
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
✨✨Latest Advances on Multimodal Large Language Models
Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
[AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
czczup / peft-torch112
Forked from huggingface/peft
PEFT for PyTorch 1.12
骆驼 (Luotuo): Open-Source Chinese Language Models. Developed by 陈启源 @ Central China Normal University, 李鲁鲁 @ SenseTime, and 冷子昂 @ SenseTime
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
One Million Scenes for Autonomous Driving
PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377
Effective Python: Second Edition — Source Code and Errata for the Book
Python bindings to the pointcloud library (pcl)
Geometric Computer Vision Library for Spatial AI
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the came…