Stars
Empowering Unified MLLM with Multi-granular Visual Generation
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Instruct-tune LLaMA on consumer hardware
A curated list of awesome LLM for Autonomous Driving resources (continually updated)
[CVPR2024] The code for "MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction"
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, ben…
[ICLR'23 Spotlight & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Hackable and optimized Transformers building blocks, supporting a composable construction.
✨✨Latest Advances on Multimodal Large Language Models
LlamaIndex is a data framework for your LLM applications
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Aligning pretrained language models with instruction data generated by themselves.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
A comprehensive survey of forging vision foundation models for autonomous driving, including challenges, methodologies, and opportunities.
Official code for "DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics" (NeurIPS 2023)
An Open-source Toolkit for LLM Development
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Using Low-rank adaptation to quickly fine-tune diffusion models.
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in PyTorch
Official PyTorch implementation for a conditional diffusion probability model in BEV perception
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Layout-Guided multi-view driving scene video generation with latent diffusion model