Stars
[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
[ECCV 2022] Map-free Visual Relocalization: Metric Pose Relative to a Single Image
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
A Native-PyTorch Library for LLM Fine-tuning
Talk2BEV: Language-Enhanced Bird's Eye View Maps (Accepted to ICRA'24)
yt-dlp / FFmpeg-Builds
Forked from BtbN/FFmpeg-Builds
FFmpeg Builds for yt-dlp
LimSim & LimSim++: Integrated traffic and autonomous driving simulators with (M)LLM support
Open weights LLM from Google DeepMind.
[ECCV'24] Online Vectorized HD Map Construction using Geometry
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
PyTorch code and models for the DINOv2 self-supervised learning method.
Explorations of Using Python to play Grand Theft Auto 5.
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
[ICLR 2024] Map Learning with Lane Segment for Autonomous Driving
[CoRL 2022] InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
[CVPR 2024] LMDrive: Closed-Loop End-to-End Driving with Large Language Models
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
A curated list of awesome LLM for Autonomous Driving resources (continually updated)
A publicly available dataset for road boundary detection in aerial images
Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection