# Awesome-LLM-3D

Curated by Xianzheng Ma and Yash Bhalgat


🔥 Here is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents.

## Table of Contents

- [3D Understanding](#3d-understanding)
- [3D Reasoning](#3d-reasoning)
- [3D Generation](#3d-generation)
- [3D Embodied Agent](#3d-embodied-agent)
- [3D Benchmarks](#3d-benchmarks)
- [Contributing](#contributing)
- [Acknowledgement](#acknowledgement)

## 3D Understanding

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-LLM | UCLA | 3D-LLM: Injecting the 3D World into Large Language Models | NeurIPS'2023 | github |
| 2 | LL3DA | Fudan University | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | arXiv | github |
| 3 | LLM-Grounder | U-Mich | LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | arXiv | github |
| 4 | Point-Bind | CUHK | Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following | arXiv | github |
| 5 | 3D-VisTA | BIGAI | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ICCV'2023 | github |
| 6 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | arXiv | github |
| 7 | OpenScene | ETHz | OpenScene: 3D Scene Understanding with Open Vocabularies | CVPR'2023 | github |
| 8 | LERF | UC Berkeley | LERF: Language Embedded Radiance Fields | ICCV'2023 | github |
| 9 | ViewRefer | CUHK | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | ICCV'2023 | github |
| 10 | Contrastive Lift | Oxford-VGG | Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion | NeurIPS'2023 | github |
| 11 | CLIP2Scene | HKU | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | CVPR'2023 | github |
| 12 | PointLLM | CUHK | PointLLM: Empowering Large Language Models to Understand Point Clouds | arXiv | github |
| 13 | - | MIT | Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | arXiv | github |
| 14 | Chat-3D | ZJU | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | arXiv | github |
| 15 | PLA | HKU | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | CVPR'2023 | github |
| 16 | UniT3D | TUM | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | ICCV'2023 | github |
| 17 | CG3D | JHU | CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | arXiv | github |
| 18 | JM3D-LLM | Xiamen University | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues | ACM MM'2023 | github |
| 19 | Open-Fusion | - | Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation | arXiv | github |
| 20 | - | - | From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception | OpenReview | - |
| 21 | OpenNerf | - | OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | OpenReview | github |

## 3D Reasoning

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
| 2 | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |

## 3D Generation

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | arXiv | github |
| 2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | arXiv | project |
| 3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | arXiv | github |
| 4 | DreamLLM | MEGVII & Tsinghua | DreamLLM: Synergistic Multimodal Comprehension and Creation | arXiv | github |
| 5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | arXiv | github |
| 6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |

## 3D Embodied Agent

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale | arXiv | github |
| 2 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | arXiv | github |
| 3 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | CoRL'2023 | github |
| 4 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts | arXiv | github |
| 5 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'2023 | github |

## 3D Benchmarks

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2023 | github |
| 2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
| 3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
| 4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
| 5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | arXiv | github |

## Contributing

This is an active repository, and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they belong in an awesome list for 3D LLMs; you can vote for them by adding a 👍.


If you have any questions about this opinionated list, please get in touch at [email protected].

## Acknowledgement

This repo is inspired by Awesome-LLM.
