# Awesome-LLM-3D

Curated by Xianzheng Ma and Yash Bhalgat


🔥 Here is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents.

## Table of Contents

- [3D Understanding](#3d-understanding)
- [3D Reasoning](#3d-reasoning)
- [3D Generation](#3d-generation)
- [3D Embodied Agent](#3d-embodied-agent)
- [3D Benchmarks](#3d-benchmarks)
- [Contributing](#contributing)
- [Acknowledgement](#acknowledgement)

## 3D Understanding

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-LLM | UCLA | 3D-LLM: Injecting the 3D World into Large Language Models | NeurIPS'2023 | github |
| 2 | LL3DA | Fudan University | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | arXiv | github |
| 3 | LLM-Grounder | U-Mich | LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | arXiv | github |
| 4 | Point-Bind | CUHK | Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following | arXiv | github |
| 5 | 3D-VisTA | BIGAI | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ICCV'2023 | github |
| 6 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | arXiv | github |
| 7 | OpenScene | ETHz | OpenScene: 3D Scene Understanding with Open Vocabularies | CVPR'2023 | github |
| 8 | LERF | UC Berkeley | LERF: Language Embedded Radiance Fields | ICCV'2023 | github |
| 9 | ViewRefer | CUHK | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | ICCV'2023 | github |
| 10 | Contrastive Lift | Oxford-VGG | Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion | NeurIPS'2023 | github |
| 11 | CLIP2Scene | HKU | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | CVPR'2023 | github |
| 12 | PointLLM | CUHK | PointLLM: Empowering Large Language Models to Understand Point Clouds | arXiv | github |
| 13 | - | MIT | Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | arXiv | github |
| 14 | Chat-3D | ZJU | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | arXiv | github |
| 15 | PLA | HKU | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | CVPR'2023 | github |
| 16 | UniT3D | TUM | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | ICCV'2023 | github |
| 17 | CG3D | JHU | CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | arXiv | github |
| 18 | JM3D-LLM | Xiamen University | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues | ACM MM'2023 | github |
| 19 | Open-Fusion | - | Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation | arXiv | github |
| 20 | - | - | From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception | OpenReview | - |
| 21 | OpenNerf | - | OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | OpenReview | github |

## 3D Reasoning

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
| 2 | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |

## 3D Generation

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | arXiv | github |
| 2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | arXiv | project |
| 3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | arXiv | github |
| 4 | DreamLLM | MEGVII & Tsinghua | DreamLLM: Synergistic Multimodal Comprehension and Creation | arXiv | github |
| 5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | arXiv | github |
| 6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |

## 3D Embodied Agent

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale | arXiv | github |
| 2 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | arXiv | github |
| 3 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | CoRL'2023 | github |
| 4 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts | arXiv | github |
| 5 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'2023 | github |

## 3D Benchmarks

| ID | Keywords | Institute | Paper | Publication | Others |
|----|----------|-----------|-------|-------------|--------|
| 1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2023 | github |
| 2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
| 3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
| 4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
| 5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | arXiv | github |

## Contributing

This is an active repository, and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they belong in an awesome list for 3D LLMs; you can vote for them by adding a 👍.


If you have any questions about this opinionated list, please get in touch at [email protected].

## Acknowledgement

This repo is inspired by Awesome-LLM.
