DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

Xu, Zhenhua; Zhang, Yujia; Xie, Enze; Zhao, Zhen; Guo, Yong; Wong, Kwan-Yee. K.; Li, Zhenguo; Zhao, Hengshuang

arXiv:2310.01412 (cs)

[Submitted on 2 Oct 2023 (v1), last revised 14 Mar 2024 (this version, v4)]

Abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, fine-tuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V. The code and dataset will be publicly available.

Comments:	The project page is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2310.01412 [cs.CV]
	(or arXiv:2310.01412v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.01412

Submission history

From: Zhenhua Xu
[v1] Mon, 2 Oct 2023 17:59:52 UTC (19,225 KB)
[v2] Sun, 8 Oct 2023 13:47:23 UTC (19,225 KB)
[v3] Tue, 13 Feb 2024 02:47:59 UTC (10,346 KB)
[v4] Thu, 14 Mar 2024 17:05:43 UTC (10,346 KB)
