Bridging the Open vs. Closed Loop Gap: New Open-Loop Evaluation Benchmarks for End-to-End Autonomous Driving Planning
- TODO List
- Contributions
- Trajectory Inference
- Quick Evaluation based on Provided Trajectory
- Benchmark Codes
- Related nuPlan Implementation
- Acknowledgment
- License
- Code and Data for Quick Evaluation Release
- nuPlan Evaluation Release
- Detailed Code and Instructions for Four Benchmark Methods Release
- Incorporating Other Open-loop Metrics
- We comprehensively reveal the limitations of the current L2-based open-loop evaluation, demonstrating that the L2 metric alone cannot account for the dynamics of the real world.
- We introduce novel open-loop evaluation benchmarks with 4 improvements specifically designed for end-to-end autonomous driving planning, better reflecting the dynamic nature of real-world driving scenarios.
- Extensive experiments and re-evaluation of existing methods demonstrate that our approach significantly mitigates the issues associated with conventional open-loop evaluations, bridging the gap between open-loop and closed-loop evaluation.
- We explore the relationship between open-loop and closed-loop benchmarks, highlighting that a well-designed open-loop benchmark can serve as an effective rapid test for end-to-end autonomous driving.
We show how to reimplement four methods for trajectory inference in the data README. The methods are:
- MLP. We build an MLP with 3 hidden layers, each with 512 hidden units, and an output layer of dimension 12, representing the 6 waypoints of a trajectory.
- Llama2 Driver. We follow gptdriver but use Llama2-7B instead of GPT-3.5. Our Llama2 Driver (ego only) takes only the ego status as input and outputs a trajectory; both inputs and outputs are plain text. For our Llama2 Driver (without ego status), the model takes the navigation command and the ground-truth motion prediction results, converted to text following \cite{mao2023gptdriver}, and outputs a trajectory.
- UniAD and VAD-Base are well-known, peer-reviewed open-source end-to-end autonomous driving models.
Note: We benchmark these methods because they are open-sourced, allowing for transparent validation of results. We are open to including additional methods in the benchmark as long as their predicted trajectories are provided or can be reproduced using their official code repositories.
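As a rough, hypothetical sketch of the MLP baseline described above (the layer sizes come from the text; the input dimension, initialization, and ReLU activation are our assumptions, not taken from the repo):

```python
import numpy as np

def init_mlp(in_dim, hidden=512, out_dim=12, seed=0):
    # 3 hidden layers of 512 units each; output dim 12 = 6 waypoints x (x, y).
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, hidden, out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:       # ReLU on hidden layers only
            x = np.maximum(x, 0.0)
    return x.reshape(-1, 6, 2)        # 6 waypoints, (x, y) each

# in_dim=9 is a placeholder for the ego-status feature size.
params = init_mlp(in_dim=9)
traj = forward(params, np.zeros((4, 9)))  # batch of 4 -> shape (4, 6, 2)
```

In training, the 12-dimensional output would be regressed against the ground-truth waypoints; the sketch only shows the forward pass.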
For simplicity, we provide the trajectory data to be evaluated in the data folder. You can use these files with the metric code below.
python metric_codes/compute_L2.py
# For results of different models:
# change the npy files used in
# pred = np.load('vad_ego_prediction.npy')
python metric_codes/compute_ADE_x.py
# For results of different models:
# change the npy files used in
# pred = np.load('vad_ego_prediction.npy')
python metric_codes/compute_ADE_y.py
# For results of different models:
# change the npy files used in
# pred = np.load('vad_ego_prediction.npy')
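The three scripts above compute L2 distance and per-axis ADE between predicted and ground-truth trajectories. A minimal sketch of these metrics, assuming a `(num_samples, num_waypoints, 2)` array layout with (x, y) offsets in meters (the layout is our assumption; synthetic arrays stand in for the `.npy` files):

```python
import numpy as np

# Synthetic stand-ins for e.g. np.load('vad_ego_prediction.npy').
pred = np.zeros((100, 6, 2))
pred[..., 0] = 3.0          # synthetic x error
pred[..., 1] = 4.0          # synthetic y error
gt = np.zeros((100, 6, 2))

l2 = np.linalg.norm(pred - gt, axis=-1)           # per-waypoint L2 distance
ade = l2.mean()                                   # average displacement error
ade_x = np.abs(pred[..., 0] - gt[..., 0]).mean()  # x-axis ADE
ade_y = np.abs(pred[..., 1] - gt[..., 1]).mean()  # y-axis ADE
```

Splitting ADE by axis is what distinguishes `compute_ADE_x.py` and `compute_ADE_y.py` from the plain L2 metric: lateral and longitudinal errors are reported separately.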
# For 11 centers
python metric_codes/kmeans_11_MAC.py
# For 16 centers
python metric_codes/kmeans_16_MAC.py
# For 21 centers
python metric_codes/kmeans_21_MAC.py
# For 11 centers
python metric_codes/kmeans_11.py
# For 16 centers
python metric_codes/kmeans_16.py
# For 21 centers
python metric_codes/kmeans_21.py
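The exact definition of the k-means metrics lives in the scripts above; as a hedged sketch of the general idea (all names, the toy centers, and the accuracy definition are our assumptions), each trajectory can be assigned to its nearest cluster center and the prediction scored by whether it lands in the same cluster as the ground truth:

```python
import numpy as np

def nearest_center(trajs, centers):
    # Flatten each (6, 2) trajectory and assign it to the closest center.
    t = trajs.reshape(len(trajs), -1)
    c = centers.reshape(len(centers), -1)
    d = np.linalg.norm(t[:, None, :] - c[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy example: 2 hypothetical centers for 6-waypoint (x, y) trajectories.
centers = np.stack([np.zeros((6, 2)), np.full((6, 2), 10.0)])
pred = np.full((5, 6, 2), 9.0)
gt = np.full((5, 6, 2), 10.0)
# Fraction of samples whose prediction falls in the same cluster as the GT.
acc = (nearest_center(pred, centers) == nearest_center(gt, centers)).mean()
```

The 11-, 16-, and 21-center variants would differ only in the number of k-means centers used.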
For the detailed results, please refer to our paper.
You can find more in /tuplan_plugin/. The /tuplan_plugin/README.md shows the training, inference, and evaluation for the nuPlan benchmarks.
We sincerely appreciate their great contributions.
Before using the dataset, you should register on the nuScenes website and agree to the nuScenes terms of use. The code and the generated data are both subject to the MIT License.