Temporal Difference Learning for Model Predictive Control

Hansen, Nicklas; Wang, Xiaolong; Su, Hao

arXiv:2203.04955 (cs)

[Submitted on 9 Mar 2022 (v1), last revised 19 Jul 2022 (this version, v2)]

Abstract: Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at this https URL.
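
The planning idea described in the abstract (short-horizon trajectory optimization in a latent model, with a terminal value function standing in for the return beyond the horizon) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain random shooting as the optimizer, and `toy_dynamics`, `toy_reward`, `toy_value`, and every parameter name and default are hypothetical hand-written stand-ins for networks that would be learned in practice.

```python
import numpy as np


def plan(z0, dynamics, reward, value, horizon=5, num_samples=256,
         action_dim=1, gamma=0.99, rng=None):
    """Random-shooting planner with a terminal value bootstrap (illustrative only)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, action_dim))
    # Start every candidate rollout from the same latent state.
    z = np.repeat(z0[None, :], num_samples, axis=0)
    returns = np.zeros(num_samples)
    discount = 1.0
    for t in range(horizon):
        a = actions[:, t]
        returns += discount * reward(z, a)  # predicted reward at step t
        z = dynamics(z, a)                  # predicted next latent state
        discount *= gamma
    # The terminal value estimates the return beyond the planning horizon.
    returns += discount * value(z)
    best = int(np.argmax(returns))
    # Receding-horizon control: execute only the first action of the best sequence.
    return actions[best, 0], float(returns[best])


# Hypothetical stand-ins for learned models (assumptions, not from the paper).
def toy_dynamics(z, a):
    return z + 0.1 * a


def toy_reward(z, a):
    return -np.sum(z ** 2, axis=-1)


def toy_value(z):
    return -10.0 * np.sum(z ** 2, axis=-1)


action, estimated_return = plan(np.array([1.0]), toy_dynamics, toy_reward, toy_value)
```

In this sketch the planner is re-run at every environment step and only the first action is executed. How the dynamics, reward, and value functions are trained jointly by temporal difference learning is outside what this example shows.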

Comments:	Code and videos: this https URL
Subjects:	Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2203.04955 [cs.LG]
	(or arXiv:2203.04955v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2203.04955

Submission history

From: Nicklas Hansen
[v1] Wed, 9 Mar 2022 18:58:28 UTC (10,276 KB)
[v2] Tue, 19 Jul 2022 18:14:36 UTC (11,523 KB)
