PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

Chen, Chang; Baek, Junyeob; Deng, Fei; Kawaguchi, Kenji; Gulcehre, Caglar; Ahn, Sungjin

arXiv:2406.06793 (cs)

[Submitted on 10 Jun 2024]

Abstract: Despite recent advances in offline RL, no single algorithm has achieved superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks because credit assignment is difficult and extrapolation errors accumulate as the task horizon grows. On the other hand, models that perform well on long-horizon tasks are designed specifically for goal-conditioned tasks and commonly perform worse than value function learning methods in short-horizon, dense-reward scenarios. To bridge this gap, we propose PlanDQ, a hierarchical planner designed for offline RL. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals. Our experimental results suggest that PlanDQ can achieve superior or competitive performance on D4RL continuous control benchmark tasks as well as on the long-horizon tasks AntMaze, Kitchen, and Calvin.
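
The two-level structure described in the abstract can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the 1-D task, the names `plan_subgoals`, `q_value`, `act`, and `run_episode`, and the discrete action set are all hypothetical. Linear interpolation stands in for the diffusion-based planner, and a hand-written distance score stands in for a learned Q-function.

```python
from typing import List

# Hypothetical discrete action set for a 1-D toy task (not from the paper).
ACTIONS = (-1.0, 0.0, 1.0)


def plan_subgoals(state: float, goal: float, num_subgoals: int) -> List[float]:
    """Stand-in for the high-level planner: evenly spaced sub-goals.

    A diffusion-based planner would sample this sequence from a learned
    model; interpolation is used here only to keep the sketch runnable.
    """
    step = (goal - state) / num_subgoals
    return [state + step * (i + 1) for i in range(num_subgoals)]


def q_value(state: float, subgoal: float, action: float) -> float:
    """Stand-in for a learned sub-goal-conditioned Q-function.

    Scores an action by how close the next state lands to the sub-goal.
    """
    return -abs((state + action) - subgoal)


def act(state: float, subgoal: float) -> float:
    """Low-level policy: pick the action with the highest Q-value."""
    return max(ACTIONS, key=lambda a: q_value(state, subgoal, a))


def run_episode(start: float, goal: float,
                num_subgoals: int = 4, steps_per_subgoal: int = 5) -> List[float]:
    """The high level proposes sub-goals; the low level pursues each in turn."""
    state = start
    trajectory = [state]
    for subgoal in plan_subgoals(state, goal, num_subgoals):
        for _ in range(steps_per_subgoal):
            if abs(state - subgoal) < 0.5:  # sub-goal reached, move to the next
                break
            state += act(state, subgoal)
            trajectory.append(state)
    return trajectory
```

Calling `run_episode(0.0, 8.0)` walks the state from 0 to 8 through the intermediate sub-goals 2, 4, and 6. The point of the split is that the low-level scorer only ever reasons over a short distance to the next sub-goal, while the long-horizon structure is handled by the planner.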

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.06793 [cs.LG]
	(or arXiv:2406.06793v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.06793

Submission history

From: Chang Chen
[v1] Mon, 10 Jun 2024 20:59:53 UTC (3,091 KB)
