MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention

Chen, Yuxin; Tang, Chen; Li, Chenran; Tian, Ran; Zhan, Wei; Stone, Peter; Tomizuka, Masayoshi

arXiv:2406.16258 (cs)

[Submitted on 24 Jun 2024 (v1), last revised 28 Oct 2024 (this version, v2)]

Abstract: Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.
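
The abstract's central idea, that a residual reward plus the prior policy is enough to obtain the aligned policy, can be illustrated in a small tabular setting. The sketch below is not the authors' implementation. It assumes a finite MDP with known transitions, a maximum-entropy (soft) optimal prior policy, and the same temperature for the prior and the aligned policy. All names (`soft_q_iteration`, `residual_q_iteration`, `r_prior`, `r_res`) are hypothetical, and the residual reward is drawn at random rather than inferred from interventions. Under those assumptions, the residual soft Bellman backup `Q_R(s,a) = r_R(s,a) + gamma * E[alpha * log sum_a' pi(a'|s') exp(Q_R(s',a')/alpha)]` never touches the prior reward, yet yields the same policy as solving for the summed reward directly.

```python
import numpy as np

def soft_value(q, alpha, weights=None):
    """Per-state alpha * log sum_a w(s,a) * exp(q(s,a) / alpha), computed stably."""
    z = q / alpha
    if weights is not None:
        z = z + np.log(weights)
    m = z.max(axis=1, keepdims=True)
    return alpha * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))

def soft_q_iteration(reward, P, gamma, alpha, iters=500):
    """Tabular max-ent soft Q iteration; returns (Q, soft-optimal policy)."""
    q = np.zeros_like(reward)
    for _ in range(iters):
        q = reward + gamma * P @ soft_value(q, alpha)
    v = soft_value(q, alpha)
    return q, np.exp((q - v[:, None]) / alpha)

def residual_q_iteration(residual_reward, prior_pi, P, gamma, alpha, iters=500):
    """Solve for the aligned policy from the residual reward and prior policy only.

    The prior reward is never used: the prior policy enters as weights inside
    the log-sum-exp backup and as a multiplicative factor in the final policy.
    """
    q_r = np.zeros_like(residual_reward)
    for _ in range(iters):
        q_r = residual_reward + gamma * P @ soft_value(q_r, alpha, weights=prior_pi)
    logits = np.log(prior_pi) + q_r / alpha
    logits -= logits.max(axis=1, keepdims=True)
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

# Hypothetical random MDP: 4 states, 3 actions.
rng = np.random.default_rng(0)
S, A, gamma, alpha = 4, 3, 0.9, 1.0
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)      # P[s, a, s'] is a transition probability
r_prior = rng.normal(size=(S, A))      # reward behind the prior policy (hidden from RQL)
r_res = rng.normal(size=(S, A))        # stand-in for an inferred residual reward

_, prior_pi = soft_q_iteration(r_prior, P, gamma, alpha)
_, direct_pi = soft_q_iteration(r_prior + r_res, P, gamma, alpha)
residual_pi = residual_q_iteration(r_res, prior_pi, P, gamma, alpha)

# Largest elementwise difference between the two aligned policies.
max_gap = np.abs(direct_pi - residual_pi).max()
```

Here `direct_pi` is computed with access to the prior reward, while `residual_pi` uses only `prior_pi` and `r_res`; comparing them through `max_gap` checks that the two routes agree in this toy setting. The paper's setting differs in that the residual reward must itself be inferred from human interventions, which this sketch does not attempt.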

Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
ACM classes:	I.2.6; I.2.9
Cite as:	arXiv:2406.16258 [cs.RO]
	(or arXiv:2406.16258v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2406.16258

Submission history

From: Yuxin Chen
[v1] Mon, 24 Jun 2024 01:51:09 UTC (32,884 KB)
[v2] Mon, 28 Oct 2024 19:17:41 UTC (19,299 KB)
