Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Zhang, Linrui; Shen, Li; Yang, Long; Chen, Shixiang; Yuan, Bo; Wang, Xueqian; Tao, Dacheng


arXiv:2205.11814 (cs)

[Submitted on 24 May 2022 (v1), last revised 17 Jun 2022 (this version, v2)]


Abstract: Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to achieve efficient policy updates under hard constraint satisfaction. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O uses a simple yet effective penalty function to eliminate the cost constraints and removes the trust-region constraint by means of the clipped surrogate objective. We theoretically prove the exactness of the proposed method with a finite penalty factor and provide a worst-case analysis of the approximation error when the objective is evaluated on sample trajectories. Moreover, we extend P3O to the more challenging multi-constraint and multi-agent scenarios, which are less studied in previous work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
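As a rough illustration of the idea in the abstract (a penalty term replacing the cost constraint, combined with a PPO-style clipped surrogate), the following is a minimal sketch and not the authors' implementation. The function name `penalized_clipped_loss`, the hinge-style penalty `kappa * max(0, ...)`, the `(1 - gamma)` scaling of the constraint violation, and all default values are assumptions made for illustration; the exact objective is given in the paper itself.

```python
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def penalized_clipped_loss(
    ratios: Sequence[float],      # pi_new(a|s) / pi_old(a|s) per sample
    reward_adv: Sequence[float],  # reward advantage estimates
    cost_adv: Sequence[float],    # cost advantage estimates
    cost_return: float,           # estimated cost return of the old policy
    cost_limit: float,            # constraint threshold (hypothetical name)
    kappa: float = 20.0,          # penalty factor (assumed default)
    eps: float = 0.2,             # clipping range (assumed default)
    gamma: float = 0.99,          # discount factor (assumed default)
) -> float:
    """Unconstrained loss: clipped reward surrogate plus a hinge penalty on cost."""
    n = len(ratios)
    # Clipped reward surrogate (pessimistic lower bound), as in PPO.
    reward_term = sum(
        min(r * a, clip(r, 1 - eps, 1 + eps) * a)
        for r, a in zip(ratios, reward_adv)
    ) / n
    # Clipped cost surrogate (pessimistic upper bound, since cost should be small).
    cost_term = sum(
        max(r * a, clip(r, 1 - eps, 1 + eps) * a)
        for r, a in zip(ratios, cost_adv)
    ) / n
    # Assumed form of the constraint violation; penalized only when positive.
    violation = cost_term + (1 - gamma) * (cost_return - cost_limit)
    return -reward_term + kappa * max(0.0, violation)


if __name__ == "__main__":
    loss = penalized_clipped_loss(
        ratios=[1.0, 1.5],
        reward_adv=[0.5, 1.0],
        cost_adv=[0.0, 0.1],
        cost_return=30.0,
        cost_limit=25.0,
    )
    print(f"penalized loss: {loss:.4f}")
```

In this sketch the penalty is inactive whenever the estimated violation is non-positive, so the loss reduces to the usual clipped reward surrogate for policies that already satisfy the constraint.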

Comments:	IJCAI2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Cite as:	arXiv:2205.11814 [cs.LG]
	(or arXiv:2205.11814v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.11814

Submission history

From: Li Shen
[v1] Tue, 24 May 2022 06:15:51 UTC (1,870 KB)
[v2] Fri, 17 Jun 2022 02:39:04 UTC (1,870 KB)
