
Q-value Regularized Transformer for Offline Reinforcement Learning


Shengchao Hu1,2, Ziqing Fan1,2, Chaoqin Huang1,2, Li Shen3,4*, Ya Zhang1,2, Yanfeng Wang1,2, Dacheng Tao5

1 Shanghai Jiao Tong University, 2 Shanghai AI Laboratory, 3 Sun Yat-sen University, 4 JD Explore Academy, 5 Nanyang Technological University.

Contents

- Overview
- Quick Start
- Citation
- Acknowledgments

Overview

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution conditioned on the trajectory history and a target return for each state. However, these methods often struggle to stitch optimal trajectories together from sub-optimal ones, because the returns sampled within individual trajectories are inconsistent with the optimal returns achievable across multiple trajectories. Dynamic Programming (DP) methods address this by learning a value function that approximates the optimal future return for each state, but they are prone to unstable learning behavior, particularly in long-horizon and sparse-reward settings.

Building on these insights, we propose the Q-value regularized Transformer (QT), which combines the trajectory-modeling ability of the Transformer with the optimal-future-return estimates provided by DP methods. QT learns an action-value function and adds a term that maximizes action-values to the CSM training loss, so that the model seeks optimal actions while staying close to the behavior policy. Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting its potential to advance the state of the art in offline RL.
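
To make the objective concrete, below is a minimal sketch of how the Q-value term can be folded into the CSM loss. It is an illustration only, not the repository's exact implementation: the `transformer` and `critic` modules and the `qt_loss` helper are hypothetical names, and `eta` mirrors the `--eta` weight used in the run command in Quick Start.

import torch.nn.functional as F

def qt_loss(transformer, critic, states, actions, returns_to_go, timesteps, eta=1.0):
    # Conditional sequence modeling: predict actions from the trajectory
    # history and target returns, trained to match the dataset actions.
    pred_actions = transformer(states, actions, returns_to_go, timesteps)
    csm_loss = F.mse_loss(pred_actions, actions)

    # Q-value regularization: push the predicted actions toward high values
    # under the learned critic; eta trades off the two terms.
    q_loss = -critic(states, pred_actions).mean()

    return csm_loss + eta * q_loss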

Quick Start

Once your environment is ready, you can run the scripts in run.sh. For example:

python experiment.py --seed 123 \
    --env hopper --dataset medium \
    --eta 1.0 --grad_norm 9.0 \
    --exp_name qt --save_path ./save/ \
    --max_iters 500 --num_steps_per_iter 1000 --lr_decay \
    --early_stop --k_rewards --use_discount
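
To sweep several D4RL tasks in one go, a small launcher like the sketch below can loop over environments and datasets and invoke experiment.py with the same flags; the environment and dataset names beyond hopper-medium are assumptions here and should be matched to whatever run.sh actually uses.

import itertools
import subprocess

# Hypothetical sweep; adjust the env/dataset lists to match run.sh.
envs = ["hopper", "walker2d", "halfcheetah"]
datasets = ["medium", "medium-replay", "medium-expert"]

for env, dataset in itertools.product(envs, datasets):
    subprocess.run([
        "python", "experiment.py", "--seed", "123",
        "--env", env, "--dataset", dataset,
        "--eta", "1.0", "--grad_norm", "9.0",
        "--exp_name", "qt", "--save_path", "./save/",
        "--max_iters", "500", "--num_steps_per_iter", "1000",
        "--lr_decay", "--early_stop", "--k_rewards", "--use_discount",
    ], check=True)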

Citation

If you find this work relevant to your research or applications, please feel free to cite it!

@inproceedings{QT,
    title={Q-value Regularized Transformer for Offline Reinforcement Learning},
    author={Hu, Shengchao and Fan, Ziqing and Huang, Chaoqin and Shen, Li and Zhang, Ya and Wang, Yanfeng and Tao, Dacheng},
    booktitle={International Conference on Machine Learning},
    year={2024},
}

Acknowledgments

This repo benefits from DT and Diffusion-QL. Thanks for their wonderful work!
