
Imitation from Observation with Bootstrapped Contrastive Learning

An imitation-from-observation algorithm that trains agents to perform tasks using only a limited number of pixel-based expert observations, based on a behavioral learning principle.

An encoder that takes videos of agent trajectories and embeds them in a "behavioral space" is trained with contrastive learning, which enforces that successful trajectories lie close together. We use this encoder to map N expert videos into a region of the behavioral space. The reward function is the distance from the agent's trajectory embedding to this set of expert trajectories. As the agent progresses, its current trajectories are incorporated into the contrastive training as negative examples.
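
This README does not spell out the exact loss or reward computation, so the following is only a minimal sketch of the idea, assuming a PyTorch trajectory encoder (here called encode_video) that maps a video, or a batch of videos, to an embedding; the names and shapes are hypothetical, and the actual models live in cmc_model.py.

import torch
import torch.nn.functional as F

def contrastive_loss(z_anchor, z_positive, z_negatives, temperature=0.1):
    # Single-positive InfoNCE: pull one successful (expert) trajectory embedding
    # toward the anchor and push the agent's current trajectories (negatives) away.
    z_anchor = F.normalize(z_anchor, dim=-1)        # (D,)
    z_positive = F.normalize(z_positive, dim=-1)    # (D,)
    z_negatives = F.normalize(z_negatives, dim=-1)  # (Q, D)
    pos = (z_anchor * z_positive).sum() / temperature         # similarity to the positive
    neg = (z_negatives @ z_anchor) / temperature              # similarities to the negatives
    logits = torch.cat([pos.unsqueeze(0), neg]).unsqueeze(0)  # (1, Q + 1)
    target = torch.zeros(1, dtype=torch.long)                 # the positive sits at index 0
    return F.cross_entropy(logits, target)

def behavioral_reward(agent_video, expert_videos, encode_video):
    # Reward: negative distance from the agent's trajectory embedding to the
    # nearest expert trajectory embedding in the behavioral space.
    with torch.no_grad():
        z_agent = F.normalize(encode_video(agent_video), dim=-1)      # (D,)
        z_experts = F.normalize(encode_video(expert_videos), dim=-1)  # (N, D)
    dists = torch.norm(z_experts - z_agent, dim=-1)                   # (N,)
    return -dists.min().item()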


Demonstration videos: the expert on the left, the IfO agent on the right.

Installation

conda env create -f env.yml
conda activate ifobl

Training (example with the reacher_hard task)

  • Train expert
python train.py task=reacher_hard exp_group=reacher_hard exp_id=1

Watch training on TensorBoard

tensorboard --logdir exp_local

  • Generate 5000 expert videos
export PYTHONPATH="${PYTHONPATH}:`pwd`" && python scripts/generate_dmc_video.py --env reacher_hard2 --episode_len 60

Use the --num-train and --num-valid flags to set the number of training and validation videos to generate, respectively.
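
For example (the video counts below are illustrative, not defaults taken from the repository):

export PYTHONPATH="${PYTHONPATH}:`pwd`" && python scripts/generate_dmc_video.py --env reacher_hard2 --episode_len 60 --num-train 5000 --num-valid 500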

  • Pretrain image and video encoders
python train_cmc.py task=reacher_hard2 exp_id=1

Watch training on TensorBoard

tensorboard --logdir cmc_exp_local

  • Train agent
python train_rlv2.py task=reacher_hard2

Watch training on TensorBoard

tensorboard --logdir rlv2_exp_local

Evaluation videos are generated in the rlv2_exp_local/reacher_hard/<exp_id>/train_video directory.

Additional information

  • cmc_model.py: contains the models, neural networks and losses used to train the trajectory encoder
  • drqv2.py: contains the implementations of the policy and q-value functions used to train the agents and experts
  • rl_model.py: contains the implementations of the policy and q-value functions used to train the state-based agents
  • final_run: contains the scripts to train the experts and agents for other tasks such as Walker run, Hopper stand and finger turn
  • dmc.py: contains environment creation functions and environment wrappers
  • scripts/generate_dmc_video.py: shows how to use trained agents at test time

Acknowledgements

  • We reuse Denis Yarats's DrQv2 code to train our RL agents
