HOLD Reward Models

Official implementation of the following paper:

Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
ICRA 2023
[Paper] | [Project website]

This repository contains code for training HOLD models (a.k.a. functional distance models) on video data. The implementation is based on Scenic.

For the RL policy training experiments in the paper, see https://github.com/minttusofia/hold-policies.
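
As background on what these models provide: a functional distance model maps a video frame (and a goal) to an estimate of how far the task is from completion. The following is a purely illustrative sketch of one way such a distance could be turned into a dense reward; the exact reward formulation used in the paper is defined in the policy-training repository linked above.

# Illustrative sketch only, not necessarily the paper's exact formulation:
# given functional distances predicted for two consecutive observations,
# a simple shaped reward is the decrease in predicted distance to the goal.
def shaped_reward(d_prev: float, d_curr: float) -> float:
    return d_prev - d_curr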

Installation

Training models requires Python 3.9 (needed by dmvr, a dependency of scenic).

To install this codebase, run

$ git clone https://github.com/minttusofia/hold-rewards.git
$ cd hold-rewards
$ pip install .

For a GPU-enabled installation of jax (recommended), see https://github.com/google/jax/tree/jax-v0.2.28#pip-installation-gpu-cuda.
For example, to install jax for CUDA >= 11.1 and cuDNN >= 8.2, run:

$ pip install "jax[cuda11_cudnn82]>=0.2.21,<0.3" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Training HOLD on Something-Something v2

Make the following changes to the scenic config file
(in scenic/projects/func_dist/configs/holdr/vivit_large_factorized_encoder.py for HOLD-R, or
scenic/projects/func_dist/configs/holdc/resnet50.py for HOLD-C); an illustrative sketch of the edited values follows the list:

  • Set DATA_DIR to the directory where Something-Something v2 data (and optionally, any pretrained model checkpoints) are saved.
  • Set NUM_DEVICES to the number of GPUs / TPUs to use.
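
As a rough sketch (the path and device count are placeholders; only the DATA_DIR and NUM_DEVICES names come from the instructions above, and the rest of the config file is omitted), the edited values might look like:

# Hypothetical excerpt of the edited config file; values are examples only.
DATA_DIR = '/path/to/something-something-v2'  # dataset and optional pretrained checkpoints
NUM_DEVICES = 8  # number of GPUs / TPUs to train on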

Run

python scenic/projects/func_dist/main.py \
--config=scenic/projects/func_dist/configs/holdr/vivit_large_factorized_encoder.py \
--workdir=/PATH/TO/OUT_DIR

where /PATH/TO/OUT_DIR is the directory to which experiment checkpoints will be written.

For HOLD-C, use --config=scenic/projects/func_dist/configs/holdc/resnet50.py.

Trained models

We release the trained model checkpoints used in the paper:

To use these as reward models in policy training with SAC (as in the paper), please refer to our policy training repo https://github.com/minttusofia/hold-policies.

Citing HOLD

If you find this implementation or the released models useful, please consider citing our paper:

@article{alakuijala2023learning,  
    title={Learning Reward Functions for Robotic Manipulation by Observing Humans},  
    author={Alakuijala, Minttu and Dulac-Arnold, Gabriel and Mairal, Julien and Ponce, Jean and Schmid, Cordelia},  
    journal={2023 IEEE International Conference on Robotics and Automation (ICRA)},  
    year={2023},  
}
