Official implementation of the following paper:
Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, Cordelia Schmid
ICRA 2023
[Paper] | [Project website]
This repository contains code for training HOLD models (a.k.a. functional distance models) on video data. The implementation is based on Scenic.
For the RL policy training experiments in the paper, see https://github.com/minttusofia/hold-policies.
To train models, Python 3.9 is required (a requirement of dmvr, which is a dependency of Scenic).
To install this codebase, run:
```sh
$ git clone https://github.com/minttusofia/hold-rewards.git
$ cd hold-rewards
$ pip install .
```
For a GPU-enabled installation of jax (recommended), see https://github.com/google/jax/tree/jax-v0.2.28#pip-installation-gpu-cuda.
For example, to install jax for CUDA >= 11.1 and cuDNN >= 8.2, run:
```sh
$ pip install "jax[cuda11_cudnn82]>=0.2.21,<0.3" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
Make the following changes to the Scenic config file (`scenic/projects/func_dist/configs/holdr/vivit_large_factorized_encoder.py` for HOLD-R, or `scenic/projects/func_dist/configs/holdc/resnet50.py` for HOLD-C):
- Set `DATA_DIR` to the directory where Something-Something v2 data (and optionally, any pretrained model checkpoints) are saved.
- Set `NUM_DEVICES` to the number of GPUs / TPUs to use (see the sketch after this list).
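As an illustrative sketch only, the edited values in the config might look like the following; the path is a placeholder and the exact variable layout in the file may differ:

```python
# Hypothetical values -- substitute your own data path and device count.
DATA_DIR = '/data/something-something-v2'  # SSv2 data (and optional pretrained checkpoints)
NUM_DEVICES = 4  # number of local GPUs / TPU cores to train on
```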
Run
```sh
python scenic/projects/func_dist/main.py \
  --config=scenic/projects/func_dist/configs/holdr/vivit_large_factorized_encoder.py \
  --workdir=/PATH/TO/OUT_DIR
```
where `/PATH/TO/OUT_DIR` is the directory to which experiment checkpoints will be written. For HOLD-C, use `--config=scenic/projects/func_dist/configs/holdc/resnet50.py`.
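Scenic's training loop writes metric summaries to the work directory, so (assuming the default metric writers are enabled) training progress can be monitored with TensorBoard:

```sh
$ tensorboard --logdir=/PATH/TO/OUT_DIR
```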
We release the trained model checkpoints used in the paper.
To use these as reward models in policy training with SAC (as in the paper), please refer to our policy training repo https://github.com/minttusofia/hold-policies.
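Conceptually, a trained HOLD model predicts a functional distance from the current observation to a goal, and the negated distance can serve as a dense reward for the RL agent. The sketch below is illustrative only; `distance_model` is a hypothetical stand-in for a restored HOLD checkpoint, not an API provided by this repo:

```python
def hold_reward(distance_model, observation, goal, scale=1.0):
    """Dense reward from a learned functional distance model.

    `distance_model` (hypothetical) maps (observation, goal) to a scalar
    predicted distance-to-goal; lower distance means more task progress,
    so negating it rewards the agent for moving toward the goal.
    """
    return -scale * float(distance_model(observation, goal))
```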
If you find this implementation or the released models useful, please cite our paper:
```bibtex
@article{alakuijala2023learning,
  title={Learning Reward Functions for Robotic Manipulation by Observing Humans},
  author={Alakuijala, Minttu and Dulac-Arnold, Gabriel and Mairal, Julien and Ponce, Jean and Schmid, Cordelia},
  journal={2023 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2023},
}
```