Commit
Showing 8 changed files with 286 additions and 8 deletions.
@@ -1,13 +1,112 @@
# VisTracker
Official implementation for the CVPR'23 paper
# VisTracker (CVPR'23)
#### Official implementation for the CVPR 2023 paper: Visibility Aware Human-Object Interaction Tracking from Single RGB Camera

Train a model:
[[ArXiv]](https://arxiv.org/abs/2204.02445) [[Project Page]](https://virtualhumans.mpi-inf.mpg.de/chore)

<p align="left">
<img src="https://datasets.d2.mpi-inf.mpg.de/cvpr23vistracker/teaser.png" alt="teaser" width="512"/>
</p>

## Contents
1. [Dependencies](#dependencies)
2. [Dataset preparation](#dataset-preparation)
3. [Run demo](#run-demo)
4. [Training](#training)
5. [Evaluation](#evaluation)
6. [Citation](#citation)
7. [License](#license)

## Dependencies
The code is tested with `torch 1.6, cuda10.1, debian 11`. The environment setup is the same as for CHORE (ECCV'22). Please follow the instructions [here](https://github.com/xiexh20/CHORE#dependencies).

## Dataset preparation
We work on the extended BEHAVE dataset. To get it ready, download the files listed below and run the processing scripts. All files are provided on [this webpage](https://virtualhumans.mpi-inf.mpg.de/behave/license.html).

1. Download the video files: [color videos of test sequences](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/video/date03_color.tar), [frame time information](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/video/date03_time.tar).
2. Extract RGB images: follow [this script](https://github.com/xiexh20/behave-dataset#generate-images-from-raw-videos) from the BEHAVE dataset repo. Pass the `-nodepth` flag to extract RGB images only. Example: `python tools/video2images.py /BS/xxie-3/static00/rawvideo/Date03/Date03_Sub03_chairwood_hand.0.color.mp4 /BS/xxie-4/static00/behave-fps30/ -nodepth`
3. Download human and object masks: [masks for all test sequences](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/masks/masks-date03.tar). Extract them into one folder.
4. Rename the mask files to follow the BEHAVE dataset structure: `python tools/rename_masks.py -s SEQ_FOLDER -m MASK_ROOT`. Example: `python tools/rename_masks.py -s /BS/xxie-4/static00/behave-fps30/Date03_Sub03_chairwood_hand -m /BS/xxie-5/static00/behave_release/30fps-masks-new/`
5. Download the [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) and [FrankMocap](https://github.com/facebookresearch/frankmocap) detections: [packed data for test sequences](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/behave-packed-test-seqs.zip).
6. Convert the packed data to the BEHAVE dataset format: `python tools/pack2separate.py -s SEQ_FOLDER -p PACKED_ROOT`. Example: `python tools/pack2separate.py -s /BS/xxie-4/static00/behave-fps30/Date03_Sub03_chairwood_hand -p /scratch/inf0/user/xxie/behave-packed`

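Steps 2, 4 and 6 are plain command-line calls, so they can be assembled per sequence. The following is a minimal sketch and not part of the repository: `prep_commands` and its argument names are hypothetical, the script paths and flags are copied from the list above, and it assumes `video2images.py` is run from a checkout of the BEHAVE dataset repo while the other two scripts are run from the root of this repo. It only builds the argument lists and does not execute anything.

```python
from typing import List


def prep_commands(video: str, out_root: str, seq_folder: str,
                  mask_root: str, packed_root: str) -> List[List[str]]:
    """Build argv lists for steps 2, 4 and 6 (hypothetical helper)."""
    return [
        # Step 2: extract RGB frames only (run inside the BEHAVE dataset repo).
        ["python", "tools/video2images.py", video, out_root, "-nodepth"],
        # Step 4: move the masks into the BEHAVE folder structure.
        ["python", "tools/rename_masks.py", "-s", seq_folder, "-m", mask_root],
        # Step 6: unpack openpose / FrankMocap detections into per-frame files.
        ["python", "tools/pack2separate.py", "-s", seq_folder, "-p", packed_root],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the matching working directory, which stops the preparation at the first failing step.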
## Run demo
You can find all the commands of the pipeline in `scripts/demo.sh`. To run it, download the pretrained models from [here](https://datasets.d2.mpi-inf.mpg.de/cvpr23vistracker/models.zip) and unzip them into the `experiments` folder.

The dataset files should also be prepared as described above.

Once done, you can run the demo for one sequence with:
```shell
bash scripts/vistracker_pipeline.sh SEQ_FOLDER
```
Example: `bash scripts/vistracker_pipeline.sh /BS/xxie-4/static00/test-seq/Date03_Sub03_chairwood_hand`

It takes around 6 to 8 hours to process a sequence of 1500 frames (50 s).

Tip: the runtime bottlenecks are the SMPL-T pre-fitting (steps 1-2) and the joint optimization (step 6) in `scripts/demo.sh`. If you have a cluster with multiple GPU machines, you can run several jobs in parallel by passing the `--start` and `--end` options to these commands. This splits one long sequence into several chunks, and each job optimizes only the chunk between its start and end frames.

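The chunking idea in the tip can be sketched as follows. This is an illustration under assumptions, not code from the repository: `chunk_ranges` is a hypothetical helper, and it treats `--end` as exclusive, which the README does not specify, so check the scripts before relying on it.

```python
from typing import List, Tuple


def chunk_ranges(num_frames: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into contiguous (start, end) ranges, end exclusive."""
    if num_frames <= 0 or num_chunks <= 0:
        raise ValueError("num_frames and num_chunks must be positive")
    num_chunks = min(num_chunks, num_frames)  # never create empty chunks
    base, extra = divmod(num_frames, num_chunks)
    ranges, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional frame each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

For a 1500-frame sequence and four GPUs this gives four chunks of 375 frames, and each `(start, end)` pair would be passed to one job as `--start` and `--end`.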
## Training
Train a SIF-Net model:
```shell
python -m torch.distributed.launch --nproc_per_node=NUM_GPU --master_port 6789 --use_env train_launch.py -en tri-vis-l2
```
Note that training this model also requires the GT registrations (meshes), which are used for online boundary sampling during training. We provide an example script to save SMPL and object meshes from packed parameters:
`python tools/pack2separate_params.py -s SEQ_FOLDER -p PACKED_PATH`, which works like `tools/pack2separate.py`. The packed training data can be downloaded from [here (part1)](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/behave-packed-train-seqs-p1.zip) and [here (part2)](https://datasets.d2.mpi-inf.mpg.de/cvpr22behave/behave-packed-train-seqs-p2.zip).

Train the motion infill model:
```shell
python -m torch.distributed.launch --nproc_per_node=NUM_GPU --master_port 6787 --use_env train_mfiller.py -en cmf-k4-lrot
```
For this, you need to specify the path to all packed GT files.

## Evaluation
```shell
python recon/eval/evalvideo_packed.py -split splits/behave-test-30fps.json -sn RECON_NAME -m ours -w WINDOW_SIZE
```
where `RECON_NAME` is your own save name for the reconstruction, and `WINDOW_SIZE` is the alignment window size (main paper Sec. 4). `WINDOW_SIZE=1` is equivalent to the evaluation used by CHORE.

## Citation
If you use our code, please cite:
```bibtex
@inproceedings{xie2023vistracker,
    title = {Visibility Aware Human-Object Interaction Tracking from Single RGB Camera},
    author = {Xie, Xianghui and Bhatnagar, Bharat Lal and Pons-Moll, Gerard},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2023}
}
```
If you use the BEHAVE dataset, please also cite:
```bibtex
@inproceedings{bhatnagar22behave,
    title = {BEHAVE: Dataset and Method for Tracking Human Object Interactions},
    author = {Bhatnagar, Bharat Lal and Xie, Xianghui and Petrov, Ilya and Sminchisescu, Cristian and Theobalt, Christian and Pons-Moll, Gerard},
    booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {jun},
    organization = {{IEEE}},
    year = {2022},
}
```

## License
Copyright (c) 2023 Xianghui Xie, Max-Planck-Gesellschaft

Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").

The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.

Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. You agree to cite the **Visibility Aware Human-Object Interaction Tracking from Single RGB Camera** paper in documents and papers that report on research using this Software.

@@ -1,8 +1,8 @@
from .smplpytorch import SMPL_Layer
from .smpl_generator import SMPLHGenerator
from .wrapper_pytorch import SMPL_MODEL_ROOT


-def get_smpl(gender, hands, model_root='/BS/xxie2020/static00/mysmpl/smplh'):
+def get_smpl(gender, hands, model_root=SMPL_MODEL_ROOT):
    "simple wrapper to get SMPL model"
    return SMPL_Layer(model_root=model_root,
                      gender=gender, hands=hands)
@@ -0,0 +1,54 @@
"""
packed openpose and mocap results to separate files in each frame in BEHAVE dataset format
mocap: only save the parameters, not meshes
keywords: pose, betas
openpose: save as json files
keywords: body_joints, face_joints, left_hand_joints, right_hand_joints
"""
import os, sys
sys.path.append(os.getcwd())
import joblib
import os.path as osp
from tqdm import tqdm
import json
from behave.frame_data import FrameDataReader


def pack2separate(args):
    reader = FrameDataReader(args.seq_folder)
    seq_name = reader.seq_name
    packed_data = joblib.load(osp.join(args.packed_path, f'{seq_name}_GT-packed.pkl'))
    assert len(packed_data['frames']) == len(reader), f'Warning: number of frames does not match for seq {seq_name}!'

    # save as separate openpose and mocap files
    for idx in tqdm(range(len(reader))):
        for kid in reader.kids:
            outfile = osp.join(reader.get_frame_folder(idx), f'k{kid}.mocap.json')
            if not osp.isfile(outfile):
                json.dump(
                    {
                        "pose":packed_data['mocap_poses'][idx, kid].tolist(),
                        "betas":packed_data['mocap_betas'][idx, kid].tolist(),
                    },
                    open(outfile, 'w')
                )
            outfile = osp.join(reader.get_frame_folder(idx), f'k{kid}.color.json')
            if not osp.isfile(outfile):
                json.dump(
                    {
                        "body_joints": packed_data["joints2d"][idx, kid].tolist(),
                    },
                    open(outfile, 'w')
                )
    print("all done")


if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('-s', '--seq_folder')
    parser.add_argument('-p', '--packed_path', default="/scratch/inf0/user/xxie/behave-packed")# root path to all packed files

    args = parser.parse_args()

    pack2separate(args)
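Review note: the script passes `open(outfile, 'w')` straight to `json.dump`, so the file handle is never closed explicitly and a crash mid-loop can leave a partially written JSON file that later runs skip because it already exists. A possible variant is sketched below. `dump_json_if_missing` is a hypothetical helper, not something in the commit. It writes to a temporary file first and renames it, which is an added assumption about the desired behaviour.

```python
import json
import os


def dump_json_if_missing(path: str, payload: dict) -> bool:
    """Write payload to path unless it already exists. Returns True if written."""
    if os.path.isfile(path):
        return False
    tmp_path = path + ".tmp"
    # The context manager closes the handle even if json.dump raises.
    with open(tmp_path, "w") as f:
        json.dump(payload, f)
    # Rename only after a complete write, so an interrupted run leaves no
    # half-written file under the final name.
    os.replace(tmp_path, path)
    return True
```

Both `json.dump(..., open(outfile, 'w'))` calls in the loop could then become `dump_json_if_missing(outfile, {...})`, keeping the skip-if-present behaviour of the original.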
@@ -0,0 +1,70 @@
"""
load SMPL and object parameters from packed pkl file (extended BEHAVE data)
save SMPL and object registration (meshes) in BEHAVE data format
"""
import os, sys

import numpy as np
import torch

sys.path.append(os.getcwd())
import joblib
import os.path as osp
from tqdm import tqdm
import json
from psbody.mesh import Mesh
from scipy.spatial.transform import Rotation

from behave.frame_data import FrameDataReader
from lib_smpl import get_smpl
from behave.utils import load_template


def pack2separate_params(args):
    reader = FrameDataReader(args.seq_folder)
    seq_name = reader.seq_name
    smpl_name, obj_name = "fit03", 'fit01-smooth'
    obj_cat = reader.seq_info.get_obj_name(True)

    packed_data = joblib.load(osp.join(args.packed_path, f'{seq_name}_GT-packed.pkl'))
    assert len(packed_data['frames']) == len(reader), f'Warning: number of frames does not match for seq {seq_name}!'

    temp = load_template(reader.seq_info.get_obj_name())
    smplh_layer = get_smpl(reader.seq_info.get_gender(), True)
    faces = smplh_layer.faces.copy()
    for idx in tqdm(range(0, len(reader), args.interval)):
        # object
        outfile = osp.join(reader.get_frame_folder(idx), obj_cat, obj_name, f'{obj_cat}_fit.ply')
        if not osp.isfile(outfile):
            os.makedirs(osp.dirname(outfile), exist_ok=True)
            angle, trans = packed_data['obj_angles'][idx], packed_data['obj_trans'][idx]
            rot = Rotation.from_rotvec(angle).as_matrix()
            obj_fit = np.matmul(temp.v, rot.T) + trans
            Mesh(obj_fit, temp.f).write_ply(outfile)

        # SMPL
        outfile = osp.join(reader.get_frame_folder(idx), 'person', smpl_name, 'person_fit.ply')
        if not osp.isfile(outfile):
            os.makedirs(osp.dirname(outfile), exist_ok=True)
            verts, _, _, _ = smplh_layer(torch.from_numpy(packed_data['poses'][idx:idx+1]),
                                         torch.from_numpy(packed_data['betas'][idx:idx+1]),
                                         torch.from_numpy(packed_data['trans'][idx:idx+1]))
            verts = verts[0].cpu().numpy()
            Mesh(verts, faces).write_ply(outfile)
    print("all done")


if __name__ == '__main__':
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument('-s', '--seq_folder')
    parser.add_argument('-p', '--packed_path',
                        default="/scratch/inf0/user/xxie/behave-packed")  # root path to all packed files
    parser.add_argument('-i', '--interval', default=30, type=int,
                        help="interval between two saved frames, if set to 1, save for all frames")


    args = parser.parse_args()

    pack2separate_params(args)
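Review note: the object branch poses the template with `np.matmul(temp.v, rot.T) + trans`, which maps every vertex v to R v + t, where R is built from the axis-angle vector by `Rotation.from_rotvec(angle).as_matrix()`. The dependency-free sketch below illustrates that transform with Rodrigues' formula. `rotvec_to_matrix` and `transform_vertices` are hypothetical names for illustration and are not used by the script, which relies on SciPy and NumPy instead.

```python
import math
from typing import List, Sequence


def rotvec_to_matrix(rotvec: Sequence[float]) -> List[List[float]]:
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in rotvec))
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (v / theta for v in rotvec)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def transform_vertices(verts: Sequence[Sequence[float]],
                       rotvec: Sequence[float],
                       trans: Sequence[float]) -> List[List[float]]:
    """Apply v' = R v + t per vertex (row-wise equivalent of verts @ R.T + trans)."""
    rot = rotvec_to_matrix(rotvec)
    return [
        [sum(rot[i][j] * v[j] for j in range(3)) + trans[i] for i in range(3)]
        for v in verts
    ]
```

Note also that with the default `--interval 30` the loop writes meshes for every 30th frame only, so pass `-i 1` when meshes are needed for all frames.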
@@ -0,0 +1,50 @@
"""
rename and move the human and object masks to BEHAVE dataset structure
"""

import os, sys
sys.path.append(os.getcwd())
import joblib
import os.path as osp
from tqdm import tqdm
import json
from glob import glob
from behave.frame_data import FrameDataReader


def rename_masks(args):
    reader = FrameDataReader(args.seq_folder)
    seq_name = reader.seq_name
    mask_path = osp.join(args.mask_path, seq_name)
    ps_files = glob(osp.join(mask_path, 't*k1.person_mask.png'))
    obj_files = glob(osp.join(mask_path, 't*k1.obj_rend_mask.png'))
    assert len(ps_files) == len(obj_files), 'the number of mask files does not match!'
    assert len(ps_files) == len(reader), 'the number of frames between mask and RGB images does not match!'

    files_all = glob(osp.join(mask_path, 't*.png'))
    count = 0
    for file in tqdm(files_all):
        fname = osp.join(args.seq_folder, *osp.basename(file).split("-"))
        if osp.isfile(fname):
            continue
        cmd = f'mv {file} {fname}'
        # print(cmd)
        os.system(cmd)
        # count += 1
        # if count == 10:
        #     break
    print("all done!")



if __name__ == '__main__':
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument('-s', '--seq_folder')
    parser.add_argument('-m', '--mask_path',
                        default="/scratch/inf0/user/xxie/behave-packed")  # root path to all mask files

    args = parser.parse_args()

    rename_masks(args)
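Review note: moving files with `os.system(f'mv {file} {fname}')` breaks on paths that contain spaces or shell metacharacters, and a failed `mv` goes unnoticed because the return code is ignored. A possible replacement using `shutil.move` is sketched below. `target_path` and `move_mask` are hypothetical helpers, and creating missing frame folders is an added assumption, since the original expects them to exist already.

```python
import os
import os.path as osp
import shutil


def target_path(seq_folder: str, mask_file: str) -> str:
    """Map '<frame>-<name>.png' to '<seq_folder>/<frame>/<name>.png'."""
    return osp.join(seq_folder, *osp.basename(mask_file).split("-"))


def move_mask(seq_folder: str, mask_file: str) -> bool:
    """Move one mask into its frame folder. Returns False if the target exists."""
    dst = target_path(seq_folder, mask_file)
    if osp.isfile(dst):
        return False
    # Assumption: create the frame folder if it is missing.
    os.makedirs(osp.dirname(dst), exist_ok=True)
    # shutil.move raises on failure instead of failing silently like os.system.
    shutil.move(mask_file, dst)
    return True
```

Inside the loop, `move_mask(args.seq_folder, file)` would replace the `cmd` string and the `os.system` call. The split on `-` mirrors the original code, so it assumes mask names with exactly one dash between the frame name and the mask name.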