Lighthouse


Lighthouse is a user-friendly library for reproducible video moment retrieval and highlight detection (MR-HD). It supports seven models, four feature types (video and audio), and six datasets for reproducible MR-HD, MR, and HD. In addition, we provide an inference API and a Gradio demo so that developers can easily use state-of-the-art MR-HD approaches.

News: Our demo paper is available on arXiv. Any comments are welcome: Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection

Milestones

We will release v1.0 by the end of September. Our plan includes:

  • Reduce the configuration files (issue #19)
  • Update the trained weights and feature files on Google Drive and Zenodo
  • Introduce PyTest for the inference API (issue #21)
  • Introduce a linter for the inference API (issue #20)

Installation

Install ffmpeg first. If you are an Ubuntu user, run:

apt install ffmpeg

Then, install pytorch, torchvision, torchaudio, and torchtext based on your GPU environment. Note that the inference API also works in CPU-only environments. We tested the code with Python 3.9 and CUDA 11.7:

pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 torchtext==0.15.1

Finally, run the following to install Lighthouse and its dependencies:

pip install 'git+https://github.com/line/lighthouse.git'
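
After installation, a quick sanity check can confirm that the environment is set up. This is a minimal, unofficial sketch; it only assumes the lighthouse package and the torch/ffmpeg dependencies installed above:

import shutil
import torch
from lighthouse.models import CGDETRPredictor  # model class from the inference API

# Confirm that the core dependencies installed above are visible.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
print("lighthouse import OK:", CGDETRPredictor.__name__)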

Inference API (Available for both CPU/GPU mode)

Lighthouse supports the following inference API:

import torch
from lighthouse.models import CGDETRPredictor

# use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"

# slowfast_path is necessary if you use clip_slowfast features
query = 'A man is speaking in front of the camera'
model = CGDETRPredictor('results/clip_slowfast_cg_detr/qvhighlight/best.ckpt', device=device,
                        feature_name='clip_slowfast', slowfast_path='SLOWFAST_8x8_R50.pkl')

# encode video features
model.encode_video('api_example/RoripwjYFp8_60.0_210.0.mp4')

# moment retrieval & highlight detection
prediction = model.predict(query)
print(prediction)
"""
pred_relevant_windows: [[start, end, score], ...,]
pred_saliency_scores: [score, ...]

{'query': 'A man is speaking in front of the camera',
 'pred_relevant_windows': [[117.1296, 149.4698, 0.9993],
                           [-0.1683, 5.4323, 0.9631],
                           [13.3151, 23.42, 0.8129],
                           ...],
 'pred_saliency_scores': [-10.868017196655273,
                          -12.097496032714844,
                          -12.483806610107422,
                          ...]}
"""

Run python api_example/demo.py to reproduce the results. It automatically downloads the pre-trained CG-DETR weights (CLIP backbone). If you want to use other models, download their pre-trained weights. When using clip_slowfast features, you also need to download the SlowFast pre-trained weights; when using clip_slowfast_pann features, download the PANNs weights in addition to the SlowFast weights.

Limitation: The maximum video duration is 150s due to the current benchmark datasets. For CPU users, set feature_name='clip' because CLIP+Slowfast or CLIP+Slowfast+PANNs features are very slow without GPUs.
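
If you only need the top-ranked moment or the most salient clip, the returned dictionary can be post-processed directly. A minimal sketch, assuming the output format shown above (windows as [start, end, score] in seconds, apparently sorted by score, plus one saliency score per clip):

# Post-process the `prediction` dict returned by model.predict(query).
# Assumes the documented format: pred_relevant_windows = [[start, end, score], ...]
# and pred_saliency_scores = [score, ...] (one score per clip).
best_start, best_end, best_score = prediction['pred_relevant_windows'][0]
print(f"Top moment: {best_start:.1f}s - {best_end:.1f}s (confidence {best_score:.3f})")

saliency = prediction['pred_saliency_scores']
most_salient_clip = max(range(len(saliency)), key=saliency.__getitem__)
print(f"Most salient clip index: {most_salient_clip}")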

Gradio demo

Run python gradio_demo/demo.py, upload a video, enter a text query, and click the blue button.

[Screenshot: Gradio demo]
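
The bundled gradio_demo/demo.py is the reference implementation; the snippet below is only a hedged sketch of how the inference API could be wired into a Gradio interface, reusing the checkpoint and feature settings from the API example above:

import torch
import gradio as gr
from lighthouse.models import CGDETRPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CGDETRPredictor('results/clip_slowfast_cg_detr/qvhighlight/best.ckpt',
                        device=device, feature_name='clip_slowfast',
                        slowfast_path='SLOWFAST_8x8_R50.pkl')

def retrieve(video_path: str, query: str) -> dict:
    # Encode the uploaded video, then run moment retrieval & highlight detection.
    model.encode_video(video_path)
    return model.predict(query)

demo = gr.Interface(fn=retrieve,
                    inputs=[gr.Video(label="Video (max 150s)"), gr.Textbox(label="Text query")],
                    outputs=gr.JSON(label="Prediction"))
demo.launch()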

Supported models, datasets, and features

Models

Moment retrieval & highlight detection

  • Moment DETR
  • QD-DETR
  • EaTR
  • CG-DETR
  • UVCOM
  • TR-DETR
  • TaskWeave

Datasets

Moment retrieval & highlight detection

  • QVHighlights

Moment retrieval

  • ActivityNet Captions
  • Charades-STA
  • TaCoS

Highlight detection

  • TVSum
  • YouTube Highlight

Features

  • ResNet+GloVe
  • CLIP
  • CLIP+Slowfast
  • CLIP+Slowfast+PANNs (Audio) for QVHighlights
  • I3D+CLIP (Text) for TVSum

Reproduce the experiments

Pre-trained weights

Pre-trained weights can be downloaded from here. Download and unzip them in the repository root (so that the results directory matches the tree below). If you want individual weights, download them from the reproduced results tables.

Datasets

Due to copyright issues, we distribute only the feature files here. Download and place them under the ./features directory. To extract features from videos, we use HERO_Video_Feature_Extractor.

The whole directory should look like this:

lighthouse/
├── api_example
├── configs
├── data
├── features # Download the features and place them here
│   ├── ActivityNet
│   │   ├── clip
│   │   ├── clip_text
│   │   ├── resnet
│   │   └── slowfast
│   ├── Charades
│   │   ├── clip
│   │   ├── clip_text
│   │   ├── resnet
│   │   ├── slowfast
│   │   └── vgg
│   ├── QVHighlight
│   │   ├── clip
│   │   ├── clip_text
│   │   ├── pann
│   │   ├── resnet
│   │   └── slowfast
│   ├── tacos
│   │   ├── clip
│   │   ├── clip_text
│   │   ├── meta
│   │   ├── resnet
│   │   └── slowfast
│   ├── tvsum
│   │   ├── audio
│   │   ├── clip
│   │   ├── clip_text
│   │   ├── i3d
│   │   ├── resnet
│   │   ├── slowfast
│   │   └── tvsum_anno.json
│   └── youtube_highlight
│       ├── clip
│       ├── clip_text
│       └── slowfast
├── gradio_demo
├── images
├── lighthouse
├── results # The pre-trained weights are saved in this directory
└── training
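
A small helper (not part of the library) can verify that the feature directories match the layout above before launching training; the expected sub-directories below are copied from the tree:

# Hedged helper script: check that the feature layout from the tree above exists.
from pathlib import Path

EXPECTED = {
    'ActivityNet':       ['clip', 'clip_text', 'resnet', 'slowfast'],
    'Charades':          ['clip', 'clip_text', 'resnet', 'slowfast', 'vgg'],
    'QVHighlight':       ['clip', 'clip_text', 'pann', 'resnet', 'slowfast'],
    'tacos':             ['clip', 'clip_text', 'meta', 'resnet', 'slowfast'],
    'tvsum':             ['audio', 'clip', 'clip_text', 'i3d', 'resnet', 'slowfast'],
    'youtube_highlight': ['clip', 'clip_text', 'slowfast'],
}

features_root = Path('features')
for dataset, subdirs in EXPECTED.items():
    missing = [d for d in subdirs if not (features_root / dataset / d).is_dir()]
    status = 'ok' if not missing else f'missing: {", ".join(missing)}'
    print(f'{dataset}: {status}')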

Training and evaluation

Training

The general training command is:

PYTHONPATH=. python training/train.py --config configs/DATASET/FEATURE_MODEL_DATASET.yml

Options:

  • Model: moment_detr, qd_detr, eatr, cg_detr, uvcom, tr_detr, taskweave
  • Feature: resnet_glove, clip, clip_slowfast, clip_slowfast_pann
  • Dataset: qvhighlight, activitynet, charades, tacos, tvsum, youtube_highlight

For example, to train moment_detr on QVHighlights with CLIP+Slowfast features, run:

PYTHONPATH=. python training/train.py --config configs/qvhighlight/clip_slowfast_moment_detr_qvhighlight.yml

To train the models on the HD datasets (i.e., TVSum and YouTube Highlight), you need to specify the domain.
For example, to train moment_detr on the BK domain of TVSum, run:

PYTHONPATH=. python training/train.py --config configs/tvsum/clip_slowfast_moment_detr_tvsum.yml --domain BK
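
Because the config path follows the FEATURE_MODEL_DATASET.yml pattern, multiple runs can be scripted by composing the path from the options listed above. A hedged sketch (not a provided script), assumed to run from the repository root:

# Launch several training runs by composing config paths from the documented pattern.
import os
import subprocess

runs = [
    ('qvhighlight', 'clip_slowfast', 'moment_detr'),
    ('qvhighlight', 'clip_slowfast', 'qd_detr'),
    ('charades',    'clip',          'cg_detr'),
]

env = {**os.environ, 'PYTHONPATH': '.'}  # same as the PYTHONPATH=. prefix above
for dataset, feature, model in runs:
    config = f'configs/{dataset}/{feature}_{model}_{dataset}.yml'
    subprocess.run(['python', 'training/train.py', '--config', config],
                   check=True, env=env)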

Evaluation

The evaluation command is (in this example, we evaluate QD-DETR/CLIP+Slowfast on the QVHighlight val set):

PYTHONPATH=. python training/evaluate.py --config configs/qvhighlight/clip_slowfast_qd_detr_qvhighlight.yml \
                                         --model_path results/clip_slowfast_qd_detr/qvhighlight/best.ckpt \
                                         --eval_split_name val \
                                         --eval_path data/qvhighlight/highlight_val_release.jsonl

To generate submission files for the QVHighlights test set, run (QVHighlights only):

PYTHONPATH=. python training/evaluate.py --config configs/qvhighlight/clip_slowfast_qd_detr_qvhighlight.yml \
                                         --model_path results/clip_slowfast_qd_detr/qvhighlight/best.ckpt \
                                         --eval_split_name test \
                                         --eval_path data/qvhighlight/highlight_test_release.jsonl

Then zip hl_val_submission.jsonl and hl_test_submission.jsonl, and submit it to Codalab (QVHighlights only):

zip -r submission.zip hl_val_submission.jsonl hl_test_submission.jsonl
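
Before zipping, it can help to sanity-check the submission files. The field names below (qid, pred_relevant_windows) are assumptions based on the prediction format shown in the Inference API section, not an official specification:

# Hedged sanity check for the QVHighlights submission files before zipping.
# The required field names are assumed from the prediction format shown above.
import json

for path in ['hl_val_submission.jsonl', 'hl_test_submission.jsonl']:
    with open(path) as f:
        items = [json.loads(line) for line in f if line.strip()]
    missing = [i for i, item in enumerate(items)
               if 'qid' not in item or 'pred_relevant_windows' not in item]
    print(f'{path}: {len(items)} predictions, '
          f'{len(missing)} entries missing required fields')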

Reproduced results

See here. You can download individual checkpoints.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

LICENSE

Apache License 2.0

Contact

Taichi Nishimura ([email protected])