Multiple Pairwise Ranking Networks for Personalized Video Summarization


This repository provides a PyTorch implementation of the Multi-ranker paper published at ICCV 2021.

This code is based on the DR-DSN and VASNet implementations.

Pairwise Ranking Model for Global Summarization (Standard ranker)

The Standard ranker learns a ranking function that assigns high ranking scores to important video segments, so that a summary can be built by selecting the top-ranked segments.
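The selection step described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `select_top_segments` and the parameter `k` are hypothetical, and the scores are assumed to be one float per segment.

```python
from typing import List, Sequence


def select_top_segments(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring segments in temporal order."""
    if k <= 0:
        return []
    # Rank segment indices by descending score; ties keep the earlier segment first.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore temporal order so the summary plays back chronologically.
    return sorted(ranked[:k])


# Hypothetical ranking scores for a five-segment video.
scores = [0.1, 0.9, 0.4, 0.7, 0.2]
summary = select_top_segments(scores, k=2)
```

Here `summary` holds the indices of the two top-ranked segments, sorted by their position in the video.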

Multiple Pairwise Ranking Model for Personalized Summarization (Multi-ranker)

Given the number of preferences, the Multi-ranker learns a set of sub-rankers that are trained jointly so that the local summaries conform to the preferences and the global summary max-aggregates the sub-rankers' scores.
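The max-aggregation of sub-ranker scores can be illustrated with a small sketch. It assumes each sub-ranker outputs one score per segment; the name `max_aggregate` and the sample values are hypothetical and do not come from the repository.

```python
from typing import List, Sequence


def max_aggregate(sub_scores: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-preference scores into global scores with an element-wise max."""
    if not sub_scores:
        return []
    n_segments = len(sub_scores[0])
    if any(len(row) != n_segments for row in sub_scores):
        raise ValueError("every sub-ranker must score the same segments")
    # For each segment, keep the highest score given by any sub-ranker.
    return [max(row[t] for row in sub_scores) for t in range(n_segments)]


# Two hypothetical sub-rankers scoring the same four segments.
local_scores = [
    [0.2, 0.8, 0.1, 0.4],
    [0.6, 0.3, 0.5, 0.4],
]
global_scores = max_aggregate(local_scores)
```

Under this scheme a segment is globally important as soon as one preference ranks it highly.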


The TVSum dataset is a collection of 50 YouTube videos grouped into 10 categories. Each video is split into a set of 2-second shots. 20 users rate how important each shot is compared to the other shots from the same video, which yields 20 reference summaries. The GT summary for each video is defined as the mean of the corresponding 20 reference summaries.

The SumMe dataset consists of 25 videos containing a variety of events. Each video comes with 15 to 18 reference interval-based keyshot summaries. These summaries are converted to frame-level reference summaries by marking the frames contained in the keyshots with score 1 and the frames not contained in the keyshots with score 0. The GT summary associated with each video is then defined as the mean of its 15 to 18 reference summaries.
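The conversion from keyshot intervals to frame-level summaries, followed by averaging, can be sketched as below. The function names are hypothetical, and treating the interval end as inclusive is an assumption; the actual annotation format may differ.

```python
from typing import List, Sequence, Tuple


def keyshots_to_frames(keyshots: Sequence[Tuple[int, int]], n_frames: int) -> List[int]:
    """Mark frames inside any [start, end] keyshot (inclusive, assumed) with 1, others with 0."""
    summary = [0] * n_frames
    for start, end in keyshots:
        # Clamp the interval to the valid frame range before marking.
        for t in range(max(start, 0), min(end, n_frames - 1) + 1):
            summary[t] = 1
    return summary


def mean_summary(references: Sequence[Sequence[float]]) -> List[float]:
    """Average the reference summaries frame by frame to get the GT summary."""
    n_refs = len(references)
    return [sum(col) / n_refs for col in zip(*references)]


# Two hypothetical annotators on a 6-frame video.
refs = [
    keyshots_to_frames([(0, 2)], n_frames=6),
    keyshots_to_frames([(2, 4)], n_frames=6),
]
gt = mean_summary(refs)
```

Frames selected by every annotator get a GT score of 1, and frames selected by none get 0.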

FineGym is a fine-grained action recognition dataset that provides action-level temporal annotations for 156 YouTube gymnasium videos. Since the videos are long, we used only 50 sampled videos for the experiments and listed their IDs in the Supplemental Material. In this case, we do not have reference summaries. Instead, we define one reference summary, which is also the GT summary, for each video by marking the frames contained in the action keyshots with score 1 and the frames not contained in the keyshots with score 0.

Datasets Preprocessing

We use the following feature extractors to obtain the segment features of each video:

  • A 3D ResNet pretrained on Kinetics, producing 2048-dimensional features where each feature represents a segment of 16 frames (mainly used for the FineGym dataset).
  • The baseline features provided by VASNet, with 1024 dimensions, where each feature represents a segment of 15 frames (mainly used for the SumMe and TVSum datasets).

This code relies on the baseline features stored in eccv16_dataset_tvsum_google_pool5.h5 and eccv16_dataset_summe_google_pool5.h5 for the TVSum and SumMe datasets, respectively. However, these files are designed for the classical summarization pipeline (importance score estimation + KTS segmentation + segment selection), whereas the summarization pipeline in this work consists only of importance score estimation. We suggest using updated baseline files, iccv21_dataset_tvsum_google_pool5.h5 and iccv21_dataset_summe_google_pool5.h5, obtained by deleting the irrelevant keys and modifying the key user_summary to correspond to the original user/annotator reference summaries in TVSum and SumMe.


  • Python 3.6.9
  • Pytorch 1.3.1
  • Torchvision 0.4.2

Getting Started


  • Install PyTorch and Torchvision from or through the following pip/conda commands:
pip install torch==1.3.1+cu92 torchvision==0.4.2+cu92 -f
conda install pytorch==1.3.1 torchvision==0.4.2 cudatoolkit=9.2 -c pytorch
  • Clone this repository.
git clone
cd Multi-ranker

Standard ranker training

  • Run to generate segment-level pairwise comparisons for each video using the updated baseline files of each dataset, iccv21_dataset_tvsum_google_pool5.h5 and iccv21_dataset_summe_google_pool5.h5, placed in the dataset folder.

  • Run dataset/ to generate a JSON file that contains the dataset splits with the training, validation, and test sets according to the experimental protocol.

  • Run to launch the training of the Standard ranker for each split with a selected validation/test option. Or simply set your parameters and run it as follows:

python3 --epoch=1 --batch_size=128 --dataset=tvsum --mode=training --model_name=ranker_b128_p2_s0_v4 --pairset=./pairset/tvsum/pairs_2k.npy --split=0 --validation=4
python3 --epoch=1 --batch_size=128 --dataset=summe --mode=training --model_name=ranker_b128_p2_s0_v4 --pairset=./pairset/summe/pairs_2k.npy --split=0 --validation=4
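As an illustration of what segment-level pairwise comparisons could look like, the sketch below samples index pairs in which the first segment has a higher GT score than the second. This is an assumption about the general idea only: the function `sample_pairs`, its parameters, and the output format are hypothetical and not the layout of the repository's .npy pair files.

```python
import random
from typing import List, Sequence, Tuple


def sample_pairs(gt_scores: Sequence[float], n_pairs: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Sample (i, j) index pairs such that segment i has a higher GT score than segment j."""
    rng = random.Random(seed)
    # Enumerate all strictly ordered pairs; ties carry no preference and are skipped.
    candidates = [
        (i, j)
        for i in range(len(gt_scores))
        for j in range(len(gt_scores))
        if gt_scores[i] > gt_scores[j]
    ]
    if len(candidates) <= n_pairs:
        return candidates
    return rng.sample(candidates, n_pairs)


# Hypothetical GT scores for a four-segment video.
gt_scores = [0.1, 0.9, 0.4, 0.4]
pairs = sample_pairs(gt_scores, n_pairs=3)
```

Enumerating every ordered pair is quadratic in the number of segments, so a real pipeline on long videos would likely sample indices directly instead.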

Standard ranker evaluation

  • Run to aggregate the evaluations of the trained Standard rankers across the dataset splits for a selected validation/test option.
python3 --save_dir=models/tvsum --metric=kendall
python3 --save_dir=models/tvsum --metric=spearman
Standard ranker kendall tau validation-test: [0.17562000/0.02417100]
Human kendall tau validation-test: [0.17551309/0.02265591]
Standard ranker spearman rho validation-test: [0.23012279/0.03191806]
Human spearman rho validation-test: [0.20185220/0.02602781]
python3 --save_dir=models/summe --metric=kendall
python3 --save_dir=models/summe --metric=spearman
Standard ranker kendall tau validation-test: [0.00682317/0.04831031]
Human kendall tau validation-test: [0.17960041/0.01065329]
Standard ranker spearman rho validation-test: [0.00865731/0.05963134]
Human spearman rho validation-test: [0.18633639/0.01112780]
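For readers unfamiliar with the metric, a simplified Kendall tau between predicted scores and a reference summary can be sketched as follows. This is the tau-a variant without tie correction, written for illustration; the repository's evaluation may rely on a library implementation with different tie handling.

```python
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two items are required")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The sign of the product tells whether both rankings order the pair the same way.
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical predicted scores versus a reference summary.
tau = kendall_tau_a([0.1, 0.5, 0.9], [0.2, 0.8, 0.6])
```

A value of 1 means the two rankings agree on every pair, and -1 means they disagree on every pair.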

Multi-ranker training

  • Run dataset/clustering/ to sample segment features from the dataset for k-means clustering.

  • Run dataset/clustering/ to generate the GT and reference summaries for each cluster/preference.

  • Run to generate segment-level pairwise comparisons for each video with respect to each preference.

  • Run to launch the training of the Multi-ranker for each split with a selected validation/test option. Or simply set your parameters and run it as follows:

python3 --epoch=1 --batch_size=128 --dataset=tvsum --mode=training --model_name=multi_ranker_pr4_l0.5_b128_p2_s0_v4 --pairset_multi=./pairset/tvsum/pairs_multi_2k_4.npy --pairset=./pairset/tvsum/pairs_2k.npy --users=dataset/clustering/preferences_tvsum_4.npy --multi=True --split=0 --validation=4 --preference=4 --lbda=0.5
python3 --epoch=1 --batch_size=128 --dataset=summe --mode=training --model_name=multi_ranker_pr4_l0.5_b128_p2_s0_v4 --pairset_multi=./pairset/summe/pairs_multi_2k_4.npy --pairset=./pairset/summe/pairs_2k.npy --users=dataset/clustering/preferences_summe_4.npy --multi=True --split=0 --validation=4 --preference=4 --lbda=0.5
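The clustering step that defines the preferences can be pictured with a small k-means sketch. It is a plain Lloyd's algorithm on toy 2-D points, assuming each cluster of segment features stands for one preference; the function names, the random initialization, and the fixed iteration count are illustrative choices, not the repository's implementation.

```python
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]], k: int, n_iters: int = 20, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct points picked at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy segment features forming two well-separated groups.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(features, k=2)
```

Each resulting label would then index the preference that a segment belongs to.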

Multi-ranker evaluation

  • Run to aggregate the evaluations of the trained Multi-ranker across the dataset splits for the selected validation/test, number of pairs, batch size, lambda, and preference options.
python3 --save_dir=models/tvsum --metric=kendall
Kendall tau global validation-test: [0.16549421/0.03240645]
Kendall tau local validation-test: [0.37415068/0.06964904]
Global human kendall tau validation-test: [0.17551309/0.02265591]
Local human kendall tau validation-test: [0.87186231/0.01099596]
python3 --save_dir=models/summe --metric=kendall
Kendall tau global validation-test: [0.00070850/0.04999284]
Kendall tau local validation-test: [0.00175680/0.04216613]
Global human kendall tau validation-test: [0.17960041/0.01065329]
Local human kendall tau validation-test: [0.26068578/0.01967284]
  • To aggregate the local evaluations of the trained Multi-ranker models across the dataset splits with respect to each preference option, run with --mode=local_preference, and then run as follows:
python3 --save_dir=models/tvsum --metric=kendall
  • To aggregate the personalized evaluations of the trained Multi-ranker models across the dataset splits with respect to each combination of preferences, run with --mode=comb_preference, and then run as follows:
python3 --save_dir=models/tvsum --metric=kendall


If you use this code for your research, please cite our paper.

title = {Multiple Pairwise Ranking Networks for Personalized Video Summarization},
author = {Yassir Saquil and Da Chen and Yuan He and Chuan Li and Yongliang Yang},
year = {2021},
booktitle = {ICCV},


  • You can find our ICCV 2021 poster here.

