forked from Vujas-Eteph/CiVOS

[ICIP 2022] CiVOS: Revisiting Click-Based Interactive Video Object Segmentation


ronghanghu/CiVOS

 
 


CiVOS: Revisiting Click-Based Interactive Video Object Segmentation

Stephane Vujasinovic, Sebastian Bullinger, Stefan Becker, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen

ICIP 2022


📰 New Project (18/09/2023):
READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation

TL;DR: We manage the memory of STM-like sVOS methods to better handle long videos. To attain long-term performance, we estimate the inter-frame diversity of the base memory and integrate the embeddings of an incoming frame into the memory only if they enhance its diversity. As a result, we can limit the number of memory slots and handle unconstrained video sequences without hindering performance on short sequences, while removing the need for a sampling interval.
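The TL;DR describes a diversity-gated memory update. As a rough illustration only, the sketch below assumes that diversity is measured by the log-determinant of the Gram matrix of L2-normalized memory embeddings (a larger value means the stored embeddings point in more distinct directions), that a candidate replaces whichever slot gives the largest diversity gain, and that slot 0 (the annotated frame) is never replaced. These choices and the names `diversity` and `update_memory` are assumptions for illustration; the READMem paper gives the actual formulation.

```python
import numpy as np


def diversity(memory: np.ndarray) -> float:
    """Assumed diversity score: log-det of the Gram matrix of normalized rows."""
    normed = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    gram = normed @ normed.T
    sign, logdet = np.linalg.slogdet(gram)
    # A singular or numerically degenerate Gram matrix means duplicated content.
    return float(logdet) if sign > 0 else float("-inf")


def update_memory(memory: np.ndarray, candidate: np.ndarray, max_slots: int) -> np.ndarray:
    """Add `candidate` (shape (dim,)) only if it increases memory diversity."""
    if len(memory) < max_slots:
        # Free slot available: simply append the new embedding.
        return np.vstack([memory, candidate[None]])
    best, best_div = memory, diversity(memory)
    # Assumption: slot 0 holds the annotated frame and is always kept.
    for i in range(1, len(memory)):
        trial = memory.copy()
        trial[i] = candidate
        trial_div = diversity(trial)
        if trial_div > best_div:
            best, best_div = trial, trial_div
    return best
```

With a full memory, a candidate that merely duplicates stored content leaves the memory unchanged, so the number of slots stays bounded without a fixed sampling interval.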


Paper

Abstract

While current methods for interactive Video Object Segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a Click-based interactive Video Object Segmentation (CiVOS) framework to reduce the required user workload as much as possible. CiVOS builds on decoupled modules for user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which the propagation module then propagates to the remaining frames. Additional user interactions allow the object mask to be refined. The approach is extensively evaluated on the popular interactive DAVIS dataset, which necessarily requires replacing its scribble-based interactions with click-based counterparts. We consider several strategies for generating clicks during our evaluation to reflect various user inputs, and we adjust the DAVIS performance metric to enable a hardware-independent comparison. The presented CiVOS pipeline achieves competitive results while requiring a lower user workload.

Architecture
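The abstract describes two decoupled modules: an interaction module that turns clicks into a mask on one frame, and a propagation module that carries that mask to the other frames. The sketch below shows only that control flow. `Click`, `interactive_round`, `toy_interaction`, and `toy_propagation` are hypothetical names, and the two toy modules are trivial placeholders, not the networks CiVOS actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class Click:
    row: int
    col: int
    positive: bool  # True = on the object, False = background


# Hypothetical signatures standing in for the two decoupled networks.
InteractionFn = Callable[[np.ndarray, List[Click], np.ndarray], np.ndarray]
PropagationFn = Callable[[List[np.ndarray], int, np.ndarray], List[np.ndarray]]


def interactive_round(frames, masks, frame_idx, clicks,
                      interact: InteractionFn, propagate: PropagationFn):
    """One user round: clicks on one frame -> mask -> masks for every frame."""
    # The interaction module refines the current mask of the clicked frame.
    ref_mask = interact(frames[frame_idx], clicks, masks[frame_idx])
    # The propagation module spreads that mask to the remaining frames.
    return propagate(frames, frame_idx, ref_mask)


def toy_interaction(frame, clicks, prev_mask, radius=5):
    """Toy stand-in: paint (or erase) a disk around each click."""
    mask = prev_mask.copy()
    rows, cols = np.mgrid[: frame.shape[0], : frame.shape[1]]
    for c in clicks:
        disk = (rows - c.row) ** 2 + (cols - c.col) ** 2 <= radius ** 2
        mask[disk] = c.positive
    return mask


def toy_propagation(frames, ref_idx, ref_mask):
    """Toy stand-in: copy the reference mask to every frame."""
    return [ref_mask.copy() for _ in frames]


frames = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(4)]
masks = [np.zeros((32, 32), dtype=bool) for _ in frames]
# Round 1: a positive click; round 2: a negative click refines the result.
masks = interactive_round(frames, masks, 0, [Click(16, 16, True)],
                          toy_interaction, toy_propagation)
masks = interactive_round(frames, masks, 2, [Click(16, 20, False)],
                          toy_interaction, toy_propagation)
```

Keeping the two modules behind plain callables is what makes them swappable: either side can be replaced without touching the round logic.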

[Paper] [ArXiv] [PDF]

@INPROCEEDINGS{Vujasinović_2021_ICIP,
  author={Vujasinović, Stéphane and Bullinger, Sebastian and Becker, Stefan and Scherer-Negenborn, Norbert and Arens, Michael and Stiefelhagen, Rainer},
  booktitle={2022 IEEE International Conference on Image Processing (ICIP)}, 
  title={Revisiting Click-Based Interactive Video Object Segmentation}, 
  year={2022},
  pages={2756-2760},
  doi={10.1109/ICIP46576.2022.9897460}}

Setting up the environment

  1. The framework is built with Python 3.7 and relies on the following packages:
    • NumPy 1.21.4
    • SciPy 1.7.2
    • PyTorch 1.10.0
    • torchvision 0.11.1
    • OpenCV 4.5.4 (opencv-python-headless if you don't want to use the demo)
    • Cython 0.29.24
    • scikit-learn 0.20.3
    • scikit-image 0.18.3
    • Pillow 8.4.0
    • imgaug 0.4.0
    • albumentations 1.1.0
    • tqdm 4.62.3
    • PyYAML 6.0
    • easydict 1.9
    • future 0.18.2
    • cffi 1.15.0
    • davis-interactive 1.0.4
    • networkx 2.6.3 for DAVIS
    • gdown 4.2.0 for downloading pretrained models
    • tensorboard 2.4.1
  2. Download the DAVIS dataset with download_datasets.py
  3. Download the pretrained models with download_models.py
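Because the pins above are exact, a quick check of the active environment can save debugging time. The following is a hypothetical helper, not part of the repository; it covers only a subset of the listed pins, keyed by their PyPI distribution names, and it uses `importlib.metadata`, which needs Python 3.8+ (on Python 3.7 the `importlib-metadata` backport provides the same API).

```python
from importlib import metadata  # Python 3.8+; on 3.7 use the importlib-metadata backport

# A subset of the pins listed above, keyed by PyPI distribution name.
PINS = {
    "numpy": "1.21.4",
    "scipy": "1.7.2",
    "torch": "1.10.0",
    "torchvision": "0.11.1",
    "Cython": "0.29.24",
    "scikit-learn": "0.20.3",
    "scikit-image": "0.18.3",
    "Pillow": "8.4.0",
    "tqdm": "4.62.3",
    "PyYAML": "6.0",
    "networkx": "2.6.3",
}


def version_matches(found, wanted):
    """Accept local build tags (e.g. 1.10.0+cu113) and extra trailing components."""
    base = found.split("+")[0]
    return base == wanted or base.startswith(wanted + ".")


def check_versions(pins, get_version=metadata.version):
    """Return {distribution: (wanted, found_or_None)} for every mismatch."""
    problems = {}
    for dist, wanted in pins.items():
        try:
            found = get_version(dist)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or not version_matches(found, wanted):
            problems[dist] = (wanted, found)
    return problems


if __name__ == "__main__":
    for dist, (wanted, found) in check_versions(PINS).items():
        print(f"{dist}: expected {wanted}, found {found or 'not installed'}")
```

Tolerating suffixes matters for packages such as PyTorch, whose CUDA builds report versions like `1.10.0+cu113`.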

Guide for Demo

  1. Adapt the paths and variables in Demo.yml
  2. Launch CiVOS_Demo.py (note: the demo can segment only one object)
  3. Mouse and keyboard bindings:
    • Positive interaction: left mouse click
    • Negative interaction: right mouse click
    • Predict a mask of the object of interest for the video sequence: space bar
    • Visualize the results with the keys x (forward direction) and y (backward direction)
    • Quit the demo with key q
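The bindings above map naturally onto an OpenCV mouse callback plus a key handler. The class below is a hypothetical sketch of that mapping, not the code in CiVOS_Demo.py; to stay runnable without a display, it defines the two event codes locally with the same integer values as `cv2.EVENT_LBUTTONDOWN` and `cv2.EVENT_RBUTTONDOWN`.

```python
# Same integer values as cv2.EVENT_LBUTTONDOWN and cv2.EVENT_RBUTTONDOWN.
EVENT_LBUTTONDOWN = 1
EVENT_RBUTTONDOWN = 2


class DemoState:
    """Hypothetical state holder mirroring the demo's documented bindings."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frame_idx = 0
        self.clicks = []  # (x, y, is_positive) in image coordinates
        self.predict_requested = False
        self.running = True

    def on_mouse(self, event, x, y, flags, param):
        """Callback with the (event, x, y, flags, param) signature OpenCV expects."""
        if event == EVENT_LBUTTONDOWN:
            self.clicks.append((x, y, True))  # positive interaction
        elif event == EVENT_RBUTTONDOWN:
            self.clicks.append((x, y, False))  # negative interaction

    def on_key(self, key):
        if key == ord(" "):
            self.predict_requested = True  # predict masks for the whole sequence
        elif key == ord("x"):
            self.frame_idx = min(self.frame_idx + 1, self.num_frames - 1)  # forward
        elif key == ord("y"):
            self.frame_idx = max(self.frame_idx - 1, 0)  # backward
        elif key == ord("q"):
            self.running = False  # quit
```

To wire it to a window, one would register `cv2.setMouseCallback(window_name, state.on_mouse)` and, in a loop while `state.running` is true, pass `cv2.waitKey(20) & 0xFF` to `state.on_key`.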

How to evaluate on DAVIS

  1. Adapt the paths and variables in EXAMPLE_DEBUGGING.yml
  2. Adapt and launch the bash script CiVOS_evaluation_script_example.sh
  3. Summarize the resulting .csv files with Summarize_with_DAVIS_arbitrary_report.py
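The interactive DAVIS benchmark reports the area under the J&F-versus-time curve (AUC) and the J&F reached at 60 s. The summary script's internals are not described here, and the paper defines its own hardware-independent R-AUC variant. As an assumption-laden illustration only, the sketch below treats the curve as a step function (J&F is 0 before the first result, and each result holds until the next) and normalizes the area by a time budget `t_max`. `auc_jf` and `jf_at` are hypothetical names, and this is not the official davis-interactive computation.

```python
import numpy as np


def auc_jf(times, jf, t_max):
    """Normalized area under a J&F-vs-time step curve.

    J&F counts as 0 before the first result, and each result holds until the
    next one arrives; dividing by t_max keeps the score in [0, 1].
    Assumes `times` is sorted in ascending order.
    """
    times = np.clip(np.asarray(times, dtype=float), 0.0, t_max)
    jf = np.asarray(jf, dtype=float)
    widths = np.diff(np.append(times, t_max))  # how long each result is held
    return float(np.sum(widths * jf) / t_max)


def jf_at(times, jf, t):
    """J&F of the last result obtained at or before time t (0 if none yet)."""
    idx = int(np.searchsorted(np.asarray(times, dtype=float), t, side="right")) - 1
    return float(jf[idx]) if idx >= 0 else 0.0
```

A method that reaches a good mask quickly earns a larger area than one that reaches the same final J&F late, which is why AUC rewards low user workload.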

Results

Quantitative evaluation on the interactive DAVIS 2017 validation set.

Method     Training interaction  Testing interaction  R-AUC-J&F  AUC-J&F  J&F@60s
MANet      Scribbles             Scribbles            0.72       0.79     0.79
ATNet      Scribbles             Scribbles            0.75       0.80     0.80
MiVOS      Scribbles             Scribbles            0.81       0.87     0.88
GIS-RAmap  Scribbles             Scribbles            0.79       0.86     0.87
MiVOS      Clicks                Clicks               0.70       0.78     0.79
CiVOS      Clicks                Clicks               0.76       0.83     0.84

R-AUC-J&F of CiVOS on the DAVIS 2017 validation set for three different click-generation strategies and a varying maximal number of clicks.

Maximal number of clicks  1     2     3     4     5     6     7
Interaction Strategy 1    0.69  -     -     -     -     -     -
Interaction Strategy 2    0.72  0.76  0.76  0.75  0.75  0.75  0.76
Interaction Strategy 3    0.74  0.77  0.78  0.78  0.78  0.78  0.78
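The three interaction strategies are not specified in this README. Purely as a hypothetical example of turning a DAVIS scribble into at most N clicks, the sketch below samples evenly spaced points along each stroke and marks them positive when the stroke belongs to the target object, negative otherwise. It assumes the davis-interactive scribble layout in which `scribbles["scribbles"][frame]` is a list of strokes, each with a `"path"` of normalized `[x, y]` points and an `"object_id"`; `scribble_to_clicks` is a hypothetical name and is not one of the paper's strategies.

```python
import numpy as np


def scribble_to_clicks(scribbles, frame_idx, height, width, target_id, max_clicks=3):
    """Sample up to `max_clicks` clicks per stroke on one frame.

    Returns (row, col, is_positive) tuples in pixel coordinates.
    """
    clicks = []
    for stroke in scribbles["scribbles"][frame_idx]:
        path = np.asarray(stroke["path"], dtype=float)  # (N, 2) normalized (x, y)
        if len(path) == 0:
            continue
        positive = stroke["object_id"] == target_id
        # Evenly spaced indices along the stroke, including both endpoints.
        num = min(max_clicks, len(path))
        idx = np.linspace(0, len(path) - 1, num=num).round().astype(int)
        for x, y in path[idx]:
            col = min(int(x * width), width - 1)  # clamp x == 1.0 into the image
            row = min(int(y * height), height - 1)
            clicks.append((row, col, positive))
    return clicks
```

Raising `max_clicks` gives the interaction module more evidence per round, which mirrors the "maximal number of clicks" axis in the table above.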

Credits

RiTM: GitHub, Paper

MiVOS: GitHub, Paper

DeepLabV3Plus: GitHub, Paper

DAVIS-interactive: GitHub, Project
