Generating Time Series Prototypes with ShapeDTW Barycenter Averaging

MSD-IRIMAS/ShapeDBA

ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging

ShapeDBA is now available in aeon-toolkit!

Simply run the following:

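from aeon.datasets import load_classification
from aeon.clustering.averaging import elastic_barycenter_average

# X has shape (n_cases, n_channels, n_timepoints); y holds the class labels as strings
X, y = load_classification(name="Coffee")
# Compare against the string label "0" (comparing with the integer 0 would not select the class-0 series)
average_class_0 = elastic_barycenter_average(X[y == "0"], distance="shape_dtw", reach=15)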
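To illustrate what the reach=15 argument controls, here is a minimal NumPy sketch of ShapeDTW alignment and a DBA-style barycenter update built on top of it. It assumes identity shape descriptors (the raw subsequence of length 2 * reach + 1 centred on each time point, padded by repeating the edge values), a squared Euclidean cost between descriptors, arithmetic-mean initialisation, and univariate, equal-length series. It is not the repository's Cython code nor aeon's implementation, and the helper names shape_descriptors, shape_dtw_path and shape_dba are hypothetical.

```python
import numpy as np


def shape_descriptors(x: np.ndarray, reach: int) -> np.ndarray:
    """Identity shape descriptors: row t is the subsequence of length
    2 * reach + 1 centred on x[t], with edge values repeated as padding."""
    padded = np.pad(x, reach, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, 2 * reach + 1)


def shape_dtw_path(x: np.ndarray, y: np.ndarray, reach: int) -> list:
    """DTW warping path computed on shape descriptors instead of raw values."""
    dx, dy = shape_descriptors(x, reach), shape_descriptors(y, reach)
    # Squared Euclidean cost between every pair of descriptors (assumption).
    cost = ((dx[:, None, :] - dy[None, :, :]) ** 2).sum(axis=-1)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def shape_dba(X: np.ndarray, reach: int = 15, n_iter: int = 10) -> np.ndarray:
    """DBA-style averaging of X with shape (n_cases, n_timepoints), using
    ShapeDTW only to align each series with the current barycenter."""
    barycenter = X.mean(axis=0)  # arithmetic-mean initialisation (one common choice)
    for _ in range(n_iter):
        sums = np.zeros_like(barycenter)
        counts = np.zeros_like(barycenter)
        for series in X:
            for i, j in shape_dtw_path(barycenter, series, reach):
                # Accumulate the raw values (not the descriptors) aligned to index i.
                sums[i] += series[j]
                counts[i] += 1
        barycenter = sums / counts  # every index lies on the path, so counts >= 1
    return barycenter
```

The design point this sketch tries to convey is that the descriptors are used only to find the alignment; the barycenter itself is updated with the raw values of the aligned series, as in DBA. The pure Python DTW loop is quadratic and slow, so compiled implementations such as aeon's or the repository's Cython components are preferable on real datasets.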

This repository contains the code of our paper "ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging", accepted at the 8th Workshop on Advanced Analytics and Learning on Temporal Data (AALTD 2023), held in conjunction with the 2023 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

This work was done by Ali Ismail-Fawaz, Hassan Ismail Fawaz, François Petitjean, Maxime Devanne, Jonathan Weber, Stefano Berretti, Geoffrey I. Webb and Germain Forestier.

Summary figure


Usage of code

Before doing anything else, run the following command in the root directory to build the Cython components of the Dynamic Time Warping (DTW) algorithm and its variants:

./utils/build-cython.sh

To use the code, first adapt it to your own machine as follows:

  1. Download the datasets of the UCR Archive.
  2. Set the variable root_dir_dataset_archive in main.py to the directory containing the UCR Archive datasets.
  3. Set the variable root_dir in main.py to the directory where the results will be stored.

Two options can be chosen when running the main.py file:

  1. Visualize the resulting per-class average of a dataset using the following command:
    python3 main.py visualize_average <dataset_name> <archive_name> <averaging_method>
    For example: python3 main.py visualize_average Coffee UCRArchive_2018 shapeDBA
    The <archive_name> should match the name of the directory containing the datasets; for instance, the Coffee dataset should be located at root_dir_dataset_archive + '<archive_name>/Coffee'.
    The choices for <averaging_method> are: mean, DBA, softDBA and ShapeDBA.
  2. Generate the clustering results of the paper. First, choose which datasets to use in the study by editing the UNIVARIATE_DATASET_NAMES_2018 list variable in constants.py.
    Then produce the results by running the following command: python3 main.py data_clustering
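Before running either command, it can help to check that the datasets sit where main.py expects them. This is a minimal sketch assuming the layout root_dir_dataset_archive + '<archive_name>/<dataset_name>' described above; the helper missing_datasets is hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_datasets(root_dir_dataset_archive, archive_name, dataset_names):
    """Return the names of datasets whose directory is not found under
    root_dir_dataset_archive/<archive_name>/ (hypothetical helper)."""
    base = Path(root_dir_dataset_archive) / archive_name
    return [name for name in dataset_names if not (base / name).is_dir()]


if __name__ == "__main__":
    # Placeholder paths: replace them with the values you set in main.py.
    print(missing_datasets("/path/to/archives/", "UCRArchive_2018", ["Coffee", "GunPoint"]))
```

Using pathlib means the check works whether or not root_dir_dataset_archive ends with a trailing slash.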

Results

We compared k-means using the Euclidean distance, DBA, softDBA and ShapeDBA, as well as the k-Shape algorithm, in terms of the Adjusted Rand Index (ARI) and running time.
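ARI compares a clustering with the ground-truth labels while correcting for chance agreement: it equals 1 for identical partitions (regardless of how clusters are numbered), is around 0 for chance-level agreement, and can be negative. A minimal illustration with toy labels using scikit-learn's adjusted_rand_score (this is not the repository's evaluation code):

```python
from sklearn.metrics import adjusted_rand_score

# Toy ground-truth classes and three hypothetical clusterings of six series.
true_labels = [0, 0, 1, 1, 2, 2]
permuted = [2, 2, 0, 0, 1, 1]        # same partition, different cluster ids
single_cluster = [0, 0, 0, 0, 0, 0]  # every series lumped into one cluster
partial = [0, 0, 1, 2, 2, 2]         # one series assigned to the wrong cluster

score_permuted = adjusted_rand_score(true_labels, permuted)       # 1.0: ids do not matter
score_single = adjusted_rand_score(true_labels, single_cluster)   # 0.0: no better than chance
score_partial = adjusted_rand_score(true_labels, partial)         # about 0.44
print(score_permuted, score_single, score_partial)
```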

Below we present both the Multi-Comparison Matrix (MCM) and the Critical Difference Diagram (CDD) for each of the two studies.

ARI

[Figures: MCM and CDD for ARI]

Computational Runtime

[Figures: MCM and CDD for computational runtime]

Requirements

numpy==1.24.3
tslearn==0.5.3.2
matplotlib==3.7.1
cython==0.29.34
pandas==2.0.1
scikit-learn==1.2.2
scipy==1.10.1

Citation

If you use this work please cite the following corresponding paper:

@inproceedings{ismail-fawaz2023shapedba,
  author = {Ismail-Fawaz, Ali and Ismail Fawaz, Hassan and Petitjean, François and Devanne, Maxime and Weber, Jonathan and Berretti, Stefano and Webb, Geoffrey I and Forestier, Germain},
  title = {ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging},
  booktitle = {ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data},
  city = {Turin},
  country = {Italy},
  year = {2023},
}

Acknowledgments

This work was supported by the ANR DELEGATION project (grant ANR-21-CE23-0014) of the French Agence Nationale de la Recherche. The authors would like to acknowledge the High Performance Computing Center of the University of Strasbourg for supporting this work by providing scientific support and access to computing resources. Part of the computing resources were funded by the Equipex Equip@Meso project (Programme Investissements d’Avenir) and the CPER Alsacalcul/Big Data. The authors would also like to thank the creators and providers of the UCR Archive.