
TSB-UAD: An End-to-End Anomaly Detection Benchmark Suite for Univariate Time-Series Data

TSB-UAD is an open, end-to-end benchmark suite that eases the evaluation of univariate time-series anomaly detection methods. Overall, TSB-UAD contains 12,686 time series with labeled anomalies spanning different domains, with high variability in anomaly types, ratios, and sizes. Specifically, TSB-UAD includes 10 previously proposed datasets containing 900 time series from real-world data-science applications. Motivated by flaws in certain datasets and evaluation strategies in the literature, we study anomaly types and data transformations to contribute two further collections of datasets. First, we generate 958 time series using a principled methodology for transforming 126 time-series classification datasets into time series with labeled anomalies. Second, we present a set of data transformations with which we introduce new anomalies into the public datasets, resulting in 10,828 time series (92 datasets) of varying difficulty for anomaly detection.

Publication

If you use TSB-UAD in your project or research, please cite our paper:

John Paparrizos, Yuhao Kang, Paul Boniol, Ruey S. Tsay, Themis Palpanas, and Michael J. Franklin. TSB-UAD: An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection. PVLDB, 15(8): 1697-1711, 2022. doi:10.14778/3529337.3529354

Datasets

Due to GitHub's limits on upload size, we host the datasets at a different location:

Public: https://chaos.cs.uchicago.edu/tsb-uad/public.zip

Synthetic: https://chaos.cs.uchicago.edu/tsb-uad/synthetic.zip

Artificial: https://chaos.cs.uchicago.edu/tsb-uad/artificial.zip

The UCR classification datasets used to generate the Artificial datasets: https://chaos.cs.uchicago.edu/tsb-uad/UCR2018-NEW.zip

Contributors

  • John Paparrizos (University of Chicago)
  • Yuhao Kang (University of Chicago)
  • Alex Wu (University of Chicago)
  • Teja Bogireddy (University of Chicago)

Installation

The following tools are required to install TSB-UAD from source:

  • git
  • conda (anaconda or miniconda)

Steps

  1. Clone this repository using git and change into its root directory:
git clone https://github.com/johnpaparrizos/TSB-UAD.git
cd TSB-UAD/
  2. Create and activate a conda environment named TSB:
conda env create --file environment.yml
conda activate TSB
  3. Install TSB-UAD using setup.py:
python setup.py install
  4. Install the dependencies from requirements.txt:
pip install -r requirements.txt
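
To sanity-check the installation, try importing the package. The module name TSB_UAD is an assumption based on the repository layout; adjust it if your version differs:

python -c "import TSB_UAD"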

Usage

  • test_anomaly_detectors.ipynb: evaluates the performance of 11 popular anomaly detectors (a standalone scoring sketch follows this list).
  • test_artificialConstruction.ipynb: synthesizes datasets based on anomaly construction.
  • test_transformer.ipynb: demonstrates the effects of 11 data transformations.
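
As a quick standalone illustration of scoring a series (not the notebooks' own code, which uses the repo's detector wrappers), the following sketch applies scikit-learn's IsolationForest to sliding windows of a univariate series. The file path and the two-column value,label layout are assumptions about the benchmark format:

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

# Hypothetical file path; benchmark files are assumed to hold two
# columns: value and 0/1 anomaly label.
df = pd.read_csv("data/benchmark/some_series.out", header=None)
values = df[0].to_numpy(dtype=float)
labels = df[1].to_numpy(dtype=int)

window = 100  # fixed window length, chosen for illustration only
# One row per overlapping window of the series.
X = np.lib.stride_tricks.sliding_window_view(values, window)

clf = IsolationForest(random_state=0).fit(X)
scores = -clf.score_samples(X)  # negate so higher = more anomalous

# Repeat the first score so every point receives a value.
point_scores = np.concatenate([np.full(window - 1, scores[0]), scores])
print("ROC AUC:", roc_auc_score(labels, point_scores))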

Benchmark

The ./data directory contains four folders:

  • benchmark/ contains the ten public datasets (figure: typical outliers in these ten datasets).

  • UCR2018-NEW/ contains 128 subfolders, one per UCR classification dataset.

  • artificial/ contains the data constructed from UCR2018-NEW.

  • synthetic/ contains the data synthesized by local and global transformations (a rough illustration of the idea follows this list).
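
As a rough illustration of the local/global transformation idea, and not the paper's exact transformation set, the sketch below injects a local anomaly by scaling a short segment and then perturbs the whole series globally; the toy series, segment position, and noise level are all arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 2000))  # toy input series

def local_transform(x, start, length, scale=3.0):
    # Scale one short segment: a local (subsequence) anomaly.
    y = x.copy()
    y[start:start + length] *= scale
    return y

def global_transform(x, noise_std=0.3):
    # Perturb the whole series, changing detection difficulty.
    return x + rng.normal(0.0, noise_std, size=x.shape)

anomalous = local_transform(series, start=1000, length=50)
harder = global_transform(anomalous)
labels = np.zeros(series.shape, dtype=int)
labels[1000:1050] = 1  # the injected segment is the labeled anomaly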

Anomaly Detector

We evaluate eleven anomaly detection algorithms in this module.

Below is an example result based on an autoencoder detector.

For each output figure, the left panel shows the real time series with outliers in red, the anomaly score produced by the detector, and the corresponding TP/FP/TN/FN classification.

The right panel shows the ROC curve. AUC is the area under the ROC curve; a larger AUC indicates better performance (a minimal computation sketch follows).
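
For reference, here is a minimal sketch of how the panel's quantities can be derived from ground-truth labels and anomaly scores using scikit-learn; the toy arrays and the fixed threshold are arbitrary illustrations, not the suite's own choices:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

labels = np.array([0, 0, 1, 1, 0, 1, 0, 0])  # toy ground truth
scores = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9, 0.3, 0.5])

threshold = 0.5  # arbitrary cutoff for the TP/FP/TN/FN breakdown
pred = (scores >= threshold).astype(int)
tp = int(np.sum((pred == 1) & (labels == 1)))
fp = int(np.sum((pred == 1) & (labels == 0)))
tn = int(np.sum((pred == 0) & (labels == 0)))
fn = int(np.sum((pred == 0) & (labels == 1)))
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")

fpr, tpr, _ = roc_curve(labels, scores)  # points on the ROC curve
print("AUC:", roc_auc_score(labels, scores))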
