Scalable and Accurate Subsequence Transform

frankl1/sast

SAST: Scalable and Accurate Subsequence Transform for Time Series Classification

🚀 SAST and SASTClassifier now available in Aeon-toolkit.

SAST is a novel shapelet-based time series classification method inspired by the core object recognition capability of the human brain. SAST is more accurate than STC while being more scalable.

SASTEN is an ensemble of 3 SAST models. SASTEN is more accurate than SAST and more scalable than STC.

SASTEN-A is an ensemble of 3 approximated SAST models. The approximation is done by considering only a subset of the subsequences in the dataset.

STC-k is a shapelet transform classifier that generates shapelet candidates from at most k reference time series per class. If k is a float, then k x n_c instances are used per class, where n_c is the total number of instances in class c.
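The reference selection rule for STC-k can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `select_references`, the random sampling, and the rounding of k x n_c (truncation, with at least one instance per class) are all assumptions.

```python
import numpy as np


def select_references(y, k, seed=None):
    """Pick indices of the reference series used to generate shapelet candidates.

    Hypothetical helper (not part of the sast package).
    - int k:   at most k instances per class
    - float k: k * n_c instances per class, n_c being the class size
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    chosen = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n_c = len(idx)
        if isinstance(k, float):
            # Assumption: truncate k * n_c and keep at least one instance.
            n_ref = min(n_c, max(1, int(k * n_c)))
        else:
            n_ref = min(k, n_c)
        chosen.extend(rng.choice(idx, size=n_ref, replace=False))
    return np.sort(np.array(chosen))
```

With `k=1` this corresponds to STC-1 (one reference series per class), and with `k=0.25` to STC-0.25 (a quarter of each class).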

Updated version with more results

Results

SAST vs SASTEN

Pairwise accuracy comparison

Critical difference diagram

SAST-models CDD

STC-k vs STC

Pairwise accuracy comparison

STC vs STC-1 STC vs STC-0.25
STC vs STC-0.5 STC vs STC-0.75

Critical difference diagram

STC vs STC-k CDD

SAST vs STC

SAST vs STC-1 SAST vs STC-1

Critical difference diagram

CDD SAST vs STC

Percentage of wins per problem type

win-per-dataset-type-stck

SAST vs other shapelet methods

Pairwise accuracy comparison

SAST vs ELIS++ SAST vs LS

SAST vs FS

Critical difference diagram

SAST vs other shapelets CDD

Percentage of wins per problem type

win-per-dataset-type-shapelet

SAST vs SOTA

Pairwise accuracy comparison

scatter-sast-vs-rocket scatter-sast-ridge-vs-hive-cote

scatter-sast-ridge-vs-chief

Percentage of wins per problem type

win-per-dataset-type-sota

Scalability plots

  • Regarding the length of time series

  • Regarding the number of time series in the dataset

Usage

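import numpy as np
from sklearn.linear_model import RidgeClassifierCV

from sast.utils import *
from sast.sast import *

# Define these before running: min_shp_length and max_shp_length bound the
# candidate subsequence lengths, nb_inst_per_class is the number of instances
# per class, and X_train, y_train, X_test are your own data.
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
sast_ridge = SAST(cand_length_list=np.arange(min_shp_length, max_shp_length + 1),
                  nb_inst_per_class=nb_inst_per_class,
                  random_state=None, classifier=clf)

# Fit on the training set.
sast_ridge.fit(X_train, y_train)

# Predict class labels for the test set.
prediction = sast_ridge.predict(X_test)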
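To make the roles of `cand_length_list` and `nb_inst_per_class` more concrete, here is a rough sketch of a generic subsequence (shapelet) transform. It is not the package's implementation: the function names, the z-normalisation, and the Euclidean distance are assumptions chosen for illustration. Every subsequence of the reference series becomes one feature, whose value is the minimum distance between that subsequence and any window of the input series.

```python
import numpy as np


def znorm(a, eps=1e-8):
    """Z-normalise along the last axis (assumed preprocessing)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=-1, keepdims=True)) / (a.std(axis=-1, keepdims=True) + eps)


def min_distance(series, candidate):
    """Smallest Euclidean distance between candidate and any window of series."""
    m = len(candidate)
    windows = np.array([series[s:s + m] for s in range(len(series) - m + 1)])
    windows = znorm(windows)
    return np.sqrt(((windows - candidate) ** 2).sum(axis=1)).min()


def subsequence_transform(X, references, cand_length_list):
    """Map each series in X to its distances to all reference subsequences.

    Hypothetical sketch: references would hold the instances sampled per
    class, and cand_length_list the subsequence lengths to extract.
    """
    candidates = [
        znorm(ref[s:s + length])
        for ref in references
        for length in cand_length_list
        for s in range(len(ref) - length + 1)
    ]
    return np.array([[min_distance(x, c) for c in candidates] for x in X])
```

The resulting feature matrix has one row per series and one column per candidate subsequence, and can be passed to any scikit-learn classifier such as `RidgeClassifierCV`.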

Dependencies

  • numpy == 1.18.5
  • numba == 0.50.1
  • scikit-learn == 0.23.1
  • sktime == 0.5.3