
Auto-Grouping for SSA #122

Merged · 4 commits merged into johannfaouzi:main on Dec 7, 2021

Conversation

lucasplagwitz (Contributor)

Hi,

I am very interested in your efficient implementation of Singular Spectrum Analysis.
Have you ever thought about a more advanced grouping for the components?
You mentioned in the SSA example that

The first subseries consists of the trend of the original time series.
The second and third subseries consist of noise.

But shouldn't the trend rather be a zero-like signal?

And wouldn't it be nicer to show the possibility of decomposing an additive signal into three components (trend, seasonal, residual)?
Packages like Julia's SSA explicitly return trend and seasonal components.
They group based on a specific consideration of the eigenvalues, which in my opinion also disregards a few special cases.

My commit is only a short demo and tries to demonstrate the potential of
SSA with a grouping inspired by "A Method of Trend Extraction Using Singular Spectrum Analysis".
I also saw that the R package uses a similar approach.

Are you interested in the topic of grouping SSA components?
If so, I could add tests and generalize the process.

Best regards,
Lucas

pep8speaks commented Dec 3, 2021

Hello @lucasplagwitz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-07 15:59:29 UTC

codecov bot commented Dec 3, 2021

Codecov Report

Merging #122 (b4967b9) into main (bbfef38) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main     #122   +/-   ##
=======================================
  Coverage   99.07%   99.07%           
=======================================
  Files          97       97           
  Lines        5699     5727   +28     
=======================================
+ Hits         5646     5674   +28     
  Misses         53       53           
Impacted Files                          Coverage   Δ
pyts/bag_of_words/bow.py                100.00%    <ø> (ø)
pyts/decomposition/tests/test_ssa.py    100.00%    <ø> (ø)
pyts/decomposition/ssa.py               100.00%    <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

johannfaouzi (Owner) commented Dec 6, 2021

Hi,

Sorry for the delayed response, and thank you for your pull request! I have not given singular spectrum analysis much attention because I've never seen it used for time series classification (it's available in this package only because I had already implemented it, so I thought it was better to make it available anyway). I'm glad that it's useful for some people.

I'm in favor of adding more advanced grouping strategies! I looked at your code and the reference that you provided and I have a few remarks:

  • I don't think that scipy.signal.periodogram does what is described in the reference. According to Wikipedia, there are several definitions of a periodogram; I think that scipy.signal.periodogram implements the time-averaging approach (because it uses Welch's method). In the reference, each element of the periodogram is proportional to the amplitude. If I'm not mistaken, it's probably better to reimplement the periodogram (see the short comparison sketch after this list).
  • It's more efficient to use broadcasting instead of for loops whenever possible. Moreover, the indices for the trend and the residuals are the same for all the time series and all the windows, so we can just compute them once.
  • The reference mentions automatic computations of w0 and c0. Do you plan to implement them? It may be a good idea to add them as parameters of the class in case a user wants to set their values.
  • The original example is based on independent random values from a standard normal distribution... It's probably a terrible idea and I should use another dataset.
  • I'm wondering if it would be better to add your example as a separate example illustrating specifically the 'auto' grouping to extract the trend, the seasonality and the residuals.
  • If you plan on working on this, as you mentioned, it would be nice to add some tests for the new functionality (and to fix the linting issues).
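
To illustrate the periodogram point, here is a minimal comparison sketch (not part of the PR diff): with its default settings, scipy.signal.periodogram subtracts the mean (detrend='constant'), which zeroes the DC bin, whereas trend eigenvectors concentrate their power precisely near zero frequency.

import numpy as np
from scipy.signal import periodogram

# Minimal sketch comparing the two periodogram definitions.
# scipy.signal.periodogram detrends by subtracting the mean by default,
# so the DC bin vanishes; a plain rfft-based periodogram keeps it.
x = 0.5 + np.sin(2 * np.pi * 0.1 * np.arange(50))

_, Pxx_scipy = periodogram(x)  # default: detrend='constant'
Pxx_manual = np.abs(np.fft.rfft(x, norm='ortho')) ** 2
Pxx_manual[1:-1] *= 2          # fold in negative frequencies (even length)

print(Pxx_scipy[0], Pxx_manual[0])  # ~0 vs. a large DC contribution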

Edit: I think the code is wrong. The reference mentions that the periodogram is computed from the eigenvectors, but X_elem[i, :, 0] does not correspond to the matrix of eigenvectors; the matrix of eigenvectors is v.

Edit 2: I edited the code to use the eigenvectors.

Here's a code snippet illustrating my proposed changes (in the end it should be implemented inside the class as you did; this is just a working example):

import numpy as np
from numba import njit, prange
from sklearn.utils.validation import check_array
from pyts.datasets import make_cylinder_bell_funnel
from pyts.utils.utils import _windowed_view


# Toy dataset
X, _ = make_cylinder_bell_funnel(n_samples=6, random_state=42, shuffle=False)
X += 3 * np.sin(np.arange(X.shape[1]))

# Original code for SSA (until grouping)
window_size = 20

@njit
def _outer_dot(v, X, n_samples, window_size, n_windows):
    X_new = np.empty((n_samples, window_size, window_size, n_windows))
    for i in prange(n_samples):
        for j in prange(window_size):
            X_new[i, j] = np.dot(np.outer(v[i, :, j], v[i, :, j]), X[i])
    return X_new


@njit
def _diagonal_averaging(X, n_samples, n_timestamps, window_size,
                        n_windows, grouping_size, gap):
    X_new = np.empty((n_samples, grouping_size, n_timestamps))
    first_row = [(0, col) for col in range(n_windows)]
    last_col = [(row, n_windows - 1) for row in range(1, window_size)]
    indices = first_row + last_col
    for i in prange(n_samples):
        for group in prange(grouping_size):
            for (j, k) in indices:
                X_new[i, group, j + k] = np.diag(
                    X[i, group, :, ::-1], gap - j - k - 1
                ).mean()
    return X_new


X = check_array(X, dtype='float64')
n_samples, n_timestamps = X.shape
n_windows = n_timestamps - window_size + 1

X_window = np.transpose(
    _windowed_view(X, n_samples, n_timestamps,
                   window_size, window_step=1), axes=(0, 2, 1)
).copy()
X_transpose = np.matmul(X_window,
                        np.transpose(X_window, axes=(0, 2, 1)))
w, v = np.linalg.eigh(X_transpose)
w, v = w[:, ::-1], v[:, :, ::-1]

X_elem = _outer_dot(v, X_window, n_samples, window_size, n_windows)


# Grouping
w0 = 0.1   # lower frequency bound
c0 = 0.85  # lower frequency contribution threshold

# Periodogram of each eigenvector (proportional to the squared amplitudes)
f = np.arange(0, 1 + window_size // 2) / window_size
Pxx = np.abs(np.fft.rfft(v, axis=1, norm='ortho')) ** 2
if Pxx.shape[-1] % 2 == 0:
    Pxx[:, 1:-1, :] *= 2
else:
    Pxx[:, 1:, :] *= 2

Pxx_cumsum = np.cumsum(Pxx, axis=1)
idx_trend = np.where(f < w0)[0][-1]
idx_resid = Pxx_cumsum.shape[1] // 2

# Trend: more than c0 of the power lies below w0.
# Residuals: less than c0 of the power lies below half the Nyquist frequency.
# Seasonality: everything else.
trend = Pxx_cumsum[:, idx_trend, :] / Pxx_cumsum[:, -1, :] > c0
resid = Pxx_cumsum[:, idx_resid, :] / Pxx_cumsum[:, -1, :] < c0
season = np.logical_and(~trend, ~resid)

# Sum the elementary matrices within each group
X_groups = np.empty((n_samples, 3, window_size, n_windows))
for i in range(n_samples):
    for j, arr in enumerate((trend, season, resid)):
        X_groups[i, j] = X_elem[i, arr[i]].sum(axis=0)

# SSA
grouping_size = X_groups.shape[1]
if window_size >= n_windows:
    X_groups = np.transpose(X_groups, axes=(0, 1, 3, 2))
    gap = window_size
else:
    gap = n_windows

X_ssa = _diagonal_averaging(
    X_groups, n_samples, n_timestamps, window_size,
    n_windows, grouping_size, gap
)
X_ssa = np.squeeze(X_ssa)
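
As a quick sanity check (an addition to the snippet above): with these thresholds the three boolean masks are disjoint and cover every eigentriple, and diagonal averaging is linear, so the grouped components should sum back to the original series.

# Sanity check: trend + seasonal + residual reconstructs the original series.
assert np.allclose(X_ssa.sum(axis=1), X)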

Best,
Johann

lucasplagwitz (Contributor, Author)

Thank you very much for the very helpful input. It really helps to examine things carefully again. And your code is so much cleaner.

I have now separated the example out as a second one. I hope that my adjustments fit your style. Otherwise, feel free to modify anything, or give me feedback if you want me to change something.

Especially for analyses like the ones I'm currently doing, the automatic component grouping and your SSA implementation are very useful!

@lucasplagwitz lucasplagwitz marked this pull request as ready for review December 6, 2021 23:07
@johannfaouzi johannfaouzi (Owner) left a comment

Thank you very much! It's great, I just left a few minor remarks.

Also, please add your contribution in the changelog. It will be released in version 0.13.0. Please add your name at the end of the description (there are a couple of examples in version 0.10.0).

Comment on lines 229 to 230
raise TypeError("'groups' must be either None, an integer "
"or array-like.")
You need to change the sentence to add 'auto'.

Comment on lines 234 to 238
if not 0 < self.lower_frequency_bound < 1:
raise ValueError(
"'lower_frequency_bound' must be greater than 0 and "
"lower than 1."
)
lower_frequency_bound must be between 0 and 0.5, I think (the normalized frequencies of the periodogram range from 0 to 0.5).

equal to ``groups``. If ``groups='auto'``, ``n_splits`` is equal
to three. If ``groups`` is array-like, ``n_splits`` is equal to
the length of ``groups``. If ``n_split=1``, ``X_new`` is squeezed
and its shape is (n_samples, n_timestamps).
Missing s in n_splits (the typo was there before, but since you're changing the corresponding line, you can fix it).

@@ -183,10 +224,27 @@ def _check_params(self, n_timestamps):
"(got {0}).".format(self.window_size)
)
window_size = max(2, ceil(self.window_size * n_timestamps))
-       if not (self.groups is None
+       if not (self.groups is None or self.groups == "auto"
It's a bit better to make sure that groups is a string before checking equality with a string: if self.groups is an array, I think that the current code could raise a warning because it performs the comparison element-wise. Another (not as good) solution would be to check the equality after the list / tuple / array instance check.

I would change the condition like this:

if not (self.groups is None
        or (isinstance(self.groups, str) and self.groups == "auto")
        or isinstance(self.groups, (int, list, tuple, np.ndarray))):
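
A minimal demonstration of the proposed check (the helper name is made up, just for illustration):

import numpy as np

# Checking isinstance(groups, str) first avoids the element-wise comparison
# that a NumPy array would otherwise attempt against the string "auto".
def is_valid_groups(groups):
    return (groups is None
            or (isinstance(groups, str) and groups == "auto")
            or isinstance(groups, (int, list, tuple, np.ndarray)))

for candidate in (None, "auto", 3, [0, 1], np.arange(2)):
    print(candidate, is_valid_groups(candidate))  # all True
print(is_valid_groups("other"))  # False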

Comment on lines 60 to 68
lower_frequency_bound : float (default = 0.075)
    The boundary of the periodogram to characterize trend, seasonal and
    residual components. Ignored if 'groups' is not set to 'auto'.

lower_frequency_contribution : float (default = 0.85)
    The relative threshold to characterize trend, seasonal and
    residual components by considering the periodogram.
    Ignored if 'groups' is not set to 'auto'.
It would be nice to add the bounds for each parameter in their description (something like "It must be between a and b." before the "Ignored" sentence).

"'lower_frequency_bound' must be a float."),

({'lower_frequency_bound': 1.2}, ValueError,
"'lower_frequency_bound' must be greater than 0 and lower than 1."),
Upper bound needs to be changed.
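
For reference, the corrected entry could look like this (a sketch, assuming the error message is updated accordingly):

({'lower_frequency_bound': 1.2}, ValueError,
 "'lower_frequency_bound' must be greater than 0 and lower than 0.5."),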

lucasplagwitz (Contributor, Author)

Absolutely right. Hopefully I didn't miss a line. Thanks for the review!

@johannfaouzi johannfaouzi (Owner) left a comment

Perfect! Thank you very much!

@johannfaouzi johannfaouzi merged commit 0f023a6 into johannfaouzi:main Dec 7, 2021