Auto-Grouping for SSA #122
Conversation
Hello @lucasplagwitz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-12-07 15:59:29 UTC
Codecov Report

```
@@           Coverage Diff           @@
##             main     #122   +/-   ##
=======================================
  Coverage   99.07%   99.07%
=======================================
  Files          97       97
  Lines        5699     5727    +28
=======================================
+ Hits         5646     5674    +28
  Misses         53       53
=======================================
```

Continue to review the full report at Codecov.
Hi,

Sorry for the delayed response, and thank you for your pull request! I have not given singular spectrum analysis much attention because I've never seen it used for time series classification (it's available in this package just because I implemented it, so I thought it was better to make it available anyway). I'm glad that it's useful for some people. I'm in favor of adding more advanced grouping strategies! I looked at your code and the reference that you provided, and I have a few remarks:
Edit: I think the code is wrong. In the reference, it is mentioned that the periodogram is computed from the eigenvectors.

Edit 2: I edited the code to use the eigenvectors. Here's a code snippet illustrating my proposed changes (in the end it should be implemented inside the class as you did; this is just a working example):

```python
import numpy as np
from numba import njit, prange
from sklearn.utils.validation import check_array

from pyts.datasets import make_cylinder_bell_funnel
from pyts.utils.utils import _windowed_view

# Toy dataset with an added seasonal component
X, _ = make_cylinder_bell_funnel(n_samples=6, random_state=42, shuffle=False)
X += 3 * np.sin(np.arange(X.shape[1]))

# Original code for SSA (until grouping)
window_size = 20


@njit
def _outer_dot(v, X, n_samples, window_size, n_windows):
    X_new = np.empty((n_samples, window_size, window_size, n_windows))
    for i in prange(n_samples):
        for j in prange(window_size):
            X_new[i, j] = np.dot(np.outer(v[i, :, j], v[i, :, j]), X[i])
    return X_new


@njit
def _diagonal_averaging(X, n_samples, n_timestamps, window_size,
                        n_windows, grouping_size, gap):
    X_new = np.empty((n_samples, grouping_size, n_timestamps))
    first_row = [(0, col) for col in range(n_windows)]
    last_col = [(row, n_windows - 1) for row in range(1, window_size)]
    indices = first_row + last_col
    for i in prange(n_samples):
        for group in prange(grouping_size):
            for (j, k) in indices:
                X_new[i, group, j + k] = np.diag(
                    X[i, group, :, ::-1], gap - j - k - 1
                ).mean()
    return X_new


X = check_array(X, dtype='float64')
n_samples, n_timestamps = X.shape
n_windows = n_timestamps - window_size + 1

X_window = np.transpose(
    _windowed_view(X, n_samples, n_timestamps,
                   window_size, window_step=1), axes=(0, 2, 1)
).copy()
X_transpose = np.matmul(X_window,
                        np.transpose(X_window, axes=(0, 2, 1)))
w, v = np.linalg.eigh(X_transpose)
w, v = w[:, ::-1], v[:, :, ::-1]
X_elem = _outer_dot(v, X_window, n_samples, window_size, n_windows)

# Grouping: classify each component as trend, seasonal or residual
# from the periodogram of its eigenvector
w0 = 0.1
c0 = 0.85
f = np.arange(0, 1 + window_size // 2) / window_size
Pxx = np.abs(np.fft.rfft(v, axis=1, norm='ortho')) ** 2
if Pxx.shape[-1] % 2 == 0:
    Pxx[:, 1:-1, :] *= 2
else:
    Pxx[:, 1:, :] *= 2
Pxx_cumsum = np.cumsum(Pxx, axis=1)

idx_trend = np.where(f < w0)[0][-1]
idx_resid = Pxx_cumsum.shape[1] // 2

trend = Pxx_cumsum[:, idx_trend, :] / Pxx_cumsum[:, -1, :] > c0
resid = Pxx_cumsum[:, idx_resid, :] / Pxx_cumsum[:, -1, :] < c0
season = np.logical_and(~trend, ~resid)

X_groups = np.empty((n_samples, 3, window_size, n_windows))
for i in range(n_samples):
    for j, arr in enumerate((trend, season, resid)):
        X_groups[i, j] = X_elem[i, arr[i]].sum(axis=0)

# SSA: diagonal averaging of each grouped matrix
grouping_size = X_groups.shape[1]
if window_size >= n_windows:
    X_groups = np.transpose(X_groups, axes=(0, 1, 3, 2))
    gap = window_size
else:
    gap = n_windows
X_ssa = _diagonal_averaging(
    X_groups, n_samples, n_timestamps, window_size,
    n_windows, grouping_size, gap
)
X_ssa = np.squeeze(X_ssa)
```

Best,
Thank you very much for the very helpful input; it really helps to examine everything again carefully, and your code is much cleaner. I have now split the example out as a second one. I hope that my adjustments fit your style; otherwise, feel free to modify everything or give me feedback if you want me to change something. Especially for analyses like the ones I'm currently doing, the automatic component grouping and your SSA are very useful!
Thank you very much! It's great, I just left a few minor remarks.

Also, please add your contribution to the changelog; it will be released in version 0.13.0. Please add your name at the end of the description (there are a couple of examples in version 0.10.0).
pyts/decomposition/ssa.py
Outdated
```python
raise TypeError("'groups' must be either None, an integer "
                "or array-like.")
```
You need to change the sentence to add 'auto'.
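For instance, the message could become (suggested wording only, not the final text):

```python
raise TypeError("'groups' must be either None, 'auto', an integer "
                "or array-like.")
```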
pyts/decomposition/ssa.py
Outdated
```python
if not 0 < self.lower_frequency_bound < 1:
    raise ValueError(
        "'lower_frequency_bound' must be greater than 0 and "
        "lower than 1."
    )
```
I think `lower_frequency_bound` must be between 0 and 0.5.
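If so, the check would become something like this (a sketch under that assumption):

```python
if not 0 < self.lower_frequency_bound < 0.5:
    raise ValueError(
        "'lower_frequency_bound' must be greater than 0 and "
        "lower than 0.5."
    )
```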
pyts/decomposition/ssa.py
Outdated
```python
is squeezed and its shape is (n_samples, n_timestamps).
equal to ``groups``. If ``groups='auto'``, ``n_splits`` is equal
to three. If ``groups`` is array-like, ``n_splits`` is equal to
the length of ``groups``. If ``n_split=1``, ``X_new`` is squeezed
```
Missing s in n_splits
(the typo was there before, but since you're changing the corresponding line, you can fix it).
pyts/decomposition/ssa.py
Outdated
```diff
@@ -183,10 +224,27 @@ def _check_params(self, n_timestamps):
                 "(got {0}).".format(self.window_size)
             )
         window_size = max(2, ceil(self.window_size * n_timestamps))
-        if not (self.groups is None
+        if not (self.groups is None or self.groups == "auto"
```
It's a bit better to make sure that `groups` is a string before checking equality: if `self.groups` is an array, I think that the current code could raise a warning because it performs the comparison element-wise. Another (not as good) solution would be to check the equality after the list / tuple / array instance check.
I would change the condition like this:
```python
if not (self.groups is None
        or (isinstance(self.groups, str) and self.groups == "auto")
        or isinstance(self.groups, (int, list, tuple, np.ndarray))):
```
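For illustration, this is the element-wise comparison being avoided (the exact behavior depends on the NumPy version):

```python
import numpy as np

groups = np.array([0, 1, 2])
# Comparing an array to a string may emit a FutureWarning such as
# "elementwise comparison failed" on some NumPy versions.
result = groups == "auto"
```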
pyts/decomposition/ssa.py
Outdated
```python
lower_frequency_bound : float (default = 0.075)
    The boundary of the periodogram to characterize trend, seasonal and
    residual components. Ignored if 'groups' is not set to 'auto'.

lower_frequency_contribution : float (default = 0.85)
    The relative threshold to characterize trend, seasonal and
    residual components by considering the periodogram.
    Ignored if 'groups' is not set to 'auto'.
```
It would be nice to add the bounds for each parameter in their description (something like `It must be between a and b.` before the "Ignored" sentence).
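For instance, the description could read (suggested docstring wording, assuming the 0.5 upper bound mentioned above):

```python
lower_frequency_bound : float (default = 0.075)
    The boundary of the periodogram to characterize trend, seasonal
    and residual components. It must be between 0 and 0.5.
    Ignored if 'groups' is not set to 'auto'.
```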
pyts/decomposition/tests/test_ssa.py
Outdated
"'lower_frequency_bound' must be a float."), | ||
|
||
({'lower_frequency_bound': 1.2}, ValueError, | ||
"'lower_frequency_bound' must be greater than 0 and lower than 1."), |
Upper bound needs to be changed.
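For instance, the test case could become (a sketch assuming the 0.5 upper bound from the earlier remark; the exact value 0.6 is only an illustration):

```python
({'lower_frequency_bound': 0.6}, ValueError,
 "'lower_frequency_bound' must be greater than 0 and lower than 0.5."),
```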
Absolutely right. Hopefully I didn't miss a line. Thanks for the review!
Perfect! Thank you very much!
Hi,
I am very interested in your efficient implementation of Singular Spectrum Analysis.
Have you ever thought about a more advanced grouping for the components?
You mentioned in the SSA example that

> The first subseries consists of the trend of the original time series.
> The second and third subseries consist of noise.

But shouldn't the trend rather be a zero-like signal?
And wouldn't it be nicer to show the possibility of decomposing an additive signal into three components (trend, seasonal, residual)?
Packages like Julia's SSA explicitly return the trend and seasonal components.
They group by a specific consideration of the eigenvalues, which in my opinion also disregards a few special cases.
My commit is only a short demo and tries to demonstrate the potential of SSA with a grouping inspired by "A Method of Trend Extraction Using Singular Spectrum Analysis".
I also saw that the R package uses a similar approach.
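The core idea of that trend-extraction method, roughly (a simplified sketch for illustration, not the exact code from my commit):

```python
import numpy as np

def is_trend(eigenvector, omega0=0.075, c0=0.85):
    """Classify an eigenvector as 'trend' when the low-frequency part
    of its periodogram dominates (simplified illustration)."""
    pxx = np.abs(np.fft.rfft(eigenvector, norm='ortho')) ** 2
    pxx[1:] *= 2  # account for negative frequencies (ignores the Nyquist bin)
    freqs = np.arange(pxx.size) / eigenvector.size
    low_freq_contribution = pxx[freqs < omega0].sum() / pxx.sum()
    return low_freq_contribution > c0
```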
Are you interested in the topic of grouping SSA components?
If so, I could add tests and generalize the process.
Best regards,
Lucas