Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getEuclideanDistance should depends on the state of timeseries #28

Open
assaad opened this issue Oct 3, 2018 · 4 comments
Open

getEuclideanDistance should depends on the state of timeseries #28

assaad opened this issue Oct 3, 2018 · 4 comments

Comments

@assaad
Copy link
Contributor

assaad commented Oct 3, 2018

The getEuclidean Distance gives incorrect results.
It should depends whether the timeseries is normalized or not and whether the query is normalized or not.
In case of normalization, both should be de-normalized (value * std + avg).
The current implementation normalize again !

As well, for the epsilon radius search, std needs to be taken into account
Cheers!
Assaad

@patrickzib
Copy link
Owner

patrickzib commented Oct 4, 2018

Thank you for pointing this out.

There are two cases:

  1. the query is z-normed: thus we have to z-norm all the subsequences.

  2. query is not z-normed:
    a) we can either force the query to be z-normed, which is one call to query.norm() before performing the query or
    b) we can not apply z-normalization at all, which would be a check in getDistance.

Typically, we force the query to be z-normed in this case. However, I agree that it is possible to leave out z-norm, too. The proposed pull-request however, breaks the code by removing z-norm for subsequences.

@assaad
Copy link
Contributor Author

assaad commented Oct 4, 2018

For my understanding, it's better to return the un-normalized distance measure. So a potential fix will be: if the query is normalized -> unnormalize its values, if the timeseries is normalized -> un-normalize the values -> then calculate the euclidean distance between original un-normalized versions of both timeseries data and query :)
Big thanks anyway!

@patrickzib
Copy link
Owner

Removing it, is not sooo easy. For example, SFA uses the momentary Fourier transform. The MFT applies z-normalization to each subsequence:

TimeSeries.calcIncrementalMeanStddev(this.windowSize, timeSeries.getData(), means, stds);

The same does TimeSeries.getDisjointSequences and TimeSeries.getSubsequences

So, when removing it from getDistance, you suddenly try to lower bound un-normalized subsequences with z-normalized words. This should fail.

@assaad
Copy link
Contributor Author

assaad commented Oct 4, 2018

yes i know, that's why i didn't propose a fix. I saw that in subsequence matching, each subsequence is normalized with different mean and std, so i can't un-normalized. maybe we need 2 separate methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants