
WT in the paper leaks info #6

Open
JannyKul opened this issue Feb 26, 2019 · 13 comments
Assignees
Labels
priority-1 question Further information is requested

Comments

@JannyKul

Hey Timothy, I added a comment on DeepLearning_Financial about this too, and tried to expand on it here. There's no other way they get to the results they do.

Interested in your thoughts

@timothyyu
Owner

timothyyu commented Mar 1, 2019

I am skeptical about the results from the model as described in the paper - that is why I am attempting to replicate the model and apply it to the raw data.

The deeplearning_financial implementation of the WSAE-LSTM model isn't exactly 1:1 with how it's described in the source journal - one example is that the Haar wavelet is supposed to be the wavelet transform (WT) type used, but the author uses the db4/db6 wavelet type instead:
https://github.com/mlpanda/DeepLearning_Financial/blob/master/models/wavelet.py#L5
[screenshot]
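As a quick illustration of the discrepancy (a sketch, not code from either repo): Haar and db4 are different wavelet families with different filter lengths, so swapping db4 in for haar changes the decomposition even with identical settings.

```python
# Sketch: the paper specifies the Haar wavelet, while the mlpanda
# implementation uses db4. The filter lengths alone differ, so the
# resulting coefficients (and the denoised series) will differ too.
import pywt

haar = pywt.Wavelet("haar")
db4 = pywt.Wavelet("db4")

print(haar.dec_len)  # 2 - Haar uses a 2-tap decomposition filter
print(db4.dec_len)   # 8 - db4 uses an 8-tap decomposition filter
```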

@timothyyu
Owner

timothyyu commented Mar 1, 2019

@JannyKul by applying the wavelet transform separately on each train-validate-test split, no future information should be allowed to leak into the model:

Example applied to train-validate-test split of first csci300 index data segment:
[screenshot]

@timothyyu timothyyu self-assigned this Mar 1, 2019
@timothyyu
Owner

see #7 & 8073c42

the way I implemented the train-validate-test split - fitting the scaling with RobustScaler on the train set, then transforming the validate and test sets with it (per period) - avoids/sidesteps both the wavelet transform and the scaling leaking future info/data
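A minimal sketch of that scaling scheme (array names are illustrative, not from the repo): fit RobustScaler on the train split only, then reuse the fitted scaler on validate and test.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 1))
validate = rng.normal(size=(20, 1))
test = rng.normal(size=(20, 1))

scaler = RobustScaler()
train_scaled = scaler.fit_transform(train)    # fit statistics on train only
validate_scaled = scaler.transform(validate)  # reuse train median/IQR
test_scaled = scaler.transform(test)          # no future data leaks in
```

Because RobustScaler centers on the median and scales by the IQR, the validate/test transforms here use only statistics observable at train time.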

@timothyyu timothyyu reopened this Mar 2, 2019
@timothyyu
Owner

issue reopened - this has not been entirely resolved yet, but I am confident I am on the right track

  1. for each index dataset ---> split into 24 intervals [DONE]
  2. split each of those 24 intervals into train-validate-test splits [DONE]
  3. for each train period, scale with fit_transform, and then apply the scaling from the train set on the validate and test sets, respectively
  4. then apply the wavelet transform on the scaled train set, and reuse the threshold/sigma values from the train set on the validate and test sets, respectively - if the wavelet transform is instead applied independently to the validate and test sets, future data may leak into the model
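Steps 1-2 above can be sketched as follows (the 24-interval count comes from the list; the 0.75/0.125/0.125 split proportions are my illustrative assumption, not taken from the paper):

```python
import numpy as np

def split_into_intervals(series, n_intervals=24):
    # step 1: split one index dataset into 24 intervals
    return np.array_split(series, n_intervals)

def train_validate_test(interval, train_frac=0.75, validate_frac=0.125):
    # step 2: split one interval into train/validate/test slices
    n = len(interval)
    i = int(n * train_frac)
    j = int(n * (train_frac + validate_frac))
    return interval[:i], interval[i:j], interval[j:]

series = np.arange(2400, dtype=float)  # stand-in for one index series
splits = [train_validate_test(iv) for iv in split_into_intervals(series)]
```

Each of the 24 interval splits can then be scaled and wavelet-transformed independently, per steps 3-4.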

@timothyyu
Owner

timothyyu commented Mar 3, 2019

  1. for each train period, scale with fit_transform, and then apply the scaling from the train set on the validate and test sets, respectively

Implemented as of v0.1.2 / b715d88
https://github.com/timothyyu/wsae-lstm/releases/tag/v0.1.2

[screenshot]

@asdfqwer2015

Hope to see the amazing results. Please keep refining :)

@timothyyu
Owner

import numpy as np
import pywt
from statsmodels.robust import mad

def waveletSmooth(x, wavelet="haar", level=2, declevel=2):
    # calculate the wavelet coefficients
    coeff = pywt.wavedec(x, wavelet, mode='periodization', level=declevel, axis=0)
    # calculate a threshold: sigma is the median absolute deviation of the
    # detail coefficients at the given level
    sigma = mad(coeff[-level])
    # universal threshold (Donoho-Johnstone)
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    # hard-threshold the detail coefficients
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode="hard") for i in coeff[1:])
    # reconstruct the signal using the thresholded coefficients
    y = pywt.waverec(coeff, wavelet, mode='periodization', axis=0)
    return y, sigma, uthresh

@timothyyu
Owner

[screenshot]

@timothyyu
Owner

@JannyKul to prevent/sidestep the issue of the wavelet transform leaking data into the rest of the model, I'm going to see if I can save the sigma and uthresh values from each train set, and then use those values on the validate and test sets, respectively.

While I'm almost certain this will lower the overall accuracy of the model, it is a technically more correct/accurate approach to preventing data from leaking.

That being said, running the wavelet transform independently on each train-validate-test split, when there are clearly defined intervals for each split, sets the groundwork/precedent for an online/constantly trained version of the model, where the parameters for sigma and uthresh are adjusted on the fly between intervals - so it's not entirely technically incorrect, either.
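A sketch of that idea, reusing the train-derived threshold on later splits (`waveletSmoothFixed` is a hypothetical variant of the `waveletSmooth` function above, not code from the repo; the 0.6745 constant is the usual normal-consistency scaling for the MAD):

```python
import numpy as np
import pywt

def waveletSmoothFixed(x, uthresh, wavelet="haar", declevel=2):
    # denoise with a threshold computed elsewhere (e.g. on the train set)
    coeff = pywt.wavedec(x, wavelet, mode="periodization", level=declevel, axis=0)
    coeff[1:] = [pywt.threshold(c, value=uthresh, mode="hard") for c in coeff[1:]]
    return pywt.waverec(coeff, wavelet, mode="periodization", axis=0)

rng = np.random.default_rng(0)
train = rng.normal(size=128)

# derive sigma/uthresh from the train split only (as waveletSmooth does)
coeff = pywt.wavedec(train, "haar", mode="periodization", level=2, axis=0)
detail = coeff[-2]
sigma = np.median(np.abs(detail - np.median(detail))) / 0.6745  # MAD estimate
uthresh = sigma * np.sqrt(2 * np.log(len(train)))

# apply the saved threshold to validate/test - no future data enters
validate = rng.normal(size=64)
validate_denoised = waveletSmoothFixed(validate, uthresh)
```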

@timothyyu
Owner

timothyyu commented Mar 4, 2019

EDIT:
However, if the sigma and uthresh values don't match when reversing the denoising process, then the reconstructed signal is going to be invalid - and since there are clearly defined intervals + prediction periods, applying the transform independently should be fine. I will save the calculated sigma and uthresh values to revisit this aspect of the model later.

@JannyKul
Author

@timothyyu I agree with your thought process, but if we just take a step back for a moment and start with what we know: we can use ML to give us either a momentum signal or a mean-reversion signal. Which we construct depends on the features we extract from the time series. If we smooth out the over-reaction/under-reaction movements using a simple moving average and train with this, we're creating a momentum signal. If we feature-engineer with volatility/range, then we create a mean-reversionary signal.

A WT doesn't actually help us with either; we cut off the outliers, so trying to get our model to find a mean-reversionary signal will fail, and the direction changes are so erratic that a momentum signal will fail too. I suspect your technique of applying WT on train/val/test separately will produce a model that fits very well on the train set but never generalises to the val/test set.

Saving sigma and uthresh is a good next logical step, but I definitely recall the paper not making mention of this, and I found a few older papers applying WT and coming to ridiculously high accuracy rates too, without making mention of this either. I basically came to the conclusion that for these authors, earning an accreditation to further their career was more important than advancing human knowledge. I suspect, much like the Bre-X scandal of the '90s, we are both looking for something that quite simply doesn't exist.

I'd definitely be interested if you make any progress, so please do keep me updated so I can sheepishly retract my comment. Also, I haven't gone through your other notebook re: the crypto order book yet, but intuitively that's a technique that should work - good luck sir!

@timothyyu
Owner

timothyyu commented Mar 15, 2019

@JannyKul
Your comment(s)/criticism of the model design and existing attempts to replicate/implement said model are valid - they do not need retraction.

For a streaming/online model, a dynamic sigma & uthresh that is adjusted at set intervals will probably work, but for a batch fed model/static model, saving the sigma & uthresh values is a critical step that I think is essential toward evaluating this kind of model properly.

In attempting to replicate the results of the paper, I fully intend to go beyond what is described in the original paper (and existing attempts to implement said model) - if there are errors in implementation or design, I will use my best academic/empirical judgement to evaluate said error and address it.

Also, see issue #7 - it is highly relevant to this issue, specifically in the application of scaling/denoising.

@timothyyu
Owner

incomplete/work in progress:
9c796bf#diff-3c3d6d5243e1476c8c1f21078c759772R39

@timothyyu timothyyu added the question Further information is requested label Aug 21, 2019