This repository has been archived by the owner on Nov 22, 2017. It is now read-only.

.movingstd() does not seem to display correct values #177

Open
Michael-EV opened this issue Aug 31, 2016 · 6 comments

@Michael-EV

Michael-EV commented Aug 31, 2016

Hello!

After noticing that .movingstd() does not have a 'position' option like .movingaverage(), I decided to look at the source code to see what the default position is for .movingstd(). After determining that it is 'left', I decided to compare the values it produces to the actual moving standard deviation, which I computed in R (code is at the bottom). It is worth noting that my values for the moving average were the same as Timelion's .movingaverage(). I noticed that my sd values were not the same as Timelion's, whether I used a population or sample standard deviation. However, the graphs of my moving standard deviation and Timelion's .movingstd() had roughly the same shape.

Thanks.


My Timelion query: ".quandl(WIKI/FB), .quandl(WIKI/FB).movingstd(5), .quandl(WIKI/FB).movingaverage(5)", with time interval set to '1w' and time frame set to '6 months'


Here is my code for moving standard deviation in R:

`FB_data <- c(110.05, 108.48, 111.56, 112.13, 113.75, 114.25, 110.79, 111.21, 116.82, 117.16, 120.38, 116.96, 119.56, 118.97, 117.54, 114.42, 111.01, 114.20, 116.43, 117.74, 119.90, 124.65, 124.98, 124.70, 123.60, 124.05) ## this was taken from Timelion

mov_sd <- c()
## length - 4 (not - 5) so that the last full 5-point window is included
for (i in 1:(length(FB_data) - 4))
{
  ## take sd of FB_data[1:5], then FB_data[2:6], etc...
  mov_sd <- c(mov_sd, sd(FB_data[i:(i + 4)]))
}

mov_sd <- round(mov_sd, digits = 2)
plot(mov_sd, type = 'l') ## graph to compare to Timelion .movingstd() -- very similar shape
`

@rashidkpc
Contributor

@polyfractal any thoughts?

@polyfractal
Contributor

Could you paste the numerical output of both? How different are the values? It could just be due to floating-point rounding error, and the rounding you're doing at the end. Or maybe a bug on our end :)

Related, I'm looking at the movingstd() function, it looks to be calculating the unbiased sample variance (e.g. denominator is n-1)... I'm not sure why I did it that way. This is the complete population, so it should really just be dividing by n.
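
For reference, the n versus n-1 distinction can be sketched in a few lines of plain JavaScript. This is only an illustration, not Timelion's code: the `stdDev` helper and the `demo` data are hypothetical names and values chosen for the example.

```javascript
// Standard deviation of an array of numbers.
// sample = true divides by n - 1 (Bessel's correction); false divides by n.
function stdDev(values, sample) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSq = values.reduce((sum, v) => sum + Math.pow(v - mean, 2), 0);
  return Math.sqrt(sumSq / (sample ? n - 1 : n));
}

// Illustrative data only (not taken from this thread).
const demo = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(stdDev(demo, false)); // population: divide by n
console.log(stdDev(demo, true));  // sample: divide by n - 1
```

For this data the sum of squared deviations is 32, so the population value is sqrt(32/8) = 2 and the sample value is sqrt(32/7), roughly 2.14. With a window of 5 or 6 points the two denominators differ noticeably, but not by enough to explain a factor-of-two gap.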

Also also, we should probably just use mathjs for all the math operations, would be simpler and provides more features.

@Michael-EV
Author

Sure!

Here are the numbers I am pulling from:

FB_data <- c(108.48, 111.56, 112.13, 113.75, 114.25, 110.79, 111.21, 116.82, 117.16, 120.38, 116.96, 119.56, 118.97, 117.54, 114.42, 111.01, 114.20, 116.43, 117.74, 119.90, 124.65, 124.98, 124.70, 123.60, 124.05, 126.85)


Here is the .movingstd() output from Timelion:

FB_Timelion_SD <- c(5.58, 5.40, 5.46, 5.51, 5.87, 6.13, 6.31, 5.63, 5.61, 5.57, 5.60, 5.76, 6.32, 6, 5.8, 5.93, 5.79, 6.29, 6.31, 5.94, 5.58, 5.61)


And here is the output from my mov_sd function:

mov_sd <- c(2.28, 1.46, 1.53, 2.46, 3.00, 4.14, 3.31, 1.67, 1.50, 1.41, 2.01, 3.56, 3.12, 2.50, 2.56, 3.41, 3.97, 3.92, 3.35, 2.12, 0.56, 1.25)

REMINDER: I am using the formula detailed in my original submission. Indexing of R vectors starts at 1, not 0.
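
For anyone who wants to check these numbers without R, here is a rough JavaScript equivalent of the loop, written with plain loops. It is a sketch under assumptions: `sampleSd` and `movingSd` are hypothetical helper names, the window is left-anchored (point i through i + 4), and only the first eight values of FB_data are used to keep it short.

```javascript
// Sample standard deviation (n - 1 denominator), the same convention as R's sd().
function sampleSd(values) {
  const n = values.length;
  let sum = 0;
  for (const v of values) sum += v;
  const mean = sum / n;
  let sumSq = 0;
  for (const v of values) sumSq += (v - mean) * (v - mean);
  return Math.sqrt(sumSq / (n - 1));
}

// Left-anchored moving window: window i covers values[i] .. values[i + size - 1].
function movingSd(values, size) {
  const out = [];
  for (let i = 0; i + size <= values.length; i++) {
    out.push(sampleSd(values.slice(i, i + size)));
  }
  return out;
}

// First eight values of FB_data from this comment.
const fbHead = [108.48, 111.56, 112.13, 113.75, 114.25, 110.79, 111.21, 116.82];
// First four .movingstd() values reported by Timelion in this comment.
const timelionHead = [5.58, 5.40, 5.46, 5.51];

movingSd(fbHead, 5).forEach((sd, i) => {
  console.log(`window ${i + 1}: computed ${sd.toFixed(2)}, Timelion ${timelionHead[i].toFixed(2)}`);
});
```

Rounded to two decimals, the computed values are 2.28, 1.46, 1.53 and 2.46, which match the first four entries of mov_sd and are well below Timelion's.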


Timelion's Moving Sd Graphed
timelion_sd

My Moving Sd Graphed
my_sd

@Michael-EV
Author

The difference seems a bit too large to be a rounding error...maybe the data displayed on Timelion's interface is not the actual data being used in .movingstd()?

@polyfractal
Contributor

Hm, not sure. Definitely too big of a difference to be rounding error given those numbers (e.g. not super large or super small, so no floating point trickery going on).

Busting out good ol' Excel, I can confirm your R findings that the Timelion values are definitely wrong:

image

I'm not sure what's going on here, and the map/combine/chain JavaScript shenanigans aren't my forte (would prefer if these were old-fashioned loops).

I don't really have time to debug it (I'm not really involved with Timelion, this was just a one-off for a different project that I contributed). Perhaps someone else could pick it up? It's probably something silly with how I structured the slice/map/reduce stuff.

@tbragin added the bug label Nov 15, 2016
@mbertani

mbertani commented Nov 21, 2016

It seems that the issue is, as @polyfractal hinted, in the reduce function, here: movingstd.js:L38. I don't understand why it isn't working.

I haven't set up the development environment with node, in order to fully debug the issue. At the moment I'm a bit busy with work, and I still have to read all the contribution guidelines++, to submit a PR to kibana's master branch. But I have a solution to the bug with the following code: movingstd2.txt.

What I did was use movingaverage.js as a template and add just a few lines to the toPoint function:

    // Sum of squared deviations from the window average, divided by N - 1
    var variance = _.chain(pairSlice).map(function (point) {
      return Math.pow(point[1] - average, 2);
    }).reduce(function (memo, num) {
      return memo + num;
    }).value() / (_window - 1);

That is to say, I moved the step that subtracts the average from each point and squares the result into the map function, so the reduce part only sums. I also added the option to choose where to place the window slice (left, center, right), since this has to match how the moving average is taken.

To test the code, I just:

  • Rename movingstd2.txt to movingstd2.js, and place it in the series_functions folder in kibana
  • Delete the [/optimize/bundles/timelion.bundle.js]
  • Restart Kibana. The previously deleted file will be regenerated, with the new movingstd2 function available in Timelion.

Then one can test with the following:

  • Select in the time picker dates from Nov/11/2016 to Nov/21/2016
  • Write the following query in timelion and choose 1d as time interval:
    .quandl(WIKI/FB),.quandl(WIKI/FB).movingstd(6).yaxis(2), .quandl(WIKI/FB).movingstd2(6,left).yaxis(2)
    Then you should get the following figure:
    image

If we manually calculate what the results should have been and compare them to the new movingstd2 function (mvstd2):

| Vector of values | sample-std | mvstd2 |
| --- | --- | --- |
| {120.44 119.13 119.13 116.73 117.20 116.84} | 1.53269 | 1.53 |
| {119.13 119.13 116.73 117.20 116.84 118.39} | 1.11788 | 1.12 |
| {119.13 116.73 117.20 116.84 118.39 118.39} | 0.9888 | 0.99 |

I used Wolfram Alpha's calculator for the sample-std values; the mvstd2 values are the results from Timelion, shown in green in the figure above. The new function shows the correct values, while the existing movingstd from Timelion (in red) is way off.
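
The same check can be reproduced outside Kibana. The sketch below follows the map-then-reduce structure of the snippet above but uses native array methods instead of the underscore chain. It is not the movingstd2 source: `windowStd` is a hypothetical name, the timestamps are placeholders, and the eight closing prices are reconstructed from the three overlapping windows listed above.

```javascript
// Sample standard deviation of one window of [timestamp, value] pairs.
function windowStd(pairSlice) {
  const n = pairSlice.length;
  const average = pairSlice
    .map((point) => point[1])
    .reduce((memo, num) => memo + num, 0) / n;
  // Square the deviations in map; reduce only sums them.
  const variance = pairSlice
    .map((point) => Math.pow(point[1] - average, 2))
    .reduce((memo, num) => memo + num, 0) / (n - 1);
  return Math.sqrt(variance);
}

// Closing prices reconstructed from the overlapping windows listed above.
const closes = [120.44, 119.13, 119.13, 116.73, 117.20, 116.84, 118.39, 118.39];
const points = closes.map((value, i) => [i, value]); // placeholder timestamps

const windowSize = 6;
const results = [];
for (let i = 0; i + windowSize <= points.length; i++) {
  results.push(windowStd(points.slice(i, i + windowSize)).toFixed(2));
}
console.log(results);
```

This prints 1.53, 1.12 and 0.99 for the three windows, in agreement with the sample-std and mvstd2 values.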

I kept the unbiased standard deviation (aka sample standard deviation) which divides by N-1 using Bessel's correction. It seems this is the correct way to calculate the moving standard deviation, since we can not use the population mean (for most cases our data is changing all the time).

If we now look at the last 3 months, we can see from the following figure that mvstd2 (in green) really follows the change in the data, but mvstd (in red) looks very flat at the beginning and doesn't really reflect the changes in the data:
image

Let me know what you think. I'll try to submit a PR on the weekend, if time allows.
