Scaling with NaN #135

wpreimes · 2018-04-16T12:11:46Z

Hi! I'm using the linreg scaling for bias correction. Currently I get an error when the candidate and reference time series contain NaNs (at different points in time), because the regression cannot be calculated if the two time series don't match. But why can't we calculate the model from the coinciding values (dropna before linreg) and apply the correction to ALL values of the candidate (including the NaNs)? Then every non-NaN candidate value would be scaled and no values would be dropped. I think this would be a better solution, but I guess there's also a reason why you implemented it differently?
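The proposal above can be sketched as follows. This is a minimal illustration with NumPy, not pytesmo's implementation: the function name `linreg_scale_with_nans` is hypothetical, and `np.polyfit` stands in for whatever regression routine the library uses.

```python
import numpy as np


def linreg_scale_with_nans(candidate, reference):
    """Hypothetical helper: fit on coinciding values, scale all candidate values."""
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Use only the time steps where both series have a value.
    valid = ~np.isnan(candidate) & ~np.isnan(reference)

    # Least-squares line: reference ~ slope * candidate + intercept.
    slope, intercept = np.polyfit(candidate[valid], reference[valid], deg=1)

    # Apply to the full candidate series. NaNs in the candidate stay NaN,
    # so the output has the same length as the input.
    return slope * candidate + intercept


# Toy series with NaNs at different positions.
candidate = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
reference = np.array([3.0, np.nan, 7.0, 9.0, 11.0])
scaled = linreg_scale_with_nans(candidate, reference)
```

Here the candidate value at index 1 is scaled even though the reference is NaN there, while the NaN at index 2 of the candidate is carried through unchanged.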

cpaulik · 2018-04-16T18:44:07Z

If needed we could refactor the code to make this possible, but it has never come up until now. The reason is that we normally only calculate metrics like correlation etc. on matched time series in which both time series have NaN values removed already.

What is the application of having a scaled time series including NaN values? The NaN values do not change anyway and if you need arrays of consistent size across your application you can add them back after the scaling is done.

wpreimes · 2018-04-17T11:33:13Z

It's not about keeping the NaNs, but about keeping values in the candidate that don't have a counterpart in the reference.
In my case, I scale CCI SM to MERRA2 SM. CCI starts in 1978, MERRA in 1980. So after bias correction the values before 1980 in CCI are lost (because for bias correction I have to drop all values where either series is NaN so that the two periods match).
I guess it makes sense not to match two sets that have completely different temporal coverage, but in my case I think it would be justifiable. The question is whether this is an exceptional case or not.
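To make the coverage problem concrete, here is a small sketch with synthetic yearly values (not real CCI or MERRA2 data; the numbers and the linear relation are made up for illustration). It compares dropping unmatched values before scaling with fitting on the overlap and scaling the whole candidate.

```python
import numpy as np

# Synthetic stand-ins: the candidate covers 1978-1985, the reference starts in 1980.
years = np.arange(1978, 1986)
candidate = np.linspace(0.10, 0.45, years.size)
reference = np.where(years >= 1980, 0.5 * candidate + 0.1, np.nan)

# Time steps where both series have a value (1980 onwards).
overlap = ~np.isnan(candidate) & ~np.isnan(reference)

# The regression is fitted on the overlapping period only.
slope, intercept = np.polyfit(candidate[overlap], reference[overlap], deg=1)

# Drop-then-scale: the 1978 and 1979 candidate values are lost.
scaled_overlap_only = slope * candidate[overlap] + intercept

# Fit-on-overlap, apply-to-all: the pre-1980 values are kept, but they are
# scaled with a model that never saw reference data from that period.
scaled_full = slope * candidate + intercept

print(scaled_overlap_only.size, scaled_full.size)
```

Whether applying the model outside the period it was fitted on is acceptable depends on the application, which is the open question raised in this comment.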

cpaulik · 2018-04-17T12:01:31Z

What you want is probably something similar to what we already provide for CDF matching. You want to calculate slope and intercept in one function and then apply it in another. See https://github.com/TUW-GEO/pytesmo/blob/master/pytesmo/scaling.py#L240 and https://github.com/TUW-GEO/pytesmo/blob/master/pytesmo/scaling.py#L266 for the CDF matching example.

Feel free to refactor the linreg function into two separate functions that are then called by linreg if you need this functionality.
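A rough sketch of such a split, assuming NumPy and a plain least-squares fit. The names `fit_linreg_params` and `apply_linreg_params` are hypothetical and the `linreg` wrapper below is only an illustration of the structure, not the code in pytesmo.

```python
import numpy as np


def fit_linreg_params(src, ref):
    """Hypothetical: estimate slope and intercept from coinciding finite values."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    valid = np.isfinite(src) & np.isfinite(ref)
    slope, intercept = np.polyfit(src[valid], ref[valid], deg=1)
    return float(slope), float(intercept)


def apply_linreg_params(src, slope, intercept):
    """Hypothetical: apply stored parameters to any series, matched or not."""
    return slope * np.asarray(src, dtype=float) + intercept


def linreg(src, ref):
    """Illustrative wrapper that calls the two steps in sequence."""
    slope, intercept = fit_linreg_params(src, ref)
    return apply_linreg_params(src, slope, intercept)


src = np.array([0.0, 1.0, 2.0, np.nan, 4.0])
ref = np.array([1.0, 3.0, np.nan, 7.0, 9.0])
slope, intercept = fit_linreg_params(src, ref)

# The stored parameters can be reused on values outside the fitting period.
earlier_values = apply_linreg_params([10.0], slope, intercept)
```

Keeping the fit and the application separate means the parameters can be computed once on the overlapping period and reused on a longer series.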

wpreimes · 2018-04-17T13:51:05Z

Exactly. I will try something like this. If you want, I can make a PR, and then you can decide whether to include it.

cpaulik · 2018-04-17T15:42:22Z

Please make a PR. It shouldn't change much and could be useful.

wpreimes · 2018-06-04T12:55:55Z

I made a PR. It was a very minor change after all, but it "fixes" the linreg scaling as I described above. If you want it in a separate function, we could do that (but I don't see a benefit yet).

wpreimes mentioned this issue Jun 19, 2018

Linear regression scaling with nans #137

Merged

cpaulik closed this as completed Jul 3, 2018
