Scaling with NaN #135

Closed
wpreimes opened this issue Apr 16, 2018 · 6 comments

Comments

@wpreimes
Member

Hi! I'm using the linreg scaling for bias correction. Currently I get an error when the candidate and reference time series contain NaNs at different points in time, because the regression cannot be calculated when the two time series don't match. But why can't we calculate the model from the coinciding values (dropna before linreg) and apply the correction to ALL values of the candidate, including the NaNs? Then every non-NaN candidate value would be scaled and no values would be dropped. I think this would be a better solution, but I guess there's a reason why you implemented it differently?
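The proposal above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not pytesmo's implementation: the function name `linreg_scale_overlap` is hypothetical, and it assumes both series are already aligned on the same time steps, with NaN marking missing values.

```python
import numpy as np


def linreg_scale_overlap(src, ref):
    """Fit ref = slope * src + intercept on the coinciding values only,
    then apply the fit to every value of src (hypothetical helper)."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Use only time steps where both series have a value ("dropna before linreg").
    valid = ~np.isnan(src) & ~np.isnan(ref)
    slope, intercept = np.polyfit(src[valid], ref[valid], 1)

    # Apply to ALL candidate values; NaNs in src simply stay NaN.
    return slope * src + intercept


candidate = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
reference = np.array([np.nan, 5.0, 7.0, 9.0, np.nan, 13.0])
scaled = linreg_scale_overlap(candidate, reference)
```

Here the fit uses only the three time steps where both series have a value, but the candidate values at the first and fifth positions are still scaled even though the reference is missing there.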

@cpaulik
Collaborator

cpaulik commented Apr 16, 2018

If needed we could refactor the code to make this possible, but it has never come up until now. The reason is that we normally only calculate metrics like correlation etc. on matched time series, in which NaN values have already been removed from both.

What is the application of having a scaled time series including NaN values? The NaN values do not change anyway and if you need arrays of consistent size across your application you can add them back after the scaling is done.

@wpreimes
Member Author

It's not about keeping the NaNs, but about keeping values in the candidate that don't have a counterpart in the reference.
In my case, I scale CCI SM to MERRA2 SM. CCI starts in 1978, MERRA in 1980. So after bias correction the values before 1980 in CCI are lost (because for bias correction I have to drop all values where either series is NaN so that the two periods match).
I guess it makes sense not to match two sets that have completely different temporal coverage, but in my case I think it would be justifiable. The question is whether this is an exceptional case or not.

@cpaulik
Collaborator

cpaulik commented Apr 17, 2018

What you want is probably something similar to what we already provide for CDF matching. You want to calculate slope and intercept in one function and then apply it in another. See https://github.com/TUW-GEO/pytesmo/blob/master/pytesmo/scaling.py#L240 and https://github.com/TUW-GEO/pytesmo/blob/master/pytesmo/scaling.py#L266 for the CDF matching example.

Feel free to refactor the linreg function into two separate functions that are then called by linreg if you need this functionality.
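A minimal sketch of such a split, assuming plain NumPy arrays aligned on the same time steps. The names `fit_linreg_params` and `apply_linreg_params` are hypothetical and are not taken from pytesmo; the wrapper `linreg` only shows how the two parts could be combined again.

```python
import numpy as np


def fit_linreg_params(src, ref):
    """Return (slope, intercept) of ref against src, fitted on the
    time steps where both series are finite (hypothetical helper)."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    valid = np.isfinite(src) & np.isfinite(ref)
    if valid.sum() < 2:
        raise ValueError("need at least two coinciding values to fit a line")
    slope, intercept = np.polyfit(src[valid], ref[valid], 1)
    return slope, intercept


def apply_linreg_params(src, slope, intercept):
    """Scale src with previously calculated parameters (hypothetical helper)."""
    return slope * np.asarray(src, dtype=float) + intercept


def linreg(src, ref):
    """Wrapper that calls both steps, as suggested in the comment above."""
    slope, intercept = fit_linreg_params(src, ref)
    return apply_linreg_params(src, slope, intercept)


# The candidate covers a longer period than the reference (NaN = no reference yet).
candidate = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
reference = np.array([np.nan, np.nan, 0.7, 0.9, 1.1])

slope, intercept = fit_linreg_params(candidate, reference)
scaled = apply_linreg_params(candidate, slope, intercept)
```

Keeping the parameters separate also means they could be calculated once on a matched period and reused on other parts of the candidate series.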

@wpreimes
Member Author

Exactly. I will try something like this. If you want, I can make a PR, and then you can decide whether to include it.

@cpaulik
Collaborator

cpaulik commented Apr 17, 2018

Please make a PR. It shouldn't change much and could be useful.

@wpreimes
Member Author

wpreimes commented Jun 4, 2018

I made a PR. It was a very minor change after all, but it "fixes" the linreg scaling as I described above. If you want it in a separate function, we could do that (but I don't see a benefit yet).
