NaN discrepancy between older versions (<=1.5.11) and newer versions (>=1.6.0) #145

Closed
dcervenkov opened this issue Nov 15, 2023 · 2 comments

@dcervenkov

I get different datasets from the same request in older and newer versions of meteostat. I narrowed it down to the jump from 1.5.11 to 1.6.0.

To make the comparison as apples-to-apples as possible, I'm using pandas==2.0.3 which works with both meteostat==1.5.11 and meteostat==1.6.0.

Steps to reproduce

meteostat 1.5.11

python3 -m venv venv_meteostat15
./venv_meteostat15/bin/pip install pandas==2.0.3 meteostat==1.5.11
./venv_meteostat15/bin/python -c "from meteostat import Hourly; import datetime; print(Hourly(loc='11520', start=datetime.datetime(2021, 1, 1, 0, 0, 0), end=datetime.datetime(2022, 11, 3, 23, 0, 0), timezone='Europe/Prague').normalize().fetch().info())"

Result

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16128 entries, 2021-01-01 00:00:00+01:00 to 2022-11-03 23:00:00+01:00
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 1   dwpt    16128 non-null  float64
 2   rhum    16128 non-null  float64
 3   prcp    12628 non-null  float64
 4   snow    0 non-null      float64
 5   wdir    16128 non-null  float64
 6   wspd    16128 non-null  float64
 7   wpgt    15865 non-null  float64
 8   pres    16128 non-null  float64
 9   tsun    0 non-null      float64
 10  coco    15888 non-null  float64
dtypes: float64(11)
memory usage: 1.5 MB
None

meteostat 1.6.0

python3 -m venv venv_meteostat16
./venv_meteostat16/bin/pip install pandas==2.0.3 meteostat==1.6.0
./venv_meteostat16/bin/python -c "from meteostat import Hourly; import datetime; print(Hourly(loc='11520', start=datetime.datetime(2021, 1, 1, 0, 0, 0), end=datetime.datetime(2022, 11, 3, 23, 0, 0), timezone='Europe/Prague').normalize().fetch().info())"

Result

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 16128 entries, 2021-01-01 00:00:00+01:00 to 2022-11-03 23:00:00+01:00
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 1   dwpt    16128 non-null  float64
 2   rhum    16128 non-null  float64
 3   prcp    3867 non-null   float64
 4   snow    0 non-null      float64
 5   wdir    16128 non-null  float64
 6   wspd    16128 non-null  float64
 7   wpgt    15865 non-null  float64
 8   pres    16128 non-null  float64
 9   tsun    0 non-null      float64
 10  coco    15888 non-null  float64
dtypes: float64(11)
memory usage: 1.5 MB
None

Notice the prcp column has 12628 non-null values in 1.5.11 but only 3867 non-null values in 1.6.0!
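
To make the difference easier to spot than by eyeballing two tables, here is a minimal sketch that parses the text printed by `DataFrame.info()` and lists the columns whose non-null counts changed. It uses only the standard library. The helper names (`non_null_counts`, `changed_columns`) are made up for this illustration, and the two strings are excerpts of the outputs pasted above, not a fresh download.

```python
import re

# Matches data rows of DataFrame.info(), e.g. " 3   prcp    12628 non-null  float64".
# Header, separator and summary lines do not start with an integer, so they are skipped.
ROW = re.compile(r"^\s*\d+\s+(\S+)\s+(\d+)\s+non-null\b")


def non_null_counts(info_text):
    """Return {column name: non-null count} parsed from info() output."""
    counts = {}
    for line in info_text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def changed_columns(old_info, new_info):
    """Return {column: (old count, new count)} for columns present in both whose counts differ."""
    old = non_null_counts(old_info)
    new = non_null_counts(new_info)
    return {
        col: (old[col], new[col])
        for col in old
        if col in new and old[col] != new[col]
    }


# Excerpts of the two results above (meteostat 1.5.11 vs. 1.6.0).
OLD_INFO = """
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 3   prcp    12628 non-null  float64
 4   snow    0 non-null      float64
 7   wpgt    15865 non-null  float64
"""

NEW_INFO = """
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   temp    16128 non-null  float64
 3   prcp    3867 non-null   float64
 4   snow    0 non-null      float64
 7   wpgt    15865 non-null  float64
"""

print(changed_columns(OLD_INFO, NEW_INFO))  # {'prcp': (12628, 3867)}
```

Only `prcp` is reported, which matches the observation above: every other column has identical counts in both versions.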

@clampr
Member

clampr commented Nov 18, 2023

Thank you for reaching out @dcervenkov,

I looked into this and can confirm that you should use >= 1.6.0 to get the correct data. Version 1.5.7 uses an outdated endpoint, which is still available so that we don't break existing installations. However, the data you're receiving from this endpoint has since been removed due to a bug in the DWD MOSMIX data.
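
For anyone who wants to guard against accidentally running an older release, the following is a rough sketch of a version check, assuming the `>= 1.6.0` threshold from the comment above. The names `parse_version` and `has_corrected_data` are hypothetical helpers written for this example and are not part of meteostat. The parser only handles plain dotted-integer versions such as `1.5.11`.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the maintainer's comment: use >= 1.6.0 for the correct data.
MIN_VERSION = (1, 6, 0)


def parse_version(text):
    """Turn '1.5.11' into (1, 5, 11). Assumes plain dotted integers (no 'rc1' etc.)."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_corrected_data(installed):
    """True if the given version string is at or above MIN_VERSION."""
    return parse_version(installed) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("meteostat")
    except PackageNotFoundError:
        print("meteostat is not installed in this environment")
    else:
        if has_corrected_data(installed):
            print(f"meteostat {installed}: OK")
        else:
            print(f"meteostat {installed}: older than 1.6.0, consider upgrading")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes with multi-digit components. Alternatively, pinning `meteostat>=1.6.0` in the install command or requirements file achieves the same result at install time.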

@clampr closed this as completed Nov 18, 2023
@dcervenkov
Author

Thanks for looking into this and explaining the situation, @clampr!
