Stations and Hourly data availability times don't match #151

Open
tdlangland opened this issue Mar 8, 2024 · 0 comments

This repo (and the project in general) has been super useful as a centralized spot to grab met data, thank you!

I wanted to raise an issue I noticed when following a query pattern that looks like:

  1. determine nearby Stations
  2. filter by inventory to stations with data during the period of interest
  3. fetch Hourly data from these stations

The 'hourly_end' field in Stations results doesn't always match the actual hourly data availability obtained via the Hourly class.

This discrepancy means we end up skipping the inventory step and requesting hourly data from more stations than necessary, to make sure we get everything that is actually available.

Given the workaround, this is not a functional problem for us, but it does mean sending more query load your way, which you might not want!
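
A possible middle ground between trusting 'hourly_end' exactly and skipping the inventory step entirely would be to pad the reported end date by some tolerance. The sketch below is plain Python and not part of meteostat: the helper name `may_have_data`, the 7-day tolerance, and the example period start are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def may_have_data(hourly_end, period_start, tolerance=timedelta(days=7)):
    """Hypothetical helper: keep a station if its reported 'hourly_end',
    padded by `tolerance`, reaches the start of the period of interest."""
    if hourly_end is None:
        # No hourly inventory reported for this station at all.
        return False
    return hourly_end + tolerance >= period_start


# 'hourly_end' as reported by Stations in the example from this issue.
reported_end = datetime(2024, 3, 3)
# Illustrative period start (roughly three days before the time of posting).
period_start = datetime(2024, 3, 4, 16)

print(reported_end >= period_start)               # strict check drops the station
print(may_have_data(reported_end, period_start))  # padded check keeps it
```

The tolerance trades extra requests against missed stations, so a sensible value depends on how stale 'hourly_end' can get, which is not known here.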

A minimal example that shows a discrepancy at the time of posting:

```python
from datetime import datetime, timedelta
from meteostat import Stations, Hourly

stations = Stations()
# force latest data so an old cache isn't the problem
stations.max_age = 0

stations = stations.nearby(30.2416, -90.9827)
station = stations.fetch(1)

print(
    f"(Stations) Latest hourly data for {station['name'].iloc[0]} ({station.index[0]}): {station['hourly_end'].iloc[0]}"
)

utc_now = datetime.utcnow()
hourly = Hourly(
    loc=station.index.tolist(),
    start=utc_now - timedelta(days=3),
    end=utc_now,
    model=False,
)
hourly = hourly.fetch()

print(
    f"(Hourly) Latest hourly data for {station['name'].iloc[0]} ({station.index[0]}): {hourly.index.max()}"
)
```

>>> (Stations) Latest hourly data for Louisiana Regional Airport (7O9W0): 2024-03-03 00:00:00
>>> (Hourly) Latest hourly data for Louisiana Regional Airport (7O9W0): 2024-03-07 16:00:00

Manually removing the cache doesn't seem to resolve the difference either.
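
To put a number on the gap, the two timestamps printed above can be subtracted directly. This is only arithmetic on the values from the output in this issue, not a call into meteostat; the variable names are made up for the example.

```python
from datetime import datetime

# Timestamps copied from the output above.
stations_hourly_end = datetime(2024, 3, 3, 0, 0)   # 'hourly_end' from Stations
hourly_latest = datetime(2024, 3, 7, 16, 0)        # latest index value from Hourly

# How far the inventory field trails the data that is actually served.
lag = hourly_latest - stations_hourly_end
print(f"hourly_end lags the Hourly data by {lag}")
# prints: hourly_end lags the Hourly data by 4 days, 16:00:00
```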
