
Initial sup3r data pipeline #1

Closed
grantbuster opened this issue Jan 24, 2022 · 8 comments

@grantbuster
Member

Rough steps:

  1. Check out the WTK data on Eagle:
    a. /datasets/WIND/conus/v1.0.0/wtk_conus_2007.h5 (hourly data)
    b. /datasets/WIND/conus/v1.0.0/2007/wtk_conus_2007_2m.h5 (5min data)
  2. Start with rex to get the indices associated with a spatial raster:
    a. https://github.com/NREL/rex/blob/4edb1cd42ef13f3d93029e8a6c280bd50ae801e0/rex/resource_extraction/resource_extraction.py#L1254
  3. Check out the super training API:
    a. `def train(self, x, y, n_batch=None, batch_size=128, n_epoch=100,`
  4. Make a data pipeline/handler that delivers the high-res data to super along with coarsened low-res data (spatial averaging and temporal sampling)
  5. Start with ~40k hourly timesteps over a 1000 km x 1000 km raster. WTK resolution is about 2 km, so the fine-res dataset would be a 500x500 raster over 40k hours (covering multiple years or multiple spatial locations). With a ~50x spatial enhancement, the coarse data will be averaged down to a 10x10 raster.
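Steps 4 and 5 can be sketched roughly as below. This is a minimal sketch, not the sup3r implementation: it assumes the fine-res data is already loaded as a float32 NumPy array shaped (n_time, n_rows, n_cols, n_features), with u and v as the feature channels, and the names `coarsen`, `batch_pairs`, `s_enhance` and `t_enhance` are hypothetical. Coarsening averages non-overlapping s x s spatial blocks and keeps every t-th timestep, and the generator slices one batch at a time so the full dataset is never duplicated (consideration 4). For scale: a 500x500 raster over 40,000 hours with 2 float32 channels is 500 * 500 * 40,000 * 2 * 4 bytes = 80 GB, and a spatial factor of 50 takes 500x500 down to 10x10.

```python
import numpy as np


def coarsen(data, s_enhance, t_enhance=1):
    """Average non-overlapping spatial blocks and subsample time.

    data: float array shaped (n_time, n_rows, n_cols, n_features).
    s_enhance: spatial factor; n_rows and n_cols must divide evenly by it.
    t_enhance: keep every t_enhance-th timestep (sampling, not averaging).
    """
    n_t, n_r, n_c, n_f = data.shape
    if n_r % s_enhance or n_c % s_enhance:
        raise ValueError("raster shape must be divisible by s_enhance")
    # Split each spatial axis into (coarse cell index, offset inside the cell)
    blocks = data.reshape(n_t, n_r // s_enhance, s_enhance,
                          n_c // s_enhance, s_enhance, n_f)
    # Spatial averaging over the two within-cell offset axes, kept in float32
    coarse = blocks.mean(axis=(2, 4), dtype=np.float32)
    # Temporal sampling: keep every t_enhance-th timestep
    return coarse[::t_enhance]


def batch_pairs(fine, s_enhance, t_enhance=1, batch_size=32, rng=None):
    """Yield (low_res, high_res) pairs one batch at a time.

    Each batch is a contiguous time slice of `fine`; for a NumPy array the
    slice is a view, so only the small coarsened array is newly allocated.
    A trailing partial batch is dropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    starts = np.arange(0, fine.shape[0] - batch_size + 1, batch_size)
    rng.shuffle(starts)  # visit batches in a random order
    for start in starts:
        high_res = fine[start:start + batch_size]
        low_res = coarsen(high_res, s_enhance, t_enhance)
        yield low_res, high_res
```

For spatial-only training (consideration 7), `t_enhance` stays at 1. Slicing an h5py dataset the same way reads only the requested rows, so the same generator pattern should carry over to lazily opened files, though that is not exercised here.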
    

Considerations:

  1. Wind speed and direction must be converted to u and v
  2. Direction is cardinal (relative to true north), but WRF u and v are oriented along the model grid axes
    a. https://github.com/NREL/wtk/blob/13991a4dc57a2d06eac932be2615627893ad5769/wtk/hrrr.py#L246
  3. We should start with just u and v as “channels”, but definitely write the code anticipating new channels (more hub heights, topography, temperature, pressure)
  4. Memory constraints are big: consider delivering training batches with a Python generator rather than duplicating all of the data
    a. https://github.com/NREL/phygnn/blob/2028e6cae5c5bf1610e858cf656d90d68496cba1/phygnn/base.py#L367
  5. Keep in mind that we’re going to make “base” models using the WTK h5 source datasets, but will then transfer learn on native 2D WRF NetCDF files
  6. Use float32
  7. Do spatial enhancement only for now, but keep the temporal dimension in mind
  8. Multi-year data handler:
    a. https://github.com/NREL/rex/blob/4edb1cd42ef13f3d93029e8a6c280bd50ae801e0/rex/multi_year_resource.py#L286
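Considerations 1, 2 and 6 can be sketched as two small conversions. This is a minimal sketch under two assumptions that should be checked against the linked `hrrr.py` code: that WTK wind direction follows the meteorological convention (degrees clockwise from true north, giving the direction the wind blows *from*), and that the grid rotation is available as WRF-style `cosalpha`/`sinalpha` arrays. The function names are hypothetical.

```python
import numpy as np


def speed_dir_to_uv(ws, wd):
    """Earth-relative u (eastward) and v (northward) wind components.

    Assumes wd is the meteorological direction the wind blows FROM, in
    degrees clockwise from true north.
    """
    ws = np.asarray(ws, dtype=np.float64)
    theta = np.radians(np.asarray(wd, dtype=np.float64))
    u = -ws * np.sin(theta)
    v = -ws * np.cos(theta)
    return u.astype(np.float32), v.astype(np.float32)


def earth_to_grid(u_earth, v_earth, cosalpha, sinalpha):
    """Rotate earth-relative components onto grid-relative axes.

    Inverse of the commonly used WRF grid-to-earth rotation:
        u_earth = u_grid * cosalpha - v_grid * sinalpha
        v_earth = v_grid * cosalpha + u_grid * sinalpha
    """
    u_earth = np.asarray(u_earth, dtype=np.float64)
    v_earth = np.asarray(v_earth, dtype=np.float64)
    u_grid = u_earth * cosalpha + v_earth * sinalpha
    v_grid = v_earth * cosalpha - u_earth * sinalpha
    return u_grid.astype(np.float32), v_grid.astype(np.float32)
```

The rotation preserves wind speed, so `np.hypot(u, v)` should match on both sets of axes, which makes a cheap sanity check when wiring this into the pipeline.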

Charge code:
WFED 11556 03.01.03

Timeline goal:
Have a GAN trained using the new sup3r infrastructure and data pipeline by Feb 7th

bnb32 added a commit that referenced this issue Jan 25, 2022
@bnb32
Collaborator

bnb32 commented Jan 25, 2022

What units does the WIND Toolkit give the wind direction in? WTK seems to output it in degrees, but experiments with the data give me numbers above 20k.

@grantbuster
Member Author

Scaled precision, perhaps? Use the rex resource handlers.

@bnb32
Collaborator

bnb32 commented Jan 26, 2022

Ah yeah, that info is in ResourceX.attrs. Missed that.

@grantbuster
Member Author

If you use the rex resource handlers, they should auto-scale/unscale the data.
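To illustrate why raw reads show values above 20k, here is a minimal sketch of what unscaling amounts to. It assumes the dataset stores integers alongside a `scale_factor` entry in its attrs (the kind of info exposed through `ResourceX.attrs`) and that the physical value is the stored value divided by that factor; the attribute name and convention should be confirmed in the file itself. The `unscale` helper is hypothetical, and per the thread the rex handlers already do this automatically.

```python
import numpy as np


def unscale(raw, attrs, dtype=np.float32):
    """Convert stored integers back to physical units.

    Assumes physical = raw / attrs['scale_factor'], treating a missing
    attribute as a factor of 1 (data returned as-is, just cast).
    """
    scale_factor = attrs.get("scale_factor", 1)
    data = np.asarray(raw).astype(dtype)
    if scale_factor != 1:
        data /= dtype(scale_factor)
    return data
```

With a hypothetical `scale_factor` of 100, a stored direction of 23540 would unscale to 235.4 degrees, which is consistent with seeing raw numbers above 20k.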

@bnb32
Collaborator

bnb32 commented Jan 26, 2022

Ah ok, instead of going through ResourceX, use ResourceX._res and open_dataset. Lots to navigate here.

@grantbuster
Member Author

See example of data access here:
https://nrel.github.io/rex/misc/examples.wind.html#windx-python-class

@bnb32
Collaborator

bnb32 commented Jan 26, 2022

lol. And I'm over here trying to reinvent the wheel.

@grantbuster
Member Author

Yeah if you think we've done it before we probably have, just ask where to find the tools :)

bnb32 added a commit that referenced this issue Feb 2, 2022
@bnb32 bnb32 closed this as completed in 9c1a5cb Feb 3, 2022
malihass added a commit that referenced this issue Nov 16, 2022
Sync up with main repo

2 participants