
the way to calculate time_means in script get_stats.py is wrong #3

Open
veya2ztn opened this issue Aug 16, 2022 · 3 comments

Comments

@veya2ztn

Please see: https://github.com/NVlabs/FourCastNet/blob/master/data_process/get_stats.py

```python
time_means = np.zeros((1, 21, 721, 1440))   # initialized to zero, but see below

for ii, year in enumerate(years):

    with h5py.File('/pscratch/sd/s/shas1693/data/era5/train/' + str(year) + '.h5', 'r') as f:

        rnd_idx = np.random.randint(0, 1460 - 500)
        global_means += np.mean(f['fields'][rnd_idx:rnd_idx+500], keepdims=True, axis=(0, 2, 3))
        global_stds += np.var(f['fields'][rnd_idx:rnd_idx+500], keepdims=True, axis=(0, 2, 3))

global_means = global_means / len(years)
global_stds = np.sqrt(global_stds / len(years))
time_means = time_means / len(years)        # never accumulated in the loop, so this stays zero
```

Following this script, time_means stays constant zero.
What is the correct definition of this value?
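
A minimal corrected sketch (my own guess at the intent, not the authors' fix): accumulate the mean over the time axis inside the loop before dividing, streaming in chunks since a full year does not fit in memory:

```python
import h5py
import numpy as np

years = list(range(1979, 2016))      # hypothetical list of training years
time_means = np.zeros((1, 21, 721, 1440))
n_samples = 0

for year in years:
    with h5py.File('/pscratch/sd/s/shas1693/data/era5/train/' + str(year) + '.h5', 'r') as f:
        n_t = f['fields'].shape[0]   # 1460 six-hourly steps per year
        # stream over the time axis in chunks; the full per-year array is ~100 GB
        for i in range(0, n_t, 100):
            time_means += np.sum(f['fields'][i:i + 100], axis=0, keepdims=True)
        n_samples += n_t

time_means /= n_samples              # mean field over all training samples
```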

By the way, may I know how you calculated the time_means_daily.h5 file?
From its size (127 GB) I can only guess that it is a $(1460, 21, 720, 1440)$ tensor.
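
If that guess is right, one way to produce it would be to average each calendar time step across the training years (a sketch under that assumption; the output file name, dataset name, and paths are hypothetical). Accumulating straight into an on-disk dataset avoids holding the ~127 GB result in memory:

```python
import h5py
import numpy as np

years = list(range(1979, 2016))      # hypothetical training years
paths = ['/pscratch/sd/s/shas1693/data/era5/train/%d.h5' % y for y in years]
files = [h5py.File(p, 'r') for p in paths]

with h5py.File('time_means_daily.h5', 'w') as out:
    dset = out.create_dataset('time_means_daily', (1460, 21, 721, 1440), dtype='f4')
    for i in range(1460):
        # mean of calendar step i across all training years
        acc = sum(f['fields'][i].astype(np.float64) for f in files)
        dset[i] = acc / len(files)

for f in files:
    f.close()
```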

@YueZhou-oh

Hey, do the training and test .h5 files, e.g. train/2015.h5, have a similar data shape (4D data)?

@phrasenmaeher

I am also wondering about that; did you find a solution so far?
In their paper they write:

> we use a time-averaged climatology in this work, motivated by [Rasp et al., 2020]

which is defined just above Eq. (A1) in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020MS002405, so that seems to be the correct way 🤷🏼
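
For concreteness, here is my reading of that definition (a sketch, not code from this repo; the latitude weighting used in the paper is omitted): the anomaly correlation correlates forecast and truth after subtracting the time-averaged climatology:

```python
import numpy as np

def acc(forecast, truth, climatology):
    """Anomaly correlation coefficient with a time-averaged climatology,
    in the spirit of Rasp et al. (2020); latitude weighting omitted."""
    f_anom = forecast - climatology          # forecast anomaly
    t_anom = truth - climatology             # observed anomaly
    num = np.sum(f_anom * t_anom)
    den = np.sqrt(np.sum(f_anom ** 2) * np.sum(t_anom ** 2))
    return num / den
```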

@phrasenmaeher

phrasenmaeher commented Sep 8, 2023

Digging further into this, I found this description in the appendix:

> long-term-mean-subtracted value of the predicted (/true) variable $v$ at the location denoted by the grid coordinates $(m, n)$ at the forecast time step $l$. The long-term mean of a variable is simply the mean value of that variable over a large number of historical samples in the training dataset. The long-term-mean-subtracted variables $\tilde{X}_{\text{pred/true}}$ represent the anomalies of those variables that are not captured by the long-term mean values

which reads as: we subtract the mean from our variables, which we do during data loading, and the mean is correctly computed over the long term (in get_stats.py).

Edit: However, the variables are also scaled by their standard deviation, so it is not only the mean that is removed.
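
To make the two roles concrete (a sketch with hypothetical file names, based on how the thread describes the pipeline): the data loader standardizes each channel with the global statistics, while the time mean is only subtracted when forming anomalies for the ACC metric:

```python
import numpy as np

# hypothetical .npy files holding the statistics produced by get_stats.py
global_means = np.load('global_means.npy')   # per-channel mean,  shape (1, 21, 1, 1)
global_stds = np.load('global_stds.npy')     # per-channel std,   shape (1, 21, 1, 1)
time_means = np.load('time_means.npy')       # climatology field, shape (1, 21, 721, 1440)

def normalize(x):
    # data-loading step: standardize each channel
    return (x - global_means) / global_stds

def anomaly(x):
    # metric step: remove only the long-term (time-mean) climatology
    return x - time_means
```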
