Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud regime analysis #251

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

Isaaciwd
Copy link

Add a cloud regime analysis script to the ADF.

@nusbaume nusbaume self-requested a review August 13, 2023 02:50
@nusbaume nusbaume added plotting Related to plot generation analysis Related to data analysis and statistics labels Aug 13, 2023
Copy link
Collaborator

@nusbaume nusbaume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies @Isaaciwd and @brianpm for taking almost six months longer then I originally promised, but I finally got around to reviewing this PR. It's definitely some great work!

I have a fairly large amount of change requests, but hopefully most (all?) of them should be easy to resolve. Of course please let me know if you have any questions or concerns with any of my requests or suggestions, and thanks again for the effort!


MODIS_emd_centers:
category: "Clouds"
obs_file: 'MODIS_emd-means_n_init5_centers_1.np'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing an extra y at the end:

Suggested change
obs_file: 'MODIS_emd-means_n_init5_centers_1.np'
obs_file: 'MODIS_emd-means_n_init5_centers_1.npy'

import xarray as xr
import matplotlib as mpl
from mpl_toolkits.axes_grid1 import make_axes_locatable
from numba import njit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is numba required for this script to run? If so then we should probably add it to the ADF-provided env/conda_environment.yaml file so that this script can be used with that environment as well.

import os


#global num_iter, n_samples, data, ds, ht_var_name, tau_var_name, k, height_or_pressure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line still needed? If not then I would go ahead and remove it.

def cloud_regime_analysis(adf, wasserstein_or_euclidean = "euclidean", data_product='all', premade_cloud_regimes=None, lat_range=None, lon_range=None, only_ocean_or_land=False):
"""
This script/function is designed to generate 2-D lat/lon maps of Cloud Regimes (CRs), as well as plots of the CR
centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here:

Suggested change
centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional
centers themselves. It can fit data into CRs using either Wasserstein (AKA Earth Movers Distance) or the more conventional

import glob
from math import ceil
import time
import dask
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dask is not currently included in the ADF-provided env/conda_environment.yaml file. I would recommend adding it in this PR unless the use of dask is optional for this script.

# Opening an initial dataset
init_ds_b = xr.open_dataset(files[0])

print(f' -Starting {data} CAM baseline data') #testing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove #testing comment?

# Variable that gets set to true if var is missing in the data file, and is used to skip processing that dataset
missing_var = False

# Trying to open time series files from cam)ts_loc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo here:

Suggested change
# Trying to open time series files from cam)ts_loc
# Trying to open time series files from cam_ts_loc

# Masking out the land or water
if only_ocean_or_land == 'L': ds_b = ds_b.where(land == 1)
elif only_ocean_or_land == 'O': ds_b = ds_b.where(land == 0)
else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to kill the ADF here:

Suggested change
else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')
else:
print('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')
return

weights_b=weights_b[valid_indicies_b]

if np.min(mat_b < 0):
raise Exception (f'Found negative value in ds_b.{var_name}, if this is a fill value for missing data, convert to nans and try again')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.