Cloud regime analysis #251

Open

Isaaciwd wants to merge 2 commits into NCAR:main from Isaaciwd:cloud_regime_analysis

Isaaciwd commented Jul 25, 2023

Add a cloud regime analysis script to the ADF.

Isaaciwd added 2 commits

July 25, 2023 15:38


Add cloud_regime_analysis.py and add COSP variable to adf_variable_defaults.yaml

ffe1723


Minor comment changes

4de6ee9

nusbaume self-requested a review

August 13, 2023 02:50

nusbaume added plotting analysis labels

nusbaume requested changes

Collaborator

nusbaume left a comment

Apologies @Isaaciwd and @brianpm for taking almost six months longer than I originally promised, but I finally got around to reviewing this PR. It's definitely some great work!

I have a fairly large number of change requests, but hopefully most (all?) of them should be easy to resolve. Of course, please let me know if you have any questions or concerns about any of my requests or suggestions, and thanks again for the effort!

lib/adf_variable_defaults.yaml

+MODIS_emd_centers:
+ category: "Clouds"
+ obs_file: 'MODIS_emd-means_n_init5_centers_1.np'

Collaborator

nusbaume Dec 26, 2023

I think this is missing an extra y at the end:

Suggested change

 obs_file: 'MODIS_emd-means_n_init5_centers_1.np'

 obs_file: 'MODIS_emd-means_n_init5_centers_1.npy'

scripts/plotting/cloud_regime_analysis.py

+import xarray as xr
+import matplotlib as mpl
+from mpl_toolkits.axes_grid1 import make_axes_locatable
+from numba import njit

Collaborator

nusbaume Dec 28, 2023

Is numba required for this script to run? If so then we should probably add it to the ADF-provided env/conda_environment.yaml file so that this script can be used with that environment as well.
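If numba turns out to be optional, one possible pattern is a guarded import with a no-op fallback, so the script still runs (more slowly) in environments that lack it. This is only a sketch: the fallback decorator and the `sum_of_squares` function are hypothetical and not part of the PR.

```python
try:
    from numba import njit  # real decorator, used when numba is installed
except ImportError:
    def njit(*args, **kwargs):
        """Hypothetical no-op stand-in for numba.njit."""
        # Bare usage: @njit
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        # Parameterized usage: @njit(...)
        def wrapper(func):
            return func
        return wrapper


@njit
def sum_of_squares(n):
    """Illustrative function only; runs compiled or as plain Python."""
    total = 0
    for i in range(n):
        total += i * i
    return total
```

With this pattern the decorated functions behave the same either way; only the speed differs. If numba is a hard requirement instead, adding it to the environment file is the simpler fix.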

scripts/plotting/cloud_regime_analysis.py

		import os


		#global num_iter, n_samples, data, ds, ht_var_name, tau_var_name, k, height_or_pressure

Collaborator

nusbaume Dec 28, 2023

Is this line still needed? If not then I would go ahead and remove it.

scripts/plotting/cloud_regime_analysis.py

+def cloud_regime_analysis(adf, wasserstein_or_euclidean = "euclidean", data_product='all', premade_cloud_regimes=None, lat_range=None, lon_range=None, only_ocean_or_land=False):
+ """
+ This script/function is designed to generate 2-D lat/lon maps of Cloud Regimes (CRs), as well as plots of the CR
+ centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional

Collaborator

nusbaume Dec 28, 2023

Typo here:

Suggested change

 centers themselves. It can fit data into CRs using either Wassertstein (AKA Earth Movers Distance) or the more conventional 

 centers themselves. It can fit data into CRs using either Wasserstein (AKA Earth Movers Distance) or the more conventional

scripts/plotting/cloud_regime_analysis.py

+import glob
+from math import ceil
+import time
+import dask

Collaborator

nusbaume Dec 28, 2023

dask is not currently included in the ADF-provided env/conda_environment.yaml file. I would recommend adding it in this PR unless the use of dask is optional for this script.
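A quick way to confirm which imports an environment is missing is to probe for them with the standard library before running the script. The helper name `missing_packages` below is hypothetical, and the package list is just an example.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example list only; adjust to whatever the script actually imports.
    print(missing_packages(["xarray", "matplotlib", "numba", "dask"]))
```

Running this inside the environment built from the conda file would list any packages that still need to be added.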

scripts/plotting/cloud_regime_analysis.py

+ # Opening an initial dataset
+ init_ds_b = xr.open_dataset(files[0])
+ print(f' -Starting {data} CAM baseline data') #testing

Collaborator

nusbaume Jan 5, 2024

Remove #testing comment?

scripts/plotting/cloud_regime_analysis.py

+ # Variable that gets set to true if var is missing in the data file, and is used to skip processing that dataset
+ missing_var = False
+ # Trying to open time series files from cam)ts_loc

Collaborator

nusbaume Jan 5, 2024

Typo here:

Suggested change

 # Trying to open time series files from cam)ts_loc

 # Trying to open time series files from cam_ts_loc

scripts/plotting/cloud_regime_analysis.py

+ # Masking out the land or water
+ if only_ocean_or_land == 'L': ds_b = ds_b.where(land == 1)
+ elif only_ocean_or_land == 'O': ds_b = ds_b.where(land == 0)
+ else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')

Collaborator

nusbaume Jan 5, 2024

Don't want to kill the ADF here:

Suggested change

 else: raise Exception('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')

 else:

 print('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, "L" for land only, or set to False for both land and water')

 return
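The same idea can be sketched as a small validation step that runs before any data is loaded, so an invalid option skips this script without stopping the rest of the ADF. The function names `resolve_surface_option` and `run_analysis` are hypothetical stand-ins, not code from the PR.

```python
def resolve_surface_option(only_ocean_or_land=False):
    """Hypothetical helper: validate the option without raising.

    Returns (ok, land_value): land_value is 1 for land only, 0 for ocean
    only, and None when no masking is requested or the option is invalid.
    """
    if only_ocean_or_land is False:
        return True, None
    if only_ocean_or_land == 'L':
        return True, 1
    if only_ocean_or_land == 'O':
        return True, 0
    print('Invalid option for only_ocean_or_land: Please enter "O" for ocean only, '
          '"L" for land only, or set to False for both land and water')
    return False, None


def run_analysis(only_ocean_or_land=False):
    """Stand-in for the plotting function; returns early on bad input."""
    ok, land_value = resolve_surface_option(only_ocean_or_land)
    if not ok:
        return None  # skip this script and let the rest of the ADF continue
    if land_value is None:
        return "no masking"
    return f"masking with land == {land_value}"
```

Checking the option once up front also avoids repeating the same `if`/`elif`/`else` chain for the baseline and test datasets.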

scripts/plotting/cloud_regime_analysis.py

+ weights_b=weights_b[valid_indicies_b]
+ if np.min(mat_b < 0):
+ raise Exception (f'Found negative value in ds_b.{var_name}, if this is a fill value for missing data, convert to nans and try again')

Collaborator

nusbaume Jan 5, 2024