Broken demo: data download and mean clim #798

Closed
lee1043 opened this issue Nov 8, 2021 · 10 comments · Fixed by #802, #793 or #804

@lee1043
Contributor

lee1043 commented Nov 8, 2021

Demo 0 and Demo 1 are broken, independent of the recent PR for the workflow improvement. The demos show the following error messages. I suspect that a change in the directory structure under demo_data caused these errors; this needs to be checked.

Demo 0, result of cell [3]

Download failed

Demo 1, result of cell [3]

INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: REGION: Global
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: alternate1 is an obs
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: Could not figure out obs mask name from obs json file
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: TEST DATA IS: ACCESS1-0
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: ACCESS1-0 is a model
ERROR::2021-11-08 14:25::pcmdi_metrics::basicTest:: Failed opening 3D OBS rlut alternate1 /Users/lee1043/Documents/Research/git/pcmdi_metrics_20211107/pcmdi_metrics/doc/jupyter/Demo/demo_data/PCMDIobs2_clims/rlut/CERES-EBAF-4-0/v20210804/rlut_mon_CERES-EBAF-4-0_PCMDI_gn.200301-201812.AC.v20210804.nc
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: TEST DATA IS: CanCM4
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: CanCM4 is a model
ERROR::2021-11-08 14:25::pcmdi_metrics::basicTest:: Failed opening 3D OBS rlut alternate1 /Users/lee1043/Documents/Research/git/pcmdi_metrics_20211107/pcmdi_metrics/doc/jupyter/Demo/demo_data/PCMDIobs2_clims/rlut/CERES-EBAF-4-0/v20210804/rlut_mon_CERES-EBAF-4-0_PCMDI_gn.200301-201812.AC.v20210804.nc
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: default is an obs
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: Could not figure out obs mask name from obs json file
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: TEST DATA IS: ACCESS1-0
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: ACCESS1-0 is a model
ERROR::2021-11-08 14:25::pcmdi_metrics::basicTest:: Failed opening 3D OBS rlut default /Users/lee1043/Documents/Research/git/pcmdi_metrics_20211107/pcmdi_metrics/doc/jupyter/Demo/demo_data/PCMDIobs2_clims/rlut/CERES-EBAF-4-1/v20210804/rlut_mon_CERES-EBAF-4-1_PCMDI_gn.200301-201812.AC.v20210804.nc
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: TEST DATA IS: CanCM4
INFO::2021-11-08 14:25::pcmdi_metrics::basicTest:: CanCM4 is a model
ERROR::2021-11-08 14:25::pcmdi_metrics::basicTest:: Failed opening 3D OBS rlut default /Users/lee1043/Documents/Research/git/pcmdi_metrics_20211107/pcmdi_metrics/doc/jupyter/Demo/demo_data/PCMDIobs2_clims/rlut/CERES-EBAF-4-1/v20210804/rlut_mon_CERES-EBAF-4-1_PCMDI_gn.200301-201812.AC.v20210804.nc

Above, the file the demo looks for in the demo_data directory:
PCMDIobs2_clims/rlut/CERES-EBAF-4-0/v20210804/rlut_mon_CERES-EBAF-4-0_PCMDI_gn.200301-201812.AC.v20210804.nc
But the file that actually exists in the demo_data directory:
PCMDIobs2_clims/atmos/rlut/CERES-EBAF-4-1/rlut_mon_CERES-EBAF-4-1_BE_gn_200301-201812.v20200421.AC.nc

@gleckler1 @acordonez please let me know if you have any thoughts on this.

@lee1043
Contributor Author

lee1043 commented Nov 9, 2021

Note for Demo 0

Reproducing the error

import os

files_md5 = "data_files.txt"
samples = open(files_md5).readlines()
print(samples)

n0 = 1

for sample in samples[n0:]:
    print(sample)
    good_md5, name = sample.split()

Result

['https://pcmdiweb.llnl.gov/pss/pmpdata/\n', '9d92d486fe3963b29f4d4926e47eab8b  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.pr.198101-200512.AC.v20200426.nc\n', '16fb29fa02cc8c68e170502bca145640  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.rlut.198101-200512.AC.v20200426.nc\n', '71aea66241de722a75e26d8fe55a1a22  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.zg.198101-200512.AC.v20200426.nc\n', '44cce7b8a402a1004ca53565ab378615  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.pr.198101-200512.AC.v20200426.nc\n', '40b2bfa71a3b7d2febb55652ef551001  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.rlut.198101-200512.AC.v20200426.nc\n', '8bb70ae9036280c79d4e305b801d8066  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.zg.198101-200512.AC.v20200426.nc\n', '1abfd5cbcceac61ac51fed2e2c398eed  CMIP5_demo_clims/cmip6.historical.MCM-UA-1-0.r1i1p1f1.mon.zg.198101-200512.AC.v20201119.nc\n', 'd5e86e3680cce1a0004bbe9663d20cf6  CMIP5_demo_data/psl_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc\n', 'dd810f1de8a5db5cddf6a0ab22717cdc  CMIP5_demo_data/sftlf_fx_ACCESS1-0_amip_r0i0p0.nc\n', '63ea73e990aa7f2f53de8b493eb3e051  CMIP5_demo_data/cmip5.amip.ACCESS1-0.sftlf.nc\n', 'aa2384a8957af8ee6e652fb6a27f1f4a  CMIP5_demo_data/cmip5.historical.GISS-E2-H.sftlf.nc\n', '2c402715c026eb1d39ec8e450857ef6e  CMIP5_demo_data/ts_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc\n', 'ce9aa736ee548e3dc20749de95b6f3fb  CMIP5_demo_timeseries/historical/atmos/day/pr/pr_day_GISS-E2-H_historical_r6i1p1_20000101-20051231.nc\n', '804a9542e3681917c9da7ba3d62503fa  PCMDIobs2_clims/atmos/rlut/CERES-EBAF-4-0/rlut_mon_CERES-EBAF-4-0_BE_gn_200003-201810.v20200421.AC.nc\n', '0358a26ade74574eb44a4c37fb0c59e1  PCMDIobs2_clims/atmos/rlut/CERES-EBAF-4-1/rlut_mon_CERES-EBAF-4-1_BE_gn_200301-201812.v20200421.AC.nc\n', 'd773f68878d213db1c253417f7c7d380  PCMDIobs2_clims/atmos/pr/GPCP-2-3/pr_mon_GPCP-2-3_BE_gn_197901-201907.v20200421.AC.nc\n', '1a75c3fc6ca9f6dc1867898dcc8343e8  
PCMDIobs2_clims/atmos/zg/ERA-INT/zg_mon_ERA-INT_BE_gn_198901-201001.v20200421.AC.nc\n', '8534afbd8ab49e7a48b64b1f92c813f9  PCMDIobs2/atmos/mon/rlut/CERES-EBAF-4-1/gn/v20200707/rlut_mon_CERES-EBAF-4-1_BE_gn_v20200707_200301-201812.nc\n', 'b6741c3f979b77a23509778f3a28403d  PCMDIobs2/atmos/mon/pr/GPCP-2-3/gn/v20200707/pr_mon_GPCP-2-3_BE_gn_v20200707_197901-201907.nc\n', 'fc21b9030f19abb3f752bd0bf8e42c00  PCMDIobs2/atmos/mon/psl/20CR/gn/v20200707/psl_mon_20CR_BE_gn_v20200707_187101-201212.nc\n', 'f74b3785195105e5b931b17f3e7eac19  PCMDIobs2/atmos/mon/ts/HadISST-1-1/gn/v20200707/ts_mon_HadISST-1-1_BE_gn_v20200707_187001-201907.nc\n', '1615175d54328d001b817c7e9cb56eeb  PCMDIobs2/atmos/day/pr/GPCP-IP/gn/v20200719/pr_day_GPCP-IP_BE_gn_v20200719_19980101-19981231.nc\n', '4a58b33b2088e4254c82cdce3c7d0eae  PCMDIobs2/atmos/day/pr/GPCP-IP/gn/v20200719/pr_day_GPCP-IP_BE_gn_v20200719_19990101-19991231.nc\n', 'eab076619d05c886648f33d507f4a721  misc_demo_data/atm/3hr/pr/pr_3hr_IPSL-CM5A-LR_historical_r1i1p1_5x5_1997-1999.nc\n', '3214c2480d017662d78ae7e50542beaa  misc_demo_data/fx/sftlf.GPCP-IP.1x1.nc\n', '\n']
9d92d486fe3963b29f4d4926e47eab8b  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.pr.198101-200512.AC.v20200426.nc

16fb29fa02cc8c68e170502bca145640  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.rlut.198101-200512.AC.v20200426.nc

71aea66241de722a75e26d8fe55a1a22  CMIP5_demo_clims/cmip5.historical.ACCESS1-0.r1i1p1.mon.zg.198101-200512.AC.v20200426.nc

44cce7b8a402a1004ca53565ab378615  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.pr.198101-200512.AC.v20200426.nc

40b2bfa71a3b7d2febb55652ef551001  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.rlut.198101-200512.AC.v20200426.nc

8bb70ae9036280c79d4e305b801d8066  CMIP5_demo_clims/cmip5.historical.CanCM4.r1i1p1.mon.zg.198101-200512.AC.v20200426.nc

1abfd5cbcceac61ac51fed2e2c398eed  CMIP5_demo_clims/cmip6.historical.MCM-UA-1-0.r1i1p1f1.mon.zg.198101-200512.AC.v20201119.nc

d5e86e3680cce1a0004bbe9663d20cf6  CMIP5_demo_data/psl_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc

dd810f1de8a5db5cddf6a0ab22717cdc  CMIP5_demo_data/sftlf_fx_ACCESS1-0_amip_r0i0p0.nc

63ea73e990aa7f2f53de8b493eb3e051  CMIP5_demo_data/cmip5.amip.ACCESS1-0.sftlf.nc

aa2384a8957af8ee6e652fb6a27f1f4a  CMIP5_demo_data/cmip5.historical.GISS-E2-H.sftlf.nc

2c402715c026eb1d39ec8e450857ef6e  CMIP5_demo_data/ts_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc

ce9aa736ee548e3dc20749de95b6f3fb  CMIP5_demo_timeseries/historical/atmos/day/pr/pr_day_GISS-E2-H_historical_r6i1p1_20000101-20051231.nc

804a9542e3681917c9da7ba3d62503fa  PCMDIobs2_clims/atmos/rlut/CERES-EBAF-4-0/rlut_mon_CERES-EBAF-4-0_BE_gn_200003-201810.v20200421.AC.nc

0358a26ade74574eb44a4c37fb0c59e1  PCMDIobs2_clims/atmos/rlut/CERES-EBAF-4-1/rlut_mon_CERES-EBAF-4-1_BE_gn_200301-201812.v20200421.AC.nc

d773f68878d213db1c253417f7c7d380  PCMDIobs2_clims/atmos/pr/GPCP-2-3/pr_mon_GPCP-2-3_BE_gn_197901-201907.v20200421.AC.nc

1a75c3fc6ca9f6dc1867898dcc8343e8  PCMDIobs2_clims/atmos/zg/ERA-INT/zg_mon_ERA-INT_BE_gn_198901-201001.v20200421.AC.nc

8534afbd8ab49e7a48b64b1f92c813f9  PCMDIobs2/atmos/mon/rlut/CERES-EBAF-4-1/gn/v20200707/rlut_mon_CERES-EBAF-4-1_BE_gn_v20200707_200301-201812.nc

b6741c3f979b77a23509778f3a28403d  PCMDIobs2/atmos/mon/pr/GPCP-2-3/gn/v20200707/pr_mon_GPCP-2-3_BE_gn_v20200707_197901-201907.nc

fc21b9030f19abb3f752bd0bf8e42c00  PCMDIobs2/atmos/mon/psl/20CR/gn/v20200707/psl_mon_20CR_BE_gn_v20200707_187101-201212.nc

f74b3785195105e5b931b17f3e7eac19  PCMDIobs2/atmos/mon/ts/HadISST-1-1/gn/v20200707/ts_mon_HadISST-1-1_BE_gn_v20200707_187001-201907.nc

1615175d54328d001b817c7e9cb56eeb  PCMDIobs2/atmos/day/pr/GPCP-IP/gn/v20200719/pr_day_GPCP-IP_BE_gn_v20200719_19980101-19981231.nc

4a58b33b2088e4254c82cdce3c7d0eae  PCMDIobs2/atmos/day/pr/GPCP-IP/gn/v20200719/pr_day_GPCP-IP_BE_gn_v20200719_19990101-19991231.nc

eab076619d05c886648f33d507f4a721  misc_demo_data/atm/3hr/pr/pr_3hr_IPSL-CM5A-LR_historical_r1i1p1_5x5_1997-1999.nc

3214c2480d017662d78ae7e50542beaa  misc_demo_data/fx/sftlf.GPCP-IP.1x1.nc

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-1cf409c6cee2> in <module>
      9 for sample in samples[n0:]:
     10     print(sample)
---> 11     good_md5, name = sample.split()

ValueError: not enough values to unpack (expected 2, got 0)

The error appears to be caused by an empty line at the end of data_files.txt, which puts '\n' into the samples list as its last element; see https://github.com/CDAT/cdat_info/blob/77f201c846715ef04e2ab7318810c61f1e1c7552/cdat_info/cdat_info_src.py#L359

Possible solutions

  1. Remove the empty line at the bottom of pmp_tutorial_files.txt on the web server (in the /var/www/cmec-plots/pmpdata directory). -- clear solution
  2. Remove the empty line when rewriting data_files from the downloaded pmp_tutorial_files.txt. -- temporary solution
  3. Revise cdat_info to ignore any empty lines. -- long-term solution
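
A minimal sketch of what option 3 could look like, assuming the manifest format printed above (a base URL on the first line, followed by `md5  relative/path` pairs). The function name `parse_manifest` is hypothetical and is not part of cdat_info:

```python
def parse_manifest(lines):
    """Return (base_url, [(md5, relative_path), ...]), ignoring blank lines."""
    # Drop whitespace-only lines so a trailing empty line cannot break unpacking.
    cleaned = [line.strip() for line in lines if line.strip()]
    if not cleaned:
        raise ValueError("manifest is empty")
    base_url = cleaned[0]
    entries = []
    for line in cleaned[1:]:
        # Split on the first run of whitespace only; the remainder is the path.
        md5, name = line.split(None, 1)
        entries.append((md5, name))
    return base_url, entries


# Two manifest lines copied from the output above, plus the trailing empty line.
sample = [
    "https://pcmdiweb.llnl.gov/pss/pmpdata/\n",
    "3214c2480d017662d78ae7e50542beaa  misc_demo_data/fx/sftlf.GPCP-IP.1x1.nc\n",
    "\n",
]
base_url, entries = parse_manifest(sample)
print(base_url)
print(entries)
```

Filtering before unpacking means a blank line anywhere in the file is tolerated, not only at the end.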

@lee1043
Contributor Author

lee1043 commented Nov 10, 2021

The Demo 0 issue is now resolved by solution 1 above. In addition, on the web server pmp_tutorial_files.txt is a symbolic link to a specific version of the txt file, currently pmp_tutorial_files.v20210406.txt; it may need to point to a later version in the future.

@lee1043
Contributor Author

lee1043 commented Nov 10, 2021

Demo 1 issue investigation note:

  • The mean clim driver pulls obs info from obs_info_dictionary.json, which appears to be out of sync with the demo data currently being downloaded.
  • Demo 1 in master was last updated in March, and the directory structure of the demo data changed in April, so the new directory structure has probably not been propagated to Demo 1.

Possible solutions:

  1. Update obs_info_dictionary.json, or
  2. Revise the mean climate driver to pull obs info from the obs catalogue JSON in this directory.

@gleckler1 Could you please share your view on which of the above makes more sense to you?

Edit to add: Rather than taking either of the above options, I switched to downloading demo_data/obs4MIPs, which the current obs_info_dictionary.json can work with.
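
To check whether a JSON catalogue and the downloaded data are aligned, something like the following sketch might help. It makes two assumptions that are not confirmed in this thread: that file locations appear in the JSON as string values ending in `.nc`, and that they are relative to a single data root. The helper names `iter_nc_paths` and `missing_files` are hypothetical.

```python
import json
import os


def iter_nc_paths(node):
    """Yield every string ending in '.nc' found anywhere in a nested JSON value."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_nc_paths(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_nc_paths(value)
    elif isinstance(node, str) and node.endswith(".nc"):
        yield node


def missing_files(json_path, data_root):
    """Return the referenced .nc paths that do not exist under data_root."""
    with open(json_path) as f:
        catalogue = json.load(f)
    return [
        path
        for path in iter_nc_paths(catalogue)
        if not os.path.isfile(os.path.join(data_root, path))
    ]
```

Calling `missing_files` with the path to the JSON file and the demo_data directory would then list every referenced file that is absent on disk, under the assumptions above.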

@lee1043
Contributor Author

lee1043 commented Nov 11, 2021

Note for continuing repair:

  • To test the new branch 798_ljw_demo1_fix_from_787_ljw_pre-commit-hook-compliant, I updated pmp_tutorial_files.txt on the web server to the most recent version, so Demo 0 downloads files under the obs4MIPs_* directories for the reference datasets.
  • Although pmp_tutorial_files.txt included "latest" in its paths, no "latest" symbolic-link directories had been created on the web server. This triggers an error in Demo 1a because the downloaded files became dummy files without any content.

Possible solutions:

  1. I need to manually create the "latest" symbolic-link directories under the obs4MIPs_* directories on the web server. -- However, I do not have write permission for that directory on the web server... (contacted Tony to fix that again).
  2. Or, I might have to specify the version date in the pmp_tutorial_files.txt file.

Edited to add: Resolved by option 2 above.
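
A sketch for catching such dummy downloads early, assuming "without any content" means a zero-byte file and that the expected md5 comes from the manifest. The md5 comparison would also flag non-empty but wrong files. The function names are hypothetical:

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_md5):
    """Classify a downloaded file as 'missing', 'empty', 'mismatch', or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok" if md5_of(path) == expected_md5 else "mismatch"
```

Running `check_download` over every `(md5, path)` pair in the manifest after Demo 0 finishes would report dummy files before Demo 1a tries to open them.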

This was linked to pull requests Nov 11, 2021
@acordonez
Collaborator

@lee1043 @gleckler1 Regarding the mean climate notebook obs_info_dictionary.json issue, I recall that we updated the obs dictionary to use the obs4mips datasets, but we didn't have a sample data hash file on the pcmdi server that included all the obs4mips data. Does that exist now?

@lee1043
Contributor Author

lee1043 commented Nov 15, 2021

@acordonez yes, I've updated the data on the pcmdi server, so if you run the current mean clim notebook it should download the obs4mips data.

@acordonez
Collaborator

@lee1043 Thanks, it looks like the new sample data is working for me. I caught one dataset that uses the "latest" path while the rest use "v20210727" (demo_data/obs4MIPs_PCMDI_daily/NASA-JPL/GPCP-1-3/day/pr/gn/latest/pr_day_GPCP-1-3_PCMDI_gn_19961002-20170101.nc). Would it be possible for them to all use the same convention? Or does daily precip have to be "latest"?
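
A rough way to spot mixed conventions across a list of paths, assuming the version appears as a directory component that is either `latest` or `v` followed by eight digits. The helper name `version_component` is hypothetical; the two example paths are taken from this thread:

```python
import re

# Matches dated version directories such as "v20210727".
VERSION_DIR = re.compile(r"^v\d{8}$")


def version_component(path):
    """Return the 'latest' or 'vYYYYMMDD' directory component of path, if any."""
    # Skip the last element, which is the file name.
    for part in path.split("/")[:-1]:
        if part == "latest" or VERSION_DIR.match(part):
            return part
    return None


paths = [
    "demo_data/obs4MIPs_PCMDI_daily/NASA-JPL/GPCP-1-3/day/pr/gn/latest/pr_day_GPCP-1-3_PCMDI_gn_19961002-20170101.nc",
    "PCMDIobs2/atmos/mon/rlut/CERES-EBAF-4-1/gn/v20200707/rlut_mon_CERES-EBAF-4-1_BE_gn_v20200707_200301-201812.nc",
]
for p in paths:
    print(version_component(p), p)
```

Collecting the distinct return values over the whole manifest shows at a glance whether more than one convention is in use.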

@lee1043
Contributor Author

lee1043 commented Nov 15, 2021

@acordonez thanks for checking that it works, and nice catch on the "latest" path. I think the original intention was to use the "latest" symbolic-link directories, but they did not exist on the pcmdi web server for the monthly datasets, which resulted in downloading just dummy files. I think it is okay for now as long as the demo works, but it should be made consistent. That question might be better answered by @gleckler1.

@acordonez
Collaborator

@lee1043 thanks for that reply. I have updated the parameter file templates and notebooks in branch "798_ao_update_nb". Not sure if I should open a PR for this branch, or do you want to merge it into your "787_ljw_pre-commit-hook-compliant" branch?

@lee1043
Contributor Author

lee1043 commented Nov 15, 2021

@acordonez thanks for revising. Either is okay, but merging to master may help close this issue quickly.

@acordonez acordonez linked a pull request Nov 15, 2021 that will close this issue