[Doc]: are there some xcdat test files (that can be predownloaded) ? #277

jypeter · 2022-07-22T14:28:03Z

Describe your documentation update

I wonder if there are xCDAT (or xarray) test files that can be (pre)downloaded and used for:

- testing xCDAT with known (and local) files
- examples and tutorials
- having local data files that you can use when you have no network or low bandwidth

I'm thinking of (something like) the cdms2/vcs test data

I think these files are the ones listed in CDMS Sample Dataset and they are still online!

pochedls · 2022-07-28T21:17:46Z

I like this idea, but I'm wondering how this could be implemented in a way that is easy to maintain. Perhaps we could add some functionality to directly download (e.g., from ESGF) example netCDF files (e.g., xcdat.get_test_data())?
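A minimal sketch of the `xcdat.get_test_data()` idea, assuming a plain URL and a caller-chosen local directory. The function name is taken from the suggestion above but does not exist in xcdat; the URL and directory are placeholders. The download only happens when the file is not already present, which is what makes predownloading useful offline.

```python
import shutil
import urllib.request
from pathlib import Path


def get_test_data(url: str, dest_dir: str) -> Path:
    """Hypothetical helper: download ``url`` into ``dest_dir`` only once.

    If the file already exists locally (e.g. it was predownloaded), the
    network is not touched and the existing path is returned.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Use the last component of the URL as the local file name.
    target = dest / url.rstrip("/").rsplit("/", 1)[-1]
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file under the final name.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    return target
```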

I was curious about what xarray does: it seems like they generate toy data rather than providing data files.

Should this be a discussion item?

jypeter · 2022-07-29T09:30:06Z

This is the up-to-date link for the toy data you mentioned, but I'd rather have data coming from actual netCDF files than toy data generated in memory!

Some not-too-big test data files could come from ESGF, the way I've done it in #284, but we also need a way to get other static/known test data files:

- a subset (e.g. a few time steps) of real ESGF data, because you don't want huge files containing every time step and every vertical level. A script using xcdat to download and then save a subset of ESGF data (e.g. the first 10 time steps, and just a few pressure or depth levels over the Northern Hemisphere) would be a useful example anyway
- data with some known errors (e.g. [Bug]: open_dataset should handle missing bounds on ORCA grid more gracefully #284, incorrectly masked data, incorrect metadata, ...) that we want to be sure xcdat can handle, along with example scripts showing how to correct the files and save the corrected versions

I have just checked: cartopy mostly generates toy data on the fly for its examples, but iris uses a directory with actual data files (the way vcs and cdms2 did).

>>> import iris
>>> help(iris.sample_data_path)
sample_data_path(*path_to_join)
    Given the sample data resource, returns the full path to the file.

    .. note::

        This function is only for locating files in the iris sample data
        collection (installed separately from iris). It is not needed or
        appropriate for general file access.

>>> iris.sample_data_path("E1_north_america.nc")
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/iris_sample_data/sample_data/E1_north_america.nc'

ls -lh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/iris_sample_data/sample_data/
total 24M
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 A1B.2098.pp
-rw-rw-r-- 2 jypeter lsce 1.8M Jun 25  2020 A1B_north_america.nc
-rw-rw-r-- 2 jypeter lsce  28K Jun 25  2020 air_temp.pp
-rw-rw-r-- 2 jypeter lsce  34K Jun 25  2020 atlantic_profiles.nc
-rw-rw-r-- 2 jypeter lsce 3.5M Jun 25  2020 colpex.pp
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 E1.2098.pp
-rw-rw-r-- 2 jypeter lsce 1.8M Jun 25  2020 E1_north_america.nc
drwxr-xr-x 2 jypeter lsce 4.0K Sep 10  2021 GloSea4/
-rw-rw-r-- 2 jypeter lsce 662K Jun 25  2020 hybrid_height.nc
-rw-rw-r-- 2 jypeter lsce 7.5M Jun 25  2020 NAME_output.txt
drwxr-xr-x 2 jypeter lsce 4.0K Sep 10  2021 NEMO/
-rw-rw-r-- 2 jypeter lsce 2.0M Jun 25  2020 orca2_votemper.nc
-rw-rw-r-- 2 jypeter lsce 1.7M Jun 25  2020 ostia_monthly.nc
-rw-rw-r-- 2 jypeter lsce  26K Jun 25  2020 polar_stereo.grib2
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 pre-industrial.pp
-rw-rw-r-- 2 jypeter lsce  19K Jun 25  2020 rotated_pole.nc
-rw-rw-r-- 2 jypeter lsce 163K Jun 25  2020 SOI_Darwin.nc
-rw-rw-r-- 2 jypeter lsce 243K Jun 25  2020 space_weather.nc
-rw-rw-r-- 2 jypeter lsce 514K Jun 25  2020 toa_brightness_stereographic.nc
-rw-rw-r-- 2 jypeter lsce 3.3M Jun 25  2020 uk_hires.pp
drwxr-xr-x 2 jypeter lsce  12K Sep 10  2021 UM/
-rw-rw-r-- 2 jypeter lsce 2.4K Jun 25  2020 wind_speed_lake_victoria.pp
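The same shape could work for xCDAT: callers pass only a file name and never hard-code the install path. A minimal sketch modeled on the `iris.sample_data_path` help text above, not on iris's implementation; `SAMPLE_DATA_ROOT` and the `root` keyword are hypothetical.

```python
from pathlib import Path

# Hypothetical root of an installed sample-data collection; a real package
# would compute this from its own install location.
SAMPLE_DATA_ROOT = Path("/opt/xcdat_sample_data")


def sample_data_path(*path_to_join: str, root: Path = SAMPLE_DATA_ROOT) -> str:
    """Return the full path to a file in the sample data collection."""
    target = Path(root).joinpath(*path_to_join)
    # Refuse paths that escape the sample-data root (e.g. "..", "x").
    if Path(root).resolve() not in target.resolve().parents:
        raise ValueError(f"{target} is outside the sample data collection")
    if not target.exists():
        raise FileNotFoundError(f"{target} is not in the sample data collection")
    return str(target)
```

Sub-directories work the same way as in iris, e.g. `sample_data_path("NEMO", "some_file.nc")`.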

tomvothecoder · 2022-08-01T23:14:40Z

Thanks for this @jypeter. This has been discussed and was in mind, although a GH issue was not opened for it.

I explored a possible implementation similar to xarray. xarray uses a GH repo (https://github.com/pydata/xarray-data) to host test datasets, and provides xarray.tutorial methods to open up the test datasets using a package called pooch.
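The core of that approach (a registry of file names with known hashes, verified after download) can be sketched with the standard library alone. This is not pooch's API; the registry contents, base URL, and the `fetch` and `sha256_of` names are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large netCDF files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name: str, registry: dict, base_url: str, cache_dir: str) -> Path:
    """Return a verified local copy of ``name``, downloading it if needed.

    ``registry`` maps file names to expected SHA-256 hex digests, so a
    corrupted or silently changed file is detected instead of used in tests.
    """
    if name not in registry:
        raise KeyError(f"{name!r} is not a registered sample file")
    target = Path(cache_dir) / name
    if not target.exists() or sha256_of(target) != registry[name]:
        target.parent.mkdir(parents=True, exist_ok=True)
        url = f"{base_url.rstrip('/')}/{name}"
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        if sha256_of(target) != registry[name]:
            target.unlink()
            raise ValueError(f"checksum mismatch for {name!r}")
    return target
```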

We didn't pursue this idea since xarray supports direct download of data using OPeNDAP. However, I think this idea is worthwhile because it standardizes and streamlines the testing processes with easy access to the same real-world datasets.

jypeter · 2022-08-02T08:30:56Z

Hmmm, I had a quick look at the pooch GH page. It looks really nice and fancy but:

- it may be overkill for our purpose, from the end user's point of view. But xCDAT could indeed use it behind the scenes! Or possibly just use requests
- specifying the input files seems a bit complicated, but that's OK if it only happens behind the scenes. The end user should only have to specify a file name, and some xCDAT function should return the path (either the directory where the file is located, or the full path)
- you have to be careful about where the data files are located! I'm not too sure about a cache that usually depends on the user login or something similar. When, like me, you install a Python distribution for multiple users (where the person installing can write to it, but other users can't), it's convenient to have the files installed in a fixed sub-directory of the distribution's lib directory. And I hate default cache locations in hidden sub-directories of the users' home dirs. We have nightly backups of the home dirs at LSCE, and we archive the interns' home dirs when their internships end. I don't want backups of hidden test files!
- see also the ongoing issue SciTools/cartopy#1325, "Clarifying the 'under the hood' data download (and other GIS stuff)", about file location and cache problems
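One way to meet both constraints (a fixed shared location for multi-user installs, and no hidden per-user cache by default) is an explicit lookup order. A sketch under assumptions: the environment variable `XCDAT_SAMPLE_DATA_DIR` and the `share/xcdat_sample_data` sub-directory are hypothetical names, not existing xCDAT settings.

```python
import os
import sys
from pathlib import Path

# Hypothetical names, chosen for illustration only.
ENV_VAR = "XCDAT_SAMPLE_DATA_DIR"
SUBDIR = Path("share") / "xcdat_sample_data"


def sample_data_dir(environ=os.environ, prefix: str = sys.prefix) -> Path:
    """Resolve where sample files live, without defaulting to the home dir.

    1. A non-empty environment variable wins, so admins or users can point
       it at a shared, non-backed-up scratch area.
    2. Otherwise use a fixed directory inside the Python distribution,
       which whoever installed it can populate once for all users.
    """
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path(prefix) / SUBDIR
```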

Having a dedicated python package with just the data could also be an easy solution: e.g. basemap-data-hires
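With a data-only package, locating files needs nothing beyond `importlib.resources.files` (Python 3.9+). A minimal sketch; the package name `xcdat_sample_data` is hypothetical, and the path conversion assumes a regular on-disk install (not a zipped package).

```python
from importlib import resources
from pathlib import Path


def data_package_path(name: str, package: str = "xcdat_sample_data") -> Path:
    """Return the path of ``name`` shipped inside a separately installed data package.

    Raises ModuleNotFoundError if the data package is not installed, and
    FileNotFoundError if the package does not contain ``name``.
    """
    entry = resources.files(package) / name
    if not entry.is_file():
        raise FileNotFoundError(f"{name!r} is not included in {package!r}")
    # For a regular on-disk install this is already a filesystem path.
    return Path(str(entry))
```

Because the data lives in the distribution's site-packages, it ends up in the fixed, shared location described above rather than in a per-user cache.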

jypeter · 2022-08-02T09:36:54Z

Another data sample example from xoa

>>> import xoa

>>> xoa.show_data_samples()
gdp-6203641.csv hycom.gdp.u.nc hycom.gdp.v.nc hycom.gdp.h.nc croco.south-africa.surf.nc hycom.cfg croco.cfg gdp.cfg mercator.cfg argo.cfg croco.south-africa.zonal.nc croco.south-africa.meridional.nc ibi-argo-7900573.nc argo-7900573.nc

>>> xoa.get_data_sample('hycom.gdp.u.nc')
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples/hycom.gdp.u.nc'

> du -sh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples
1.1M    /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples

>ls -lh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples
total 1.1M
-rw-rw-r-- 2 jypeter lsce  92K Feb 25 09:56 argo-7900573.nc
-rw-rw-r-- 2 jypeter lsce  305 Feb 25 09:56 argo.cfg
-rw-rw-r-- 2 jypeter lsce  714 Feb 25 09:56 croco.cfg
-rw-rw-r-- 2 jypeter lsce  61K Feb 25 09:56 croco.south-africa.meridional.nc
-rw-rw-r-- 2 jypeter lsce 190K Feb 25 09:56 croco.south-africa.surf.nc
-rw-rw-r-- 2 jypeter lsce  61K Feb 25 09:56 croco.south-africa.zonal.nc
-rw-rw-r-- 2 jypeter lsce  43K Feb 25 09:56 gdp-6203641.csv
-rw-rw-r-- 2 jypeter lsce   73 Feb 25 09:56 gdp.cfg
-rw-rw-r-- 2 jypeter lsce  487 Feb 25 09:56 hycom.cfg
-rw-rw-r-- 2 jypeter lsce 174K Feb 25 09:56 hycom.gdp.h.nc
-rw-rw-r-- 2 jypeter lsce 173K Feb 25 09:56 hycom.gdp.u.nc
-rw-rw-r-- 2 jypeter lsce 173K Feb 25 09:56 hycom.gdp.v.nc
-rw-rw-r-- 2 jypeter lsce  71K Feb 25 09:56 ibi-argo-7900573.nc
-rw-rw-r-- 2 jypeter lsce  195 Feb 25 09:56 mercator.cfg
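The xoa pair above (one call to list the samples, one to turn a name into a full path) is easy to reproduce over any sample directory. The sketch is modeled on that usage only, not on xoa's code; the function names are hypothetical and the directory is passed explicitly.

```python
from pathlib import Path


def list_data_samples(samples_dir: str) -> list:
    """Return the sorted names of the sample files in ``samples_dir``."""
    return sorted(p.name for p in Path(samples_dir).iterdir() if p.is_file())


def get_data_sample(name: str, samples_dir: str) -> str:
    """Return the full path of ``name``, listing valid names if it is missing."""
    path = Path(samples_dir) / name
    if not path.is_file():
        available = ", ".join(list_data_samples(samples_dir))
        raise FileNotFoundError(f"{name!r} not found; available samples: {available}")
    return str(path)
```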

durack1 · 2022-08-10T00:02:05Z

@tomvothecoder was there a plan to have a test suite with just the kind of (few timesteps) data that @jypeter was describing? It seems that CDAT was using the sample_data subdir which enabled testing in the CI envs, similar to what iris appears to do (#277 (comment) above)

jypeter · 2022-08-12T08:35:14Z

Note: see example usage of vcs.sample_data + '/tas_mo.nc' in #310 (comment)

jypeter · 2023-12-14T16:00:03Z

I have added an Easy to use datasets section to my Python page, with test/tutorial datasets from several packages.

@tomvothecoder It seems that xarray uses xarray.tutorial.load_dataset. Maybe xcdat could have a similar xcdat.tutorial.load_dataset pointing to some useful sample CMIP6 data (and possibly the equivalent CMIP5 data, if somebody wants to make a CMIP5/CMIP6 comparison example)
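A possible shape for the suggested `xcdat.tutorial.load_dataset`, sketched under assumptions: the short names and file names in `TUTORIAL_FILES` are invented placeholders (not real CMIP files), and the opener is injected, so in practice one might pass `xarray.open_dataset` or `xcdat.open_dataset`. Keeping CMIP5 and CMIP6 entries side by side would make a comparison example a one-liner per dataset.

```python
from pathlib import Path
from typing import Any, Callable, Dict

# Placeholder registry: short names -> file names in a local sample directory.
# These entries are hypothetical; a real registry would list actual CMIP files.
TUTORIAL_FILES: Dict[str, str] = {
    "cmip6_tas_monthly": "tas_Amon_CMIP6_example.nc",
    "cmip5_tas_monthly": "tas_Amon_CMIP5_example.nc",
}


def load_dataset(name: str, data_dir: str, opener: Callable[..., Any], **kwargs: Any) -> Any:
    """Open a tutorial dataset by short name.

    ``opener`` is the function that actually reads the file (e.g.
    ``xarray.open_dataset``); extra keyword arguments are passed through.
    """
    try:
        filename = TUTORIAL_FILES[name]
    except KeyError:
        raise ValueError(
            f"unknown tutorial dataset {name!r}; choose from {sorted(TUTORIAL_FILES)}"
        ) from None
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{path} is missing; predownload it into {data_dir}")
    return opener(path, **kwargs)
```

Usage would then look like `ds6 = load_dataset("cmip6_tas_monthly", "/path/to/samples", xarray.open_dataset)`, with `/path/to/samples` standing in for wherever the files were predownloaded.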

jypeter added the type: docs Updates to documentation label Jul 22, 2022

tomvothecoder mentioned this issue Aug 17, 2022

[Doc]: Calculating Daily Climatology and Departures from Monthly Time Series needs to use daily input #303

Closed

jypeter mentioned this issue Sep 9, 2022

Mention (provide) some actual test files on the website cf-convention/cf-convention.github.io#257

Closed

JonathanGregory mentioned this issue Sep 9, 2022

Provision of netCDF files for the netCDF header examples shown in the Conventions? cf-convention/cf-conventions#348

Open

tomvothecoder added this to To do in Future Releases via automation Sep 16, 2022

tomvothecoder added this to To do in Documentation & User Engagement Sep 16, 2022

tomvothecoder removed this from To do in Future Releases Sep 16, 2022

tomvothecoder mentioned this issue Oct 19, 2022

[Doc]: Add general introduction and demo notebook for xcdat #372

Closed

tomvothecoder modified the milestone: Documentation & User Engagement Sep 27, 2023

tomvothecoder added the good-first-issue Good first issue for new contributors label Jun 5, 2024
