nocollier/intake-ilamb

ILAMB intake catalog

Intake is a lightweight set of Python tools for loading and sharing data in data science projects. This repository provides a YAML-style intake catalog of the reference data we use for Earth system model (ESM) benchmarking in the International Land Model Benchmarking (ILAMB) effort.

Usage

To use this catalog, first install intake. Nothing from this package needs to be installed: you can load the catalog in your Python script by pointing to the remote file on GitHub.

import intake
cat = intake.open_catalog("https://raw.githubusercontent.com/nocollier/intake-ilamb/main/ilamb.yaml")
print(cat)

You can list all of the data source entries with print(list(cat)), or you can treat the catalog as a dictionary. For example, in an interactive session such as IPython or Jupyter, type cat['gpp and then press the Tab key. Tab completion shows every source whose name starts with those characters, which makes it easy to search by variable name. Where possible, we have used the CMOR names of variables.

cat['gpp<TAB>
            gpp | FLUXCOM    
            gpp | FLUXNET2015
            gpp | WECANN     
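The keys shown above follow a "variable | source" pattern. Assuming that pattern holds for every entry, a small helper like the hypothetical group_by_variable below can do the same search in a script, without relying on tab completion. It uses only the key names, not any intake-specific API:

```python
from collections import defaultdict


def group_by_variable(keys):
    """Group catalog keys of the assumed form 'variable | source' by variable.

    Hypothetical helper: it relies only on the naming pattern seen in the
    catalog keys. Keys that do not contain ' | ' are skipped.
    """
    groups = defaultdict(list)
    for key in keys:
        variable, sep, source = key.partition(" | ")
        if not sep:
            continue  # key does not match the assumed pattern
        groups[variable.strip()].append(source.strip())
    # Sort the sources so the result does not depend on catalog order.
    return {variable: sorted(sources) for variable, sources in groups.items()}


# Usage, with cat opened as shown earlier:
# sources = group_by_variable(list(cat))
# print(sources["gpp"])
```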

Selecting the key for the dataset you want returns an intake data source object (the variable src below). Each source object carries instructions for automatically downloading its data when you call its read() method.

src = cat['gpp | FLUXCOM']
%time gpp = src.read()  # 9.84 [s]
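%time is an IPython/Jupyter magic. In a plain Python script, a sketch like the following measures the same wall-clock time with the standard library. timed_read is a hypothetical helper, not part of intake. Reading the same source twice is a simple way to see the local cache at work:

```python
import time


def timed_read(src):
    """Call src.read() and return (data, elapsed wall-clock seconds).

    Hypothetical helper: works with any object that has a read() method,
    such as an intake data source.
    """
    start = time.perf_counter()
    data = src.read()
    elapsed = time.perf_counter() - start
    return data, elapsed


# Usage, with cat opened as shown earlier:
# src = cat['gpp | FLUXCOM']
# gpp, first = timed_read(src)   # first read on a system downloads the data
# gpp, second = timed_read(src)  # later reads should come from the local cache
# print(f"first: {first:.2f} s, second: {second:.2f} s")
```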

The first time you read a dataset on a given system, the data is downloaded; the read() of the FLUXCOM source above took almost 10 seconds. On subsequent calls, intake manages a cache, so the data is read locally from your system and much faster. Each data source in this catalog returns an xarray Dataset that you can then use in your analysis scripts:

<xarray.Dataset>
Dimensions:      (time: 408, nb: 2, lat: 360, lon: 720)
Coordinates:
  * time         (time) object 1980-01-16 12:00:00 ... 2013-12-16 12:00:00
  * lat          (lat) float64 89.75 89.25 88.75 88.25 ... -88.75 -89.25 -89.75
  * lon          (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
Dimensions without coordinates: nb
Data variables:
    time_bounds  (time, nb) object 1980-01-01 00:00:00 ... 2014-01-01 00:00:00
    gpp          (time, lat, lon) float32 9.969e+36 9.969e+36 ... 9.969e+36
Attributes:
    title:         FLUXCOM (RS+METEO) Global Land Carbon Fluxes using CRUNCEP...
    version:       1
    institutions:  Department Biogeochemical Integration, Max Planck Institut...
    source:        Data generated by Artificial Neural Networks and forced wi...
    history:       \n2020-08-25: downloaded source from ftp://ftp.bgc-jena.mp...
    references:    \n@ARTICLE{Jung2017,\n  author = {Jung, M., M. Reichstein,...
    comments:      \ntime_period: 1980-01 through 2013-12; temporal_resolutio...
    convention:    CF-1.8

This lets you write analysis scripts that use ILAMB data and run anywhere with an internet connection, without having to set up and download the data separately.
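As a starting point for such a script, here is a minimal sketch that works on a Dataset like the FLUXCOM one printed above. The gpp values shown there, 9.969e+36, match the default netCDF fill value for 32-bit floats, so they are most likely missing-data markers rather than real fluxes. The sketch assumes they have not already been converted to NaN. It masks anything at or above a threshold (1e36, an assumption rather than a catalog setting), then computes a global mean for each time step, weighted by cos(latitude). This weighting is a common approximation of grid-cell area on a regular latitude/longitude grid. mask_fill and area_weighted_mean are hypothetical helpers:

```python
import numpy as np
import xarray as xr

# Assumption: values at or above this threshold are fill values
# (the netCDF default float fill value is about 9.969e36).
FILL_THRESHOLD = 1e36


def mask_fill(da: xr.DataArray, threshold: float = FILL_THRESHOLD) -> xr.DataArray:
    """Return a new DataArray with values >= threshold replaced by NaN."""
    return da.where(da < threshold)


def area_weighted_mean(da: xr.DataArray) -> xr.DataArray:
    """Average over lat and lon, weighting each latitude row by cos(latitude).

    NaN cells are skipped, and their weights are left out of the total.
    """
    weights = np.cos(np.deg2rad(da["lat"]))
    return da.weighted(weights).mean(dim=("lat", "lon"))


# Usage, with gpp being the Dataset returned by src.read():
# gpp_masked = mask_fill(gpp["gpp"])
# global_gpp = area_weighted_mean(gpp_masked)  # one value per time step
```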