Docker Image for indexing into a datacube

whatnick/datacube-index

Datacube Index

https://github.com/opendatacube/datacube-index/workflows/Lint%20and%20Test%20Code/badge.svg?branch=master

This is a collection of Python applications and a helper Docker image used to index data into a datacube using odc-tools.

The functionality is exposed in the form of various <storage backend>-to-dc utilities that accept URI/GLOB parameters and product name(s), and index the matching datasets into the default datacube. These utilities include:

  1. s3-to-dc : Index from S3 storage to a Datacube database.
  2. thredds-to-dc : Index from Thredds server to a Datacube database.
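Both utilities share the same shape: a URI or glob that selects dataset documents, plus one or more product names. The argparse sketch below mirrors only that shape as an illustration. `build_parser` and its argument names are hypothetical, and the real commands may name, order or group their arguments differently and accept further options, so check each command's `--help` before relying on it.

```python
import argparse


def build_parser(prog: str) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the '<storage backend>-to-dc' shape."""
    parser = argparse.ArgumentParser(
        prog=prog,
        description="Index dataset documents found at a URI/glob into a datacube.",
    )
    # A single URI or glob, e.g. an S3 prefix followed by a '*.yaml' pattern.
    parser.add_argument("uri", help="URI or glob selecting dataset YAML documents")
    # One or more product names the datasets should be indexed against.
    parser.add_argument("product", nargs="+", help="product name(s) to index into")
    return parser


if __name__ == "__main__":
    args = build_parser("s3-to-dc").parse_args()
    print(args.uri, args.product)
```

Quoting the glob on the command line (for example `'s3://example-bucket/path/*.yaml'`) stops the shell from trying to expand it locally before the utility sees it.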

It has code to perform the following steps:

  1. Crawl S3 to find datasets using s3-find and produce a generator.
  2. Crawl Thredds using Thredds Crawler with NCI-specific defaults (overridable).
  3. Index the dataset YAML documents found into the datacube using a generator/list equivalent of dc-index-from-tar, skipping the tar file.
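The crawl-then-index flow can be sketched with the standard library alone: a generator lazily yields matching document keys (standing in for an s3-find or Thredds crawl), and an indexing loop hands each parsed document to an `add` callback (standing in for the datacube index). This is a minimal sketch, not the odc-tools implementation: `find_documents`, `index_documents` and the `add` callback are hypothetical, and it assumes eo3-style documents that carry their product under `product.name`.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, Iterable, Iterator, List


def find_documents(keys: Iterable[str], pattern: str) -> Iterator[str]:
    """Lazily yield keys matching a glob, standing in for a crawl such as s3-find."""
    for key in keys:
        # fnmatchcase is case-sensitive on every platform, like S3 keys.
        if fnmatchcase(key, pattern):
            yield key


def index_documents(
    docs: Iterable[Dict],
    products: List[str],
    add: Callable[[Dict], None],
) -> Dict[str, int]:
    """Pass each document whose product is in `products` to `add`; count the rest."""
    stats = {"indexed": 0, "skipped": 0}
    for doc in docs:
        # Assumes eo3-style documents with the product name under product.name.
        product = doc.get("product", {}).get("name")
        if product in products:
            add(doc)
            stats["indexed"] += 1
        else:
            stats["skipped"] += 1
    return stats
```

Keeping both stages as generators or iterables means documents can be indexed as they are discovered, without first building the full list in memory. Note that `fnmatch`'s `*` also matches `/`, so `s3://bucket/*.yaml` matches nested keys too, unlike a path-aware glob.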

Usage in Production

Production deployments of OpenDataCube typically have follow-on steps after a new product is indexed, or after new datasets are indexed for an existing product. These steps are outlined below:

  1. Use OWS update ranges to update the layer extents for products in OWS-managed tables, in a separate container.
  2. Use Explorer summary generation to generate summaries.
  3. The three containers (indexing, OWS update ranges and Explorer summaries) are tied together by an Airflow DAG using a Kubernetes (K8S) Executor.
  4. Utilities from the three parts of the datacube application/library ecosystem are tied together by custom Python scripts.
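The ordering the DAG enforces (index first, then OWS update ranges, then Explorer summaries, with each step waiting for the previous one to succeed) can be sketched as a plain sequential runner. This is an illustration under stated assumptions, not the production DAG: `production_stages` and `run_in_order` are hypothetical, and while `datacube-ows-update` and `cubedash-gen` are the command names used by datacube-ows and Explorer, the arguments shown are unverified placeholders.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A stage is a (name, argv) pair; argv values below are illustrative placeholders.
Stage = Tuple[str, List[str]]


def production_stages(uri: str, product: str, layer: str) -> List[Stage]:
    """Hypothetical stage list in the order the DAG would run them."""
    return [
        ("index", ["s3-to-dc", uri, product]),
        ("ows-update-ranges", ["datacube-ows-update", layer]),  # args unverified
        ("explorer-summary", ["cubedash-gen", product]),  # args unverified
    ]


def run_in_order(stages: Sequence[Stage], run: Callable[[List[str]], int]) -> List[str]:
    """Run stages sequentially, stopping at the first non-zero exit code."""
    completed: List[str] = []
    for name, argv in stages:
        if run(argv) != 0:
            break  # downstream stages depend on this one, so stop here
        completed.append(name)
    return completed


def subprocess_runner(argv: List[str]) -> int:
    """Run a command locally and return its exit code."""
    return subprocess.run(argv, check=False).returncode
```

In Airflow each stage would instead be its own container task with the same upstream-to-downstream dependencies, so a failed indexing run never triggers stale extent or summary updates.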