Use pooch.os_cache and pkg_resources in datasets (#140)
With Pooch 0.7.0, the recommended way of loading the registry file is
with `pkg_resources` (see fatiando/pooch#120). It's also better to use
the default cache location so users can more easily clean up unused
files. Because this is system specific, add the
`harmonica.datasets.locate` function to return the cache folder
location.
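
As a rough illustration of the lookup order described above (not Pooch's actual implementation), the sketch below resolves a data folder from an optional environment variable, falls back to a per-user cache folder, and then appends a version subfolder such as `master`. The helper names `default_cache_dir` and `resolve_data_dir` are hypothetical, and the per-platform defaults are simplified assumptions; in the real code the locations come from `appdirs` through `pooch.os_cache`.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(project):
    """Hypothetical, simplified stand-in for a per-user OS cache folder."""
    if sys.platform == "win32":
        base = Path(os.environ.get("LOCALAPPDATA") or Path.home() / "AppData" / "Local")
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Caches"
    else:
        # Assumption: follow XDG_CACHE_HOME when set, else ~/.cache
        base = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    return base / project


def resolve_data_dir(project, env_var, version, environ=None):
    """Pick the env var override if set, else the default cache, then add the version."""
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    base = Path(override) if override else default_cache_dir(project)
    return base / version


if __name__ == "__main__":
    # Prints the folder this sketch would use on the current machine.
    print(resolve_data_dir("harmonica", "HARMONICA_DATA_DIR", "master"))
```

In Harmonica itself there is no need to reimplement any of this: `harmonica.datasets.locate()` reports the folder that Pooch actually resolved.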
leouieda authored and santisoler committed Jan 21, 2020
1 parent fd1f44d commit 6d5d049
Showing 10 changed files with 68 additions and 31 deletions.
10 changes: 6 additions & 4 deletions .azure-pipelines.yml
@@ -77,6 +77,7 @@ jobs:
  CONDA_REQUIREMENTS: requirements.txt
  CONDA_REQUIREMENTS_DEV: requirements-dev.txt
  CONDA_INSTALL_EXTRA: "codecov"
+ HARMONICA_DATA_DIR: "$(Agent.TempDirectory)/.harmonica/data"

  strategy:
  matrix:
@@ -116,8 +117,8 @@ jobs:
  # Copy the test data to the cache folder
  - bash: |
  set -x -e
- mkdir -p $HOME/.harmonica/data/master
- cp -r data/* $HOME/.harmonica/data/master
+ mkdir -p ${HARMONICA_DATA_DIR}/master
+ cp -r data/* ${HARMONICA_DATA_DIR}/master
  displayName: Copy test data to cache
  # Install the package
@@ -167,6 +168,7 @@ jobs:
  CONDA_REQUIREMENTS: requirements.txt
  CONDA_REQUIREMENTS_DEV: requirements-dev.txt
  CONDA_INSTALL_EXTRA: "codecov"
+ HARMONICA_DATA_DIR: "$(Agent.TempDirectory)/.harmonica/data"

  strategy:
  matrix:
@@ -200,8 +202,8 @@ jobs:
  # Copy the test data to the cache folder
  - bash: |
  set -x -e
- mkdir -p ~/.harmonica/data/master
- cp -r data/* ~/.harmonica/data/master
+ mkdir -p ${HARMONICA_DATA_DIR}/master
+ cp -r data/* ${HARMONICA_DATA_DIR}/master
  displayName: Copy test data to cache
  # Install the package that we want to test
5 changes: 3 additions & 2 deletions .travis.yml
@@ -23,6 +23,7 @@ env:
  # PyPI password for deploying releases (TWINE_PASSWORD)
  - secure: "ufJKNS+JGD3klJglfML+4gerCOntHlfLX8M1zb3wyf6QbOfqZncvijh5ChVSAsqMeqKJG+hHVUgRN0pPgOyhsCetgeS/QdYsehEZO+UZeni+xdaAZgG/pz0pzxDuIOOTu9CImKIn+hnMSo3cnoO9fASem1c77XvLHs6BcShYpUZTRElvDGcvWlU2aZMA2qO9rVBpRBgr8GK6uLdXqV6yzJznWlQVRSJmpVEVfdNr5cbtgy7gxf2IBL2TXEWzzqwcJc3/bkzGFCwqgJ3aouIeHeWuNJEW23BjfIj6Da7ibsC7cPwS0u96MbBTBNOVInlh6Zy7xQvJpzgYBPTE3P3+HMUF7wS6HVkGyn86kh0JMfTwUjSul9SxGDCSn1JWnH5Ya7S5rGekYPcxE+2gKAt3BHjPM8xIDo6fIPH8zWLXl7xQJDHe/TWc3GRUD2OB+y9fWy+xvuXz7DI41oSHMSAEw6Ob2x7dGPbiSGvWmK0Z6Nm1sBVP/3xGhfhuH48587cbVEU27MAIUCzyJYi2750z3LP5pDP4HGiJsiH/kZLTBTgjxKM0m6+A/quAlARUsiXyf5z9fAqed13EcB/0x8YqqGVHNC79+7eIhEWwPU1h1+NeiTqK2wznzvawdfDtFHtquLC0pqWNq+r76walp+wBG0+jGpK+D/orUoW/x0oVrGo="
  - TWINE_USERNAME=Leonardo.Uieda
+ - HARMONICA_DATA_DIR="$HOME/.harmonica/data"
  # The files with the listed requirements to be installed by conda
  - CONDA_REQUIREMENTS=requirements.txt
  - CONDA_REQUIREMENTS_DEV=requirements-dev.txt
@@ -49,8 +50,8 @@ matrix:
  # Setup the build environment
  before_install:
  # Copy sample data to the verde data dir to avoid downloading all the time
- - mkdir -p $HOME/.harmonica/data/master
- - cp -r data/* $HOME/.harmonica/data/master
+ - mkdir -p $HARMONICA_DATA_DIR/master
+ - cp -r data/* $HARMONICA_DATA_DIR/master
  # Get the Fatiando CI scripts
  - git clone --branch=1.1.1 --depth=1 https://github.com/fatiando/continuous-integration.git
  # Download and install miniconda and setup dependencies
21 changes: 9 additions & 12 deletions data/examples/README.txt
@@ -3,24 +3,21 @@
  Sample Data
  ===========

- Harmonica provides some sample data for testing through the :mod:`harmonica.datasets`
- module. The sample data are automatically downloaded from the `Github repository
- <https://github.com/fatiando/harmonica>`__ to a folder on your computer the first time
- you use them. After that, the data are loaded from this folder. The download is managed
- by the :mod:`pooch` package.
+ Harmonica provides some sample data for testing through the
+ :mod:`harmonica.datasets` module.


  Where is my data?
  -----------------

- The data files are downloaded to a folder ``~/.harmonica/data/`` by default. This is the
- *base data directory*. :mod:`pooch` will create a separate folder in the base directory
- for each version of Harmonica. For example, the base data dir for v0.1.0 is
- ``~/.harmonica/data/v0.1.0``. If you're using the latest development version from
- Github, the version is ``master``.
+ The sample data files are downloaded automatically by :mod:`pooch` the first
+ time you load them. The files are saved to the default cache location on your
+ operating system. The location varies depending on your system and
+ configuration. We provide the :func:`harmonica.datasets.locate` function if you
+ need to find the data storage location on your system.

- You can change the base data directory by setting the ``HARMONICA_DATA_DIR`` environment
- variable to a different path.
+ You can change the base data directory by setting the ``HARMONICA_DATA_DIR``
+ environment variable to a different path.


  Available datasets
1 change: 1 addition & 0 deletions doc/api/index.rst
@@ -70,6 +70,7 @@ Datasets
  .. autosummary::
  :toctree: generated/

+ datasets.locate
  datasets.fetch_gravity_earth
  datasets.fetch_geoid_earth
  datasets.fetch_topography_earth
2 changes: 1 addition & 1 deletion environment.yml
@@ -9,7 +9,7 @@ dependencies:
  - scipy
  - pandas
  - numba
- - pooch
+ - pooch>=0.7.0
  - verde
  - xarray
  # Development requirements
1 change: 1 addition & 0 deletions harmonica/datasets/__init__.py
@@ -1,5 +1,6 @@
  # pylint: disable=missing-docstring
  from .sample_data import (
+     locate,
      fetch_gravity_earth,
      fetch_topography_earth,
      fetch_britain_magnetic,
43 changes: 33 additions & 10 deletions harmonica/datasets/sample_data.py
@@ -1,22 +1,45 @@
  """
  Functions to load sample datasets used in the Harmonica docs.
  """
- import os
-
+ import pkg_resources
  import xarray as xr
  import pandas as pd
  import pooch

  from ..version import full_version

- POOCH = pooch.create(
-     path=["~", ".harmonica", "data"],
+ REGISTRY = pooch.create(
+     path=pooch.os_cache("harmonica"),
      base_url="https://github.com/fatiando/harmonica/raw/{version}/data/",
      version=full_version,
      version_dev="master",
      env="HARMONICA_DATA_DIR",
  )
- POOCH.load_registry(os.path.join(os.path.dirname(__file__), "registry.txt"))
+ with pkg_resources.resource_stream(
+     "harmonica.datasets", "registry.txt"
+ ) as registry_file:
+     REGISTRY.load_registry(registry_file)
+
+
+ def locate():
+     r"""
+     The absolute path to the sample data storage location on disk.
+
+     This is where the data are saved on your computer. The location is
+     dependent on the operating system. The folder locations are defined by the
+     ``appdirs`` package (see the `appdirs documentation
+     <https://github.com/ActiveState/appdirs>`__).
+
+     The location can be overwritten by the ``HARMONICA_DATA_DIR`` environment
+     variable to the desired destination.
+
+     Returns
+     -------
+     path : str
+         The local data storage location.
+
+     """
+     return str(REGISTRY.abspath)


  def fetch_geoid_earth():
@@ -40,7 +63,7 @@ def fetch_geoid_earth():
          longitude.

      """
-     fname = POOCH.fetch("geoid-earth-0.5deg.nc.xz", processor=pooch.Decompress())
+     fname = REGISTRY.fetch("geoid-earth-0.5deg.nc.xz", processor=pooch.Decompress())
      data = xr.open_dataset(fname, engine="scipy").astype("float64")
      return data

@@ -68,7 +91,7 @@ def fetch_gravity_earth():
          longitude.

      """
-     fname = POOCH.fetch("gravity-earth-0.5deg.nc.xz", processor=pooch.Decompress())
+     fname = REGISTRY.fetch("gravity-earth-0.5deg.nc.xz", processor=pooch.Decompress())
      # The heights are stored as ints and data as float32 to save space on the
      # data file. Cast them to float64 to avoid integer division errors.
      data = xr.open_dataset(fname, engine="scipy").astype("float64")
@@ -99,7 +122,7 @@ def fetch_topography_earth():
          geodetic latitude and longitude.

      """
-     fname = POOCH.fetch("etopo1-0.5deg.nc.xz", processor=pooch.Decompress())
+     fname = REGISTRY.fetch("etopo1-0.5deg.nc.xz", processor=pooch.Decompress())
      # The data are stored as int16 to save disk space. Cast them to floats to
      # avoid integer division problems when processing.
      data = xr.open_dataset(fname, engine="scipy").astype("float64")
@@ -136,7 +159,7 @@ def fetch_britain_magnetic():
      data : :class:`pandas.DataFrame`
          The magnetic anomaly data.
      """
-     return pd.read_csv(POOCH.fetch("britain-magnetic.csv.xz"), compression="xz")
+     return pd.read_csv(REGISTRY.fetch("britain-magnetic.csv.xz"), compression="xz")


  def fetch_south_africa_gravity():
@@ -166,6 +189,6 @@ def fetch_south_africa_gravity():
          The gravity data.

      """
-     fname = POOCH.fetch("south-africa-gravity.ast.xz")
+     fname = REGISTRY.fetch("south-africa-gravity.ast.xz")
      columns = ["latitude", "longitude", "elevation", "gravity"]
      return pd.read_csv(fname, sep=r"\s+", names=columns, compression="xz")
12 changes: 12 additions & 0 deletions harmonica/tests/test_sample_data.py
@@ -1,9 +1,12 @@
  """
  Test the sample data loading functions.
  """
+ import os
+
  import numpy.testing as npt

  from ..datasets.sample_data import (
+     locate,
      fetch_gravity_earth,
      fetch_geoid_earth,
      fetch_topography_earth,
@@ -12,6 +15,15 @@
  )


+ def test_datasets_locate():
+     "Make sure the data cache location has the right package name"
+     path = locate()
+     assert os.path.exists(path)
+     # This is the most we can check in a platform independent way without
+     # testing appdirs itself.
+     assert "harmonica" in path
+
+
  def test_geoid_earth():
      "Sanity checks for the loaded grid"
      grid = fetch_geoid_earth()
2 changes: 1 addition & 1 deletion requirements.txt
@@ -2,6 +2,6 @@ numpy
  scipy
  pandas
  numba
- pooch
+ pooch>=0.7.0
  xarray
  verde
2 changes: 1 addition & 1 deletion setup.py
@@ -45,7 +45,7 @@
      "scipy",
      "pandas",
      "numba",
-     "pooch",
+     "pooch>=0.7.0",
      "xarray",
      "verde",
  ]