
For other operating systems and Linux flavours, please run `pip install cartopy` before installing `sedac-gpw-parser` and make sure that `cartopy` is installed correctly.

To properly download and prepare the data you need to have `unzip` installed. This can be done in Ubuntu by typing `sudo apt-get install unzip`.

# Installation

This package only works with `python3`. To install just type:
```
python setup.py install
```
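
If you are installing from a clone of the repository, the full sequence might look roughly like this (a sketch; the repository URL is the one this package is hosted at):
```
git clone https://github.com/marcwie/sedac-gpw-parser.git
cd sedac-gpw-parser
python setup.py install
```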

`download-sedac-gpw-data.sh` downloads the necessary data. The script prompts for your EarthData login credentials. Note that the script temporarily writes your password in plain text to `~/.netrc` in your home folder. However, the file is removed right after the script successfully exits or in case of a keyboard interrupt.
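
For reference, credentials in `~/.netrc` follow the standard `netrc` format, which for EarthData typically looks roughly like this (placeholder values shown; the exact entry written by the script may differ):
```
machine urs.earthdata.nasa.gov
    login YOUR_EARTHDATA_USERNAME
    password YOUR_EARTHDATA_PASSWORD
```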

**Note**: In some cases `download-sedac-gpw-data.sh` has proven to be error-prone. See [below](#known-issues) for how to retrieve the input files manually.

`python -m "sedac_gpw_parser.run"` prepares the data for later use for each of the 245 countries present in the dataset. For each country it creates three files:

3. If you want to work with the population data, e.g. for further analysis and evaluation, you can get a 2d `numpy` array of the data and the ranges of covered latitudes and longitudes with the following snippet (a small analysis sketch is also shown after this list):
```python
from sedac_gpw_parser import population
pop = population.Population(country_id=250)
population_array = pop.population_array()
latitudes = pop.latitude_range()
longitudes = pop.longitude_range()
```

4. Note that `country_id=250` in the above example returns the data for *France*. If you want to know the `id` of a certain country you can use
```python
from sedac_gpw_parser import utils
utils.id_lookup("france")
```
which generates the output:
```
France : 250
```
You can also do more fuzzy searches:
```python
from sedac_gpw_parser import utils
utils.id_lookup("unit")
```
which gives you:
```
United Arab Emirates : 784
United Kingdom of Great Britain and Northern Ireland : 826
United Republic of Tanzania : 834
United States of America : 840
United States Virgin Islands : 850
United States Minor Outlying Islands : 908
```

5. If you want to plot the data for a specific country you can use the following snippet:
```python
from sedac_gpw_parser import plot
plt = plot.Plot(country_id=250)
plt.plot()
```
You can use `plt.plot(show=True)` instead of `plt.plot()` if you want to display the figure in a Jupyter notebook.
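
As a small follow-up to step 3, the returned arrays can be used directly for simple analyses. A minimal sketch (assuming the returned objects are `numpy` arrays or can be converted to them, and that no-data cells are encoded as non-positive values, which may differ in practice):
```python
import numpy as np

from sedac_gpw_parser import population

pop = population.Population(country_id=250)
population_array = np.asarray(pop.population_array())
latitudes = np.asarray(pop.latitude_range())
longitudes = np.asarray(pop.longitude_range())

# Total population, counting only cells with positive values
# (assumption: no-data cells are encoded as non-positive values).
total = population_array[population_array > 0].sum()

# Extent of the covered grid.
print("Total population:", total)
print("Latitude range:", latitudes.min(), "to", latitudes.max())
print("Longitude range:", longitudes.min(), "to", longitudes.max())
```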

# Known issues

1. The script `download-sedac-gpw-data.sh` has proven to be error-prone on some systems. Instead of using the script you can prepare your working directory manually, like so:
```
mkdir workdir
```
Then open your browser and log in at https://urs.earthdata.nasa.gov/. Keep the browser open and stay logged in.
Follow the two links below to download the necessary input files:

- https://sedac.ciesin.columbia.edu/downloads/data/gpw-v4/gpw-v4-population-count-rev11/gpw-v4-population-count-rev11_2020_30_sec_asc.zip
- https://sedac.ciesin.columbia.edu/downloads/data/gpw-v4/gpw-v4-national-identifier-grid-rev11/gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip

Make sure to download both files into your `workdir`.

You should then be able to `cd` into your `workdir` and run `download-sedac-gpw-data.sh` to extract the files into the required structure.
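
In other words, something along these lines (assuming the script is available on your `PATH` after installation; otherwise call it via its full path):
```
cd workdir
download-sedac-gpw-data.sh
```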

You can also extract both `.zip`-files manually and confirm that the extracted file structure looks like so:
```
workdir/
├── gpw-v4-national-identifier-grid-rev11_30_sec_asc
│   ├── gpw_v4_national_identifier_grid_rev11_30_sec_1.asc
│   ├── gpw_v4_national_identifier_grid_rev11_30_sec_1.prj
│   ├── ...
│   └── gpw_v4_national_identifier_grid_rev11_lookup.txt
├── gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip
├── gpw-v4-population-count-rev11_2020_30_sec_asc
│   ├── gpw_v4_population_count_rev11_2020_30_sec_1.asc
│   ├── gpw_v4_population_count_rev11_2020_30_sec_1.prj
│   └── ...
└── gpw-v4-population-count-rev11_2020_30_sec_asc.zip
```
Most tools for unarchiving create such a directory structure automatically by extracting files into a folder with the same name as the `.zip`-archive.
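
For example, the archives can also be extracted from the command line with `unzip` (assuming both `.zip` files already sit in `workdir`; the `-d` flag is only needed if the archives do not already contain a top-level folder of the same name):
```
cd workdir
unzip gpw-v4-population-count-rev11_2020_30_sec_asc.zip -d gpw-v4-population-count-rev11_2020_30_sec_asc
unzip gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip -d gpw-v4-national-identifier-grid-rev11_30_sec_asc
```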

# Design principle & objectives
- Be as light as possible on RAM, since loading the entire dataset into RAM is expected to cause
trouble on high-performance clusters (which usually have very little memory) and on older hardware

```
4 5709 4875,4893
4 5710 4876,4892
```

The first line is a header. In each following line:

- The first entry is the id of the corresponding input file (between 1 and 8)
- The second entry is the line number that contains relevant data (starting from 0 after the file header)
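
A minimal sketch of how one such data line could be parsed (a hypothetical helper, not part of the package; the comma-separated third field is kept as a plain list of integers since its exact meaning is not spelled out here):
```python
def parse_index_line(line):
    """Parse one data line of the per-country index file (hypothetical helper)."""
    file_id, row, columns = line.split()
    return int(file_id), int(row), [int(c) for c in columns.split(",")]


print(parse_index_line("4 5709 4875,4893"))
# -> (4, 5709, [4875, 4893])
```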