
For other operating systems and Linux flavours, please run `pip install cartopy` before installing `sedac-gpw-parser` and make sure that `cartopy` is installed correctly.

To properly download and prepare the data you need to have `unzip` installed. This can be done in Ubuntu by typing `sudo apt-get install unzip`.

# Installation

This package only works with `python3`. To install just type:
```
python setup.py install
```
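
If you are installing from a clone of the repository, the full sequence might look roughly like this (a sketch; the repository URL is the one this package is hosted at):
```
git clone https://github.com/marcwie/sedac-gpw-parser.git
cd sedac-gpw-parser
python setup.py install
```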

`download-sedac-gpw-data.sh` downloads the necessary data. The script prompts for your EarthData login credentials. Note that the script temporarily writes your password in plain text to `~/.netrc` in your home folder. However, the file is removed right after the script successfully exits or in case of a keyboard interrupt.
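
For reference, credentials in `~/.netrc` follow the standard `netrc` format, which for EarthData typically looks roughly like this (placeholder values shown; the exact entry written by the script may differ):
```
machine urs.earthdata.nasa.gov
    login YOUR_EARTHDATA_USERNAME
    password YOUR_EARTHDATA_PASSWORD
```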

**Note**: In some cases `download-sedac-gpw-data.sh` has proven to be error-prone. See [below](#known-issues) for how to retrieve the input files manually.

`python -m "sedac_gpw_parser.run"` prepares the data for later use for each of the 245 countries present in the dataset. For each country it creates three files:

3. If you want to work with the population data, e.g. for further analysis and evaluation, you can get a 2d `numpy` array of the data and the ranges of covered latitudes and longitudes with the following snippet (a small analysis sketch is also shown after this list):
```python
from sedac_gpw_parser import population
pop = population.Population(country_id=250)
population_array = pop.population_array()
latitudes = pop.latitude_range()
longitudes = pop.longitude_range()
```

4. Note that `country_id=250` in the above example returns the data for *France*. If you want to know the `id` of a certain country you can use
```python
from sedac_gpw_parser import utils
utils.id_lookup("france")
```
which generates the output:
```
France : 250
```
You can also do more fuzzy searches:
```python
from sedac_gpw_parser import utils
utils.id_lookup("unit")
```
which gives you:
```
United Arab Emirates : 784
United Kingdom of Great Britain and Northern Ireland : 826
United Republic of Tanzania : 834
United States of America : 840
United States Virgin Islands : 850
United States Minor Outlying Islands : 908
```

5. If you want to plot the data for a specific country you can use the following snippet:
```python
from sedac_gpw_parser import plot
plt = plot.Plot(country_id=250)
plt.plot()
```
You can use `plt.plot(show=True)` instead of `plt.plot()` if you want to display the figure in a Jupyter notebook.
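
As a small follow-up to step 3, the returned arrays can be used directly for simple analyses. A minimal sketch (assuming the returned objects are `numpy` arrays or can be converted to them, and that no-data cells are encoded as non-positive values, which may differ in practice):
```python
import numpy as np

from sedac_gpw_parser import population

pop = population.Population(country_id=250)
population_array = np.asarray(pop.population_array())
latitudes = np.asarray(pop.latitude_range())
longitudes = np.asarray(pop.longitude_range())

# Total population, counting only cells with positive values
# (assumption: no-data cells are encoded as non-positive values).
total = population_array[population_array > 0].sum()

# Extent of the covered grid.
print("Total population:", total)
print("Latitude range:", latitudes.min(), "to", latitudes.max())
print("Longitude range:", longitudes.min(), "to", longitudes.max())
```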

# Known issues

1. The script `download-sedac-gpw-data.sh` has proven to be error-prone on some systems. Instead of using the script you can prepare your working directory manually, like so:
```
mkdir workdir
```
Then open your browser and log in at https://urs.earthdata.nasa.gov/. Keep the browser open and stay logged in.
Follow the two links below to download the necessary input files:

- https://sedac.ciesin.columbia.edu/downloads/data/gpw-v4/gpw-v4-population-count-rev11/gpw-v4-population-count-rev11_2020_30_sec_asc.zip
- https://sedac.ciesin.columbia.edu/downloads/data/gpw-v4/gpw-v4-national-identifier-grid-rev11/gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip

Make sure to download both files into your `workdir`.

You should then be able to `cd` into your `workdir` and run `download-sedac-gpw-data.sh` to extract the files into the required structure.
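
In other words, something along these lines (assuming the script is available on your `PATH` after installation; otherwise call it via its full path):
```
cd workdir
download-sedac-gpw-data.sh
```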

You can also extract both `.zip`-files manually and confirm that the extracted file structure looks like so:
```
workdir/
├── gpw-v4-national-identifier-grid-rev11_30_sec_asc
│   ├── gpw_v4_national_identifier_grid_rev11_30_sec_1.asc
│   ├── gpw_v4_national_identifier_grid_rev11_30_sec_1.prj
│   ├── ...
│   └── gpw_v4_national_identifier_grid_rev11_lookup.txt
├── gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip
├── gpw-v4-population-count-rev11_2020_30_sec_asc
│   ├── gpw_v4_population_count_rev11_2020_30_sec_1.asc
│   ├── gpw_v4_population_count_rev11_2020_30_sec_1.prj
│   └── ...
└── gpw-v4-population-count-rev11_2020_30_sec_asc.zip
```
Most tools for unarchiving create such a directory structure automatically by extracting files into a folder with the same name as the `.zip`-archive.
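
For example, the archives can also be extracted from the command line with `unzip` (assuming both `.zip` files already sit in `workdir`; the `-d` flag is only needed if the archives do not already contain a top-level folder of the same name):
```
cd workdir
unzip gpw-v4-population-count-rev11_2020_30_sec_asc.zip -d gpw-v4-population-count-rev11_2020_30_sec_asc
unzip gpw-v4-national-identifier-grid-rev11_30_sec_asc.zip -d gpw-v4-national-identifier-grid-rev11_30_sec_asc
```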

# Design principle & objectives
- Be as light as possible on RAM, since loading the entire dataset into RAM is expected to cause
trouble on high-performance clusters (which usually have very little memory) and on older hardware

```
4 5709 4875,4893
4 5710 4876,4892
```

The first line is a header. In each following line:

- The first entry is the id of the corresponding input file (between 1 and 8)
- The second entry is the line number that contains relevant data (starting from 0 after the file header)
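
A minimal sketch of how one such data line could be parsed (a hypothetical helper, not part of the package; the comma-separated third field is kept as a plain list of integers since its exact meaning is not spelled out here):
```python
def parse_index_line(line):
    """Parse one data line of the per-country index file (hypothetical helper)."""
    file_id, row, columns = line.split()
    return int(file_id), int(row), [int(c) for c in columns.split(",")]


print(parse_index_line("4 5709 4875,4893"))
# -> (4, 5709, [4875, 4893])
```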