A Django app to extract, refine and publish Risk Management Plan (RMP) data collected by the U.S. Environmental Protection Agency.
Requires PostgreSQL, because our ETL process relies heavily on django-postgres-copy.
Python-related dependencies for this project are managed via pipenv.
Below are the steps to set up a local server on a Mac. These instructions have been tested on the latest releases of macOS Mojave (10.14), High Sierra (10.13) and Sierra (10.12).
Open your terminal application, and type in each of these commands in the order specified.
Xcode is a large suite of software development tools and libraries, provided by Apple. We only need some of these tools (e.g., the GCC compiler), which are included in a subset of Xcode called the Command Line Tools:
xcode-select --install
You'll then see a dialog asking whether you want to install the tools.
Select "Install", then chill for a few minutes.
Homebrew is an unofficial package manager for Macs. It helps us install and configure software that you can't find on the App Store.
You may already have it installed. Let's check by updating to the latest version:
brew update
If you get something like this, then Homebrew is already installed:
Updated Homebrew from bb038c7048 to ff3cede96f.
Updated 2 taps (homebrew/core, homebrew/cask).
==> New Formulae
i2pd opensubdiv tdlib
==> Updated Formulae
cgal ✔ go logstash
cmake ✔ godep mariadb
But if you get this:
brew: command not found
Then you need to install it like this:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
You will see a prompt that looks like this:
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following new directories will be created:
/usr/local/bin
/usr/local/etc
/usr/local/include
/usr/local/lib
/usr/local/sbin
/usr/local/share
/usr/local/var
/usr/local/opt
/usr/local/share/zsh
/usr/local/share/zsh/site-functions
/usr/local/var/homebrew
/usr/local/var/homebrew/linked
/usr/local/Cellar
/usr/local/Caskroom
/usr/local/Homebrew
/usr/local/Frameworks
Press RETURN to continue or any other key to abort
So then press RETURN, and enter your password for your user account on your Mac.
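Once the installer finishes, you can confirm that your shell can find Homebrew. This is only a sketch: have_command is a hypothetical helper name, not something Homebrew provides.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command brew; then
    echo "Homebrew found at $(command -v brew)"
    brew --version
else
    echo "Homebrew is not on your PATH yet"
fi
```

If you see the second message right after installing, try opening a new terminal window before re-running the installer.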
pyenv helps you manage different versions of Python running on the same machine.
We can install it with Homebrew:
brew install pyenv
The default Unix shell for macOS is bash. We can confirm that this default is still intact:
echo "$SHELL"
And here is what you should see:
/bin/bash
We need to add a few lines of code to a file named .bash_profile, which is a configuration file that runs whenever a user starts their shell environment.
First, we need to set an environment variable, which is a value stored in your shell environment that can be used by software running within that environment. The specific environment variable we need to set is PYENV_ROOT, which should point to the directory where pyenv stores its data:
echo -e 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
Then we run the command to initialize pyenv at the end of the profile, as directed by pyenv's docs:
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bash_profile
In order for this change to take effect, restart your shell. Do this by closing your current Terminal application window and opening a new one.
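One thing to watch for: `>>` appends every time it runs, so repeating the two echo commands above leaves duplicate lines in .bash_profile. Here is a minimal sketch of a guard against that. The append_once helper is hypothetical (it is not part of pyenv), and the demo writes to a scratch file rather than your real profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrate on a scratch file instead of the real ~/.bash_profile.
profile="$(mktemp)"
append_once 'export PYENV_ROOT="$HOME/.pyenv"' "$profile"
append_once 'export PYENV_ROOT="$HOME/.pyenv"' "$profile"

# Count the lines that mention PYENV_ROOT; the second call added nothing.
grep -c 'PYENV_ROOT' "$profile"
```

To use it for real, you would pass "$HOME/.bash_profile" as the second argument.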
Now we install five additional dependencies for Python, recommended by pyenv, plus mdbtools, another tool we need for extracting the RMP data.
brew install openssl readline sqlite3 xz zlib mdbtools
One of these libraries, zlib, requires a couple of additional steps, per Homebrew's instructions:
export LDFLAGS="-L/usr/local/opt/zlib/lib"
And:
export CPPFLAGS="-I/usr/local/opt/zlib/include"
PostgreSQL is an open-source relational database management system, which is required for this project.
brew install postgresql
Then, we use a shortcut provided by Homebrew for starting PostgreSQL.
brew services start postgresql
We have to create a database for our user profile:
createdb `whoami`
Then create a superuser named "postgres", which is the typical configuration.
createuser -s postgres
pipenv has recently gained a lot of traction as a tool to help Python developers manage workflows related to virtual environments, package installation and dependency management. As such, it is now the tool recommended by python.org for managing application dependencies.
brew install pipenv
This will create a local copy of the project directory in your present working directory.
git clone https://github.com/J4502-FS18/django-rmp-data.git
Navigate into the project folder:
cd django-rmp-data/
Similar to how we set an environment variable for our shell environment, we need to set a few environment variables particular to our project environment. These include secrets, such as database connection credentials, which we store in a .env file in the project directory.
The rules for generating this .env file are already defined in the Makefile in this repo. So you just need to run one command:
make env
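If you want to confirm what make env produced without printing any secrets to your terminal, you can list just the variable names. This sketch assumes the file uses the usual NAME=value layout, one per line. The env_keys helper is hypothetical and not part of this repo.

```shell
# Hypothetical helper: print the variable names defined in a .env-style
# file, skipping comments and blank lines, without printing the values.
env_keys() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | cut -d= -f1
}

if [ -f .env ]; then
    env_keys .env
else
    echo "No .env file here; run make env from the project folder first"
fi
```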
Then use pipenv to set up your virtual environment and install all necessary dependencies (including the correct version of Python and Django):
pipenv install
Unless you already have Python 3.6 installed, you will get a prompt like this:
Warning: Python 3.6 was not found on your system...
Would you like us to install CPython 3.6.6 with pyenv [Y/N]:
Then you type Y and hit enter.
If this doesn't work, then fall back to installing the necessary version of Python:
pyenv install 3.6.6
After all of the project dependencies are installed, you can activate your virtual environment:
pipenv shell
We need to create a database in our local PostgreSQL cluster:
createdb rmp
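Note that createdb reports an error if a database with that name already exists. If you are not sure whether you created it earlier, here is a rough way to check first. The db_listed helper is hypothetical, and the sketch assumes psql can connect to your local cluster as your own user.

```shell
# Hypothetical helper: read `psql -lqt` output on stdin and succeed if the
# named database is listed.
db_listed() {
    # The database name is the first |-separated column, padded with spaces.
    cut -d '|' -f 1 | tr -d ' ' | grep -qxF -- "$1"
}

if command -v psql >/dev/null 2>&1; then
    # -l lists databases, -q is quiet, -t prints rows only (no headers).
    if psql -lqt 2>/dev/null | db_listed rmp; then
        echo "Database rmp already exists"
    else
        echo "Database rmp not found; run: createdb rmp"
    fi
else
    echo "psql not found; is PostgreSQL installed?"
fi
```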
Then create all of the database tables:
python manage.py migrate
First, download sample data that we've made available for this project:
curl --request GET --url 'https://s3.us-east-2.amazonaws.com/rmp-sample-data/rmp.zip' > data/rmp.zip
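Because the curl command writes whatever the server sends back into data/rmp.zip, a failed download can leave an error message in that file instead of a zip archive. Here is a small sketch for checking the file before you go further. The archive_ok helper is hypothetical; it relies on unzip's built-in test mode (-t).

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty and
# passes unzip's integrity test.
archive_ok() {
    [ -s "$1" ] && unzip -tq "$1" >/dev/null 2>&1
}

if archive_ok data/rmp.zip; then
    echo "data/rmp.zip looks intact"
else
    echo "data/rmp.zip is missing or damaged; try the download again"
fi
```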
Then unzip the download:
unzip data/rmp.zip -d data/
Then load it into your local instance:
python manage.py loadrmpdata
At long last, we are ready to start the Django server:
python manage.py runserver
Because your team has forked our repository, you will occasionally need to catch up to the latest version of our source code.
To do this, you'll add a new "remote" to your local copy of the repo. A remote is just a URL pointing to a git repo. Your cloned fork already has one remote on it called "origin", which points to the location of your fork of our repo on GitHub.
We're going to add a new remote called "upstream" (since it's upstream of your fork's history), which points to our original repo. Here's how we do that:
git remote add upstream https://github.com/rji-futures-lab/django-rmp-data.git
You'll only need to run the above command once. Then, whenever you need to get our latest changes, you can pull them down like this:
git pull upstream master
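If you can't remember whether you already added the upstream remote, check before adding it again, since git remote add fails when the name is taken. A minimal sketch, where has_remote is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the repo at path $1 has a remote named $2.
has_remote() {
    git -C "$1" remote 2>/dev/null | grep -qxF -- "$2"
}

if has_remote . upstream; then
    echo "upstream is already configured:"
    git remote get-url upstream
else
    echo "No upstream remote yet; add it with the git remote add command above"
fi
```

Run it from inside the project folder, so that "." refers to your clone.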
The new changes you've pulled down might include changes to our data models, which also need to be propagated to the database in your local PostgreSQL cluster. This is called a database migration in Django parlance.
You should only ever have to run this single command:
python manage.py migrate
It's safe to migrate even if no new migrations have been added.
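If you'd like to see whether a pull brought in anything to migrate, Django's showmigrations command marks applied migrations with [X] and pending ones with [ ]. The sketch below counts the pending ones. The count_unapplied helper is hypothetical, and the command assumes you are in the project folder with your pipenv shell active.

```shell
# Hypothetical helper: read `showmigrations` output on stdin and print how
# many migrations are still unapplied (marked "[ ]" rather than "[X]").
count_unapplied() {
    grep -cF '[ ]'
}

if [ -f manage.py ]; then
    python manage.py showmigrations | count_unapplied
else
    echo "Run this from the project folder, inside pipenv shell"
fi
```

A count of 0 means your database is already up to date.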