Forked from https://github.com/BuzzFeedNews/2015-08-immigrant-detention
This repository contains the data and code supporting several passages in the BuzzFeed News article, "Vast Disparities By Nationality In Immigration Jailings," published August 25, 2015.
- Run
pip install -r requirements.txt
to install the necessary packages (recommended to use a virtual environment) - Because the raw data occupies 5.3 gigabytes of storage, this repository contains a compressed version of it. To decompress the data, run
make data
from this repository's root directory.
- After setup above, run each cell of the python notebook (it will take some time given the large dataset). This saves a file called
custody_by_nationality_year.csv
in thenotebooks/
directory - Run
python analysis_by_year.py
. This reads the CSV saved above into a Pandas dataframe and does further analysis on the data. This saves another CSV,region_year_breakdown.csv
, that has the final analysis results that was used back in 2015. Note: I checked columns from the previous results and the results from this repo and they are the same.
- No longer remove criminal cases (removed the filter
(cases_with_first_proceeding["CASE_TYPE"] == "RMV")
when creating theselected_cases
dataframe) - No longer remove unaccompanied children (removed the filter
(cases_with_first_proceeding["CASEPRIORITY_CODE"] != "UC")
when creating theselected_cases
dataframe) - Include all countries (not just those with fewer than 20K cases between 2003 and 2014 like in original analysis)
- They only look at cases with criminal charges. I keep those in there (removed the line
non_crim_selected_cases = selected_cases[~selected_cases["has_criminal_charge"]].copy()
) - Removed the code for the logistic regression, not used here
- Renamed columns in the output file to be more clear
- Grouped by year
- Further analysis done in
analysis_by_year.py
year
: the year that the hearing date happenedregion
: the region of the world, mapped from the countrytotal
: total number of cases for that region/yeardetained
: total number of detained cases for that region/yearnc
: total cases that are Not detained, criminal charge for that region/yeardc
: total cases that are Detained, criminal charge for that region/yearrc
: total cases that are Detained then released, criminal charge for that region/yearnn
: total cases that are Not detained, no criminal charge for that region/yeardn
: total cases that are Detained, no criminal charge for that region/yearrn
: total cases that are Detained then released, no criminal charge for that region/yeartotal_year
: total cases that year across all regions that yeardetained_year
: total number of detained cases across all regions that yearpct_criminal_detained
: % of criminal cases that are detainedpct_noncriminal_detained
: % of non-criminal cases that are detainedpct_detained
: % of cases that were detained for that region/yearpct_all_proceedings
: % of all cases for that year that were attributed to that regionpct_all_detained
: % of all detained cases for that year that were attributed to that region
The analyses below use data from the Department of Justice's "CASE" database, maintained by the Executive Office of Immigration Review (EOIR). The database, which tracks immigration proceedings, was obtained by BuzzFeed News through a Freedom of Information Request. The data cover all cases whose records were created or updated between Jan. 1, 2000 and Jan. 15, 2015.
For a code-free overview, read this general methodology.