A tool for collating immigration data from the Transactional Records Access Clearinghouse (TRAC)

josephburkhart/TRAC-Collation

TRAC-Collation

This repository contains a tool for collating data published by the Transactional Records Access Clearinghouse (TRAC) in their immigration toolkit.

Requirements

  • selenium 4.17.0 (earlier versions might work, but this is not guaranteed)
  • pandas 2.2.0 (earlier versions will probably work)
  • tqdm
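To confirm that these packages are installed, and whether any are older than the versions listed above, a small standard-library check like the one below can be run inside your environment. It is a sketch, not part of the repository: the `TESTED_VERSIONS` mapping simply copies the list above, and the version comparison is deliberately rough (it only looks at the leading numeric parts).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Requirements list. tqdm has no stated version,
# so it is only checked for presence.
TESTED_VERSIONS = {"selenium": "4.17.0", "pandas": "2.2.0", "tqdm": None}


def version_tuple(text):
    """Turn a version string such as '2.2.0rc1' into (2, 2, 0) for rough comparison."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(tested=TESTED_VERSIONS):
    """Return {package: status}, where status is the installed version or a warning."""
    report = {}
    for package, tested_version in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "missing"
            continue
        if tested_version and version_tuple(installed) < version_tuple(tested_version):
            report[package] = f"{installed} (older than tested {tested_version})"
        else:
            report[package] = installed
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

An "older than tested" result is not necessarily a problem (see the notes on each package above); it only flags where you are outside the versions the README lists.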

Usage

  1. Set up an environment with pandas, selenium, and tqdm (for conda instructions, see here).
  2. Download or locate the webdriver for your browser of choice; currently Firefox, Chrome, Edge, and Safari are supported. Add the webdriver's path to your environment variables (for Firefox, see here for additional instructions).
  3. Clone the repository, or just download collate.py.
  4. Navigate to the TRAC webpage that you want to collate data from (to see if your tool is supported, check below). Note the URL and the names of the axes you want to collate.
  5. collate.py can be run from an IDE or the command line:
  • To run in an IDE, open collate.py, navigate to STANDALONE_PARAMS, and set the following values:
    • browser: the name of the browser you downloaded the webdriver for.
    • url: the address of the TRAC webpage that you want to collate data from, including the https:// prefix (for example, https://trac.syr.edu/phptools/immigration/mpp4/).
    • filename: the name of the HDF file (including the .hdf extension) you want the collated data to be saved in. Currently, only HDF file output is supported.
    • axes: the names of the three data axes you want to collate. In the final output dataset, values from the first two axes will be used as hierarchical indices, while values from the third will be used as columns. Support for more than three axes might be added later.
  • To run from the command line, ensure that your conda environment is active and that collate.py is in your current directory. There are three ways to run collate.py from the command line:
    • python collate.py runs the script with the standalone parameters.
    • python collate.py <options> runs the script with options. The user will then be prompted for the arguments individually. Options are:
      • --browser=<name>: name of the browser to use. Valid names are Firefox, Chrome, Edge, and Safari.
      • --headless: use the browser in headless mode. (This option is not required.)
      • -h or --help: show usage details. (This option is not required.)
    • python collate.py <options> <arguments> runs the script with options and arguments. Arguments are:
      • url: full address of the TRAC webpage
      • file: name or full path of the output file. Equivalent to filename in STANDALONE_PARAMS.
      • axes: comma-separated list of the names of the axes of interest. If any names include spaces, the whole list must be enclosed in double quotes.
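To make the axes parameter and the output layout concrete, here is a minimal sketch assuming the collated values end up in a pandas DataFrame (pandas is a listed requirement, but this README does not describe collate.py's internals). The helpers `parse_axes` and `to_collated_frame`, the placeholder axis names, and the counts are all hypothetical; the sketch only reproduces the shape described above, with the first two axes as a hierarchical (MultiIndex) row index and the third axis as columns.

```python
import pandas as pd


def parse_axes(text):
    """Split a comma-separated axes string into three names (hypothetical helper).

    Trims whitespace around each name and requires exactly three names,
    matching the current three-axis limit.
    """
    names = [name.strip() for name in text.split(",")]
    if len(names) != 3 or not all(names):
        raise ValueError(f"expected three axis names, got {names!r}")
    return names


def to_collated_frame(records, axes):
    """Reshape long-format records into the layout described for the output.

    records: iterable of (value_1, value_2, value_3, count) tuples, one per cell.
    Rows are indexed by (axis 1, axis 2); each value of axis 3 becomes a column.
    Combinations with no record show up as NaN.
    """
    first, second, third = axes
    long = pd.DataFrame(records, columns=[first, second, third, "count"])
    return long.pivot(index=[first, second], columns=third, values="count")


# Placeholder axis names and made-up counts, purely to show the shape.
axes = parse_axes("Axis A, Axis B, Axis C")
records = [
    ("a1", "b1", "c1", 5), ("a1", "b1", "c2", 3),
    ("a1", "b2", "c1", 2), ("a2", "b1", "c2", 7),
]
frame = to_collated_frame(records, axes)
print(frame)
```

On the command line, the equivalent axes argument would be passed as a single quoted string (e.g. `"Axis A,Axis B,Axis C"`) because the names contain spaces. To inspect a file that collate.py actually produced, `pandas.read_hdf` is the natural starting point; note that pandas' HDF5 support depends on the PyTables (`tables`) package, and the key the data is stored under is not documented here.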

Note: even when collate.py is used with supported TRAC tools, it may occasionally throw StaleElementReferenceException or NoSuchElementException when the DOM changes unexpectedly or an element is slow to load. The code has been structured to greatly limit such problems, but if you are still having trouble, try the following:

  • Try re-running on a faster internet connection.
  • Try re-running at a time when the TRAC servers are likely to have a low load (e.g., weekends, weekday evenings).
  • Try reordering the names in the axes parameter so that the third axis is the one with the most values. This reduces both execution time and the number of interaction events (i.e., clicks and waits), and therefore the number of opportunities for an element to fail to load.
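If you adapt collate.py or drive a TRAC page with your own Selenium code, a common generic way to absorb these intermittent errors is to retry the failing interaction a few times with a short, growing wait. The sketch below is that generic pattern, not necessarily how collate.py itself is structured; `retry` is a hypothetical helper, and `FlakyError` stands in for the Selenium exceptions so the example runs without a browser.

```python
import time


def retry(action, exceptions, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, retrying on the given exception types.

    Re-raises the last exception once all attempts are used up. The wait grows
    linearly (delay, 2*delay, ...) to give slow pages more time to settle.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay * attempt)


# With Selenium, you would pass the real exception classes instead, e.g.:
#   from selenium.common.exceptions import (
#       NoSuchElementException, StaleElementReferenceException)
#   retry(lambda: driver.find_element(By.ID, "some-id").click(),   # hypothetical element id
#         (StaleElementReferenceException, NoSuchElementException))


class FlakyError(Exception):
    """Stand-in for a Selenium exception in this self-contained demo."""


calls = {"n": 0}


def flaky_click():
    """Fails twice, then succeeds, like an element that loads late."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FlakyError("element went stale")
    return "clicked"


print(retry(flaky_click, (FlakyError,), attempts=3, sleep=lambda seconds: None))
```

Injecting `sleep` keeps the helper testable without real waits; in practice Selenium's explicit waits (`WebDriverWait`) are often a better first tool for slow-loading elements, with a retry like this reserved for stale references.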

Which TRAC tools can I use this with?

Supported

Automated interaction with the following tools should be fully supported.

Not yet supported

Automated interaction with the following tools is not supported. Some of these tools simply have an additional menu and are otherwise similar to the fully supported tools above, so you might have some success using collate.py with them. Others have completely different interfaces and will not work with collate.py at all. Full support for these tools might be added later.

Disclaimer

This repository is intended to make federal immigration data more accessible to students, researchers, and journalists. It is not affiliated with, supported by, or recognized by TRAC. Always make sure to cite your sources properly. If you use this tool, I would appreciate an acknowledgement, but no citation is necessary.
