general updates to list and small changes to downloader
btskinner committed Jan 18, 2018
1 parent 876a838 commit 79c7a40
Showing 3 changed files with 230 additions and 11 deletions.
18 changes: 11 additions & 7 deletions README.md
@@ -2,6 +2,8 @@

Use this script to batch download [Integrated Postsecondary Education Data System (IPEDS)](https://nces.ed.gov/ipeds/) files. The downloaded files are not unzipped or processed in any way. This script simply saves you the trouble of having to point and click your way through the data center.

+(You can also download database files [here](https://nces.ed.gov/ipeds/Section/accessdatabase/), but you need MS Access to open them.)

Only those files listed in `ipeds_file_list.txt` will be downloaded. The default behavior is to download each of the following files into their own subdirectories:

1. Data file
@@ -18,6 +20,8 @@ You can also choose to download other data versions and/or program files:

The default behavior is to download **ALL OF IPEDS**. If you don't want everything, modify `ipeds_file_list.txt` to include only those files that you want. Simply erase those you don't want, keeping one file name per row.

+I try to keep `ipeds_file_list.txt` updated, but if I've missed a file or haven't updated in a while, just add the name of the file or files, one to a line. The downloading script ignores lines starting with hashes (`#`), so you can add notes or better section headers to the file if you want.
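
As a rough illustration of the comment handling described above, the following sketch applies the same filter used in `downloadipeds.R` to a made-up file list. The file names and the `lines` and `keep` variables are hypothetical and only for demonstration.

```r
## hypothetical contents of ipeds_file_list.txt (names are illustrative only)
lines <- c('# Institutional characteristics',
           'HD2016',
           '',
           'IC2016',
           '#EF2016A')

## drop blank lines and lines whose first character is a hash
keep <- lines[lines != '' & !grepl('^#', lines)]
print(keep)
```

Because the pattern `^#` is anchored at the start of the line, a hash preceded by spaces would presumably not be treated as a comment, so notes should start in the first column.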

You also have the option of whether you wish to overwrite existing files.
If you do, change the `overwrite` option to `TRUE`. The default behavior is
to only download files listed in `ipeds_file_list.txt` that have not already been downloaded.
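
The overwrite behavior described above could be sketched roughly as follows. This is not the script's actual code: `needs_download()` is a hypothetical helper, and the temporary files merely stand in for downloaded zip files.

```r
## hypothetical helper: TRUE when a file should be fetched
needs_download <- function(dest, overwrite = FALSE) {
    ## always fetch when overwriting; otherwise only when the file is absent
    overwrite || !file.exists(dest)
}

## a temporary file standing in for an already-downloaded zip
existing <- tempfile(fileext = '.zip')
file.create(existing)

## a path that does not exist yet
absent <- tempfile(fileext = '.zip')

needs_download(existing)                    # FALSE: already on disk
needs_download(existing, overwrite = TRUE)  # TRUE: overwrite requested
needs_download(absent)                      # TRUE: not yet downloaded
```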
@@ -85,16 +89,16 @@ out_dir = '.'

## Data size

-As of 1 April 2017, downloading all IPEDS files (setting all options to `TRUE`) requires approximately 1.52 GB of disk space. Granted, you probably don't need both regular and Stata versions of the data files (which are the bulk of the directory size). Here are the approximate subdirectory file sizes if you download all data files from all years:
+As of 18 January 2018, downloading all IPEDS files (setting all options to `TRUE`) requires approximately 1.64 GB of disk space. Granted, you probably don't need both regular and Stata versions of the data files (which are the bulk of the directory size). Here are the approximate subdirectory file sizes if you download all data files from all years:

|Subdirectory|Approximate Size|
|:--|:-:|
-|`./data`|790.4 MB|
-|`./dictionary`|17.4 MB|
-|`./sas_prog`|5.2 MB|
-|`./spss_prog`|5.0 MB|
-|`./stata_data`|693.7 MB|
-|`./stata_prog`|5.8 MB|
+|`./data`|852 MB|
+|`./dictionary`|20 MB|
+|`./sas_prog`|6 MB|
+|`./spss_prog`|5 MB|
+|`./stata_data`|755 MB|
+|`./stata_prog`|6 MB|

## Combine

8 changes: 4 additions & 4 deletions downloadipeds.R
@@ -2,9 +2,9 @@
##
## <PROJ> Batch download IPEDS files
## <FILE> downloadipeds.R
-## <AUTH> Benjamin Skinner
+## <AUTH> Benjamin Skinner (@btskinner)
## <INIT> 21 July 2015
-## <REVN> 01 April 2017
+## <REVN> 18 January 2018
##
################################################################################

@@ -112,9 +112,9 @@ countdown <- function(pause, text) {
## RUN
## =============================================================================

-## read in files; remove blank lines
+## read in files; remove blank lines & lines starting with #
ipeds <- readLines('./ipeds_file_list.txt')
-ipeds <- ipeds[ipeds != '']
+ipeds <- ipeds[ipeds != '' & !grepl('^#', ipeds)]

## data url
url <- 'https://nces.ed.gov/ipeds/datacenter/data/'