Load Libraries:
- caret
- data.table
- dplyr
- sets
- scales
- tidyr
- stringr
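One way to load the packages listed above is sketched here. It checks which packages are installed before attaching them, so a missing package produces a message instead of an error. The variable names `pkgs` and `missing_pkgs` are illustrative and not taken from the pipeline.

```r
# Packages named in the list above.
pkgs <- c("caret", "data.table", "dplyr", "sets", "scales", "tidyr", "stringr")

# Find packages that are not installed, rather than failing on the first one.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Attach the packages that are available.
for (p in setdiff(pkgs, missing_pkgs)) {
  library(p, character.only = TRUE)
}
```

Any package reported as missing can be installed with `install.packages()` before rerunning.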
After downloading the Data folder from the specified OneDrive, the project directory should look like this:
.
├── AHRQ_pipeline
│   ├── compress_pharma.py
│   └── merge_AHRQ.Rmd
└── Data
    ├── AHRQ
    │   ├── COUNTY
    │   ├── TRACT
    │   └── ZIP
    └── GA_Pharmacy_Data_gp_fsq
- Load patient data from a CSV file (select your desired geographic level and years from the AHRQ SDOH Database).
- Load each selected year of AHRQ SDOH data from its CSV file.
- Load a CSV file containing feature names for AHRQ SDOH variables.
- Merge AHRQ data from multiple years into a single data frame.
- Pad ZIP codes and STATEFIPS codes with leading zeros for consistency.
- Optionally perform imputation for missing values.
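The preprocessing steps above can be sketched in base R as follows. This is a minimal illustration, not the code in `merge_AHRQ.Rmd`: the two small data frames stand in for yearly CSV files that would normally be read with `read.csv()`, `FEATURE_A` is a hypothetical SDOH variable, and median imputation is an assumed choice, since the imputation method is not specified here.

```r
# Toy stand-ins for two yearly AHRQ SDOH files (real data would come from
# read.csv() on files under Data/AHRQ/<level>/).
ahrq_2019 <- data.frame(
  YEAR = 2019, STATEFIPS = c(1, 13), ZIPCODE = c(501, 30301),
  FEATURE_A = c(10.5, NA)   # hypothetical SDOH variable with a missing value
)
ahrq_2020 <- data.frame(
  YEAR = 2020, STATEFIPS = c(1, 13), ZIPCODE = c(501, 30301),
  FEATURE_A = c(11.0, 20.0)
)

# Merge the yearly tables into a single data frame by stacking rows.
ahrq <- do.call(rbind, list(ahrq_2019, ahrq_2020))

# Pad codes with leading zeros: 2 digits for STATEFIPS, 5 for ZIPCODE.
ahrq$STATEFIPS <- sprintf("%02d", as.integer(ahrq$STATEFIPS))
ahrq$ZIPCODE   <- sprintf("%05d", as.integer(ahrq$ZIPCODE))

# Optional imputation (assumed method): fill missing values with the median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
ahrq$FEATURE_A <- impute_median(ahrq$FEATURE_A)
```

Storing the padded codes as character strings keeps the leading zeros, which would be lost again if the columns were converted back to numbers.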
Merge the preprocessed AHRQ data with the patient data using the crosswalk variables STATEFIPS, ZIPCODE, and YEAR.
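A minimal sketch of this merge in base R is shown below. The toy tables, the `PATIENT_ID` and `FEATURE_A` columns, and the choice of a left join (keeping every patient row even when no AHRQ record matches) are assumptions made for illustration.

```r
# Toy preprocessed AHRQ table (codes already zero-padded, stored as character).
ahrq <- data.frame(
  STATEFIPS = c("13", "13"),
  ZIPCODE   = c("30301", "30301"),
  YEAR      = c(2019, 2020),
  FEATURE_A = c(10.5, 11.0),      # hypothetical SDOH variable
  stringsAsFactors = FALSE
)

# Toy patient table; PATIENT_ID is a hypothetical column.
patients <- data.frame(
  PATIENT_ID = c("p1", "p2", "p3"),
  STATEFIPS  = c("13", "13", "13"),
  ZIPCODE    = c("30301", "30301", "39999"),
  YEAR       = c(2019, 2020, 2020),
  stringsAsFactors = FALSE
)

# Left join on the crosswalk variables: all patients are kept, and AHRQ
# features are attached wherever STATEFIPS, ZIPCODE, and YEAR all match.
merged <- merge(patients, ahrq,
                by = c("STATEFIPS", "ZIPCODE", "YEAR"),
                all.x = TRUE)
```

Patients whose keys have no AHRQ match (here `p3`) end up with `NA` in the feature columns, so counting those rows is a quick check on crosswalk coverage. With dplyr loaded, `left_join()` with the same `by` vector should give an equivalent result.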