Load Libraries:
- caret
- data.table
- dplyr
- sets
- scales
- tidyr
- stringr

After downloading the Data folder from the specified OneDrive, the directory layout should be:

    .
    ├── AHRQ_pipeline
    │   ├── compress_pharma.py
    │   └── merge_AHRQ.Rmd
    └── Data
        ├── AHRQ
        │   ├── COUNTY
        │   ├── TRACT
        │   └── ZIP
        └── GA_Pharmacy_Data_gp_fsq

- Load patient data from a CSV file (select your desired geographic level and years from the AHRQ SDOH Database).
- Load years of AHRQ SDOH data from CSV files.
- Load a CSV file containing feature names for AHRQ SDOH variables.
- Merge AHRQ data from multiple years into a single data frame.
- Pad ZIP codes and STATEFIPS codes with leading zeros for consistency.
- Optionally perform imputation for missing values.
- Merge the preprocessed AHRQ data with the patient data using the crosswalk variables STATEFIPS, ZIPCODE, and YEAR.
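
The stacking, padding, and merging steps can be sketched roughly as follows. This is a minimal illustration on toy data, not the code in merge_AHRQ.Rmd: the data frames, the `pad_code` helper, the `PATIENT_ID` column, and the `SDOH_VAR` column are all hypothetical, and the left join (keeping every patient row) is an assumption about the intended merge. Base R is used so the sketch runs without extra packages; `stringr::str_pad` and `dplyr::left_join` from the libraries listed above would serve the same purpose.

```r
# Toy stand-ins for two yearly AHRQ ZIP-level files (all values are made up).
# SDOH_VAR is a hypothetical placeholder for a real AHRQ SDOH variable.
ahrq_2019 <- data.frame(STATEFIPS = c(1, 25, 13),
                        ZIPCODE   = c(35004, 2138, 30303),
                        YEAR      = 2019,
                        SDOH_VAR  = c(0.10, 0.20, 0.30))
ahrq_2020 <- data.frame(STATEFIPS = c(1, 25, 13),
                        ZIPCODE   = c(35004, 2138, 30303),
                        YEAR      = 2020,
                        SDOH_VAR  = c(0.11, 0.21, 0.31))

# Stack the yearly files into a single data frame.
ahrq <- rbind(ahrq_2019, ahrq_2020)

# Hypothetical helper: left-pad a numeric code with zeros to a fixed width.
pad_code <- function(x, width) {
  formatC(as.integer(x), width = width, format = "d", flag = "0")
}

# ZIP codes are 5 digits and state FIPS codes are 2 digits.
ahrq$ZIPCODE   <- pad_code(ahrq$ZIPCODE, 5)
ahrq$STATEFIPS <- pad_code(ahrq$STATEFIPS, 2)

# Toy patient data whose codes are already stored as padded strings.
patients <- data.frame(PATIENT_ID = c("p1", "p2", "p3"),
                       STATEFIPS  = c("13", "25", "01"),
                       ZIPCODE    = c("30303", "02138", "35004"),
                       YEAR       = c(2019, 2020, 2020))

# Left join on the crosswalk variables (assumed: keep all patient rows).
merged <- merge(patients, ahrq,
                by = c("STATEFIPS", "ZIPCODE", "YEAR"),
                all.x = TRUE)
```

Without the padding step, a ZIP code read as the number 2138 would not match the patient record stored as "02138", and that patient's SDOH columns would come back as NA after the join.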