Skip to content

Greater Plains Collaborative Reusable Unified Study Environment (GROUSE)

License

Notifications You must be signed in to change notification settings

kumc-bmi/grouse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Greater Plains Collaborative Reusable Unified Study Environment (GROUSE)

The Greater Plains Collaborative (GPC), led by KUMC Medical Informatics, is a PCORI Clinical Data Research Network (CDRN). As explained in the GROUSE executive summary:

... we seek to expand the data completeness of our patients’ health care processes and outcomes and understand the information gain for our complete and comprehensive population by comparing correlations between the Medicare and Medicaid claims data with the data in our CDRN that includes the electronic health record and billing data from each of our component health systems, clinical registry data (e.g. hospital tumor registries), private payer claims data, and patient-reported outcomes, as available, and work with our GPC and CDRN investigators to answer specific cohort questions to achieve our overarching aims.

This software supports the project as follows:

  1. GPC sites will send the NewWave-GDIT (Research Data Distribution Center under contract with CMS) encrypted beneficiary identifiers (e.g., SSN, HICs, name, DOB) along with a randomly generated ID for each of their patients.
  2. NewWave-GDIT will generate the finder files that contain mappings between the random IDs and CMS IDs and send the crosswalk files and CMS data back to KUMC.
    • grouse_tables.csv summarizes the CMS data
    • staging supports decrypting/loading the CMS data into Oracle from the encrypted archives.
  3. KUMC will integrate the crosswalk and CMS data with individual site EMR data (limited data set) through linking the random IDs. This will allow KUMC to achieve record linkage of patients’s EMR and claims data without obtaining or retaining actual patient level PII from individual GPC sites.
    • deid supports creating a de-identified copy of the CMS tables.
    • site_integration supports integrating site EMR data
  4. The merged dataset will be de-identified and made available to project team members for running analyses or using the i2b2 client.
    • etl_i2b2 supports building an i2b2 star schema from (de-identified) CMS data.

GROUSE record linkage diagram

[//]: # TODO: add obfuscate.py, which supports step 1.

Platform

The platform used at the University of Kansas Medical Center Division of Medical Informatics includes: