Medically-informed data preprocessing for machine learning
The purpose of PreprocessMD.jl is to provide a suite of functions for preprocessing biomedical data.
The scope of this package is medical data preprocessing, so
we develop functions that are specific to biomedical research but general enough for widespread use.
These tools are developed for the OMOP Common Data Model [1],
especially the MIMIC-IV demo set [2].
Following the definitions of Wu et al. [3], we consider data preprocessing to comprise project-level data manipulations. This distinguishes it from upstream data cleaning (e.g., error correction and standardization), which is typically performed over an entire database, and from downstream data preparation (e.g., labelling and classification), which may vary across the analyses within a project.
An example pipeline is available in the documentation.
Planned features for PreprocessMD.jl include:
- Summaries and feasibility checks
- Feature extraction
- Variable derivation
- Data imputation
- Dimension reduction
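
To make two of the planned feature categories concrete, here is a minimal sketch of what data imputation and variable derivation can look like in plain Julia. It does not use or reflect the PreprocessMD.jl API: the function names `impute_mean` and `bmi`, the sample values, and the choice of mean imputation are all assumptions made for illustration only.

```julia
using Statistics  # standard library; provides `mean`

# Hypothetical patient measurements; `missing` marks an unrecorded value.
weights_kg = [70.0, missing, 82.5, 64.0]
heights_m  = [1.75, 1.62, missing, 1.60]

# Data imputation (illustrative): replace each missing value with the
# mean of the observed values in the same vector.
function impute_mean(xs)
    observed_mean = mean(skipmissing(xs))
    return [ismissing(x) ? observed_mean : x for x in xs]
end

# Variable derivation (illustrative): body mass index from weight and height.
bmi(weight_kg, height_m) = weight_kg / height_m^2

weights_filled = impute_mean(weights_kg)
heights_filled = impute_mean(heights_m)
bmis = bmi.(weights_filled, heights_filled)
```

Mean imputation is used here only because it is short. Whether it is appropriate depends on the variable and on why values are missing, which is the kind of medically informed decision the package aims to support.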
Footnotes
[3] Wu, Hulin, Jose Miguel Yamal, Ashraf Yaseen, and Vahed Maroufy, eds. Statistics and Machine Learning Methods for EHR Data: From Data Extraction to Data Analytics. CRC Press, 2020.