Dementia

Table of Contents

Repository Structure
Datasets
Codebooks
Imputation and Scaling
Color-Code
Advanced ML Evaluation Metrics
Run the Code properly
Requirements

This is the repository for the dementia project.

Repository Structure

Input Folder: folder containing the main input datasets used for processing
Code Folder: folder containing the main code
Distribution Folder: folder containing distribution plots of all features
visualization_tools Folder: folder for Google Facets and the What-If Tool

Datasets

Copy of Dementia_baseline_questionnaire_V1.xlsx: Beqaa Questionnaire
pooled_data.xlsx: pooled data - xlsx format
pooled_data.csv: pooled data - csv format
pooled_new.csv: pooled data after replacing the erroneous values in ANIMALS_2 with 1s

Codebooks

numeric.xlsx: numeric features and their statistics
textual.xlsx: textual features and their statistics
missing_40_codebook.csv: Codebook for the features with more than 40% missing values.
missing_codebook_lessthan_40.csv: Codebook for the features with less than 40% missing values.
num_rows_missing_codebook.csv: Codebook for the number of rows with $i$ missing entries, as defined below.
erroneous_codebook.csv: Codebook listing, for each feature, the erroneous values, their percentage, and the cut-off used to decide which values are erroneous.
jump_informant_columns.csv: Codebook for all columns that require jumps to other columns, along with the informant columns. The columns are sorted in increasing order of missing values. For each column, we indicate whether it is an INFORMANT column and include a description.

For $i = 1$ to $n$, where $n$ is the total number of columns, num_rows_missing_codebook.csv gives $x_i$, the number of rows with exactly $i$ missing entries, obtained by looping over all rows and counting the missing entries in each.
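A minimal pandas sketch of the two missing-value computations, assuming the pooled data is loaded into a DataFrame and missing entries are read as NaN. The function names are hypothetical, and placing a column with exactly 40% missing values in the "less than 40%" group is an assumption the codebook descriptions do not settle.

```python
import pandas as pd


def split_by_missing_percentage(df: pd.DataFrame, threshold: float = 40.0):
    """Hypothetical helper: split columns by their percentage of missing values.

    Returns two Series (column name -> % missing): columns above the threshold,
    and the rest (assumption: exactly 40% goes in the rest).
    """
    missing_pct = df.isna().mean() * 100  # % of NaN entries per column
    above = missing_pct[missing_pct > threshold].sort_values()
    rest = missing_pct[missing_pct <= threshold].sort_values()
    return above, rest


def rows_by_missing_count(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: x_i for i = 1..n (n = number of columns),
    the number of rows with exactly i missing entries."""
    n = df.shape[1]
    x = {i: 0 for i in range(1, n + 1)}
    # Loop over all rows and count the missing entries in each one
    for _, row in df.iterrows():
        k = int(row.isna().sum())
        if k >= 1:  # rows with no missing entries are not counted in any x_i
            x[k] += 1
    return pd.Series(x, name="num_rows")
```

For example, `rows_by_missing_count(pd.read_csv("pooled_new.csv"))` would return one count per $i$; the layout of the repository's actual codebook file may differ. The same counts can also be obtained without an explicit loop via `df.isna().sum(axis=1).value_counts()`.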

Imputation and Scaling

Missing values of all numeric/ordinal features were imputed using K-nearest neighbors (KNN) imputation.
Missing values of all categorical features were imputed by replacing every NaN with a new category.
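The two imputation steps can be sketched with scikit-learn's `KNNImputer` (available since scikit-learn 0.22, so covered by the pinned 0.23.2) and pandas. The helper name `impute`, the column lists, `n_neighbors=5`, and the `"missing"` label for the new category are all assumptions; the original does not state them.

```python
import pandas as pd
from sklearn.impute import KNNImputer


def impute(df: pd.DataFrame, numeric_cols, categorical_cols,
           n_neighbors: int = 5, missing_label: str = "missing") -> pd.DataFrame:
    """Hypothetical helper: KNN-impute numeric/ordinal columns and turn NaNs
    in categorical columns into their own category. Returns a copy."""
    out = df.copy()
    # Each missing numeric/ordinal value becomes the mean of that feature over
    # the n_neighbors nearest rows (NaN-aware Euclidean distance by default).
    # Note: KNNImputer discards columns that are entirely missing, so such
    # columns should be excluded from numeric_cols beforehand.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out[numeric_cols] = imputer.fit_transform(out[numeric_cols])
    # Categorical columns: NaN becomes a new, explicit category
    for col in categorical_cols:
        out[col] = out[col].astype(object).fillna(missing_label)
    return out
```

Treating missingness as its own category keeps the information that a value was absent, which can itself be predictive in questionnaire data.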

Color-Code

The yellow color stands for the carer; the red color stands for the patient.

Advanced ML Evaluation Metrics

Risk is quantified as the predicted probability of the positive class
Mean Empirical Risk Curves
Precision/Recall Curves at Top K
Probability of Mistake per Model per Frequent Pattern, using the FP-growth technique
- Percentiles of each column's distribution are used to "itemize" (categorize) its values
Jaccard Similarity curves between model pairs
All models used here are shallow models; deep learning models could be incorporated soon.
SMOTE is used to oversample the training data only, before testing (DO NOT APPLY SMOTE TO THE WHOLE DATASET AND THEN DO A TRAIN-TEST SPLIT)
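A NumPy sketch of three of the metrics above, assuming each model's risk scores are positive-class probabilities (for a scikit-learn classifier, `model.predict_proba(X)[:, 1]`) and labels are 0/1. All function names are hypothetical, and reading the mean empirical risk curve as the fraction of positives in equal-sized bins of instances ranked by risk is our assumption, not necessarily the repository's exact definition.

```python
import numpy as np


def top_k_indices(risk, k):
    """Indices of the k instances with the highest risk (ties keep input order)."""
    return np.argsort(-np.asarray(risk, dtype=float), kind="mergesort")[:k]


def precision_recall_at_k(y_true, risk, k):
    """Precision and recall when the top-k riskiest instances are flagged.
    Assumes k >= 1 and at least one positive label."""
    y_true = np.asarray(y_true)
    hits = y_true[top_k_indices(risk, k)].sum()  # true positives in the top k
    return hits / k, hits / y_true.sum()


def jaccard_at_k(risk_a, risk_b, k):
    """Jaccard similarity between the top-k sets of two models."""
    a = set(top_k_indices(risk_a, k).tolist())
    b = set(top_k_indices(risk_b, k).tolist())
    return len(a & b) / len(a | b)


def mean_empirical_risk(y_true, risk, n_bins=10):
    """Fraction of positives per bin of instances ranked by risk
    (assumed definition; bin 0 holds the highest-risk instances)."""
    y_sorted = np.asarray(y_true)[top_k_indices(risk, len(risk))]
    return np.array([b.mean() for b in np.array_split(y_sorted, n_bins)])
```

Each curve comes from evaluating these functions over a range of K, e.g. `[precision_recall_at_k(y, risk, k)[0] for k in range(1, len(y) + 1)]` for the precision curve, or `jaccard_at_k(risk_a, risk_b, k)` over the same range for a pair of models.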

Run the Code properly

cd Code

# create the metadata first
python create_features_meta.py
python create_features_with_categories.py

# then, create the numeric and textual files
python create_numeric.py
python create_textual.py

# produce the codebooks
python create_codebooks.py

Requirements

XlsxWriter 1.2.8
numpy 1.18.5
tensorflow 1.12.0
seaborn 0.9.0
xgboost 1.0.2
matplotlib 3.0.3
scipy 1.4.1
pandas 0.24.0
scikit_learn 0.23.2

hiyamgh/dementia
