Skip to content

princyiakov/chronic_kidney_disease_progression

Repository files navigation

Chronic Kidney Disease Progression

chronic-kidney-disease-stages

Chronic kidney disease (CKD) is a condition in which your kidneys gradually lose their ability to help your body remove waste and fluid from your blood. When this happens, harmful wastes and fluids begin to build up in your body, making you feel unwell and out of balance. Although chronic kidney disease (CKD) is not curable, treatment can help slow its progression, control symptoms and enable you to live a full life.

Our Goal

Using Patient past records we will predict whether a patient will progress in CKD staging or not.

Notebook File

Here is the link to our notebook

Patient Demographics

We will be understanding the patients demographics based on Gender, Age, Race and Stage Progression

Gender

genderdistribution

Age

agedistribution

Race

racedistribution

Stage Progression

Overall Stage Progression

stageprogression

Stage Progression According To Gender

stageprogressiongender

Stage Progression According To Race

stageprogressionrace

How Were We Able To Predict The Progression

Approach I Machine Learning

Since I was trying Survival Analysis first time since college, I wanted to first check how the data reacts with the machine learning and explored various models.

Feature Engineering

  • For blood works and blood pressure:
    • For blood works like glucose, creatine, systolic and diastolic blood pressure, I took mean value of the observations and calculated the duration based on first and last day of observation
  • For medicines:
    • I grouped the patient ids and medicines and calculated the total dose by multiplying each medicines doses and their start and stop date difference and provided a summation of the doses based on each medicine

Feature Selection

I used ExtraTreesClassifier to find the features holding the maximum importance. featureimp

HeatMap Representation : heatmap

Handling Imbalanced Data

I used SMOTE to improve the imbalanced data

Model Evaluation and Selection

The following were the performance metrics of the models

------------Logistic Regression Model-----------------

Performance metrics Score
Accuracy 0.6222222222222222
Precision 0.47619047619047616
Recall 0.625
F1 Score: 0.5405405405405405

-------------------XGBoost Model-----------------------------

Performance metrics Score
Accuracy 0.7111111111111111
Precision 0.6
Recall 0.5625
F1 Score 0.5806451612903225

------------Random Forest Model-------------

Performance metrics Score
Accuracy 0.6555555555555556
Precision 0.5185185185185185
Recall 0.4375
F1 Score 0.47457627118644063

--------------------KNN Model-----------------------------

Performance metrics Score
Accuracy 0.6555555555555556
Precision 0.5135135135135135
Recall 0.59375
F1 Score 0.5507246376811593

Evaluation using Giskard

I wanted to see how my model works realtime by changing the data, so I used Giskard for inspection. Here is a speak peak of my exploration. giskardinspect

Do you want to check if Gender changes the prediction for whole dataset? Metamorphic tests to help us in the process

metamorphic

Here is a list of all tests

alltests

Approach II Survival Analysis

Gender

Using Kaplan Meier Estimator, we could see the survival curve based on gender gendersa

And the Log rank test with p-value 0.02 which is less than 0.05, helped me conclude that we can reject the null hypothesis and conclude that gender of a person plays a significant role in the progression of CKD

According to the curve male will progress in the staged of CKD more than females

Race

racesqa

In log rank test, the p-value of 0.05 helped us conclude that we can reject the null hypothesis and conclude that race of a person plays a significant role in the progression of CKD

According to the curve Hispanics progress in CKD stages more as compared to other races

Cox Regression helped me understand the relation with progression and other variables better

coxop

coxplot

Here are the conclusions of the readings
  • Being Hispanic increases your chances of progression in CKD stages are 734%
  • Males have 59% higher chances of progressing as compared to female
  • Glucose level plays an important role. Higher glucose level leads to 8% chances of progression
  • Higher creatine level leads to 7% chance of progressing in CKD stages
  • Higher SBP leads to 2% chances of progressing in CKD Stages
  • Higher HGB levels decreases the chances of progressing by 3%

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published