- Data Source and Description: "https://www.kaggle.com/datasets/mathchi/diabetes-data-set"
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Blood Pressure: Diastolic blood pressure (mm Hg)
- Insulin: 2-hour serum insulin (mu U/ml)
- BMI: Body mass index (weight in kg/(height in m)^2)
- Diabetes Pedigree Function: a score summarizing the likelihood of diabetes based on family history
- Age: age (years)
- Outcome: 1 = positive, 0 = negative
The outcome variable has 500 negative and 268 positive cases, so about 35% of the samples are positive. To counter this class imbalance, SMOTE was applied in the k-nearest neighbor, random forest, elastic net, and gradient boosting models, and class weights were assigned in the ANN model.
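As a rough illustration of the two countermeasures, the sketch below computes class weights and generates one SMOTE-style synthetic point in plain Python. It is not the project's code: the source does not say which weighting scheme was used, so the "balanced" heuristic (`n_samples / (n_classes * class_count)`) is an assumption, and the function names and the two example feature vectors are hypothetical.

```python
import random


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: the 'balanced' heuristic; the project may have used other weights.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def smote_like_sample(x, neighbor, rng):
    """Return one synthetic point on the segment between x and a minority-class neighbor.

    This is only the interpolation step of SMOTE; neighbor search is omitted.
    """
    u = rng.random()  # interpolation factor in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, neighbor)]


# Class counts taken from the text: 500 negative (0), 268 positive (1).
counts = {0: 500, 1: 268}
weights = balanced_class_weights(counts)

# Number of synthetic positives needed to match the negative class size.
n_synthetic = counts[0] - counts[1]

# Hypothetical [Glucose, BMI] values for two positive cases.
rng = random.Random(0)
synthetic = smote_like_sample([148.0, 33.6], [183.0, 23.3], rng)

print(weights)
print(n_synthetic)
print(synthetic)
```

Under this heuristic the positive class receives roughly 1.9 times the weight of the negative class, which is the same ratio as 500 to 268.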
| Model | Recall | Precision | F1 | Mean AUC | Max AUC |
|-------|--------|-----------|--------|----------|---------|
| KNN | 0.7358 | 0.600 | 0.661 | 0.7895 | 0.8716 |
| RF | 0.6038 | 0.6809 | 0.640 | 0.8131 | 0.8940 |
| ENet | 0.6604 | 0.6034 | 0.6305 | 0.8332 | 0.9045 |
| GBM | 0.7170 | 0.6441 | 0.6786 | 0.8358 | 0.9022 |
| ANN | 0.5849 | 0.6458 | 0.6139 | 0.8257 | |
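
The F1 column can be cross-checked against the reported recall and precision, since F1 is their harmonic mean. The snippet below is a minimal sketch of that check; the `f1_score` helper and the `reported` dictionary are illustrative names, and the values are copied from the table above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) per model, copied from the results table.
reported = {
    "KNN": (0.7358, 0.600, 0.661),
    "RF": (0.6038, 0.6809, 0.640),
    "ENet": (0.6604, 0.6034, 0.6305),
    "GBM": (0.7170, 0.6441, 0.6786),
    "ANN": (0.5849, 0.6458, 0.6139),
}

# Absolute difference between the recomputed and the reported F1.
differences = {}
for model, (recall, precision, f1_reported) in reported.items():
    f1_computed = f1_score(precision, recall)
    differences[model] = abs(f1_computed - f1_reported)
    print(f"{model}: computed F1 = {f1_computed:.4f}, reported = {f1_reported}")
```

Each recomputed value agrees with the reported F1 to within rounding.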