Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants (Sakho, Malherbe and Scornet; 2024)

artefactory/smote_strategies_study

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants.

Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".

In particular, you will find code to reproduce the paper's experiments, as well as an implementation of our new and efficient strategy that you can use in your own projects.

⭐ Getting Started

If you want to reproduce our paper experiments:

  • the notebooks here and here reproduce the experiments
  • this code contains the implementation of the protocols used for the numerical experiments in our article.

In order to use our MGS strategy:

  • this notebook illustrates how to use it
  • the strategy is implemented here
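
As a rough illustration of the idea behind such a strategy, here is a minimal NumPy sketch of a Gaussian-based oversampler. It is not the repository's implementation: the function name `gaussian_oversample`, its parameters, and the sampling scheme (drawing each synthetic point from a Gaussian fitted to a minority point and its nearest minority neighbours) are assumptions made for this example. The choice `k = d + 1` is only suggested by the "MGS (d+1)" column label in the results table.

```python
import numpy as np


def gaussian_oversample(X_min, n_new, k, seed=0):
    """Hypothetical helper (not the repository's API).

    Draws n_new synthetic minority samples. Each one is sampled from a
    Gaussian whose mean and covariance are estimated on a randomly chosen
    minority point together with its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise squared Euclidean distances between minority points.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # For each point: the k + 1 closest points, the point itself included.
    neighbours = np.argsort(dist2, axis=1)[:, : k + 1]

    centres = rng.integers(0, n, size=n_new)
    samples = np.empty((n_new, d))
    for i, c in enumerate(centres):
        local = X_min[neighbours[c]]
        mean = local.mean(axis=0)
        # Empirical covariance of the local neighbourhood (d x d).
        cov = np.cov(local, rowvar=False, bias=True).reshape(d, d)
        samples[i] = rng.multivariate_normal(mean, cov)
    return samples


# Toy usage on random data: 20 minority points in dimension 3.
rng = np.random.default_rng(42)
X_min = rng.normal(size=(20, 3))
X_new = gaussian_oversample(X_min, n_new=50, k=X_min.shape[1] + 1)
print(X_new.shape)  # (50, 3)
```

The synthetic samples would then be appended to the training set with the minority label before fitting a classifier. For the actual class name and signature, refer to the notebook and implementation linked above.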

⭐ Data sets

The data sets used in our article should be downloaded into the data/externals folder. They are available at the following addresses:

Table 2 from the paper (best value for each data set in bold):

| Data set | None | CW | RUS | ROS | NM1 | BS1 | BS2 | SMOTE | CV SMOTE | MGS (d+1) |
|---|---|---|---|---|---|---|---|---|---|---|
| CreditCard (0.2%) | 0.966 | 0.967 | **0.970** | 0.935 | 0.892 | 0.949 | 0.944 | 0.947 | 0.954 | 0.952 |
| Abalone (1%) | 0.764 | 0.748 | 0.735 | 0.722 | 0.656 | 0.744 | 0.753 | 0.741 | 0.791 | **0.802** |
| Phoneme (1%) | 0.897 | 0.868 | 0.868 | 0.858 | 0.698 | 0.867 | 0.869 | 0.888 | **0.924** | 0.915 |
| Yeast (1%) | 0.925 | 0.920 | 0.938 | 0.908 | 0.716 | 0.949 | 0.954 | **0.955** | 0.942 | 0.945 |
| Wine (4%) | 0.928 | 0.925 | 0.915 | 0.924 | 0.682 | 0.933 | 0.927 | 0.934 | 0.938 | **0.941** |
| Pima (20%) | 0.798 | **0.808** | 0.799 | 0.790 | 0.777 | 0.793 | 0.788 | 0.789 | 0.787 | 0.787 |
| Haberman (10%) | 0.708 | 0.709 | 0.720 | 0.704 | 0.697 | 0.723 | 0.721 | 0.719 | 0.742 | **0.744** |
| MagicTel (20%) | 0.917 | 0.921 | 0.917 | **0.922** | 0.649 | 0.920 | 0.905 | 0.921 | 0.919 | 0.913 |
| California (1%) | 0.887 | 0.877 | 0.880 | 0.883 | 0.630 | 0.885 | 0.874 | 0.906 | 0.916 | **0.923** |

⭐ Acknowledgements

This work was carried out through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.

If you find the code useful, please consider citing us:

@article{sakho2024theoretical,
  title={Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies},
  author={Sakho, Abdoulaye and Scornet, Erwan and Malherbe, Emmanuel},
  journal={arXiv preprint arXiv:2402.03819},
  year={2024}
}
