
Detecting Healthcare Fraud by aggregating multiple machine learning models, cleaning data using the advice of industry experts, engineering fraud-relevant features, and employing rebalancing and HyperTransformation Tuning approaches


datatodavid/FraudDetection


Machine Learning & Detecting Fraudulent Healthcare Providers


Project Goals

The goal of this project is to analyze the healthcare providers in the well-known Kaggle data set, linked above, and to predict which of them are fraudulent.

Classification analysis using this methodology can be particularly useful to health insurance companies as well as to public health advocates. With the proper approach, an assessor can not only identify which providers are engaged in fraudulent activities, but also avoid erroneously classifying legitimate providers as fraudulent, which, in the case of insurance companies, saves a great deal of money.

To achieve this goal, we dig deeply into the data and apply a variety of machine learning classification techniques, including classic models such as Logistic Regression and Support Vector Classification as well as more modern models such as CatBoost and LightGBM.


What's in this Repo?

Here you will find three notebooks of particular note: I. Data Exploration, II. Data Preparation, and III. Machine Learning Processing. Given the length of the project, we found it most expedient to split the three stages into separate notebooks for easier viewing.

Over the course of the project, we used a variety of standard tools, including Pandas, NumPy, Seaborn, and Matplotlib. Of further note are scikit-learn's StandardScaler, PCA, LogisticRegression, KNeighborsClassifier, LinearDiscriminantAnalysis, GaussianNB (Gaussian Naive Bayes), and GridSearchCV. We also used SVC, CatBoostClassifier, and LGBMClassifier. The project culminated in a successful implementation of stacking techniques.
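As a rough illustration of what stacking several of these classifiers can look like, here is a minimal sketch using scikit-learn's `StackingClassifier`. It is not the notebooks' actual pipeline: the data is a synthetic, imbalanced stand-in generated with `make_classification`, and the choice of base models, the `class_weight="balanced"` settings, and all parameter values are assumptions made for the example. CatBoost and LightGBM are left out only to keep the sketch dependency-free beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for provider-level features (NOT the Kaggle data).
# Roughly 10% of samples belong to the positive ("fraudulent") class.
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base (level-0) models. Scale-sensitive models get their own scaler so
# that scaling is refit inside each internal cross-validation fold.
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svc", make_pipeline(StandardScaler(), SVC(class_weight="balanced", random_state=0))),
    ("lda", LinearDiscriminantAnalysis()),
    ("gnb", GaussianNB()),
]

# The meta (level-1) model is trained on out-of-fold predictions of the
# base models, produced with 5-fold cross-validation.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
print(f"F1 on held-out synthetic data: {test_f1:.3f}")
```

F1 is reported here rather than accuracy because, with few fraudulent providers, a model that flags no one can still score high accuracy. On real data, the base models and the meta-model would each typically be tuned first, for example with `GridSearchCV`.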


For More

For further discussion of the project, its process, and the full analysis, please consult the blog, which is still forthcoming at the time of this writing. Should you have any other questions, please feel free to reach out to either of us.
