Email Classifier

Email is an efficient and widely used means of communication among professionals. However, unsolicited bulk email, known as spam, clutters inboxes and reduces productivity. This project builds a classifier that identifies and filters spam emails.

Problem and Key Features

Spam filtering is a common machine learning task, but classification is challenging because text data produces a high-dimensional feature space and email datasets contain large volumes of documents. This project combines a voting ensemble classifier, which separates spam from non-spam, with topic modeling, which categorizes the non-spam emails.

Key features:

Data preprocessing, including tokenization and stopword removal
Hyperparameter tuning with RandomizedSearchCV
A VotingClassifier ensemble of Naive Bayes, SVM, and neural network models
Latent Dirichlet Allocation (LDA) topic modeling to categorize non-spam emails
Evaluation metrics such as a confusion matrix and a classification report
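
A minimal sketch of how the features listed above could fit together in scikit-learn: a TF-IDF pipeline feeding a hard-voting VotingClassifier of MultinomialNB, a linear SVC, and an MLPClassifier, tuned with RandomizedSearchCV and evaluated with a confusion matrix and classification report. The toy emails, the choice of hard voting, and the parameter ranges are assumptions for illustration, not the project's settings. Because the toy data repeats the same sentences, the resulting scores say nothing about real performance.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data: 1 = spam, 0 = non-spam (ham).
spam = [
    "win a free prize now",
    "cheap loans approved instantly",
    "claim your reward click here",
    "limited offer buy now",
]
ham = [
    "meeting moved to friday",
    "please review the attached report",
    "lunch with the team tomorrow",
    "project deadline next week",
]
texts = spam * 10 + ham * 10
labels = [1] * 40 + [0] * 40

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Hard voting (an assumption): each model casts one vote, majority wins.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("svm", SVC(kernel="linear")),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ],
    voting="hard",
)
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("vote", ensemble),
])

# Illustrative search space; names follow <step>__<estimator>__<param>.
param_distributions = {
    "vote__nb__alpha": [0.01, 0.1, 0.5, 1.0],
    "vote__svm__C": [0.1, 1.0, 10.0],
    "vote__mlp__alpha": [1e-4, 1e-3, 1e-2],
}
search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=5, cv=3, random_state=0
)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out split.
y_pred = search.predict(X_test)
print(search.best_params_)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["ham", "spam"]))
```

Wrapping the vectorizer and ensemble in one Pipeline means each cross-validation fold refits TF-IDF on its own training portion, so vocabulary from the validation fold does not leak into training.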

Usage

The implementation is an easy-to-run Jupyter notebook (email-classifier.ipynb):

Download the notebook and the encoded email dataset
Make sure the key libraries (NumPy, scikit-learn) are installed, e.g. with pip install -r requirements.txt
Run the notebook cells end to end to preprocess the data and to train and evaluate the model

Performance

The current model achieves 97% accuracy on the test set. Further hyperparameter tuning could likely improve this, and tuning the number of LDA topics could likewise improve how non-spam emails are categorized.

Credits

This project was made with Niharika Bhende, Riddhi Dumre, and Vishesh Giyanani.

Repository: NilayGaitonde/Email-Classifier