
This web app uses machine learning (ML) and natural language processing (NLP) to identify whether a message is spam or not.


imAniketSharma/Text-Classification


Spam-Classifier-System

Spam detection is one of the major applications of Machine Learning on the internet today. Nearly all major email service providers have built-in spam detection systems that automatically classify such mail as 'Junk Mail'.

Here we will use the Naive Bayes algorithm to build a model that classifies SMS messages as spam or not spam, based on the labelled data we train it on.
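In rough terms, Naive Bayes applies Bayes' theorem under the "naive" assumption that the words of a message are independent of each other once the message's class is known. Writing a message as its words $w_1, \dots, w_n$ and a class as $c \in \{\text{Spam}, \text{Not Spam}\}$, the posterior and the predicted class $\hat{c}$ can be written as:

```math
\begin{aligned}
P(c \mid w_1, \dots, w_n) &= \frac{P(c) \prod_{i=1}^{n} P(w_i \mid c)}{P(w_1, \dots, w_n)} \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
\end{aligned}
```

The denominator $P(w_1, \dots, w_n)$ is the same for both classes, so it can be dropped when comparing them; in practice the product is usually computed as a sum of logarithms to avoid numerical underflow.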

Identifying spam messages is a binary classification problem, because each message is classified as either 'Spam' or 'Not Spam' and nothing else. It is also a supervised learning problem: we feed the model a labelled dataset that it learns from in order to make predictions on new messages.
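As a rough illustration of this supervised, binary setup, the sketch below fits scikit-learn's `CountVectorizer` (Bag of Words) and `MultinomialNB` on a tiny, made-up set of labelled messages. The messages, the 1 = 'Spam' / 0 = 'Not Spam' encoding and the variable names are assumptions for illustration only; this is not the project's own code or dataset, and a real run would use a proper train/test split of the SMS data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled examples: 1 = 'Spam', 0 = 'Not Spam'.
train_messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text WIN now",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class?",
]
train_labels = [1, 1, 0, 0]

# Bag of Words: learn the vocabulary and count words per message.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_messages)

# Supervised step: the model learns from messages paired with their labels.
model = MultinomialNB()
model.fit(X_train, train_labels)

# Predict labels for unseen messages, reusing the training vocabulary.
test_messages = ["Claim your free cash prize", "See you at lunch"]
test_labels = [1, 0]  # hypothetical ground truth for the accuracy check
X_test = vectorizer.transform(test_messages)
predictions = model.predict(X_test)

accuracy = accuracy_score(test_labels, predictions)
print(predictions, accuracy)
```

`fit_transform` learns the vocabulary from the training messages only; `transform` then maps new messages onto that same vocabulary, so words never seen during training are simply ignored.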

Overview

This project has been broken down into the following steps:

Step 1.1: Understanding our dataset

Step 1.2: Data Preprocessing

Step 2.1: Bag of Words (BoW)

Step 2.2: Implementing BoW from scratch

Step 2.3: Implementing Bag of Words in scikit-learn

Step 3.1: Training and testing sets

Step 3.2: Applying Bag of Words processing to our dataset

Step 4.1: Bayes Theorem implementation from scratch

Step 4.2: Naive Bayes implementation from scratch

Step 5: Naive Bayes implementation using scikit-learn

Step 6: Evaluating our model

Step 7: Conclusion
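Steps 2.2 and 4.2 build Bag of Words and Naive Bayes by hand. The sketch below is one possible from-scratch version in plain Python, assuming a simple lowercase `\w+` tokenizer, add-one (Laplace) smoothing, and log probabilities to avoid underflow. The helper names (`tokenize`, `bag_of_words`, `NaiveBayesFromScratch`) and the toy messages are hypothetical and not taken from the project's notebook.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a message and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", message.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and a document-by-word frequency matrix."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


class NaiveBayesFromScratch:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical class)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)  # documents per class, for P(c)
        self.word_counts = {c: Counter() for c in self.class_counts}
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)  # word frequencies per class
            self.vocabulary.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, document, c):
        """log P(c) + sum of log P(w | c); the shared denominator is dropped."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        denominator = self.total_words[c] + len(self.vocabulary)
        for word in tokenize(document):
            if word in self.vocabulary:  # skip words never seen during training
                score += math.log((self.word_counts[c][word] + 1) / denominator)
        return score

    def predict(self, document):
        """Return the class with the highest log score."""
        return max(self.class_counts, key=lambda c: self.log_score(document, c))


# Hypothetical toy data, not the project's SMS dataset.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash, text WIN now",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class?",
]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

vocabulary, matrix = bag_of_words(["Hello world", "hello hello"])
print(vocabulary, matrix)  # ['hello', 'world'] [[1, 1], [2, 0]]

model = NaiveBayesFromScratch().fit(messages, labels)
print(model.predict("Claim your free cash prize"))  # Spam
print(model.predict("See you at lunch"))            # Not Spam
```

Ignoring out-of-vocabulary words mirrors what a fitted vocabulary does in scikit-learn; an alternative design would reserve a smoothed probability for unseen words instead.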