SpamSlayer 💾 :

Problem Statement 💼:

The overwhelming influx of spam messages in email and SMS inboxes significantly hinders communication and productivity. These unsolicited messages often advertise unwanted products, spread misinformation, or attempt phishing scams.

This project aims to develop a robust classifier system to distinguish between legitimate messages (ham) and spam using Natural Language Processing (NLP) and machine learning algorithms. Naive Bayes, a popular probabilistic classifier, will be employed to analyze message content and identify spam indicators based on word frequency and patterns. Additionally, a voting classifier will be implemented to combine the predictions from multiple machine learning models, potentially including Naive Bayes, for enhanced spam detection accuracy.
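The word-frequency idea behind Naive Bayes and the voting step can be sketched in plain Python. This is a minimal illustration written from scratch with only the standard library, not the project's actual implementation (which may well use a machine learning library instead). The names `tokenize`, `NaiveBayes`, `majority_vote`, and the four training messages are hypothetical and made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)          # messages per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


# Made-up training messages, for illustration only.
train_texts = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch today",
    "see you at home tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
spam_guess = model.predict("claim your free prize")
ham_guess = model.predict("see you at lunch")

# In a voting classifier each prediction would come from a different trained
# model; here the list is written by hand to show the combination step only.
combined = majority_vote(["spam", "ham", "spam"])
```

Hard voting is the simplest way to combine models: each one casts a single vote and the most common label wins. A soft-voting variant would average predicted probabilities instead.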

By leveraging NLP techniques and machine learning algorithms, this classifier will effectively categorize incoming messages, ensuring a cleaner inbox and protecting users from malicious content.

Data Dictionary 📄✏ :

The DataSet is taken from the DataSet Link.

`Content:`

The files contain one message per line. Each line is composed of two columns: v1 contains the label (ham or spam) and v2 contains the raw text. This corpus has been collected from free or free-for-research sources on the Internet:

- A collection of 425 SMS spam messages was manually extracted from the Grumbletext Web site. This is a UK forum in which cell phone users make public claims about SMS spam messages, most of them without reporting the actual spam message received. Identifying the text of spam messages in the claims is a hard and time-consuming task, and it involved carefully scanning hundreds of web pages. The Grumbletext Web site is: [Web Link].
- A subset of 3,375 randomly chosen SMS ham messages from the NUS SMS Corpus (NSC), which is a dataset of about 10,000 legitimate messages collected for research at the Department of Computer Science at the National University of Singapore. The messages largely originate from Singaporeans, mostly students attending the University. These messages were collected from volunteers who were made aware that their contributions were going to be made publicly available. The NUS SMS Corpus is available at: [Web Link].
- A list of 450 SMS ham messages collected from Caroline Tag's PhD Thesis, available at [Web Link].
- Finally, we have incorporated the SMS Spam Corpus v.0.1 Big. It has 1,002 SMS ham messages and 322 spam messages and it is publicly available at: [Web Link]. This corpus has been used in academic research.
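The two-column layout described above (v1 for the label, v2 for the raw text) can be read with Python's standard `csv` module. The sketch below assumes the file is a comma-separated file with a `v1,v2` header row; the actual file name, delimiter, and encoding are not stated here, so an in-memory sample with made-up rows stands in for the real file. The helper name `load_messages` is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the real file. The header names v1 and v2
# come from the description above; the rows do not come from the dataset.
SAMPLE = '''v1,v2
ham,"Ok, see you at lunch"
spam,Free entry! Text WIN to claim your prize
ham,Running late. Be there soon
'''


def load_messages(handle):
    """Read (label, text) pairs from a CSV handle with v1 and v2 columns."""
    rows = []
    for record in csv.DictReader(handle):
        label = record["v1"].strip().lower()
        if label not in {"ham", "spam"}:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((label, record["v2"]))
    return rows


messages = load_messages(io.StringIO(SAMPLE))
label_counts = Counter(label for label, _ in messages)
print(label_counts)
```

To read a real file, the `io.StringIO(SAMPLE)` argument would be replaced with an opened file handle. Checking the label counts right after loading is a cheap way to see how imbalanced ham and spam are before training.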

Requirements💻 :

Ensure you have the following dependencies installed:

- Python (version 3.12)
- Jupyter Notebook or PyCharm
- Other dependencies (see requirements.txt)

You can install the required Python packages using:

```
pip install -r requirements.txt
```

Setup 💿:

Clone the repository:

```
git clone https://github.com/SINGHxTUSHAR/SpamSlayer.git
cd SpamSlayer
```

Create a virtual environment (optional but recommended):

```
python -m venv venv
```

Activate the virtual environment:
- On Windows:
```
venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```

Contributing 📌:

If you'd like to contribute to this project, please follow the standard GitHub fork and pull request process. Contributions, issues, and feature requests are welcome!

Suggestion🚀:

If you have any suggestions for me related to this project, feel free to contact me at [email protected] or LinkedIn.

License 📝:

This project is licensed under the MIT License - see the LICENSE file for details.
