readme.md

Spam Text Classification Project => Full Code

The project was conducted on the Kaggle platform.

Remove non-text characters: Strip emojis, digits, and punctuation such as periods.
Lowercase: Convert all words to lowercase, because otherwise the model treats the same word in different cases as different words.
Stopword removal: Stopwords are common words that carry little information for text classification (e.g., the, we, a, will), so they are removed.
Stemming: The bag-of-words model I use in this project relies on how often words occur, so words with the same root (e.g., runnable, running, run) are reduced to a common stem and counted together.
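
The four preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the stopword list, the suffix list, and the `naive_stem` and `preprocess` helpers are all assumptions made for this example. A real project would more likely use a full stopword list and an established stemmer.

```python
import re

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "we", "a", "will"}

# Suffixes handled by the toy stemmer below (assumption).
SUFFIXES = ("able", "ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix. A toy stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final letter, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


def preprocess(text):
    """Apply the four steps in order and return a list of stemmed tokens."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # 1. drop emojis, digits, punctuation
    text = text.lower()                       # 2. lowercase
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 3. stopwords
    return [naive_stem(t) for t in tokens]    # 4. stem


tokens = preprocess("We will be RUNNING 2 free offers!!! 😀")
```

With this toy stemmer, "running" and "runnable" both map to "run", which is the effect the stemming step is meant to have on the word counts.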

Collect the words from all texts, count how often each word occurs, and select the most frequent words (the cluster words).
If, for example, 1000 cluster words are selected, the number of times each of these 1000 words occurs in a text becomes that text's feature vector for the classification problem.
A classifier is then trained on the extracted features.
The word counts are computed with CountVectorizer from scikit-learn.
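
To make the feature extraction concrete, here is a small sketch of the idea written with the standard library only, so each step is visible. It is not the project's code: `build_vocabulary`, `to_features`, and the three sample documents are hypothetical, and the vocabulary size is 2 instead of 1000 to keep the output readable. In scikit-learn, the `max_features` parameter of `CountVectorizer` performs a comparable selection of the most frequent terms.

```python
from collections import Counter


def build_vocabulary(token_lists, size):
    """Return the `size` most frequent words across all texts (the cluster words)."""
    totals = Counter()
    for tokens in token_lists:
        totals.update(tokens)
    return [word for word, _ in totals.most_common(size)]


def to_features(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one text."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Already-preprocessed sample texts (made up for this example).
docs = [
    ["free", "prize", "free", "call"],
    ["text", "me", "later"],
    ["free", "call", "now"],
]

vocabulary = build_vocabulary(docs, size=2)
features = [to_features(d, vocabulary) for d in docs]
```

Each row of `features` has one count per cluster word, and these rows are what a classifier would be trained on.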