Vectorizing text messages and training Multinomial Naive Bayes and AdaBoost classifiers to label text messages as either "spam" or "ham".
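A minimal sketch of that pipeline, assuming a hypothetical `spam.csv` with `label` and `message` columns (file name and column names are illustrative, not from the original project):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical input: a CSV with "label" ("spam"/"ham") and "message" columns.
df = pd.read_csv("spam.csv")

# Turn raw messages into token-count vectors.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(df["message"])
y = (df["label"] == "spam").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Multinomial Naive Bayes on the count vectors.
nb = MultinomialNB()
nb.fit(X_train, y_train)
print("Naive Bayes accuracy:", nb.score(X_test, y_test))

# AdaBoost ensemble on the same features.
ada = AdaBoostClassifier(n_estimators=100)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```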
Parsing reviews with BeautifulSoup, lemmatizing the text, and visualizing the most common words in positive and negative reviews with word clouds. Training a Logistic Regression classifier with a 0.5 probability threshold to separate positive from negative reviews, and identifying the misclassified reviews.
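A sketch of one way this could look, assuming hypothetical `positive.review` / `negative.review` XML files with `<review_text>` tags (the file names and tag layout are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize  # may require nltk.download("punkt") and nltk.download("wordnet")
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from wordcloud import WordCloud

lemmatizer = WordNetLemmatizer()

def load_reviews(path):
    # Hypothetical XML layout: each review sits inside a <review_text> tag.
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    return [tag.get_text() for tag in soup.find_all("review_text")]

def lemmatize(text):
    tokens = word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t.isalpha())

positive = [lemmatize(r) for r in load_reviews("positive.review")]
negative = [lemmatize(r) for r in load_reviews("negative.review")]

# Word clouds of the most common words in each class.
for name, docs in [("positive", positive), ("negative", negative)]:
    cloud = WordCloud(width=800, height=400).generate(" ".join(docs))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(name)
    plt.show()

texts = positive + negative
labels = np.array([1] * len(positive) + [0] * len(negative))

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test, t_train, t_test = train_test_split(
    X, labels, texts, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Apply the 0.5 probability threshold explicitly and collect misclassified reviews.
probs = model.predict_proba(X_test)[:, 1]
preds = (probs >= 0.5).astype(int)
misclassified = [t for t, p, y in zip(t_test, preds, y_test) if p != y]
print(f"{len(misclassified)} misclassified reviews out of {len(y_test)}")
```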
Using PCA and LSA to reduce the dimensionality of tokenized text vectors. PCA projects book title keywords onto a two-dimensional space, revealing two main axes: one containing social/historical keywords, the other scientific and data-driven ones.
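A rough sketch of the dimensionality reduction, assuming a hypothetical `book_titles.txt` with one title per line (the file name is an assumption):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: one book title per line.
with open("book_titles.txt", encoding="utf-8") as f:
    titles = [line.strip() for line in f if line.strip()]

# TF-IDF vectors over the title keywords.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(titles)

# LSA: TruncatedSVD works directly on the sparse matrix.
lsa = TruncatedSVD(n_components=2)
X_lsa = lsa.fit_transform(X)
print("LSA explained variance ratio:", lsa.explained_variance_ratio_)

# PCA needs a dense matrix; project keyword vectors (terms x titles) to 2D
# so each keyword gets a coordinate along the two main axes.
terms = vectorizer.get_feature_names_out()
Z = PCA(n_components=2).fit_transform(X.T.toarray())

plt.scatter(Z[:, 0], Z[:, 1], s=5)
for i, term in enumerate(terms):
    plt.annotate(term, (Z[i, 0], Z[i, 1]), fontsize=6)
plt.show()
```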
Modifying articles by replacing words with randomly sampled alternatives under a second-order Markov assumption. The text is imported from HTML, tokenized, and converted into a trigram dictionary in which every pair of surrounding words maps to the middle words observed between them and their counts (later converted to probabilities).
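A sketch of the trigram-based "article spinning", assuming a hypothetical article URL (the URL and the 30% replacement rate are illustrative):

```python
import random
from collections import defaultdict
from urllib.request import urlopen

from bs4 import BeautifulSoup
from nltk.tokenize import word_tokenize  # may require nltk.download("punkt")

# Hypothetical source URL for the article to modify.
URL = "https://example.com/article.html"
text = BeautifulSoup(urlopen(URL).read(), "html.parser").get_text()
tokens = word_tokenize(text.lower())

# Trigram dictionary: (previous word, next word) -> {middle word: count}.
trigrams = defaultdict(lambda: defaultdict(int))
for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
    trigrams[(prev, nxt)][mid] += 1

# Convert counts into probabilities.
trigram_probs = {}
for context, counts in trigrams.items():
    total = sum(counts.values())
    trigram_probs[context] = {w: c / total for w, c in counts.items()}

def sample(dist):
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs)[0]

# Replace roughly 30% of the middle words with a word sampled from the
# distribution conditioned on its two neighbours (second-order Markov assumption).
spun = list(tokens)
for i in range(1, len(tokens) - 1):
    if random.random() < 0.3:
        spun[i] = sample(trigram_probs[(tokens[i - 1], tokens[i + 1])])
print(" ".join(spun))
```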
- Pandas
- Numpy
- NLTK
- matplotlib
- sklearn
- wordcloud
- bs4 (BeautifulSoup)