In this repository we perform text classification and clustering experiments on news articles, and generate a word cloud for each article category.
The input consists of 2225 documents from a news site, corresponding to stories in five topical areas from 2004-2005.
Document Categories
- Business
- Entertainment
- Politics
- Sport
- Tech
The first line of each document is the title; the rest is the content of the article.
The whole procedure consists of:
- Create a data set of all documents
- Text pre-processing
  - Remove special characters, lower-case
  - Remove stopwords
  - Lemmatization
  - Stemming
  - Tokenization
- Generate word clouds
- Vectorization
- Classification and clustering
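The pre-processing steps above can be sketched as follows. This is a minimal illustration with a tiny hand-picked stopword set; the repository presumably uses a full stopword list and NLTK's lemmatizer/stemmer (an assumption), which are omitted here to keep the sketch self-contained:

```python
import re

# Tiny illustrative stopword set; a real run would use NLTK's or
# scikit-learn's English stopword list.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}

def preprocess(text):
    """Strip special characters, lower-case, tokenize, drop stopwords.
    Lemmatization and stemming (e.g. WordNetLemmatizer, PorterStemmer)
    would follow as extra per-token steps."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()  # special chars + lower case
    tokens = text.split()                             # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal
```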
I also implemented a KNN classifier using a max heap, but it was too slow for this data set.
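That implementation is not reproduced here; a sketch of the idea, using `heapq` with negated distances so the heap acts as a bounded max-heap and the farthest of the kept neighbours is evicted first:

```python
import heapq
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points,
    tracked with a size-k max-heap of (negated) squared distances."""
    heap = []  # entries: (-distance, label); heap[0] is the farthest kept
    for xi, yi in zip(X_train, y_train):
        d = sum((a - b) ** 2 for a, b in zip(xi, x))  # squared Euclidean
        if len(heap) < k:
            heapq.heappush(heap, (-d, yi))
        elif -d > heap[0][0]:                 # closer than current farthest
            heapq.heapreplace(heap, (-d, yi))
    labels = [lbl for _, lbl in heap]
    return Counter(labels).most_common(1)[0][0]
```

The heap keeps per-query work at O(n log k) instead of sorting all n distances, but a pure-Python loop over every training vector is still slow for thousands of high-dimensional documents, which matches the observation above.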
Classification
- Classifiers: MultinomialNB, SVM, Random Forest, KNN
- Vectorization: Bag of Words, TF-IDF
- Dimensionality reduction: PCA, SVD, and ICA
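A hedged sketch of one such classification pipeline using scikit-learn (the exact parameters and classifier choices in the repository are not shown, so these are illustrative): TF-IDF vectorization, SVD for dimensionality reduction, and a linear SVM.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# TF-IDF -> TruncatedSVD -> linear SVM. TruncatedSVD works directly on
# the sparse TF-IDF matrix, unlike PCA, which needs a dense input.
# MultinomialNB would instead be paired with raw counts or TF-IDF,
# since it rejects the negative values SVD can produce.
clf = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),  # illustrative; use ~100+ in practice
    LinearSVC(),
)
```

With the full data set, `n_components` would typically be in the hundreds; 2 is only to keep the toy example runnable.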
Clustering
- Clusterer: K-means
- Vectorization: Bag of Words, TF-IDF, Word2vec
- Dimensionality reduction: PCA, SVD, and ICA
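The clustering side can be sketched the same way (again an illustrative scikit-learn pipeline, not the repository's exact code): TF-IDF, SVD, then K-means with one cluster per category.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Five clusters to mirror the five document categories.
km = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),  # illustrative dimensionality
    KMeans(n_clusters=5, n_init=10, random_state=0),
)
```

Cluster quality against the true categories could then be measured with label-agnostic scores such as adjusted Rand index or homogeneity.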