chrisjmccormick/LSA_Classification

This is a simple text classification example using Latent Semantic Analysis (LSA), written in Python and using the scikit-learn library.

This code goes along with an LSA tutorial blog post I wrote here.

Steps:

  1. [Optional]: Run getReutersTextArticles.py to download the Reuters dataset and extract the raw text. This step has already been performed for you, and the dataset is stored in the 'data' folder.
  2. Run runClassification_LSA.py to apply LSA to the dataset and then test classification accuracy.
  3. Run inspect_LSA.py to gain some insight into what LSA is doing.
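
For orientation, here is a minimal sketch of what an LSA classification pipeline can look like in scikit-learn. It is not the code from the scripts above: the toy corpus, the labels, the choice of 2 components, and the k-NN classifier with cosine distance are all assumptions made for illustration. The usual recipe is tf-idf vectors, then truncated SVD (the LSA step), then length normalization, then a classifier on the reduced vectors.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for the Reuters articles.
train_docs = [
    "wheat corn harvest grain crop",
    "grain exports wheat farmers crop",
    "corn farmers harvest grain silo",
    "oil crude barrel opec petroleum",
    "crude oil prices opec output",
    "petroleum barrel output oil refinery",
]
train_labels = ["grain", "grain", "grain", "oil", "oil", "oil"]
test_docs = ["wheat harvest crop farmers", "opec crude barrel refinery"]

# 1. Convert the raw text to tf-idf vectors.
vectorizer = TfidfVectorizer()
tfidf_train = vectorizer.fit_transform(train_docs)
tfidf_test = vectorizer.transform(test_docs)

# 2. Apply LSA: project the tf-idf vectors onto a few SVD components.
#    (2 components is an assumption suited to this tiny corpus.)
svd = TruncatedSVD(n_components=2, random_state=0)
normalizer = Normalizer(copy=False)
X_train = normalizer.fit_transform(svd.fit_transform(tfidf_train))
X_test = normalizer.transform(svd.transform(tfidf_test))

# 3. Classify in the reduced space with k-nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3, algorithm="brute", metric="cosine")
knn.fit(X_train, train_labels)
predicted = knn.predict(X_test)
print(list(predicted))

# 4. Inspect the components: list the terms with the largest absolute weights.
index_to_term = {i: t for t, i in vectorizer.vocabulary_.items()}
for comp_num, component in enumerate(svd.components_):
    top = sorted(range(len(component)), key=lambda i: -abs(component[i]))[:5]
    print(comp_num, [index_to_term[i] for i in top])
```

On a real dataset you would use many more components (on the order of a hundred) and measure accuracy on a held-out test set instead of two hand-written documents.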