This is a simple text classification example using Latent Semantic Analysis (LSA), written in Python with the scikit-learn library.

This code goes along with an LSA tutorial blog post I wrote here.

Steps:

  1. [Optional]: Run getReutersTextArticles.py to download the Reuters dataset and extract the raw text. This step has already been performed for you, and the dataset is stored in the 'data' folder.
  2. Run runClassification_LSA.py to apply LSA to the dataset and then test classification accuracy.
  3. Run inspect_LSA.py to gain some insight into what LSA is doing.