tshi04/SeaNMF

SeaNMF

This is the implementation of the paper:

  • Tian Shi, Kyeongpil Kang, Jaegul Choo and Chandan K. Reddy, "Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations", In Proceedings of the International Conference on World Wide Web (WWW), Lyon, France, April 2018. PDF

Requirements

  • Python 3.5.2
  • argparse

Usage

Data Process

  • Tokenize the text with NLTK, spaCy or CoreNLP.
  • Remove special characters.
  • Remove stop-words.
  • Edit the arguments in data_process.py to match your data.
  • Run python3 data_process.py to prepare the document-term matrix and vocabulary.
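
The preprocessing steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in data_process.py: the regex tokenizer stands in for NLTK, spaCy or CoreNLP, the stop-word list is a tiny placeholder, and the names preprocess and build_doc_term are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); NLTK and spaCy ship fuller ones.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "for"}


def preprocess(text):
    """Lowercase, replace special characters with spaces, split, drop stop-words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def build_doc_term(docs):
    """Return (vocab, rows): vocab maps term -> id, rows[i] maps term id -> count."""
    tokenized = [preprocess(d) for d in docs]
    vocab = {}
    for tokens in tokenized:
        for tok in tokens:
            # Assign ids in order of first appearance.
            vocab.setdefault(tok, len(vocab))
    # Sparse document-term matrix: one {term_id: count} dict per document.
    rows = [dict(Counter(vocab[tok] for tok in tokens)) for tokens in tokenized]
    return vocab, rows


docs = ["The cat sat on the mat!", "A dog chased the cat."]
vocab, rows = build_doc_term(docs)
print(vocab)
print(rows)
```

The sparse per-document dictionaries keep memory use low for short texts, where each document touches only a handful of vocabulary entries.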

Train

  • Run python3 train.py --help to see the full list of options.

Evaluation

  • Run python3 vis_topic.py to calculate the PMI and visualize the top keywords in each topic.
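
For readers unfamiliar with the PMI score, the sketch below shows one common way to compute it for a topic: average the pointwise mutual information over all pairs of top keywords, with probabilities estimated from document-level co-occurrence. It is a minimal sketch, not the code in vis_topic.py; the function name topic_pmi and the eps smoothing constant are assumptions, and the script may estimate probabilities or handle zero counts differently.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    docs is a list of token lists; probabilities are the fraction of
    documents containing the word (or both words of a pair).
    eps is an assumed smoothing constant that avoids log(0).
    """
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    # A topic with fewer than two words has no pairs to score.
    return sum(scores) / len(scores) if scores else 0.0


docs = [["cat", "dog"], ["cat", "dog"], ["fish"], ["bird"]]
print(topic_pmi(["cat", "dog"], docs))   # words that co-occur: positive score
print(topic_pmi(["cat", "fish"], docs))  # words that never co-occur: strongly negative
```

Higher average PMI indicates that a topic's top keywords tend to appear in the same documents, which is usually read as a sign of a more coherent topic.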
