robert6217/TKU-homework

TKU-homework: Text Analytics

Environment

  • System : Windows 10
  • Python : 3.6.0
  • Java JRE : 1.8.0_161
  • Zookeeper : 3.4.11
  • Kafka : 1.1.0 (Scala 2.11 build, kafka_2.11-1.1.0)
  • ChromeDriver

Flow

  1. Start Zookeeper and Kafka
  2. Create your topics
     \kafka\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic YOUR_TOPIC
  3. Write the Python web crawler with pykafka
  4. Build a TF-IDF model for each article
  5. Pipe the results to the Python text analytics
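
The crawler-to-analytics hand-off in the steps above can be sketched roughly as follows. This is a minimal illustration, not the code in newsCrawler.py or analytics.py: the helper names (encode_article, decode_article, publish, consume), the JSON message layout, and the broker address localhost:9092 (Kafka's default port) are all assumptions. pykafka is imported inside the functions that need it, so the two serialization helpers can be tried without a running broker.

```python
import json


def encode_article(url, text):
    """Serialize one crawled article to UTF-8 JSON bytes (Kafka messages are bytes)."""
    return json.dumps({"url": url, "text": text}, ensure_ascii=False).encode("utf-8")


def decode_article(payload):
    """Inverse of encode_article: bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


def publish(articles, topic_name=b"YOUR_TOPIC", hosts="localhost:9092"):
    """Send (url, text) pairs to a Kafka topic. Needs a running broker."""
    from pykafka import KafkaClient  # lazy import: only needed when talking to Kafka

    client = KafkaClient(hosts=hosts)      # hosts value is an assumed default
    topic = client.topics[topic_name]      # topic created with kafka-topics.bat
    with topic.get_sync_producer() as producer:
        for url, text in articles:
            producer.produce(encode_article(url, text))


def consume(topic_name=b"YOUR_TOPIC", hosts="localhost:9092"):
    """Yield decoded articles from the topic; blocks while waiting for messages."""
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    consumer = client.topics[topic_name].get_simple_consumer()
    for message in consumer:
        if message is not None:
            yield decode_article(message.value)
```

With Zookeeper and Kafka running, the crawler side would call publish(...) with whatever it scraped, and the analytics side would iterate over consume(). A synchronous producer is used here because it is the simplest to reason about; it is slower than pykafka's default asynchronous producer.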

START

  • Crawler
python newsCrawler.py p
python newsCrawler.py e
  • Analytics
python analytics.py p
python analytics.py e
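
The "TF-IDF model for each article" step from the Flow section could look something like the sketch below. It is not taken from analytics.py: the function name tf_idf, the whitespace tokenization, and the plain tf = count / length, idf = log(N / df) weighting are assumptions chosen for brevity. Real news text, especially non-English text, would likely need a proper tokenizer first.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Two toy "articles", already split into tokens (hypothetical data).
docs = [
    "kafka streams news to python".split(),
    "python builds a tfidf model".split(),
]
model = tf_idf(docs)
```

A term that occurs in every article (here "python") gets weight zero, while terms unique to one article get the highest weights, which is what makes the scores useful for picking out each article's keywords.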