Full explanation of how to stream Twitter data by keyword using Kafka and Python on Windows
1. Install Java and Kafka
   - Download Java Development Kit 8 from the Oracle website and install it
   - Download Apache Kafka from the Apache Kafka website and extract it to the `C:\` directory
   - Edit the environment variables and add `C:\<Your Kafka Version>\bin\windows` to the `Path` variable
   - Create two folders named `kafka` and `zookeeper`, at `C:\<Your Kafka Version>\data\kafka` and `C:\<Your Kafka Version>\data\zookeeper`
   - Edit the Zookeeper config file at `C:\<Your Kafka Version>\config\zookeeper.properties`, search for `dataDir`, and change it to `dataDir=C:/<Your Kafka Version>/data/zookeeper` (e.g. `C:/kafka_2.13-2.6.0/data/zookeeper`)
   - Edit the server properties file at `C:\<Your Kafka Version>\config\server.properties`, search for `log.dirs`, and change it to `log.dirs=C:/<Your Kafka Version>/data/kafka`
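After both edits, the relevant lines of the two config files should look roughly like this (replace the placeholder with your actual Kafka folder name, e.g. `kafka_2.13-2.6.0`):

```properties
# config/zookeeper.properties
dataDir=C:/<Your Kafka Version>/data/zookeeper

# config/server.properties
log.dirs=C:/<Your Kafka Version>/data/kafka
```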
2. Apply for a Twitter Developer account to get the API keys
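Once the application is approved, the developer dashboard gives you a consumer key/secret and an access token/secret. One way to keep them out of the scripts is a small credentials module; the file and variable names below are only an illustration, not necessarily what the repository's scripts expect:

```python
# twitter_credentials.py -- hypothetical module; fill these in with the values
# from your Twitter Developer dashboard and keep them out of version control.
CONSUMER_KEY = "<your_consumer_key>"
CONSUMER_SECRET = "<your_consumer_secret>"
ACCESS_TOKEN = "<your_access_token>"
ACCESS_TOKEN_SECRET = "<your_access_token_secret>"
```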
3. Download XAMPP or MySQL Workbench to view the database
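If you want to prepare the MySQL side up front, a table along these lines is enough to hold the streamed tweets. Everything here (the `twitter_db` database, the `tweets` table and its columns, and XAMPP's default `root` user with an empty password) is an assumption for illustration; adjust it to whatever schema `consumers.py` actually writes to:

```python
# create_table.py -- hypothetical helper script; the database name, credentials
# and column layout are assumptions, not taken from the repository.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", password="")
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS twitter_db")
cursor.execute(
    """
    CREATE TABLE IF NOT EXISTS twitter_db.tweets (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        username VARCHAR(255),
        tweet TEXT,
        created_at DATETIME
    )
    """
)
conn.commit()
conn.close()
```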
4. Run the pipeline
   - Open a command prompt in the Kafka installation directory and start Zookeeper with `zookeeper-server-start.bat config\zookeeper.properties`
   - Open another command prompt and start the Kafka broker with `kafka-server-start.bat config\server.properties`
   - Create a Kafka topic with `kafka-topics.bat --zookeeper localhost:2181 --create --topic <topic_name> --partitions <number_of_partitions> --replication-factor 3` (note that the replication factor cannot exceed the number of running brokers, so use `--replication-factor 1` when running a single local broker)
   - Run `python producers.py` to start streaming tweets for your keyword into the Kafka topic (a hedged sketch of such a producer is given after this list)
   - Run `python consumers.py` to read the tweets from the topic and store them in MySQL (see the consumer sketch after this list)
   - Check the MySQL database; once enough data has been collected, run the ETL dump with `python dump.py` (see the dump sketch after this list)
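The repository's `producers.py` is not reproduced here, but the following is a minimal sketch of what a keyword producer can look like, assuming tweepy 3.x and kafka-python. The topic name, keyword, and the credentials module from step 2 are illustrative placeholders:

```python
# producer_sketch.py -- a minimal sketch, not the repository's actual producers.py.
import json

import tweepy
from kafka import KafkaProducer

from twitter_credentials import (  # hypothetical module from step 2
    CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
)

TOPIC = "tweets"    # must match the topic created with kafka-topics.bat
KEYWORD = "python"  # the keyword you want to track

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


class KafkaStreamListener(tweepy.StreamListener):
    """Forwards every matching tweet to the Kafka topic as JSON."""

    def on_status(self, status):
        producer.send(TOPIC, {
            "username": status.user.screen_name,
            "tweet": status.text,
            "created_at": str(status.created_at),
        })

    def on_error(self, status_code):
        # Returning False disconnects the stream (e.g. on rate limiting).
        return False


auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

stream = tweepy.Stream(auth=auth, listener=KafkaStreamListener())
stream.filter(track=[KEYWORD])  # blocks and streams tweets containing the keyword
```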
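Similarly, a minimal consumer sketch, assuming kafka-python and mysql-connector-python and the hypothetical `twitter_db.tweets` table from step 3; again, this is not the repository's actual `consumers.py`:

```python
# consumer_sketch.py -- a minimal sketch of a Kafka-to-MySQL consumer.
import json

import mysql.connector
from kafka import KafkaConsumer

TOPIC = "tweets"  # same topic the producer writes to

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

conn = mysql.connector.connect(
    host="localhost", user="root", password="", database="twitter_db"
)
cursor = conn.cursor()

for message in consumer:  # blocks, yielding one record per tweet
    tweet = message.value
    cursor.execute(
        "INSERT INTO tweets (username, tweet, created_at) VALUES (%s, %s, %s)",
        (tweet["username"], tweet["tweet"], tweet["created_at"]),
    )
    conn.commit()  # committing per message keeps the example simple
```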
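Finally, a minimal sketch of an ETL dump using pandas, only to illustrate the idea behind `dump.py`; the query, connection details, and output file name are assumptions:

```python
# dump_sketch.py -- a minimal sketch of dumping the collected tweets to CSV.
import mysql.connector
import pandas as pd

conn = mysql.connector.connect(
    host="localhost", user="root", password="", database="twitter_db"
)

# Pull everything collected so far and write it out as a CSV snapshot.
df = pd.read_sql("SELECT * FROM tweets", conn)
df.to_csv("tweets_dump.csv", index=False)

conn.close()
print(f"Dumped {len(df)} rows to tweets_dump.csv")
```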