Sketchy Data Pipelines

Requirements:

Python 3.x
Kafka 2.1.0
Elasticsearch 6.7.0
Flask
Client APIs and packages: pip install -r requirements.txt

Setup:

Start Zookeeper

zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties

Start Kafka

kafka-server-start /usr/local/etc/kafka/server.properties

Create topics

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter2kafka
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka2sketch

Start Elasticsearch
```
elasticsearch
```

Create Elasticsearch index

curl -XPUT "http://localhost:9200/tweets"

Navigate to the 'code' directory and start the services:

python streaming/twitter_to_kafka.py
python streaming/kafka_to_elastic.py
python app/application.py

Initialize the stateful count-min sketch:

curl -v http://127.0.0.1:5000/initialize
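
For readers unfamiliar with the data structure, here is a minimal sketch of how a count-min sketch works in general. It is not the implementation in this repository's sketch directory; the class name, the default width and depth, and the SHA-256 based hashing are assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: `depth` rows of `width` counters each.

    Illustrative only; names and defaults are hypothetical, not taken
    from this repository.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salt the hash with the row number so each row behaves like
        # a different hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest available estimate. It never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for tag in ["#kafka", "#python", "#kafka"]:
    sketch.add(tag)
```

Calling sketch.estimate("#kafka") returns a value that is at least the true count of 2. A wider table lowers the chance of overestimates, and more rows lower the chance that every row collides for the same item.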

Call the API endpoints, which are listed as routes in app/application.py, to retrieve the data.
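
The endpoints can also be called from Python instead of curl. The helper below is a rough sketch using only the standard library. It assumes the Flask app is serving plain HTTP on 127.0.0.1:5000 and that the routes accept GET requests; the function names are hypothetical, and /initialize is the only route named in this README.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed base address of the Flask app (development server default).
BASE_URL = "http://127.0.0.1:5000/"


def endpoint_url(route, base_url=BASE_URL):
    """Build the full URL for a route such as "initialize"."""
    return urljoin(base_url, route.lstrip("/"))


def call_endpoint(route, timeout=10):
    """Send a GET request to the route and return the response body as text."""
    with urlopen(endpoint_url(route), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires app/application.py to be running):
# print(call_endpoint("initialize"))
```

Replace "initialize" with any other route defined in app/application.py.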
