Sentiment analysis of Twitter tweets, powered by
- Apache Cassandra and Apache Pulsar running locally, or
- Astra DB and Astra Streaming, and
- Spring Boot based applications
- consumes a filtered stream of tweets from twitter api
- publishes the tweets to pulsar topic from-twitterapi
- pulsar function
- subscribes to pulsar topic from-twitterapi and does content based routing.
- tweets in english language get published to to-en-sentimentr topic.
- all other tweets get published to to-db topic.
- pulsar sink
- subscribes to pulsar topic to-db and streams/inserts all tweets into cassandra.
twitter-sentimentr-service-en:
- subscribes to to-en-sentimentr topic.
- does tweet sentiment analysis with CoreNLP library
- publishes tweets to to-db topic
- subscribes to to-db topic and sends English tweets in real time via websocket to all connected browsers.
- connects to apache cassandra to query tweets and sentiment over all languages
- visualizes tweets and the calculated sentiment
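The content-based routing rule described above (English tweets go to to-en-sentimentr, all other tweets go to to-db) can be sketched as a small shell function. This is only an illustration of the rule: the actual router is the Pulsar function, and `route_topic` is a hypothetical name that does not exist in the project.

```shell
# Hypothetical helper: map a tweet's language code to its target topic.
route_topic() {
  case "$1" in
    en) echo "to-en-sentimentr" ;;  # English tweets go to sentiment analysis first
    *)  echo "to-db" ;;             # everything else goes straight to the database topic
  esac
}

route_topic en   # prints to-en-sentimentr
route_topic de   # prints to-db
```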
- create a Twitter developer account to obtain credentials for accessing the Twitter API: https://developer.twitter.com/en/apps. Edit your credentials and remove the extension '.TEMPLATE' from the file
- you can adapt the Twitter filtered stream rule, which defines the pattern of tweets to collect from Twitter. Define it here.
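Removing a '.TEMPLATE' extension can be done by hand, or with a small helper like the sketch below. It copies rather than renames, so the template stays available; that is a choice made here, not something the project prescribes, and `strip_template` is a hypothetical name.

```shell
# Hypothetical helper: copy "<name>.TEMPLATE" to "<name>" and print the new path.
strip_template() {
  src="$1"
  dest="${src%.TEMPLATE}"          # drop a trailing ".TEMPLATE", if present
  if [ "$src" = "$dest" ]; then
    echo "not a .TEMPLATE file: $src" >&2
    return 1
  fi
  cp "$src" "$dest" && echo "$dest"
}
```

For example, `strip_template env-file-docker.TEMPLATE` would create `env-file-docker` next to the template, which you then edit.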
- download apache cassandra
- install cassandra
- start cassandra
cd apache-cassandra-4.0.3
bin/cassandra -f
- create keyspace and table
cqlsh
CREATE KEYSPACE twitter WITH replication = { 'class': 'NetworkTopologyStrategy', 'datacenter1': 1 };
CREATE TABLE twitter.tweet_by_lang (
lang text,
createdat text,
id text,
sentiment int,
tweet text,
PRIMARY KEY (lang, createdat, id)
) WITH CLUSTERING ORDER BY (createdat DESC, id DESC);
CREATE TABLE twitter.tweet_by_id (
lang text,
createdat text,
id text,
sentiment int,
tweet text,
PRIMARY KEY (id)
);
exit
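The tweet_by_lang table is partitioned by lang and clustered by createdat in descending order, so a query for one language returns rows in descending createdat order. The sketch below only builds such a query string; `latest_tweets_query` is a hypothetical helper, and whether "descending createdat" means "newest first" depends on the createdat text values sorting chronologically, which is an assumption.

```shell
# Hypothetical helper: build a CQL query for the first N rows of one language partition.
# $1 = language code, $2 = row limit (defaults to 10)
latest_tweets_query() {
  printf "SELECT createdat, id, sentiment, tweet FROM twitter.tweet_by_lang WHERE lang = '%s' LIMIT %d;\n" "$1" "${2:-10}"
}

latest_tweets_query en 5
# To run it against the local Cassandra instance (not run here):
# cqlsh -e "$(latest_tweets_query en 5)"
```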
- download apache pulsar
- install pulsar
- start pulsar
cd apache-pulsar-2.8.2
bin/pulsar standalone
- create topics
bin/pulsar-admin topics create persistent://public/default/from-twitterapi
bin/pulsar-admin topics create persistent://public/default/to-en-sentimentr
bin/pulsar-admin topics create persistent://public/default/to-db
- list topics
bin/pulsar-admin topics list public/default
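The three topics differ only in their last path segment, so their fully qualified names can be generated in a loop. A minimal sketch, assuming the public/default namespace used above; `topic_names` is a hypothetical helper.

```shell
# Hypothetical helper: print the fully qualified name of each topic in public/default.
topic_names() {
  for topic in from-twitterapi to-en-sentimentr to-db; do
    printf 'persistent://public/default/%s\n' "$topic"
  done
}

topic_names
# To create them in one go from the Pulsar base folder (not run here):
# topic_names | while read -r name; do bin/pulsar-admin topics create "$name"; done
```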
- pull the application container images (important to start the apps before you continue with function and sink setup!)
docker pull dieterfl/twitter-sentimentr-en-service:latest
docker pull dieterfl/twitter-ui:latest
docker pull dieterfl/twitter-reader:latest
- review env-file-docker and remove the '.TEMPLATE' extension - ensure spring profile 'default' is configured
- start your applications
sh run-apps-in-docker.sh
- once the applications are up and running, open twitter-ui in your browser: https://localhost:8081
- create a connectors folder in the pulsar base folder
mkdir connectors
- download DataStax Apache Pulsar Connector: https://downloads.datastax.com/#apc
- copy the connector into the just created connectors folder
cp PATH-TO-DOWNLOADED-CONNECTOR/cassandra-enhanced-pulsar-sink-1.5.0-nar.nar PULSAR-BASE-FOLDER/connectors
- create the cassandra sink (adapt the command line properties)
bin/pulsar-admin sinks reload
bin/pulsar-admin sinks available-sinks
bin/pulsar-admin sinks create --name tweet-db-sink --sink-type cassandra-enhanced --inputs persistent://public/default/to-db --sink-config-file /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/pulsar-config-files/tweet-db-sink.yml
bin/pulsar-admin sinks list
- download tweet-router function or use the one you have built yourself
- create the tweet-router function (adapt the command line properties)
bin/pulsar-admin functions create --jar /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/twitter-router-function/target/twitter-router-0.0.1-SNAPSHOT.jar --function-config-file /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/twitter-router-function/local-function-config.yaml
- check the function status
bin/pulsar-admin functions getstatus --name tweet-router
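The sink and function commands above contain absolute paths from the author's machine, which is why they need adapting. One way to do that is to derive the paths from a single variable, as in the sketch below. `PROJECT_DIR` and `print_setup_commands` are assumptions introduced here, and the default location is only a guess at where you checked out the repository; the function prints the commands rather than running them.

```shell
# Assumption: PROJECT_DIR points at your local checkout of this repository.
PROJECT_DIR="${PROJECT_DIR:-$HOME/twitter-sentimentr}"

# Hypothetical helper: print (not run) the sink and function commands
# with all file paths resolved under PROJECT_DIR.
print_setup_commands() {
  echo "bin/pulsar-admin sinks create --name tweet-db-sink --sink-type cassandra-enhanced --inputs persistent://public/default/to-db --sink-config-file ${PROJECT_DIR}/pulsar-config-files/tweet-db-sink.yml"
  echo "bin/pulsar-admin functions create --jar ${PROJECT_DIR}/twitter-router-function/target/twitter-router-0.0.1-SNAPSHOT.jar --function-config-file ${PROJECT_DIR}/twitter-router-function/local-function-config.yaml"
}

print_setup_commands
```

Review the printed commands, then run them from the Pulsar base folder.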
- Done !!!
- You should now see tweets appearing at https://localhost:8081
- create your astra account: https://astra.datastax.com/
- create database with keyspace 'twitter'
- use the CQL console and create tables
CREATE TABLE twitter.tweet_by_lang (
lang text,
createdat text,
id text,
sentiment int,
tweet text,
PRIMARY KEY (lang, createdat, id)
) WITH CLUSTERING ORDER BY (createdat DESC, id DESC);
CREATE TABLE twitter.tweet_by_id (
lang text,
createdat text,
id text,
sentiment int,
tweet text,
PRIMARY KEY (id)
);
- download your database secure connect bundle and copy to your-input-files
- create and download a 'database administrator' token csv file
- fill in username, password, client id and client secret from the downloaded csv file in env-file-docker
- create streaming / create tenant
- Create topics from-twitterapi, to-en-sentimentr and to-db
- fill in the full names of your topics in env-file-docker
- get astra streaming connection details
- click connect and get the Broker Service URL and fill in env-file-docker
- create a token and fill in env-file-docker
- pull the application container images (important to start the apps before you continue with function and sink setup!)
docker pull dieterfl/twitter-sentimentr-en-service:latest
docker pull dieterfl/twitter-ui:latest
docker pull dieterfl/twitter-reader:latest
- review env-file-docker and remove the '.TEMPLATE' extension - ensure spring profile 'astra' is configured
- start your applications
sh run-apps-in-docker.sh
- once the applications are up and running, open twitter-ui in your browser: https://localhost:8081
- create a sink that ingests data into the Astra DB tweet_by_lang table
namespace = default
sink type = astra db
name = db-sink-1
connect-topic=to-db
database=
keyspace=twitter
table=tweet_by_lang
token=Astra DB token
mapping=lang=value.lang,id=value.id,tweet=value.tweet,createdat=value.createdAt,sentiment=value.sentiment
- Double check the mapping for createdat=value.createdAt
- hit create
- create another sink that ingests data into the Astra DB tweet_by_id table
namespace = default
sink type = astra db
name = db-sink-2
connect-topic=to-db
database=
keyspace=twitter
table=tweet_by_id
token=Astra DB token
mapping=lang=value.lang,id=value.id,tweet=value.tweet,createdat=value.createdAt,sentiment=value.sentiment
- Double check the mapping for createdat=value.createdAt
- hit create
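Both sinks use the same mapping string, where each comma-separated entry assigns a message field to a table column. To make the double check easier, the sketch below prints one pair per line, so the camel-case `value.createdAt` next to the lowercase `createdat` column stands out. `show_mapping` is a hypothetical helper, not part of the project or of Astra.

```shell
mapping='lang=value.lang,id=value.id,tweet=value.tweet,createdat=value.createdAt,sentiment=value.sentiment'

# Hypothetical helper: print one "column <- field" line per mapping entry.
show_mapping() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r column field; do
    printf '%-10s <- %s\n' "$column" "$field"
  done
}

show_mapping "$mapping"
```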
- download tweet-router function or use the one you have built yourself
- create tweet-router (function) for content based routing of tweets
name=tweet-router
namespace=default
upload twitter-router-0.0.1-SNAPSHOT.jar
choose function io/flickd/twitter/pulsar/functions/TweetRouter
choose input topic=from-twitterapi
hit create
- Done !!!
- You should now see tweets appearing at https://localhost:8081