difli/twitter-sentimentr

twitter-sentimentr

Sentiment analysis of twitter tweets powered by

twitter-reader:

twitter-router-function:

  • Pulsar function
  • subscribes to the Pulsar topic from-twitterapi and does content-based routing
  • tweets in English are published to the to-en-sentimentr topic
  • all other tweets are published to the to-db topic
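
The routing rule above can be sketched in a few lines. This is an illustration in Python, not the project's Java function: the helper name `route_tweet` is hypothetical, and the assumption that each tweet carries a `lang` field holding a language code such as "en" is inferred from the table columns used later in this README.

```python
# Topic names come from the list above; the message shape is an assumption.
EN_TOPIC = "to-en-sentimentr"
DB_TOPIC = "to-db"


def route_tweet(tweet: dict) -> str:
    """Return the destination topic for one tweet (hypothetical helper)."""
    # Assumption: the language code lives in a "lang" field, e.g. "en" or "de".
    lang = str(tweet.get("lang", "")).strip().lower()
    # English tweets go to sentiment analysis, everything else goes to the database topic.
    return EN_TOPIC if lang == "en" else DB_TOPIC


if __name__ == "__main__":
    for tweet in ({"id": "1", "lang": "en"}, {"id": "2", "lang": "de"}, {"id": "3"}):
        print(tweet["id"], "->", route_tweet(tweet))
```

A tweet without a language code falls through to to-db in this sketch; whether the real function behaves the same way is not stated here.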

tweet-db-sink:

  • Pulsar sink
  • subscribes to the Pulsar topic to-db and streams all tweets into Cassandra

twitter-sentimentr-service-en:

  • subscribes to the to-en-sentimentr topic
  • analyzes tweet sentiment with the CoreNLP library
  • publishes the tweets to the to-db topic
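
For orientation when reading the stored values: CoreNLP's sentiment model predicts one of five classes, numbered 0 to 4. The sketch below maps such a class index to a label. It assumes the service writes the raw class index into the `sentiment` column, which this README does not confirm, and `sentiment_label` is a hypothetical helper.

```python
# CoreNLP's sentiment classes, from most negative (0) to most positive (4).
SENTIMENT_LABELS = {
    0: "very negative",
    1: "negative",
    2: "neutral",
    3: "positive",
    4: "very positive",
}


def sentiment_label(score: int) -> str:
    """Translate a class index into a readable label (hypothetical helper)."""
    # Assumption: the stored value is the raw CoreNLP class index.
    if score not in SENTIMENT_LABELS:
        raise ValueError(f"unexpected sentiment class: {score!r}")
    return SENTIMENT_LABELS[score]


if __name__ == "__main__":
    for score in range(5):
        print(score, sentiment_label(score))
```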

twitter-ui:

  • subscribes to the to-db topic and sends English tweets in real time via WebSocket to all connected browsers
  • connects to Apache Cassandra to query tweets and sentiment across all languages
  • visualizes tweets and the calculated sentiment

Prerequisites

Quickstart local environment setup

cd apache-cassandra-4.0.3
bin/cassandra -f
  • create keyspace and table
cqlsh
CREATE KEYSPACE twitter WITH replication = { 'class': 'NetworkTopologyStrategy', 'datacenter1': 1 };
CREATE TABLE twitter.tweet_by_lang (
    lang text,
    createdat text,
    id text,
    sentiment int,
    tweet text,
    PRIMARY KEY (lang, createdat, id)
) WITH CLUSTERING ORDER BY (createdat DESC, id DESC);
CREATE TABLE twitter.tweet_by_id (
    lang text,
    createdat text,
    id text,
    sentiment int,
    tweet text,
    PRIMARY KEY (id)
);
exit
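
One consequence of the schema above: createdat is declared as text, so the clustering order (createdat DESC) compares strings, not timestamps. The sketch below imitates that ordering in Python to show when "newest first" actually holds. The sample values and the helper name `newest_first` are made up for illustration; the timestamp format the applications really write is not stated in this README.

```python
def newest_first(values: list) -> list:
    """Order strings the way a DESC clustering order on a text column does."""
    # Plain string comparison in descending order, with no date parsing involved.
    return sorted(values, reverse=True)


# ISO-8601 style strings: string order and chronological order agree.
iso = ["2022-03-01T09:15:00Z", "2022-03-10T08:00:00Z", "2022-02-28T23:59:59Z"]

# Day-name-first strings: string order starts with the weekday, so it is not chronological.
day_first = [
    "Tue Mar 01 09:15:00 +0000 2022",
    "Thu Mar 10 08:00:00 +0000 2022",
    "Mon Feb 28 23:59:59 +0000 2022",
]

if __name__ == "__main__":
    print(newest_first(iso))
    print(newest_first(day_first))
```

If the stored strings sort chronologically, queries against tweet_by_lang return the latest tweets first; otherwise the order follows the text, and a timestamp column type would be the usual alternative.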
cd apache-pulsar-2.8.2
bin/pulsar standalone
  • create topics
bin/pulsar-admin topics create persistent://public/default/from-twitterapi
bin/pulsar-admin topics create persistent://public/default/to-en-sentimentr
bin/pulsar-admin topics create persistent://public/default/to-db
  • list topics
bin/pulsar-admin topics list public/default
  • pull the application container images (important: start the apps before you continue with the function and sink setup!)
docker pull dieterfl/twitter-sentimentr-en-service:latest
docker pull dieterfl/twitter-ui:latest
docker pull dieterfl/twitter-reader:latest
  • review env-file-docker and remove the '.TEMPLATE' extension; ensure the Spring profile 'default' is configured
  • start your applications
sh run-apps-in-docker.sh
  • once the applications are up and running, open twitter-ui in your browser: http://localhost:8081
  • create a connectors folder in the pulsar base folder
mkdir connectors
cp PATH-TO-DOWNLOADED-CONNECTOR/cassandra-enhanced-pulsar-sink-1.5.0-nar.nar PULSAR-BASE-FOLDER/connectors
  • create the cassandra sink (adapt the command line properties)
bin/pulsar-admin sinks reload
bin/pulsar-admin sinks available-sinks
bin/pulsar-admin sinks create --name tweet-db-sink --sink-type cassandra-enhanced --inputs persistent://public/default/to-db --sink-config-file /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/pulsar-config-files/tweet-db-sink.yml
bin/pulsar-admin sinks list
  • download the tweet-router function or use the one you have built yourself
  • create the tweet-router function (adapt the command line properties)
bin/pulsar-admin functions create --jar /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/twitter-router-function/target/twitter-router-0.0.1-SNAPSHOT.jar --function-config-file /Users/dieter.flick/Documents/development/workspaces/workspace-datastax/twitter-sentimentr/twitter-router-function/local-function-config.yaml
  • check the function status
bin/pulsar-admin functions getstatus --name tweet-router

Quickstart powered by astra

CREATE TABLE twitter.tweet_by_lang (
    lang text,
    createdat text,
    id text,
    sentiment int,
    tweet text,
    PRIMARY KEY (lang, createdat, id)
) WITH CLUSTERING ORDER BY (createdat DESC, id DESC);
CREATE TABLE twitter.tweet_by_id (
    lang text,
    createdat text,
    id text,
    sentiment int,
    tweet text,
    PRIMARY KEY (id)
);
  • download your database secure connect bundle and copy it to your-input-files
  • create and download a 'database administrator' token csv file
  • fill in the username, password, client id and client secret from the downloaded csv file in env-file-docker
  • create streaming / create tenant
  • create the topics from-twitterapi, to-en-sentimentr and to-db
  • fill in the full names of your topics in env-file-docker
  • get the Astra Streaming connection details
  • click connect, get the Broker Service URL and fill it in env-file-docker
  • create a token and fill it in env-file-docker
  • pull the application container images (important: start the apps before you continue with the function and sink setup!)
docker pull dieterfl/twitter-sentimentr-en-service:latest
docker pull dieterfl/twitter-ui:latest
docker pull dieterfl/twitter-reader:latest
  • review env-file-docker and remove the '.TEMPLATE' extension; ensure the Spring profile 'astra' is configured
  • start your applications
sh run-apps-in-docker.sh
  • once the applications are up and running, open twitter-ui in your browser: http://localhost:8081
  • create a sink that ingests data into the Astra DB tweet_by_lang table
namespace=default
sink type=astra db
name=db-sink-1
connect-topic=to-db
database=
keyspace=twitter
table=tweet_by_lang
token=Astra DB token
mapping=lang=value.lang,id=value.id,tweet=value.tweet,createdat=value.createdAt,sentiment=value.sentiment
  • Double check the mapping for createdat=value.createdAt
  • hit create
  • create another sink that ingests data into the Astra DB tweet_by_id table
namespace=default
sink type=astra db
name=db-sink-2
connect-topic=to-db
database=
keyspace=twitter
table=tweet_by_id
token=Astra DB token
mapping=lang=value.lang,id=value.id,tweet=value.tweet,createdat=value.createdAt,sentiment=value.sentiment
  • Double check the mapping for createdat=value.createdAt
  • hit create
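
The mapping string pairs each table column (left of `=`) with a field of the message value (right of `=`). The reminder to double check createdat exists because the column is all lowercase while the message field is camel case. A small sketch for checking a mapping against the table columns before pasting it into the form; `parse_mapping` and `check_mapping` are hypothetical helpers, not part of Astra or Pulsar.

```python
# Columns of twitter.tweet_by_lang and twitter.tweet_by_id as defined in this README.
TABLE_COLUMNS = {"lang", "createdat", "id", "sentiment", "tweet"}

MAPPING = (
    "lang=value.lang,id=value.id,tweet=value.tweet,"
    "createdat=value.createdAt,sentiment=value.sentiment"
)


def parse_mapping(mapping: str) -> dict:
    """Split 'column=field,column=field' into a {column: field} dict."""
    pairs = (item.split("=", 1) for item in mapping.split(",") if item.strip())
    return {column.strip(): field.strip() for column, field in pairs}


def check_mapping(mapping: str, columns: set) -> tuple:
    """Return (columns without a mapping, mapped names that are not columns)."""
    mapped = set(parse_mapping(mapping))
    return columns - mapped, mapped - columns


if __name__ == "__main__":
    print(parse_mapping(MAPPING))
    print(check_mapping(MAPPING, TABLE_COLUMNS))
```

Writing `createdAt=value.createdAt` by mistake would show up in both result sets: createdat as unmapped and createdAt as an unknown column.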
  • download the tweet-router function or use the one you have built yourself
  • create tweet-router (function) for content-based routing of tweets
name=tweet-router
namespace=default
upload twitter-router-0.0.1-SNAPSHOT.jar
choose function io/flickd/twitter/pulsar/functions/TweetRouter
choose input topic=from-twitterapi

  • hit create