Update README.md
imamitsingh committed Dec 18, 2021
1 parent d7658ad commit 778892b
Showing 1 changed file with 13 additions and 13 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -1,22 +1,14 @@
# Telegram Sentiment Analysis

### Objective
Perform sentiment analysis on Telegram chat data.

### Data Collection
1. Chat messages were exported from the official Telegram group of Crypto.com (https://t.me/CryptoComOfficial), covering May 1, 2021 to May 15, 2021.
2. The chat message data is in JSON format.
3. result.json contains the chat data. The file size is 11.3 MB.
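
A minimal sketch of reading the export, assuming the usual Telegram Desktop layout: a top-level `messages` list whose items carry `date` and `text`, where `text` is either a plain string or a list mixing strings and entity dicts. The function names are hypothetical and not taken from this repository.

```python
import json


def flatten_text(text):
    """Join a message's text into one string.

    Assumption: `text` is either a str or a list of str / dict parts,
    where dict parts (links, mentions, ...) carry their own "text" key.
    """
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


def extract_messages(export):
    """Return [{"date": ..., "text": ...}] for every non-empty message."""
    records = []
    for msg in export.get("messages", []):
        text = flatten_text(msg.get("text", ""))
        if text.strip():  # skip service messages and media without text
            records.append({"date": msg.get("date"), "text": text})
    return records


def load_messages(path):
    """Read an export file such as result.json and extract its messages."""
    with open(path, encoding="utf-8") as fh:
        return extract_messages(json.load(fh))
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(records)` if a dataframe is preferred.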

### Data Preprocessing
1. Load the data into a dataframe.
2. Only two features are relevant to the problem at hand, so keep only the Date and Text columns of the dataframe.
3. Remove non-English messages from the text data.
@@ -27,12 +19,20 @@ Data Preprocessing
3. Remove stopwords.
4. Expand contractions (won't -> will not, can't -> cannot).
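
The two cleaning steps above could look roughly like the following. This is an illustrative sketch only: the contraction rules and the stopword set are small hand-written samples, not the lists used in this project.

```python
import re

# Illustrative subsets; a real run would use fuller lists
# (for example a stopword list shipped with an NLP library).
CONTRACTIONS = [
    (r"won't", "will not"),   # irregular forms first
    (r"can't", "cannot"),
    (r"n't", " not"),         # generic fallbacks
    (r"'re", " are"),
    (r"'ll", " will"),
    (r"'ve", " have"),
]
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it"}


def decontract(text):
    """Expand common English contractions."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes
    for pattern, replacement in CONTRACTIONS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


def remove_stopwords(text):
    """Drop whitespace-separated tokens that appear in STOPWORDS."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)
```

The sample stopword set deliberately leaves out "not", since dropping negations would likely change the sentiment of a message.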

### Sentiment Analysis
1. Compute the sentiment of each text message using the Flair library.
2. Flair returns a sentiment label (POSITIVE or NEGATIVE) and a confidence score for each text message.
For instance, for the sentence 'The food was great!', Flair outputs [POSITIVE (0.9961)].
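
As a rough sketch, scoring messages with Flair's pre-trained English sentiment classifier might look like this. The `signed_score` helper is an assumption added here to turn a label and confidence into one number that can be averaged; it is not part of Flair or of this repository.

```python
def signed_score(value, score):
    """Map e.g. ("NEGATIVE", 0.98) to -0.98 and ("POSITIVE", 0.98) to 0.98.

    Hypothetical helper: gives each message a single value in [-1, 1].
    """
    return score if value == "POSITIVE" else -score


def flair_sentiment(texts):
    """Yield (label, confidence) for each non-empty text."""
    # Imported lazily: flair is a heavy dependency and downloads
    # the "en-sentiment" model the first time it is loaded.
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load("en-sentiment")
    for text in texts:
        sentence = Sentence(text)
        classifier.predict(sentence)
        label = sentence.labels[0]
        yield label.value, label.score
```

For example, `[signed_score(v, s) for v, s in flair_sentiment(messages)]` gives one signed value per message, assuming `messages` contains only non-empty strings.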

### Sentiment Analysis Approach

1. Flair's sentiment classifier is a pre-trained, embedding-based model. Words whose vector representations are close to one another tend to be used in similar contexts, which lets the model infer the sentiment of a sentence from its vector representation.

2. Flair uses a pre-trained NLP model rather than a rule-based model to predict the sentiment (positive or negative) of a given sentence.

3. Flair tends to be much slower than rule-based alternatives such as NLTK (VADER) and TextBlob.

### Visualization
1. Plot #1: Number of messages per day
2. Plot #2: Average sentiment per day
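
The data behind both plots can be derived with a small aggregation. The sketch below assumes each message has an ISO-formatted date string and a signed sentiment value (positive for POSITIVE labels, negative for NEGATIVE); the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_summary(records):
    """Aggregate (iso_date_string, signed_sentiment) pairs per day.

    Returns {day: (message_count, mean_sentiment)} ordered by day.
    """
    by_day = defaultdict(list)
    for date, sentiment in records:
        by_day[date[:10]].append(sentiment)  # "2021-05-01T10:00:00" -> "2021-05-01"
    return {day: (len(vals), mean(vals)) for day, vals in sorted(by_day.items())}
```

The counts feed Plot #1 and the means feed Plot #2; any plotting library can draw them from the returned dictionary.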

