Commit

Cleaned repo
abdulfatir committed Sep 21, 2018
1 parent dbf69ad commit cfc2205
Showing 19 changed files with 7 additions and 5 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -1,5 +1,7 @@
# Sentiment Analysis on Tweets

+**Update**(21 Sept. 2018): I don't actively maintain this repository. This work was done for a course project and the dataset cannot be released because I don't own the copyright. However, everything in this repository can be easily modified to work with other datasets. I recommend reading the [sloppily written project report](https://github.com/abdulfatir/twitter-sentiment-analysis/tree/master/docs/report.pdf) for this project which can be found in `docs/`.
+
## Dataset Information

We use and compare various different methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type `tweet_id,sentiment,tweet` where the `tweet_id` is a unique integer identifying the tweet, `sentiment` is either `1` (positive) or `0` (negative), and `tweet` is the tweet enclosed in `""`. Similarly, the test dataset is a csv file of type `tweet_id,tweet`. Please note that csv headers are not expected and should be removed from the training and test datasets.
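As a rough illustration of the training file format described in the README text above, the following sketch reads header-less `tweet_id,sentiment,tweet` rows with Python's standard `csv` module. The function name `read_training_rows` and the two sample tweets are hypothetical and do not come from the repository.

```python
import csv
import io


def read_training_rows(handle):
    """Yield (tweet_id, sentiment, tweet) tuples from a header-less CSV.

    Hypothetical helper for illustration; not part of the repository.
    """
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        tweet_id, sentiment, tweet = row
        yield int(tweet_id), int(sentiment), tweet


# Made-up sample data in the documented format; the tweet is quoted,
# so commas inside it do not split the field.
sample = io.StringIO('1,1,"good morning, everyone"\n2,0,"worst day ever"\n')
rows = list(read_training_rows(sample))
```

Because the files are expected to have no header, every row, including the first, is treated as data. A test file in the `tweet_id,tweet` format could be read the same way by unpacking two fields instead of three.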
@@ -16,7 +18,7 @@ The library requirements specific to some methods are:
* `keras` with `TensorFlow` backend for Logistic Regression, MLP, RNN (LSTM), and CNN.
* `xgboost` for XGBoost.

-**Note**: It is recommended to use Anaconda distribution of Python. The [report](https://github.com/abdulfatir/twitter-sentiment-analysis/tree/master/docs/report.pdf) for this project can be found in `docs/`.
+**Note**: It is recommended to use Anaconda distribution of Python.

## Usage

8 changes: 4 additions & 4 deletions baseline.py → code/baseline.py
@@ -2,10 +2,10 @@

# Classifies a tweet based on the number of positive and negative words in it

-TRAIN_PROCESSED_FILE = '../train-processed.csv'
-TEST_PROCESSED_FILE = '../test-processed.csv'
-POSITIVE_WORDS_FILE = 'dataset/positive-words.txt'
-NEGATIVE_WORDS_FILE = 'dataset/negative-words.txt'
+TRAIN_PROCESSED_FILE = 'train-processed.csv'
+TEST_PROCESSED_FILE = 'test-processed.csv'
+POSITIVE_WORDS_FILE = '../dataset/positive-words.txt'
+NEGATIVE_WORDS_FILE = '../dataset/negative-words.txt'
TRAIN = True
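
The header comment in `baseline.py` says the script classifies a tweet by the number of positive and negative words in it. A minimal sketch of that idea follows. It is not the repository's actual code: the function name `classify`, the tiny in-memory word sets (stand-ins for the contents of `positive-words.txt` and `negative-words.txt`), and the choice to label ties as positive are all assumptions made for illustration.

```python
def classify(tweet, positive_words, negative_words):
    """Return 1 (positive) or 0 (negative) by counting lexicon hits.

    Hypothetical sketch; ties are labeled positive by assumption.
    """
    words = tweet.lower().split()
    pos = sum(1 for w in words if w in positive_words)
    neg = sum(1 for w in words if w in negative_words)
    return 1 if pos >= neg else 0


# Made-up word sets standing in for the lexicon files.
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "sad"}

label_a = classify("what a great happy day", POSITIVE, NEGATIVE)
label_b = classify("this is awful and sad", POSITIVE, NEGATIVE)
```

A count-based rule like this needs no training, which is why it is useful as a baseline for comparing the learned models listed in the README.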


File renamed without changes. (17 files)
