Cleaned repo

abdulfatir · Sep 21, 2018 · cfc2205
1 parent dbf69ad

Showing 19 changed files with 7 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,7 @@
 # Sentiment Analysis on Tweets
 
+**Update** (21 Sept. 2018): I don't actively maintain this repository. This work was done for a course project, and the dataset cannot be released because I don't own the copyright. However, everything in this repository can easily be modified to work with other datasets. I recommend reading the [sloppily written project report](https://github.com/abdulfatir/twitter-sentiment-analysis/tree/master/docs/report.pdf) for this project, which can be found in `docs/`.
+
 ## Dataset Information
 
 We use and compare various methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type `tweet_id,sentiment,tweet` where the `tweet_id` is a unique integer identifying the tweet, `sentiment` is either `1` (positive) or `0` (negative), and `tweet` is the tweet enclosed in `""`. Similarly, the test dataset is a csv file of type `tweet_id,tweet`. Please note that csv headers are not expected and should be removed from the training and test datasets.
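The training and test formats described above can be read with Python's standard `csv` module, which handles commas inside the double-quoted `tweet` field. This is a minimal sketch, not the repository's own loader: `read_train` and `read_test` are hypothetical helpers, and the sketch assumes UTF-8 files that escape any embedded double quote as `""` (standard CSV quoting).

```python
import csv
import io


def read_train(f):
    """Parse headerless training rows of the form tweet_id,sentiment,tweet.

    Hypothetical helper, not taken from the repository. csv.reader handles
    commas inside the double-quoted tweet field.
    """
    rows = []
    for tweet_id, sentiment, tweet in csv.reader(f):
        # Rejects any label other than 0/1, including a leftover header row.
        if sentiment not in ("0", "1"):
            raise ValueError(f"unexpected sentiment {sentiment!r} for id {tweet_id!r}")
        rows.append((int(tweet_id), int(sentiment), tweet))
    return rows


def read_test(f):
    """Parse headerless test rows of the form tweet_id,tweet (hypothetical helper)."""
    return [(int(tweet_id), tweet) for tweet_id, tweet in csv.reader(f)]


if __name__ == "__main__":
    # For a real file (name hypothetical), open it with newline="" as the csv docs recommend:
    # with open("train.csv", newline="", encoding="utf-8") as f: rows = read_train(f)
    sample = io.StringIO('1,1,"good day, sunshine"\n2,0,"awful commute"\n')
    print(read_train(sample))  # [(1, 1, 'good day, sunshine'), (2, 0, 'awful commute')]
```

Validating the label before converting it means a file that still carries a `tweet_id,sentiment,tweet` header fails immediately with a readable error instead of producing a bad row.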
@@ -16,7 +18,7 @@ The library requirements specific to some methods are:
 * `keras` with `TensorFlow` backend for Logistic Regression, MLP, RNN (LSTM), and CNN.
 * `xgboost` for XGBoost.
 
-**Note**: It is recommended to use the Anaconda distribution of Python. The [report](https://github.com/abdulfatir/twitter-sentiment-analysis/tree/master/docs/report.pdf) for this project can be found in `docs/`.
+**Note**: It is recommended to use the Anaconda distribution of Python.
 
 ## Usage
 

diff --git a/baseline.py b/code/baseline.py
@@ -2,10 +2,10 @@
 
 # Classifies a tweet based on the number of positive and negative words in it
 
-TRAIN_PROCESSED_FILE = '../train-processed.csv'
-TEST_PROCESSED_FILE = '../test-processed.csv'
-POSITIVE_WORDS_FILE = 'dataset/positive-words.txt'
-NEGATIVE_WORDS_FILE = 'dataset/negative-words.txt'
+TRAIN_PROCESSED_FILE = 'train-processed.csv'
+TEST_PROCESSED_FILE = 'test-processed.csv'
+POSITIVE_WORDS_FILE = '../dataset/positive-words.txt'
+NEGATIVE_WORDS_FILE = '../dataset/negative-words.txt'
 TRAIN = True
 
 
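The comment in `baseline.py` describes a lexicon-count classifier. Below is a minimal sketch of that idea, not the repository's implementation. It assumes tweets are already preprocessed into whitespace-separated tokens and that each lexicon file lists one word per line with `;`-prefixed comment lines; `load_lexicon`, `classify`, and the `tie` parameter are hypothetical names, and the repository's own tie-breaking rule is not visible in this diff.

```python
def load_lexicon(lines):
    """Return the set of words in an iterable of lexicon lines.

    Assumed format: one word per line; blank lines and lines starting
    with ';' (comments) are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def classify(tweet, positive, negative, tie=1):
    """Label a tweet 1 (positive) or 0 (negative) by counting lexicon hits.

    Assumption: `tweet` is preprocessed and whitespace-tokenized.
    `tie` (hypothetical) is returned when both counts are equal.
    """
    tokens = tweet.split()
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos > neg:
        return 1
    if neg > pos:
        return 0
    return tie


if __name__ == "__main__":
    # With real files: with open(POSITIVE_WORDS_FILE) as f: positive = load_lexicon(f)
    pos_words = load_lexicon(["; comment", "good", "great", ""])
    neg_words = load_lexicon(["bad", "awful"])
    print(classify("good game but awful ref", pos_words, neg_words))  # 1 (tie)
    print(classify("bad bad great", pos_words, neg_words))  # 0
```

Note that the path constants in this hunk are relative, so Python resolves them against the current working directory; after this commit the script presumably expects to be launched from inside `code/`, with `dataset/` one level up.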

diff --git a/cnn-feats-svm.py b/code/cnn-feats-svm.py
diff --git a/cnn.py b/code/cnn.py
diff --git a/decisiontree.py b/code/decisiontree.py
diff --git a/extract-cnn-feats.py b/code/extract-cnn-feats.py
diff --git a/logistic.py b/code/logistic.py
diff --git a/lstm.py b/code/lstm.py
diff --git a/majority-voting.py b/code/majority-voting.py
diff --git a/maxent-nltk.py b/code/maxent-nltk.py
diff --git a/naivebayes.py b/code/naivebayes.py
diff --git a/neuralnet.py b/code/neuralnet.py
diff --git a/preprocess.py b/code/preprocess.py
diff --git a/randomforest.py b/code/randomforest.py
diff --git a/stats.py b/code/stats.py
diff --git a/svm.py b/code/svm.py
diff --git a/utils.py b/code/utils.py
diff --git a/xgboost.py b/code/xgboost.py
diff --git a/Plots.ipynb b/docs/Plots.ipynb