This is the repository for the Duke IDS 703 NLP Final Project, Fall 2023.
Group Members: George Wang, Yanzheng Wu, Yi Chen
This project takes two approaches to sentiment analysis on movie reviews. The first is a Naive Bayes classifier, alongside which we developed a Markov chain text generator to create synthetic movie reviews. The second is a discriminative neural network that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory networks (LSTMs). Our goal is to accurately classify each review as either positive or negative. Details are in our project report.
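To make the first approach concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with add-one smoothing and a first-order Markov chain text generator. It is an illustration only, not the code in this repo: the function names, the whitespace tokenizer, the chain order, and the toy reviews are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs):
    """docs: iterable of (text, label). Returns (label counts, word counts, vocab)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def build_chain(texts):
    """First-order Markov chain: map each word to the words observed after it."""
    chain = defaultdict(list)
    for text in texts:
        toks = tokenize(text)
        for cur, nxt in zip(toks, toks[1:]):
            chain[cur].append(nxt)
    return chain


def generate(chain, start, max_words=10, seed=0):
    """Random walk over the chain from `start`; stops at a word with no successor."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Hypothetical toy data, standing in for real labeled reviews.
toy_reviews = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a wonderful story", "positive"),
    ("a dull and boring film", "negative"),
    ("terrible acting and a boring story", "negative"),
]
model = train_naive_bayes(toy_reviews)
print(predict(model, "a wonderful story"))

positive_chain = build_chain(t for t, y in toy_reviews if y == "positive")
print(generate(positive_chain, "a"))
```

Building one chain per label, as above, is one plausible way to produce synthetic reviews with a known sentiment. Whether the project does it this way is not stated here.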
The dataset we chose is the IMDB dataset of 50K movie reviews, each labeled as positive or negative.
https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data
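A minimal sketch of loading the data with the standard library follows. It assumes the downloaded CSV has `review` and `sentiment` columns and that reviews may contain HTML `<br />` tags; check both against your copy of the file. The helper names and the 80/20 split are illustrative choices, not taken from this repo.

```python
import csv
import random
import re

# Assumption: line breaks in the raw reviews appear as HTML <br /> tags.
BR_TAG = re.compile(r"<br\s*/?>")


def load_reviews(path):
    """Read (text, label) pairs from a CSV with `review` and `sentiment` columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            (BR_TAG.sub(" ", row["review"]), row["sentiment"])
            for row in csv.DictReader(f)
        ]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows with a fixed seed, then cut off a held-out test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]
```

Typical usage would be `train, test = train_test_split(load_reviews(path))`, where `path` points to wherever you saved the Kaggle CSV.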
The CNN & LSTM model offers superior accuracy on real data for sentiment analysis because it can capture nuanced sequential patterns, but it suffers from longer training times and requires significant computational resources. Conversely, Naive Bayes offers greater interpretability and requires less computational power, making it suitable for rapid development cycles and resource-constrained environments. However, it may not match the performance of the CNN & LSTM model on tasks involving complex data patterns.
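As a small illustration of the interpretability point, the sketch below computes a smoothed log-odds score for each word, the kind of quantity that can be read directly from a Naive Bayes model. Positive scores lean toward the positive class, negative scores toward the negative class. The function name and toy reviews are hypothetical, and this is not the project's own analysis.

```python
import math
from collections import Counter


def word_log_odds(docs):
    """Return {word: log P(word|positive) - log P(word|negative)}, add-one smoothed."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in docs:
        counts[label].update(text.lower().split())
    vocab = set(counts["positive"]) | set(counts["negative"])
    totals = {label: sum(c.values()) + len(vocab) for label, c in counts.items()}
    return {
        w: math.log((counts["positive"][w] + 1) / totals["positive"])
        - math.log((counts["negative"][w] + 1) / totals["negative"])
        for w in vocab
    }


# Hypothetical toy data, standing in for real labeled reviews.
toy_reviews = [
    ("a wonderful and moving film", "positive"),
    ("great acting and a wonderful story", "positive"),
    ("a dull and boring film", "negative"),
    ("terrible acting and a boring story", "negative"),
]
scores = word_log_odds(toy_reviews)
# Print words from most positive-leaning to most negative-leaning.
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:>10s} {score:+.2f}")
```

A neural model has no equally direct per-word readout, which is the trade-off the comparison above describes.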