This project examined airline-related tweets for positive and negative sentiment. Logistic regression, gradient boosting, and random forest classifiers were trained and their performance compared. The random forest classifier performed best, reaching 91% accuracy on the test data.
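A comparison of the three model types could be set up roughly as in the sketch below. It assumes scikit-learn and TF-IDF text features, which the summary does not confirm. The helper name `compare_models`, the hyperparameters, and the toy tweets are all hypothetical and only illustrate the shape of the workflow. They are not the project's actual pipeline or data.

```python
# Hypothetical sketch: assumes scikit-learn and TF-IDF features, not the project's confirmed setup.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def compare_models(texts, labels, test_size=0.25, seed=42):
    """Fit each candidate classifier on TF-IDF features and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        # Each pipeline fits its own vectorizer on the training split only,
        # so vocabulary from the test tweets does not leak into training.
        pipeline = make_pipeline(TfidfVectorizer(lowercase=True), model)
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


# Toy, made-up tweets purely for illustration (not from the Kaggle dataset).
tweets = [
    "great flight and friendly crew",
    "love the new seats, thank you",
    "smooth boarding and early arrival",
    "awesome service, best airline",
    "thanks for the quick rebooking",
    "happy with the upgrade today",
    "lost my luggage again",
    "flight delayed three hours, terrible",
    "rude agent at the gate",
    "cancelled flight and no refund",
    "worst customer service ever",
    "stuck on the tarmac for hours",
]
labels = ["positive"] * 6 + ["negative"] * 6

scores = compare_models(tweets, labels)
best = max(scores, key=scores.get)  # model with the highest test accuracy
print(scores, best)
```

On a dataset this small the accuracies are meaningless. The point is that every model sees the same train/test split, so their test accuracies are directly comparable.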
Code File (ipynb)
Final Report (pdf)
Visual/Audio Presentation (pptx)
The data for this project was sourced from kaggle.com. The dataset contained over 14k tweets related to five US airlines, along with 15 columns of data identifying the airline, the confidence of each sentiment label, the user, and more. A final table summarizing the sentiment findings across airlines is shared below.
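A per-airline sentiment summary like the one referenced above could be tallied with the standard library alone. The sketch below makes several assumptions. It assumes the Kaggle CSV exposes columns named `airline` and `airline_sentiment`. It also assumes neutral tweets are dropped to match the positive/negative framing. The helper names `sentiment_by_airline` and `share_negative` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def sentiment_by_airline(rows, keep=("positive", "negative")):
    """Count tweets per airline for each kept sentiment label.

    `rows` is any iterable of dicts, e.g. from csv.DictReader. The column
    names `airline` and `airline_sentiment` are assumptions about the export.
    """
    table = defaultdict(Counter)
    for row in rows:
        label = row["airline_sentiment"]
        if label in keep:  # skip neutral (or any other) labels
            table[row["airline"]][label] += 1
    return {airline: dict(counts) for airline, counts in table.items()}


def share_negative(table):
    """Fraction of kept tweets that are negative, per airline."""
    return {
        airline: counts.get("negative", 0) / sum(counts.values())
        for airline, counts in table.items()
        if sum(counts.values()) > 0
    }


# Made-up sample rows standing in for the real CSV file.
sample = io.StringIO(
    "airline,airline_sentiment,text\n"
    "United,negative,lost my bag again\n"
    "United,positive,great crew today\n"
    "United,negative,two hour delay\n"
    "Delta,positive,smooth flight\n"
    "Delta,neutral,what time is boarding\n"
)
table = sentiment_by_airline(csv.DictReader(sample))
rates = share_negative(table)
print(table)  # {'United': {'negative': 2, 'positive': 1}, 'Delta': {'positive': 1}}
print(rates)
```

With the real file, `csv.DictReader(open("Tweets.csv", newline=""))` (the filename is also an assumption) could replace the in-memory sample.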