Stress Detection on Social Media (presentation)
Social media is a major platform where people around the world express their worries and stress. Learn2Relax was built to analyze content and identify stress in a Reddit dataset using NLP techniques. Word embeddings were pre-trained on unlabeled data and used as features by both discrete and neural supervised models.
- Clone the GitHub repository
git clone https://github.com/Evaaaaaaa/Learn2Relax.git
- Change working directory
cd Learn2Relax
- The model is tested on Python 3.7 with dependencies listed in requirements.txt. To install these Python dependencies, run
pip install -r requirements.txt
Or, if you prefer to use Conda, run
conda install --file requirements.txt
Note: If the NLTK package cannot be installed with the commands above and you are on macOS without Conda, make sure Xcode is installed. To install the “minimum version” of Xcode, download the Command Line Tools DMG file and follow the installation instructions. If you are on Linux, or on Windows through a Debian/Ubuntu environment such as WSL, and the NLTK installation fails, try
sudo apt-get install python3 python3-pip ipython3 build-essential python-dev python3-dev
then install the NLTK package via
pip install nltk
- Install TensorFlow with GPU support to run the BERT model on a GPU
pip install tensorflow-gpu==1.15
- Download and install Docker to create a containerized application for the inference demo. If you are new to Docker, a quickstart guide is a good place to begin.
- The models use preprocessed data files in the data/preprocessed directory. However, if you want to reproduce the tokenization steps from scratch using the raw data files in data/raw, you need to install the NLTK datasets/models by running
python configs/config.py
The live demo is deployed and scaled online using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). If you want to run the app on your own, below are three options:
- To run the Streamlit web app locally, make sure the dependencies listed in requirements.txt are installed, then run
streamlit run app.py
If no browser window pops up, point your browser to the External URL printed in the terminal to see the app.
- To create a containerized application locally:
  - In the project directory Learn2Relax, run
docker build -t learn2relax-streamlit:1.0 .
  - Run this image as a container via
docker run -p 80:80 learn2relax-streamlit:1.0
Point your browser to localhost:80 to see the app.
- To deploy the app via GKE with your own GCP account:
Prerequisites
- Google Cloud SDK
- Kubernetes command-line tool (kubectl); run the following command to install it
gcloud components install kubectl
- Docker (see the Docker installation step above)
- GCP account with your GCP project ID {PROJECT_ID}, which you can find in the GCP console
- Domain name
Workflow
- Dockerize the app
docker build -t gcr.io/{$PROJECT_ID}/learn2relax-streamlit:v3 .
- Test the container locally, then point your browser to localhost:80 to see the app
docker run -p 80:80 gcr.io/{$PROJECT_ID}/learn2relax-streamlit:v3
- Push image to Google Container Registry (GCR)
gcloud auth configure-docker
docker push gcr.io/{$PROJECT_ID}/learn2relax-streamlit:v3
- Create a container cluster (in this example with 2 nodes). If you have already created a cluster with gcloud container clusters create, only the last command (get-credentials) is necessary.
gcloud config set project {$PROJECT_ID}
gcloud config set compute/zone {$COMPUTE_ZONE}
gcloud container clusters create {$CLUSTER_NAME} --num-nodes=2
gcloud container clusters get-credentials {$CLUSTER_NAME} --zone {$COMPUTE_ZONE} --project {$PROJECT_ID}
- Use Google-managed SSL certificates for your domain name: edit certificate.yaml from the cloned repo and change the host name to your own. You will need to create an A record for your chosen host name that points to the IP address you have reserved for it. Once you have edited certificate.yaml, run
kubectl apply -f certificate.yaml
- Deploy the app to GKE: update deployment.yaml and replace PROJECT_ID with your project ID. Once you have updated the YAML file, execute
kubectl apply -f deployment.yaml
- Apply the Ingress Configuration
kubectl apply -f ingress.yaml
You can now head over to the Google Cloud console and under Kubernetes Engine -> Services & Ingress you can see the Ingress being created. Wait for the Ingress to be created before you continue. Once completed, you can now visit your deployed application. For my demo you can visit: https://www.datatranslator.space/
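The workflow commands above can be collected into a small dry-run helper, which makes it easier to review them with your own values before running anything. This is only a sketch: the function name `gke_deploy_commands`, the optional `num_nodes` parameter, and the sample project, zone, and cluster names are hypothetical, and the helper only prints the commands rather than executing them. The local `docker run` test step is left out because it blocks the terminal.

```python
import shlex


def gke_deploy_commands(project_id, compute_zone, cluster_name, tag="v3", num_nodes=None):
    """Return the GKE workflow commands as argument lists (nothing is executed)."""
    image = f"gcr.io/{project_id}/learn2relax-streamlit:{tag}"
    create_cluster = ["gcloud", "container", "clusters", "create", cluster_name]
    if num_nodes is not None:
        # --num-nodes sets the initial node count of the cluster.
        create_cluster.append(f"--num-nodes={num_nodes}")
    return [
        ["docker", "build", "-t", image, "."],
        ["gcloud", "auth", "configure-docker"],
        ["docker", "push", image],
        ["gcloud", "config", "set", "project", project_id],
        ["gcloud", "config", "set", "compute/zone", compute_zone],
        create_cluster,
        ["gcloud", "container", "clusters", "get-credentials", cluster_name,
         "--zone", compute_zone, "--project", project_id],
        ["kubectl", "apply", "-f", "certificate.yaml"],
        ["kubectl", "apply", "-f", "deployment.yaml"],
        ["kubectl", "apply", "-f", "ingress.yaml"],
    ]


# Placeholder values for illustration only; substitute your own.
commands = gke_deploy_commands("my-project", "us-central1-a", "demo-cluster")
for cmd in commands:
    # shlex.quote keeps the printed line safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping each command as an argument list means the same data could later be passed to `subprocess.run` without shell quoting issues, if you decide to automate the steps.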
- Features for the labeled dataset were generated by five feature extraction methods: unigram TF-IDF, bigram TF-IDF, Word2Vec with TF-IDF as weights, pretrained embeddings and BERT embeddings.
Word2Vec embeddings were also trained on 190k unlabeled Reddit posts.
- After feature extraction, 9 classification models were trained: Logistic Regression, Naive Bayes, SVM, AdaBoost, Gradient Boosting, Decision Tree, Random Forest, XGBoost and BERT.
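To make the featurization step more concrete, here is a minimal pure-Python sketch of unigram and bigram TF-IDF and of a TF-IDF-weighted average of word vectors. It is not the repository's implementation: the function names, the smoothed IDF formula, the toy corpus, and the 2-dimensional vectors are all assumptions chosen for illustration. In the project, the word vectors would come from a trained Word2Vec model instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the space-joined n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute TF-IDF weights per document for n-gram terms.

    Assumes tf = count / number of terms and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1.
    """
    term_lists = [ngrams(doc, n) for doc in docs]
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights


def weighted_doc_vector(weights, embeddings, dim):
    """TF-IDF-weighted average of word vectors; words without a vector are skipped."""
    vec = [0.0] * dim
    total = 0.0
    for word, w in weights.items():
        if word in embeddings:
            total += w
            for i, x in enumerate(embeddings[word]):
                vec[i] += w * x
    return [x / total for x in vec] if total else vec


# Toy corpus and made-up 2-d vectors, purely for illustration.
docs = [
    "i feel so anxious about exams".split(),
    "i feel great today".split(),
    "exams make me anxious".split(),
]
toy_embeddings = {"anxious": [1.0, 0.0], "great": [0.0, 1.0], "feel": [0.5, 0.5]}

unigram_weights = tfidf(docs, n=1)
bigram_weights = tfidf(docs, n=2)
doc_vectors = [weighted_doc_vector(w, toy_embeddings, dim=2) for w in unigram_weights]
```

Words that appear in fewer documents receive a larger weight, so they pull the document vector further toward their own embedding than common words do.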
Best model for each featurization technique and its performance:
Featurization Method | Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
Unigram TF-IDF | Logistic Regression | 84.23% | 82.87% | 90.36% | 86.46% |
Bigram TF-IDF | SVM | 84.23% | 83.24% | 89.76% | 86.38% |
Word2Vec + TF-IDF | XGBoost | 85.23% | 82.80% | 92.77% | 87.50% |
Pretrained Embeddings | Random Forest | 84.56% | 81.58% | 93.37% | 87.08% |
BERT Embeddings | BERT | 92.74% | 92.90% | 94.58% | 93.73% |

Metric | Traditional ML Models | BERT |
---|---|---|
Avg. Training Time | 1.837573 sec | 3 min 48.131239 sec |
Avg. Inference Time | 0.004543 sec | 35.714544 sec |
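As a reminder of how the four metrics in the performance table relate, the sketch below computes them from confusion-matrix counts and recomputes F1 from the reported precision and recall of each row. The counts in the first example are hypothetical, not taken from the project. Because the table reports rounded percentages, the recomputed F1 values may differ from the reported ones in the last digit.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)

# (precision, recall, reported F1) for each row of the table above.
reported = {
    "Unigram TF-IDF": (0.8287, 0.9036, 0.8646),
    "Bigram TF-IDF": (0.8324, 0.8976, 0.8638),
    "Word2Vec + TF-IDF": (0.8280, 0.9277, 0.8750),
    "Pretrained Embeddings": (0.8158, 0.9337, 0.8708),
    "BERT Embeddings": (0.9290, 0.9458, 0.9373),
}
f1_gap = {name: abs(f1_from(p, r) - f1) for name, (p, r, f1) in reported.items()}
```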
- Recall is the most important metric here because the priority is to avoid misclassifying stressful posts as non-stressful; missed posts would weaken our understanding of stressful content on social media.
- BERT performs best, with the highest score on all four metrics, although it is much slower to train and to run inference with than the traditional models.
- All models provide a confidence score in addition to the prediction.
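The points above about recall and confidence scores are connected: if a model outputs a probability for the "stress" class, lowering the decision threshold typically raises recall at the cost of more false positives. The sketch below illustrates this with made-up scores and labels; the function names and the default threshold of 0.5 are assumptions, not the project's actual settings.

```python
def predict_with_confidence(p_stress, threshold=0.5):
    """Return (label, confidence) given the probability of the 'stress' class.

    Confidence is the probability assigned to the predicted class.
    """
    label = 1 if p_stress >= threshold else 0
    confidence = p_stress if label == 1 else 1.0 - p_stress
    return label, confidence


def recall_at(scores, labels, threshold):
    """Share of truly stressful posts (label 1) flagged at this threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn) if tp + fn else 0.0


def false_positives_at(scores, labels, threshold):
    """Number of non-stressful posts (label 0) flagged at this threshold."""
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)


# Made-up stress probabilities and true labels, for illustration only.
scores = [0.92, 0.65, 0.41, 0.45, 0.55, 0.08]
labels = [1, 1, 1, 0, 0, 0]

recall_default = recall_at(scores, labels, 0.5)
recall_lenient = recall_at(scores, labels, 0.4)
fp_default = false_positives_at(scores, labels, 0.5)
fp_lenient = false_positives_at(scores, labels, 0.4)
```

In this toy data, moving the threshold from 0.5 to 0.4 catches the one stressful post that was missed, but also flags one more non-stressful post.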
Figure: word similarities in the raw dataset.
Figure: top 20 most frequent words in stressed and non-stressed posts.
The labeled data comes from Elsbeth Turcan & Kathleen McKeown.
Predicting Movie Reviews with BERT on TF Hub
Run Streamlit.io on Google Cloud Kubernetes
Step-by-Step Streamlit App Deployment Via GKE