Mental Health in Tech, Natural Language Processing, and Machine Learning

Using data, natural language processing, and machine learning to analyze and improve mental health conversations in the tech community.

Mental Health in Tech App

The results of this project are published at the following URL: https://mental-health-in-tech.herokuapp.com/.

Background

This project compares several machine learning classification models on how accurately they classify whether an employee in the tech community is willing to seek treatment for a mental health condition. The data comes from the OSMI (Open Sourcing Mental Illness) Mental Health in Tech Survey, an annual survey that captures employees' attitudes and opinions about how mental health is discussed and handled in tech workplaces. The target question from the survey is "Have you ever sought treatment for a mental health disorder from a mental health professional?" In addition, this project uses natural language processing and sentiment analysis to build a model that analyzes the mental health conversations employees have with their employers and classifies each conversation as either positive or negative.
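As a rough illustration of the positive/negative labeling described above, here is a minimal sketch that uses VADER (listed under Technologies Used). It is not the model built in the notebooks. The `vaderSentiment` package, the function names, and the 0.0 cutoff on the compound score are all assumptions made for this example.

```python
def label_from_compound(compound, threshold=0.0):
    """Map a VADER compound score in [-1, 1] to a binary label.

    The default threshold of 0.0 is an assumption for this sketch,
    not a value taken from the project notebooks.
    """
    return "positive" if compound >= threshold else "negative"


def classify_conversation(text):
    """Score free text with VADER and return (label, compound score)."""
    # Imported inside the function so the helper above can be used
    # even when the vaderSentiment package is not installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return label_from_compound(compound), compound
```

With `vaderSentiment` installed, calling `classify_conversation("My manager was supportive and helped me find resources.")` returns a label together with the raw compound score, so the cutoff can be tuned later.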

Team

Phil Stubbs

Jenna Nytes

Abby Lemon

Jennifer Cain

Technologies Used

AWS S3
AWS RDS
Postgres
Python
Google Colab
Pandas
PySpark
Flask
Swagger
HTML/CSS
Heroku
TextBlob
VADER

Notebooks

All of the ETL, data analysis, machine learning, natural language processing, and other data processing tasks for this project were done in Python inside Google Colab notebooks. The notebooks are stored in Google Drive, where everyone on the team can access them from the same location. They are also stored in this repository, but the copies in this repository might not be as up to date as the notebook files being worked on in Google Drive.

Notebook	Description
cloud_setup.ipynb	This notebook includes the steps for setting up AWS and Google Drive for this project. That is, mounting your Google Drive into the Google Colab runtime to be able to access the GitHub files and the CSV files from AWS S3 inside Google Colab. This notebook only needs to be run once when first setting up this project on your machine.
cloud_etl.ipynb	This notebook includes all the steps the team went through to extract, transform, and load the data used for this project.
schema.ipynb	This notebook defines the schema and tables for the database using Python SQLAlchemy classes.
natural_language_processing.ipynb	This notebook includes all of the natural language processing analysis work done for this project, including sentiment analysis and building text classification models.
ml_model.ipynb	This notebook includes the code used to build the machine learning models used to classify whether an employee would be willing to seek mental health treatment.
predict_2020.ipynb	This notebook includes the code used to help predict the results of the 2020 OSMI Mental Health in Tech Survey.

Getting Started

The following section will take you through the steps of setting up this project and getting it running locally on your computer.

If you don't want to set up this project locally and just want to see the deployed app, go to https://mental-health-in-tech.herokuapp.com/.

Clone the repository
Set up AWS and Google Drive
Load csv files from S3 bucket into Pandas dataframes to perform ETL
Create schema and tables

1. Clone the repository

The first step is to clone the project repository to a local directory on your computer.

2. Set up AWS and Google Drive

This project uses AWS to store the data files in an S3 bucket and uses Google Colab notebooks to extract, transform, load, and analyze that data.

To set up AWS and Google Drive for this project, run through the cells in the cloud_setup.ipynb notebook.

3. Load csv files from S3 bucket into Pandas dataframes to perform ETL

After setting up AWS and Google Drive, run through the cells in the cloud_etl.ipynb notebook to load the csv files from the S3 bucket into Pandas dataframes to extract, transform, and eventually load the data into a Postgres database.

4. Create schema and tables

Run through the cells in the schema.ipynb file to create the schema for the database tables and connect to the database to verify that the tables were created.

Database Schema

This project uses a SQL database hosted on AWS using AWS' Relational Database Service (RDS) to store the mental health survey results. For more information about the database structure used for this project, see Database Structure.

API

The data used for this project is stored in a Postgres database that is hosted on AWS using AWS' Relational Database Service (RDS).

To access the survey responses from the database, we built a simple API using Flask.

The API documentation is available in a Swagger app. Start there for a basic reference on how to use the endpoints available to query the database for the data you need.

Starting the Flask App

The API and frontend for this project are built using Flask. To start the Flask application locally:

Change directory to the application folder in this repository.
In the application folder, create a file called config.py that includes the following contents:

DB_USERNAME = 'username'
DB_PASSWORD = 'password'
DB_ENDPOINT = 'endpoint'

Replace username, password, and endpoint with their actual values.

Go up one directory to the root directory of this repository (mental_health_ML).
Run the following command to start the Flask server on port 5000:

python application/app.py

Or, you can run the following shell script from the project root directory:

sh run.sh

Navigate to http://localhost:5000 in Chrome (or whatever browser you prefer) to view the app.
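For reference, the three values in config.py could be combined into a Postgres connection URL along the following lines. This is a sketch under stated assumptions, not the code in application/app.py: the function name `build_database_url` and the default database name `postgres` are hypothetical, and 5432 is the default Postgres port.

```python
from urllib.parse import quote


def build_database_url(username, password, endpoint, port=5432, database="postgres"):
    """Build a Postgres connection URL from config.py-style values.

    The database name default is a placeholder for this sketch; use the
    name of the database actually created on the RDS instance.
    """
    # Percent-encode the credentials so characters such as "@" or "/"
    # in a password do not break the URL structure.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{endpoint}:{port}/{database}"
```

A URL in this form can be passed to a database client such as SQLAlchemy's `create_engine`. Keeping config.py out of version control avoids committing the credentials.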

Using the pgAdmin app to view and query the database locally

This project uses a Postgres database hosted on AWS RDS to store the data. If you have the pgAdmin app, you can view and query the database.

To connect to the database using pgAdmin:

Open the pgAdmin app.
Right click Servers.
Click Create > Server....
Enter a name for the server. For example, mental_health_tech_db.
Click the Connection tab.
In the Host name/Address field, enter the AWS RDS endpoint URL.
In the Username field, enter the database username.
In the Password field, enter the database password.
Check the Save Password? checkbox.
Click Save.

Deploying the app

The app for this project is deployed to and hosted on Heroku. For more information on hosting with Heroku, see https://devcenter.heroku.com/. To deploy the app, you will need to have the Heroku CLI installed.

Note: This app uses heroku-buildpack-python-nltk, which allows the model files used for natural language processing to be uploaded to the Heroku app. This is the same as Heroku's official Python buildpack but also installs any NLTK packages.

Download and install the Heroku CLI.
If you haven't already, log in to your Heroku account and follow the prompts to create a new SSH public key.

heroku login

Clone the repository

Use Git to clone the repository to your local machine.

heroku git:clone -a mental-health-in-tech
cd mental-health-in-tech

Deploy your changes

git add .
git commit -m "changes"
git push heroku master --no-verify

Issues

If you find an issue while using the app or have a request, log the issue or request here. These issues will be addressed in a future code update.
