Sentiment Analysis

NLP, Sentiment Analysis, and Topic Analysis on Walkthrough data for Transformation Waco

Result: How cleaning up a form won $2.4 million in TIA grant funding

By restructuring the coaching feedback forms, I was able to recover 1000 work hours or roughly $45,000 in salaries plus create actionable data from previously ignored qualitative data. We were collecting the coaching feedback, but no one was analyzing it. I developed python code to use different machine learning models to analyze all of the fill in the blank questions and showed that the coaching feedback was generally positive, which challenged the assumption that feedback was usually negative. Working with better feedback forms, recovered worked hours and challenged preconceptions, we were able to have focused discussions on calibrated walks. Calbrated walks fed into data collection points for the TIA grant, where my data collection and reporting passed a 13-point data validation check conducted by the University of Texas and ended up winning $2.4 million in bonus income for teachers. $1.2 milliion across 2 years for TW campuses.

(https://www.kcentv.com/article/news/local/waco-isd-boosts-200-teachers-pay-through-teacher-incentive-allotment/500-0d25fe54-c322-4973-b02d-fa81c30ecc8a)

Process

Use Python in Jupyter Notebook to perform a Sentiment Analysis
NLTK to determine positive, neutral, and negative
Review for subjectivity and objectivity
Perform Topic Analysis
Pull results into Tableau for an interactive Dashboard
PowerPoint for sharable file for those outside the organization
Refine data collection process to save hours of work for campus leaders

Problem Statement

An outside organization created a teacher observation form with several questions including 3 fill in the blanks. My best estimate for the time investment for one year of data tracking is that the principals/APs spent 2000 work hours completing the walkthrough form. My team asked for an analysis on the quantitative data but left out the qualitative data. I completed the core part of the analysis in Tableau, but I still wanted to dig into the qualitative data, which was over 5000 text entries to sift through.

Analyzing the Qualitative Data

I wrote code in Jupyter Notebook and ran the text entries through a sentiment analysis and found that the coaching feedback was significantly positive and only slightly subjective in language. Both of those indicators meant the coaches were on track to giving good mentoring advice to the teachers. I spent a lot of time tinkering with what made an appropriate stop word removal list with an education data set. I uploaded the results and made a word cloud in Tableau where they could explore the different positive/negative and objective/subjective words. I also completed 2 versions of a topic analysis, where I found that using 5 words in a row gave a good general sense of some interesting topics discussed in the coaching feedback.

Addressing Form Fatigue

The other key finding was the word count trailed off as the form progressed. After some investigation, I finally attributed the decrease in words to form fatigue. To address form fatigue, I met with the outside team to analyze the questions on the form and to build it better from a data collection view for the new school year. We were able to trim the form down considerably while not sacrificing any of the team’s key data points! I removed many redundant data collection pieces that could be later merged with a static file when an analysis was needed. We easily recovered 1000 work hours in making this adjustment!

Code Coming Soon!

I am working on finding an alternative data set to share my code and work for other educators. Machine Learning and NLP is not commonly used among school campuses. I am working to make advanced analysis techniques more accessible to school districts. I used Jupyter Notebook for this project, but for accessibility, I will add code for Google Colaboratory as it does not require a data environment to be set up and allows for non-technical educators to run the code on their survey data. Most ISDs have access to Google products. (Need to work out how to visually display without taking into Tableau.)

Future Research Areas

Super Duper NLP Repo: https://notebooks.quantumstat.com/
Datasets: https://index.quantumstat.com/
Visualize: https://iclr.cc/virtual_2020/papers.html?filter=keywords
Pick the ML model: https://towardsdatascience.com/automated-machine-learning-using-pycaret-4bb90ab3e2c7 Visualize the math of ML models: https://github.com/Gautam-J/Machine-Learning PowerBI in Jupyter notebook: https://github.com/microsoft/powerbi-jupyter

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis

Result: How cleaning up a form won $2.4 million in TIA grant funding

Process

Problem Statement

Analyzing the Qualitative Data

Addressing Form Fatigue

Code Coming Soon!

Future Research Areas

About

Releases

Packages

Baylex/TW_Survey_NLP

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis

Result: How cleaning up a form won $2.4 million in TIA grant funding

Process

Problem Statement

Analyzing the Qualitative Data

Addressing Form Fatigue

Code Coming Soon!

Future Research Areas

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages