
skibum55/csca5028


Student Sentiment Analysis

Executive Summary

As education moves away from mandatory classroom attendance, educators need a broader toolbox for fostering engagement. Forums and chats provide an opportunity to gauge learners' feelings about the course. This MVP application gives instructors and their assistants real-time insight into student sentiment. This fast feedback encourages immediate adaptation when misunderstandings arise and highlights opportunities for future improvement.

Initial Architecture Thoughts

Mermaid was used to create a diagram as code.

Diagram

Quick Start

Install the prerequisites, initialize environment variables, and create a dedicated environment with make install. Be patient, as several large data science libraries need to be installed.

⚠️ .env needs to be updated with your information from the .env.example

Run the application with make run. The homepage will be available at 127.0.0.1:8000. The first run creates a database and loads a large public sentiment analysis model from Hugging Face. Any messages not already in the database are also collected and analyzed at this point.

TL;DR

Source code is available on GitHub (minus credentials).

Stories Delivered

Development automation ✅

A Makefile was created for ease of task reuse.

Web application basic form, reporting ✅

I chose the Python FastAPI framework as the basis for this application. It provides many built-in features that I would otherwise need to code myself.

Plotly is used for the data visualization. The JavaScript include was an almost code-free starting point. Plotly has excellent examples and a playground, and it supports both JavaScript and Python.

Docs

This README serves as the product development history. The OpenAPI spec is available here. When the application is running, the local URL can be used. A ReDoc version can be found here.

Data persistence ✅

A SQLite database was chosen for its simplicity and portability. A DB instance and tables are created on startup if they don't already exist.
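The create-if-missing startup step can be sketched with the standard library sqlite3 module. This is a minimal illustration, not the project's actual schema: the messages table, its columns, and the init_db helper are all assumptions made for the example.

```python
import sqlite3


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database file and ensure the table exists."""
    conn = sqlite3.connect(path)
    # IF NOT EXISTS makes startup idempotent: running it again is a no-op.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            ts        TEXT PRIMARY KEY,  -- hypothetical key: Slack message timestamp
            channel   TEXT NOT NULL,
            text      TEXT NOT NULL,
            sentiment REAL               -- NULL until the message is analyzed
        )
        """
    )
    conn.commit()
    return conn


def save_message(conn: sqlite3.Connection, ts: str, channel: str, text: str) -> None:
    """Insert a message, silently skipping rows that are already stored."""
    conn.execute(
        "INSERT OR IGNORE INTO messages (ts, channel, text) VALUES (?, ?, ?)",
        (ts, channel, text),
    )
    conn.commit()
```

Using the message timestamp as the primary key together with INSERT OR IGNORE means repeated collections do not create duplicate rows.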

Data collection ✅

The data collector module uses the Slack Web API to pull messages from specific channels into the DB. You can run a new collection using the collect endpoint.
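As a rough sketch of the collection loop: Slack's conversations.history method returns a page of messages plus a next_cursor under response_metadata. The helper below walks those pages and keeps only messages whose ts is not already stored. The fetch_page callable is a hypothetical stand-in for the real Slack call, and both function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set

# A page mirrors the shape of a conversations.history response:
# {"messages": [...], "response_metadata": {"next_cursor": "..."}}
Page = Dict


def iter_messages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every message, following cursors until Slack reports no more pages."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("messages", [])
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:  # an empty or missing cursor marks the last page
            break


def new_messages(messages: Iterable[dict], known_ts: Set[str]) -> List[dict]:
    """Keep only messages whose timestamp is not already in the database."""
    return [m for m in messages if m["ts"] not in known_ts]
```

Passing the page fetcher in as a parameter keeps the pagination logic testable without network access or credentials.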

Data analyzer ✅

Data analysis is provided by the Flair Natural Language Processing library. It is wrapped by a FastAPI endpoint for inter-process communication. You can run a text analysis using the analyze endpoint. See the docs for usage.
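The analysis step might look roughly like the sketch below. It assumes Flair's pretrained "en-sentiment" classifier, which labels text POSITIVE or NEGATIVE with a confidence score; the model actually used by this project may differ. The function names and the signed-score convention are illustrative assumptions, and the FastAPI wrapper is omitted.

```python
from functools import lru_cache


def to_signed_score(label: str, confidence: float) -> float:
    """Fold a (label, confidence) pair into a single score in [-1, 1]."""
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    raise ValueError(f"unexpected label: {label!r}")


@lru_cache(maxsize=1)
def load_classifier():
    """Load the model once; Flair is imported lazily because it is slow to load."""
    from flair.models import TextClassifier

    return TextClassifier.load("en-sentiment")  # assumed model name


def analyze(text: str) -> float:
    """Return a signed sentiment score for one message."""
    from flair.data import Sentence

    sentence = Sentence(text)
    load_classifier().predict(sentence)
    label = sentence.labels[0]
    return to_signed_score(label.value, label.score)
```

Caching the classifier avoids reloading a large model on every request, which matters when the function sits behind an HTTP endpoint.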

Unit tests ✅

Unit tests can be found in the test directory. They can be run with the command make test. FastAPI has pytest support for built-in mocks and stubs.

Test Success

REST collaboration internal or API endpoint ✅

External collaboration occurs with the database and Slack API. Internally, the analysis and metrics endpoints are created using asynchronous applications with their own API specifications.

Product environment ✅

Python venv is used for development isolation. Local credentials are not committed to source control. For Render web services and GitHub CI/CD, secrets and variables are used.

Integration tests ✅✅
Continuous integration ✅✅

GitHub Actions manifests are used to validate the Python code via linting, code quality, and build tasks.

Actions

Load testing

Load tests are run with k6.

Test run

Production monitoring instrumenting ✅✅

Grafana Cloud is used for collecting and visualizing performance metrics. A Prometheus metrics endpoint is available here.

Dashboard https://www.google.com/search?channel=fenc&client=firefox-b-1-lm&q=prometheus+requests+per+second

Continuous delivery ✅✅✅
Codespaces & VSCode

Codespaces

Render ✅✅✅✅

A continuous deployment manifest using GitHub Deploy workflows can be found here. Secrets are set using variables. This action uses a deploy hook to deliver the app to Render. Unfortunately, my free account doesn't have the capacity to run the Flair sentiment analysis library in memory. A successful delivery can be seen in the screenshot below.

Render

Docker ✅✅✅✅

The Docker image created in CI can be run with the following commands:

docker pull ghcr.io/skibum55/csca5028:latest
docker run -d -p 8000:8000 --env-file=.dockerenv ghcr.io/skibum55/csca5028:latest

⚠️ .dockerenv needs to be updated with your information from the .dockerenv.example

https://github.com/skibum55/csca5028/pkgs/container/csca5028

Continuous security ✅✅✅✅

Out of respect for the peer reviewers, I have added security scans to the code via GitHub Security. This ensures I don't inadvertently spread malicious software.

image

image2

Backlog

Code tagged with TODO

Event collaboration messaging ❎❎❎

In this product, the data demands are low enough that all collection and analysis can be done synchronously. As Slack usage increases, it would make sense to modify the application to use a message queue, as shown in the architectural diagram. Switching to Slack's event-driven API would also be better than scraping.
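To illustrate the idea, the sketch below decouples collection from analysis with an in-process queue.Queue and a worker thread. This is only a stand-in for a real message broker; the function names are hypothetical, and analyze is any callable that scores a message.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

_STOP = None  # sentinel telling the worker to shut down


def analyzer_worker(inbox: queue.Queue, results: list, analyze: Callable) -> None:
    """Consume messages until the stop sentinel arrives."""
    while True:
        message = inbox.get()
        if message is _STOP:
            break
        results.append((message, analyze(message)))


def run_pipeline(messages: Iterable[str], analyze: Callable) -> List[Tuple[str, object]]:
    """Producer side: enqueue messages while a worker analyzes them."""
    inbox: queue.Queue = queue.Queue()
    results: list = []
    worker = threading.Thread(target=analyzer_worker, args=(inbox, results, analyze))
    worker.start()
    for message in messages:
        inbox.put(message)  # the collector returns immediately after enqueueing
    inbox.put(_STOP)
    worker.join()
    return results
```

With a single worker, results arrive in the order messages were enqueued. A real broker would add persistence and let the collector and analyzer run as separate processes.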

Using mock objects or any test doubles ❎❎

Didn't get to it.
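One possible starting point is unittest.mock from the standard library. In the sketch below, a Mock stands in for a Slack client so the test needs no network or token. The collect_texts function is hypothetical and written only for this example; conversations_history is the method name used by the Slack SDK's WebClient.

```python
from unittest.mock import Mock


def collect_texts(client, channel: str) -> list:
    """Hypothetical collector: return the text of each message in a channel."""
    response = client.conversations_history(channel=channel)
    return [message["text"] for message in response["messages"]]


def test_collect_texts_uses_client():
    # The Mock replaces the real Slack client and returns a canned response.
    fake_client = Mock()
    fake_client.conversations_history.return_value = {
        "messages": [{"ts": "1.0", "text": "hi"}, {"ts": "2.0", "text": "thanks"}]
    }

    assert collect_texts(fake_client, "C123") == ["hi", "thanks"]
    # Verify the collaborator was called exactly as expected.
    fake_client.conversations_history.assert_called_once_with(channel="C123")
```

pytest would discover and run the test function by its test_ prefix.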

Development branch

Single-user development doesn't really need multiple branches. However, the CI/CD actions run on every commit, which wastes compute and therefore energy.

HTML templating

As the homepage expands, inline HTML should be replaced with templates.

Test coverage

Needs to be measured and increased.

Scheduler

As this is an MVP, scheduled collection wasn't a priority. A collection endpoint exists for triggering a manual collection. Research indicates that this FastAPI Scheduler would be a good fit for adding this feature when needed.
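For a sense of what scheduled collection involves, here is a plain asyncio sketch that runs a job at a fixed interval. It is not the scheduler library mentioned above, only a standard-library stand-in; run_periodically and its parameters are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_periodically(
    job: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_runs: Optional[int] = None,
) -> int:
    """Await `job` repeatedly, sleeping between runs; return the run count.

    max_runs=None loops forever, which is what a long-lived service would use.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        await job()
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs
```

In a web application, a coroutine like this could be started as a background task at startup, with job wrapping the same logic the collect endpoint triggers.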

Object relational mapping

For database provider abstraction, an ORM can be used.