
Whisper and GCP Compute Runners

@evamaxfield evamaxfield released this 21 Feb 07:00
· 27 commits to main since this release
498b4e1

CouncilDataProject cdp-backend v4.0.0

⚠️ ⚠️ This is a major breaking release. Instance maintainers should update their instance by running just update-from-cookiecutter. ⚠️ ⚠️

You should re-read the SETUP/README.md document, as some minor new configuration is required. Specifically, the new PERSONAL_ACCESS_TOKEN and the Quota Increase request should be the only things that need to be set up for existing instances.

You should also reduce how often your CRON event gather runs before running just update-from-cookiecutter. All of the instances maintained by the CDP Core Team will be reduced to running only once per day.


Council Data Project is a backend, frontend, and cookiecutter deployment for creating a whole database, storage system, and website, for archiving, exploring, and tracking municipal council action.

This library, cookiecutter-cdp-deployment, ties together multiple projects to make a single deployable infrastructure.

v4.0.0

There are two main changes for this release.

  1. We are swapping out Google Speech-to-Text for OpenAI's Whisper.

Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better, with a word error rate ranging from ~3.6% to ~9% on long audio files.
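For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation, not the evaluation code used to produce the numbers above; the function name and the lowercase-and-split tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: real evaluations usually normalize punctuation and
    numerals before comparing, which this sketch does not do.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a transcript with one wrong word out of five reference words has a word error rate of 20%.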

To use this new model efficiently, we need access to a GPU. Since GitHub Actions do not have GPUs available, we are using a system which spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all in the course of a single GitHub Actions workflow. Based on multiple tests, this should reduce both cost and processing time; however, with this release we will do more testing to get a better estimate.
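The spin-up, run, tear-down lifecycle described above can be sketched as a context manager that guarantees teardown even when the job fails. This is only an illustration of the pattern: the actual deployment does this with GitHub Actions workflow steps, and every name below (ephemeral_runner, fake_create, fake_delete, the instance id) is a hypothetical stand-in rather than part of cdp-backend or any Google Cloud API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ephemeral_runner(
    create: Callable[[], str],
    delete: Callable[[str], None],
) -> Iterator[str]:
    """Create a compute instance, hand it to the caller, always delete it."""
    instance_id = create()
    try:
        yield instance_id
    finally:
        # Runs on success and on failure, so a crashed job
        # does not leave a billable GPU instance running.
        delete(instance_id)


# Hypothetical stand-ins that only record what would happen.
events: List[str] = []


def fake_create() -> str:
    events.append("create")
    return "gpu-runner-1"


def fake_delete(instance_id: str) -> None:
    events.append(f"delete {instance_id}")


def run_pipeline() -> None:
    with ephemeral_runner(fake_create, fake_delete) as instance_id:
        events.append(f"run job on {instance_id}")
```

Putting the teardown in a finally block is the important design choice here: the instance is removed whether or not the transcription job succeeds.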

  2. We have switched from the MIT License to the MPLv2 License.

Unless you are trying to fork our code and take it private, this won't affect you.