Automatic Speech Recognition Demos

Dataset

- LibriSpeech: common English speech benchmark
- FLEURS: multilingual speech
- AMI Meetings: meeting recordings

GCP Speech-to-Text

Jie Jenn
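A minimal sketch of a synchronous request with the `google-cloud-speech` Python client. It assumes a short 16 kHz LINEAR16 WAV file and credentials already configured (for example through `GOOGLE_APPLICATION_CREDENTIALS`). The synchronous `recognize` call only handles roughly one minute of audio; longer files need `long_running_recognize`. The function name `transcribe_gcp` and its defaults are hypothetical and are not taken from this repository.

```python
def transcribe_gcp(path: str, sample_rate_hz: int = 16000, language_code: str = "en-US") -> str:
    """Transcribe a short local audio file with Google Cloud Speech-to-Text (sync API)."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hz,
        language_code=language_code,
    )
    response = client.recognize(config=config, audio=audio)
    # Each result covers a consecutive chunk of audio; keep the top alternative of each.
    return " ".join(r.alternatives[0].transcript.strip() for r in response.results)
```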

AWS Transcribe
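AWS Transcribe works asynchronously on audio stored in S3: you start a job, then poll it until it finishes. A minimal sketch with `boto3`, assuming the audio has already been uploaded; the job name, S3 URI, and polling interval are hypothetical placeholders. The client is passed in, so `boto3.client("transcribe")` can be replaced by a stub when testing.

```python
import time


def transcribe_aws(client, job_name: str, media_uri: str, media_format: str = "wav",
                   language_code: str = "en-US", poll_seconds: float = 5.0) -> str:
    """Start an AWS Transcribe job and return the transcript file URI once it completes."""
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )
    while True:
        job = client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]  # QUEUED, IN_PROGRESS, COMPLETED or FAILED
        if status == "COMPLETED":
            # The transcript itself is a JSON document stored at this URI.
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription job failed"))
        time.sleep(poll_seconds)
```

Usage: `transcribe_aws(boto3.client("transcribe"), "demo-job", "s3://my-bucket/sample.wav")`, where the bucket and job name are placeholders.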

OpenAI-Whisper API
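A minimal sketch using the official `openai` Python SDK (v1 client interface), assuming `OPENAI_API_KEY` is set in the environment; `whisper-1` is the hosted Whisper model name, and the endpoint caps uploads at 25 MB. The helper name is hypothetical, and the client is passed in so the function can be exercised with a stub.

```python
def transcribe_openai_api(client, path: str, model: str = "whisper-1") -> str:
    """Send a local audio file to the OpenAI transcription endpoint and return the text."""
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Usage: `from openai import OpenAI`, then `transcribe_openai_api(OpenAI(), "sample.wav")`.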

OpenAI-Whisper Open-Source

https://github.com/openai/whisper
- LibriSpeech
- Multilingual
Community Features:
- Transcription and diarization (speaker identification)
  - openai/whisper#264
  - https://colab.research.google.com/drive/1HuvcY4tkTHPDzcwyVH77LCh_m8tP-Qet?usp=sharing
- Streaming (real-time)
  - openai/whisper#2
  - https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef
- C++ port (lightweight)
  - https://github.com/ggerganov/whisper.cpp
Limitation:
- Hallucination: openai/whisper#679
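A minimal local-inference sketch with the `openai-whisper` package from the repository above (it also needs `ffmpeg` on the PATH). The helper name is hypothetical, and the `base` checkpoint in the usage line is an arbitrary choice: larger checkpoints are slower but generally more accurate. The loaded model is passed in so it can be reused across many files.

```python
def transcribe_whisper_local(model, path: str, language=None) -> str:
    """Run a loaded Whisper model on one file; language=None lets Whisper auto-detect it."""
    # fp16=False forces FP32 decoding, which CPUs need anyway (Whisper otherwise warns and falls back).
    result = model.transcribe(path, language=language, fp16=False)
    return result["text"].strip()
```

Usage: `import whisper`, `model = whisper.load_model("base")`, then `transcribe_whisper_local(model, "sample.wav")`.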

HuggingFace

ASR
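With Hugging Face `transformers`, the `automatic-speech-recognition` pipeline wraps feature extraction, the model, and decoding in one callable. A minimal sketch; `openai/whisper-small` is just one example checkpoint from the Hub (an assumption, not a choice made in this repository), and decoding audio file paths requires `ffmpeg`.

```python
def build_asr(model_id: str = "openai/whisper-small"):
    """Create a transformers ASR pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline
    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe_hf(asr, path: str) -> str:
    """Transcribe one audio file; for a single input the pipeline returns {"text": ...}."""
    return asr(path)["text"].strip()
```

Usage: `asr = build_asr()`, then `transcribe_hf(asr, "sample.flac")`.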

Picovoice

WER benchmarking
https://github.com/Picovoice/speech-to-text-benchmark

Metrics

Word Error Rate

Word error rate (WER) is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the output of the speech-to-text engine, divided by the number of words in the reference transcript.
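The definition above amounts to WER = (S + D + I) / N, where N is the number of reference words. A minimal sketch using a standard Levenshtein dynamic program over words; it assumes both transcripts were already normalized (casing, punctuation), which benchmarks typically do before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = word-level edit distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))  # empty reference prefix: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis prefix: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 ≈ 0.333 (one substitution plus one deletion over six reference words). WER can exceed 1.0 when the output contains many insertions.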

Real Time Factor

Real-time factor (RTF) is the ratio of CPU (processing) time to the duration of the input audio. A speech-to-text engine with a lower RTF is more computationally efficient. We omit this metric for cloud-based engines.
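A minimal sketch of the RTF measurement, assuming a local engine exposed as a Python callable and a PCM WAV input, so the standard-library `wave` module can read its duration. `time.process_time()` measures the CPU time of the current process, matching the "CPU time" wording above; it does not count child processes or GPU work, so `time.perf_counter()` is the swap for a wall-clock RTF. `transcribe_fn` is a hypothetical stand-in for whichever engine is being measured.

```python
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(transcribe_fn, path: str) -> float:
    """RTF = CPU time spent transcribing / audio duration; lower is more efficient."""
    start = time.process_time()
    transcribe_fn(path)
    cpu_seconds = time.process_time() - start
    return cpu_seconds / wav_duration_seconds(path)
```

An RTF of 0.1, for instance, means ten seconds of audio took one second of CPU time.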

Model Size

Model size is the aggregate size of the models (acoustic and language), in MB. We omit this metric for cloud-based engines.
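A minimal sketch that sums the on-disk size of an engine's model files, assuming the acoustic and language models are available locally as files or directories. It treats 1 MB as 2**20 bytes; divide by 10**6 instead for decimal megabytes. `model_size_mb` is a hypothetical helper name, and the paths are whatever files the engine loads.

```python
import os


def model_size_mb(*paths: str) -> float:
    """Total size in MB (2**20 bytes) of the given model files or directories."""
    total = 0
    for path in paths:
        if os.path.isdir(path):
            # Walk the directory tree and add up every regular file it contains.
            for root, _dirs, files in os.walk(path):
                total += sum(os.path.getsize(os.path.join(root, name)) for name in files)
        else:
            total += os.path.getsize(path)
    return total / 2**20
```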

owidjaja/speech-to-text-benchmark
