owidjaja/speech-to-text-benchmark

Automatic Speech Recognition Demos

Dataset

GCP Speech-to-Text

AWS Transcribe

OpenAI-Whisper API

OpenAI-Whisper Open-Source

HuggingFace

Picovoice

Metrics

Word Error Rate

Word error rate (WER) is the word-level edit distance between the reference transcript and the output of the speech-to-text engine, divided by the number of words in the reference transcript. A lower WER means a more accurate engine.
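As an illustration of this definition, here is a minimal sketch of a WER computation. The function name `word_error_rate` is hypothetical and not taken from this repository. The sketch assumes plain whitespace tokenization with no text normalization (casing and punctuation are compared as-is), which may differ from what the benchmark actually does.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: words are split on whitespace and compared verbatim.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev_row[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr_row[j] = min(
                prev_row[j] + 1,                      # deletion
                curr_row[j - 1] + 1,                  # insertion
                prev_row[j - 1] + substitution_cost,  # match or substitution
            )
        prev_row = curr_row

    return prev_row[-1] / len(ref_words)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sit on mat" gives one substitution and one deletion over six reference words, so the WER is 2/6. Because insertions count as errors, WER can exceed 1 when the output is much longer than the reference.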

Real Time Factor

Real-time factor (RTF) is the ratio of CPU (processing) time to the duration of the input speech file. A speech-to-text engine with a lower RTF is more computationally efficient. We omit this metric for cloud-based engines.
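A rough sketch of how RTF could be measured for a local engine is shown below. The names `real_time_factor` and `measure_rtf`, and the `transcribe` callable, are hypothetical placeholders rather than code from this repository. The sketch assumes the engine runs inside the current Python process, so that `time.process_time()` captures its CPU time.

```python
import time
from typing import Callable


def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """Ratio of CPU time spent processing to the duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(
    transcribe: Callable[[str], object],  # hypothetical engine entry point
    audio_path: str,
    audio_seconds: float,
) -> float:
    """Time one transcription call and convert it to an RTF value.

    Assumption: the work happens in this process; CPU time consumed by
    child processes or remote services is not counted.
    """
    start = time.process_time()
    transcribe(audio_path)
    cpu_seconds = time.process_time() - start
    return real_time_factor(cpu_seconds, audio_seconds)
```

For instance, 2 seconds of CPU time on a 10-second file gives an RTF of 0.2. An RTF below 1 means the audio is processed faster than it plays back.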

Model Size

The aggregate size of models (acoustic and language), in MB. We omit this metric for cloud-based engines.
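One way to compute this figure is to sum the on-disk sizes of the model files. The helper below is a hypothetical sketch, not part of this repository, and it assumes 1 MB = 1,000,000 bytes. Pass `bytes_per_mb=1024 * 1024` if binary megabytes are intended.

```python
from pathlib import Path
from typing import Iterable, Union


def aggregate_model_size_mb(
    model_paths: Iterable[Union[str, Path]],
    bytes_per_mb: int = 1_000_000,  # assumption: decimal megabytes
) -> float:
    """Sum the file sizes of all model files and convert the total to MB."""
    total_bytes = sum(Path(path).stat().st_size for path in model_paths)
    return total_bytes / bytes_per_mb
```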

About

speech to text benchmark framework

Languages

  • Python 100.0%