New PR for Faster Whisper: Batching Support, Speed Boosts, and Quality Enhancements #856

Open. Wants to merge 145 commits into base: master.

Conversation

@Jiltseb commented May 24, 2024

Hello everyone,

This PR adds a major update to Faster Whisper, bringing both speed and quality improvements!

Speed improvements:

  • Batching support: Inspired by whisper-x, this update introduces batching support, allowing for roughly a 3x speed increase. The implementation builds on whisper-x and supports additional run-time arguments and external VAD segments. The batched version now runs at 64x real-time speed, compared to the previous 20x.

  • Faster feature extraction: We've incorporated torchaudio-based parallel STFT as an alternative to the current implementation from transformers, providing an additional speed boost. With the enable_ta_fe flag, the final version achieves 104x real-time speed, up to 12.5x faster on average than the OpenAI implementation (a minimal sketch of this style of feature extraction follows the list).
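
For context, a torchaudio front end computes Whisper-style log-mel features with a batched STFT. The following is a minimal sketch of the idea, not the PR's exact code; the parameter values follow the standard Whisper front end (16 kHz audio, 25 ms window, 10 ms hop, 80 mel bins):

import torch
import torchaudio

# Whisper front-end parameters: 16 kHz, n_fft=400 (25 ms), hop_length=160 (10 ms)
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80
)

def log_mel(audio: torch.Tensor) -> torch.Tensor:
    # audio: (batch, num_samples) float tensor sampled at 16 kHz
    mel = mel_transform(audio)                       # (batch, 80, frames)
    log_spec = torch.clamp(mel, min=1e-10).log10()   # log compression
    log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)  # clip dynamic range
    return (log_spec + 4.0) / 4.0                    # Whisper-style normalization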

Using the batched version is straightforward:

from faster_whisper import WhisperModel, BatchedInferencePipeline

# load the faster-whisper model in the usual way
model = WhisperModel("medium", device="cuda", compute_type="float16")

# wrap it in the batched inference pipeline
batched_model = BatchedInferencePipeline(model=model)

# transcribe returns the segments and the transcription info
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Quality improvements:

  1. Consistency across runs: Setting the model seed makes results consistent across runs.
  2. Reducing hallucinations: Stricter checks in the inference pipeline reduce unstructured or repeated phrases.
  3. Reliable language detection: A new function detects the language more reliably by considering highly confident and randomly sampled segments, breaking ties to determine the majority language (a sketch of the voting idea follows this list).
  4. Code-switching support: Handles audio with multiple languages by detecting the language every 30 seconds and dynamically directing the data flow. Since the exact switching position is unknown, the detected switch point can be off by up to one 30-second segment.
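
As an illustration of the voting idea in item 3, here is a hypothetical sketch (not the PR's actual code): detect a language per segment, keep the confident predictions, and pick the most frequent one.

from collections import Counter

def majority_language(segment_predictions, confidence_threshold=0.8):
    # segment_predictions: list of (language_code, confidence) pairs, one per segment
    confident = [lang for lang, conf in segment_predictions if conf >= confidence_threshold]
    # fall back to all predictions if none pass the threshold
    votes = Counter(confident or [lang for lang, _ in segment_predictions])
    return votes.most_common(1)[0][0]

print(majority_language([("en", 0.95), ("en", 0.91), ("de", 0.55)]))  # -> "en"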

Language detection usage:

from faster_whisper import WhisperModel

model = WhisperModel("medium", device="cuda", compute_type="float16")
language_info = model.detect_language_multi_segment("audio.mp3")
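
The returned language_info contains the detected language and its confidence; the field names below assume the PR's implementation and may differ in the merged version:

print(language_info["language_code"], language_info["language_confidence"])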

Benchmarking:

A. Open source benchmarking:

Open_asr_eval consists solely of short-form audio, with an average duration generally under 10 seconds. Hence, we've tested more complex, long-form use cases using a subset of the YouTube-Commons dataset. The Whisper-medium model is used (with batch size = 8 for the batched versions) in these experiments. The dataset card for youtube-commons-asr-eval is mobiuslabsgmbh/youtube-commons-asr-eval.

Speed (x real-time):

System                   GPU speed   CPU speed
OpenAI Whisper           8.2x        4.5x
faster-whisper           20.1x       5.6x
HF Whisper (batched)     59.3x       8.4x
Batched Faster-Whisper   104x        14.6x

WER:

System                   WER
OpenAI Whisper           15.1
faster-whisper           14.6
HF Whisper (batched)     16.8
Batched Faster-Whisper   13.1
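
For reference, WER is the standard word error rate: substitutions, deletions, and insertions divided by the number of reference words. A quick way to compute it independently (with the jiwer package, which is not part of this PR) is:

import jiwer

# one substituted word out of four reference words -> WER = 0.25
print(jiwer.wer("the quick brown fox", "the quick brown dog"))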

B. Internal dataset:

Since the transcriptions in the open-source dataset are unverified, they can contain various types of errors. Additional internal benchmarking ensures robustness across various scenarios. A smaller test set (84 minutes) with verified ground truth is used to verify transcription quality and speed. The test set contains 9 audio files ranging from 3 to 13 minutes and covering various audio types.

System                   WER   Speed
OpenAI Whisper           6.8   9.1x
faster-whisper           6.1   17.4x
HF Whisper (batched)     8.2   42.8x
Batched Faster-Whisper   6.5   86.6x

Batched processing speeds up long-form audio without increasing WER. Users can easily switch between the sequential and batched Faster Whisper versions based on their specific requirements.

Thank you in advance!

Acknowledgements

This work was done at Mobiuslabs GmbH. Contact Dr. Jilt Sebastian for any queries or requests.

Jiltseb added 30 commits June 9, 2023 13:52

  • PR: Changes to faster-whisper project for asr v2.1 based on latest faster_whisper (0.9.0)
  • SDK v3.0 does not work with latest numpy version (1.26.0) and faster-whisper won't work if numpy < 1.21.6
  • Updating the base faster-whisper to 0.10.0
  • Support for batched inference and language detection from multiple segments in faster-whisper
  • Updating the base directory
@trungkienbkhn (Collaborator) left a comment

LGTM, please fix a few minor warnings about coding conventions.

(Resolved review threads on faster_whisper/vad.py and faster_whisper/transcribe.py.)
@Jiltseb (Author) commented Jul 5, 2024

Done, we are good to go!

@felixthekraut commented:

Are there any licensing concerns bringing this in from whisper-x? The whisper-x license is more restrictive than the MIT license faster whisper is under.

@Jiltseb (Author) commented Jul 8, 2024

> Are there any licensing concerns bringing this in from whisper-x? The whisper-x license is more restrictive than the MIT license faster whisper is under.

Using batching or an HF pipeline is a generic idea for any whisper model that supports batching; only some portions of the VAD segmentation are specific to whisper-x. @trungkienbkhn Could you please let us know SYSTRAN's response on this? It would be great if the author of whisper-X could provide a waiver for this.

If there are legal issues, we can switch to Silero or NVIDIA-based open-source VAD models.

@trungkienbkhn (Collaborator) commented:

> Are there any licensing concerns bringing this in from whisper-x? The whisper-x license is more restrictive than the MIT license faster whisper is under.
>
> Using batching or HF pipeline is a generic idea on whisper model that supports batching. Only some portions of VAD segmentation are specific to whisper-x @trungkienbkhn Could you please let us know the response from SYSTRAN on this? Would be great if the author of whisper-X can provide a waiver to this.
>
> If there are legal issues, we can switch to Silero or nvidia based open source VAD models.

I've been researching the BSD-4-Clause license. This license allows the use, copying, and modification of code for development purposes, but you should add the license at the beginning of any code file that uses whisper-x's code (vad.py):

# Copyright (c) 2022, Max Bain
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright
...

# The code below is copied from whisper-x (https://github.com/m-bain/whisperX)
# and adapted for faster_whisper.

class SegmentX:
   ...

(Resolved review thread on README.md.)
@Jiltseb (Author) commented Jul 9, 2024

The developer Max Bain informed us via email that:
"Sure you can just use the modified version, just put some attribution in the VAD chunking file / batching section of the readme."

@felixthekraut commented:

> The developer Max Bain informed via email that: "Sure you can just use the modified version, just put some attribution in the VAD chunking file / batching section of the readme."

Doesn't the license carry forward to users of faster-whisper as well, i.e., won't the attribution clause be needed for anyone using this project?

# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# 3. All advertising materials mentioning features or use of this software


This is going to have to be carried forward into the Faster Whisper license, which is currently MIT. I don't see how this is compatible with MIT.

@Jiltseb (Author) commented Jul 11, 2024

> The developer Max Bain informed via email that: "Sure you can just use the modified version, just put some attribution in the VAD chunking file / batching section of the readme."
>
> Doesn't the license carry forward to users of faster whisper as well, i.e. the attribution clause will be needed for anyone using this project?

Please note that the author of whisper-x has changed the license to BSD-2-Clause. So, with proper attribution in the code, it is possible to use it. Under this license, you don't have to mention the name of the developer or the software in advertising or marketing materials. I will modify the doc accordingly and update here.

@felixthekraut commented:

> The developer Max Bain informed via email that: "Sure you can just use the modified version, just put some attribution in the VAD chunking file / batching section of the readme."
>
> Doesn't the license carry forward to users of faster whisper as well, i.e. the attribution clause will be needed for anyone using this project?
>
> Please note that the author of whisper-x has changed the license to BSD-Clause-2. So, with the proper attribution in the code, it will be possible to use it. As per this license, you don't have to mention the name of developer or software in advertising or marketing materials. I will modify the doc accordingly and update here.

Thank you for working through this!
