Batch processing #1

Blair-Johnson · 2022-12-08T16:59:38Z

Opening PR for merging and conflict resolution. The batch-processing branch introduces one major change to the behavior of the model.transcribe() method, which can now accept a list of audio file paths rather than a single audio file path. These files are packed into the batch dimension of the model for transcription, allowing users to achieve better GPU utilization. Audio clips can have different lengths; the internal batch size shrinks as transcription of the shorter files completes.
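To illustrate how the internal batch size shrinks as shorter clips finish, here is a minimal sketch. It is not the code from this branch: it assumes every clip is decoded in fixed 30-second windows and that a clip leaves the batch once its last window is done. The function name `active_batch_sizes` and the clip durations are hypothetical.

```python
import math

# Assumption: each clip advances by one fixed 30-second window per step.
WINDOW_SECONDS = 30


def active_batch_sizes(durations_s, window_s=WINDOW_SECONDS):
    """Return how many clips are still being decoded at each window step."""
    # Number of windows each clip needs before it is finished.
    windows_left = [math.ceil(d / window_s) for d in durations_s]
    sizes = []
    while any(w > 0 for w in windows_left):
        # Only unfinished clips occupy a slot in the batch dimension.
        sizes.append(sum(1 for w in windows_left if w > 0))
        windows_left = [max(w - 1, 0) for w in windows_left]
    return sizes


# Hypothetical clip lengths in seconds: 45 s, 90 s, and 150 s.
print(active_batch_sizes([45, 90, 150]))  # -> [3, 3, 2, 1, 1]
```

Going by the description above, the corresponding call would look like `model.transcribe(["a.mp3", "b.mp3"])` with placeholder file names; the return format for a list input is not stated here.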

Remaining issues to address:

Test behavior of different transcription options for failure
Modify docstring of batch_transcribe method to reflect modifications
Verify behavior of decode options for different clips
Include some benchmarking results


Blair-Johnson · 2022-12-08T22:08:55Z

Initial benchmarking indicates that batching enables significantly sub-linear scaling up to at least batch_size=16 on an NVIDIA A100 80GB. Beyond that point batching remains "sub-linear" with respect to batch_size=1, but the time required for a set of batched audio clips begins to scale linearly with further increases in batch size as the GPU becomes saturated. In this figure, a 214-minute podcast was batched together with itself for batch sizes in {1,2,4,8,16} and transcribed in parallel. The linear reference assumes linear scaling with respect to the batch_size=1 case and is analogous to running consecutive clips serially.
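The linear reference described above can be computed from the batch_size=1 timing alone. The sketch below shows that arithmetic; the function names are hypothetical and the numbers in the usage lines are placeholders, not measurements from this benchmark.

```python
def linear_reference(t_single, batch_sizes):
    """Time to transcribe `b` clips serially, given the batch_size=1 time."""
    return [t_single * b for b in batch_sizes]


def speedup_over_serial(t_single, batch_size, t_batched):
    """Ratio of the serial (linear) time to the measured batched time.

    A value above 1.0 means the batched run scaled sub-linearly.
    """
    return (t_single * batch_size) / t_batched


# Placeholder timings in seconds, for illustration only.
print(linear_reference(10.0, [1, 2, 4, 8, 16]))
print(speedup_over_serial(10.0, 4, 20.0))
```

A speedup that stops growing as the batch size increases would correspond to the saturation regime mentioned above, where batched time starts tracking the linear reference's slope.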


Blair-Johnson and others added 8 commits December 1, 2022 11:01

Initial work on batch processing of audio

db10223

Currently an issue with endless looping behavior.

fae51c0

Currently issues with special tags appearing in output.

af6e16c

incomplete transcripts and special tokens appearing in output

53f3536

Properly functioning on two identical podcasts. Lots of debug prints to remove.

17d0418

Batch processing, no prints. Need to clean up code still and do more advanced profiling.

51824db

First successful test of various lengths of different audio clips batched together. Lots of debugging code to remove.

3c4e773

Remove debugging code.

bf394e9

Blair-Johnson and others added 6 commits December 8, 2022 17:44

Merge branch 'main' into batch-processing

a22fc60

Updating documentation.

5f70ae3

Merge branch 'batch-processing' of https://github.com/Blair-Johnson/batch-whisper into batch-processing

Merge.

e1da524

Remove unnecessary import

bc078ea

Update readme with info about batched inference.

46a48cf

More descriptive example for batched inference

bc58a18

Blair-Johnson merged commit fb6159b into main Dec 9, 2022
