
Fix attention caching to make transcription run 30% faster #370

Merged: 1 commit into openai:main on Oct 19, 2022

Conversation

@vickianand (Contributor) commented on Oct 19, 2022

In a dict.get(a, f()) call, f() is evaluated even when the dict already contains a, so the cached tensors were being recomputed on every step. This was the bug in the attention caching logic. Making this change fixes the caching and gets a sweet (expected) speedup of close to 30%.
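A minimal sketch of the pitfall in standalone Python (not the Whisper code itself; expensive() stands in for the key/value projection that should be skipped on a cache hit):

```python
def expensive():
    # Stands in for the projection that should only run on a cache miss.
    print("expensive() was called")
    return "computed"

cache = {"k": "cached"}

# The default argument of dict.get is evaluated eagerly, before the lookup,
# so expensive() runs even though "k" is already present:
value = cache.get("k", expensive())   # prints "expensive() was called"

# Checking membership first skips the computation on a cache hit:
value = cache["k"] if "k" in cache else expensive()  # expensive() is not called
```

The same principle applies in the attention layers: compute the key/value projections only when the entry is absent from the cache, and read the cached tensors otherwise.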

For a test audio clip close to 5 minutes long, on an Nvidia Quadro RTX 8000 GPU:

  • before this fix, the large model took 102.8s; after this fix it takes 68.9s
  • before this fix, the medium.en model took 55.1s; after this fix it takes 39.7s

@vickianand (Contributor, Author) commented on Oct 19, 2022

All thanks to @ritheshkumar95 for finding this.

@vickianand changed the title from "Fix attention caching to make it around 30% faster" to "Fix attention caching to make transcription run 30% faster" on Oct 19, 2022
@janyf commented on Oct 19, 2022

Confirming that this really speeds up the process, nice catch!

Large model, Slovak language, 16 min file on a GTX 1080 Ti:
  • before change: ~22 min
  • after change: ~12 min

@akashmjn (Contributor) commented on Oct 19, 2022

Can repro on my end as well.

17:38 wav file, on a Quadro RTX 5000 (small.en model, with --condition_on_previous_text=True):

Runtime (m:s)
  • Original: 01:42
  • With kv cache fixed: 01:24

Really elegant catch! 🙌 Learnt something about Python today 🙂
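For reference, a run like the one above can be reproduced via Whisper's Python API; the file name below is a placeholder, and condition_on_previous_text corresponds to the flag mentioned above:

```python
import whisper

# "audio.wav" is a placeholder; small.en matches the model used in the run above.
model = whisper.load_model("small.en")
result = model.transcribe("audio.wav", condition_on_previous_text=True)
print(result["text"])
```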

@jongwook merged commit 9f70a35 into openai:main on Oct 19, 2022
@jongwook (Collaborator) commented on Oct 20, 2022

Thank you very much! This was an oversight when I was factoring out the caching part for the open source repo.

@nlgtuankiet commented

@vickianand @ritheshkumar95
Thank you very much!
I was getting around 0.88-1.1x for the large model; now it's 1.6x.

@rjwilmsi commented

Wow, this is great news, thank you to all involved. The title says 30% faster on GPU, but the speed-up is much larger on CPU, at least for me.

Ryzen 5 4500U (6-core laptop CPU), 6m30s YouTube video, English (https://www.youtube.com/watch?v=GFu64hnqzVo):

| Model    | Before PR | After PR |
|----------|-----------|----------|
| tiny.en  | 6m19s     | 2m9s     |
| base.en  | 15m39s    | 4m29s    |
| small.en | 60m45s    | 11m18s   |
