
Fix attention caching to make transcription run 30% faster #370

Merged: 1 commit into openai:main on Oct 19, 2022

Conversation

@vickianand (Contributor) commented on Oct 19, 2022

In a dict.get(a, f()) call, f() is evaluated even when the dict already contains a, so the cached tensors were being recomputed on every step. This was the bug in the attention caching logic. Making this change fixes the caching and gets a sweet (expected) speedup of close to 30%.
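A minimal sketch of the pitfall in standalone Python (not the Whisper code itself; expensive() stands in for the key/value projection that should be skipped on a cache hit):

```python
def expensive():
    # Stands in for the projection that should only run on a cache miss.
    print("expensive() was called")
    return "computed"

cache = {"k": "cached"}

# The default argument of dict.get is evaluated eagerly, before the lookup,
# so expensive() runs even though "k" is already present:
value = cache.get("k", expensive())   # prints "expensive() was called"

# Checking membership first skips the computation on a cache hit:
value = cache["k"] if "k" in cache else expensive()  # expensive() is not called
```

The same principle applies in the attention layers: compute the key/value projections only when the entry is absent from the cache, and read the cached tensors otherwise.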

For a test audio clip close to 5 minutes long, on an Nvidia Quadro RTX 8000 GPU:

  • before this fix, the large model took 102.8s; after this fix it takes 68.9s
  • before this fix, the medium.en model took 55.1s; after this fix it takes 39.7s

@vickianand (Contributor, Author) commented on Oct 19, 2022

All thanks to @ritheshkumar95 for finding this.

@vickianand changed the title from "Fix attention caching to make it around 30% faster" to "Fix attention caching to make transcription run 30% faster" on Oct 19, 2022
@janyf commented on Oct 19, 2022

Confirming that this really speeds up the process, nice catch!

Large model, Slovak language, 16 min file on a GTX 1080 Ti:
  • before change: ~22 min
  • after change: ~12 min

@akashmjn (Contributor) commented on Oct 19, 2022

Can repro on my end as well.

17:38 wav file, on a Quadro RTX 5000 (small.en model, with --condition_on_previous_text=True):

Runtime (m:s)
  • Original: 01:42
  • With kv cache fixed: 01:24

Really elegant catch! 🙌 Learnt something about Python today 🙂
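For reference, a run like the one above can be reproduced via Whisper's Python API; the file name below is a placeholder, and condition_on_previous_text corresponds to the flag mentioned above:

```python
import whisper

# "audio.wav" is a placeholder; small.en matches the model used in the run above.
model = whisper.load_model("small.en")
result = model.transcribe("audio.wav", condition_on_previous_text=True)
print(result["text"])
```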

@jongwook merged commit 9f70a35 into openai:main on Oct 19, 2022
@jongwook (Collaborator) commented on Oct 20, 2022

Thank you very much! This was an oversight when I was factoring out the caching part for the open source repo.

@nlgtuankiet commented

@vickianand @ritheshkumar95
Thank you very much!
I was getting around 0.88-1.1x for the large model; now it's 1.6x.

@rjwilmsi commented

Wow, this is great news, thank you to all involved. The title says 30% faster on GPU, but the speed-up is much larger on CPU, at least for me.

Ryzen 5 4500U (6-core laptop CPU), 6m30s YouTube video, English (https://www.youtube.com/watch?v=GFu64hnqzVo):

| Model    | Before PR | After PR |
|----------|-----------|----------|
| tiny.en  | 6m19s     | 2m9s     |
| base.en  | 15m39s    | 4m29s    |
| small.en | 60m45s    | 11m18s   |
