
Where are the Whisper models defined? #25

Closed
soupslurpr opened this issue Jun 25, 2024 · 5 comments
Comments


soupslurpr commented Jun 25, 2024

Hi, this app is pretty cool, nice job. I was amazed by its speed even though I read it's using the small Whisper model. For this reason I wanted to explore switching to onnxruntime for running Whisper in my app Transcribro, to see if I can move to a bigger model while keeping the same speed (currently it uses tiny q8_0 with whisper.cpp). However, I couldn't find the code that uses the Whisper model, or figure out how to run a Whisper model with onnxruntime. Could you direct me to an example or to where this app uses the Whisper model? Thanks!

JingziC (Contributor) commented Jun 25, 2024

I suppose this app uses ONNX Runtime by importing ai.onnxruntime in Java. The code that runs Whisper with onnxruntime is probably in Recognizer.java.
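
For reference, loading and running a model with ai.onnxruntime in Java looks roughly like the sketch below. This is not the actual Recognizer code: the file path and the "mel" input name are assumptions, and a real model's input names should be checked with session.getInputNames().

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

import java.nio.FloatBuffer;
import java.util.Map;

public class WhisperEncoderDemo {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        // Hypothetical path; the actual models ship with the RTranslator release assets.
        try (OrtSession session = env.createSession("Whisper_encoder.onnx", new OrtSession.SessionOptions())) {
            // Whisper's encoder expects an 80 x 3000 log-mel spectrogram (batch size 1, 30 s of audio).
            float[] mel = new float[80 * 3000];
            OnnxTensor input = OnnxTensor.createTensor(env, FloatBuffer.wrap(mel), new long[]{1, 80, 3000});
            // "mel" is an assumed input name; verify it with session.getInputNames().
            try (OrtSession.Result result = session.run(Map.of("mel", input))) {
                System.out.println(result.get(0).getInfo());
            }
        }
    }
}
```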

niedev (Owner) commented Jun 25, 2024

@soupslurpr Thank you for the appreciation, I like your project too. As @JingziC said, Whisper's inference logic is all inside the Recognizer class.

soupslurpr (Author) commented

@niedev Okay, I see, but where do you get the Whisper models in ONNX format, or how do you convert them?

niedev (Owner) commented Jun 26, 2024

To get the Whisper models you can just download them from the RTranslator 2.0 release (all the models that start with "Whisper_").

If you want to convert them yourself it is complicated, because I used Intel's quantized encoder and decoder (Whisper_encoder.onnx and Whisper_decoder.onnx). Then, from Whisper converted from PyTorch to ONNX, I extracted the components that generate the encoder's KV cache (Whisper_cache_initializer.onnx). Then I converted Whisper to ONNX with Microsoft Olive, and from there I extracted the components for generating the log-mel spectrogram (Whisper_initializer.onnx) and the detokenizer (Whisper_detokenizer.onnx).

I could have directly used just the single .onnx model generated by Olive, but that model consumes 1.3 GB of RAM, while using all these components separately consumes:

  • 0.5GB of RAM with arena deactivated (slower inference)
  • 0.9GB of RAM with arena activated (faster inference)
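
A rough sketch of how the separate components might be loaded, with the CPU memory arena toggled to trade RAM for speed: the file names match the release assets mentioned above, but everything else (class and method structure) is an assumption, not the app's actual code.

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

public class WhisperSessions {
    static OrtSession load(OrtEnvironment env, String path, boolean useArena) throws Exception {
        OrtSession.SessionOptions options = new OrtSession.SessionOptions();
        // Disabling the CPU arena allocator lowers RAM usage at the cost of slower inference.
        options.setCPUArenaAllocator(useArena);
        return env.createSession(path, options);
    }

    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        boolean useArena = false; // false ~0.5 GB RAM, true ~0.9 GB RAM per the numbers above
        OrtSession initializer      = load(env, "Whisper_initializer.onnx", useArena);       // audio -> log-mel
        OrtSession encoder          = load(env, "Whisper_encoder.onnx", useArena);
        OrtSession cacheInitializer = load(env, "Whisper_cache_initializer.onnx", useArena); // encoder KV cache
        OrtSession decoder          = load(env, "Whisper_decoder.onnx", useArena);
        OrtSession detokenizer      = load(env, "Whisper_detokenizer.onnx", useArena);       // tokens -> text
    }
}
```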

But if you need just Whisper small, you can simply use the models from the RTranslator release linked above.

soupslurpr (Author) commented

For now I'll wait for Whisper to get an official example in onnxruntime, as I want to easily use other sizes or finetunes if needed. Thanks for the help though!
