
Where are the Whisper models defined? #25

Closed
soupslurpr opened this issue Jun 25, 2024 · 5 comments
Comments


soupslurpr commented Jun 25, 2024

Hi, this app is pretty cool, nice job. I was amazed by its speed even though I read it's using the small Whisper model. For this reason I wanted to explore switching to onnxruntime for running Whisper in my app Transcribro, to see if I can move to a bigger model while keeping the same speed (currently it uses tiny q8_0 with whisper.cpp). However, I couldn't find the code that uses the Whisper model, or figure out how to run a Whisper model with onnxruntime. Could you direct me to an example or to where this app uses the Whisper model? Thanks!

JingziC (Contributor) commented Jun 25, 2024

I suppose this app uses ONNX Runtime by importing ai.onnxruntime in Java. The code that runs Whisper with onnxruntime is probably in Recognizer.java.
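
For reference, loading and running a model with ai.onnxruntime in Java looks roughly like the sketch below. This is not the actual Recognizer code: the file path and the "mel" input name are assumptions, and a real model's input names should be checked with session.getInputNames().

```java
import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

import java.nio.FloatBuffer;
import java.util.Map;

public class WhisperEncoderDemo {
    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        // Hypothetical path; the actual models ship with the RTranslator release assets.
        try (OrtSession session = env.createSession("Whisper_encoder.onnx", new OrtSession.SessionOptions())) {
            // Whisper's encoder expects an 80 x 3000 log-mel spectrogram (batch size 1, 30 s of audio).
            float[] mel = new float[80 * 3000];
            OnnxTensor input = OnnxTensor.createTensor(env, FloatBuffer.wrap(mel), new long[]{1, 80, 3000});
            // "mel" is an assumed input name; verify it with session.getInputNames().
            try (OrtSession.Result result = session.run(Map.of("mel", input))) {
                System.out.println(result.get(0).getInfo());
            }
        }
    }
}
```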

niedev (Owner) commented Jun 25, 2024

@soupslurpr Thank you for the appreciation, I like your project too. As @JingziC said, Whisper's inference logic is all inside the Recognizer class.

soupslurpr (Author) commented

@niedev Okay, I see, but where do you get the Whisper models in ONNX format, or how do you convert them?

niedev (Owner) commented Jun 26, 2024

To get the Whisper models you can just download them from the RTranslator 2.0 release (all the models that start with "Whisper_").

If you want to convert them yourself it is complicated, because I used Intel's quantized encoder and decoder (Whisper_encoder.onnx and Whisper_decoder.onnx). Then, from Whisper converted from PyTorch to ONNX, I extracted the components that generate the encoder's KV cache (Whisper_cache_initializer.onnx). Then I converted Whisper to ONNX with Microsoft Olive, and from there I extracted the components for generating the log-mel spectrogram (Whisper_initializer.onnx) and the detokenizer (Whisper_detokenizer.onnx).

I could have directly used just the single .onnx model generated by Olive, but that model consumes 1.3 GB of RAM, while using all these components separately consumes:

  • 0.5GB of RAM with arena deactivated (slower inference)
  • 0.9GB of RAM with arena activated (faster inference)
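
A rough sketch of how the separate components might be loaded, with the CPU memory arena toggled to trade RAM for speed: the file names match the release assets mentioned above, but everything else (class and method structure) is an assumption, not the app's actual code.

```java
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;

public class WhisperSessions {
    static OrtSession load(OrtEnvironment env, String path, boolean useArena) throws Exception {
        OrtSession.SessionOptions options = new OrtSession.SessionOptions();
        // Disabling the CPU arena allocator lowers RAM usage at the cost of slower inference.
        options.setCPUArenaAllocator(useArena);
        return env.createSession(path, options);
    }

    public static void main(String[] args) throws Exception {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        boolean useArena = false; // false ~0.5 GB RAM, true ~0.9 GB RAM per the numbers above
        OrtSession initializer      = load(env, "Whisper_initializer.onnx", useArena);       // audio -> log-mel
        OrtSession encoder          = load(env, "Whisper_encoder.onnx", useArena);
        OrtSession cacheInitializer = load(env, "Whisper_cache_initializer.onnx", useArena); // encoder KV cache
        OrtSession decoder          = load(env, "Whisper_decoder.onnx", useArena);
        OrtSession detokenizer      = load(env, "Whisper_detokenizer.onnx", useArena);       // tokens -> text
    }
}
```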

But if you need just Whisper small, you can simply use the models from the RTranslator release linked above.

soupslurpr (Author) commented

For now I'll wait for Whisper to get an official example in onnxruntime, as I want to easily use other sizes or finetunes if needed. Thanks for the help though!
