add hotwords feature #2070

Open

jax-explorer wants to merge 2 commits into main
Conversation

jax-explorer

Hello!
During transcription I often encounter proprietary or newly coined vocabulary that Whisper cannot handle well. I searched for solutions, and the community offered two options:

Fine-tuning the model: This approach is costly, and it's not practical to fine-tune the model every time a new term emerges.

Using initial_prompt: However, initial_prompt only applies to the first window, so if specialized terms don't appear at the beginning, this method is ineffective (see the sketch below).
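For reference, the second option with the stock API looks like this (a minimal sketch; talk.mp3 is a placeholder file):

import whisper

model = whisper.load_model("base")

# initial_prompt conditions only the first 30-second window; terms that
# first appear later in the audio get no benefit from it.
result = model.transcribe("talk.mp3", initial_prompt="comfyUI, Kalichain")
print(result["text"])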

Upon reviewing other transcription models, I found that supporting hotwords is common practice, so I implemented that feature here. My approach is to prepend hotword prompts to every transcription window. Since the context has a maximum length, the hotwords occupy the space previously used by the prefix, so they take effect whenever prefix isn't set. In my testing, this indeed resolved the specialized-vocabulary issue in my scenario.
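Roughly, the change looks like this (a simplified sketch of the edit to whisper/decoding.py, not the exact diff; names follow the existing DecodingTask attributes):

# Sketch: inside DecodingTask._get_initial_tokens
tokens = list(self.sot_sequence)

if self.options.prefix is None and (hotwords := self.options.hotwords) is not None:
    # Hotwords reuse the budget otherwise taken by prefix tokens and are
    # prepended after <|startofprev|> for every window, not only the first.
    hotwords_tokens = self.tokenizer.encode(" " + hotwords.strip())
    # Keep at most half the text context so the window's own tokens still fit.
    hotwords_tokens = hotwords_tokens[: self.n_ctx // 2 - 1]
    tokens = [self.tokenizer.sot_prev] + hotwords_tokens + tokens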

The following is the community discussion on this issue:
#1477
https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311
https://stackoverflow.com/questions/73833916/how-can-i-give-some-hint-phrases-to-openais-whisper-asr

@jax-explorer
Author

@jongwook Hello, please check out this PR.

@James-Shared-Studios

Would this be a duplicated effort, since there is already a parameter that serves the same purpose, condition_on_previous_text? If condition_on_previous_text is set to True, the previous output of the model is provided as a prompt for the next window. Correct me if I'm wrong. Thank you.

@jax-explorer
Author

@James-Shared-Studios This isn't used to add context; it's used to add hot words so that Whisper can recognize new words and terms when they come up. For example, comfyUI is a new word: it is a powerful and modular Stable Diffusion GUI and backend. If you don't add it as a hotword, it won't be recognized correctly.
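With this PR's branch installed, usage looks like this (a sketch; video.mp4 is a placeholder, and transcribe forwards the hotwords option through to the decoder):

import whisper

model = whisper.load_model("base")

# hotwords (added by this PR) is injected before every window, so the term is
# picked up even when it first shows up late in the audio; by contrast,
# condition_on_previous_text only feeds back what the model already produced.
result = model.transcribe("video.mp4", hotwords="comfyUI")
print(result["text"])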

@greduan

greduan commented Apr 1, 2024

I tried it with a video where the following words were misspelled:

"Kalichain"
=>
"cl chain"
"cali chain"

"Kalicertif"
=>
"c cerff"
"cl ciff"
"Cali certif"

"Kalismarket"
=>
"C's Market"

"Kalishare"
=>
"Cali share"

"Kalistoken"
=>
"Cali's token"

"kijiji"
=>
"kiji"

And indeed, with the following args, these words were no longer misspelled:

whisper video.opus --hotwords "Kalichain, Kalicertif, Kalismarket, Kalishare, Kalistoken, kijiji, MEXC, Kalissa, FireHustle"

But it didn't work 100% of the time; sometimes they were still misspelled. Notably, Kalicertif came out as Kalistertif.

@JiweiZh

JiweiZh commented Apr 8, 2024

So, when passing a series of proper nouns through the hotwords option, what is the maximum length that can actually be supported? @jax-explorer

@jax-explorer
Author

@JiweiZh It depends on the n_text_ctx value in the model's dims.
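You can inspect it directly; all of the released checkpoints ship with n_text_ctx = 448, and this PR caps hotwords at about half of that budget (n_ctx // 2 - 1 tokens):

import whisper

model = whisper.load_model("base")

# Per-window token budget shared by the previous-text prompt / hotwords
# and the tokens decoded for the current window.
print(model.dims.n_text_ctx)  # 448 for the released checkpoints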

@sanghyun-son

@jax-explorer Hello, I find this commit very useful and hope it gets merged soon. Currently, I'm using your forked repository to enjoy this feature. BTW, I have some questions about your implementation.

  1. You say that you occupy the space used by the prefix, but I'm not sure where the prefix comes from. Is condition_on_previous_text related to prefix?
  2. The current implementation divides n_ctx by 2 and assigns prompt and hotwords evenly. If I want to use more hotwords, is it valid to change n_ctx // 2 to some other number? For example, I would skip the prompt and use hotwords only whenever hotwords are provided, like below:
if (hotwords := self.options.hotwords) is not None:
    hotwords_tokens = self.tokenizer.encode(" " + hotwords.strip())
    hotwords_tokens = hotwords_tokens[: self.n_ctx]  # Use more hotwords
    tokens = (
        [self.tokenizer.sot_prev]
        + hotwords_tokens
        # + (prompt_tokens[-(self.n_ctx // 2 - 1) :] if self.options.prompt is not None else [])
        + tokens
    )

Thanks!
