Add language_bias parameter to detect_language #2004

jbaudanza · 2024-02-06T23:28:35Z

We are doing a lot of multi-lingual transcriptions, especially with language learners. We have found that Whisper will often incorrectly identify the language when a language learner is speaking their target language.

For example, a native Korean speaker will speak English, but with a strong Korean accent, and Whisper will identify the language as Korean, not English.

We've found that adding a language bias of +1.0 to the language learner's target language is enough to nudge Whisper toward the target language, while still allowing it to correctly identify the user's native language and other languages.
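To illustrate the idea, here is a minimal sketch that does not use Whisper itself. It assumes the bias is added to the per-language logits before the softmax; the diff is not shown in this conversation, so that detail, the function name `detect_with_bias`, and the logit values are all hypothetical.

```python
import math


def detect_with_bias(logits, language_bias=None):
    """Pick the most likely language after adding an optional per-language bias.

    Hypothetical helper: `logits` maps language codes to raw scores, and
    `language_bias` maps language codes to an additive offset (assumed to be
    applied in logit space, before the softmax).
    """
    language_bias = language_bias or {}
    biased = {
        lang: score + language_bias.get(lang, 0.0)
        for lang, score in logits.items()
    }
    # Numerically stable softmax over the biased scores.
    peak = max(biased.values())
    exps = {lang: math.exp(score - peak) for lang, score in biased.items()}
    total = sum(exps.values())
    probs = {lang: value / total for lang, value in exps.items()}
    return max(probs, key=probs.get), probs


# Made-up logits for illustration only, not real Whisper output.
accented_english = {"ko": 2.0, "en": 1.5, "ja": -1.0}  # Korean-accented English
native_korean = {"ko": 4.0, "en": 0.5, "ja": -1.0}     # clearly Korean speech

unbiased_guess, _ = detect_with_bias(accented_english)
biased_guess, _ = detect_with_bias(accented_english, {"en": 1.0})
korean_guess, _ = detect_with_bias(native_korean, {"en": 1.0})
```

With these invented numbers, the unbiased guess for the accented clip is `ko`, the +1.0 bias flips it to `en`, and the clearly Korean clip stays `ko` because its margin is larger than the bias.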

jmgb27 · 2024-05-19T02:11:47Z

Hi @jbaudanza, did you find any workarounds for this one? Looks like they haven't fixed it yet.

jbaudanza · 2024-05-19T02:21:26Z

> Hi @jbaudanza did you have any workarounds for this one? looks like they haven't fixed it yet.

This PR is the workaround.

Alternatively, you could try using a different model for the language detection step, for example https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa (I haven't compared its results to Whisper's, though).

add language_bias parameter

6005065
