Incorrect Hindi Transcription #2

Munikumar09 · 2023-11-28T07:39:54Z

Hi, I am trying to transcribe Hindi-language YouTube audio using the IndicWhisper Hindi model, but the transcriptions I get are incorrect.
For example:
YouTube transcription: यह अभ्यास तुम्हें उसी क्षेत्र में कम कर रहे अन्य लोगों से बहुत आगे लाकर खड़ा कर देगा सुबह के 5 घंटे
IndicWhisper transcription: हर दस किया विषाषा क्या एक है वह मैं विषा दिल ए आने के लिए एक अच्छा विषा आपकी रक्षा

audio.mp4

Can anyone help me with this?


1392001sai · 2024-06-20T11:56:37Z

Hey @Munikumar09, sorry for the late response.

I can't seem to replicate the issue. I am getting the following transcript from the Whisper Hindi model.

IndicWhisper Transcription: है यह अभ्यास तुम्हे उसी क्षेत्र में काम कर रहे अन्य लोगो ऐसी बहुत आगे लाकर खड़ा कर देगा सुबह के पांच घंटे

I converted your mp4 file to mp3 and used the code snippet below for inference. The model is the Hindi Hugging Face checkpoint:

from transformers import pipeline

model_path = "hindi_models/whisper-medium-hi_alldata_multigpu"
device = "cuda"
lang_code = "hi"

whisper_asr = pipeline(
    "automatic-speech-recognition", model=model_path, device=device,
)

# Special case: Odia ('or') is not supported by the Whisper model, so no language is forced
if lang_code == 'or':
    whisper_asr.model.config.forced_decoder_ids = (
        whisper_asr.tokenizer.get_decoder_prompt_ids(
            language=None, task="transcribe"
        )
    )
else:
    whisper_asr.model.config.forced_decoder_ids = (
        whisper_asr.tokenizer.get_decoder_prompt_ids(
            language=lang_code, task="transcribe"
        )
    )

result = whisper_asr("audio.mp3")
print(result["text"])
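
The exact command used for the mp4-to-mp3 conversion is not stated above. As a rough sketch only, it could be done by calling ffmpeg from Python, assuming ffmpeg is installed and on PATH. The helper names `build_ffmpeg_cmd` and `convert` are hypothetical, and the 16 kHz mono settings are an assumption chosen because Whisper models operate on 16 kHz audio, not something confirmed in this thread.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(src, dst, sample_rate=16000):
    """Return an ffmpeg command that extracts mono audio from src into dst."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file, e.g. the attached audio.mp4
        "-vn",                     # drop any video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample (assumed 16 kHz for Whisper)
        dst,
    ]


def convert(src, dst):
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)


# Example usage (requires ffmpeg and the input file to exist):
# convert("audio.mp4", "audio.mp3")
```

If the transcript still differs after conversion, comparing the converted file against the one used here may help narrow down whether the audio extraction or the model setup is responsible.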
