
Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm #659

Merged (3 commits) on Jan 18, 2023


HennerM (Contributor) commented on Dec 8, 2022

The English normaliser mistakenly replaces "20mm" with "20hmm". Here "mm" is the unit suffix for millimetre.

This was caused by the normaliser treating "20" as a number and therefore splitting "20mm" into "20" and "mm". The standalone "mm" token was then replaced with "hmm" according to the English.json mapping.
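
The failure can be reproduced with a minimal sketch. It assumes a simplified two-step pipeline (split digits from letters, then apply a word-level mapping). The function names and the three-entry mapping are hypothetical stand-ins, not Whisper's actual normaliser code.

```python
import re

# Hypothetical stand-in for the English.json spelling mapping before this PR;
# only the entries discussed in this PR are included.
SPELLING_MAP_BEFORE_FIX = {"mm": "hmm", "mhm": "hmm", "mmm": "hmm"}


def split_numbers_from_letters(text: str) -> str:
    # Insert a space at every letter/digit boundary, so "20mm" becomes "20 mm".
    text = re.sub(r"([a-z])([0-9])", r"\1 \2", text)
    return re.sub(r"([0-9])([a-z])", r"\1 \2", text)


def apply_spelling_map(text: str, mapping: dict) -> str:
    # Replace whole whitespace-separated tokens only.
    return " ".join(mapping.get(word, word) for word in text.split())


def normalize(text: str, mapping: dict) -> str:
    # Number splitting runs first, so "mm" becomes its own token before mapping.
    return apply_spelling_map(split_numbers_from_letters(text.lower()), mapping)


print(normalize("a 20mm bolt", SPELLING_MAP_BEFORE_FIX))  # a 20 hmm bolt
```

The bug only needs both steps: once the split turns the unit into a standalone token, a word-level mapping cannot tell a millimetre unit from a filler word.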

Removing "mm" from the mapping shouldn't be a problem, since an earlier step in the normaliser already removes standalone "mm" words from the input entirely.
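
A similarly hedged sketch of the ordering argument: if filler-word removal runs before number splitting, a standalone "mm" is already gone by the time the mapping runs, while "20mm" survives because there is no word boundary between "0" and "m". The `IGNORE_PATTERN` regex and both mappings below are assumptions modelled on the description above.

```python
import re

# Assumed filler-word pattern: drops standalone "mm"-style words before anything else.
IGNORE_PATTERN = r"\b(hmm|mm|mhm|mmm|uh|um)\b"

# Hypothetical mappings before and after this PR (the "mm" entry is removed).
MAP_BEFORE = {"mm": "hmm", "mhm": "hmm", "mmm": "hmm"}
MAP_AFTER = {"mhm": "hmm", "mmm": "hmm"}


def normalize(text: str, mapping: dict) -> str:
    text = text.lower()
    # Step 1: remove standalone filler words. In "20mm" there is no word
    # boundary between "0" and "m", so the unit is left untouched here.
    text = re.sub(IGNORE_PATTERN, "", text)
    # Step 2: split digits from letters ("20mm" -> "20 mm").
    text = re.sub(r"([a-z])([0-9])", r"\1 \2", text)
    text = re.sub(r"([0-9])([a-z])", r"\1 \2", text)
    # Step 3: word-level mapping; split() also collapses leftover whitespace.
    return " ".join(mapping.get(word, word) for word in text.split())


for mapping in (MAP_BEFORE, MAP_AFTER):
    print(normalize("mm I see, a 20mm bolt", mapping))
# prints "i see, a 20 hmm bolt" then "i see, a 20 mm bolt"
```

In both runs the leading filler "mm" is dropped by step 1, so removing the mapping entry only changes the unit case.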

"mhm" and "mmm" could probably be removed from the mapping for the same reason.
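
Under the same assumed filler pattern, a quick check suggests that standalone "mhm" and "mmm" never reach the mapping either. The pattern and the `reaches_mapping` helper are hypothetical, for illustration only.

```python
import re

# Same assumed filler-word pattern as in the normaliser sketch.
IGNORE_PATTERN = r"\b(hmm|mm|mhm|mmm|uh|um)\b"


def reaches_mapping(word: str) -> bool:
    # Embed the word in a sentence, strip fillers, and check whether the
    # standalone token is still there for a word-level mapping to see.
    cleaned = re.sub(IGNORE_PATTERN, "", f"so {word} right")
    return word in cleaned.split()


for word in ("mm", "mhm", "mmm"):
    print(word, reaches_mapping(word))  # each prints False
```

Under these assumptions the entries could only fire when the letters are glued to a digit, which is exactly the unwanted "20mm" case.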

jongwook merged commit ea1c266 into openai:main on Jan 18, 2023
jongwook (Collaborator)

Thanks. Those replacers are from the post-processing scripts for the CHiME dataset:

https://github.com/kaldi-asr/kaldi/blob/ae8cbe8858f2a66a9b193c82dbe3b0479364165f/egs/chime5/s5/local/wer_output_filter#L19-L21

but I agree there is little reason to keep them at this point.

zackees pushed a commit to zackees/whisper that referenced this pull request May 5, 2023
ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request May 16, 2023
abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
2 participants