Fix alignment between the segments and the list of words #1087

Merged (2 commits) on Mar 13, 2023

@guillaumekln commented Mar 13, 2023

This PR fixes the issue described in #1082.

I propose to use a simpler Python loop that iterates over each segment and aggregates the words based on the number of tokens.
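
For readers following along, the loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Word` and `Segment` classes and the `assign_words` function are hypothetical names, and the sketch assumes each segment knows its token list and each aligned word knows which tokens it covers. The bounds check on `word_index` reflects the second commit message ("Ensure the word index does not overflow").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    # Hypothetical container: a word and the tokens it was built from.
    text: str
    tokens: List[int]


@dataclass
class Segment:
    # Hypothetical container: a segment's tokens and the words assigned to it.
    tokens: List[int]
    words: List[Word] = field(default_factory=list)


def assign_words(segments: List[Segment], words: List[Word]) -> List[Segment]:
    """Walk the flat word list once, giving each segment as many words
    as are needed to cover its token count."""
    word_index = 0
    for segment in segments:
        consumed = 0
        segment.words = []
        # Stop at the end of the word list so the index cannot overflow.
        while word_index < len(words) and consumed < len(segment.tokens):
            word = words[word_index]
            segment.words.append(word)
            consumed += len(word.tokens)
            word_index += 1
    return segments


segments = [Segment(tokens=[1, 2, 3]), Segment(tokens=[4, 5])]
words = [
    Word(" in", [1]),
    Word(" Lawrenceville,", [2, 3]),
    Word(" New", [4]),
    Word(" Jersey.", [5]),
]
assign_words(segments, words)
grouped = [[w.text for w in s.words] for s in segments]
```

Because the word index only moves forward, every word lands in exactly one segment, which is the property the alignment fix depends on.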

@jongwook jongwook merged commit 671ac5a into openai:main Mar 13, 2023
@ryanheise

I'm testing this PR in relation to the SRT flashing in #1072 (reply in thread), and it looks like it fixes that issue.

Using the test case:

$ ffmpeg -t 29 -i https://audio2.redcircle.com/episodes/6b196013-8672-43d9-be52-4332b3207d93/stream.mp3 test.mp3
$ python -m whisper --model base --output_format all --word_timestamps True test.mp3

The SRT flashing would normally appear on punctuation such as block 27 below, where the underline momentarily disappears and it switches from the words to the segment text, but here they match:

25
00:00:19,380 --> 00:00:19,880
Bronx located on the campus of Ryder University<u> in</u> Lawrenceville, New Jersey.

26
00:00:19,880 --> 00:00:20,640
Bronx located on the campus of Ryder University in<u> Lawrenceville,</u> New Jersey.

27
00:00:20,640 --> 00:00:20,840
Bronx located on the campus of Ryder University in Lawrenceville, New Jersey.

28
00:00:20,840 --> 00:00:20,920
Bronx located on the campus of Ryder University in Lawrenceville,<u> New</u> Jersey.

29
00:00:20,920 --> 00:00:21,400
Bronx located on the campus of Ryder University in Lawrenceville, New<u> Jersey.</u>

Before this PR, there was a flash in block 27 shown below:


25
00:00:19,380 --> 00:00:19,880
on the campus of Ryder University<u> in</u> Lawrenceville, New Jersey. A show once restrained

26
00:00:19,880 --> 00:00:20,640
on the campus of Ryder University in<u> Lawrenceville,</u> New Jersey. A show once restrained

27
00:00:20,640 --> 00:00:20,840
Bronx located on the campus of Ryder University in Lawrenceville, New Jersey.

28
00:00:20,840 --> 00:00:20,920
on the campus of Ryder University in Lawrenceville,<u> New</u> Jersey. A show once restrained

29
00:00:20,920 --> 00:00:21,400
on the campus of Ryder University in Lawrenceville, New<u> Jersey.</u> A show once restrained
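
One way to check for this kind of flash automatically is sketched below. It is only a heuristic under stated assumptions: `find_flashes` is a hypothetical helper, not part of whisper, and it assumes the SRT blocks have already been parsed into `(index, text)` pairs. A block is flagged when it has no `<u>` highlight and its text differs from the preceding block's text with the underline tags removed. The sample texts are blocks 26 to 28 from the two outputs above.

```python
import re

# Matches the opening and closing underline tags used for word highlighting.
UNDERLINE = re.compile(r"</?u>")


def find_flashes(blocks):
    """Return the indices of blocks that have no highlighted word and whose
    text does not match the previous block once its tags are stripped."""
    flashes = []
    for (_, prev_text), (index, text) in zip(blocks, blocks[1:]):
        has_highlight = "<u>" in text
        if not has_highlight and UNDERLINE.sub("", prev_text) != text:
            flashes.append(index)
    return flashes


before = [
    (26, "on the campus of Ryder University in<u> Lawrenceville,</u> New Jersey. A show once restrained"),
    (27, "Bronx located on the campus of Ryder University in Lawrenceville, New Jersey."),
    (28, "on the campus of Ryder University in Lawrenceville,<u> New</u> Jersey. A show once restrained"),
]
after = [
    (26, "Bronx located on the campus of Ryder University in<u> Lawrenceville,</u> New Jersey."),
    (27, "Bronx located on the campus of Ryder University in Lawrenceville, New Jersey."),
    (28, "Bronx located on the campus of Ryder University in Lawrenceville,<u> New</u> Jersey."),
]

flashes_before = find_flashes(before)
flashes_after = find_flashes(after)
```

On these samples the check flags block 27 in the pre-PR output and nothing in the post-PR output. It may still misfire at genuine segment boundaries, so treat it as a quick sanity check rather than a definitive test.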

@guillaumekln guillaumekln deleted the fix-words-list-alignment branch March 14, 2023 06:23
zackees pushed a commit to zackees/whisper that referenced this pull request May 5, 2023
* Fix alignment between the segments and the list of words

* Ensure the word index does not overflow
ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request May 16, 2023
* Fix alignment between the segments and the list of words

* Ensure the word index does not overflow
abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
* Fix alignment between the segments and the list of words

* Ensure the word index does not overflow