-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
word-level timestamps in transcribe()
#869
Merged
Merged
Changes from 1 commit
Commits
Show all changes
26 commits
Select commit
Hold shift + click to select a range
8f9357f
word-level timestamps in `transcribe()`
jongwook 46ea501
moving to `timing.py`
jongwook cfd2b81
Merge branch 'main' into word-level-timestamps
jongwook 742d2f4
numba implementation for dtw, replacing dtw-python
jongwook fb12414
Merge branch 'main' into word-level-timestamps
jongwook 80331c0
triton implementation for dtw
jongwook 1d2ed66
add test for dtw implementations
jongwook b61e8f4
triton implementation of median_filter
jongwook 54f2901
a simple word-level timestamps test
jongwook 8ce6277
add scipy as dev dependency
jongwook 812f446
Merge branch 'main' into word-level-timestamps
jongwook cd5191f
installs an older version of Triton if CUDA < 11.4
jongwook f64d8bc
Merge branch 'main' into word-level-timestamps
jongwook 89133bd
Merge branch 'main' into word-level-timestamps
jongwook d4f9399
fix broken merge
jongwook 040aa04
Merge branch 'main' into word-level-timestamps
jongwook 8e2756b
loosen nvcc version match regex
jongwook 6c431c4
find_alignment() function
jongwook ff6cbfd
Merge branch 'main' into word-level-timestamps
jongwook 5fa4356
miscellaneous improvements
jongwook 48537aa
skip median filtering when the input is too small
jongwook 8eb29c3
Expose punctuation options in cli and transcribe() (#973)
ryanheise 6ed4c11
Merge branch 'main' into word-level-timestamps
jongwook 31cd418
fix merge error
jongwook 145f325
fix merge error 2
jongwook 2b079c4
annotating that word_timestamps is experimental
jongwook File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,3 +4,4 @@ tqdm | |
more-itertools | ||
transformers>=4.19.0 | ||
ffmpeg-python==0.2.0 | ||
dtw-python==1.3.0 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're missing ? and others. See:
https://github.com/jianfch/stable-ts/blob/5fdaff51caec148c80b8e70311e8c62d07b07146/stable_whisper/stabilization.py#L440-L443
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would actually be convenient to actually re-insert the punctuation tokens so that concatenating all the words is the same as concatenating all the tokens. That would just make processing easier on the consumer end. For reference, Amazon Transcribe includes timestamped punctuation tokens in the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice there is now a TODO comment leaning to the first approach:
I'm not sure if you've already committed to that approach, but I would vote for not removing the punctuation, so that whether a consumer wants to traverse the entire content by token, by word, or by segment, they can and do it in either of these 3 ways and in all the content is there (the concatenation of each result is identical). Otherwise if I consume the results by word, I would need to simultaneously look up one of the other two results to cross reference the, and look for the bits that were omitted from the sequence. Here is how Amazon Transcribe does it, for example.
On the other hand, if that is not persuasive, you might consider instead making it an option whether or not to strip out the punctuation.
(I note also that if you just left the punctuation in, the consumer would still have the ability to filter them out.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions; I've updated so that the punctuation characters are attached to the adjacent words, while keeping the word's timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that looks good. I think if the
prepend_punctuations
andappend_punctuations
parameters were propagated intranscribe()
andcli()
that would be quite helpful, since then I could set them to empty strings to emulate the Amazon Transcribe behaviour.