Replies: 3 comments
I also added an else branch for the case where there is no word:

    else:
        if i < len(word_boundaries) - 1:
            if token_sources[word_boundaries[i-1]] != token_sources[word_boundaries[i+1]]:
                token_sources = np.delete(token_sources, word_boundaries[i])

Now it seems that everything is in place, but I may be wrong.
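For readers following the snippet above: `np.delete` returns a new array rather than modifying its input, which is why the result is assigned back to `token_sources`. Below is a minimal sketch of that branch on made-up data. The array values and the helper name `drop_boundary_token` are hypothetical and not taken from the library. The `0 < i` guard is also an addition here, since `i - 1` would otherwise wrap around to the last element when `i` is 0.

```python
import numpy as np

# Hypothetical data: token_sources[k] is the index of the segment that
# token k came from; word_boundaries holds cumulative token offsets.
token_sources = np.array([0, 0, 0, 1, 1, 1])
word_boundaries = np.array([0, 2, 3, 4, 6])


def drop_boundary_token(token_sources, word_boundaries, i):
    """Drop the token at word_boundaries[i] when the tokens at the
    neighbouring boundaries belong to different segments."""
    if 0 < i < len(word_boundaries) - 1:
        before = token_sources[word_boundaries[i - 1]]
        after = token_sources[word_boundaries[i + 1]]
        if before != after:
            # np.delete returns a copy; the input array is left untouched.
            return np.delete(token_sources, word_boundaries[i])
    return token_sources


trimmed = drop_boundary_token(token_sources, word_boundaries, 2)
```

Here `trimmed` is one element shorter than `token_sources`, while `token_sources` itself keeps all six entries.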
Hi @Trsa993, thanks for opening this discussion. I found the same issue. In PR #1087 I'm proposing to rewrite this piece of code with basic Python loops. In my opinion, that is easier to understand and less error-prone than the NumPy operations.
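To illustrate the loop-based idea in general terms, here is a minimal sketch. It is not the code from the PR. It assumes each segment comes with a list of its text-token ids (special tokens already filtered out) and that the alignment is a flat list of words, each carrying its own tokens. `Word`, `assign_words`, and the sample data are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    tokens: List[int]
    start: float
    end: float


def assign_words(segment_tokens: List[List[int]], words: List[Word]) -> List[List[Word]]:
    """Give each segment the words that consume exactly its text tokens."""
    result = []
    word_index = 0
    for tokens in segment_tokens:
        consumed = 0
        segment_words = []
        # Take words until this segment's token budget is used up.
        while word_index < len(words) and consumed < len(tokens):
            word = words[word_index]
            if word.text:  # skip empty entries, but still count their tokens
                segment_words.append(word)
            consumed += len(word.tokens)
            word_index += 1
        result.append(segment_words)
    return result


# Made-up token ids and timings, for illustration only.
segments = [[11, 12, 13], [21, 22]]
words = [
    Word(" Hello", [11], 0.0, 0.4),
    Word(" world", [12, 13], 0.4, 0.9),
    Word(" again", [21, 22], 1.0, 1.6),
]
per_segment = assign_words(segments, words)
```

Because the word index only moves forward and each segment stops once its own tokens are consumed, a word cannot be attached to an earlier segment and then go missing from the next one, provided the per-segment token counts exclude special tokens.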
From my testing, the segment length includes the special tokens, i.e. tokens after
into this.
Hello,
There is a small mismatch in the word-level timestamps. The text is correct at the segment level, but words from the next segment can end up in the current segment and then not appear in the next one. The mismatch grows as the number of segments increases.
I modified the code a little and now it works. There may be a better solution, but this is what I did:
In the timing module, in the function add_word_timestamps:
I hope this makes sense. I haven't had time to test it further; I will tomorrow and let you know.
Best regards,
Milos