Function add_word_timestamps in timing, words are in wrong segment #1082

Trsa993 · 2023-03-12T19:53:07Z

Hello,

There is a small mismatch in the word-level timestamps. The segment-level text is correct, but words from the next segment can end up in the current segment and then be missing from the next one. The mismatch grows as the number of segments increases.

I modified the code a little and now it works. There may be a better solution, but this is what I did:

In the `timing` module, function `add_word_timestamps`:

```python
# added -1 so that the end of a segment always falls on the last index for that segment
segment_lengths = [len(s["tokens"]) - 1 for s in segments]
token_sources = np.repeat(np.arange(len(segments)), segment_lengths)

# removed np.pad because we need the end of each word, not the beginning, so the leading 0 is not needed
word_boundaries = np.cumsum([len(w.tokens) for w in alignment])

# added a check: if the next word boundary falls in a different segment than the current one,
# delete the current boundary index from token_sources so that the next segment starts from its own beginning
for i, timing in enumerate(alignment):
    if timing.word:
        segment = segments[token_sources[word_boundaries[i]]]
        if i < len(word_boundaries) - 1:
            if token_sources[word_boundaries[i]] != token_sources[word_boundaries[i + 1]]:
                token_sources = np.delete(token_sources, word_boundaries[i])
```
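For reference, here is what the unmodified lines (with `np.pad`) compute, using small made-up token counts rather than real Whisper output. This is a standalone sketch; only `np.repeat`, `np.cumsum` and `np.pad` come from the code discussed above, and the lengths are hypothetical.

```python
import numpy as np

# Hypothetical token counts, standing in for len(s["tokens"]) and len(w.tokens)
segment_lengths = [3, 2, 4]
word_lengths = [2, 1, 2, 3, 1]

# Segment index of every token position
token_sources = np.repeat(np.arange(len(segment_lengths)), segment_lengths)
print(token_sources)  # [0 0 0 1 1 2 2 2 2]

ends = np.cumsum(word_lengths)  # exclusive end index of each word
starts = np.pad(ends, (1, 0))   # leading 0 added: start index of each word (plus a trailing total)
print(ends, starts)             # [2 3 5 8 9] [0 2 3 5 8 9]

# Unmodified rule: a word belongs to the segment of its first token
by_first_token = [int(token_sources[starts[i]]) for i in range(len(word_lengths))]
# Alternative: the segment of its last token (ends[i] - 1, since ends[i] is one past the word)
by_last_token = [int(token_sources[ends[i] - 1]) for i in range(len(word_lengths))]
print(by_first_token, by_last_token)  # [0, 0, 1, 2, 2] [0, 0, 1, 2, 2]
```

When the token counts agree, both rules give the same answer. Note that `ends[i]` on its own points at the token after word `i`; for the last word in this example it equals `len(token_sources)`, one past the end of the array.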

I hope this is clear. I haven't had time to test it further; I will tomorrow and let you know.

Best regards,
Milos

Trsa993 · 2023-03-13T10:07:18Z

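I also added an `else` branch for alignment entries that have no word:

```python
    else:
        # 0 < i guard so that word_boundaries[i - 1] does not wrap around to the last element when i == 0
        if 0 < i < len(word_boundaries) - 1:
            if token_sources[word_boundaries[i - 1]] != token_sources[word_boundaries[i + 1]]:
                token_sources = np.delete(token_sources, word_boundaries[i])
```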

Now it seems that everything is in place, but I may be wrong.

guillaumekln · 2023-03-13T16:26:47Z

Hi @Trsa993, thanks for opening this discussion. I found the same issue.

In PR #1087 I'm proposing to rewrite this piece of code with basic Python loops. In my opinion, it is easier to understand and less error-prone than the NumPy operations.
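The code from PR #1087 is not reproduced here. As a rough illustration of what a loop-based version can look like, here is a minimal sketch, assuming segments are dicts with a `"tokens"` list and each alignment entry has `word` and `tokens` attributes (mirroring the names used in this thread). `WordTiming` and `assign_words_to_segments` are hypothetical names, not Whisper's API. The sketch assigns each word to the segment containing its first token, the same rule as the NumPy indexing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordTiming:
    # Hypothetical stand-in for an alignment entry: word text plus its token IDs
    word: str
    tokens: List[int] = field(default_factory=list)


def assign_words_to_segments(
    segments: List[Dict[str, List[int]]], alignment: List[WordTiming]
) -> List[List[str]]:
    """Hypothetical helper: group words by the segment that holds their first token."""
    if not segments:
        return []
    words_per_segment: List[List[str]] = [[] for _ in segments]
    segment_index = 0
    segment_end = len(segments[0]["tokens"])  # exclusive end index of the current segment
    position = 0  # index of the current word's first token

    for timing in alignment:
        # Move forward until the current segment contains the word's first token
        while position >= segment_end and segment_index + 1 < len(segments):
            segment_index += 1
            segment_end += len(segments[segment_index]["tokens"])
        if timing.word:  # entries without text still consume their tokens
            words_per_segment[segment_index].append(timing.word)
        position += len(timing.tokens)
    return words_per_segment


segments = [{"tokens": [1, 2, 3]}, {"tokens": [4, 5]}, {"tokens": [6, 7, 8, 9]}]
alignment = [
    WordTiming(" one", [1, 2]),
    WordTiming(" two", [3]),
    WordTiming(" three", [4, 5]),
    WordTiming(" four", [6, 7, 8]),
    WordTiming(" five", [9]),
]
result = assign_words_to_segments(segments, alignment)
print(result)  # [[' one', ' two'], [' three'], [' four', ' five']]
```

The explicit counter makes the "which segment owns this word" rule easy to read, but like the NumPy version it is only correct if the segment token counts match the tokens covered by the alignment.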

KMC07 · 2023-03-13T20:11:22Z

From my testing, the segment lengths include the special tokens, i.e. tokens with IDs from `tokenizer.eot` upward (such as timestamp tokens), but the alignment function does not.
I'm not sure this is the right approach, but I fixed the issue by excluding them from the segment lengths, changing these lines:

```python
time_offset = segments[0]["seek"] * HOP_LENGTH / SAMPLE_RATE
segment_lengths = [len(s["tokens"]) for s in segments]
token_sources = np.repeat(np.arange(len(segments)), segment_lengths)
```

into this:

```python
time_offset = segments[0]["seek"] * HOP_LENGTH / SAMPLE_RATE

# keep only text tokens: special tokens have IDs >= tokenizer.eot and are not covered by the alignment
segment_tokens = [[t for t in s["tokens"] if t < tokenizer.eot] for s in segments]
segment_lengths = [len(tok) for tok in segment_tokens]

token_sources = np.repeat(np.arange(len(segments)), segment_lengths)
```
