Shape mismatch in certain batches #9

Open
sbuser opened this issue Jan 21, 2023 · 3 comments
sbuser commented Jan 21, 2023

I've tried for a while to figure out what is causing this, without much success. Batch processing runs fine for a variety of files, but I've hit a group that throws an IndexError on the 2nd segment of the batch:

File "/app/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 694, in _main_loop
    probs_at_sot.append(logits[:, self.sot_index[i]].float().softmax(dim=-1))
IndexError: index 8 is out of bounds for dimension 1 with size 3

In a normal loop, self.sot_index is the same at all indices:
[8, 8, 8, 8, 8, 8] or [11, 11, 11, 11, 11, 11]

In the batch and segment number that fails it looks like this:

[0, 0, 0, 0, 8, 0]   <-- self.sot_index
0 0 torch.Size([6, 3, 51865])  <-- i, self.sot_index[i], logits.shape
1 0 torch.Size([6, 3, 51865])
2 0 torch.Size([6, 3, 51865])
3 0 torch.Size([6, 3, 51865])
4 8 torch.Size([6, 3, 51865])
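The debug output above can be reduced to a plain-Python shape check (no torch needed); the batch and context sizes are taken directly from the printed shapes, and it makes explicit which item trips the error:

```python
# Plain-Python stand-in for the failing slice in _main_loop. Per the debug
# output, logits has shape (batch=6, n_ctx=3, vocab=51865), so indexing
# dimension 1 with anything >= 3 is out of bounds -- exactly what the
# IndexError reports for the stray value 8.
n_ctx = 3
sot_index = [0, 0, 0, 0, 8, 0]

out_of_range = [(i, idx) for i, idx in enumerate(sot_index) if idx >= n_ctx]
print(out_of_range)  # the single offender: item 4 with index 8
```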

I can't work out how this is happening. I'm not providing different languages or an initial prompt, so I don't understand the mismatch in sot_index here.
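For context, in the upstream whisper codebase sot_index is the position of the `<|startoftranscript|>` token within the decoder's initial token sequence, so it shifts whenever prompt/previous-text tokens are prepended (e.g. via condition_on_previous_text). A guess at the mechanism, sketched below: one batch item carrying leftover prompt tokens would explain a lone 8 among 0s. The token IDs are the multilingual Whisper specials; treat the exact values and the helper as illustrative assumptions:

```python
# Sketch of how whisper derives sot_index: it is the position of the
# <|startoftranscript|> token inside the initial token sequence, so any
# item that drags in condition_on_previous_text prompt tokens gets a
# shifted index. Token IDs are assumed multilingual Whisper specials.
SOT = 50258       # <|startoftranscript|>
SOT_PREV = 50361  # <|startofprev|>

def sot_position(initial_tokens):
    # Mirrors DecodingTask: initial_tokens.index(tokenizer.sot)
    return initial_tokens.index(SOT)

# No prompt: SOT leads, index 0 -- like most items in the failing batch.
assert sot_position([SOT, 50259, 50359]) == 0

# One item carrying 7 previous-text tokens after <|startofprev|>:
# SOT lands at index 8, matching the stray 8 in self.sot_index.
prompt = [SOT_PREV, 11, 12, 13, 14, 15, 16, 17]
assert sot_position(prompt + [SOT, 50259, 50359]) == 8
```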

I do see in the output that it hasn't properly transcribed portions of that file from the first segment onward. I don't see where it would be holding onto that state to cause this problem, but something is broken.

Sorry I'm not of more help on this. I'll keep digging.

Blair-Johnson (Owner) commented

Interesting, I'm happy to help you with debugging this. Do the clips transcribe properly on the official whisper implementation?

sbuser (Author) commented Jan 21, 2023

It does, yes. Interestingly, not only does the official implementation not generate the IndexError, it also does a better job on the transcription itself. Perhaps related to the temperature cascading discussed in the other issue? I'm not sure.

Without changing anything except adding print statements to diagnose this issue, on maybe the 10th run it actually got past the step that had previously failed (no IndexError) and emitted a bunch of garbage in that segment's transcription. I suppose nothing guarantees the outputs here are deterministic, but that surprised me.

While digging into this, I also found that the fix for no_speech_prob returning an array of all the probabilities breaks running whisper against a single audio file (i.e. when it bypasses the batch code entirely).

Edit to clarify the non-deterministic behavior: that was probably related to the other files in the batch changing between runs. I'm batching by file size, and there were quite a few files with exactly the same size, so that likely accounts for the differences between runs rather than the model itself. If so, then it's pretty clear the temperature linking can have a negative effect. Batching certainly has some effect on the output, because the file is fully and properly transcribed when run by itself.
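As a side note on the size-based batching: sorting by size alone leaves files of identical size in an unstable order, so batch membership can change between runs. A hypothetical helper (not from this repo) that tie-breaks on the path makes batch composition deterministic:

```python
def batches_by_size(files, batch_size):
    """Group (path, size_bytes) pairs into batches of similar size.

    Sorting on (size, path) rather than size alone gives files with
    identical sizes a stable order, so batches are reproducible across
    runs. Illustrative sketch, not code from this repository.
    """
    ordered = sorted(files, key=lambda f: (f[1], f[0]))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

files = [("b.wav", 100), ("a.wav", 100), ("c.wav", 50)]
print(batches_by_size(files, 2))
# [[('c.wav', 50), ('a.wav', 100)], [('b.wav', 100)]]
```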

@JunZhan2000

I've run into this too.
