-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
text: Fix for garbage displayed at end of ssa subtitles #8265
Conversation
This occurs on some mkv files. The subtitles show perfectly in VLC player and parole. These are files encoded from TS to mkv using HandbrakeCLI v1.3.3. The ssa text is terminated by a null (\x00). In these mkv files the null is followed by zero to five bytes of random data. The random data is showing on the screen as junk characters at the end of the subtitle text. The solution is to change SsaDecoder to use readNullTerminatedString instead of readLine. Sample video: https://www.filezzz.com/d/c6f1229633284d1388d21b18ca568230/SSA_subtitle_garbage_at_end.mkv/preview Subtitle examples from above video as seen without this fix: (Swallows pill) Yeah. 0vd You read it yet? - Yeah, I started it. You? DTG1 - It's on my list of many things to do. �O_�
Thanks for the contribution - I'm not sure about this though... SSA is a fundamentally plaintext format (based originally on SubRip), so it seems weird that Handbrake is encoding null-terminated strings instead of line-terminator-terminated. I appreciate the SSA spec is pretty fluid (/non-existent) - but this does feel more like a handbrake bug... If we accept this I think we should keep the It would also be great to add a sample to What do you think? |
I did check on another mkv files not encoded with this HandbrakeCLI, and I found that the text ends with The code I provided will still work if there is no null terminator, readNullTerminatedString will stop at the end of the string. I first thought of your approach with readLine and search for the null in the string but I was not sure if the null character will be found in the string or be discarded by the readLine. Also it occurs to me that this is risky as you are reading random data into a string, and results may be unpredictable since data after the null may or may not be interpreted as a UTF-8 character or perhaps an incomplete UTF-8 character. Adding a sample to SsaDecoderTest: I will need to understand how the tests work and how to run them. Do you have any guidance on that? |
Thinking about this a bit more, it's specifically a problem with SSA-inside-MKV (I think it would very weird if a plaintext, sidecar .ssa file had The 'spec' for this doesn't mention null termination: http:https://matroska.sourceforge.net/technical/specs/subtitles/ssa.html I wonder if we should change Lines 1307 to 1311 in 59b34ed
Fair point - I think this is avoided by doing the truncation in
I think this is also avoided if we truncate in
I would hold off on this for now - if we decide to change But for future reference:
I'll ask for some other opinions on the team about how we want to approach this - thanks for your patience! |
Thanks for your patience on this - I spent some time testing the change I proposed above in So that will be pushed to dev-v2 shortly, and then this PR won't be needed any more and you can probably close it. |
It makes sense for ExoPlayer to try and handle this nicely, since the spec is loosely defined and as you say there's already content out there exhibiting this - but there's a risk it has unforeseen consequences and we have to roll it back. You may want to raise the issue with handbrake as well, and whichever other tools create content like this, since it seems to me the matroska samples should just be correctly sized rather than using null-termination (even if this isn't clear in the spec, it seems common sense). If they push back, perhaps it would be possible to get some clarification from the spec maintainers: https://www.matroska.org/contact.html |
This was reported for SSA/ASS in PR #8265, but it seems to me the SubRip part of the Matroska spec is similarly loose, so this change handles null-terminated strings in both. #minor-release PiperOrigin-RevId: 345452667
I opened a HandBrake issue at HandBrake/HandBrake#3258 . |
This should be fixed by 74bbd53 |
Tested and it looks good. Thanks. |
This was reported for SSA/ASS in PR #8265, but it seems to me the SubRip part of the Matroska spec is similarly loose, so this change handles null-terminated strings in both. PiperOrigin-RevId: 345452667
This occurs on some mkv files. The subtitles show perfectly in VLC player
and parole. These are files encoded from TS to mkv using HandbrakeCLI
v1.3.3.
The ssa text is terminated by a null (\x00). In these mkv files the null
is followed by zero to five bytes of random data. The random data is
showing on the screen as junk characters at the end of the subtitle
text. The solution is to change SsaDecoder to use readNullTerminatedString
instead of readLine.
Sample video:
https://www.filezzz.com/d/c6f1229633284d1388d21b18ca568230/SSA_subtitle_garbage_at_end.mkv/preview
Subtitle examples from above video as seen without this fix:
(Swallows pill) Yeah. 0vd
You read it yet?
things to do. �O_�