
Support PNG subtitles in SMPTE-TT (Issue: #1583) #5066

Merged · 6 commits · Dec 14, 2018

Conversation

@szaboa (Contributor) commented Nov 6, 2018

No description provided.

@szaboa (Contributor, Author) commented Nov 6, 2018

@ojw28
I would appreciate some help on this.

You can see my changes on branch feature/1583_support_png_ttml, I also added a test stream for now:
https://livesim.dashif.org/dash/vod/testpic_2s/img_subs.mpd

The test stream has 3 different subtitle tracks. The first one uses exact pixels inside its regions, which is not supported (as far as I've seen), so right now it's not shown: no region will be parsed, and the bitmap width is set by the anonymous region, which is 0 (we need to add a default width in this case).

However you should be able to see the second subtitle.
https://livesim.dashif.org/dash/vod/testpic_2s/img_subs_region_percent.xml

It has three cues, as you can see in the xml. The thing is that com.google.android.exoplayer2.text.ttml.TtmlSubtitle#getCues is called only once (I'm not sure how many times it should be called) and it returns all of the cues (3), and these three are rendered at the same time, so we get overlapping cues as a result.

I also tried to filter these cues inside com.google.android.exoplayer2.text.ttml.TtmlNode#traverseForImage by checking whether the cue is active (isActive(long timeUs)), but in that case com.google.android.exoplayer2.text.ttml.TtmlSubtitle#getCues returns only 1 cue; as a result only one is rendered and it is never replaced.

I am trying to understand how this really works, but any help would be appreciated. Also, if you have any other observations, please let me know. For example, I am not entirely sure whether traverseForImage is correct / needed, or whether traverseForText should be modified instead.

Thanks in advance!

@ojw28 (Contributor) commented Nov 8, 2018

I think you're on the right track. getCues will be called on the root of the tree each time the cues are to be generated. The timeUs parameter passed to getCues is the position for which the cues are to be generated. The traverseX calls then go down the tree gathering the information needed to generate the cues to be output.

  • I think you probably do want traverseForImage to be separate, but I didn't spend any time thinking about it.
  • Yes, you need to have an isActive check, otherwise you'll output everything, rather than just subtitles corresponding to timeUs.
  • The key thing you're missing is that you haven't modified TtmlNode.getEventTimes. This method is supposed to generate the set of times at which the cues change. Currently it's only looking at TAG_P nodes and their descendants. I think you need to modify it to also look at TAG_DIV nodes with non-null imageId values? Once you do this, you should see that getCues is called more than once.
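As a rough illustration of the event-time gathering described above, a simplified stand-in tree could collect times like this (Node, startTimeUs, endTimeUs and the method shape here are illustrative, not ExoPlayer's actual TtmlNode API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified stand-in for a TTML node tree; not ExoPlayer's actual TtmlNode.
class Node {
  final String tag;       // e.g. "p" or "div"
  final String imageId;   // non-null when the node carries smpte:backgroundImage
  final long startTimeUs;
  final long endTimeUs;
  final List<Node> children = new ArrayList<>();

  Node(String tag, String imageId, long startTimeUs, long endTimeUs) {
    this.tag = tag;
    this.imageId = imageId;
    this.startTimeUs = startTimeUs;
    this.endTimeUs = endTimeUs;
  }

  // Collect the times at which the set of visible cues can change: the
  // start/end of "p" nodes and of "div" nodes that reference an image.
  void getEventTimes(TreeSet<Long> out) {
    if ("p".equals(tag) || ("div".equals(tag) && imageId != null)) {
      out.add(startTimeUs);
      out.add(endTimeUs);
    }
    for (Node child : children) {
      child.getEventTimes(out);
    }
  }
}
```

With image divs contributing event times like this, getCues is invoked once per collected time rather than just once for the whole subtitle.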

Look for "div" tags when generating the set of times.
@szaboa (Contributor, Author) commented Nov 8, 2018

Thank you for the tips!
I pushed the changes; it is working fine now.

Quick questions:

  1. Should I leave the test stream in, or remove it from this branch?
  2. In case no region is parsed for an image (e.g. the region is defined in pixels, which is not supported; see the javadoc of TtmlDecoder#parseRegionAttributes), should the default position be top-left here as well? See the TODO in line 313 of TtmlDecoder.java that says this.
  3. I'm not sure what tests I should write. Should I verify the content based on base64 strings?

@ojw28 (Contributor) commented Nov 9, 2018

  1. Remove the test stream
  2. I was a bit confused about what happens with unsupported regions in the current implementation. Am I right in thinking all text content gets assigned to the anonymous region id (see TtmlDecoder.parseNode, specifically the switch statement case for ATTR_REGION)? The generated cue will set everything to "unset", and it'll be left up to whatever UI component ends up drawing it to decide where to place it. Is that correct? If so, can it work in the same way for bitmaps as well? You probably need to modify SubtitlePainter.setupBitmapLayout to do something sensible if cueSize == DIMEN_UNSET to make this work properly.
  3. I would expect some tests in TtmlDecoderTest checking that the right thing is output from the decoder at least. Probably one for the bitmap with unsupported region case as well.

@szaboa (Contributor, Author) commented Nov 9, 2018

  1. All right.

  2. In the current implementation, if a region is unsupported / cannot be parsed, then TtmlDecoder.parseNode builds a TtmlNode with the anonymous region id (TtmlNode.ANONYMOUS_REGION_ID). So, for example, if you select the first subtitle track in the provided test stream (which is unsupported), then whenever TtmlNode.getCues() is called, the node's region id will be the anonymous one, and it will get back the anonymous TtmlRegion with "unset" attributes from the regionMap. These attributes are passed to the cue. However, if the region's width is Cue.DIMEN_UNSET, I added a default value (0.5f) as a fallback, otherwise it won't be rendered. In this scenario (unset line, lineAnchor, position, positionAnchor) the SubtitlePainter draws the bitmap in the top-left corner.

So to sum up, the result for unsupported regions currently looks like this:
https://imgur.com/a/06pDEvd
Should this remain as it is (top-left corner), or should we add some default line and position?

If so, can it work in the same way for bitmaps as well?

As far as I see, yes. It is already working like this.

You probably need to modify SubtitlePainter.setupBitmapLayout to do something sensible if cueSize == DIMEN_UNSET to make this work properly.

Should I add the 0.5f default width here? It is currently handled in TtmlNode.getCues().

  3. All right.

@ojw28 (Contributor) commented Nov 9, 2018

Regarding the unsupported region: If the decoder doesn't know where a cue should be displayed, it IMO shouldn't fill in any defaults. So I think the correct thing to do is for the decoder to leave everything unset (i.e. not fill in 0.5 for the width). Which is what happens for text, I think.

It should then be up to SubtitlePainter to decide what it wants to do with cues that don't define how they're positioned. You probably want to modify SubtitlePainter.setupBitmapLayout to set a sensible default bitmapRect for this case. I would have thought somewhere toward the bottom would make most sense, which I think is what happens for text. You need to size it so that the bitmap stays in proportion also.

@szaboa (Contributor, Author) commented Nov 9, 2018

Yeah, you are right :)

I'll do these changes on the weekend!

@szaboa (Contributor, Author) commented Nov 10, 2018

@ojw28
Added two test cases, removed the test stream.
I removed the code which sets a default width for the region from TtmlNode.getCues(); however, I am not sure if I handled these default values correctly inside SubtitlePainter.setupBitmapLayout(). Please have a look at it.

@ojw28 ojw28 changed the title [WIP] #1583 - Support PNG subtitles in SMPTE-TT Support PNG subtitles in SMPTE-TTIssue: #1583) Nov 12, 2018
@ojw28 ojw28 changed the title Support PNG subtitles in SMPTE-TTIssue: #1583) Support PNG subtitles in SMPTE-TT Nov 12, 2018
@ojw28 ojw28 changed the title Support PNG subtitles in SMPTE-TT Support PNG subtitles in SMPTE-TT (Issue: #1583) Nov 12, 2018
@@ -68,6 +68,8 @@
private static final String ATTR_END = "end";
private static final String ATTR_STYLE = "style";
private static final String ATTR_REGION = "region";
private static final String ATTR_IMAGE = "backgroundImage";

Contributor:
nit - please remove extra blank line

@@ -127,7 +130,7 @@ protected TtmlSubtitle decode(byte[] bytes, int length, boolean reset)
Log.i(TAG, "Ignoring unsupported tag: " + xmlParser.getName());
unsupportedNodeDepth++;
} else if (TtmlNode.TAG_HEAD.equals(name)) {
- parseHeader(xmlParser, globalStyles, regionMap, cellResolution);
+ parseHeader(xmlParser, globalStyles, regionMap, cellResolution, imageMap);
@ojw28 (Contributor) Nov 12, 2018:
nit - It might be nice for the two "Map" variables to be next to one another (consider this ignorable :)). It would also be nicely consistent with the TtmlSubtitle constructor if this were the case (which takes regionMap and then imageMap as the next argument).

} while (!XmlPullParserUtil.isEndTag(xmlParser, TtmlNode.TAG_HEAD));
return globalStyles;
}

public void parseMetaData(XmlPullParser xmlParser, Map<String, String> imageMap) throws IOException, XmlPullParserException {
Contributor:
nit - We consider metadata to be one word, so should be Metadata rather than MetaData.

Contributor:
This method should be private.

do {
xmlParser.next();
if (XmlPullParserUtil.isStartTag(xmlParser, TtmlNode.TAG_SMPTE_IMAGE)) {
for (int i = 0; i < xmlParser.getAttributeCount(); i++) {
Contributor:
I don't think looping here does anything. You can just execute what's inside the loop once, I think?

Contributor Author:
Oops, totally correct. At first I was manually searching for the "id" attribute, then I found out that there is a util method for this. I forgot to remove the loop. Thanks :)

TreeMap<String, SpannableStringBuilder> regionOutputs = new TreeMap<>();
List<Pair<String, String>> regionImageList = new ArrayList<>();
Contributor:
It would be clearer to call these regionTextOutputs and regionImageOutputs or something, to make it clear that one is the image equivalent of the other.


// Create image based cues
for (Pair<String, String> regionImagePair : regionImageList) {
String base64 = imageMap.get(regionImagePair.second);
Contributor:
It's probably worth handling the case that imageMap.get returns null (i.e. do nothing in this case, because the image ref didn't point to a valid image).

cues.add(
new Cue(decodedByte,
region.position,
Cue.ANCHOR_TYPE_MIDDLE,
Contributor:
Is this definitely correct? As opposed to TYPE_UNSET, which is what's used for the position anchor in the text case below.

Contributor Author:
Right, the SubtitlePainter should decide how to render this.

Contributor Author:
Ah, I know why I put this here. SubtitlePainter.setupBitmapLayout() is already doing some logic based on the position anchor, and I didn't want to affect that by applying logic like "if cuePositionAnchor == Cue.TYPE_UNSET, set the anchor type to middle", because other subtitle decoders might rely on this. That's why I moved this up into TtmlNode (inspired by DvbParser).

Right now, for unsupported regions, I set the position anchor to middle as the default (see your suggestion below); however, this is not the case for supported regions, as cuePosition == Cue.DIMEN_UNSET will be false in SubtitlePainter.setupBitmapLayout().

Without applying this position anchor, the vertical position won't be the same as in the dashif player. What do you think: should I leave this anchor assignment inside TtmlNode? Or we could modify the fallback logic for unset position anchors inside setupBitmapLayout, like this:

int y = Math.round(cuePositionAnchor == Cue.ANCHOR_TYPE_END ? (anchorY - height)
        : cuePositionAnchor == Cue.ANCHOR_TYPE_MIDDLE ? (anchorY - (height / 2)) : anchorY);

to

int y = Math.round(cuePositionAnchor == Cue.ANCHOR_TYPE_END ? (anchorY - height)
        : cuePositionAnchor == Cue.ANCHOR_TYPE_START ? anchorY : (anchorY - (height / 2)));

Contributor:
I still think you should modify setupBitmapLayout to do something sensible if things are unset, and output it as unset from the decoder. There aren't any subtitle decoders setting unset anchors at the moment, so adding logic for that case in setupBitmapLayout should be fine (this is also probably why such logic is missing ;)).

@@ -355,6 +355,22 @@ private void setupTextLayout() {
}

private void setupBitmapLayout() {
// Default position
if (cuePosition == Cue.DIMEN_UNSET) {
Contributor:
Maybe add cuePositionAnchor = ANCHOR_TYPE_MIDDLE to this block?


// Default width
if (cueSize == Cue.DIMEN_UNSET) {
cueSize = 0.5f;
Contributor:
The way these defaults work can cause a subtitle to be drawn in quite a weird way, I think. This looks like it'll default the cue to occupy 50% of the screen horizontally. Suppose the bitmap subtitle is a very short word, like "Hi". Won't it end up being sized to occupy nearly the entire vertical space in this case :)?

You may need to do something a little more clever, where you define a box within which you want the bitmap to be displayed (e.g. 50% of space horizontally and 10% vertically), and then figure out how to compute the bitmapRect so that it doesn't go outside of that box.
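A minimal sketch of this box-constrained sizing, assuming the 50% x 10% box from the example above (BoxFit and its parameter names are hypothetical, not the actual SubtitlePainter code):

```java
// Sketch: size a bitmap to fit inside a fractional box while keeping its
// aspect ratio. All "fraction" values are fractions of the parent view.
class BoxFit {
  // Returns {widthFraction, heightFraction} that the bitmap should occupy.
  static float[] fit(
      int bitmapWidth, int bitmapHeight,
      int viewWidth, int viewHeight,
      float boxWidthFraction, float boxHeightFraction) {
    // First let the height fill the box; derive the width keeping aspect ratio.
    float heightPx = boxHeightFraction * viewHeight;
    float widthPx = heightPx * bitmapWidth / bitmapHeight;
    float widthFraction = widthPx / viewWidth;
    float heightFraction = boxHeightFraction;
    if (widthFraction > boxWidthFraction) {
      // Too wide: constrain by the box width instead, scaling the height down.
      widthPx = boxWidthFraction * viewWidth;
      heightPx = widthPx * bitmapHeight / bitmapWidth;
      widthFraction = boxWidthFraction;
      heightFraction = heightPx / viewHeight;
    }
    return new float[] {widthFraction, heightFraction};
  }
}
```

So a wide "full line" bitmap gets clamped to 50% of the width, while a short word like "Hi" only ever grows to 10% of the height, never the whole screen.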

@szaboa (Contributor, Author) Nov 12, 2018:
You are right :)

And let's say "Hi" and "Hi, how are you?" should have the same height; that's not the case now. I mean the cues should try to have the same height as long as they fit, or something like that. I'll try to find a solution for this today.

The other adjustments based on the comments are done, but I haven't pushed them yet.

Contributor:
In the general case I don't think it's possible for you to keep the line height the same if the region hasn't been defined. Doing so would require you to know how many lines are in the bitmap, and you don't know that (so really, having the region defined properly is fairly important). I think fitting within a box that constrains it to a sensibly small portion of the screen is the best you can do.

Contributor Author:
One idea is to fix the height at 10% as you said and make the bitmap fit this height (keeping the aspect ratio); however, if the bitmap goes beyond the box's width (50%), then it will be scaled down by its width until it fits the box (keeping the aspect ratio here as well). I think this is an acceptable solution for "one line" images; however, if the cue has two lines, then the "line height" will be half of the specified box's height (= 5%). I don't see a solution for this particular case, because we don't know all of the bitmap sizes ahead of time (e.g. live TV).

Do you think this is a viable solution?

Contributor:
I think what you describe is the best you can do (short of something crazy like using OCR to figure out how many lines there are - which isn't something we're realistically going to do :)).

Contributor Author:
It's probably easier to add support for regions that are defined by pixels than to use OCR; nice idea though, hehe :)

Thanks for the quick response, I will do it the way I described then.

@szaboa (Contributor, Author) commented Nov 12, 2018

Adjustments are done based on the comments; please have another look at them.
Note that I didn't remove the default middle position anchor from TtmlNode; see the comment above.

@szaboa (Contributor, Author) commented Nov 13, 2018

I just tested this with a real live stream, and the subs were extremely small; I need to investigate why.

@szaboa (Contributor, Author) commented Nov 13, 2018

@ojw28
As I wrote earlier, I've tested this implementation with a real live TV channel, and as it turned out, they're sending the images "full screen" (with a transparent area):

 </metadata>
    <styling />
    <layout>
      <region tts:extent="720px 576px" tts:origin="0% 0%" xml:id="Region" />
    </layout>
  </head>
  <body>
    <div smpte:backgroundImage="#Subtitle_2208" region="Region" begin="03:14:28.829" end="03:14:28.868" />
    <div smpte:backgroundImage="#Subtitle_2209" region="Region" begin="03:14:28.868" end="03:14:30.021" />

This leads to a problem: based on the logic that we discussed yesterday, this "full screen" image will be pushed into a smaller box (50% width, 10% height) and the cues will be extremely small, impossible to read. If I manually set the cueSize and cueBitmapHeight to 1.0f for this particular TTML, then it is rendered "correctly", the same way as in other players.

Do you think we can add support for regions that are defined by pixels?

Or can you think of any shortcut that could solve this edge case? One idea would be to parse the pixels as well, check the region size against common video resolutions (1920x1080, 1280x720, 720x576 etc.), and assume full screen on a match. Of course this would only be checked for image based cues.

@szaboa (Contributor, Author) commented Nov 14, 2018

In my opinion the proposed solution won't work; the problem is that videoSize doesn't correlate with the region pixels in TTML subs. So in the case of adaptive streams, the video size changes but the sub size (720x546) stays constant. This will result in varying cue sizes.

Do you see the other proposed solution as viable? To check the region size against common video sizes and assume full screen in case of a match?

@ojw28 (Contributor) commented Nov 14, 2018

In my opinion the proposed solution won't work; the problem is that videoSize doesn't correlate with the region pixels in TTML subs. So in the case of adaptive streams, the video size changes but the sub size (720x546) stays constant. This will result in varying cue sizes.

I think that pixel regions defined in TTML captions must be in the context of some window, also in pixels, that's defined one way or another. My proposal above was based on the assumption that the window is defined by the resolution of the original (non-adaptive) video stream to which the TTML captions correspond.

  • For a non-adaptive stream, is my assumption about the window being defined by the resolution of the video stream correct?
  • Is there any mechanism in TTML for specifying the window in terms of pixels? In particular, is it possible to specify the window in pixels within the TTML caption file? If so and in cases where this happens, it should be possible for the decoder to convert pixel defined regions into fraction defined regions.
  • If the window is defined by the original (non-adaptive) video stream, then I'd claim use of such windows stops making sense as soon as there's a server side processing pipeline that's re-transcoding the video into a bunch of other resolutions. If such a pipeline exists, I'd argue that it also needs to be re-transcoding the TTML to use fractional regions so that they're compatible with the different video transcodes, or that it needs to somehow add a window definition in pixels into the TTML, if that's something TTML supports. To re-transcode the video but not the TTML, when there's a dependency between the two, does not seem like a sensible thing for a pipeline to be doing. There becomes a point at which saying the pipeline needs fixing is a legitimate statement.

To check the region size against common video sizes and assume as full screen in case of a match?

I would rather we not do this. It sounds like a hack to work around the serving side not doing the right thing.

Thoughts?

@szaboa (Contributor, Author) commented Nov 15, 2018

For a non-adaptive stream, is my assumption about the window being defined by the resolution of the video stream correct?

Correct, but it would be better to handle non-adaptive and adaptive streams together.

Is there any mechanism in TTML for specifying the window in terms of pixels? In particular, is it possible to specify the window in pixels within the TTML caption file? If so and in cases where this happens, it should be possible for the decoder to convert pixel defined regions into fraction defined regions.

Indeed, the TTML should specify a window; I am currently going through the specs / standards to find an official way to solve this.

@szaboa (Contributor, Author) commented Nov 15, 2018

As it turned out, the window shall be present when the regions are defined in pixels.
See specs: https://www.w3.org/TR/ttml-imsc1.0.1/

If the Document Instance includes any length value that uses the px expression, tts:extent SHALL be present on the tt element.

This tts:extent is present in the test stream's TTML from dashif:
https://livesim.dashif.org/dash/vod/testpic_2s/img_subs_region_px.xml

For yet unknown reasons, the live TV streams from our app which I previously used to test this don't send this attribute; we are trying to find out why.

However, I would like to propose this solution:

  • In cases where the regions are defined by pixels, we can use the tts:extent to convert to fraction defined regions inside the decoder (probably for text as well?)
  • In the edge case where this tts:extent is missing, assume that the cue should be full screen.
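A sketch of the proposed pixel-to-fraction conversion (RegionConverter and the string-based inputs are illustrative; the real decoder would read these values while parsing the XML attributes):

```java
// Sketch: convert a pixel-defined TTML region into fractions of the root
// tts:extent window. Input strings mirror the TTML attribute syntax,
// e.g. tts:extent="720px 576px", tts:origin="0px 288px".
class RegionConverter {
  // Returns {positionX, positionY, width, height} as fractions of the window.
  static float[] toFractions(String ttsExtent, String regionOrigin, String regionExtent) {
    float[] window = parsePx(ttsExtent);
    float[] origin = parsePx(regionOrigin);
    float[] extent = parsePx(regionExtent);
    return new float[] {
        origin[0] / window[0], origin[1] / window[1], // position x, y
        extent[0] / window[0], extent[1] / window[1]  // width, height
    };
  }

  // Parses "<w>px <h>px" into {w, h}.
  private static float[] parsePx(String value) {
    String[] parts = value.split("\\s+");
    return new float[] {
        Float.parseFloat(parts[0].replace("px", "")),
        Float.parseFloat(parts[1].replace("px", ""))
    };
  }
}
```

For example, a region occupying the bottom half of a 720x576 window comes out as position (0, 0.5) with size (1.0, 0.5), independent of the video resolution being rendered.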

What do you think?

@nigelmegitt commented

You should be aware that the handling of images in SMPTE-TT is a little different from IMSC 1.0.1 Image Profile. I'm not clear what format the content is in, but my recommendation would be to go for IMSC support if possible.

IMSC 1.0.1 Image Profile is the same as IMSC 1.1 Image profile in this respect.

@szaboa (Contributor, Author) commented Nov 15, 2018

@nigelmegitt

Thank you for pointing this out; unfortunately, however, I am not fully aware of the differences between these profiles. The test stream which I used to add support for images in TTML is from dashif: https://livesim.dashif.org/dash/vod/testpic_2s/img_subs_region_px.xml.

But in our app, a couple of live tv streams have TTML subs with EBU profiles.

I think the main goal (if @ojw28 also agrees) would be to add basic support for images inside the TtmlDecoder for now, without worrying about the differences between profiles: basically, to parse and render images based on the defined regions (pixel or percentage based).

Are these basic "features" the same across the different profiles?

@nigelmegitt commented

@szaboa the file you linked is not an IMSC 1.x Image Profile document, because it encodes the images in smpte:image elements, which are prohibited in IMSC but permitted in SMPTE-TT.

Due to issues with smpte:image, IMSC Image Profile only supports smpte:backgroundImage, so these features are not the same in different profiles. Whether or not you consider them to be basic I will leave to you!

@szaboa (Contributor, Author) commented Nov 15, 2018

I need to research the difference between smpte:image and smpte:backgroundImage, as I am not aware of it right now. Do you think that if we handle both smpte:image and smpte:backgroundImage, we would have the above mentioned basic "feature" for SMPTE-TT, IMSC, and EBU as well? In other words, would these different profiles be covered?

@nigelmegitt commented

I don't know the details of how SMPTE-TT is used in practice, but I'd have thought that handling smpte:backgroundImage in an IMSC Image Profile way is a good start. Obviously it depends on what content people want to be presented.

EBU doesn't use image subtitles at all - all the EBU subtitle specs, and the IMSC Text Profile prohibit use of images altogether.

@szaboa (Contributor, Author) commented Nov 15, 2018

@nigelmegitt
Are "mixed profiles" a thing in TTML? In one of the TTMLs I can see things like:

<ebuttm:documentMetadata>
        <ebuttm:conformsToStandard>urn:ebu:tt:distribution:2014-01</ebuttm:conformsToStandard>
</ebuttm:documentMetadata>

But the images are defined with smpte:image, so that's why I assumed this is an EBU profile with images?

@ojw28 (Contributor) commented Nov 15, 2018

I would hope that the various profiles aren't actually incompatible with one another. If that's the case then I'm not sure "what we should go with" is the right question to be asking, because it's not a choice of one or the other. It seems reasonable that we would accept contributions covering whatever features and/or profiles contributors wish to add support for.

However, I would like to propose this solution ... In cases where the regions are defined by pixels, we can use the tts:extent to convert to fraction defined regions inside the decoder (probably for text as well?)

This sounds good to me.

In an edge where this tts:extent is missing, assume that the cue should be full screen.

Do you have any evidence that this is the best thing to do, beyond it being correct for the streams you happen to want to play? I'm not sure that's a particularly good reason (vs dropping the cues because they're not properly specified).

@nigelmegitt commented

@szaboa can you provide a pointer to the TTML document instance that claims to be an EBU profile and has an image in it? The link from #5066 (comment) does not have this.

It is fine to mix vocabularies from different profiles, but doing so may result in a document that does not conform to any single one of those profiles (while still being a valid TTML document). There is not usually any conformance problem created by adding metadata vocabulary like ebuttm:conformsToStandard to any profile but if the value in such an element is incorrect that would create an inconsistency that some processors might not know how to handle.

The example in #5066 (comment) is a case in point - if a document has that URN in then it is claiming to be an EBU-TT-D document, but if it contains an image element then it is not conformant, so it is lying about itself!

@szaboa (Contributor, Author) commented Nov 15, 2018

So, I went through the IMSC and SMPTE specifications; source:
https://www.smpte.org/sites/default/files/st2052-1-2010.pdf
https://www.w3.org/TR/2017/WD-ttml-imsc1-20170131/

The conclusion is the following:

SMPTE:
Both smpte:backgroundImage and smpte:image can be present in TTML.
The smpte:image element is used to define pre-rendered images in TTML's metadata, in base64 encoding.
The smpte:backgroundImage attribute refers to an smpte:image in the same document, by id, when it has a #fragment extension. Without a #fragment extension, it refers to an external URI.
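The #fragment resolution described above can be sketched like this, assuming an imageMap from smpte:image ids to their base64 payloads as in this PR (ImageResolver is a hypothetical helper, not code from the PR):

```java
import java.util.Base64;
import java.util.Map;

class ImageResolver {
  // Returns the decoded PNG bytes if backgroundImage is a "#fragment"
  // reference to an smpte:image entry in imageMap (id -> base64 data);
  // returns null for unknown ids and for external URIs, which this
  // sketch does not fetch.
  static byte[] resolve(String backgroundImage, Map<String, String> imageMap) {
    if (backgroundImage != null && backgroundImage.startsWith("#")) {
      String base64 = imageMap.get(backgroundImage.substring(1));
      return base64 != null ? Base64.getDecoder().decode(base64) : null;
    }
    // No #fragment: the reference points to an external URI.
    return null;
  }
}
```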

IMSC:
Differences compared with SMPTE-TT:
The smpte:image element is not permitted in IMSC.

  • smpte:image SHALL NOT be used.

The smpte:backgroundImage attribute shall reference a PNG datastream. Based on a couple of example TTMLs, this attribute has values like "<file_name>.png", so it does not contain a full path.

  • smpte:backgroundImageHorizontal and smpte:backgroundImageVertical SHALL NOT be used.

We do not aim to use these attributes for SMPTE at this point, so this is not a problem.

SMPTE allows background images in p, span, br

Right now we are only searching for images in divs, so this is not a problem either.


Currently, this pull request only supports the SMPTE profile, and only when the smpte:backgroundImage points to an smpte:image.

@ojw28
Do you think we can support fetching the PNGs from external sources? That way we would have support for the "other half" of SMPTE and also for IMSC, I guess.

Do you have any evidence that this is the best thing to do, beyond it being correct for the streams you happen to want to play? I'm not sure that's a particularly good reason (vs dropping the cues because they're not properly specified).

Unfortunately no evidence, but we can follow up on this after the profiles are clarified.

@nigelmegitt
Can you please confirm whether the conclusion here is right?
By the way, the EBU profile can be forgotten in this case, because as you said EBU doesn't use image subtitles; probably a couple of streams from our apps are mixing vocabularies from SMPTE and EBU, but from the image point of view they follow the SMPTE profile.

And one additional question: in the case of IMSC, what is the base URL of the smpte:backgroundImage? In other words, from where can these PNGs be resolved?


Thank you both for the collaboration!

@ojw28 (Contributor) commented Nov 16, 2018

Do you think we can support to fetch the PNGs from external sources? In this way we would have support for the "other half" of SMPTE and also for IMSC, I guess.

I don't see any reason to do this unless someone actually needs support for it. Having to maintain features no-one is using isn't an attractive proposition for us :).

@szaboa (Contributor, Author) commented Nov 16, 2018

@ojw28 I see your point and I agree.
I would say let's keep this pull request simple and add support for what we have right now. In the future I can add support for IMSC if there's demand for it.


So the only thing that I need to finish here is to parse tts:extent and use it as a reference window to convert pixel defined regions to fraction defined regions. If no tts:extent is present, should we drop the cue?

@ojw28 (Contributor) commented Nov 19, 2018

If no tts:extent is present, should we drop the cue?

Maybe the decoder should output the cue with dimensions set to UNSET, and the painter should opt to do nothing in this case? It's probably fine to drop it in the decoder instead though, if that's easier.

@szaboa (Contributor, Author) commented Nov 22, 2018

@ojw28
Pushed the changes based on what we discussed. If no tts:extent is found, the decoder returns the cue with UNSET params. I reverted the changes inside SubtitlePainter, so there is no fallback anymore; it just won't render the region, because the size attribute will be UNSET.

I also added a new test case where the regions are pixel defined.

Test stream (it has sub tracks with both pixel / percentage regions):
https://livesim.dashif.org/dash/vod/testpic_2s/img_subs.mpd

Ready for code review.

@szaboa (Contributor, Author) commented Dec 9, 2018

When is this planned to be merged? :)

@ojw28 (Contributor) commented Dec 9, 2018

We'll try and get this in this week.


if (isActive(timeUs)) {
if (TAG_DIV.equals(tag) && imageId != null) {
regionImageList.add(new Pair<>(resolvedRegionId, imageId));
Contributor:
Question: Is it OK to return here? Or do we really need to traverse (L252-254) even though we've found an image at this node?

@ojw28 (Contributor) commented Dec 13, 2018

We're ready to merge this except for the one question I just posted on the traversal code. If you could answer either way, that would be great! There's no need to push any update to the PR; we'll add the return during the merge, if your response indicates it makes sense for us to do so.

Thanks!

@szaboa (Contributor, Author) commented Dec 13, 2018

I am pretty sure it is OK to return there; I don't see why we would need to continue traversing.

@getashok2003 commented

When is this PR planned to be merged? I need this patch for my project.

@szaboa (Contributor, Author) commented Dec 14, 2018

@getashok2003
Pretty soon, I think; however, I don't know when the next release that will include this is planned.

@ojw28 ojw28 merged commit 651db91 into google:dev-v2 Dec 14, 2018
ojw28 added a commit that referenced this pull request Dec 14, 2018
ojw28 added a commit that referenced this pull request Dec 19, 2018
@google google locked and limited conversation to collaborators May 16, 2019
5 participants