Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web interface does not render composed unicode characters correctly #19913

Closed
ronisbr opened this issue Jun 7, 2022 · 17 comments · Fixed by #19990
Closed

Web interface does not render composed unicode characters correctly #19913

ronisbr opened this issue Jun 7, 2022 · 17 comments · Fixed by #19990
Labels
topic/ui Change the appearance of the Gitea UI type/bug

Comments

@ronisbr
Copy link

ronisbr commented Jun 7, 2022

Description

All composed UTF-8 characters, like s̄_b, ṡ_b, etc., are not rendered correctly in Gitea.

Screenshots

See how s̄_b is being rendered. It even shows that there are a hidden unicode characters in this line.

Captura de Tela 2022-06-07 às 14 54 35

Gitea Version

1.16.8

Can you reproduce the bug on the Gitea demo site?

Yes

Operating System

macOS

Browser Version

Safari 15.4

@ronisbr ronisbr added type/bug topic/ui Change the appearance of the Gitea UI labels Jun 7, 2022
@wxiaoguang
Copy link
Contributor

Could you provide a demo on try.gitea.io? Then people can work on it directly.

@ronisbr
Copy link
Author

ronisbr commented Jun 8, 2022

@wxiaoguang
Copy link
Contributor

Thank you. I did a quick test on GitHub https://github.com/wxiaoguang/playground/blob/main/test-utf8.jl

$ curl https://raw.githubusercontent.com/wxiaoguang/playground/main/test-utf8.jl | hexdump
0000000 b9e1 5fa1 0a62 cc73 5f84 0a62
000000c

$ curl https://try.gitea.io/ronisbr/UTF8/raw/branch/main/test.jl | hexdump
0000000 b9e1 5fa1 0a62 cc73 5f84 0a62
000000c

GitHub doesn't report the warning.

@ronisbr
Copy link
Author

ronisbr commented Jun 8, 2022

Yes, it always worked in github.

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

Looks fine for me:

Screenshot from 2022-06-08 10-12-45

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

GitHub doesn't report the warning.

Github is wrong to not report that there is something odd. ë and ë are not the same characters.


The problems stems from Safari's rendering. The code looks like:

<span class="n">s<span class="escaped-code-point" data-escaped="[U+0304]"><span class="char">̄</span></span>_b</span>

The <span class="escaped-code-point" data-escaped="[U+0304]"><span class="char"> is all inline and zero width so the overbar (U+0304) should still be being rendered over the s. Safari is incorrectly rendering this.


Now, how could we fix? Without having access to a Safari browser I'm not sure. Is there any way get Safari to just do the right thing with the spacing of the combining character here?


We do this splitting because it makes writing the escaping/unescaping extremely easy for combining marks:

case unicode.Is(unicode.M, r):
escaped.Escaped = true
escaped.HasMarks = true
if writePos < i {
if _, err = output.Write(bs[writePos:i]); err != nil {
escaped.HasError = true
return
}
}
if err = writeEscaped(output, r); err != nil {
escaped.HasError = true
return
}
writePos = i + size

In order to not do it we'd have to coalesce bytes that can be combined together and emit an escape block for them together eg.

<span class="n"><span class="escaped-code-point" data-escaped="s[U+0304]"><span class="char"></span></span>_b</span>

That coalescing would require us to write the escaper to understand the rules for rendering of combining characters and have the state for handling these.

For example if we had: ̄ (that is [U020] + [U304]) Should that be coalesced as +[U304] or kept as [U304]? How about multiple combining characters e.g.: ē̂ e +U304 + U302? How about when/if we get round to properly dealing with ambiguous characters like с and С (which are not the same as c and C)? If these have combining characters do we coalesce or escape the ambiguous character separately?


If it is possible to get Safari just render the combining character in the right place that would be deeply helpful instead.

@ronisbr
Copy link
Author

ronisbr commented Jun 8, 2022

Hi @zeripath ! Thanks!

Indeed, in Chrome it works fine. It shows the warning (which should be OK) but the symbol is displayed correctly.

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

I guess we should coalesce these.

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

ugh it's even harder because we run this on already rendered data.

@silverwind
Copy link
Member

silverwind commented Jun 8, 2022

If it is possible to get Safari just render the combining character in the right place that would be deeply helpful instead.

I don't think it's possible. Seems Safari's unicode composing does not work across tags, so all joined characters must be rendered as one continuous string, e.g. [char1][char2], not <span>[char1]</span>[char2].

Also, separate issue: This is another false-positive for the Bidi warning, that code seems to be way too aggressive in its detection.

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

It's not a BIDI warning. It's a hidden character warning.

@silverwind
Copy link
Member

So it's definitely a Safari bug: https://jsfiddle.net/9j0zc4su/

Safari rendering:

image

Firefox rendering:

image

@silverwind
Copy link
Member

It's not a BIDI warning. It's a hidden character warning.

Okay, but there's nothing actually "hidden", is there?

@ronisbr
Copy link
Author

ronisbr commented Jun 8, 2022

So it's definitely a Safari bug: https://jsfiddle.net/9j0zc4su/

But, should <span>&#x1F44B;</span>&#x1F3FD; be really merged? In Safari, <span>&#x1F44B;&#x1F3FD;</span> works.

@silverwind
Copy link
Member

So it's definitely a Safari bug: https://jsfiddle.net/9j0zc4su/

But, should <span>&#x1F44B;</span>&#x1F3FD; be really merged? In Safari, <span>&#x1F44B;&#x1F3FD;</span> works.

I think composition should not be influenced by tag boundaries and other browsers seem to agree as well.

@zeripath
Copy link
Contributor

zeripath commented Jun 8, 2022

It's not a BIDI warning. It's a hidden character warning.

Okay, but there's nothing actually "hidden", is there?

Which of these are the same ëëëëëëëëëë?

@silverwind
Copy link
Member

silverwind commented Jun 8, 2022

It's not a BIDI warning. It's a hidden character warning.

Okay, but there's nothing actually "hidden", is there?

Which of these are the same ëëëëëëëëëë?

I'd call them "ambigous characters" and I question whether we should actually warn on them. Hidden is something different in my eyes, like zero width space 😉.

zeripath added a commit to zeripath/gitea that referenced this issue Jun 16, 2022
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.

Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.

Close go-gitea#19913

Signed-off-by: Andrew Thornton <[email protected]>
zeripath added a commit that referenced this issue Aug 13, 2022
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.

Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.

Close #19913

Signed-off-by: Andrew Thornton <[email protected]>
vsysoev pushed a commit to IntegraSDL/gitea that referenced this issue Aug 28, 2022
This PR rewrites the invisible unicode detection algorithm to more
closely match that of the Monaco editor on the system. It provides a
technique for detecting ambiguous characters and relaxes the detection
of combining marks.

Control characters are in addition detected as invisible in this
implementation whereas they are not on monaco but this is related to
font issues.

Close go-gitea#19913

Signed-off-by: Andrew Thornton <[email protected]>
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
topic/ui Change the appearance of the Gitea UI type/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants