center and truncate should be grapheme cluster and unicode width aware #183

Open
Kijewski opened this issue Sep 23, 2024 · 0 comments
Collaborator

Kijewski commented Sep 23, 2024

Let's have a look at the letter 'ễ' in the surname "Nguyễn". It can be encoded either as the single precomposed Unicode character U+1EC5, or decomposed as "e\u{302}\u{303}". When truncating a text, the letter should stay ễ, and not be truncated to ê or e. Or take the emoji "👯‍♂️", which is composed of "\u{1f46f}\u{200d}\u{2642}\u{fe0f}", a sequence that must not be split up.
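To make the problem concrete, here is a minimal sketch of what goes wrong when text is cut by Unicode scalar values (`char`s). `naive_truncate` is a hypothetical helper written for this illustration; it is not the implementation of the existing `|truncate` filter.

```rust
/// Naive truncation that keeps the first `n` Unicode scalar values (`char`s).
/// Hypothetical helper for illustration only.
fn naive_truncate(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // "Nguyễn" with the fifth letter decomposed: e + U+0302 + U+0303.
    let decomposed = "Nguye\u{302}\u{303}n";
    // Cutting after 5 scalar values drops both combining marks: "Nguye".
    assert_eq!(naive_truncate(decomposed, 5), "Nguye");
    // Cutting after 6 keeps the circumflex but loses the tilde (ê instead of ễ).
    assert_eq!(naive_truncate(decomposed, 6), "Nguye\u{302}");

    // The emoji consists of four scalar values; keeping only the first
    // one leaves a different emoji behind.
    let emoji = "\u{1f46f}\u{200d}\u{2642}\u{fe0f}";
    assert_eq!(emoji.chars().count(), 4);
    assert_eq!(naive_truncate(emoji, 1), "\u{1f46f}");
}
```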

It would be nice if one could make |center and |truncate understand Unicode widths (ễ = 1 display character; 👯‍♂️ = 2 display characters) and grapheme clusters. This should be opt-in, because the lookup tables are big.
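A rough sketch of cluster-aware truncation might look like the following. It is only an approximation: `is_extender` and `truncate_clusters` are hypothetical names, and the hard-coded ranges cover just the two examples from this issue (combining diacritical marks, the zero width joiner, and variation selectors). A real implementation would need the full grapheme cluster rules and their lookup tables, and this sketch does not address display width at all.

```rust
/// Hypothetical, heavily simplified stand-in for real grapheme segmentation.
/// Returns true if `c` should stay attached to the preceding character.
fn is_extender(c: char) -> bool {
    matches!(
        c,
        '\u{0300}'..='\u{036F}'     // combining diacritical marks
        | '\u{200D}'                // zero width joiner
        | '\u{FE00}'..='\u{FE0F}'   // variation selectors
    )
}

/// Keeps at most `max` approximate clusters of `s`, never cutting inside one.
fn truncate_clusters(s: &str, max: usize) -> &str {
    let mut clusters = 0;
    let mut after_zwj = false;
    for (idx, c) in s.char_indices() {
        // A character directly after a zero width joiner belongs to the
        // same cluster as the joiner.
        let starts_cluster = !is_extender(c) && !after_zwj;
        if starts_cluster {
            if clusters == max {
                // The limit is reached: cut right before this new cluster.
                return &s[..idx];
            }
            clusters += 1;
        }
        after_zwj = c == '\u{200D}';
    }
    s
}

fn main() {
    // The decomposed ễ stays whole when it is the fifth and last kept cluster.
    assert_eq!(
        truncate_clusters("Nguye\u{302}\u{303}n", 5),
        "Nguye\u{302}\u{303}"
    );
    // The emoji sequence is kept as one unit.
    assert_eq!(
        truncate_clusters("\u{1f46f}\u{200d}\u{2642}\u{fe0f}!", 1),
        "\u{1f46f}\u{200d}\u{2642}\u{fe0f}"
    );
}
```

Returning a `&str` slice avoids allocating; cutting only at positions where a new cluster starts is what keeps the combining marks and the joiner sequence intact.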

Maybe, instead of modifying the existing functions, new ones should be introduced.
