UBA support #486

Open
avidseeker opened this issue Feb 29, 2024 · 6 comments

@avidseeker

The Unicode Bidirectional Algorithm (UBA) specifies how characters are positioned in text that contains right-to-left characters.

Here's an example text:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.

.قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ
رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ
فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ
مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ
مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَجَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ
مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى
بَلَّ دَمْعِيَ مِحْمَلِي

And here is how it should be displayed in less, with the Arabic lines right-justified:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At
vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren,
no sea takimata sanctus est Lorem ipsum dolor sit amet.

.قِفَا نَبْكِ مِنْ ذِكْرَى حَبِيبٍ ومَنْزِلِ، بِسِقْطِ اللِّوَى بَيْنَ الدَّخُول فَحَوْمَلِ. فَتُوْضِحَ فَالمِقْراةِ لمْ يَعْفُ
رَسْمُها، لِمَا نَسَجَتْهَا مِنْ جَنُوبٍ وشَمْألِ. تَرَى بَعَرَ الأرْآمِ فِي عَرَصَاتِهَا، وَقِيْعَانِهَا كَأنَّهُ حَبُّ
فُلْفُلِ. كَأنِّي غَدَاةَ البَيْنِ يَوْمَ تَحَمَّلُوا، لَدَى سَمُرَاتِ الحَيِّ نَاقِفُ حَنْظَلِ. وُقُوْفًا بِهَا صَحْبِي عَليَّ
مَطِيَّهُمُ، يَقُوْلُوْنَ: لا تَهْلِكْ أَسًى وَتَجَمَّلِ. وإِنَّ شِفائِيَ عَبْرَةٌ مُهْرَاقَةٌ، فَهَلْ عِنْدَ رَسْمٍ دَارِسٍ مِنْ
مُعَوَّلِ؟. كَدَأْبِكَ مِنْ أُمِّ الحُوَيْرِثِ قَبْلَهَا، وَجَارَتِهَا أُمِّ الرَّبَابِ بِمَأْسَلِ. إِذَا قَامَتَا تَضَوَّعَ المِسْكُ
مِنْهُمَا، نَسِيْمَ الصَّبَا جَاءَتْ بِرَيَّا القَرَنْفُلِ. فَفَاضَتْ دُمُوْعُ العَيْنِ مِنِّي صَبَابَةً، عَلَى النَّحْرِ حَتَّى
بَلَّ دَمْعِيَ مِحْمَلِي

@polluks
Contributor

polluks commented Mar 1, 2024

Does your terminal support UBA? Less is not a layout engine.

@avidseeker
Author

Yes. I tried it on Tilda, but I don't think this is related to the terminal. The terminal's job is to render whatever a CLI program prints to it. In Vim, for example, you can paste this text, run :set rightleft, and the direction of the text changes. So my point is that text direction is managed by the program, not the terminal.

@gwsw
Owner

gwsw commented Mar 7, 2024

I don't read Arabic, but from what I can see, the only difference between what less currently displays and your "how it should be displayed" example is that the Arabic text is right-justified in the latter. Other than that, the same characters appear, written in the same direction, in both cases. Is right-justification the only thing you think should be changed? If so, I don't understand why the (quite complex) Unicode Bidirectional Algorithm was mentioned.

@avidseeker
Author

the only difference is right-justification

It's a little more complex than that. Unicode classifies characters to determine whether text should be aligned right or left. Maybe the URL I attached is too technical; take a look at Unicode Character Property instead. For example, inserting an RTL mark at the beginning of a Latin line should make that line RTL even though it is written in an LTR language, and vice versa for an LTR mark in Arabic text.

Having right-justification is a good start and would cover most use cases.
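
To make the idea concrete, here is a minimal Python sketch of picking a line's direction from its first strong character, which is roughly what UBA rules P2 and P3 describe, and then right-justifying RTL lines. It is only an illustration, not how less works or would have to work: the names `base_direction`, `display_width` and `justify` are made up for this example, isolates and embeddings are ignored, and the width estimate simply skips combining marks and format characters.

```python
import unicodedata


def base_direction(line: str) -> str:
    """Guess the line direction from its first strong character.

    Simplified reading of UBA rules P2-P3: bidi classes R and AL mean
    right-to-left, L means left-to-right. Isolates are not handled.
    """
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong character found: fall back to LTR


def display_width(line: str) -> int:
    """Rough column count: skip combining marks and format characters.

    Assumption: every other character is one column wide, which is not
    true for wide (e.g. CJK) characters.
    """
    return sum(
        1 for ch in line
        if unicodedata.category(ch) not in ("Mn", "Me", "Cf")
    )


def justify(line: str, columns: int = 80) -> str:
    """Left-pad RTL lines so they end at the right edge; leave LTR lines alone."""
    if base_direction(line) == "rtl":
        pad = max(columns - display_width(line), 0)
        return " " * pad + line
    return line


if __name__ == "__main__":
    rlm = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R
    print(base_direction("Lorem ipsum"))        # Latin letters are class L
    print(base_direction(rlm + "Lorem ipsum"))  # the mark comes first
    print(repr(justify("\u0642\u0650\u0641", columns=10)))
```

Because the RTL mark has bidi class R and the LTR mark has class L, a mark at the start of a line overrides the direction the letters alone would suggest, as described above. How a terminal that runs its own bidi pass treats the leading padding is a separate question that this sketch does not address.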

@gwsw
Owner

gwsw commented Mar 11, 2024

I think I understand the RTL mark stuff, but I'm confused about why the Arabic text already seems to be displayed in the correct order in less (and for that matter, in cat). For example, the first Arabic character in the file is QAF (UTF-8 bytes D9 82). Since less knows nothing about RTL ordering, I would have expected that character to appear as the leftmost character in the first Arabic line, but it appears as the rightmost character. How is that happening?
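
The byte claim is easy to check. The short Python sketch below assumes the first Arabic word of the sample is stored in logical order, which is how Unicode text is normally stored; the variable names are made up for the example. It shows that QAF is the first code point in memory, so where it ends up on screen is decided by whatever renders the text.

```python
import unicodedata

# First word of the sample in logical (storage) order:
# QAF, KASRA, FEH, FATHA, ALEF
word = "\u0642\u0650\u0641\u064e\u0627"

first = word[0]
first_name = unicodedata.name(first)
first_bytes = first.encode("utf-8")
first_bidi = unicodedata.bidirectional(first)

print(first_name)            # ARABIC LETTER QAF
print(first_bytes.hex(" "))  # d9 82
print(first_bidi)            # AL (Arabic letter, a strong RTL class)
```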

@avidseeker
Author

You're right, and this is where the terminal comes into play. Terminals without Arabic letter-shaping support (such as Alacritty and st) actually behave the way you describe.

image

But if the terminal does support Arabic, the characters are shaped correctly, starting from the right side.
