
The design_requirements for Dutch [nld] are misleading #118

Open
moyogo opened this issue Apr 26, 2023 · 5 comments

@moyogo
Contributor

moyogo commented Apr 26, 2023

The design_requirements for Dutch [nld] are misleading:

design_requirements:
- To support a combination of ‹ĳ› with an acute mark, ‹ȷ› with an acute mark should follow ‹í›. Font developers provide automated substitutions in their fonts to make such character recomposition work.

The design requirements should say: "The <j> should lose its dot when combined with a combining acute, for cases where the acute on the j of a stressed lange ij is not omitted; this is usually spelled íj, but íj́ when possible. Generally, fonts should not add an acute that is not there in the text."

1. The characters used are confusing

The current design requirements say that <ĳ> U+0133 LATIN SMALL LIGATURE IJ is combined with U+0301 COMBINING ACUTE and that <ȷ> U+0237 LATIN SMALL LETTER DOTLESS J should have an acute when following <í> U+00ED LATIN SMALL LETTER I WITH ACUTE.

<ij> U+0069 U+006A should be used instead of <ĳ> U+0133 LATIN SMALL LIGATURE IJ, as the letter combination i+j U+0069 U+006A is generally used for the lange ij.

See for example Taalunie, Technische Handleiding: Regels voor de officiële spelling van het Nederlands, 2016, p. 19:

De lettercombinatie i+j (lange ij) gedraagt zich soms alsof het om één enkele letter gaat.
[English: The letter combination i+j (lange ij) sometimes behaves as if it were a single letter.]

See also Unicode, chapter 7:

Another pair of characters, U+0133 latin small ligature ij and its uppercase version,
was provided to support the digraph “ij” in Dutch, often termed a “ligature” in discussions
of Dutch orthography. When adding intercharacter spacing for line justification, the “ij” is
kept as a unit, and the space between the i and j does not increase. In titlecasing, both the i
and the j are uppercased, as in the word “IJsselmeer.” Using a single code point might simplify
software support for such features; however, because a vast amount of Dutch data is
encoded without this digraph character, under most circumstances one will encounter an
<i, j> sequence.

The ȷ U+0237 LATIN SMALL LETTER DOTLESS J is not used in Dutch, j U+006A LATIN SMALL LETTER J is.
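
For anyone who wants to check the code points mentioned in this section, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration; it is not part of any font tool or of this project.

```python
import unicodedata

LIGATURE_IJ = "\u0133"  # LATIN SMALL LIGATURE IJ
DOTLESS_J = "\u0237"    # LATIN SMALL LETTER DOTLESS J


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The usual Dutch encoding of the lange ij is two separate characters.
print(describe("ij"))

# The ligature is a compatibility character: NFKC maps it to i + j.
assert unicodedata.normalize("NFKC", LIGATURE_IJ) == "ij"

# The dotless j has no decomposition, so normalization never turns it into j.
assert unicodedata.normalize("NFKC", DOTLESS_J) == DOTLESS_J
```

This only inspects encoded text; what a font does with these characters at the glyph level is a separate question.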

2. Substitution of j after í breaks Dutch text

An automated substitution that replaces j by ȷ with acute after í breaks Dutch text.
The Taalunie spelling rule 5.1 on stress marks says:

Het klemtoonteken is ´. Als een klinker of tweeklank met twee of meer letters geschreven wordt, krijgen de eerste twee een klemtoonteken.
[...]
Door technische beperkingen vervalt meestal het nadrukteken op de j van een lange ij. Bijvoorbeeld: blíjven kijken!

Which can be translated as:

The stress mark is ´. If a vowel or diphthong is written with two or more letters, the first two letters get a stress mark.
Due to technical limitations, the stress mark on the j is usually omitted from a lange ij. For example: blíjven kijken!

So Dutch text can follow the official spelling rules and omit the acute on the j of stressed lange ij, like in the example provided.

Additionally, this spelling rule was standardized in the 1996 spelling; before that, it was common to put the acute only on the first letter of digraphs composed of two different letters.
See for example Jan Renkema, Schrijfwijzer, 1987, p. 159.
Many Dutch speakers still write with the pre-1996 rules, and many Dutch texts follow them.
They use níet instead of níét, góed instead of góéd, zíjn instead of zíj́n; a font should not make any of these look as if it had an additional acute.

There is also the issue of foreign names in Dutch text, like Níjar or Szíj, which would be displayed incorrectly.
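
To make the breakage concrete, here is a rough Python sketch that models the problem at the text level. It is only an approximation: a real font would do this with glyph substitution, not string replacement, and the function names `naive_substitution` and `respects_text` are hypothetical.

```python
import re
import unicodedata

ACUTE = "\u0301"    # COMBINING ACUTE ACCENT
I_ACUTE = "\u00ed"  # LATIN SMALL LETTER I WITH ACUTE


def naive_substitution(text):
    """Hypothetical model of the criticized rule: any j right after í gains an acute."""
    text = unicodedata.normalize("NFC", text)
    # Skip a j that already carries an explicit combining acute.
    return re.sub(I_ACUTE + "j(?!" + ACUTE + ")", I_ACUTE + "j" + ACUTE, text)


def respects_text(text):
    """Alternative: show exactly the marks the author typed, nothing more."""
    return unicodedata.normalize("NFC", text)


# blíjven (official spelling, acute on j omitted), two foreign names,
# and zíj́n with an explicit acute on the j.
samples = ["bl\u00edjven", "N\u00edjar", "Sz\u00edj", "z\u00edj\u0301n"]
for word in samples:
    changed = naive_substitution(word) != respects_text(word)
    print(word, "-> altered" if changed else "-> unchanged")
```

In this model the first three words come out with an acute the author never typed, while the last one, where the acute on the j is already in the text, is left alone.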

@kontur
Contributor

kontur commented May 3, 2023

Very interesting, thank you @moyogo! On 1) I agree... the use of the jdotless in the example likely stems from a designer centric view where that letter would be the component used to construct the ij with acute. As for 2) this is new to me. I was under the impression that the "technical limitation" should be circumvented when this is possible. So overall this should be an optional recommendation that also mentions the different styles/orthographies?

@MrBrezina
Member

MrBrezina commented May 15, 2023

Sorry for taking so long to get to this. I have a draft which I will push in a moment for your review. It is longer than what you proposed. Hopefully, it helps clarity. What I am still unsure about is this bit where I say:

It is up to the font developers to decide whether they want to treat lange ij as a single unit during tracking or not.

We had some Dutch readers telling us they would prefer for <i><j> to get tracked and others would insist on keeping it a single unit. This:

De lettercombinatie i+j (lange ij) gedraagt zich soms alsof het om één enkele letter gaat.

says that it can “sometimes” behave like a single unit, hence my recommendation above.

I can see three strategies font developers can take:

  • merge <i><j> to <ĳ> (on a glyph level)
  • dissolve <ĳ> to <i><j> and <Ĳ> to <I><J>
  • leave <ĳ> and <Ĳ> intact and leave that control to users, i.e. they can use <ĳ> to keep them as a single unit or <i><j> to track

Each of these then requires a different solution when adding stress on the lange ij. The latter two strategies would work well for multilingual texts.
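
As a rough illustration of what "single unit during tracking" means, here is a small Python sketch that splits text into the units between which extra space could be added. It is a text-level toy under stated assumptions, not how a layout engine or font works: the names `tracking_units` and `apply_tracking` and the `keep_ij_together` switch are invented for this example, and combining marks are not handled.

```python
def tracking_units(text, keep_ij_together=True):
    """Split text into units between which tracking may be added."""
    units = []
    pos = 0
    while pos < len(text):
        pair = text[pos:pos + 2]
        if keep_ij_together and pair in ("ij", "IJ"):
            # Treat the two-character sequence as one unit.
            units.append(pair)
            pos += 2
        else:
            # Any single character, including U+0133 / U+0132, is its own unit.
            units.append(text[pos])
            pos += 1
    return units


def apply_tracking(text, keep_ij_together=True, gap=" "):
    """Join the units with a visible gap to mimic letterspacing."""
    return gap.join(tracking_units(text, keep_ij_together))


print(apply_tracking("ijs"))                          # i+j kept together
print(apply_tracking("ijs", keep_ij_together=False))  # i and j tracked apart
print(apply_tracking("b\u0133na", keep_ij_together=False))  # ligature stays one unit
```

Note that the sketch cannot tell a lange ij from an i followed by a j in another language, which is one reason the first strategy is harder to apply to multilingual text.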

@moyogo
Contributor Author

moyogo commented May 16, 2023

We had some Dutch readers telling us they would prefer for <i><j> to get tracked and others would insist on keeping it a single unit. This:

De lettercombinatie i+j (lange ij) gedraagt zich soms alsof het om één enkele letter gaat.

says that it can “sometimes” behave like a single unit, hence my recommendation above.

I should have quoted the whole paragraph from Taalunie, Technische Handleiding, 2016 (it’s actually online: https://taalunie.org/feeds/download/technische-handleiding-2016-5dcab.pdf/Technische%20Handleiding/original):

De lettercombinatie i+j (lange ij) gedraagt zich soms
alsof het om één enkele letter gaat. Zo worden i en j aan
het begin van een zin of een eigennaam beide als
hoofdletter geschreven: IJmuiden, IJzermonding. Ze staan
in kruiswoordraadsels vaak samen in één vakje. In
naslagwerken of telefoongidsen worden de woorden of
namen die ij bevatten, soms onder de letter y
gealfabetiseerd. In de meeste woordenboeken is er
echter geen sprake van een aparte letter ij, maar wordt ij
geplaatst tussen -ii- en -ik-.

[English: The letter combination i+j (lange ij) sometimes behaves as if it were a single letter. For instance, i and j are both written as capitals at the beginning of a sentence or a proper name: IJmuiden, IJzermonding. In crossword puzzles they often share a single square. In reference works or telephone directories, words or names containing ij are sometimes alphabetized under the letter y. In most dictionaries, however, ij is not treated as a separate letter but is placed between -ii- and -ik-.]

The "sometimes" means it behaves like a single letter in some contexts (beginning of sentences and of proper nouns, or sorted like y in some reference works) and like two letters in others (sorted like i+j in most dictionaries).
I don’t think the Taalunie was referring to users’ preference for tracking, but tracking it as a unit has definitely been the norm historically.
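
The capitalization case is easy to show in code. Here is a minimal Python sketch, assuming a word-initial i+j is always a lange ij (which will not hold for every word or language); `dutch_capitalize` is a hypothetical helper, not an existing library function.

```python
def dutch_capitalize(word):
    """Capitalize a word, uppercasing both letters of a word-initial i+j."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    # Covers everything else, including the single-character ligature U+0133,
    # whose uppercase mapping is U+0132.
    return word[:1].upper() + word[1:]


# Python's generic capitalization treats i and j as two ordinary letters.
print("ijmuiden".capitalize())
# The Dutch rule uppercases both.
print(dutch_capitalize("ijmuiden"))
```

With the ligature character no special case is needed, which is the simplification the Unicode paragraph quoted earlier alludes to.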

I can see three strategies font developers can take:

  • merge <i><j> to <ĳ> (on a glyph level)
  • dissolve <ĳ> to <i><j> and <Ĳ> to <I><J>
  • leave <ĳ> and <Ĳ> intact and leave that control to users, i.e. they can use <ĳ> to keep them as a single unit or <i><j> to track

Each of these then requires a different solution when adding stress on the lange ij. The latter two strategies would work well for multilingual texts.

Dissolving <ĳ> and <Ĳ> defeats their purpose, at least according to the Unicode paragraph quoted before.
Users who want ij kept as a unit can use <ĳ> and <Ĳ> safely (unless a font abuses Unicode and dissolves them).

Fonts may provide the same tracking behaviour for <i><j> and <I><J>, treating them as a unit. This may be optional or the default, but either way it should be easy to enable or disable.
The Taalunie is pretty clear on the "lange ij" being i+j. Having a ligature letterform for <i><j> is great for some or in some display styles and in handwritten styles, but it’s not what most people are used to seeing in text styles.

Drawing with broad brushes, some Dutch speakers feel strongly that lange ij is a single letter with encoding issues and some other Dutch speakers feel more that it’s a letter combination with a special casing rule. Generalizing a bit, there is a Netherlands and Belgium divide on the issue.

MrBrezina added a commit that referenced this issue May 16, 2023
@MrBrezina
Member

Thank you @moyogo, got it. Updated the design requirements one more time. Please, let me know if something does not sound right. I have read it too many times.

@moyogo
Contributor Author

moyogo commented May 17, 2023

It looks good to me. Thank you @MrBrezina.
