Normative: remove tables of Unicode property values and aliases #2649

Merged
merged 1 commit into from
Sep 1, 2022

michaelficarra
Member

Following the acceptance of L2/22-029, Proposal to guarantee stability of spelling of property names, values, and aliases in UCD, we no longer need to keep these tables.

@michaelficarra added the normative change and needs consensus labels Feb 3, 2022
Member

@mathiasbynens mathiasbynens left a comment

This is great! 🥳

What’s the timeline on getting the accepted proposal reflected “on paper” in the Unicode Standard? We don’t necessarily need to wait for that to happen (given the consensus) but it’d be useful to know.

spec.html Outdated
<emu-note>
<p>For example, `Xpeo` and `Old_Persian` are valid `Script_Extensions` values, but `xpeo` and `Old Persian` aren't.</p>
</emu-note>
<emu-note>
<p>This algorithm differs from <a href="https://unicode.org/reports/tr44/#Matching_Symbolic">the matching rules for symbolic values listed in UAX44</a>: case, <emu-xref href="#sec-white-space">white space</emu-xref>, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the `Is` prefix is not supported.</p>
</emu-note>
<emu-note>
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the files <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> and <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>
Member Author

Is this note even needed now?

Contributor

Maybe. ES still needs a list of properties, since it only supports an explicit subset.

Suggested change
<p>The spellings of entries in these tables (including casing) were chosen to match the first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. However, because the precise spellings in those files are not guaranteed to be stable, implementations are required to follow this table rather than those files.</p>
<p>The spellings of entries in these tables (including casing) were chosen to match the historically first occurrence of each property in the file <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt"><code>PropertyAliases.txt</code></a> in the Unicode Character Database at the time each entry was added to this specification. Note that the precise spellings in those files are guaranteed to be stable. Additional aliases might be added in future versions of Unicode.</p>
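
The strict matching described in the notes of this hunk can be checked directly against an engine. The following is a minimal sketch, assuming a JavaScript engine with Unicode property escape support; the helper name `isValidPropertyExpression` is hypothetical and not part of the specification.

```javascript
// Hypothetical helper: returns true when `body` is accepted inside \p{...}
// by this engine's RegExp parser in Unicode mode, and false when the
// pattern is rejected with a SyntaxError.
function isValidPropertyExpression(body) {
  try {
    new RegExp(`\\p{${body}}`, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const checks = [
  "Script_Extensions=Xpeo",        // short alias, exact spelling
  "Script_Extensions=Old_Persian", // long alias, exact spelling
  "Script_Extensions=xpeo",        // case differs
  "Script_Extensions=Old Persian", // space instead of U+005F (LOW LINE)
  "IsLu",                          // UAX44-style `Is` prefix
];
for (const body of checks) {
  console.log(body, isValidPropertyExpression(body));
}
```

Only the first two spellings are expected to be accepted; the loose matching rules of UAX44 do not apply.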

@michaelficarra
Member Author

Not sure if we need all the "Unicode property" clarification or if we could just use "property" in most places. I guess same goes for existing usage of "Unicode code point". The qualifier mostly seems unnecessary.

@michaelficarra
Member Author

> What’s the timeline on getting the accepted proposal reflected “on paper” in the Unicode Standard?

Not sure. @markusicu could probably answer. I was just going to let this PR sit until it happened, unless the committee asks for it to be merged sooner when I ask for consensus.

@markusicu
Contributor

> > What’s the timeline on getting the accepted proposal reflected “on paper” in the Unicode Standard?
>
> Not sure. @markusicu could probably answer. I was just going to let this PR sit until it happened, unless the committee asks for it to be merged sooner when I ask for consensus.

The new policy has been approved by the Unicode Technical Committee and by the Unicode executive officers.
I don't know when it will be published on the website.

spec.html Outdated
@@ -34530,13 +34530,13 @@ <h1>Static Semantics: Early Errors</h1>
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyName| is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
</li>
<li>
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in the &ldquo;Property value and aliases&rdquo; column of the corresponding tables <emu-xref href="#table-unicode-general-category-values"></emu-xref> or <emu-xref href="#table-unicode-script-values"></emu-xref>.
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value or property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>.
Contributor

A UnicodePropertyValue should only match a Unicode property value alias, right? Not also a Unicode property alias?
Thus

Suggested change
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value or property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>.
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyValue| is not identical to a property value alias for the Unicode property or property alias given by SourceText of |UnicodePropertyName| listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>.

?

Member Author

I'm confused, where does it say Unicode property alias?

Contributor

Hm, sorry, I think I misread "a property value or property value alias" for "a property alias or property value alias".
You are right, it just says "value or value alias". So not wrong, just redundant: In Unicode parlance, all of these are "aliases". The "value" is the logical thing, and the "aliases" are the symbolic strings for the thing. Thus I think my suggestion is useful despite my brain fart :-}

Example: https://www.unicode.org/reports/tr44/#Property_Value_Aliases

In PropertyValueAliases.txt, the first field contains the abbreviated alias for a Unicode property, the second field specifies an abbreviated symbolic name for a value of that property, and the third field specifies the long symbolic name for that value of that property. These are the preferred aliases. Additional aliases for some property values may be specified in the fourth or subsequent fields.
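
The field layout quoted above can be sketched as a small parser. This is an illustration only: the function name `parseValueAliasLine` is hypothetical, the sample lines are written in the style of the file rather than read from it, and properties that use a different field layout (such as `ccc`, which carries a numeric field) are not handled.

```javascript
// Hypothetical helper: split one data line of PropertyValueAliases.txt
// into its fields. Returns null for blank and comment-only lines.
function parseValueAliasLine(line) {
  const data = line.split("#")[0].trim(); // drop any trailing comment
  if (data === "") return null;
  const [property, shortAlias, longAlias, ...extraAliases] = data
    .split(";")
    .map((field) => field.trim());
  return { property, shortAlias, longAlias, extraAliases };
}

// Illustrative sample lines (assumed, not fetched from the UCD).
const sampleValueAliasLines = [
  "# General_Category (gc)",
  "gc ; Lu ; Uppercase_Letter",
  "gc ; Nd ; Decimal_Number ; digit",
  "sc ; Xpeo ; Old_Persian",
];

// Collect every accepted spelling per property alias.
const valueAliasesByProperty = new Map();
for (const line of sampleValueAliasLines) {
  const entry = parseValueAliasLine(line);
  if (entry === null) continue;
  const spellings = valueAliasesByProperty.get(entry.property) ?? new Set();
  for (const alias of [entry.shortAlias, entry.longAlias, ...entry.extraAliases]) {
    spellings.add(alias);
  }
  valueAliasesByProperty.set(entry.property, spellings);
}
console.log(valueAliasesByProperty);
```

Every field after the first is an alias string for the same logical value, which is the point made in the comment above: the spellings are all "aliases", and none of them is the value itself.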

spec.html Outdated
</li>
</ul>
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar>
<ul>
<li>
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the &ldquo;Property value and aliases&rdquo; column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, nor a binary property or binary property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property or binary property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
Contributor

ditto?

Suggested change
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property or binary property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
It is a Syntax Error if the List of Unicode code points that is SourceText of |LoneUnicodePropertyNameOrValue| is not identical to a Unicode property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, nor a binary property name or binary property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.

spec.html Outdated
@@ -35656,7 +35656,7 @@ <h1>Runtime Semantics: CompileToCharSet</h1>
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar>
<emu-alg>
1. Let _s_ be SourceText of |LoneUnicodePropertyNameOrValue|.
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the &ldquo;Property value and aliases&rdquo; column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, then
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then
Contributor

Suggested change
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value or property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then
1. If ! UnicodeMatchPropertyValue(`General_Category`, _s_) is identical to a Unicode property value alias for the General_Category (gc) property listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>, then
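
As a rough illustration of the |LoneUnicodePropertyNameOrValue| rule touched by these hunks: a lone name inside `\p{...}` is accepted when it is a General_Category value alias or a binary property name, and rejected otherwise. The sketch below assumes an engine with Unicode property escape support; `compilesInUnicodeMode` is a hypothetical helper.

```javascript
// Hypothetical helper: true when `source` parses as a Unicode-mode pattern.
function compilesInUnicodeMode(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const loneForms = {
  gcShortAlias: compilesInUnicodeMode("\\p{Lu}"),
  gcLongAlias: compilesInUnicodeMode("\\p{Uppercase_Letter}"),
  binaryProperty: compilesInUnicodeMode("\\p{Alphabetic}"),
  // Script values need an explicit property name, e.g. Script=Latin.
  loneScriptValue: compilesInUnicodeMode("\\p{Latin}"),
};
console.log(loneForms);

// The lone form and the explicit General_Category forms agree on these inputs.
const sameMatches = ["A", "a", "1"].every(
  (ch) =>
    /\p{Lu}/u.test(ch) === /\p{General_Category=Lu}/u.test(ch) &&
    /\p{Lu}/u.test(ch) === /\p{gc=Uppercase_Letter}/u.test(ch)
);
console.log(sameMatches);
```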

spec.html Outdated
1. Let _value_ be the canonical property value of _v_ as given in the &ldquo;Canonical property value&rdquo; column of the corresponding row.
1. Return the List of Unicode code points _value_.
</emu-alg>
<p>Implementations must support the Unicode property value names and aliases listed in <emu-xref href="#table-unicode-general-category-values"></emu-xref> and <emu-xref href="#table-unicode-script-values"></emu-xref>. To ensure interoperability, implementations must not support any other property value names or aliases.</p>
<p>Implementations must support the Unicode property values and property value aliases listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a> for the properties listed in <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>. To ensure interoperability, implementations must not support any other property values or property value aliases.</p>
Contributor

You are pointing to the "latest" Unicode Character Database, which makes this spec "evergreen". However, since this is a "must support" requirement, should you provide a little leniency for recent versions, or for the version current as of the release of the ES implementation?

Member Author

We already normatively refer to the latest version of the Unicode standard (for things like whitespace, ID_Start, etc). From the spec,

Additionally, ECMAScript 2017 mandated always using the latest version of the Unicode standard.

Member

For context, this was the relevant PR: #620
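
Because the set of accepted values follows the Unicode data an engine ships, code that relies on a recently added value may want to probe for it at run time. A minimal sketch, assuming nothing beyond standard `RegExp` behavior; the helper name `supportsPropertyValue` is hypothetical, and `Kawi` is used only as an example of a script added in Unicode 15.0.

```javascript
// Hypothetical helper: true when this engine accepts \p{name=value}.
function supportsPropertyValue(name, value) {
  try {
    new RegExp(`\\p{${name}=${value}}`, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Present in every Unicode version an engine could plausibly ship.
console.log("Script=Latin:", supportsPropertyValue("Script", "Latin"));
// Result depends on the Unicode version of the engine's data.
console.log("Script=Kawi:", supportsPropertyValue("Script", "Kawi"));
```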

spec.html Outdated
@@ -34530,13 +34530,13 @@ <h1>Static Semantics: Early Errors</h1>
It is a Syntax Error if the List of Unicode code points that is SourceText of |UnicodePropertyName| is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the &ldquo;Property name and aliases&rdquo; column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
Contributor

Note: You could simplify the tables of properties: List only one name for each property, no further aliases, and just point to PropertyAliases.txt for aliases.
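
One way to picture this suggestion: the specification would keep a list of canonical property names, and the aliases would be derived from `PropertyAliases.txt`. The sketch below is an assumption-laden illustration, not the specification's mechanism; the supported subset shown is deliberately tiny, the sample lines are written in the style of the file, and `resolvePropertyName` is a hypothetical name.

```javascript
// Illustrative subset of canonical names the specification would still list.
const supportedProperties = new Set([
  "General_Category",
  "Script",
  "Script_Extensions",
]);

// Sample lines in the style of PropertyAliases.txt: short alias ; long name.
const propertyAliasLines = [
  "gc ; General_Category",
  "sc ; Script",
  "scx ; Script_Extensions",
  "ccc ; Canonical_Combining_Class", // in the file, but outside the subset above
];

// Map every spelling to the long name on the same line.
const canonicalByAlias = new Map();
for (const line of propertyAliasLines) {
  const [shortAlias, longName, ...others] = line
    .split(";")
    .map((field) => field.trim());
  for (const spelling of [shortAlias, longName, ...others]) {
    canonicalByAlias.set(spelling, longName);
  }
}

// Hypothetical resolver: exact-match lookup restricted to the supported subset.
function resolvePropertyName(spelling) {
  const canonical = canonicalByAlias.get(spelling);
  if (canonical === undefined || !supportedProperties.has(canonical)) return null;
  return canonical;
}

console.log(resolvePropertyName("gc"), resolvePropertyName("ccc"));
```

The explicit subset is what keeps unsupported properties rejected even though their aliases appear in the file.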

@michaelficarra added the editor call label Feb 5, 2022
michaelficarra added a commit to tc39/agendas that referenced this pull request Feb 7, 2022
@michaelficarra removed the editor call label Feb 16, 2022
@markusicu
Contributor

> > > What’s the timeline on getting the accepted proposal reflected “on paper” in the Unicode Standard?
> >
> > Not sure. @markusicu could probably answer. I was just going to let this PR sit until it happened, unless the committee asks for it to be merged sooner when I ask for consensus.
>
> The new policy has been approved by the Unicode Technical Committee and by the Unicode executive officers. I don't know when it will be published on the website.

The updated policy has been published today:
https://www.unicode.org/policies/stability_policy.html#Alias_Stability
...
Property aliases, once defined in PropertyAliases.txt, will never be removed, nor will their precise spelling be changed.

Property value aliases, once defined in PropertyValueAliases.txt, will never be removed, nor will their precise spelling be changed.
...

@michaelficarra added the has consensus label and removed the needs consensus label Mar 28, 2022
@michaelficarra added and removed the editor call label Aug 3, 2022
@michaelficarra force-pushed the stable-Unicode-property-spellings branch from 3a55d88 to 5f86037 August 4, 2022 01:59
@michaelficarra
Member Author

@bakkot rebased and addressed comment

@michaelficarra added the ready to merge label Aug 31, 2022
@ljharb force-pushed the stable-Unicode-property-spellings branch from e7496c3 to 3c29c06 September 1, 2022 17:04
@ljharb merged commit 3c29c06 into main Sep 1, 2022
Labels
has consensus, normative change, ready to merge
5 participants