Data-Driven API #210

sffc · 2018-01-19T09:17:52Z

It can be said that the challenge of providing i18n services can be split into two concepts:

- Data: One needs access to a database of locale data.
- Logic: Once the data is provided, there needs to be a way to process it.

In the i18n world, as well as in software in general, people like to be able to design their own logic. There are already dozens of wrappers over Ecma 402. It is not hard to find examples of clients who reverse-engineer i18n libraries to "extract" the data out of them; I can provide some examples.

Right now, the Ecma 402 APIs are all "logic" APIs. I suggest that we consider breaking the APIs into the two concepts: data and logic. The existing APIs need not change; I suggest simply adding a new data API, and redefining the spec for the logic functions in terms of the data. The data format can be defined by the Unicode specification UTS 35, which is maintained by a separate standards body, the Unicode Consortium.

The advantages of doing this include:

- Clients can write their own i18n logic on top of Ecma 402's data, without needing to reverse-engineer the built-in logic APIs.
- We can make it easy for clients to swap in their own data source to replace the Ecma 402 data.
- The specs can be clearer, since the logic API can be a relatively straightforward definition on top of the data API, and the data API can refer to parts of UTS 35.

The API could be as simple as Intl.Data.getNumberPattern(locale) or Intl.Data.getDateTimePattern(locale, skeleton). The methods could return a promise or take a callback, allowing users to swap in an asynchronous drop-in replacement.
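To make the data/logic split concrete, here is a minimal user-land sketch. The data source shape is an assumption: getNumberSymbols(locale) is a hypothetical method (the proposal above only names getNumberPattern and getDateTimePattern), and the symbol values are illustrative. Because the logic function only awaits whatever source it is given, an in-memory table and an asynchronous, network-backed source would be interchangeable.

```javascript
// Hypothetical data source: any object exposing an async getNumberSymbols(locale).
// The method name and the { decimal, group } fields are illustrative assumptions.
const inMemorySymbols = {
  async getNumberSymbols(locale) {
    const table = {
      en: { decimal: ".", group: "," },
      de: { decimal: ",", group: "." },
    };
    return table[locale] ?? table.en; // naive fallback to English
  },
};

// "Logic" layer: fixed fraction digits, integer digits grouped in threes.
// It never touches the data directly; it only awaits the data source it is given.
async function formatFixed(value, locale, dataSource, fractionDigits = 2) {
  const { decimal, group } = await dataSource.getNumberSymbols(locale);
  const [intPart, fracPart] = Math.abs(value).toFixed(fractionDigits).split(".");
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, group);
  const sign = value < 0 ? "-" : "";
  return fracPart === undefined ? sign + grouped : sign + grouped + decimal + fracPart;
}
```

For example, formatFixed(1234567.891, "de", inMemorySymbols) resolves to "1.234.567,89"; swapping in a different object with the same method changes the data without touching the logic.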


rxaviers · 2018-01-19T15:52:12Z

The theory sounds good, but the practical benefits aren't clear to me. Are you suggesting that we expose all CLDR data through this API, or a subset? If a subset, which one? Could you please cite examples or use cases where this is useful?

rxaviers · 2018-01-19T15:56:21Z

Clarification: I can see value in exposing some data, such as display names. My confusion is basically the scope.

caridy · 2018-01-19T15:56:51Z

The real problem here is backward compatibility. I don't think backward compatibility (forever) is in the charter of UTS 35 or any other i18n data provider, while it is in the DNA of JavaScript and the Web. Instead, we are aiming for a set of low-level APIs that help you build abstractions on top of the data you mentioned, but without exposing that data directly. Yes, this is more complicated and less flexible, but it has two very nice effects:

- it is always backward compatible
- it promotes good patterns on the web

sffc · 2018-01-19T18:26:07Z

CLDR has a lot of data, and it often has messy fallback rules. I was thinking that our API would be "CLDR++", where we only expose a subset of data useful for JavaScript users and take care of locale fallbacks and other intricacies of CLDR data loading under the hood. And of course if you wanted to use a data source that isn't CLDR, you're welcome to do so as long as you expose the same API.
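As a rough illustration of the kind of fallback logic that could live "under the hood", the sketch below computes a lookup chain: it consults a small parent-locale override table first, otherwise drops the last subtag, and always ends at root. The override table is a hand-picked excerpt modeled on CLDR's parentLocales data (en-GB inherits from en-001, and zh-Hant inherits directly from root); the real table is much larger, and full CLDR resolution also involves aliases and likely subtags.

```javascript
// Illustrative excerpt of CLDR-style parent-locale overrides; the real
// parentLocales table in CLDR has many more entries.
const PARENT_OVERRIDES = new Map([
  ["en-GB", "en-001"],
  ["zh-Hant", "root"], // avoid falling back from Traditional to Simplified Chinese
]);

// Returns the ordered list of locales to try when looking up data for `tag`:
// an explicit parent override if one exists, otherwise drop the last subtag;
// the chain always ends with "root".
function fallbackChain(tag) {
  const chain = [];
  let current = tag;
  while (current !== "root") {
    chain.push(current);
    const override = PARENT_OVERRIDES.get(current);
    if (override !== undefined) {
      current = override;
    } else {
      const cut = current.lastIndexOf("-");
      current = cut === -1 ? "root" : current.slice(0, cut);
    }
  }
  chain.push("root");
  return chain;
}
```

With this table, fallbackChain("en-GB") yields ["en-GB", "en-001", "en", "root"], while fallbackChain("zh-Hant-TW") skips "zh" and yields ["zh-Hant-TW", "zh-Hant", "root"].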

For stability, if UTS 35 doesn't suffice, I don't see anything necessarily wrong with re-specifying the format of the subset of UTS 35 data that we provide through Ecma 402.

msaboff · 2018-11-29T22:56:21Z

sffc · 2019-03-29T01:05:52Z

@indexzero

indexzero · 2019-03-31T03:01:24Z

Thanks for including me, @sffc. I would love to get involved in this issue.

I will admit that I am coming at this from a pragmatic point of view:

- We use react-intl extensively. It is king in its small, framework-bound domain (see: npmtrends).
- react-intl expects localeData, as do some of its key dependencies:

The intl-{message,relative}format libraries are ponyfills that state their intention to remain up-to-date with ECMA-402, along with some additional features. Whether those additional features are good or bad, they illustrate the value of exposing the data in a more granular fashion: there will inevitably be features built on top of the Intl APIs that need access to data not currently available.

Supporting that goal would make i18n easier for applications and developers. I have seen an enormous amount of time spent bikeshedding over the best way to deliver CLDR data into browsers to initialize react-intl. It would be interesting to hear from other ecosystem projects that may have similar concerns.

In what ways these ecosystem libraries will need data access remains an open question for me. react-intl and its dependencies access the data only sparsely, for certain edge cases, yet the library forces consumers to provide all of the CLDR data.

Perhaps reaching out to some of the folks who maintain these libraries is a good next step? Forgive me if you have already done so or are already talking with them.

sffc · 2019-06-03T21:56:34Z

Some more ideas I had.

There are cases where the user wants to provide their own data but use the browser's built-in logic, and vice-versa. If we can define a stable data language, similar to what's provided by LDML, then we can decouple that in JavaScript.

Here's an example of how a programmer could use their own data with the browser's algorithm. They give their data provider to a factory that asynchronously constructs an Intl.NumberFormat using that data provider instead of the browser's default data provider:

```javascript
// `dataProvider` is a user-land object implementing a data provider interface
const factory = new Intl.Data.Factory(dataProvider);
const fmt = await factory.createNumberFormat("ml", { style: "percent" });
```

The data provider interface could be as simple as a single method, async get(localeList, xpath), which returns the data at the specified xpath together with the best-matching locale. We would define the space of valid xpaths, which could be similar to LDML. The browser could expose this API:

```javascript
const { locale, data } = await Intl.Data.defaultProvider.get(
    ["ff", "ar"], "/numbers/decimalFormats@numberSystem=latn/pattern");
```
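A user-land object satisfying this interface can be quite small. The sketch below is a minimal in-memory provider under stated assumptions: it accepts a string or an array of locales, returns the first requested locale that has data for the xpath, and otherwise falls back to a "root" entry. Real "best match" logic would be richer; the class name, the table layout, and any sample values are illustrative only.

```javascript
// Hypothetical in-memory provider implementing
// async get(localeList, xpath) -> { locale, data }.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

class InMemoryDataProvider {
  // `tables` maps locale -> { [xpath]: data } and must include a "root" entry.
  constructor(tables) {
    this.tables = tables;
  }

  async get(localeList, xpath) {
    const requested = typeof localeList === "string" ? [localeList] : localeList;
    // Simplistic "best match": the first requested locale that has this xpath.
    for (const locale of requested) {
      if (hasOwn(this.tables, locale) && hasOwn(this.tables[locale], xpath)) {
        return { locale, data: this.tables[locale][xpath] };
      }
    }
    // Otherwise fall back to root data (undefined if root lacks the xpath too).
    return { locale: "root", data: this.tables.root[xpath] };
  }
}
```

Given tables for "root" and "ar" only, a call like provider.get(["ff", "ar"], xpath) resolves to the "ar" entry, mirroring the browser call shown above.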

If users want to provide their own data only when the browser doesn't have data for the requested locale, they could write something along the lines of:

```javascript
class MyDataProvider {
  async get(localeList, xpath) {
    const browserResult = await Intl.Data.defaultProvider.get(localeList, xpath);
    const requested = (typeof localeList === "string") ? localeList : localeList[0];
    if (browserResult.locale !== requested) {
      // call custom data service and return that result
    } else {
      return browserResult;
    }
  }
}
```
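Since Intl.Data.defaultProvider does not exist, the same composition can be tried in user land by passing both providers in explicitly. In this hypothetical sketch, FallbackDataProvider and singleLocaleProvider are illustrative names: the primary provider stands in for the browser's default and the secondary for the "custom data service". One design choice differs from the snippet above: if neither source has the requested locale, it keeps the primary source's fallback result.

```javascript
// Hypothetical composition of two providers sharing the interface
// async get(localeList, xpath) -> { locale, data }.
class FallbackDataProvider {
  constructor(primary, secondary) {
    this.primary = primary;     // stand-in for the browser's default provider
    this.secondary = secondary; // stand-in for a custom data service
  }

  async get(localeList, xpath) {
    const requested = typeof localeList === "string" ? localeList : localeList[0];
    const primaryResult = await this.primary.get(localeList, xpath);
    if (primaryResult.locale === requested) {
      return primaryResult;
    }
    // The primary source fell back to another locale; try the secondary source.
    const secondaryResult = await this.secondary.get(localeList, xpath);
    return secondaryResult.locale === requested ? secondaryResult : primaryResult;
  }
}

// Stub providers for illustration only: each one "knows" a single locale.
function singleLocaleProvider(knownLocale, label) {
  return {
    async get(localeList, xpath) {
      const locales = typeof localeList === "string" ? [localeList] : localeList;
      return locales.includes(knownLocale)
        ? { locale: knownLocale, data: `${label}:${xpath}` }
        : { locale: "root", data: `${label}-root:${xpath}` };
    },
  };
}

const composed = new FallbackDataProvider(
  singleLocaleProvider("en", "browser"),
  singleLocaleProvider("ff", "custom"),
);
```

Here composed.get("ff", xpath) is answered by the custom stub, while composed.get("en", xpath) never reaches it.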

longlho · 2019-09-28T02:35:54Z

Thanks, @sffc, for redirecting me here. Since @indexzero mentioned react-intl, which I happen to maintain (and which Dropbox also happens to use), I'd like to provide some context here:

- formatjs polyfills are still used even in browsers that natively support the features, just to load CLDR data, since browsers don't ship data for every locale; the same is true for currencies.
- As @indexzero mentioned, we spend a significant amount of effort merging data for the same language, deduplicating it against the parent locale hierarchy, and packing it. The polyfills we wrote then know how to unpack the data.
- Packing/unpacking CLDR data is crucial to the distribution pipeline and is common practice, similar to how Moment.js packs and unpacks IANA time zone data.
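The "deduplicate against the parent locale" step can be sketched as a shallow diff: a child locale's bundle stores only the entries that differ from its parent, and the unpacking side merges that diff back over the parent. This is a simplified illustration over flat key/value objects, not formatjs's actual packing format; the keys and values below are made up for the example.

```javascript
// Keep only the entries of `child` whose values differ from `parent`.
function dedupeAgainstParent(child, parent) {
  const diff = {};
  for (const [key, value] of Object.entries(child)) {
    if (parent[key] !== value) diff[key] = value;
  }
  return diff;
}

// Rebuild the child's data: start from the parent, then apply the stored diff.
// Keys absent from the diff are inherited from the parent.
function unpack(parent, diff) {
  return { ...parent, ...diff };
}

// Illustrative flat data (not real CLDR structure).
const en = { decimal: ".", group: ",", dateOrder: "MDY" };
const enGB = { decimal: ".", group: ",", dateOrder: "DMY" };
const packedEnGB = dedupeAgainstParent(enGB, en); // only { dateOrder: "DMY" } is shipped
```

The saving grows with the number of sibling locales that share a parent, which is why the parent hierarchy matters to the pipeline.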

At a high level, I think the following could help the workflow above:

- Expose locale negotiation, so we don't have to bundle things like legacy aliases and the parent locale chain (zh-CN -> zh-Hans-CN -> zh-Hans -> zh). This would at least let us locate the correct language.
- The ability to load CLDR data per language (not per locale).
- Nice to have: a packed data format.
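Part of that negotiation is expressible today with Intl.Locale, which shipped in ECMA-402 after this comment was written: maximize() adds likely subtags, so zh-CN becomes zh-Hans-CN. The sketch below uses it to pick a language-level bundle. The helper name and the bundle list are hypothetical, and a complete solution would still need CLDR's parentLocales exceptions.

```javascript
// Pick the most specific available bundle for a requested locale.
// `available` is a hypothetical list of per-language (or per-script) data bundles.
function pickBundle(requested, available) {
  // maximize() fills in likely subtags, e.g. "zh-TW" -> "zh-Hant-TW".
  const { language, script, region } = new Intl.Locale(requested).maximize();
  const candidates = [
    [language, script, region].filter(Boolean).join("-"),
    [language, script].filter(Boolean).join("-"),
    language,
  ];
  return candidates.find((tag) => available.includes(tag)) ?? null;
}
```

With available bundles ["zh-Hans", "zh-Hant", "en"], a request for "zh-TW" resolves to "zh-Hant", whereas naive subtag truncation would have landed on "zh".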

sffc · 2019-09-29T08:03:55Z

See #87 for some discussion on your first bullet ("locale negotiation").

sffc · 2019-09-29T08:16:54Z

My feelings on this issue are going back and forth.

On the one hand, it is nice to give app developers the power to add more data when the browser provides insufficient feature or locale coverage. On the other hand, the design of Intl is for it to be "best-effort" and easy to use (hard to abuse), and this thread has raised several good points suggesting that injecting data into Intl at runtime adds a significant amount of complexity.

I know that Chrome is working long-term on dynamically adding data for new locales. I think Firefox has a similar effort. By keeping the data exchange in the browser engine, Intl's handling of CLDR data remains transparent to the user, which seems like a desirable property.

ljharb · 2019-09-29T14:45:49Z

Without the ability to inject data, polyfilling new data requires replacing almost every single Intl method; with that ability, all the methods may already be correct and just need new backing data.

sffc · 2019-10-25T01:31:09Z

Is it possible for a function to detect whether it is being called in a sync or async context? For example, could await Intl.DateTimeFormat() behave differently from Intl.DateTimeFormat()? @ljharb

I'm just trying to think of unobtrusive ways to add data loading to the API. It would be nice if you could do the following, but it's not clear whether that is possible without breaking the web.

```javascript
let dtf = await Intl.DateTimeFormat();
console.log(dtf.format(x));
```

One option @ljharb suggested was something like the following. It doesn't require changing the constructor, but it would give the otherwise immutable Intl.DateTimeFormat object two "states", one where data is present and one where it is not.

```javascript
let dtf = new Intl.DateTimeFormat();
await dtf.load();
console.log(dtf.format(x));
```
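The two-state idea can be prototyped in user land to see how it feels. In this hypothetical sketch, LoadableDateTimeFormat wraps an ordinary Intl.DateTimeFormat, load() awaits a caller-supplied loader function (an assumption standing in for real data loading), and format() throws until loading has finished, which makes the extra "not yet loaded" state explicit.

```javascript
// Hypothetical wrapper illustrating a formatter with "unloaded" and "loaded" states.
class LoadableDateTimeFormat {
  constructor(locales, options, loader) {
    this.locales = locales;
    this.options = options;
    this.loader = loader; // async (locales) => void; stands in for real data loading
    this.inner = null;    // the real formatter, created once data is "loaded"
  }

  async load() {
    await this.loader(this.locales);
    this.inner = new Intl.DateTimeFormat(this.locales, this.options);
    return this;
  }

  format(date) {
    if (this.inner === null) {
      throw new TypeError("format() called before load() completed");
    }
    return this.inner.format(date);
  }
}
```

Every consumer of such an object has to know which state it is in, which is the cost of the otherwise unobtrusive constructor.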

We could add a new namespace for the async-enabled constructors, like Intl.Async. The new namespace would have all of the same constructors as the Intl namespace, except that they return promises that resolve to "normal" objects.

```javascript
let dtf = await Intl.Async.DateTimeFormat();
console.log(dtf.format(x));
```
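A user-land approximation of the proposed Intl.Async namespace might look like the following. IntlAsync and loadLocaleData are hypothetical names, and the loader is a no-op stand-in for whatever would actually fetch data. The property worth noting is that each promise resolves to an ordinary Intl object, so code that receives the resolved value needs no changes.

```javascript
// Hypothetical stand-in for fetching locale data; here it does nothing.
async function loadLocaleData(locales) {
  // e.g. fetch and register data for `locales` before constructing formatters
}

// Hypothetical user-land version of an "Intl.Async" namespace: same constructor
// names, but each returns a promise resolving to a normal Intl object.
const IntlAsync = {
  async DateTimeFormat(locales, options) {
    await loadLocaleData(locales);
    return new Intl.DateTimeFormat(locales, options);
  },
  async NumberFormat(locales, options) {
    await loadLocaleData(locales);
    return new Intl.NumberFormat(locales, options);
  },
};
```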

Or, we could put data loading into the terminal format method. The downside here is that you put async operations into a function that was never async before, so it might be harder to use as a drop-in replacement. For example, if you have to pass your object as an argument to some other function, that function needs to know whether to use the async version of the terminal method.

```javascript
let dtf = new Intl.DateTimeFormat();
console.log(await dtf.asyncFormat(x));

// problem if you have to pass dtf to a function like this
function doStuffWithDateTimeFormat(dtf) {
  // should this function use .format() or .asyncFormat() ?
}
```
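One way to see the drop-in cost is to write a helper that accepts either kind of object. In this sketch, asyncFormat is the hypothetical method from the example above and formatWithEither is an illustrative name. The helper works, but only by becoming async itself, so every caller up the chain has to change as well.

```javascript
// Accepts a plain Intl.DateTimeFormat or an object offering a hypothetical
// asyncFormat(); the price is that this helper must itself be async.
async function formatWithEither(formatter, date) {
  if (typeof formatter.asyncFormat === "function") {
    return formatter.asyncFormat(date);
  }
  return formatter.format(date);
}
```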

ljharb · 2019-10-25T02:26:15Z

You can't usefully detect that, no, and if you could it would break use cases where people don't await immediately but still do something with the promise.

If a constructor returns a promise, then instanceof will fail until it's awaited, which would be confusing.
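The instanceof point is easy to reproduce with any promise-returning factory. Here createDateTimeFormat is a hypothetical async factory standing in for a promise-returning constructor:

```javascript
// Hypothetical async factory, standing in for a promise-returning constructor.
async function createDateTimeFormat(locales, options) {
  return new Intl.DateTimeFormat(locales, options);
}

async function demo() {
  const pending = createDateTimeFormat("en");
  const beforeAwait = pending instanceof Intl.DateTimeFormat; // false: it is a Promise
  const afterAwait = (await pending) instanceof Intl.DateTimeFormat; // true
  return { beforeAwait, afterAwait };
}
```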

jackhorton mentioned this issue Sep 6, 2018

Provide an access to CLDR database shipped in browser #266

Closed

littledan mentioned this issue Nov 8, 2018

Intl.ListFormat needs control of last comment in English. tc39/proposal-intl-list-format#31

Closed

sffc mentioned this issue Dec 4, 2018

honor hourCycle in conjunction with dateStyle/timeStyle? tc39/proposal-intl-datetime-style#11

Closed

sffc mentioned this issue Mar 19, 2019

Clarify "combining options" support in the spec tc39/proposal-unified-intl-numberformat#26

Closed

sffc added the c: meta Component: intl-wide issues label Mar 19, 2019

This was referenced Mar 19, 2019

DateTimeFormat: add support options.raw (aka pattern) #190

Open

[Investigation] Different ways to do System I/O via Intl.* #153

Closed

sffc added the s: comment Status: more info is needed to move forward label Mar 19, 2019

sffc mentioned this issue Apr 5, 2019

Intl support for Temporal proposal tc39/proposal-temporal#129

Closed

sffc added this to Backlog in Deprecated: ECMA 402 Project Board Apr 18, 2019

sffc mentioned this issue Jul 23, 2019

Address #36 and #37 tc39/proposal-intl-displaynames#42

Merged

sffc mentioned this issue Sep 27, 2019

Recommendations for adding CLDR for new locale/unit/currency #379

Closed

sffc mentioned this issue Oct 20, 2019

Potential problem with data loading tc39/proposal-intl-DateTimeFormat-formatRange#17

Closed

sffc mentioned this issue Nov 20, 2019

Data sources unicode-org/rust-discuss#5

Closed

sffc mentioned this issue Apr 16, 2020

Ergonomic API & Data Providers unicode-org/icu4x#30

Open

sffc mentioned this issue May 6, 2020

Async APIs for Intl loading #434

Open

sffc added the Data Related to locale data label Jun 5, 2020

sffc mentioned this issue Sep 26, 2020

Add ordinal and spellout styles in NumberFormatter tc39/proposal-intl-numberformat-v3#22

Closed

sffc mentioned this issue Nov 6, 2020

startRange vs shared vs endRange tc39/proposal-intl-DateTimeFormat-formatRange#30

Closed

longlho mentioned this issue Dec 3, 2020

Add the API as a Intl.prototype function instead. tc39/proposal-intl-localematcher#2

Open

sffc mentioned this issue Feb 15, 2021

Custom Dictionaries tc39/proposal-intl-segmenter#133

Open

rxaviers mentioned this issue Mar 11, 2021

Ability to specify a custom format for Intl.DateTimeFormat #554

Closed

sffc mentioned this issue Sep 7, 2021

Supporting a growing number of "unit" values in formatOptions #607

Open

sffc mentioned this issue Nov 2, 2021

Set CPT ValueWidth::DATA_GET_ERROR_VALUE to 0 unicode-org/icu4x#1183

Closed

ptomato mentioned this issue Aug 5, 2022

Format a Temporal object into a user-supplied string format js-temporal/proposal-temporal-v2#5

Open
