Update language attribute restrictions in API #1193

Open
wants to merge 1 commit into
base: main

yheuhtozr

As far as I can tell from the current implementation, the software already processes hyphenated language tags and validly emits ISO 639-3 codes as well as ISO 639-1 codes. So even though Mastodon itself is not ready to support most advanced language tags, it has no problem accepting or emitting those codes in the API. We can therefore:

  • remove some outdated descriptions that limit the field to an "ISO 639-1 two-letter code", which no longer applies
  • accept well-formed BCP 47 language tags, with a note that Mastodon will probably ignore the additional information
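
To make "well-formed BCP 47 tag" concrete, here is a minimal sketch of a loose well-formedness check that also extracts the primary language subtag, which is the part a consumer would keep if it ignores everything else. The function name `primary_language` is hypothetical and this is not Mastodon's implementation; the pattern follows the regular `langtag` production of BCP 47 and, as a simplifying assumption, rejects grandfathered and private-use-only tags.

```python
import re
from typing import Optional

# Simplified BCP 47 "langtag" pattern (assumption: grandfathered tags and
# tags consisting only of a private-use part are deliberately not handled).
_LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})    # language (+ extlang)
    (?:-(?P<script>[a-z]{4}))?                               # script, e.g. Hant
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                      # region, e.g. TW or 419
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*                 # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                     # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                               # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)

def primary_language(tag: str) -> Optional[str]:
    """Return the lowercased primary language subtag, or None if not well-formed."""
    match = _LANGTAG.fullmatch(tag.strip())
    if match is None:
        return None
    # The first subtag is the ISO 639-1 or ISO 639-3 style language code.
    return match.group("language").split("-")[0].lower()
```

Under these assumptions, `primary_language("zh-Hant-TW")` yields `zh` and `primary_language("yue")` yields `yue`, while an underscore-separated value such as `en_US` is rejected.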

mastodon/mastodon#19302 corroborates that a full language tag doesn't break the system.

The rationale of this change is discussed in mastodon/mastodon#23541.

Closes mastodon/mastodon#23541.

  • Note: "language subtag" in BCP 47 ≈ ISO 639-1 ∪ ISO 639-3

@vercel

vercel bot commented Mar 29, 2023

@yheuhtozr is attempting to deploy a commit to the Mastodon Team on Vercel.

A member of the Team first needs to authorize it.

@nikclayton
Contributor

nikclayton commented May 31, 2023

+1 to the general idea that BCP47 language tags are the way to go.

-1 to this specific change, as I think it's a backwards incompatible change (Mastodon servers could now emit a string field that is > 2 characters long, where previously they were documented as only emitting a 2 character string).
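
To illustrate the compatibility concern, here is a sketch of a hypothetical client written against the old documentation next to a more tolerant one. Both function names are made up for this example, and treating a missing value as `None` is an assumption rather than something stated in this thread.

```python
import re
from typing import Optional

# Hypothetical check that trusts the old "two-letter code" documentation.
_TWO_LETTER = re.compile(r"[a-z]{2}")

def strict_language(value: Optional[str]) -> Optional[str]:
    """Reject anything that is not exactly two lowercase letters."""
    if value is None:
        return None
    if not _TWO_LETTER.fullmatch(value):
        raise ValueError(f"unexpected language value: {value!r}")
    return value

def tolerant_language(value: Optional[str]) -> Optional[str]:
    """Accept longer tags by keeping only the first subtag."""
    if value is None:
        return None
    return value.split("-", 1)[0].lower()
```

A client shaped like `strict_language` raises on `"zh-Hant"` or `"yue"`, which is the breakage described above; one shaped like `tolerant_language` keeps working but discards the script and region information.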

A backwards compatible change would be to accept / emit a new language_code field, where the value is an object with type and code fields. I.e.,

"language_code": {
    "type": "...",
    "code": "..." 
},

Valid initial values for type would be iso639-1, iso639-2 (3 letter codes), and bcp47.

So:

"language_code": {
    "type": "iso639-1",
    "code": "en"
},

is equivalent to the current:

"language": "en",

If this field exists then the contents of the language field would be ignored.


Edit to note: The above is an example, not something I've spent a serious amount of design thought on.
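
A client-side reading of the example above could be sketched as follows. The `language_code` field does not exist in the API; it is only the shape proposed in this comment, and the helper name `effective_language` and the handling of malformed objects are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Type names taken from the proposal in this comment.
KNOWN_TYPES = {"iso639-1", "iso639-2", "bcp47"}

def effective_language(status: dict) -> Optional[Tuple[str, str]]:
    """Return (type, code) for a status, preferring the proposed language_code."""
    if "language_code" in status:
        # Per the proposal, the legacy "language" field is ignored once
        # "language_code" exists. Assumption: a malformed object yields None.
        obj = status["language_code"]
        if (
            isinstance(obj, dict)
            and obj.get("type") in KNOWN_TYPES
            and isinstance(obj.get("code"), str)
        ):
            return obj["type"], obj["code"]
        return None
    legacy = status.get("language")
    if isinstance(legacy, str):
        # The current field is documented as an ISO 639-1 code.
        return "iso639-1", legacy
    return None
```

With this sketch, `{"language": "en"}` and `{"language_code": {"type": "iso639-1", "code": "en"}}` resolve to the same pair, matching the equivalence stated above.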

@yheuhtozr
Author

> A backwards compatible change would be to accept / emit a new language_code field, where the value is an object with type and code fields.

@nikclayton Hi, that would be totally fine with me, too. Just to be sure (since I'm new to this project): I think it would entail additional logic in the Mastodon code, but do you think that approach would be more feasible?

andypiper self-assigned this on Dec 10, 2023
Successfully merging this pull request may close these issues.

Accept BCP 47 language tags in the statuses API
3 participants