Accept BCP 47 language tags in the statuses API #23541

Open
yheuhtozr opened this issue Feb 12, 2023 · 4 comments · May be fixed by mastodon/documentation#1193
Labels
suggestion Feature suggestion

yheuhtozr commented Feb 12, 2023

Pitch

Currently, I see three parameters that take an ISO 639 string in the statuses API.

  • POST /api/v1/statuses: language
  • POST /api/v1/statuses/:id/translate: lang
  • PUT /api/v1/statuses/:id: language

I suggest expanding them to accept any well-formed BCP 47 string or, if there are compatibility concerns, adding a separate parameter (such as locale?) that can store one. The ActivityPub content property already accepts BCP 47, so I think this is purely a restriction of the Mastodon API.
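To make the suggestion concrete, here is a minimal Python sketch (not Mastodon code) of the kind of syntactic check such a parameter could apply, plus a fallback to the primary language subtag for consumers that still expect a bare ISO 639 code. The names `is_well_formed` and `iso639_fallback` are hypothetical. The pattern is a simplified reading of the `langtag` production in RFC 5646: it deliberately omits grandfathered tags (e.g. `i-klingon`) and standalone private-use tags, does not reject duplicate variants or extension singletons, and checks syntax only, not whether the subtags are registered in the IANA Language Subtag Registry.

```python
import re
from typing import Optional

# Simplified sketch of the RFC 5646 "langtag" production.
# Assumption: grandfathered tags and standalone private-use tags are not handled.
_LANGTAG = re.compile(
    r"""
    (?P<language>
        [a-z]{2,3}(?:-[a-z]{3}){0,3}      # 2-3 letter ISO 639 code, up to three extlang subtags
      | [a-z]{4,8}                        # 4-letter reserved or 5-8 letter registered subtag
    )
    (?:-(?P<script>[a-z]{4}))?            # ISO 15924 script, e.g. Hant
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?   # ISO 3166-1 alpha-2 or UN M.49 region, e.g. BR, 419
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)   # variant subtags, e.g. 1901
    (?P<extensions>(?:-[a-wyz0-9](?:-[a-z0-9]{2,8})+)*)    # extensions: any singleton except x
    (?P<privateuse>-x(?:-[a-z0-9]{1,8})+)?                  # private use, e.g. x-twain
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` matches the simplified BCP 47 syntax above."""
    return _LANGTAG.fullmatch(tag) is not None


def iso639_fallback(tag: str) -> Optional[str]:
    """Return the lower-cased primary language subtag (e.g. 'pt' for 'pt-BR'),
    or None if the tag is malformed or its primary subtag is not 2-3 letters."""
    if not is_well_formed(tag):
        return None
    primary = tag.split("-", 1)[0].lower()
    return primary if len(primary) in (2, 3) else None
```

Keeping the fallback as a separate step mirrors the compatibility option above: a server could store the full tag (say `pt-BR`) while still exposing `pt` to clients that only understand ISO 639.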

Motivation

  1. ISO 639 alone is not a "practical" approximation of the concrete languages people perceive

    • In short, while a single ISO 639 code may look like a reasonable identifier across most of Europe, it often isn't one in other parts of the world. This is due to history, politics (ISO standards are voted on by countries), or simply a mismatch between technology and culture. Sometimes only a combination of multiple subtags forms the "idiom" for a user-perceived language.
    • Many people have already reported various issues in Add more options to language selection #18538.
    • To add another example: "Spanish" variants on either side of the Atlantic have diverged enough that an ordinary everyday word in one region is profanity in another (example), which would seriously harm the usability of a profanity filter (the Scunthorpe problem).
    • Machine translation providers such as Google Translate and DeepL already support American English (en-US) vs. British English (en-GB) and Brazilian Portuguese (pt-BR) vs. European Portuguese (pt-PT), which the Mastodon API cannot accept (edit: this was already reported in Consider the country when translating with DeepL #22707).
  2. It affects a wider audience than Mastodon

    • Other software, including Friendica, Pleroma, GoToSocial, etc., as well as various client apps, relies on the Mastodon API for compatibility, so this restriction prevents a much wider user base from implementing more flexible language selection on their own.
    • Even if Mastodon has difficulty putting it to use anytime soon due to other design issues (though I'd like to see it happen), the API change is a helpful step that I think is "relatively easy" to make.
@yheuhtozr yheuhtozr added the suggestion Feature suggestion label Feb 12, 2023
yheuhtozr (author) commented

I wonder whether this solution could be adopted, and if so, whether there is anything I can help with.

rschiang commented Jul 8, 2023

I guess we could bring this issue up on IRC? There must be a primary discussion space for Mastodon devs, and we'll need to get some attention before we can even make the case for merging it.

yheuhtozr (author) commented

@rschiang Hi, I'm not very familiar with the actual ecosystem of Mastodon development. I am definitely ready to make some kind of pitch if you can point me to the right place to do it.

yheuhtozr (author) commented

Reposting from my other comment:

The new edition of ISO 639 has just come out with a substantially expanded scope, so I can now back up my opinion below with official evidence. Although the full text is proprietary, allow me to quote its section 6.2.1:

Where spoken intelligibility between language varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central language variety that both speaker communities understand is a strong indicator that they should nevertheless be considered language varieties of the same individual language.

This essentially means the standard now openly acknowledges grouping several practically unintelligible "dialects" under a single ISO 639 code (the situation has always existed, but it is now explicitly ratified). Behind the scenes, this assumes that other mechanisms exist to specify subdivisions, such as IETF language tags or the upcoming ISO 21636 framework, so in such cases we will need combined language tags.
