Preferences don't cascade on translation #30

Closed
parkr opened this issue Aug 18, 2015 · 9 comments · Fixed by #92

@parkr
Collaborator

parkr commented Aug 18, 2015

Hey, @nicksnyder! Ran into an issue today you might find interesting. Here's my situation.

I have three files, all for English: en.all.json, en-US.all.json, and en-GB.all.json. The first holds all English translations that are common between all variations of English we support. The second and third files hold translations specific to their respective locales.

A user requests something from us, and sends us the locale en-US. We make a TranslationFunc with preferences en-US, en. Now here's where we run into a problem: if I ask for a translation that is not in en-US.all.json, it returns the translationID instead of looking inside en.all.json for a relevant translation.

This is due to the way bundle.TfuncAndLanguage handles language preferences. When I ask for en-US and the language has any translations, it limits the search to this map, ignoring the remaining language preferences.
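The language-level behavior described above can be sketched in Ruby as follows. This is an illustration of the description only, not go-i18n's actual code; `language_level_tfunc` and `SAMPLE_TRANSLATIONS` are hypothetical names, and the sample strings are made up.

```ruby
# Hypothetical data: locale => { translation ID => translated string }.
SAMPLE_TRANSLATIONS = {
  "en-US" => { "color" => "Color" },
  "en"    => { "greeting" => "Hello" },
}.freeze

# Language-level fallback: choose the first preferred locale that has any
# translations at all, then look up IDs only in that one locale.
def language_level_tfunc(translations, preferences)
  chosen = preferences.find { |pref| translations.key?(pref) && !translations[pref].empty? }
  table = chosen ? translations[chosen] : {}
  # Return the translation if present, otherwise the translation ID.
  ->(translation_id) { table.fetch(translation_id, translation_id) }
end

t = language_level_tfunc(SAMPLE_TRANSLATIONS, %w[en-US en])
t.call("color")    # => "Color"
t.call("greeting") # => "greeting" (en is never consulted)
```

Because the locale is fixed once, up front, the remaining preferences never get a chance to supply a missing string.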

I'd like to support the preference fallback at the translation level, instead of the language level. In pseudocode, this is:

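translation_id = "my_string"
preferences = %w[en-US en]

# translations maps a locale to a hash of translation ID => translated string
# and is assumed to be loaded already.
def translate(translations, preferences, translation_id)
  # Upon request for a translation, iterate through each preference & return
  # if any of the preferences contain the translation.
  preferences.each do |pref|
    if translations[pref] && translations[pref][translation_id]
      # A translation was found.
      return translations[pref][translation_id]
    end
  end
  # If no langs have matching translations, return the translation ID.
  translation_id
end

translate(translations, preferences, translation_id)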

What do you think?

@nicksnyder
Owner

I have a couple thoughts:

  1. While it may work for English, it is not obvious to me that it is appropriate to mix translations from different locales in general, even if they share a common prefix in their locale identifier.
  2. The whole workflow assumes there exists a master locale, presumably the locale of the development team, from which all strings are translated. Whenever a string is added to a master locale, we want to make sure that it gets translated into all other locales. If a translation is missing from one locale, I don't want to fall back to another locale, because then the error might not be noticed.

Basically each locale should contain a full copy of all translations, even if some translations are the same across multiple sublocales.

@nicksnyder
Owner

@parkr thoughts?

@parkr
Collaborator Author

parkr commented Aug 26, 2015

Hey Nick, sorry for my delayed reply. You gave me some good stuff to think over and I had to write some code to generate tfuncs with a master language ;) Here are my thoughts:

> While it may work for English, it is not obvious to me that it is appropriate to mix translations from different locales in general, even if they share a common prefix in their locale identifier.

I agree it would not be an obvious move here. I'm not thinking one part of one page would be in English with scattered Thai or German depending upon what's available. I mean entire resources could move independently. I could have one tfunc generated, and generate a series of different emails, say. The first email has no Thai translation, so the user gets English. The second email does have a translation, so it gets the Thai translation. This is the reality of incrementally shipping translations. We want a first iteration out quickly – do it in English, get our Copy team to sign off on it and ship it. Submit it to your translation vendor and as accurate, verified translations come in, add them, but always fall back to English (our "master" version) in case the more preferred languages don't have translations.

The trade-off here is that we fall back to our least-preferred language instead of an empty string. In my view, the user should get at least something.
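A minimal Ruby sketch of a tfunc that always ends its search at the master language might look like this. The names `tfunc_with_master` and `EMAIL_SUBJECTS` are hypothetical, the sample strings are made up, and this is not the code mentioned above or anything from go-i18n.

```ruby
# Hypothetical data: the welcome email has a Thai translation, the receipt
# email does not yet.
EMAIL_SUBJECTS = {
  "en-US" => { "welcome_subject" => "Welcome", "receipt_subject" => "Your receipt" },
  "th-TH" => { "welcome_subject" => "ยินดีต้อนรับ" },
}.freeze

# Build a lookup that tries each preferred locale in order and then the
# master locale, returning the translation ID only if every locale misses.
def tfunc_with_master(translations, preferences, master)
  order = (preferences + [master]).uniq
  lambda do |translation_id|
    order.each do |locale|
      text = translations.dig(locale, translation_id)
      return text if text
    end
    translation_id
  end
end

t = tfunc_with_master(EMAIL_SUBJECTS, %w[th-TH], "en-US")
puts t.call("welcome_subject") # Thai translation exists
puts t.call("receipt_subject") # falls back to the en-US master copy
```

Appending the master locale to the end of the preference list keeps the fallback explicit: the master is only reached once every preferred locale has missed.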

> The whole workflow assumes there exists a master locale, presumably the locale of the development team, from which all strings are translated.

I think this is a fair and reasonable assumption for most teams. I can't imagine what you'd translate if you didn't have it in some language. Usually you write in the company's native tongue and you translate from there. Seems very reasonable to me to assume this.

> Whenever a string is added to a master locale, we want to make sure that it gets translated into all other locales. If a translation is missing from one locale, I don't want to fall back to another locale, because then the error might not be noticed.

If a translation is missing from one locale, you get an unhelpful error state at the moment anyway: an empty string. If you at least get some copy, even if it's in English, you're better off. It'd be mighty difficult to catch an empty string, but you'll notice the random English sentence in a sea of Russian, German, Dutch, French, or Thai (etc). I think falling back to the master language is more useful than providing "" for a non-existent string.

> Basically each locale should contain a full copy of all translations, even if some translations are the same across multiple sublocales.

This creates a huge copy pasta problem. I have one translation for English here and I tweak it a tiny bit in my master en-US.json file, but forget to do so in my th-TH.json file because (hypothetically) I'm new to the project and don't know to do so but we're pressed for time and it's just a harmless copy change anyway. Having a single source for our master locale is key to accuracy.

@parkr
Collaborator Author

parkr commented Sep 23, 2015

Any further thoughts on my thoughts?

@nicksnyder
Owner

@parkr

> The first email has no Thai translation, so the user gets English.

That is a reasonable thing to want to do, so perhaps we need to add a way to opt in to this fallback behavior. I don't think it should be the default behavior though, because another reasonable thing to do is not send the email at all until you have a translation for it. To do this, Tfunc needs to return something that allows the caller to detect a translation is missing, so that is why Tfunc should return the translation id.
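As a rough Ruby sketch of that contract, a caller can treat "the lookup returned the translation ID" as the signal that a translation is missing and hold the email back. `tfunc`, `translated?`, and `THAI_EMAILS` are hypothetical names for illustration, not the go-i18n API.

```ruby
# Hypothetical lookup mirroring the contract above: return the translation
# if present, otherwise return the translation ID unchanged.
def tfunc(table, translation_id)
  table.fetch(translation_id, translation_id)
end

# A translation counts as missing when the lookup echoes the ID back.
def translated?(table, translation_id)
  tfunc(table, translation_id) != translation_id
end

THAI_EMAILS = { "welcome_email" => "ยินดีต้อนรับ" }.freeze

%w[welcome_email reminder_email].each do |id|
  if translated?(THAI_EMAILS, id)
    puts "send #{id}: #{tfunc(THAI_EMAILS, id)}"
  else
    puts "hold #{id} until a translation exists"
  end
end
```

One limitation of this check: a translation whose text happens to equal its ID is indistinguishable from a missing one.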

> If a translation is missing from one locale, you get an unhelpful error state at the moment anyway: an empty string.

Tfunc should return either the translation id or a non-empty string (which is presumably a translation). If you are observing otherwise, it is a bug which I would like to fix. Looking at the code, I don't see how it could return the empty string "".

> This creates a huge copy pasta problem. I have one translation for English here and I tweak it a tiny bit in my master en-US.json file, but forget to do so in my th-TH.json file because (hypothetically) I'm new to the project and don't know to do so but we're pressed for time and it's just a harmless copy change anyway. Having a single source for our master locale is key to accuracy.

I don't quite understand the problem; why would a developer want to edit th-TH.json?

The workflow that I have in mind is:

  1. Developer updates en-US.json to add/edit a translation.
  2. When a translation round is ready (or from a post-commit hook), run goi18n to extract any new untranslated strings and send them to translators.
  3. Translations get merged back into the repo (via goi18n).
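
The extraction in step 2 can be pictured as a set difference between the master locale and a target locale. The Ruby below is only a sketch of that idea under made-up data; `untranslated_ids`, `MASTER_STRINGS`, and `TARGET_STRINGS` are hypothetical, and it makes no claim about how goi18n itself works.

```ruby
# Return the translation IDs present in the master locale but absent from
# the target locale: the strings that still need to go to translators.
def untranslated_ids(master, target)
  master.keys - target.keys
end

MASTER_STRINGS = { "greeting" => "Hello", "farewell" => "Goodbye" }.freeze
TARGET_STRINGS = { "greeting" => "Hallo" }.freeze

untranslated_ids(MASTER_STRINGS, TARGET_STRINGS) # => ["farewell"]
```

This sketch only catches IDs that are entirely absent from the target; it does not detect a master string that was edited after it was translated.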

Is your workflow different than this?

@iainduncani

@nicksnyder nicksnyder added this to the v2 milestone Feb 14, 2018
@parkr parkr closed this as completed Mar 20, 2018
@nicksnyder
Owner

Can keep this open. It is something that I am thinking about for v2

@nicksnyder nicksnyder reopened this Mar 20, 2018
@nicksnyder nicksnyder self-assigned this Apr 10, 2018
@nicksnyder
Owner

v2 includes logic to do this fallback according to CLDR rules (by importing and using golang.org/x/text/language). See #92.
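
To give a feel for locale fallback, here is a deliberately simplified Ruby sketch that derives a chain by dropping trailing subtags (en-US, then en) and searches it per translation. It is a stand-in for illustration only: real CLDR matching, as implemented in golang.org/x/text/language, has rules and exceptions this does not model, and `fallback_chain` and `lookup` are hypothetical names.

```ruby
# Build the chain of candidate locales by dropping trailing subtags:
# "en-US" becomes ["en-US", "en"]. A simplification, not CLDR matching.
def fallback_chain(locale)
  parts = locale.split("-")
  parts.length.downto(1).map { |n| parts.first(n).join("-") }
end

# Translation-level lookup across the chain; returns the translation ID
# if no locale in the chain has the string.
def lookup(translations, locale, translation_id)
  fallback_chain(locale).each do |candidate|
    table = translations[candidate]
    return table[translation_id] if table && table.key?(translation_id)
  end
  translation_id
end

fallback_chain("en-US") # => ["en-US", "en"]
```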

@nicksnyder nicksnyder mentioned this issue Apr 10, 2018
@nicksnyder
Owner

I just tagged 2.0.0.beta.1. Please start using it and report any issues that you have.
