Blatantly wrong translation of common Swedish word #78

Closed
SafeTex opened this issue Jun 21, 2023 · 4 comments

SafeTex commented Jun 21, 2023

Hello

The Swedish-to-English standard model has just given me:

[screenshot]

There are many translations we could all query, but very often a source word can be translated in many ways depending on the context, and the translation is simply not right for our particular context.

But here we would be hard pushed to imagine the Swedish word ever being translated as "reliable". For those who don't read Swedish, the correct translation is "impatient" or a synonym of "impatient".

"Reliable" in Swedish is "pålitlig", which has some "similarities" with "otålig": both contain the clusters "ål" and "lig", and both contain the letter "t".

Using the stand-alone feature, I've also checked that memoQ has nothing to do with this:

[screenshot]

Does this give us a clue about how the MT engine works? Either way, the error is pretty surprising for a Swedish MT engine, as both words are common and should appear in any large general database.

What do you make of this, Tommi?

Thanks

SafeTex commented Jun 21, 2023

The very next line of my job today:

[screenshot]

So I checked, and I am indeed using the downloadable Swedish-to-English MT engine:

[screenshot]

Does this indicate some sort of error in how the database was compiled?

SafeTex commented Jun 21, 2023

Hello

I think I've worked it out.

It's the capital letter that is causing the problem. We've already discussed why words in all uppercase are not translated well, but I did not realize that capitalization would also affect single words that start with a capital letter and are taken out of all context (as in a word list).
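If capitalization is indeed the trigger, one possible workaround for word lists is to lowercase an entry's initial capital before sending it to the engine and restore the capital on the output. The following is a minimal Python sketch under that assumption; `translate_preserving_case` and the `translate` callable are hypothetical placeholders for whatever MT call is in use, not part of any real project API.

```python
from typing import Callable


def translate_preserving_case(word: str, translate: Callable[[str], str]) -> str:
    """Translate a single word-list entry, working around initial capitals.

    `translate` is a hypothetical stand-in for the actual MT call.
    """
    # Only touch entries like "Otålig": initial capital, not all uppercase.
    if word and word[0].isupper() and not word.isupper():
        # Send the lowercased form, which the model may handle better.
        result = translate(word[0].lower() + word[1:])
        # Put the initial capital back on the translation.
        return result[:1].upper() + result[1:]
    # Lowercase and all-uppercase entries are passed through unchanged.
    return translate(word)
```

This heuristic leaves all-uppercase entries untouched and would wrongly lowercase genuine proper nouns before translation; it also does nothing about the model's general weakness with short, rare fragments.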

Regards

TommiNieminen (collaborator) commented

Yeah, it looks like short fragments are problematic, especially with rarer words. I see you are using the 2019-12-05 model; that's fairly old, and you would probably get better results with opus+bt-2021-04-30 from the Tatoeba-MT-models. The "bt" in the model name stands for backtranslation, which should make the model more robust with rarer words.

SafeTex commented Jun 21, 2023

Hello Tommi

I had no idea the models were being updated.

Anyway, with the new model, the results on this word list are much better.

Thanks
