Add non-translatable functionality #70

Open
SafeTex opened this issue Mar 7, 2023 · 2 comments
Labels
enhancement New feature or request


@SafeTex

SafeTex commented Mar 7, 2023

Hello Tommi and all

My present job contains a lot of Swedish proper nouns for organizations, associations, etc., and Fiskmö changes words that it can't understand without actually translating them, as in:

[Screenshot: examples of MT output]

I can understand that if the MT engine could translate, say, 90% of such proper nouns, it might be programmed or tempted to do so. But it's much more debatable here, as Fiskmö has not translated any part of the word(s).

Would it not be better for Fiskmö to leave the word then? On what basis does it change a word without ever translating it? It seems strange, especially in the second example, "Guldsmedsbranschens Leverantörsförening" > "Goldsmedsbrakensförening," for reasons evident to you as a Swedish speaker.

What do you make of this, Tommi and others?

Thanks

@TommiNieminen
Collaborator

These are proper nouns that have probably never occurred in the training material, so the NMT system has no clear examples of how to handle them. Ideally, the system still learns to identify unseen proper nouns (probably based on features such as capitalization and certain trigger words) and also learns to copy them into the translation in the same form. But the process is fuzzy by necessity, since proper noun translation is itself pretty fuzzy: consider, e.g., organization names that ARE translated, like the UN. Here the model has learnt a weird mixed behavior, where it corrupts the proper noun while still keeping it in Swedish.

Some kind of named entity recognition, combined with an option where you could specify whether entities need to be translated or copied into the translation, might be a good idea. I'll mark this as a potential improvement (it also has some synergies with the terminology support).

TommiNieminen added the enhancement (New feature or request) label on Mar 9, 2023
@SafeTex
Author

SafeTex commented Mar 9, 2023

Hello Tommi and all

Just in case you don't know, memoQ also has a "non-translatable" feature that is separate from its TB (termbase).
I'm going to send you a non-translatables file so you can see its structure.
Ideally, it would be great if Opus could handle such files rather than translators adding "non-translatable" terms to Opus one by one.

I know that's asking a lot (again), but if I don't mention it and send you such a file, then there's even less chance of Opus being able to handle such a file.

But as it's a text file, I guess translators could remove the header and tags, if that is what it takes to load such a file into Opus in one go.
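As a rough illustration of how such a list could be applied, here is a minimal sketch that assumes the file has already been reduced to plain text with one term per line (header and tags removed). It masks each listed term with a placeholder before translation and restores the original term afterwards, which is one common way of keeping terms untranslated. The function names and the `translate` callable are hypothetical stand-ins, not part of OPUS-CAT or Fiskmö, and real NMT models can still mangle placeholders, so this is an illustration rather than a guaranteed fix.

```python
import re
from typing import Callable, Dict, List, Tuple


def load_terms(lines: List[str]) -> List[str]:
    """Parse a plain-text list with one non-translatable term per line.

    Assumes the header and tags were already stripped; blank lines are skipped.
    """
    return [line.strip() for line in lines if line.strip()]


def protect(text: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each listed term found in text with a numbered placeholder."""
    mapping: Dict[str, str] = {}
    # Deduplicate while keeping order, then match longest terms first so a
    # multi-word name is masked before any shorter term it might contain.
    ordered = sorted(dict.fromkeys(terms), key=len, reverse=True)
    for i, term in enumerate(ordered):
        placeholder = f"__NT{i}__"
        # Lookarounds stop a term from matching inside a longer word,
        # e.g. "Svensk Handel" inside "Svensk Handelsbank".
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_with_protection(
    text: str, terms: List[str], translate: Callable[[str], str]
) -> str:
    """Mask terms, call a (hypothetical) MT function, then unmask."""
    masked, mapping = protect(text, terms)
    return restore(translate(masked), mapping)


if __name__ == "__main__":
    terms = load_terms(["Guldsmedsbranschens Leverantörsförening", "", "  Svensk Handel  "])
    # Stand-in for a real MT call: only translates "Medlem i".
    fake_mt = lambda s: s.replace("Medlem i", "Member of")
    print(translate_with_protection("Medlem i Guldsmedsbranschens Leverantörsförening", terms, fake_mt))
```

Matching longest terms first and requiring non-word characters around each match are design choices meant to avoid partial replacements in Swedish compounds; a real integration would also need to handle casing and inflected forms.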

Regards
Dave

TommiNieminen changed the title from "Fiskmö changes words that it can't understand but does not translate them, making things 'worse'." to "Add non-translatable functionality" on Apr 30, 2024