
English to Spanish opus-2019-12-04 model creates output with lots of spaces and @s when using the OPUS-CAT plugin on Trados Studio 2022 #100

Closed
shadowplumber opened this issue Jun 24, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@shadowplumber

Using OPUS-CAT MT Engine v1.2.0.0, I downloaded the opus-2019-12-04 English-to-Spanish model, installed the OPUS-CAT plugin for Trados Studio 2022, and created a test project to translate this file: inACountryFoundedByImmigrants.txt.

Once I opened the editor, however, the output had a lot of extra spaces and @ symbols. For example, for this source segment:
In a country founded by immigrants and characterized by linguistic diversity, ensuring language access is not just a matter of convenience; it's a fundamental human right.

the output was this:
E n u n p a @ @ í s f u n @ @ d @ @ a d o p o r i n @ @ m i g r @ @ a n t e s y c a r @ @ a c @ @ t e r @ @ i z @ @ a d o p o r l a d i v e r @ @ s i d a d l i n g @ @ ü @ @ í @ @ s t i c @ @ a , g @ @ a r a n @ @ t i z @ @ a r e l a c c e s @ @ o a l i d i @ @ o m a n o e s s @ @ ó @ @ l o u n a c u @ @ e s t i @ @ ó n d e c o n v e n i @ @ e n c i a ; e s u n d e r e @ @ c h o h u m @ @ a n o f u n d a m e n t a l .

I tried the translate feature built into the OPUS-CAT MT Engine GUI, and the output looked much more normal:
En un país fundado por inmigrantes y caracterizado por la diversidad lingüística , garantizar el acceso al idioma no es sólo una cuestión de conveniencia ; es un derecho humano fundamental .

So this seems to be an issue with the plugin.

@TommiNieminen added the bug (Something isn't working) label on Jul 4, 2024
@TommiNieminen
Collaborator

The 2019-12-04 model is an old one: it uses a different tokenization scheme than the newer models, and it seems I forgot to include support for this old tokenization in the new Trados plugin. You can use one of the newer en-es models instead; they use the newer tokenization scheme and should also give better translation quality.

The old models should probably be retired, but for some language pairs there might only be old models available, so I'll keep supporting them for a while at least. I'll keep this bug in mind for the next plugin update, whenever that happens.
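The thread doesn't name the old tokenization scheme, but the `@@` markers in the plugin output suggest subword-nmt style BPE, where a token ending in `@@` is joined to the token that follows it. Assuming that (an inference from the output above, not something confirmed in this thread), a minimal Python sketch of the usual desegmentation step looks like this. `remove_bpe` is a hypothetical helper, and the sample tokens are read off the garbled plugin output with its per-character spaces removed:

```python
import re

# Hypothetical helper (assumption: subword-nmt style BPE with "@@" markers).
# A token ending in "@@" is glued to the next token, so deleting "@@ "
# (or a trailing "@@" at the end of the line) restores whole words.
def remove_bpe(text: str) -> str:
    return re.sub(r"(@@ )|(@@ ?$)", "", text)

# Tokens as they appear in the garbled output, without the per-character spaces.
bpe_output = "En un pa@@ ís fun@@ d@@ ado por in@@ migr@@ antes"
print(remove_bpe(bpe_output))  # En un país fundado por inmigrantes
```

This only rejoins subwords; it does not touch the spaces before punctuation seen in the GUI output (`lingüística ,`).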

@shadowplumber
Author

Ah, I see. Great to know. Thank you for the response and keep up the great work!
