Weird results when translating english to finnish (using EasyNMT with opus-mt) #55

kauttoj · 2022-02-04T12:50:12Z

While translating English to Finnish using your model via EasyNMT, I noticed something weird. Check this code and the results.

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

text='''Religion and theology is the study of religious beliefs, concepts, symbols, expressions and texts of spirituality.
Programmes and qualifications with the following main content are classified here:
Religious history
Study of sacred books
Study of different religions
Theology
=== Inclusions
Included in this detailed field are programmes for children and young people.'''

print(model.translate(text,target_lang='fi'))

The output is:

'Uskonto ja teologia tutkivat uskonnollisia käsityksiä, käsitteitä, symboleja, ilmaisuja ja tekstejä hengellisyydestä.
Ohjelmat ja tutkinnot, joiden pääsisältö on seuraava:
Uskonnollinen historia
Pyhien kirjojen tutkiminen
Eri uskontojen tutkiminen
Teologia
Suomennos: Michael T. Francis Pinmontagne SUBHEAVEN.ORG
Tähän yksityiskohtaiseen kenttään kuuluvat lasten ja nuorten ohjelmat.'

So "=== Inclusions" is translated into "Suomennos: Michael T. Francis Pinmontagne SUBHEAVEN.ORG".

What is going on here? Is this a problem with the Opus-MT model or with its EasyNMT implementation?
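One way to narrow this down is to translate the text line by line and flag any output that contains a domain-like token (such as "SUBHEAVEN.ORG") that does not occur in the source line. The following is only a minimal sketch: `find_suspicious_lines`, the `translate` callable, and the regex are illustrative assumptions, not part of EasyNMT. With the model above, `translate` could be `lambda s: model.translate(s, target_lang='fi')`. Whether the artefact also shows up when the line is translated in isolation is not known from this report.

```python
import re

# Heuristic (assumption): domain-like tokens such as "SUBHEAVEN.ORG" or "example.com".
DOMAIN_RE = re.compile(r"\b[\w-]+\.(?:org|com|net|fi)\b", re.IGNORECASE)


def find_suspicious_lines(text, translate):
    """Translate each non-empty line separately and report lines whose
    output contains a domain-like token that is absent from the source."""
    suspicious = []
    for line in text.splitlines():
        if not line.strip():
            continue  # nothing to translate on blank lines
        output = translate(line)
        source_domains = {m.lower() for m in DOMAIN_RE.findall(line)}
        new_domains = [
            m for m in DOMAIN_RE.findall(output)
            if m.lower() not in source_domains
        ]
        if new_domains:
            suspicious.append((line, output, new_domains))
    return suspicious
```

Each returned tuple holds the source line, the translated line, and the list of tokens that appeared only in the output.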

PS. The sample text is from the ESCO ontology.


jorgtied · 2022-02-07T20:57:35Z

Yes, that looks a bit weird. The model at huggingface does not seem to handle that kind of input well. At least a newer OPUS-MT model does not do that anymore. You can try it here: https://translate.ling.helsinki.fi/ui/memad
It should be from this model: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-fin/opusTCv20210807+bt-2021-12-08.zip
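If switching models is not an option, one possible workaround is to strip leading markup such as "===" before translation and re-attach it afterwards, so the model only sees plain text. This is an untested assumption, not something verified in this thread. In the sketch below, `translate_preserving_markup`, the `translate` callable, and the set of markup characters are all hypothetical choices made for illustration.

```python
import re

# Assumption: a leading run of "=", "#", "*" or "-" followed by whitespace is markup.
MARKUP_RE = re.compile(r"^(\s*[=#*\-]+\s+)(.*)$")


def translate_preserving_markup(text, translate):
    """Translate line by line, keeping any leading markup prefix untouched."""
    out_lines = []
    for line in text.splitlines():
        match = MARKUP_RE.match(line)
        if match:
            prefix, body = match.group(1), match.group(2)
        else:
            prefix, body = "", line
        if body.strip():
            out_lines.append(prefix + translate(body))
        else:
            out_lines.append(line)  # blank or markup-only line: leave as-is
    return "\n".join(out_lines)
```

With EasyNMT, `translate` could be `lambda s: model.translate(s, target_lang='fi')`; the line "=== Inclusions" would then be sent to the model as just "Inclusions".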

kauttoj · 2022-02-08T10:55:55Z

Thanks for the reply. I was able to solve the problem by using the new Tatoeba model.

In case someone has the same problem, follow these instructions to convert Tatoeba models into Hugging Face format:
https://github.com/huggingface/transformers/tree/master/scripts/tatoeba

Then you can use the model with this code (copied from here):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_MODEL)
# Initialize the model
model = AutoModelForSeq2SeqLM.from_pretrained(PATH_TO_CONVERTED_MODEL)
# Tokenize text
text = "Hello my friends! How are you doing today?"
# (prepare_seq2seq_batch is deprecated; calling the tokenizer directly is the equivalent)
tokenized_text = tokenizer([text], return_tensors='pt', padding=True)
# Perform translation and decode the output
translation = model.generate(**tokenized_text)
translated_text = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]
# Print translated text
print(translated_text)
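For multi-line input like the ESCO sample, the same calls can be applied to all non-empty lines in one padded batch. This is only a sketch: `translate_lines` is a hypothetical helper, and it assumes `tokenizer` and `model` are the objects loaded above.

```python
def translate_lines(text, tokenizer, model):
    """Translate each non-empty line of `text` in one batch, keeping blank lines."""
    lines = text.splitlines()
    # Positions of the lines that actually contain text.
    indices = [i for i, line in enumerate(lines) if line.strip()]
    if not indices:
        return text
    # Pad so that lines of different lengths fit into one tensor batch.
    batch = tokenizer([lines[i] for i in indices], return_tensors='pt', padding=True)
    generated = model.generate(**batch)
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
    # Put the translations back at their original positions.
    for i, translated in zip(indices, decoded):
        lines[i] = translated
    return "\n".join(lines)
```

Usage would be `print(translate_lines(text, tokenizer, model))`. Translating line by line gives the model no context across lines, which may or may not matter for a list-like text such as this one.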

PS. Conversion worked only for the "eng-fin" model. The "fin-eng" conversion failed with KeyError: 'dim_emb', which was raised while evaluating this hidden-size check: raise ValueError(f"Hidden size {hidden_size} and configured size {cfg['dim_emb']} mismatched or not 512")

PhilLint mentioned this issue Jun 7, 2022

(Big) transformer Tatoeba models UKPLab/EasyNMT#71

Open

ianroberts pushed a commit to ianroberts/Opus-MT that referenced this issue Jan 22, 2024

Add debian buster-backports repository for libprotobuf23

5ee3edf

Closes Helsinki-NLP#55
