Bad translations using marian-decoder #29

Open
koren-v opened this issue Oct 6, 2020 · 1 comment


@koren-v

koren-v commented Oct 6, 2020

Hi, I've loaded the models from the following directory: https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/ru-en
When I try some of them, I often get translations like "▁Y O O O O O O O O O O O O O O O O O O O O" or "I 'm b@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@ m@@".
Then I tried loading the model from the Hugging Face site, but I get pretty similar outputs, whereas running it with the Hugging Face framework gives good translations. Probably something is wrong with the config.
I launch it using the Marian library. For example:

 echo "привет" | ./marian-decoder -c /path/to/opus_models/opus-2019-12-05-ru-en/decoder.yml

So what could be wrong?

Should I perhaps be doing some preprocessing and postprocessing?

@jorgtied
Member

You need to preprocess the string first using the provided SentencePiece model for the source language. Our models don't support internal SentencePiece segmentation, so this needs to be done before piping the input to the decoder.
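A minimal sketch of that pipeline, under assumptions not stated in the thread: the model archive is unpacked into `MODEL_DIR` and contains `source.spm`, `target.spm` and `decoder.yml`, and the SentencePiece command-line tools `spm_encode`/`spm_decode` are installed. The helper names `spm_preprocess`, `spm_postprocess`, `spm_pieces_to_text` and `translate` are hypothetical. The raw text is split into pieces with the source model, translated by Marian, and the output pieces are joined back into plain text with the target model.

```shell
#!/bin/sh
# Hypothetical wrapper around marian-decoder for an OPUS-MT SentencePiece model.
# Assumptions: MODEL_DIR holds source.spm, target.spm and decoder.yml;
# spm_encode / spm_decode (SentencePiece CLI tools) are on PATH.

MODEL_DIR="${MODEL_DIR:-/path/to/opus_models/opus-2019-12-05-ru-en}"
MARIAN="${MARIAN:-./marian-decoder}"

# Segment raw source sentences (one per line) into SentencePiece pieces.
spm_preprocess() {
  spm_encode --model="$MODEL_DIR/source.spm"
}

# Turn Marian's output pieces back into detokenized text.
spm_postprocess() {
  spm_decode --model="$MODEL_DIR/target.spm"
}

# Rough sed-only fallback for spm_postprocess: drop the spaces between
# pieces, turn the "▁" word-boundary marker into a space, trim the leading space.
spm_pieces_to_text() {
  sed -e 's/ //g' -e 's/▁/ /g' -e 's/^ //'
}

# Full pipeline: raw text in, translated text out.
translate() {
  spm_preprocess | "$MARIAN" -c "$MODEL_DIR/decoder.yml" | spm_postprocess
}

# Usage (requires the model files and tools above):
#   echo "привет" | translate
```

The `@@` markers in the second bad output follow the subword-nmt BPE convention rather than SentencePiece, so if a particular model in that directory was trained with BPE, its README is the place to check which segmentation it expects.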
