NMT - English - Arabic Model #9

Closed
drkhateeb opened this issue Jan 23, 2021 · 4 comments

Comments

@drkhateeb

Hello developers,
Thanks for these engines, but I could not find an English-Arabic model.
How can I get it?
Regards

Parallel text
https://opus.nlpl.eu/MultiUN.php

English-Arabic Moses format
https://opus.nlpl.eu/download.php?f=MultiUN/v1/moses/ar-en.txt.zip

Arabic-English TMX
https://opus.nlpl.eu/download.php?f=MultiUN/v1/tmx/ar-en.tmx.gz
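In the Moses format, a parallel corpus is stored as two plain-text files with one sentence per line, where line n of the English file is the translation of line n of the Arabic file. A minimal Python sketch for pairing them straight from the downloaded zip might look like this; the member names `MultiUN.ar-en.en` and `MultiUN.ar-en.ar` are an assumption about the archive layout, so check them with `unzip -l` before relying on them.

```python
import io
import zipfile
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Assumed member names inside ar-en.txt.zip; verify against the actual archive.
EN_MEMBER = "MultiUN.ar-en.en"
AR_MEMBER = "MultiUN.ar-en.ar"

_MISSING = object()  # sentinel marking that one file ran out of lines


def pair_lines(en_lines: Iterable[str], ar_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (English, Arabic) pairs from line-aligned Moses-format text."""
    for en, ar in zip_longest(en_lines, ar_lines, fillvalue=_MISSING):
        if en is _MISSING or ar is _MISSING:
            raise ValueError("files are not line-aligned: different line counts")
        yield en.rstrip("\n"), ar.rstrip("\n")


def read_moses_zip(path: str) -> Iterator[Tuple[str, str]]:
    """Stream sentence pairs from a Moses-format zip without extracting it."""
    with zipfile.ZipFile(path) as zf:
        with zf.open(EN_MEMBER) as en_f, zf.open(AR_MEMBER) as ar_f:
            en_text = io.TextIOWrapper(en_f, encoding="utf-8")
            ar_text = io.TextIOWrapper(ar_f, encoding="utf-8")
            yield from pair_lines(en_text, ar_text)
```

Using `zip_longest` with a sentinel (rather than plain `zip`) means a mismatch in line counts is reported instead of silently truncating the corpus.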

@TommiNieminen
Collaborator

Hi,

There is an English to Arabic MT model available through the Tatoeba-Challenge repository, but unfortunately it is not usable with OPUS-CAT yet. We are working on integrating Tatoeba-Challenge models into OPUS-CAT, so the model will eventually be available (possibly even fairly soon).

In principle you can already use Tatoeba-Challenge models by installing them manually (see here), but this particular Arabic model has multiple target variants (acm afb apc apc_Latn ara ara_Latn arq arq_Latn ary arz), and models with multiple target languages are not yet supported in OPUS-CAT.
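For context on why multiple target variants matter: multi-target OPUS-MT/Tatoeba models are commonly told which variety to produce by prefixing each source sentence with a target-language token of the form `>>xxx<<`. The sketch below assumes that convention and uses the variant codes listed above; `add_target_token` is a hypothetical helper, not part of OPUS-CAT.

```python
# Target variants of the Tatoeba-Challenge English-Arabic model, as listed above.
ARABIC_TARGETS = {
    "acm", "afb", "apc", "apc_Latn", "ara", "ara_Latn",
    "arq", "arq_Latn", "ary", "arz",
}


def add_target_token(sentence: str, target: str = "ara") -> str:
    """Prefix a source sentence with an (assumed) >>lang<< target token.

    Hypothetical helper: the >>xxx<< prefix is how multi-target OPUS-MT
    models usually select the output variety.
    """
    if target not in ARABIC_TARGETS:
        raise ValueError(f"unknown target variant: {target!r}")
    return f">>{target}<< {sentence.strip()}"


print(add_target_token("Where is the library?", "arz"))
# prints: >>arz<< Where is the library?
```

A tool that supports such models has to choose (or let the user choose) one of these tokens for every segment, which is the extra step a single-target integration does not need.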

@drkhateeb
Author

Thank you for your reply.

I have some questions:
1- The &apos; entity appears in translated text (Arabic to English) in Trados 2021. How can I fix it? See the screenshot.

2- How can I train the model with my own translation memory and my own terminology?

3- How can I create a new customized model (En-Ar) and (Ar-En)?

4- How can I donate to support this great work?

contact: [email protected]

@TommiNieminen
Collaborator

Thanks for the kind words, let me see if I can answer some of your questions here:

1. It seems to me that the bilingual corpora used to train the MT model contain the &apos; XML entity. This is strange, since the corpora are cleaned before training, but sometimes the cleaning fails. I tested the Arabic to English model a bit, and while I also managed to produce the &apos; in some translations ("Women &apos; s and girls'"), a normal apostrophe also seems to occur ("From the girls' to the girls'."). So I think the &apos; might occur only with this specific phrase, "Women's and girls'". If it occurs so often that it becomes a problem, the model might have to be retrained.
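Until a model is retrained, one possible workaround is to post-process MT output before it goes into the document. This is a minimal sketch, not an OPUS-CAT or Trados feature, and `fix_apostrophes` is a hypothetical name: it decodes leftover entities such as &apos; with the standard library's `html.unescape`, then rejoins a possessive that tokenization split apart (for example, `Women ' s` becomes `Women's`).

```python
import html
import re

# Matches an apostrophe-s split off with spaces by tokenization: "Women ' s".
_SPLIT_POSSESSIVE = re.compile(r"\s+'\s+s\b")


def fix_apostrophes(text: str) -> str:
    """Hypothetical post-processing step for MT output (not part of OPUS-CAT)."""
    # html.unescape decodes &apos; and other character references (&amp;, &quot;, ...).
    text = html.unescape(text)
    # Rejoin "word ' s" into "word's"; "girls' to" and "girls ' shoes" are left alone.
    return _SPLIT_POSSESSIVE.sub("'s", text)


print(fix_apostrophes("Women &apos; s and girls'"))  # prints: Women's and girls'
```

Note that `html.unescape` also decodes entities like `&lt;`, so apply it only to plain-text segments, not to text where markup must stay escaped.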

2 and 3. It's not possible to train models from scratch (since that would require too much computing power), but the OPUS-MT base models can be fine-tuned (customized). The documentation is a work in progress, but here are instructions for fine-tuning a model with a TMX file: https://helsinki-nlp.github.io/OPUS-CAT/enginefinetune. Another possibility is to use the Fine-tune batch task in Trados (the documentation for that should come next week).
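Before fine-tuning with a TMX file, it can help to check which segment pairs it actually contains. The Python sketch below assumes standard TMX structure (`<tu>` elements holding one `<tuv>` per language, with the text in `<seg>`) and uses only the standard library; `read_tmx_pairs` and `load_tmx` are hypothetical helpers, not part of OPUS-CAT. Gzipped files such as the OPUS `.tmx.gz` download are opened with `gzip.open`.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _lang_of(tuv: ET.Element) -> str:
    # TMX 1.4 uses xml:lang; older TMX 1.1 files use a plain "lang" attribute.
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    # Compare only the primary subtag, so "en-US" matches "en".
    return code.lower().split("-")[0]


def read_tmx_pairs(root: ET.Element, src: str = "en", tgt: str = "ar") -> List[Tuple[str, str]]:
    """Return (src, tgt) segment pairs; units missing either side are skipped."""
    pairs = []
    for tu in root.iter("tu"):
        segs: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins text across inline markup, including <ph> contents.
                segs[_lang_of(tuv)] = "".join(seg.itertext()).strip()
        if segs.get(src) and segs.get(tgt):
            pairs.append((segs[src], segs[tgt]))
    return pairs


def load_tmx(path: str) -> ET.Element:
    """Parse a .tmx or gzipped .tmx.gz file and return its root element."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return ET.parse(f).getroot()
```

For example, `read_tmx_pairs(load_tmx("ar-en.tmx.gz"))` lists the English-Arabic pairs; language codes are passed in lowercase.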

4. I don't think we can accept donations due to the legal and bureaucratic consequences, but at this early stage it's very useful to receive feedback, so thanks for that.

@TommiNieminen
Collaborator

There is a new release of OPUS-CAT available with English to Arabic models; you can download it from here.

When installing an online model, it takes some time for the model list to download. You will see the English to Arabic models once the text "Fetching list of online models, please wait..." changes to "Downloadable online models".

The Arabic models are multilingual models (the different varieties of Arabic are treated as different languages), so you have to check the Multilingual models checkbox to see the models.

-Tommi
