Finetuning says: "not enough parallel segments in the tmx" #88

Closed
HMueller007 opened this issue Jul 12, 2023 · 5 comments

Comments

@HMueller007

Hi,

When I try to fine-tune the model with a TMX from a (Wordfast) project, it says: "not enough parallel segments in the TMX".

It has more than 600 bilingual segments (so about 1300 segments in total if you count source and target language segments separately) from a finished project. Is this really not enough? How many do you need?

@Khalid-kamal

It needs at least 1000 TUs

@TommiNieminen
Collaborator

Hi, the finetuning needs a bit of data to work on, so there's a minimum requirement of 1000 translation units (pairs of source and target language segments). This is an arbitrary number, and you probably need more than 1000 to have a noticeable effect. If you still want to try it with 600 translation units, you can change the FinetuningSetMinSize setting in the OpusCatMTEngine.exe.config file.
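To see how far a TMX is from that threshold before touching the config, the translation units can be counted with a short script. This is a minimal sketch and not part of OPUS-CAT: it assumes a standard TMX layout (`tu` elements containing `tuv`/`seg`), counts only units with at least two non-empty segments, and uses the 1000 mentioned above as the limit. The function name and sample data are made up for illustration, and OPUS-CAT's own counting may differ.

```python
import xml.etree.ElementTree as ET

# Minimum mentioned in this thread; the actual value is whatever
# FinetuningSetMinSize is set to in OpusCatMTEngine.exe.config.
MIN_TUS = 1000


def count_translation_units(tmx_text: str) -> int:
    """Count <tu> elements that have at least two non-empty <seg> texts.

    Hypothetical helper: OPUS-CAT may count units differently.
    """
    root = ET.fromstring(tmx_text)
    count = 0
    for tu in root.iter("tu"):
        filled = 0
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also picks up text around inline tags inside <seg>
            if seg is not None and "".join(seg.itertext()).strip():
                filled += 1
        if filled >= 2:
            count += 1
    return count


# Tiny made-up TMX: three units, one of which has an empty target.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="example"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Hello</seg></tuv>
        <tuv xml:lang="de"><seg>Hallo</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Thank you</seg></tuv>
        <tuv xml:lang="de"><seg>Danke</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Untranslated</seg></tuv>
        <tuv xml:lang="de"><seg></seg></tuv></tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    n = count_translation_units(SAMPLE_TMX)
    status = "enough" if n >= MIN_TUS else "below the minimum"
    print(f"{n} translation units ({status}, minimum {MIN_TUS})")
```

For a real file, read it first (for example with `open(path, encoding="utf-8").read()`) and pass the text to `count_translation_units`.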

@HMueller007
Author

Hi, thanks for the answers @ALL. Instead, I actually tried the function for uploading a source file and a corresponding target file derived from the same TM, and it worked: it improved the translations even with this small size. But I might also try this other setting, thank you.
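For anyone wanting to derive such a source/target pair from a TMX, here is a rough sketch of one way to do it. It is not how OPUS-CAT or Wordfast does it: the function name is made up, language codes are matched by prefix on `xml:lang` (or the older `lang` attribute), and it assumes the upload expects one segment per line with both files aligned line by line.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(tmx_text: str, src_lang: str, tgt_lang: str):
    """Return (sources, targets): aligned lists of segment texts.

    Hypothetical helper. Units missing either language are skipped so the
    two lists stay aligned line by line.
    """
    root = ET.fromstring(tmx_text)
    sources, targets = [], []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is None:
                continue
            # Drop inline tags and collapse whitespace/newlines so each
            # segment fits on exactly one line.
            text = " ".join("".join(seg.itertext()).split())
            if text:
                texts[lang] = text
        src = next((t for l, t in texts.items()
                    if l.startswith(src_lang.lower())), None)
        tgt = next((t for l, t in texts.items()
                    if l.startswith(tgt_lang.lower())), None)
        if src and tgt:
            sources.append(src)
            targets.append(tgt)
    return sources, targets


# Made-up sample: the third unit has no target text and is dropped.
SAMPLE_TMX = """<tmx version="1.4">
  <body>
    <tu><tuv xml:lang="en-US"><seg>Open the
        file.</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Öffnen Sie die Datei.</seg></tuv></tu>
    <tu><tuv xml:lang="en-US"><seg>Press <bpt i="1"/>OK<ept i="1"/>.</seg></tuv>
        <tuv xml:lang="de-DE"><seg>Drücken Sie OK.</seg></tuv></tu>
    <tu><tuv xml:lang="en-US"><seg>Not translated yet</seg></tuv>
        <tuv xml:lang="de-DE"><seg> </seg></tuv></tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    sources, targets = tmx_to_parallel(SAMPLE_TMX, "en", "de")
    for s, t in zip(sources, targets):
        print(f"{s}\t{t}")
```

Writing each list to its own UTF-8 text file with `"\n".join(...)` then gives the two files; the file names and encoding to use are up to you.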

@SafeTex

SafeTex commented Jul 18, 2023

Hello HMueller007 and all

What I sometimes do to get around this is to import a simple two-column TB (glossary) into memoQ for the same job as the translation job I'm doing and then export all that to the TB for the same job. The segments are small, of course, but they are very relevant to the job, and as Opus does not currently have any TB function to instruct the MT engine, this feels like an intuitive way to proceed. This often gets the TB to exceed the minimum number of segments setting.

@HMueller007
Author

@SafeTex That's a good tip, will try this, thanks.


4 participants