Skip to content

Converting bilingual translation files to MT format

Notifications You must be signed in to change notification settings

ymoslem/file-converters

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 

Repository files navigation

file-converters

Converting bilingual translation files to the MT format: file.source and file.target

TMX2MT

Converts a TMX translation memory into two files, source and target, for machine translation training.

There are two scripts that do the same thing, one with XML minidom and one with XML ElementTree. Please note that TMX2MT-ElementTree.py is supposed to be faster, and supports segments that contain inline tags.

Whatever script you decide to use, run it in the Terminal/CMD as follows:

python3 <script.py> <tmx_file_name.tmx> <source_lang> <target_lang>

Update

  • Check out the new notebook (here) that uses the xmltodict library. The notebook converts a sophisticated TMX translation memory into two formats:
    • Moses format (text file, one segment per line)
    • GPT fine-tuning format (JSON lines / JSONL)

Questions

If you have questions or suggestions, please feel free to contact me.

About

Converting bilingual translation files to MT format

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published