Classifying Parallel Sentences as Machine or Human Translation

The corresponding blog post can be found here.

The classifier is implemented in the script classifier.py that can be found in the directory code/. The script accepts data partitioned into train and test directories containing the following file names:

source_ht : A text file containing the source sentences that were translated by a human
trans_ht : A text file containing the target sentences translated by a human
source_mt : A text file containing the source sentences that were translated by a machine
trans_mt : A text file containing the target sentences translated by a machine

The files are line-aligned: the sentence on a given line of a source file corresponds to the sentence on the same line of the matching translation file (source_ht with trans_ht, source_mt with trans_mt).
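The layout above can be read with a few lines of Python. The following is only a minimal sketch, not the code in classifier.py: the helper names read_lines, load_pairs and load_labeled, the UTF-8 encoding, and the 0/1 labels are all assumptions made for illustration, as is the pairing of source_ht with trans_ht and source_mt with trans_mt.

```python
import os


def read_lines(path):
    """Read a text file into a list of sentences, one per line."""
    # Assumption: the data files are UTF-8 encoded.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_pairs(data_dir, suffix):
    """Return (source, translation) pairs for suffix 'ht' or 'mt'."""
    source = read_lines(os.path.join(data_dir, "source_" + suffix))
    trans = read_lines(os.path.join(data_dir, "trans_" + suffix))
    # Line n of the source file must align with line n of the translation.
    if len(source) != len(trans):
        raise ValueError(
            "source_%s has %d lines but trans_%s has %d"
            % (suffix, len(source), suffix, len(trans))
        )
    return list(zip(source, trans))


def load_labeled(data_dir):
    """Return (source, translation, label) triples for one directory."""
    # Hypothetical labels: 0 = human translation, 1 = machine translation.
    examples = []
    for suffix, label in (("ht", 0), ("mt", 1)):
        for src, tgt in load_pairs(data_dir, suffix):
            examples.append((src, tgt, label))
    return examples
```

Calling load_labeled on a train or test directory would return one list mixing both classes, and the length check reports misaligned files early instead of silently pairing the wrong sentences.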

Specifying train and test data:

By default, the script uses the data provided in the directory data_for_code/. To specify which aligned sentence pairs to use as training data, use the "-tr" flag followed by the directory where the training data is stored. To specify which aligned sentence pairs to use as test data, use the "-te" flag followed by the directory where the test data is stored. Without any specified parameters, the classifier trains on the aligned sentence pairs in data_for_code/train and tests on the aligned sentence pairs in data_for_code/dev.
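To illustrate how the two flags and their defaults fit together, here is a minimal sketch using Python's argparse. It is not taken from classifier.py: the function name build_parser and the attribute names train_dir and test_dir are hypothetical, and only the flag names and default directories come from the description above.

```python
import argparse


def build_parser():
    """Build a parser exposing the -tr and -te flags described above."""
    parser = argparse.ArgumentParser(
        description="Classify parallel sentences as machine or human translation."
    )
    # Defaults mirror the documented behaviour when no flags are given.
    parser.add_argument(
        "-tr",
        dest="train_dir",  # hypothetical attribute name
        default="data_for_code/train",
        help="directory containing the training files",
    )
    parser.add_argument(
        "-te",
        dest="test_dir",  # hypothetical attribute name
        default="data_for_code/dev",
        help="directory containing the test files",
    )
    return parser


# Example: override only the training directory (my_data/train is made up).
args = build_parser().parse_args(["-tr", "my_data/train"])
print(args.train_dir, args.test_dir)
```

With the script itself, the equivalent invocation would presumably look like python code/classifier.py -tr my_data/train -te my_data/test, where both paths are placeholders.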

Specifying the type of classifier:

By default, the classifier uses a Support Vector Machine. To change the type of classifier, uncomment one of the lines between lines 173 and 178 of classifier.py. For now, the classifier type cannot be set from the command line.

For any questions or comments, please email me at [email protected]
