
Extra spaces before and after apostrophes #85

Closed
DominiquePivard opened this issue Jul 3, 2023 · 2 comments

Comments

@DominiquePivard

The following EN segment:
No explosive or fire projection effects were observed from the batteries during testing;
translates in FR as:
Aucun effet d ' explosion ou de projection d ' incendie n ' a été observé dans les batteries au cours des essais;


Any idea why there’s an extra space before and after each apostrophe? It seems to be an isolated case, as I’ve seen many other segments in the same document with apostrophes in FR, but no extraneous spaces.

@SafeTex

SafeTex commented Jul 20, 2023

Hello Dominique and Tommi

I have observed the following but have no idea why.

If the French sentence has no final punctuation, or ends with a full stop or even a colon, it is translated correctly, for example:

[Screenshot: capture with full stop]

But if I put a semi-colon at the end, I can then produce Dominique's bug


So it possibly has something to do with the semi-colon at the end, however improbable that may sound.

What do you think Tommi?

@TommiNieminen
Collaborator

It's weird, and probably due to the training corpus containing many instances of extra spaces before and after apostrophes. The neural network has learned to associate certain trigger words/characters with the extra spaces, so they will manifest semi-randomly. You could call it overlearning by the model: it reads meaningful patterns into what are just pre-processing errors.

You can use a post-edit rule to fix these kinds of problems. In this case, just replacing " ' " with "'" would likely work, since I can't think of any legitimate case of an apostrophe occurring with spaces on both sides.
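
For illustration, the replacement described above can be sketched in Python. This is not the tool's own post-edit rule syntax, just a standalone approximation of the same substitution; the function name `fix_spaced_apostrophes` is made up for this example, and handling the curly apostrophe (’) as well as the straight one is an added assumption.

```python
import re

# Hypothetical helper, not part of any tool discussed in this thread.
# Matches a straight or curly apostrophe with a single space on each side.
_SPACED_APOSTROPHE = re.compile(r" (['’]) ")


def fix_spaced_apostrophes(text: str) -> str:
    """Collapse " ' " to "'" while leaving normal apostrophes untouched."""
    return _SPACED_APOSTROPHE.sub(r"\1", text)


# The French output reported at the top of this issue.
segment = (
    "Aucun effet d ' explosion ou de projection d ' incendie "
    "n ' a été observé dans les batteries au cours des essais;"
)
print(fix_spaced_apostrophes(segment))
```

Segments that already have correct apostrophes pass through unchanged, so a rule like this should be safe to apply to the whole document rather than only to the affected segments.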
