Incorrect sentences coordinates #908

Closed
NicolasKieffer opened this issue Apr 15, 2022 · 1 comment
Labels
bug (From Hemiptera and especially its suborder Heteroptera), implemented (The issue has been implemented)

Comments

@NicolasKieffer

Sentences sometimes have wrong coordinates.

Sample files used (PDF, TEI and training files): 60806_R1.zip

Notes:

  • borders are rendered by our application from the s[coords] values of the TEI elements (these values are usually correct)
  • the GROBID segmentation model has been trained on these PDFs (and the fulltext model "recognises the refs correctly")
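
For readers unfamiliar with the attribute, the coords values mentioned in the notes above can be split into bounding boxes roughly as follows. This is a minimal Python sketch, assuming each semicolon-separated entry has the form page,x,y,width,height (which is consistent with the values in the sample TEI); the names Box and parse_coords are illustrative and not part of GROBID.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One bounding box; the field order is an assumption based on the sample TEI."""
    page: int
    x: float
    y: float
    width: float
    height: float


def parse_coords(coords: str) -> List[Box]:
    """Split a coords attribute value into a list of boxes.

    Entries are separated by ';' and may be surrounded by whitespace
    or line breaks, as happens in pretty-printed TEI.
    """
    boxes = []
    for chunk in coords.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' or empty entries
        page, x, y, width, height = chunk.split(",")
        boxes.append(Box(int(page), float(x), float(y), float(width), float(height)))
    return boxes
```

A sentence that spans several lines yields one box per line, so a renderer would typically draw one rectangle for each returned Box.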

Case: sentence with a <ref> element containing the character ;

Incorrect coordinates

Example 1

PDF (coordinates rendering)

First bugged sentence
image

Group of bugged sentences
image

All sentences of this page
image

TEI (processing)

image

Note: the right-hand part of the ref (after the ; character) is no longer in this file

TEI (training)

image
Note: the entire ref is in this file

Correct coordinates

Example 1

PDF (coordinates rendering)

image
image

TEI (processing)

image

TEI (training)

image

Example 2

PDF (coordinates rendering)

image
image

TEI (processing)

image

TEI (training)

image

kermitt2 added the bug label on Apr 16, 2022
kermitt2 added a commit that referenced this issue on Apr 16, 2022
kermitt2 added the implemented label on Apr 16, 2022
@kermitt2
Owner

Hi @NicolasKieffer !

Many thanks for the very clear and documented issue.

It is fixed in GROBID master; it was just one missing line in the "if" statement :/
The coordinates of the full sentence are now correct too.

<head n="2.1.1" coords="7,90.02,257.53,24.00,10.80;7,144.02,257.53,245.59,10.80">Static Model: 
a single season occupancy analysis</head>
<p>
     <s coords="7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;7,72.02,340.33,467.45,10.80;
                        7,72.02,367.93,465.80,10.80;7,72.02,395.53,272.00,10.80">The MSDOM is a form of the 
     multi-state occupancy model with state uncertainty 
     <ref type="bibr" coords="7,72.02,312.73,118.87,10.80" target="#b59">(MacKenzie et al., 2009;</ref> 
     <ref type="bibr" coords="7,193.90,312.73,97.62,10.80">Nichols et al., 2007)</ref> and is defined below 
     with four states equivalent to the original co-occurrence model 

The fix will be part of the next release, 0.7.1, which should come out in the next few days.
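
One way to spot this kind of problem automatically is to check that every <ref> box lies inside one of the boxes of its enclosing sentence. The Python sketch below is only an illustration of such a check, not GROBID's own code: it assumes each coords entry is page,x,y,width,height, it hard-codes the values from the TEI excerpt above, and the helper names and the 0.5 tolerance are hypothetical.

```python
def parse_coords(coords):
    """Split 'page,x,y,w,h;...' into (page, x, y, w, h) tuples (assumed order)."""
    boxes = []
    for chunk in coords.split(";"):
        if chunk.strip():
            page, x, y, w, h = chunk.strip().split(",")
            boxes.append((int(page), float(x), float(y), float(w), float(h)))
    return boxes


def contains(outer, inner, tol=0.5):
    """True if `inner` lies within `outer` on the same page, up to `tol` units."""
    o_page, ox, oy, ow, oh = outer
    i_page, ix, iy, iw, ih = inner
    return (
        o_page == i_page
        and ix >= ox - tol
        and iy >= oy - tol
        and ix + iw <= ox + ow + tol
        and iy + ih <= oy + oh + tol
    )


def refs_inside_sentence(sentence_coords, ref_coords_list, tol=0.5):
    """Check that each ref box is covered by at least one sentence box."""
    sentence_boxes = parse_coords(sentence_coords)
    return all(
        any(contains(s_box, r_box, tol) for s_box in sentence_boxes)
        for ref_coords in ref_coords_list
        for r_box in parse_coords(ref_coords)
    )


# Values copied from the <s> and <ref> elements in the excerpt above.
sentence = (
    "7,90.02,285.13,391.18,10.80;7,72.02,312.73,464.39,10.80;"
    "7,72.02,340.33,467.45,10.80;7,72.02,367.93,465.80,10.80;"
    "7,72.02,395.53,272.00,10.80"
)
refs = [
    "7,72.02,312.73,118.87,10.80",
    "7,193.90,312.73,97.62,10.80",
]
print(refs_inside_sentence(sentence, refs))
```

Running a check like this over every sentence of a processed document would flag sentences whose boxes stop short of a reference they contain.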
