-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ere 398 #22
Conversation
and authoredOn. Now works fine with Dens pdf.
This looks OK to me. @OmarMalass please confirm so we can merge this. Thanks. |
import java.util.regex.Pattern; | ||
|
||
public class PractitionerPatterns extends Patterns { | ||
public final Pattern NAME_PREFIX = Pattern.compile("(Dr)\\."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other possible prefixes: (med. prof.), can be a composite prefix like: "Dr. med."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, support of them added.
public class PractitionerPatterns extends Patterns { | ||
public final Pattern NAME_PREFIX = Pattern.compile("(Dr)\\."); | ||
public final Pattern NAME_LINE = Pattern.compile("(Dr\\.)*([a-z A-Z]+)(-){0,1}([a-z A-Z]+)"); | ||
public final Pattern STREET_LINE = Pattern.compile("^[a-z A-Zß]+(\\.{1})?.*[0-9]+"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are German characters that won't be captured by the letter pattern. You can use \w instead.
I purpose using \s to capture whitespace instead of space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\w actually matches [A-Za-z0-9_], not any German character. I've added äöüÄÖÜß to the list of accepted letters, that will do the trick :)
I don't think we should match whitespace here as such character would involve an error or weird behavior in our parsing.
matchAndExtractLine(lines, patterns.PHONE_LINE).ifPresent(this::parsePhoneNumber); | ||
matchAndExtractLine(lines, patterns.CITY_LINE).ifPresent(this::parseAddressLine); | ||
matchAndExtractLine(lines, patterns.STREET_LINE).ifPresent(this::parseStreetLine); | ||
matchAndExtractLine(lines, patterns.NAME_LINE).ifPresent(this::parseNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also consider extracting the doctor's qualifications. Is there a set of possible values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the type of doctor he is, like dentist or general doctor? I would say here let's wait until we need to do it to have a clearer idea of what we need to support. I've heard that dentists will be one of our main user soon,
Practitioner information now filled from extraction to bundle, compatible with DENS pdf1 + both pdf from CGM. Tests of parser + ExtractionToBundleWorkflowTest completed.
Also fixed some fields (including birthdate and authoredOn) in RegexParser, last time I fixed them in Muster16SvgExtractorParser as I didn’t know the regexParser existed. Javadoc added in the extractorParser to indicate it’s deprecated.
I’ve attached in the Jira task the test1.pdf with the number in the practitioner name removed as we shouldn’t support number in this field (or should we?) and as I couldn’t push in the secret repo, just add it in the same folder as test1.pdf.