ins subscriber name validation fixes see #7477 #7478

adunsulag · 2024-06-13T12:34:51Z

This puts in multibyte case insensitive validation on the subscriber previous name fields. Due to unicode character sets having multiple uppercase characters that correspond to a single lowercase and viceversa with many lowercase to a single uppercase, name identifiers are normalized into unicode NFKC form before comparison.

Fixes #7477

This puts in multibyte case insensitive validation on the subscriber previous name fields. It does not fully fix the issue for all unicode names as apparently some unicode character sets can have multiple uppercase characters that correspond to a single lowercase and viceversa with many lowercase to a single uppercase. Some research shows we need to normalize the characters and then do a comparison but need to do more research to see if this is an actual valid assumption or not before implementing. This will almost fix openemr#7477 for nearly all issues, but the edge cases still need to be addressed.

bradymiller · 2024-06-14T08:57:20Z

src/Validators/CoverageValidator.php

+ ($previousName['previous_name_last'] == $values['subscriber_lname']
+ || mb_strtolower($previousName['previous_name_last']) == mb_strtolower($values['subscriber_lname'])
+ )
+ ) {


could just bite bullet and put following function in sanitize.inc.php (this file always including in all scripts):

function mb_is_string_equal_ci($string1, $string2): bool { if ($string1 == $string2) { return true; } $string1_normalized = Normalizer::normalize($string1, Normalizer::FORM_KC); $string2_normalized = Normalizer::normalize($string2, Normalizer::FORM_KC); return mb_strtolower($string1_normalized) === mb_strtolower($string2_normalized) || mb_strtoupper($string1_normalized) === mb_strtoupper($string2_normalized); }

edit: also good make a quick util class for this function

then have a nice function to use for when doing name lookups etc. that will support internationalization (and it will get tested over time)

could also throw bunch of basic unit testing on function at ci akin to this (nice way to test things without needing to manually run them through the api to test):
https://github.com/openemr/openemr/blob/master/tests/Tests/Unit/PasswordVerifyTest.php

I'll look at building out the unit tests and getting this in.

@bradymiller I went ahead and wrote a unit test and implemented the mb_is_string_equal_ci function. Let me know what you think.

Did a bunch of research on the unicode normalization process to make sure we were using the correct normalization form. Implemented a unit test to verify various test cases. Everything appears to be working correctly.

bradymiller · 2024-06-27T06:09:23Z

composer.json

@@ -81,7 +81,8 @@
 },
 "config": {
 "platform": {
- "php": "8.1"
+ "php": "8.1",
+ "ext-intl": "8.1"


Just checking if above line is needed. Generally less is better in these composer.json things.

The Normalizer class is built into the php international extension. Without the extension the functionality will fatal error. I thought it'd be best to be explicit about the dependency than to have it work for some people (you install it automatically w/ docker) and not work for others.

@bradymiller are you ok with keeping this in?

hi @adunsulag , sorry for the delayed reply on this. I generally just use platform to force the build to be compatible with the lowest supported php version (in this case 8.1), so it does not use the php version that is on the server where the build is happening. No need for packages here. Can force dependencies in the require section, but don't do this since the build is generally not done on the actual server.

@bradymiller php-intl isn't a package, its a php extension, you can't actually install it w/ composer, the intent is to warn anyone that does a composer install that their server is not setup properly as it is missing the php extension that OpenEMR relies upon, people can bypass it by doing:

composer install --ignore-platform-reqs

We can remove the warning but if someone is missing the php extension anywhere that extension is used will fail.

bradymiller · 2024-06-27T06:10:18Z

library/sanitize.inc.php

+ $string2_normalized = Normalizer::normalize($string2, Normalizer::FORM_KC);
+ return mb_strtolower($string1_normalized) === mb_strtolower($string2_normalized)
+ || mb_strtoupper($string1_normalized) === mb_strtoupper($string2_normalized);
+}


bradymiller · 2024-06-27T06:13:20Z

src/Validators/CoverageValidator.php

+ // we need to do some normalizations as per this stackoverflow post: https://stackoverflow.com/a/38855868
+ // I'm not incorporating the normalization techniques yet as we need to better understand their implications before
+ // applying the fix
+ // TODO: look at normalizing the multi-byte strings to better handle international languages where multiple unicode characters correspond to the same uppercase or lowercase character


looks like above comments need updating

bradymiller · 2024-06-27T06:17:32Z

Hi @adunsulag , just noted couple minor issues. Otherwise looks great!

stephenwaite · 2024-06-27T12:19:39Z

library/sanitize.inc.php

+/**
+ * Compares a multibyte unicode string identifier in a case insensitive way to see if the two strings
+ * are semantically identical. Note that NFKC will treat several 'similar' semantically meaning texts as the same and so
+ * should be used for identifiers (things such as proper nouns, etc).


hi @adunsulag, should this comment say should not be used?

@stephenwaite no, I believe from my research this is correctly stated. From the unicode website:

Q: Which forms of normalization should I support?
The choice of which to use depends on the particular program or system. NFC is the best form for general text, since it is more compatible with strings converted from legacy encodings. NFKC is the preferred form for identifiers, especially where there are security concerns (see UTR #36). NFD and NFKD are most useful for internal processing.

Also the security concerns are more clearly discussed here:
https://www.unicode.org/reports/tr36/#Text_Comparison

I thought about having two separate functions one to be used for general text and one to be used for identifiers, but I decided to just go with the single function. If you see value in it, I can make the two functions.

NFKC looks good, thanks

adunsulag added the Status: Needs Review label Jun 13, 2024

adunsulag requested a review from bradymiller June 13, 2024 12:34

stephenwaite approved these changes Jun 13, 2024

View reviewed changes

bradymiller reviewed Jun 14, 2024

View reviewed changes

adunsulag added Status: Needs Work and removed Status: Needs Review labels Jun 25, 2024

Implement multi-byte normalization

0837002

Did a bunch of research on the unicode normalization process to make sure we were using the correct normalization form. Implemented a unit test to verify various test cases. Everything appears to be working correctly.

adunsulag added Status: Needs Review and removed Status: Needs Work labels Jun 27, 2024

Remove unused namespace again...

e8a9f73

bradymiller reviewed Jun 27, 2024

View reviewed changes

Remove old TODO comments

8a45fe3

stephenwaite reviewed Jun 27, 2024

View reviewed changes

adunsulag merged commit fdadc05 into openemr:master Jun 28, 2024
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ins subscriber name validation fixes see #7477 #7478

ins subscriber name validation fixes see #7477 #7478

adunsulag commented Jun 13, 2024 •

edited

Loading

bradymiller Jun 14, 2024 •

edited

Loading

bradymiller Jun 14, 2024

bradymiller Jun 14, 2024

adunsulag Jun 25, 2024

adunsulag Jun 27, 2024

bradymiller Jun 27, 2024

adunsulag Jun 27, 2024

adunsulag Jun 28, 2024

bradymiller Jun 29, 2024

adunsulag Jun 29, 2024

bradymiller Jun 27, 2024

bradymiller Jun 27, 2024

adunsulag Jun 27, 2024

bradymiller commented Jun 27, 2024

stephenwaite Jun 27, 2024

adunsulag Jun 27, 2024

adunsulag Jun 27, 2024

stephenwaite Jun 27, 2024

ins subscriber name validation fixes see #7477 #7478

ins subscriber name validation fixes see #7477 #7478

Conversation

adunsulag commented Jun 13, 2024 • edited Loading

bradymiller Jun 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bradymiller commented Jun 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adunsulag commented Jun 13, 2024 •

edited

Loading

bradymiller Jun 14, 2024 •

edited

Loading