Add test for lexical space of xsd:decimal #157

Draft
wants to merge 2 commits into base: master

Conversation

ajnelson-nist
Contributor

@ajnelson-nist ajnelson-nist commented Sep 20, 2022

This PR is being filed to help diagnose an oddity in SHACL validation: some literals typed as xsd:decimal are not being recognized as xsd:decimal. The behavior arose with the release of pySHACL 0.20.0. (A cross-reference will come momentarily; I needed something to track in the pySHACL repo.)

Before merging this PR:

  • Resolve runtime error from OWL-RL inferencing.
  • Discuss if another test should be added to handle JSON-LD default behaviors.

This patch adds a unit test for `xsd:decimal` values, both in PASS and
XFAIL cases.

There is one issue apparent, left as a TODO in the last test.

Signed-off-by: Alex Nelson <[email protected]>
With the current import of OWL-RL, 6.0.2, this raises a runtime error.

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to casework/CASE-Examples that referenced this pull request Sep 20, 2022
pySHACL 0.20.0, recently released, includes support for incorporating
ill-typedness of literals in review of SHACL Datatype Constraints.  For
unknown reasons, this is now causing some `xsd:decimal` literals to be
flagged as non-conformant.

This is being discussed further in pySHACL PR 157.

References:
* RDFLib/pySHACL#157

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist
Contributor Author

ajnelson-nist commented Sep 20, 2022

The instigator for this PR was seeing new ValidationResults arise for xsd:decimal values in this patch. From the SHACL validation output, I could not tell what was going on.

When I looked at the JSON-LD being validated, I realized that there is a possibly undefined behavior in the JSON-LD specification. The value below was being interpreted by RDFLib as an xsd:double long enough to trigger a ValidationResult, but by the time the pySHACL report-graph was being generated, it was being interpreted as an xsd:decimal:

{
    "@type": "xsd:decimal",
    "@value": 48.860346
}

The patch preventing the default behavior (with spec. citation) is here, and the follow-on patch demonstrating the validation results are undone is the third in this PR.

@nicholascar - is this something that needs to be fixed or clarified in RDFLib's JSON-LD parsing code?

I'm hesitant to augment this xsd:decimal test with a JSON-LD "default behaviors" test until I understand whether this is truly an undefined-behaviors corner-case of the specification.

@ashleysommer
Collaborator

Hi @ajnelson-nist
In JSON-LD, the lexical value for a Decimal literal must be enclosed in quotation marks.

This structure is ill-typed:

{
    "@type": "xsd:decimal",
    "@value": 48.860346
}

All non-integer numbers in JSON are interpreted by Python as floats. So when the document is read by RDFLib, the discrepancy between the @type of xsd:decimal and the @value being a float causes the literal to be flagged as ill-typed.
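
The float conversion described above can be reproduced with the standard library alone. This is a minimal sketch using Python's `json` module directly, not RDFLib's JSON-LD parser; the variable names are illustrative and not taken from the PR.

```python
import json
from decimal import Decimal

# Unquoted number: the JSON parser hands back a Python float.
unquoted = json.loads('{"@type": "xsd:decimal", "@value": 48.860346}')

# Quoted number: the lexical form survives parsing as a string.
quoted = json.loads('{"@type": "xsd:decimal", "@value": "48.860346"}')

print(type(unquoted["@value"]).__name__)  # float
print(type(quoted["@value"]).__name__)    # str

# The quoted form converts to an exact decimal without passing
# through binary floating point.
exact = Decimal(quoted["@value"])
```

By the time any RDF library sees the unquoted value, the information that it was written as a decimal-looking token is already gone; only the quoted form keeps the original lexical value.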

PySHACL v0.20.0 introduced the feature to check for ill-typed literals, so that is why these errors are now seen when upgrading to that version.

EDIT: I just noticed you have discovered that already, and documented it in this fix here.

To answer your question, the behaviour of RDFLib in this case is correct. RDFLib v6.2.0 has exactly the same behaviour as previous RDFLib versions, except that it now flags these discrepancies with the ill-typed flag, for greater visibility.

@ajnelson-nist
Contributor Author

@ashleysommer Thank you, especially for the part I'd missed, that Python's JSON parser was causing the initial conversion to Float. I'll add a JSON-LD snippet to the test to demonstrate this issue.

Referencing the confusing-looking SHACL validation results again---these were the SHACL validation results from the ill-typed JSON-LD---it looks like this may be a nefarious data issue to explain to users. The Turtle serializes a text snippet that looks, and is, properly typed. I'm guessing it's not possible (or at least not a good idea) to "carry forward" the original ill-typed data into Turtle. Is there anything RDFLib or pySHACL can do to flag this ill typing? I'm guessing flagging would have to happen in the JSON-LD parser.
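
One way to surface this before any RDF parsing happens might be a pre-parse check over the raw JSON. The sketch below is hypothetical and is not part of RDFLib or pySHACL: `find_unquoted_decimals` is an invented helper, the `ex:` property names and second value are made up for illustration, and it assumes value objects spell the datatype either as `xsd:decimal` (with `xsd` bound to the XML Schema namespace) or as the full IRI. It does not handle type coercion declared in an `@context`.

```python
import json

# Assumed spellings of the datatype; context-based coercion is not handled.
XSD_DECIMAL = {"xsd:decimal", "http://www.w3.org/2001/XMLSchema#decimal"}


def find_unquoted_decimals(node, path="$"):
    """Yield paths of value objects typed xsd:decimal whose @value is not a string."""
    if isinstance(node, dict):
        type_ = node.get("@type")
        if isinstance(type_, str) and type_ in XSD_DECIMAL and "@value" in node:
            if not isinstance(node["@value"], str):
                yield path
        # Recurse into every member so nested value objects are found too.
        for key, child in node.items():
            yield from find_unquoted_decimals(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_unquoted_decimals(child, f"{path}[{index}]")


doc = json.loads("""
{
    "ex:latitude": {"@type": "xsd:decimal", "@value": 48.860346},
    "ex:longitude": {"@type": "xsd:decimal", "@value": "2.337644"}
}
""")

flagged = list(find_unquoted_decimals(doc))
print(flagged)  # ['$.ex:latitude']
```

Because the check runs on the parsed JSON rather than on the graph, it reports the location in the source document, which the Turtle serialization can no longer show.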
