shacl - Background knowledge for validation #125

Closed
aidig opened this issue Mar 12, 2020 · 4 comments

Comments

@aidig

aidig commented Mar 12, 2020

This is not, as such, a new issue, but an attempt to highlight and generalize the problems raised by init-dcat-ap-de in #115, #116, #116 and #117, where the lack of background knowledge during validation produces several errors (sh:Violation) when running a SHACL validation with the DCAT-AP validator: https://www.itb.ec.europa.eu/shacl/dcat-ap/upload

Similarly, every attempt to classify a resource directly by using the URI of a skos:Concept individual will produce an error (violation). Not a warning or a message ... an error.

This is problematic.

Although it is fully understandable that the ambition is to keep the SHACL constraints as close as possible to the constraints expressed in the specification, this might lead to datasets being described in less detail (e.g. the contact point is described as a vcard:Kind, although vcard:Organization would be correct and more precise, #115) or to the publisher having to add quite a lot of background knowledge explicitly to the dataset description, doing the job of a reasoner? (e.g. specifying that a given landingPage URL is in fact a foaf:Document, #116)
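To make the contact point case concrete, here is a minimal sketch in plain Python. It does not use a SHACL engine; it only mimics the way a class check follows rdfs:subClassOf triples that are present in the graph being validated. The node names `ex:dataset` and `ex:contact` and the helper functions are hypothetical, and the split into `data` and `background` is an assumption made for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls, graph):
    """Return cls plus every class reachable via rdfs:subClassOf in graph."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_instance_of(node, cls, graph):
    """True if node has an rdf:type that is cls or a subclass of cls in graph."""
    types = {o for s, p, o in graph if s == node and p == RDF_TYPE}
    return any(cls in superclasses(t, graph) for t in types)


# Hypothetical dataset description: the contact point is typed precisely.
data = {
    ("ex:dataset", "dcat:contactPoint", "ex:contact"),
    ("ex:contact", RDF_TYPE, "vcard:Organization"),
}

# Background knowledge the publisher did not ship with the data.
background = {("vcard:Organization", SUBCLASS_OF, "vcard:Kind")}

without_background = is_instance_of("ex:contact", "vcard:Kind", data)
with_background = is_instance_of("ex:contact", "vcard:Kind", data | background)
print(without_background, with_background)
```

In this sketch the same description fails the vcard:Kind check on its own and passes once the single subclass triple is available, which is the gap between the two validation outcomes discussed here.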

Furthermore, it is worth noting that the examples provided by DCAT 2.0 produce several SHACL violations of the above-mentioned type under the current constraints. (https://github.com/w3c/dxwg/tree/gh-pages/dcat/examples)

Perhaps the severity of these shape types could be weakened from sh:Violation to sh:Warning or even sh:Info?

@aidig
Author

aidig commented Mar 12, 2020

Also, very much agree with dcat-ap-de that examples of valid DCAT-AP dataset descriptions would be very useful indeed (#121) especially seeing that the examples provided by W3C cannot be validated by the DCAT-AP validator.

@bertvannuffelen
Contributor

@aidig thanks for expressing this issue so clearly. It is indeed the case that the current SHACL validation rules implement a very strict interpretation of all the constraints in the specification.

To address this issue, the DCAT-AP community should agree upon a generic approach for each of the validation rules. We should avoid the case where one range constraint is at error level, another at warning level and a third merely informative. We need clear rules; otherwise it becomes unmaintainable.

Connected with this is, of course, the question of what the purpose of the DCAT-AP SHACL validation rules is. Are the rules in the distribution the canonical implementation of the constraints in a machine-readable form? Or are they to be used as-is in any implementation context, like the EDP? This impacts the organisation of the files, but also how the rules are interpreted.

What is the relationship between the SHACL specification and the human-readable specification? E.g. if all range constraints are informative, then we should make that clear in the human-readable specification.
Currently it reads as MUST, so it is logical that the SHACL interprets MUST as an error.

You also pointed out one of the key interpretation choices in using the SHACL rules: are they used with inference or not? May we assume that the data exchange performs correct inference, and that the state obtained after inference still satisfies the constraints if they were satisfied before? What background knowledge is assumed when the validation process is executed?
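The inference question can be illustrated with a small sketch in plain Python (no SHACL engine or reasoner involved). It applies the RDFS range rule once before running a class-membership check on dcat:landingPage. The node names `ex:dataset` and `ex:page` and both helper functions are hypothetical; treating the range axiom as background knowledge is the assumption under discussion, not an agreed approach.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def infer_range_types(graph):
    """Apply the RDFS range rule once: (p range C), (s p o) => (o type C)."""
    ranges = {(s, o) for s, p, o in graph if p == RDFS_RANGE}
    inferred = {
        (o, RDF_TYPE, cls)
        for s, p, o in graph
        for prop, cls in ranges
        if p == prop
    }
    return graph | inferred


def class_violations(graph, prop, cls):
    """Values of prop that carry no explicit (value, rdf:type, cls) triple."""
    return sorted(
        o for s, p, o in graph
        if p == prop and (o, RDF_TYPE, cls) not in graph
    )


# Hypothetical dataset description without an explicit type on the page.
data = {("ex:dataset", "dcat:landingPage", "ex:page")}

# Background knowledge: the range axiom from the vocabulary.
background = {("dcat:landingPage", RDFS_RANGE, "foaf:Document")}

before = class_violations(data, "dcat:landingPage", "foaf:Document")
after = class_violations(
    infer_range_types(data | background), "dcat:landingPage", "foaf:Document"
)
print(before, after)
```

One thing the sketch makes visible: once the range axiom itself is part of the background knowledge, the class check can no longer fail for IRI values, so whether such a rule should then be an error, a warning or dropped depends on which background knowledge the validation is assumed to run with.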

So, to the community: what are your answers to these questions, so that we can build a SHACL distribution that is complete (all constraints), corresponds to the human-readable text, and supports the creation of validation processes? I am looking forward to your feedback.

@init-dcat-ap-de

SHACL is "for validating RDF graphs against a set of conditions." If DCAT-AP offers SHACL shapes, we should be able to use them to check whether an RDF dataset is valid DCAT-AP data, as the EDP is doing at the moment.
(We need them not only for DCAT-AP 2.0 but also for 1.2.1, but that's another story...)

At the moment, they can't be used for this purpose, because e.g. the object of dct:language will not have the class dct:LinguisticSystem. They can't be a direct conversion from the OWL ontology (and why should they be? It would be the same information in another dialect...)

Because SHACL validators perform no inferencing and do not follow IRIs to external sources, I feel that the provided rules should be the minimum of what is considered valid DCAT-AP RDF, at least for the rules that are a sh:Violation.
Maybe it would be reasonable to provide two sets of rules, the minimum and an additional set with advanced rules?

@aidig
Author

aidig commented Aug 4, 2020

Further related info can be found on the JoinUp page on the SHACL shapes in DCAT-AP context webinar - 26/06/2020: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/news/shacl-shapes-dcat-ap-webinar

The agenda included:

  • the organisation of the SHACL templates (files, per constraint, message texts, etc.);
  • the usage of the SHACL shapes for validation (which background knowledge to include);
  • handling implementation-specific requirements (addressing differences between the European Data Portal implementation and the DCAT-AP specifications); and
  • rules about how to express the constraints in the DCAT-AP specifications in SHACL.

On the importance of providing background knowledge with the SHACL templates for validation, the SEMIC Team proposed two sets of solutions.

• For the DCAT-AP specification: to create SHACL constraints for class membership in a separate file and create options in the DCAT-AP validator with/without class-membership;
• For the implementation part: the implementation could publish its constraints and assumptions against which it validates the input. This can be done by an aggregated SHACL file based on the DCAT-AP SHACL files.
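The first proposal (class-membership constraints kept separate, with a validator option to include them or not) could look roughly like the sketch below. This is plain Python, not the SEMIC implementation and not real SHACL: the two shape lists stand in for two separate SHACL files, and every name in it (`CORE_SHAPES`, `CLASS_MEMBERSHIP_SHAPES`, `select_shapes`, `validate`, the `ex:` and `lang:` nodes) is hypothetical.

```python
RDF_TYPE = "rdf:type"

# Hypothetical stand-ins for two separate SHACL files.
CORE_SHAPES = [
    {"kind": "min_count", "target": "dcat:Dataset", "path": "dct:title", "min": 1},
]
CLASS_MEMBERSHIP_SHAPES = [
    {"kind": "class", "target": "dcat:Dataset", "path": "dct:language",
     "cls": "dct:LinguisticSystem"},
]


def select_shapes(with_class_membership):
    """Aggregate the shape groups according to the validator option."""
    shapes = list(CORE_SHAPES)
    if with_class_membership:
        shapes += CLASS_MEMBERSHIP_SHAPES
    return shapes


def validate(graph, shapes):
    """Return (focus node, path, constraint kind) for each failed check."""
    results = []
    for shape in shapes:
        focus_nodes = sorted(
            s for s, p, o in graph if p == RDF_TYPE and o == shape["target"]
        )
        for node in focus_nodes:
            values = sorted(
                o for s, p, o in graph if s == node and p == shape["path"]
            )
            if shape["kind"] == "min_count" and len(values) < shape["min"]:
                results.append((node, shape["path"], "min_count"))
            elif shape["kind"] == "class":
                for value in values:
                    if (value, RDF_TYPE, shape["cls"]) not in graph:
                        results.append((node, shape["path"], "class"))
    return results


# Hypothetical description: the language is given by URI only, with no type.
data = {
    ("ex:dataset", RDF_TYPE, "dcat:Dataset"),
    ("ex:dataset", "dct:title", '"Example"'),
    ("ex:dataset", "dct:language", "lang:ENG"),
}

with_membership = validate(data, select_shapes(True))
without_membership = validate(data, select_shapes(False))
print(with_membership, without_membership)
```

An implementation such as a portal could publish the result of its own `select_shapes`-style aggregation, so that publishers know exactly which constraints and assumptions their input is validated against.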
