shacl - Background knowledge for validation #125

aidig · 2020-03-12T14:03:11Z

This is not, as such, a new issue, but an attempt to highlight and generalize the problems raised by init-dcat-ap-de in #115, #116 and #117, where the lack of background knowledge for validation results in several errors (sh:Violation) when attempting a SHACL validation using the DCAT-AP validator: https://www.itb.ec.europa.eu/shacl/dcat-ap/upload

Similarly, all attempts to classify resources directly by using the URI of the skos:Concept individual will produce an error (violation). Not a warning or a message ... an error.

This is problematic.

Although it is fully understandable that the ambition is to keep the SHACL constraints as close as possible to the constraints expressed in the specification, this might lead to datasets being described in less detail (e.g. the contact point is described as a vcard:Kind, although vcard:Organization would be correct and more precise, #115) or to the publisher having to add quite a lot of background knowledge explicitly to the dataset description, doing the job of a reasoner (e.g. specifying that a given landingPage URL is in fact a foaf:Document, #116).
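
To make the vcard:Kind case concrete, here is a minimal Python sketch of a class-membership check. It is a toy model and not a SHACL engine: triples are plain tuples, `ex:contact` is a hypothetical resource, and `has_class` only imitates how a class constraint follows rdfs:subClassOf triples that are actually present in the graph being validated.

```python
# Toy model of a class-membership check; not a real SHACL implementation.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls, graph):
    """Return cls plus every class reachable via rdfs:subClassOf triples in graph."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for (s, p, o) in graph if s == current and p == SUBCLASS_OF)
    return seen


def has_class(node, expected, graph):
    """True if node is typed with expected, or with a declared subclass of it."""
    types = [o for (s, p, o) in graph if s == node and p == RDF_TYPE]
    return any(expected in superclasses(t, graph) for t in types)


# Data as a publisher might send it: the contact point is typed precisely.
# "ex:contact" is a hypothetical resource used only for illustration.
data = {("ex:contact", RDF_TYPE, "vcard:Organization")}

# Background knowledge from the vCard ontology, normally absent from the data.
background = {("vcard:Organization", SUBCLASS_OF, "vcard:Kind")}

without_background = has_class("ex:contact", "vcard:Kind", data)
with_background = has_class("ex:contact", "vcard:Kind", data | background)
print(without_background, with_background)
```

In this sketch the more precise typing only passes once the subclass triple is part of the validated graph, which is the background knowledge this issue is about.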

Furthermore, it is also very interesting to note that examples provided by DCAT 2.0 will produce several shacl violations of the above-mentioned type with the current constraints. (https://github.com/w3c/dxwg/tree/gh-pages/dcat/examples)

Perhaps the severity of these shape types could be weakened from sh:Violation to sh:Warning or even sh:Message?

aidig · 2020-03-12T14:08:45Z

Also, very much agree with dcat-ap-de that examples of valid DCAT-AP dataset descriptions would be very useful indeed (#121) especially seeing that the examples provided by W3C cannot be validated by the DCAT-AP validator.

bertvannuffelen · 2020-04-28T14:31:31Z

@aidig thanks for expressing this issue so clearly. It is indeed the case that the current shacl validation rules implement a very strict interpretation of all the constraints in the specification.

To address this issue, the DCAT-AP community should agree upon a generic approach for each of the validation rules. We should avoid the situation where one range constraint is at error level, another at warning level and a third merely informative. We need clear rules, otherwise the shapes become unmaintainable.

Connected with this is, of course, the question of the purpose of the DCAT-AP SHACL validation rules. Are the rules in the distribution the canonical, machine-readable implementation of the constraints? Or are they to be used as-is in any implementation context, like the EDP? This affects the organisation of the files, but also how the interpretation is done.

What is the relationship of the SHACL specification to the human-readable specification? E.g. if all range constraints are informative, then we should make that clear in the human-readable specification.
Currently it reads as MUST, so it is logical that the SHACL interprets MUST as an error.

You also pointed out one of the key interpretation choices in using the SHACL rules: are they being used with inference or not? May we assume that the data exchange performs correct inference, and that the state obtained after the inference still satisfies the constraints if they were satisfied before? What background knowledge is assumed for the execution of the validation process?
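
As a rough illustration of what "inference before validation" could mean for the landingPage case from #116, the following Python sketch applies a single RDFS-style rule (rdfs:range) to a toy graph. It assumes triples modelled as tuples, hypothetical resources `ex:dataset` and `ex:page`, and a helper `apply_range_inference` invented for this example; it is not what the DCAT-AP validator does.

```python
# Toy sketch of rdfs:range inference; the helper names are illustrative only.
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range_inference(data, ontology):
    """Add (object, rdf:type, C) for every triple whose predicate has rdfs:range C."""
    ranges = {}
    for s, p, o in ontology:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    inferred = set(data)  # inference only adds triples, it never removes any
    for s, p, o in data:
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred


def types_of(node, graph):
    """Collect all rdf:type values stated for node in graph."""
    return {o for (s, p, o) in graph if s == node and p == RDF_TYPE}


# The publisher states only the link, not the type of the landing page.
data = {("ex:dataset", "dcat:landingPage", "ex:page")}

# Background knowledge: the range axiom for dcat:landingPage.
ontology = {("dcat:landingPage", RDFS_RANGE, "foaf:Document")}

before = types_of("ex:page", data)
after = types_of("ex:page", apply_range_inference(data, ontology))
print(before, after)
```

Because this kind of inference only adds triples, a class-membership check that passed before still passes afterwards. Constraints that put an upper bound on what may be stated (for example a maximum cardinality) could in principle be affected by the added triples, so the answer may differ per constraint type.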

So to the community: what are your answers to these questions, so that we can build a SHACL distribution that is complete (all constraints), corresponds to the human-readable text, and supports the creation of validation processes? I am looking forward to your feedback.

init-dcat-ap-de · 2020-05-04T18:06:31Z

SHACL is "for validating RDF graphs against a set of conditions." If DCAT-AP offers SHACL shapes, we should be able to use them to check whether an RDF dataset is valid DCAT-AP data, as the EDP is doing at the moment.
(We need them not only for DCAT-AP 2.0 but also for 1.2.1, but that's another story...)

At the moment, they can't be used for this purpose, because e.g. the object of dct:language will not have the class dct:LinguisticSystem. They can't be a direct conversion from the OWL ontology (and why should they be? It would be the same information in another dialect...)

Because SHACL validators perform no inferencing and do not follow IRIs to external sources, I feel that the provided rules should be the minimum of what is considered valid DCAT-AP RDF, at least for the rules that are a sh:Violation.
Maybe it would be reasonable to provide two sets of rules, the minimum and an additional set with advanced rules?

aidig · 2020-08-04T13:17:58Z

Further related info can be found on the JoinUp page on the SHACL shapes in DCAT-AP context webinar - 26/06/2020: https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/news/shacl-shapes-dcat-ap-webinar

The agenda included:

• the organisation of the SHACL templates (files, per constraint, message texts, etc.);
• the usage of the SHACL shapes for validation (which background knowledge to include);
• handling implementation-specific requirements (addressing differences between the European Data Portal implementation and the DCAT-AP specifications); and
• rules about how to express the constraints in the DCAT-AP specifications in SHACL.

On the importance of providing background knowledge with the SHACL templates for validation, the SEMIC Team proposed two sets of solutions.

• For the DCAT-AP specification: to create SHACL constraints for class membership in a separate file and create options in the DCAT-AP validator with/without class-membership;
• For the implementation part: the implementation could publish its constraints and assumptions against which it validates the input. This can be done by an aggregated SHACL file based on the DCAT-AP SHACL files.
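
The first proposal could be pictured roughly as follows. This Python sketch is only an assumption about how a "with/without class-membership" option might be wired up: the shape records, the two lists and `select_shapes` are hypothetical stand-ins for the real SHACL/Turtle files, which are not reproduced here.

```python
# Hypothetical stand-ins for two separate shape files; real shapes are SHACL/Turtle.
CORE_SHAPES = [
    {"path": "dct:title", "check": "minCount", "value": 1},
]
CLASS_MEMBERSHIP_SHAPES = [
    {"path": "dcat:contactPoint", "check": "class", "value": "vcard:Kind"},
]


def select_shapes(include_class_membership):
    """Return the shapes to validate against, depending on the chosen option."""
    shapes = list(CORE_SHAPES)
    if include_class_membership:
        shapes += CLASS_MEMBERSHIP_SHAPES
    return shapes


# Validator option "without class-membership": only the core constraints apply.
minimal = select_shapes(include_class_membership=False)

# Validator option "with class-membership": both sets are combined.
full = select_shapes(include_class_membership=True)
print(len(minimal), len(full))
```

An implementation such as a portal could publish its own aggregated list in the same way, which would correspond to the second proposal.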

This was referenced Mar 28, 2020

shacl - minor feedback on dcat-ap_2.0.0_shacl_shapes.ttl #126

Closed

shacl - validating the use of controlled vocabularies #127

Closed

aidig mentioned this issue Apr 29, 2020

Spørgsmål mediatyper (Questions about media types) digst/DCAT-AP-DK#12

Open

init-dcat-ap-de mentioned this issue Oct 12, 2020

SHACL-Webinar: Integrated Background Knowledge & Validating of External Classes #163

Open

bertvannuffelen mentioned this issue Jul 8, 2021

announcement > updated SHACL shapes #193

Closed

bertvannuffelen added the component:shacl label Jul 8, 2021

bertvannuffelen added the status:wont-fix label Sep 3, 2021

bertvannuffelen closed this as completed Dec 10, 2021
