
export to rdf #54

Closed · wants to merge 56 commits from the export_to_rdf branch

Conversation

@aklawonn (Collaborator):

In this branch, probeye's export-to-rdf feature should be developed. This feature should allow exporting a well-defined inference problem to an RDF file containing data triples that relate to an ontology for parameter estimation problems (see probeye/parameter_estimation_ontology.owl) and describe the problem unambiguously.

@aklawonn added the enhancement label on Nov 25, 2021
@aklawonn marked this pull request as ready for review on Nov 28, 2021, 12:30
@aklawonn marked this pull request as draft on Nov 28, 2021, 12:54
@aklawonn marked this pull request as ready for review on Nov 30, 2021, 07:09
@aklawonn marked this pull request as draft on Nov 30, 2021, 07:09
@aklawonn (Collaborator, Author) commented on Mar 15, 2022:

Most of the RDF export work is done at this point; this refers to the export of the problem definition. The inference data is not yet part of the export framework. However, the export methods (and frankly also some other methods) are not well tested yet, so I'm going to work on the testing framework now.

@joergfunger (Member) left a comment:

I will separately look into the OWL file first, but there are general questions related to the implementation (that is why I did not review everything yet). In addition, the RDF export seems to be mixed up with other changes.

CHANGELOG.md — review comment resolved (outdated)
@@ -150,13 +150,15 @@ def response(self, inp: dict) -> dict:
problem.add_parameter(
"std_noise",
"likelihood",
domain="(0, +oo)",
@joergfunger (Member):
Looks like a weird composition, +oo. Even though I understand this, wouldn't it be better to pass a (lambda) function that evaluates to True inside the domain and False outside? That would be much more flexible than pure intervals.

@aklawonn (Collaborator, Author):
True. But this is the definition part of the problem, where computing methods should be avoided. Providing a specific function might also lead to problems between numpy.arrays and torch.tensors when used by the solvers. The advantage of intervals is their simplicity and hence their easier definition in the ontology. Most application cases should be covered by the interval option.

@joergfunger (Member):
There are some applications where the constraints actually include more than one parameter (e.g. the parameters should lie inside a circle, or on one side of a hyperplane, ...). That is tricky to model and requires additional effort. If it is just a function of the parameter dict that you could also provide/document as a string (e.g. https://stackoverflow.com/questions/40828921/parsing-a-string-input-into-a-lambda-function-python), it is much less code, much easier for someone to extend, and thus easier to maintain. As for pyro, we should have a short discussion.

@aklawonn (Collaborator, Author):
I think this can be treated as an extension that is mostly independent of this PR. I opened a new issue #73 for that feature.
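For illustration, a minimal sketch of the string-to-callable idea discussed above. The helper name and the use of eval are purely illustrative and not part of probeye's API; a real implementation would need to validate or sandbox the string.

```python
def make_domain_check(expr: str):
    """Turn a string like "lambda prms: prms['std_noise'] > 0" into a callable
    domain check (illustrative sketch only; eval on untrusted input is unsafe)."""
    return eval(expr)

# hypothetical usage with a parameter dictionary
check = make_domain_check("lambda prms: prms['std_noise'] > 0")
print(check({"std_noise": 1.3}))   # True
print(check({"std_noise": -0.1}))  # False
```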

Further review comments on probeye/_setup_cfg.py, probeye/definition/inference_problem.py, and probeye/definition/parameter.py were resolved.
    # this is the formula for the variance of the sum of two normal dist.
    var = prms["std_model"] ** 2 + prms["std_measurement"] ** 2
else:
    var = prms["std_model"] ** 2
@joergfunger (Member):
Is that the multiplicative case?

@aklawonn (Collaborator, Author):
No, the additive case. See here.
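For context, the identity behind that line: if the model prediction error and the measurement error are modeled as independent zero-mean Gaussian variables, their sum is again Gaussian with variance

$$\sigma^2 = \sigma_{\mathrm{model}}^2 + \sigma_{\mathrm{measurement}}^2,$$

which is exactly what the snippet above computes as var.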

@joergfunger (Member):
So what is that std_model? Not sure that I understand it. I thought that in the current implementation the model uncertainty is zero. And what exactly is the check for the model std being positive for? By definition, it should never be negative.

@aklawonn (Collaborator, Author):
The model error is meant here as the error between the model response and the corresponding experimental data; a more elaborate name is the "model prediction error". The checks were defined before parameter domains were introduced, because the samplers could propose negative values for std parameters, which would then lead to an error.

@joergfunger (Member):
What are std_model and std_measurement?

@@ -2187,8 +2164,10 @@ def loglike(
if std_meas <= 0:
return worst_value
else:
# consistent with tripy interface
std_meas = None
# in case of zero-residuals, a value of std_meas = 0 leads to a covariance
@joergfunger (Member):
Why is that based on the residuals and not on the value of the function?
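For context, a minimal sketch of a Gaussian log-likelihood evaluated on residuals (purely illustrative; this is neither probeye's nor tripy's actual implementation):

```python
import numpy as np

def gaussian_loglike(residuals: np.ndarray, std: float) -> float:
    """Log-likelihood of independent, zero-mean Gaussian residuals with
    standard deviation `std` (illustrative sketch)."""
    n = residuals.size
    var = std ** 2
    return -0.5 * (n * np.log(2 * np.pi * var) + np.sum(residuals ** 2) / var)

# hypothetical usage: residuals = measured data minus model response
print(gaussian_loglike(np.array([0.1, -0.2, 0.05]), std=0.1))
```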

t1 = iri(peo.single_experiment_data_set(exp_name))
t2 = RDF.type
t3 = iri(peo.single_experiment_data_set) # type: Union[URIRef, Literal]
graph.add((t1, t2, t3))
@joergfunger (Member):
Not sure if you copied that from van Dung, but I had the impression that owlready2 allows you to instantiate objects and only afterwards export them to the graph. Here, you add all the triples directly.
https://owlready2.readthedocs.io/en/latest/class.html#creating-individuals

@aklawonn (Collaborator, Author) commented on Mar 24, 2022:
Yes, I add the triples directly because I think this is the most transparent approach. The add function is also not an owlready2 function but is defined in rdflib, see here.
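For reference, a minimal sketch of the rdflib pattern used here; the namespace and the individual name are illustrative, not the ones generated by this PR:

```python
from rdflib import Graph, Namespace, RDF

# illustrative namespace, mirroring the parameter estimation ontology IRI
PEO = Namespace("https://www.parameter_estimation_ontology.org#")

graph = Graph()
individual = PEO["my_experiment_data_set"]  # illustrative individual IRI
graph.add((individual, RDF.type, PEO.single_experiment_data_set))

print(graph.serialize(format="turtle"))
```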

@joergfunger (Member) left a comment:
The ontology is quite complex, and I have just started to look into the class definitions. From my point of view, it might be possible to remove some of the classes, reducing complexity and allowing others to better understand the concepts.


<!-- https://www.parameter_estimation_ontology.org#Gaussian_likelihood_model -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#Gaussian_likelihood_model">
@joergfunger (Member):
Why do we need to define all classes separately? Why not, e.g.,
likelihood_model has_distribution distribution
distribution is_type normal
distribution has_correlation correlation_function
correlation_function depends_on time
correlation_function depends_on space
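Expressed with rdflib, the flattened modelling suggested above would look roughly like this (namespace and property names are purely illustrative):

```python
from rdflib import Graph, Namespace

EX = Namespace("https://example.org/peo#")  # illustrative namespace
g = Graph()

g.add((EX.likelihood_model, EX.has_distribution, EX.distribution))
g.add((EX.distribution, EX.is_type, EX.normal))
g.add((EX.distribution, EX.has_correlation, EX.correlation_function))
g.add((EX.correlation_function, EX.depends_on, EX.time))
g.add((EX.correlation_function, EX.depends_on, EX.space))
```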


<!-- https://www.parameter_estimation_ontology.org#array_constant -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#array_constant">
@joergfunger (Member):
Why do we need to define a new class for constants?


<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#constant">
<rdfs:subClassOf rdf:resource="https://purl.obolibrary.org/obo/IAO_0000029"/>
<obo:IAO_0000115 xml:lang="en">A numeral that specifies the number(s) it denotes by their value(s).</obo:IAO_0000115>
@joergfunger (Member):
The explanation is not clear.


<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#correlation_model">
<rdfs:subClassOf rdf:resource="https://www.parameter_estimation_ontology.org#mathematical_function"/>
<obo:IAO_0000115 xml:lang="en">A mathematical function that models correlation.</obo:IAO_0000115>
@joergfunger (Member):
Is that really a function?


<!-- https://www.parameter_estimation_ontology.org#matrix_constant -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#matrix_constant">
@joergfunger (Member):
Isn't being constant a property rather than a new class?


<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#probability_distribution_function">
<rdfs:subClassOf rdf:resource="https://www.parameter_estimation_ontology.org#mathematical_function"/>
<obo:IAO_0000115 xml:lang="en">A mathematical function that results in some measure related to probability.</obo:IAO_0000115>
@joergfunger (Member):
That is not precise; for a mathematical function I would expect a definition of the input space and the output space.


<!-- https://www.parameter_estimation_ontology.org#sensor -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#sensor">
@joergfunger (Member):
Does that mean all variables that are constants are sensors?


<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#single_experiment_data_set">
<rdfs:subClassOf rdf:resource="https://purl.obolibrary.org/obo/IAO_0000100"/>
<obo:IAO_0000115 xml:lang="en">A data set that is an aggregate of data recordings all of which have in common that they have been measured in a single experiment.</obo:IAO_0000115>
@joergfunger (Member):
From my point of view, it is really difficult to define objectively what a single experiment is. If you perform a test series (an experiment) on 5 different materials, perform 3 different tests each, and use 5 specimens extracted from one bulk material (of the 5 materials), what is the number of experiments?


<!-- https://www.parameter_estimation_ontology.org#triple_constant -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#triple_constant">
@joergfunger (Member):
The name is not very intuitive, and I'm also not sure why it is needed.


<!-- https://www.parameter_estimation_ontology.org#uninformative_probability_density_function -->

<owl:Class rdf:about="https://www.parameter_estimation_ontology.org#uninformative_probability_density_function">
@joergfunger (Member):
Does an uninformative pdf (as a term) really exist, or is it in the implementation rather a specific distribution with special choices of the parameters to make it noninformative? (I guess the term is rather noninformative than uninformative.)

@aklawonn (Collaborator, Author) commented on Mar 24, 2022:
It's a thing of its own. The term "uninformative prior" is used, for example, here.

@joergfunger (Member):
The term is used in the literature, but in the end there is always a specific distribution behind it. IMO, when creating the distribution, there could be a class that creates a non-informative distribution (e.g. via an argument); the return value should, however, be the precise distribution that has been chosen. (Otherwise the description is not reproducible.)

@aklawonn (Collaborator, Author):
So the uninformative prior is only relevant for maximum likelihood fits. An uninformative prior is equivalent to no prior being defined. When performing a maximum likelihood fit, the user does not have to specify a prior; internally, this is handled as an uninformative prior.

@joergfunger (Member) commented on Mar 25, 2022:
I would not fully agree. If you perform maximum likelihood, the prior is not used at all, i.e. it should not matter what distribution you are using. If you perform a full Bayesian approach, you have to specify a prior with a distribution function. There are different options discussed in the literature, but they all have an underlying (flat) distribution that in particular depends on the scaling of the variable: what could be considered a non-informative prior for a parameter x might not be a non-informative prior for a variable scaled by some factor, e.g. y = 1e-6 * x (e.g. when changing from Pa to MPa). In bayem, they used shape=1.0e-6, scale=1e6 for a non-informative Gamma distribution.
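For illustration, a short sketch of that bayem choice using scipy.stats (the shape/scale values are the ones quoted above; everything else is illustrative):

```python
from scipy.stats import gamma

# Gamma prior with shape=1e-6 and scale=1e6, i.e. mean = shape * scale = 1
prior = gamma(a=1.0e-6, scale=1.0e6)

# With shape -> 0 and scale -> inf the density behaves roughly like 1/x over a
# wide range, i.e. it is approximately flat in log(x), which is why it is often
# used as a vague ("non-informative") prior for precision-like parameters.
for x in (1e-3, 1e0, 1e3, 1e6):
    print(x, prior.pdf(x))
```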

@aklawonn marked this pull request as ready for review on Mar 25, 2022, 08:20
@aklawonn mentioned this pull request on Mar 27, 2022
@aklawonn closed this on May 5, 2022
@aklawonn deleted the export_to_rdf branch on Jul 1, 2022, 06:54
Labels: enhancement (New feature or request)
Projects: none yet
Development: no linked issues
3 participants