How best to handle model failure with certain parameter sets? #134

LindsayLBE · 2017-02-14T15:50:20Z

I'm wondering how best to handle cases where some of the Saltelli-generated input parameter sets cause the model to fail, resulting in no output. How will this interfere with calculating sensitivity indices using the Sobol method?

willu47 · 2017-02-14T17:37:15Z

Hi @LindsayLBE. This is indeed a problem which I encountered working with the Method of Morris. Check Section 3.4.4.1 (p. 64) of my thesis for a brief discussion of this and links to a few sources.

jdherman · 2017-02-14T17:52:54Z

Thanks Will. @LindsayLBE yes it will interfere with the Sobol indices calculations -- it will probably make them return NaN, if I had to guess.

My suggested fixes are just hacks, Will might have better options:

1. If you can afford to do the model runs again, try to find the offending values and reduce the parameter ranges so that they don't occur anymore.
2. If you can't afford to redo the model runs, and need to just manipulate the data, I'd suggest replacing the missing values with the mean of the other values. This will affect the indices slightly but hopefully not too much.
3. The one thing you can't do, at least in the case of Sobol, is remove them completely. The sampling depends on a very specific order of parameter sets that is expected to be preserved. If you were using a different method that works with "given data" (i.e. an arbitrary number and ordering of samples), like the Delta method, then you would be able to just remove these samples and calculate the indices using the remaining ones.
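
To illustrate why dropping rows is a problem, here is a minimal sketch using only the standard library. It assumes the outputs are laid out in consecutive blocks of 2*D + 2 values per base sample (a Saltelli sample with second-order terms has N*(2*D + 2) rows); the ordering inside each block and the helper name `split_blocks` are assumptions made for illustration, not SALib's actual code.

```python
def split_blocks(Y, D):
    """Split model outputs into per-base-sample blocks of 2*D + 2 values.

    Assumed layout of each block (illustrative): A, AB_1..AB_D, BA_1..BA_D, B.
    """
    step = 2 * D + 2
    if len(Y) % step != 0:
        raise ValueError(
            f"{len(Y)} outputs is not a multiple of {step}; rows were added or dropped"
        )
    return [Y[i:i + step] for i in range(0, len(Y), step)]


D, N = 3, 4
Y = [float(i) for i in range(N * (2 * D + 2))]  # stand-in outputs, one per sample row

blocks = split_blocks(Y, D)
first_of_each_block = [b[0] for b in blocks]    # the "A" run of every block

# Dropping a single failed run breaks the layout for the whole vector.
Y_dropped = Y[:5] + Y[6:]
try:
    split_blocks(Y_dropped, D)
    layout_ok = True
except ValueError:
    layout_ok = False
```

Even if the length happened to stay a multiple of the block size (for example, after dropping exactly 2*D + 2 scattered rows), every output after the first removed row would be paired with the wrong parameter set.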

Hope that helps,
Jon

juancastilla · 2018-03-10T13:01:46Z

@jdherman @willu47: I am currently facing a similar problem.

I have a (groundwater) model that may or may not converge depending on the parameter combination, and the pattern is not trivial: there are interaction effects, so it cannot be fixed by simply adjusting the min/max of the prior distributions (hack #1 will not work). I do have control over what the model outputs for infeasible parameter sets: either NaN or an unrealistic simulated value that will yield a low likelihood (e.g., sim_value=-9999).

Although the sim_value=-9999/NaN approach can work in the case of MCMC sampling (parameter estimation), I think it can/will corrupt the whole SA.
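
Whichever marker is chosen, the useful property is that every sampled row still gets exactly one output, in the original order, so failures can be located and handled later. A minimal sketch of such a wrapper follows; `run_model` is a hypothetical stand-in that fails on an interaction between its two parameters, and `evaluate_all` is an illustrative name, not part of any library.

```python
import math


def run_model(params):
    """Hypothetical stand-in model: fails to 'converge' when x * y > 1."""
    x, y = params
    if x * y > 1.0:
        raise RuntimeError("did not converge")
    return x + 2.0 * y


def evaluate_all(samples):
    """Return one output per sample row, in order; failed runs become NaN."""
    outputs = []
    for params in samples:
        try:
            outputs.append(run_model(params))
        except RuntimeError:
            outputs.append(math.nan)
    return outputs


samples = [(0.5, 0.5), (2.0, 1.0), (1.0, 0.2)]
Y = evaluate_all(samples)
failed_rows = [i for i, y in enumerate(Y) if math.isnan(y)]
```

Using NaN rather than -9999 has one practical advantage: a forgotten NaN propagates visibly through later arithmetic, whereas a forgotten -9999 silently distorts means and variances.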

From your previous comments, it seems that for FAST (as for Morris and Sobol) this may be a show-stopper, since the method relies on a very specific sampling strategy. Can anyone shed some light on whether the “missing” infeasible samples (parameter sets) will corrupt/invalidate the FAST analysis? Roughly 15% of my samples are infeasible, so for 10,000 samples I get around 8,500 valid ones. FAST will still produce sensitivity indices from the 8,500 samples, but are they reliable?

@jdherman, for hack #2, what exactly do you mean by “replacing the missing values with the mean of the other values”? Could you please provide a minimal example of how to do this?

Thanks!

jdherman · 2018-03-10T23:45:10Z

I believe FAST would also have a problem with missing data (like Sobol or Morris), because the samples are generated in a specific order, and the final step of the analysis expects that order to be preserved.

Hack 2 would be something like:

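# Y: model output vector (NumPy array); failed runs are flagged with -9999
failed = Y == -9999
# replace the failed runs with the mean of the valid outputs
Y[failed] = Y[~failed].mean()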

This will probably lead to an underestimate of the total variance, but at least would preserve the mean. And it would allow the rest of the analysis to continue.
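
A small worked example of both effects, written with plain lists and the standard library so it runs without extra dependencies. The function name `fill_failed_with_mean` and the toy numbers are illustrative. It also treats NaN as a failure marker, which needs an explicit check because NaN compares unequal to every value, including -9999.

```python
import math
import statistics


def fill_failed_with_mean(Y, sentinel=-9999.0):
    """Return a copy of Y with failed runs (NaN or sentinel) set to the mean of valid runs."""
    def is_failed(y):
        return math.isnan(y) or y == sentinel

    valid = [y for y in Y if not is_failed(y)]
    if not valid:
        raise ValueError("no valid model outputs to average")
    mean_valid = statistics.fmean(valid)
    return [mean_valid if is_failed(y) else y for y in Y]


Y = [1.0, 3.0, -9999.0, 5.0, float("nan"), 7.0]
Y_filled = fill_failed_with_mean(Y)

valid_only = [1.0, 3.0, 5.0, 7.0]
var_valid = statistics.pvariance(valid_only)   # variance of the runs that succeeded
var_filled = statistics.pvariance(Y_filled)    # smaller: filled points add no spread
```

The length and order of the output vector are unchanged and the mean stays at the mean of the valid runs, but the variance of the filled vector is lower than that of the valid runs alone.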

This is an important question and I don't know of any official way around it. So if anyone reading this has an idea, please let us know!

jdherman · 2018-08-02T16:06:42Z

Here is another idea suggested by @dmey in #206. This is probably better than filling missing values with the mean:

"""
I couldn't find much about this but I think that using the mean to fill missing values will introduce bias and may under/overestimate the impact of certain parameters over others.

A different approach may be to fill missing values due to model failures by interpolating the values from successes originating from samples close to those that lead the model to fail.

In other words, given a model g(u,w) where u and w are model parameters, and Y is the model-output vector, we could say that at the failure instance g', corresponding to model-output value Y', value Y' can be generated by interpolating the results from those parameters that are in closest proximity to those used when the model failed. In effect, you generate missing values by weighting the results from parameters closest in space with those leading the model to fail.
"""

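One possible reading of this suggestion is inverse-distance weighting over the k nearest successful runs. The sketch below is only that: the choice of weighting scheme, the defaults `k=3` and `power=2.0`, and the name `fill_failed_by_neighbours` are all assumptions, and the parameters should be scaled to comparable ranges before distances are computed.

```python
import math


def fill_failed_by_neighbours(X, Y, k=3, power=2.0):
    """Fill NaN entries of Y by inverse-distance weighting of the k nearest successful runs.

    X is a list of parameter vectors, Y the matching outputs with NaN marking failures.
    """
    ok = [i for i, y in enumerate(Y) if not math.isnan(y)]
    if not ok:
        raise ValueError("no successful runs to interpolate from")
    filled = list(Y)
    for i, y in enumerate(Y):
        if not math.isnan(y):
            continue
        # (distance, index) pairs for the k closest successful runs
        nearest = sorted((math.dist(X[i], X[j]), j) for j in ok)[:k]
        if nearest[0][0] == 0.0:
            filled[i] = Y[nearest[0][1]]  # identical parameters: reuse that output
            continue
        weights = [1.0 / d ** power for d, _ in nearest]
        filled[i] = sum(w * Y[j] for w, (_, j) in zip(weights, nearest)) / sum(weights)
    return filled


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
Y = [0.0, 1.0, 1.0, 2.0, math.nan]   # g(u, w) = u + w, with the centre run failed
Y_filled = fill_failed_by_neighbours(X, Y, k=4)
```

Only original successes are used as neighbours, so one filled value never feeds into another. Like mean filling, this keeps the sample order intact, but the filled values are still estimates and the resulting indices should be treated with the same caution.
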
jdherman · 2019-11-07T22:34:14Z

Reopening this as an issue to be resolved (or at least better dealt with) in v2.0

baherehvojdani · 2020-08-10T07:07:39Z

Hello,
I don't understand: instead of the Ishigami benchmark, Y = Ishigami.evaluate(param_values), which benchmark can I use for 5 inputs?
I want to generate the Sobol sequences and then predict Y1 and Y2 for each row of the Sobol sequence matrix with my ML model. Once done, how can I analyze the results with a Sobol analysis?

ConnectedSystems · 2020-08-10T07:12:41Z

Hi @baherehvojdani

It seems your question is unrelated to this issue. Could you open a new one and I will try to assist you there.

baherehvojdani · 2020-08-10T07:25:18Z

> Hi @baherehvojdani
>
> It seems your question is unrelated to this issue. Could you open a new one and I will try to assist you there.

Thanks, I want to learn how to use SALib.

willu47 added the question_interpretation label Feb 14, 2017

jdherman mentioned this issue Jul 31, 2018

Handle Model Failures #206

Closed

willu47 mentioned this issue Apr 23, 2019

Failing to calculate sensitivity indexes with NAN values in the model results #237

Closed

jdherman mentioned this issue Jun 18, 2019

what happens if no model output is possible for sample vector? #255

Closed

jdherman mentioned this issue Sep 5, 2019

Method to locate parameter combinations/ranges that cause model failure with Morris? #262

Closed

jdherman closed this as completed Nov 7, 2019

jdherman mentioned this issue Nov 7, 2019

Handling NaN/inf model output #273

Open

willu47 mentioned this issue Apr 27, 2023

Please revise the code for Sobol and fast #566

Closed
