How best to handle model failure with certain parameter sets? #134

Closed
LindsayLBE opened this issue Feb 14, 2017 · 9 comments

Comments

@LindsayLBE

I'm wondering how best to handle cases where some of the saltelli-generated input parameter sets cause the model to fail, resulting in no output. How will this interfere with the calculation of sensitivity indices using the Sobol method?

@willu47
Member

willu47 commented Feb 14, 2017

Hi @LindsayLBE. This is indeed a problem which I encountered working with the Method of Morris. Check Section 3.4.4.1 (p. 64) of my thesis for a brief discussion of this and links to a few sources.

@jdherman
Member

Thanks Will. @LindsayLBE yes it will interfere with the Sobol indices calculations -- it will probably make them return NaN, if I had to guess.

My suggested fixes are just hacks; Will might have better options:

  • If you can afford to do the model runs again, try to find the offending values and reduce the parameter ranges so that they don't occur anymore.

  • If you can't afford to redo the model runs, and need to just manipulate the data, I'd suggest replacing the missing values with the mean of the other values. This will affect the indices slightly but hopefully not too much.

  • The one thing you can't do, at least in the case of Sobol, is remove them completely. The sampling depends on a very specific order of parameter sets that is expected to be preserved. If you were using a different method that works with "given data" (i.e. an arbitrary number and ordering of samples), like the Delta method, then you would be able to just remove these samples and calculate the indices using the remaining ones.

Hope that helps,
Jon
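To illustrate why rows cannot simply be dropped for Sobol, here is a minimal NumPy sketch. It assumes (to my understanding of the Saltelli layout, not confirmed in this thread) that each base sample contributes one contiguous block of 2D+2 rows when second-order indices are requested, or D+2 rows otherwise. The helper name `failed_blocks` is hypothetical and not part of SALib; it only reports which blocks contain a failed run.

```python
import numpy as np

def failed_blocks(Y, num_vars, calc_second_order=True):
    """Return indices of the base-sample blocks that contain a failed (NaN) run.

    Assumption: the output vector Y is ordered in contiguous blocks, one per
    base sample, of length 2*D + 2 (second order) or D + 2 (first order only).
    """
    Y = np.asarray(Y, dtype=float)
    step = 2 * num_vars + 2 if calc_second_order else num_vars + 2
    if Y.size % step != 0:
        # Deleting individual rows typically ends up here: the block
        # structure no longer lines up with the number of outputs.
        raise ValueError("length of Y is not a multiple of the block size")
    failed = np.isnan(Y).reshape(-1, step)   # one row per base sample
    return np.flatnonzero(failed.any(axis=1))

# Toy example: D = 3, so each block has 8 rows; 2 base samples -> 16 outputs.
Y = np.arange(16, dtype=float)
Y[10] = np.nan                               # a single failed model run
print(failed_blocks(Y, num_vars=3).tolist()) # [1]
```

Under that assumption, a single failed run leaves a hole inside a block that the analysis expects to be complete, which is why deleting the row shifts every later sample out of position.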

@juancastilla

juancastilla commented Mar 10, 2018

@jdherman @willu47: I am currently facing a similar problem.

I have a (groundwater) model that may or may not converge depending on parameter combinations that are not entirely trivial. That is, it is not straightforward to fix by simply adjusting the max/min of the prior distributions, because there are interaction effects (so hack #1 will not work). I do have control over what the model outputs for infeasible parameter sets: this can be either NaNs or an unrealistic simulated value that will yield a low likelihood (e.g., sim_value=-9999).

Although the sim_value=-9999/NaN approach can work in the case of MCMC sampling (parameter estimation), I think it can/will corrupt the whole SA.

From your previous comments, it seems that for FAST (and Morris, and Sobol) this may be a show-stopper, as these methods rely on a very specific sampling strategy. Can anyone shed some light on whether the “missing” infeasible samples (parameter sets) will corrupt/invalidate the FAST analysis? Roughly 15% of my samples are infeasible, so for 10,000 samples I get around 8,500 valid samples. FAST will still produce sensitivity indices from the 8,500 samples, but are they reliable?

@jdherman, for hack #2, what exactly do you mean by “replacing the missing values with the mean of the other values”? Could you please provide a minimal example of how to do this?

Thanks!

@jdherman
Member

I believe FAST would also have a problem with missing data (like Sobol or Morris), because the samples are generated in a specific order, and the final step of the analysis expects that order to be preserved.

Hack 2 would be something like:

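# Y: model output vector (float NumPy array); failed runs are flagged as -9999
failed = (Y == -9999)
# replace the failed entries with the mean of the valid ones
Y[failed] = Y[~failed].mean()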

This will probably lead to an underestimate of the total variance, but at least would preserve the mean. And it would allow the rest of the analysis to continue.
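For reference, a slightly more defensive version of the same hack is sketched below. It assumes failures may be flagged either as NaN or as a sentinel value (the `-9999` from the comment above), and it works on a copy so the raw outputs are kept. The function name `fill_failures_with_mean` is made up for this example. The last lines show the variance shrinkage mentioned above on a toy vector.

```python
import numpy as np

def fill_failures_with_mean(Y, sentinel=-9999.0):
    """Return (filled copy of Y, boolean mask of failed runs).

    Failed runs are those equal to `sentinel` or NaN; they are replaced
    by the mean of the successful runs. Row order is left untouched.
    """
    Y = np.asarray(Y, dtype=float)
    failed = np.isnan(Y) | (Y == sentinel)
    if failed.all():
        raise ValueError("every model run failed; nothing to average")
    filled = Y.copy()
    filled[failed] = Y[~failed].mean()
    return filled, failed

# Toy output vector with one sentinel failure and one NaN failure
Y = np.array([1.0, 2.0, -9999.0, np.nan, 3.0])
filled, failed = fill_failures_with_mean(Y)

print(filled.mean() == Y[~failed].mean())   # the mean is preserved
print(filled.var() < Y[~failed].var())      # the variance shrinks
```

Keeping the `failed` mask around makes it easy to report what fraction of the sample was imputed alongside the indices.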

This is an important question and I don't know of any official way around it. So if anyone reading this has an idea, please let us know!

@jdherman
Member

jdherman commented Aug 2, 2018

Here is another idea, suggested by @dmey in #206. This is probably better than filling missing values with the mean:

"""
I couldn't find much about this but I think that using the mean to fill missing values will introduce bias and may under/overestimate the impact of certain parameters over others.

A different approach may be to fill missing values due to model failures by interpolating the values from successes originating from samples close to those that lead the model to fail.

In other words, given a model g(u,w) where u and w are model parameters, and Y is the model-output vector, we could say that at the failure instance g', corresponding to model-output value Y', value Y' can be generated by interpolating the results from those parameters that are in closest proximity to those used when the model failed. In effect, you generate missing values by weighting the results from parameters closest in space with those leading the model to fail.
"""
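One way to read this suggestion is inverse-distance weighting over the nearest successful runs. The sketch below is only that: an interpretation, not something proposed verbatim in #206. It assumes failures are marked as NaN, that `X` is the sample matrix in its original order, and that `bounds` holds the [min, max] of each parameter so distances can be computed on a common 0 to 1 scale. The function name `fill_failures_by_idw` and the default `k=5` are arbitrary choices for illustration.

```python
import numpy as np

def fill_failures_by_idw(X, Y, bounds, k=5, eps=1e-12):
    """Fill NaN outputs from the k nearest successful runs.

    Each failed run gets the inverse-distance-weighted average of the
    outputs of its k nearest successful neighbours in (rescaled)
    parameter space. Row order is preserved.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    bounds = np.asarray(bounds, dtype=float)

    # Rescale every parameter to [0, 1] so no single parameter dominates
    Xs = (X - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])

    failed = np.isnan(Y)
    ok = ~failed
    if not ok.any():
        raise ValueError("every model run failed; nothing to interpolate from")

    X_ok, Y_ok = Xs[ok], Y[ok]
    k = min(k, Y_ok.size)
    filled = Y.copy()
    for i in np.flatnonzero(failed):
        dist = np.linalg.norm(X_ok - Xs[i], axis=1)
        nearest = np.argsort(dist)[:k]
        weights = 1.0 / (dist[nearest] + eps)   # closer runs count more
        filled[i] = np.sum(weights * Y_ok[nearest]) / np.sum(weights)
    return filled

# Toy example: the failed run sits at the centre of four successful runs
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]])
Y = np.array([0.0, 1.0, 1.0, 2.0, np.nan])
filled = fill_failures_by_idw(X, Y, bounds=[[0, 1], [0, 1]], k=4)
```

Like the mean fill, this still smooths the output near the failure region, so it is worth comparing the resulting indices against the mean-filled ones before trusting either.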

@jdherman
Member

jdherman commented Nov 7, 2019

Reopening this as an issue to be resolved (or at least better dealt with) in v2.0.

@baherehvojdani

Hello,
I don't understand which benchmark I can use in place of Y = Ishigami.evaluate(param_values) for a model with 5 inputs.
I want to generate the Sobol sequences and then predict Y1 and Y2 for each row of the Sobol sequence matrix with my ML model. Once that is done, how can I analyze the results with a Sobol analysis?

@ConnectedSystems
Member

Hi @baherehvojdani

It seems your question is unrelated to this issue. Could you open a new one? I will try to assist you there.

@baherehvojdani

Hi @baherehvojdani

It seems your question is unrelated to this issue. Could you open a new one and I will try to assist you there.

Thanks, I want to learn how to use SALib.
