Method to locate parameter combinations/ranges that cause model failure with Morris? #262
Hi @TobiasKAndersen, interesting question. There isn't a way to do this within SALib, but you could consider running a classification method like CART in scikit-learn to find the parameter combinations that cause model failure. First you'd just have to make a 0/1 output column indicating failures, then train CART on that. Hopefully the resulting decision tree would give some insight into what parameter combinations are causing problems. If that doesn't work, or doesn't provide useful insight, the other option you could consider is the Delta-MIM method (in SALib), which does not require a specific sampling scheme and can therefore be used with a set of results from which the failures have been removed. Finally, in this issue thread a while ago we talked about replacing missing values with a placeholder so that the sensitivity indices could still be calculated.
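The CART idea above can be sketched roughly as follows. The data here is synthetic for illustration; in practice `X` would be the SALib sample matrix and `failed` would come from your crash flags. The parameter names `p0`–`p2` and the failure rule are made up for the example.

```python
# Train a decision tree on a 0/1 "failed" column so the tree's splits
# reveal which parameter ranges are associated with model failures.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))      # 500 samples, 3 parameters
failed = (X[:, 0] > 0.8) & (X[:, 1] < 0.2)    # pretend these runs crashed
y = failed.astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned rules; splits near p0 > 0.8 and p1 < 0.2 mark the
# failure region in this synthetic example.
print(export_text(tree, feature_names=["p0", "p1", "p2"]))
```

A shallow `max_depth` keeps the printed rules readable; the point is interpretability, not predictive accuracy.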
Thanks for all the great advice!
@jdherman, I have now implemented the CART algorithm and it works quite well as an exploratory model crash tool. So thanks for the advice!
Cool. One experiment you could try would be to treat your dataset as the "true" one, after removing the crashed samples -- then randomly remove 2% of the samples and see how much the indices change. Do this a couple times to estimate the average effect of removing 2% of samples. But you can't do it with the crashed samples because you need the index values with and without removal.
Perhaps there is something in your model output files you could use to flag failed runs? That is what I have done for this issue previously. Once you have identified the failed runs, you can boolean index your results dataframe or matrix (which in my case I made to also contain the parameters for that run) to identify what combinations of parameters caused a failure. Subsequently plotting the failed combinations in a scatter matrix was helpful in my case.
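A minimal sketch of the boolean-indexing approach above, using a NaN output as a stand-in for whatever the model's output files indicate on a crash. The column names and the failure condition are invented for the example.

```python
# Flag failed runs, select the parameter combinations that produced
# them, and (optionally) plot those combinations in a scatter matrix.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.uniform(size=(300, 3)), columns=["p0", "p1", "p2"])
# Pretend runs with p0 > 0.9 crashed and produced no output:
df["output"] = np.where(df["p0"] > 0.9, np.nan, df["p0"] + df["p1"])

failed = df["output"].isna()                       # boolean index of failed runs
failed_params = df.loc[failed, ["p0", "p1", "p2"]]
print(len(failed_params), "failed runs")

# Scatter matrix of the failed combinations (needs matplotlib installed):
# pd.plotting.scatter_matrix(failed_params)
```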
Thanks for the comments, @jdherman and @spizwhiz. However, I'm not sure I understand "But you can't do it with the crashed samples because you need the index values with and without removal." I first remove the model crashes, then remove 2%, then another 2%, and so on, and analyze the results. Of course I would need to match the removals in both the X and Y dataframes, but do I need the model crash index for anything else?
Sorry, that wasn't clear. I meant that the "2% removal effect" experiment can't actually use the crashed samples at all, because you don't have the objective function values for those. What you said is what I was thinking. You don't need the model crash index. Also, just to be sure, the 2% resampling would be with replacement every time.
On a related question regarding model failure when running Method of Morris:
I am running a process-based, biogeochemical model with 200+ parameters and get around 5% model failures. I was thinking of trying to locate the parameters and/or parameter combinations responsible, as a way to handle/avoid model failure and thereby be able to go forward with the Method of Morris analysis. Does anyone have a good method to locate parameters causing model failure?
(I hope this question is not out of the scope for this board) :-)