How would one deal with categorical data? #195

bretttully · 2018-04-12T22:06:58Z

Hi guys,
Thanks for a great library. I am building a model that has categorical and continuous parameters. I have read a few papers that suggested you could create a categorical parameter by binning a continuous value (e.g. sample with bounds [-1, 1] and take the true/false category as val >= 0). However, when I do this with Saltelli sampling, I end up with duplicate parameter sets in the sample. Do you have any advice?
Thanks!

jdherman · 2018-04-12T22:40:35Z

Hi Brett,
Thanks for using the library. The approach you described is what I've suggested to people in the past (https://waterprogramming.wordpress.com/2014/02/11/extensions-of-salib-for-more-complex-sensitivity-analyses/).

But I never thought about the fact that it would result in duplicate samples! If the model is very slow, this is clearly not a good use of computing time. But if the model runs reasonably quickly, is there a downside? I'm not sure.

From the perspective of the SA method, a small change in the continuous variable that it samples results in "no change" in the output, because the inputs are rounded. This would be the same situation as if the model itself had some discontinuity, even with continuous inputs. So I'd have to think it's still a valid application of these SA methods.
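To see where the duplicates come from, here is a minimal sketch of cross-sampling in the style of Saltelli's scheme, followed by the `val >= 0` binning from the question. It is an illustration only: it uses plain pseudo-random base points instead of a Sobol sequence, and `saltelli_like_sample` and `to_boolean` are hypothetical helper names, not SALib functions.

```python
import random


def saltelli_like_sample(n_base, n_params, rng):
    """Cross-sampled rows in [-1, 1] (first/total-order layout).

    Stand-in for a real Saltelli sampler: the base points A and B are
    pseudo-random rather than drawn from a Sobol sequence.
    Returns n_base * (n_params + 2) rows.
    """
    rows = []
    for _ in range(n_base):
        a = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        rows.append(a)
        for i in range(n_params):
            ab = list(a)
            ab[i] = b[i]  # copy of A with only parameter i taken from B
            rows.append(ab)
        rows.append(b)
    return rows


def to_boolean(row):
    """Bin each continuous value into True/False at the threshold 0."""
    return tuple(value >= 0 for value in row)


rng = random.Random(0)
samples = saltelli_like_sample(n_base=64, n_params=3, rng=rng)
binned = [to_boolean(row) for row in samples]

n_total = len(binned)
n_unique = len(set(binned))
print(f"{n_total} sampled rows, {n_unique} distinct boolean combinations")
```

Each cross-sampled row differs from its base row in a single parameter, so whenever the swapped value falls in the same bin the binned row is an exact duplicate of the base row. With three booleans there are at most 8 distinct combinations, however many rows are sampled.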

We don't have any sampling methods written for categorical input variables -- mostly because I don't know of any published SA methods that work with them!

bretttully · 2018-04-13T07:23:25Z

Thanks @jdherman -- in case anyone else comes here too, this is what StackOverflow has to say: https://stackoverflow.com/questions/36606101/using-python-salib-saltelli-sample-method-with-boolean-or-discrete-input-paramet

bretttully · 2018-04-13T08:17:25Z

While you are here, I was wondering how you would approach selecting a model in the following scenario:

9 parameters: 5 are boolean, 1 is choice of three, and 3 are float ranges
Each run of the simulation is about 75-90 hours of compute, so we can run this at most 500 times
There isn't a physical model that we can build to understand how the parameters should behave
We are trying to find the parameter set that minimises a cost function
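For a mixed parameter set like the one listed above, the binning idea from earlier in the thread can be sketched as a single decoding step applied to each sampled row. This assumes every parameter is sampled on [0, 1]; the choice labels and float bounds below are hypothetical placeholders, since the thread does not give the real ones.

```python
# Hypothetical labels and bounds; replace with the real ones for your model.
CHOICES = ("a", "b", "c")
FLOAT_BOUNDS = [(0.0, 1.0), (10.0, 20.0), (-5.0, 5.0)]


def decode(row):
    """Map 9 values sampled on [0, 1] to 5 booleans, 1 choice and 3 floats."""
    if len(row) != 9:
        raise ValueError("expected 9 sampled values")
    # Booleans: split the unit interval in half.
    flags = [u >= 0.5 for u in row[:5]]
    # Three-way choice: three equal-width bins; u == 1.0 goes to the last bin.
    index = min(int(row[5] * len(CHOICES)), len(CHOICES) - 1)
    # Floats: rescale linearly to each parameter's own range.
    floats = [low + u * (high - low)
              for u, (low, high) in zip(row[6:], FLOAT_BOUNDS)]
    return flags + [CHOICES[index]] + floats


example = decode([0.1, 0.9, 0.5, 0.49, 0.0, 0.7, 0.0, 0.5, 1.0])
print(example)
```

Equal-width bins give each category the same sampling probability under a uniform distribution; unequal bin widths would weight the categories differently.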

Having watched @willu47's talk (https://www.youtube.com/watch?v=gkR_lz5OptU), it seems like Morris is more appropriate than Sobol...?
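As a rough guide to what a 500-run budget allows, the sketch below counts model runs for each method. It assumes the usual sample sizes: N(k + 1) rows for N Morris trajectories, and N(k + 2) or N(2k + 2) rows for Saltelli sampling without or with second-order indices. The function names are hypothetical.

```python
def morris_runs(trajectories, n_params):
    """Model runs for Morris sampling: each trajectory has n_params + 1 points."""
    return trajectories * (n_params + 1)


def saltelli_runs(base_samples, n_params, second_order=True):
    """Model runs for Saltelli sampling with base sample size base_samples."""
    per_base = 2 * n_params + 2 if second_order else n_params + 2
    return base_samples * per_base


budget, k = 500, 9
max_trajectories = budget // (k + 1)        # Morris
max_base_first_total = budget // (k + 2)    # Sobol, first/total order only
max_base_second = budget // (2 * k + 2)     # Sobol, with second order

print("Morris trajectories:", max_trajectories)
print("Saltelli base samples (first/total):", max_base_first_total)
print("Saltelli base samples (with second order):", max_base_second)
```

Under these assumptions the budget covers about 50 Morris trajectories, but only a few dozen Saltelli base samples.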

jdherman · 2018-04-14T21:14:15Z

That's a tough situation. It's true that Morris tends to converge faster than Sobol. But 500 model runs with 9 uncertain parameters is still not very much. You might consider instead using the 500 evaluations to build a surrogate model, then running SA on that.
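A minimal sketch of the surrogate idea, assuming the inputs have been encoded as numbers: fit a cheap interpolator to the expensive runs, then evaluate it as often as the SA method needs. Inverse-distance weighting is used here only because it fits in a few lines; it is one possible surrogate, not a recommendation from this thread, and `expensive_model` is a toy stand-in for the real simulation.

```python
import math
import random


def make_idw_surrogate(xs, ys, power=2.0):
    """Return a function that interpolates (xs, ys) by inverse-distance weighting."""
    def surrogate(x):
        numerator = 0.0
        denominator = 0.0
        for xi, yi in zip(xs, ys):
            distance = math.dist(x, xi)
            if distance == 0.0:
                return yi  # exact hit on a training point
            weight = distance ** -power
            numerator += weight * yi
            denominator += weight
        return numerator / denominator
    return surrogate


def expensive_model(x):
    """Toy stand-in for the real simulation."""
    return x[0] + 2.0 * x[1]


rng = random.Random(1)
# A small set of "expensive" runs used as training data.
xs = [(rng.random(), rng.random()) for _ in range(50)]
ys = [expensive_model(x) for x in xs]
surrogate = make_idw_surrogate(xs, ys)

# The surrogate is cheap, so it can be evaluated at many more points.
queries = [(rng.random(), rng.random()) for _ in range(1000)]
predictions = [surrogate(q) for q in queries]
print(min(predictions), max(predictions))
```

The sensitivity indices computed this way describe the surrogate, so they are only as trustworthy as its fit; holding back some of the real runs to check prediction error is a sensible precaution.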

willu47 · 2018-04-15T13:47:09Z

Given the expense of your model, it may be worth starting with a fractional factorial approach. This requires a very small sample for the 9 parameters and will at least tell you something.
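To give a sense of how small such a design can be, here is a sketch of a two-level fractional factorial for nine factors in 16 runs. The generators are a textbook choice and an assumption on my part, not something taken from this thread or from SALib. Two further assumptions: float parameters are set to their low/high bounds, and the three-way choice is either restricted to two of its levels or handled separately, since it does not fit a two-level design.

```python
from itertools import product

BASE_FACTORS = "ABCD"
# Each extra factor is the product of the listed base factors (assumed generators).
GENERATORS = {"E": "ABC", "F": "BCD", "G": "ACD", "H": "ABD", "J": "ABCD"}


def fractional_factorial_9_in_16():
    """Build a 16-run, two-level design for 9 factors, coded as -1 / +1."""
    design = []
    for levels in product((-1, 1), repeat=len(BASE_FACTORS)):
        run = dict(zip(BASE_FACTORS, levels))
        for name, word in GENERATORS.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[name] = value
        design.append(run)
    return design


design = fractional_factorial_9_in_16()
factors = list(design[0])
for run in design:
    print(" ".join(f"{run[f]:+d}" for f in factors))
```

In this design main effects are aliased with some two-factor interactions, so it screens for important parameters rather than quantifying interactions.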

Another observation is that while your sample may contain duplicate parameter sets, you don't need to run the model again for them; just substitute in the model output from the first run with that combination of variable values.
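Reusing results for duplicate parameter sets can be sketched with a dictionary keyed on the decoded parameters. `run_with_cache` and `toy_model` are hypothetical names, and the toy model stands in for the real simulation.

```python
def run_with_cache(model, param_sets):
    """Evaluate model once per distinct parameter set, reusing earlier results.

    Returns the outputs in the original sample order (as the analysis step
    expects) and the number of model runs actually performed.
    """
    cache = {}
    outputs = []
    for params in param_sets:
        key = tuple(params)
        if key not in cache:
            cache[key] = model(key)
        outputs.append(cache[key])
    return outputs, len(cache)


call_count = 0


def toy_model(params):
    """Toy stand-in for the expensive simulation; counts how often it runs."""
    global call_count
    call_count += 1
    flag, choice = params
    return (10.0 if flag else 0.0) + {"a": 1.0, "b": 2.0, "c": 3.0}[choice]


param_sets = [(True, "a"), (False, "b"), (True, "a"), (True, "c"), (False, "b")]
outputs, n_runs = run_with_cache(toy_model, param_sets)
print(outputs, n_runs)
```

The cache key should be the decoded (binned) parameter values, not the raw continuous sample, because the raw rows will almost never match exactly.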

jdherman added the question label Nov 7, 2019

ConnectedSystems mentioned this issue Sep 10, 2020

Discrete variables concern #341

Closed

ConnectedSystems mentioned this issue Feb 16, 2021

Booleans in bound parameter generates continouos values when using saltelli.sample #400

Closed
