EconomicRatio PostProcessor Enhancements #1763

dgarrett622 · 2022-02-03T18:04:54Z

Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

#1752

What are the significant changes in functionality due to this change request?

Metrics from BasicStatistics are requestable using EconomicRatio alone
Standard error for percentile (in BasicStatistics) implemented including addition of percent to percentile standard error variable name for clarity - e.g., percentile_10_ste_x
Standard error for value at risk (in EconomicRatio) implemented
Added threshold to variable names valueAtRisk (including standard error), expectedShortfall, gainLossRatio, and sortinoRatio for clarity (EconomicRatio) - e.g., VaR_0.05_NPV, VaR_0.05_ste_NPV, es_0.10_NPV, gainLoss_zero_NPV, sort_median_NPV

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

1. Review all computer code.
2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True.
6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
8. If an analytic test is changed/added is the the analytic documentation updated/added?
9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) have been changed, the associated documentation must be reviewed and assured the text matches the example.

…hecking no need to overload

…tics metrics

…nomicRatio and not throw warnings

…er and inherit

…rtinoRatio, gainLossRatio, and expectedShortfall

…iles

… EconomicRatio

dgarrett622 · 2022-02-03T18:06:14Z

I think tests will need to be modified or added for BasicStatistics, but I don't know which ones. I did not find tests for EconomicRatio, so suggestions on good tests to write are appreciated.

PaulTalbot-INL

This looks great! Nice implementation of the percentile STE and callback to BasicStatistics.

A couple comments and one change suggestion to review.

framework/Models/PostProcessors/BasicStatistics.py

framework/Models/PostProcessors/EconomicRatio.py

PaulTalbot-INL · 2022-02-03T19:06:19Z

I think tests will need to be modified

You can find the BasicStatistics with standard error tests in raven/tests/framework/PostProcessors/BasicStatistics, see for example ste_mc.xml. You should be able to add percentile STE to that test, probably.

For EconomicRatio, I sent an email seeing if anyone knows where one is; it might need to be created new.

PaulTalbot-INL · 2022-02-03T21:36:22Z

Code changes are good, we just need a test to cover EconomicRatio.

I note that several tests using BasicStatistics are failing due to the added percentile STE column, so we'll have to regold those after checking they're the same otherwise.

PaulTalbot-INL · 2022-02-09T15:09:09Z

Notification to plugin developers should be sent noting that use of the BasicStatistics is producing more information than it used to, which may fail some tests using the output from that postprocessor.

dgarrett622 · 2022-02-09T15:15:01Z

@wangcj05 do we want to send an email to the user group about the new percentile standard error?

wangcj05 · 2022-02-09T16:00:18Z

@dgarrett622 Email to the user group is optional, we are required to send the email when there is a defect in the code that will cause incorrect results.

wangcj05 · 2022-02-16T03:49:11Z

framework/Models/PostProcessors/BasicStatistics.py

+ kde = stats.gaussian_kde(group.values, weights=targWeight)
+ val = calculatedPercentiles[target].sel(**{'percent': pct, self.pivotParameter: label}).values
+ subPercentileSte.append(factor/kde(val)[0])
+ percentileSte.append(subPercentileSte)
+ da = xr.DataArray(percentileSte, dims=('percent', self.pivotParameter), coords={'percent': percent, self.pivotParameter: self.pivotValue})
+ percentileSteSet[target] = da
+ else:
+ calcPercentiles = calculatedPercentiles[target]
+ if targDa.values.min() == targDa.values.max():
+ # distribution is a delta function, so no KDE construction
+ percentileSte = list(np.zeros(calcPercentiles.shape))
+ else:
+ # get KDE
+ kde = stats.gaussian_kde(targDa.values, weights=targWeight)
+ factor = np.sqrt(np.array(percent)*(1.0 - np.array(percent))/en)
+ percentileSte = list(factor/kde(calcPercentiles.values))


@dgarrett622 I have a question regarding the calculation of standard error for percentile, from the reference (see the following snapshot)

The standard normal distribution is used instead of KDE of real data, could you double check with the reference, and let me know what do you think?

@wangcj05 That reference gives an equation for the asymptotic standard error estimate of a percentile assuming the random variable belongs to a normal distribution. There is a more general form of this asymptotic estimate that can be found in various references such as:

https://www.medicine.mcgill.ca/epidemiology/hanley/bios601/DescriptiveStatistics/Var(percentile).pdf

Here is a snapshot of the more general asymptotic estimate from the reference above which I implemented:

The KDE tries to get the real distribution from the data without making the assumption that the data are normally distributed. The percentile, its standard error, and the KDE perform better with more samples, but that was always going to be a limitation when requesting a percentile.

@dgarrett622 Thanks for the reference, as you mentioned and also concluded by the paper, the limitation is the number of samples. I would like this information to be included in the user manual.

I do have some question about your implementation. In the code, you are using gaussian_kde which is based on gaussian kernels. In this case, it seems the two equations from the different papers are equivalent. Basically, you can transform your fitted PDF to standard normal distribution, and you will get the same equation. If this is the case, I prefer to use the standard normal distribution to compute the error. The reason for that is: you do not need to fit your data with gaussian_kde model for target variable. Especially when you have a larger sample size, the fitting will take more time to process.

Do I understand correctly then that you should only use the generic formula when you have a great many samples, which will take forever due to creating the kernel?

I do agree that generally creating the kernel is costly and we've often found ways to work around it in RAVEN for that reason. I did not check the time impact of the percentile_ste addition; we really need some timing tests in RAVEN.

@wangcj05 The two equations are not equivalent. Using a Gaussian kernel is not the same as fitting a Gaussian to the data. The Gaussian kernel is like smoothing or interpolating to fit a distribution to the real data. Here is an example of using a Gaussian kernel to find a distribution that is definitely not Gaussian:

https://gsalvatovallverdu.gitlab.io/python/kernel_density_estimation/

With the distribution fit to the data becoming:

The number of samples is an issue here, both in accuracy and computational cost. Determining the percentile itself requires many samples, especially if the requested percentile is in the tail of the distribution. The more samples you take, the more time it takes to compute the percentile since we do sorting on the data itself. The limitation in the paper applies to the calculation of the percentile itself and the standard error calculation, whether the standard normal formulation or the KDE method is used since it is based on number of samples. However, the standard normal formulation assumes that the random variable is distributed normally while the KDE method makes no assumption about the distribution.

An alternate method would be to use bootstrapped samples, but again there are concerns about computational time with that method.

dgarrett622 added 13 commits January 26, 2022 14:30

inherited collectOutput method has more comprehensive error/warning c…

b541c73

…hecking no need to overload

adding steVals from BasicStatistics to EconomicRatio

4f14d9c

update getInputSpecification

f8ec547

inherited inputToInternal is more comprehensive, use it

9be219a

update initialize method to include _ste metadata and all BasicStatis…

f3a8b91

…tics metrics

updating _handleInput to allow BasicStatistics metrics through to Eco…

f3d5d40

…nomicRatio and not throw warnings

updating handling of targets and run method

67547d9

update __runLocal to handle vectorVals in BasicStatistics

2abf634

__computePower copy/pasted from BasicStatistics change to _computePow…

49cf5d6

…er and inherit

adding standard error calculation of percentiles

6f0b5a2

adding threshold name to variable names percentile, value at risk, so…

f8cac8a

…rtinoRatio, gainLossRatio, and expectedShortfall

adding standard error calculation of value at risk similar to percent…

387791d

…iles

update user manual to include requesting BasicStatistics metrics from…

c676d0f

… EconomicRatio

dgarrett622 requested review from PaulTalbot-INL and wangcj05 February 3, 2022 18:04

dgarrett622 self-assigned this Feb 3, 2022

PaulTalbot-INL requested changes Feb 3, 2022

View reviewed changes

dgarrett622 added 4 commits February 3, 2022 12:17

removing empty list as optional argument, replaced with None

199977a

adding reference for standard error calculations

063b945

adding docstring to constructor

8f0fe30

renaming tealVals to econVals since they don't come from TEAL

9338dde

dgarrett622 added 6 commits February 4, 2022 09:32

adding defensive check where all values are equal

927597e

update percentile/VaR standard error for data with pivotParameter

7bd6ce1

fixing implementation with DataSet and golding files from tests

141f3b0

remove misleading descriptions using teal

f47a21b

fixing implementation for DataSet output

d374df4

fixing DataSet implementation

5a57e42

dgarrett622 added 2 commits February 8, 2022 10:39

adding tests for EconomicRatio PostProcessor

d7b6b3e

adding additional samples to test

563bfda

dgarrett622 requested a review from PaulTalbot-INL February 9, 2022 01:19

PaulTalbot-INL approved these changes Feb 9, 2022

View reviewed changes

PaulTalbot-INL linked an issue Feb 9, 2022 that may be closed by this pull request

[TASK] Request metrics/statistics from BasicStatistics when using EconomicRatio #1752

Closed

10 tasks

PaulTalbot-INL mentioned this pull request Feb 9, 2022

[TASK] Request metrics/statistics from BasicStatistics when using EconomicRatio #1752

Closed

10 tasks

PaulTalbot-INL merged commit a8f779f into idaholab:devel Feb 9, 2022

wangcj05 added the RAVENv2.2 for RAVENv2.2 Release label Feb 9, 2022

wangcj05 reviewed Feb 16, 2022

View reviewed changes

dgarrett622 deleted the EconomicRatio_enhance branch April 19, 2022 21:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EconomicRatio PostProcessor Enhancements #1763

EconomicRatio PostProcessor Enhancements #1763

dgarrett622 commented Feb 3, 2022 •

edited by PaulTalbot-INL

Loading

dgarrett622 commented Feb 3, 2022

PaulTalbot-INL left a comment

PaulTalbot-INL commented Feb 3, 2022

PaulTalbot-INL commented Feb 3, 2022

PaulTalbot-INL commented Feb 9, 2022

dgarrett622 commented Feb 9, 2022

wangcj05 commented Feb 9, 2022

wangcj05 Feb 16, 2022

dgarrett622 Feb 16, 2022

wangcj05 Feb 16, 2022

PaulTalbot-INL Feb 16, 2022

dgarrett622 Feb 16, 2022

EconomicRatio PostProcessor Enhancements #1763

EconomicRatio PostProcessor Enhancements #1763

Conversation

dgarrett622 commented Feb 3, 2022 • edited by PaulTalbot-INL Loading

Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

What are the significant changes in functionality due to this change request?

For Change Control Board: Change Request Review

dgarrett622 commented Feb 3, 2022

PaulTalbot-INL left a comment

Choose a reason for hiding this comment

PaulTalbot-INL commented Feb 3, 2022

PaulTalbot-INL commented Feb 3, 2022

PaulTalbot-INL commented Feb 9, 2022

dgarrett622 commented Feb 9, 2022

wangcj05 commented Feb 9, 2022

wangcj05 Feb 16, 2022

Choose a reason for hiding this comment

dgarrett622 Feb 16, 2022

Choose a reason for hiding this comment

wangcj05 Feb 16, 2022

Choose a reason for hiding this comment

PaulTalbot-INL Feb 16, 2022

Choose a reason for hiding this comment

dgarrett622 Feb 16, 2022

Choose a reason for hiding this comment

dgarrett622 commented Feb 3, 2022 •

edited by PaulTalbot-INL

Loading