Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postprocessor plugin APIs #1512

Merged
merged 30 commits into from
May 4, 2021
Merged

Postprocessor plugin APIs #1512

merged 30 commits into from
May 4, 2021

Conversation

wangcj05
Copy link
Collaborator

@wangcj05 wangcj05 commented Apr 13, 2021


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

see #1486
Closes #1451

What are the significant changes in functionality due to this change request?

Standardize the APIs for Postprocessor Plugin.


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added is the the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) have been changed, the associated documentation must be reviewed and assured the text matches the example.

@wangcj05
Copy link
Collaborator Author

@alfoa @mandd @PaulTalbot-INL I have changed the PostProcessor Plugin base class inheritance. Please take a look, any comments are appreciated.

@PaulTalbot-INL
Copy link
Collaborator

I think this is a good approach.

Note we don't have an automatic system for loading the plugins from outside RAVEN into the factories in RAVEN; I started on it but have not been able to finish that yet.

@ In, paramInput, InputData.ParameterInput, the already parsed input.
@ Out, None
"""
super()._handleInput(paramInput)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only thought from the plugin side is the somewhat confusing usage of the term "input".

In handleInput it means "handle input XML from the RAVEN input file" whereas in createNewInput it means "prepare the input for a single postprocessor run". I wonder how (or even if) we could approach that to make it more clear for plugin developers.

@@ -175,6 +175,8 @@ def createNewInput(self,myInput,samplerType,**kwargs):
a mandatory key is the sampledVars'that contains a dictionary {'name variable':value}
@ Out, myInput, list, the inputs (list) to start from to generate the new one
"""
if 'createNewInput' in dir(self._pp):
myInput = self._pp.createNewInput(myInput,samplerType,**kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the "samplerType" (I know it was present already)? Postprocesessors do not use sampleType or we are thinking here to extend the possibility to have PP behave differently depending on the Sampler that generated the data? (e.g. Sortino -> Sobol, OtherSamplers -> least square of HDMR coefficients, etc,)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering also Paul's comment, I suggest to change the function name "createNewInput(myInput,samplerType,**kwargs)" to something like "processInput(myInput, **kwargs)". The "samplerType" will be removed.

Copy link
Collaborator

@PaulTalbot-INL PaulTalbot-INL Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could clarify which input, like "createPostprocessorInput" or "createModelInput"; I feel like the term "input" is the confusing one; or maybe even "prepareModelInput"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to use "createPostprocessorInput" and "samplerType" is removed.

self.raiseAnError(IOError, "Only PointSet is allowed as classifier, but HistorySet", inputObject.name, "is provided!")
requiredKeys = list(self.mapping.keys()) + [self.label]
for inputDict in currentInput:
print(inputDict['type'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

understandable by this pp.
@ In, currentInput, list, a list of DataObjects
@ Out, newInput, list, list of converted data
Method to identify the inputs for classifier and target, respectively
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you expand the description here? What "identify" means?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@ In, currentInput, list, a list of DataObjects
@ Out, newInput, list, list of converted data
Method to identify the inputs for classifier and target, respectively
@ In, currentInput, list, a list of dictionaries
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list of dictionaries...what is in these? This defines the API (e.g. {'data':array, 'dims':list(dimensions), etc}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this comment, there are few things as confusing as trying to interrogate a dictionary for what should be there. I wonder if we could explicitly list the contents as args and kwargs instead of as a summary input object for the interface objects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

return newInput

def run(self, inputIn):
"""
This method executes the postprocessor action.
@ In, inputIn, list, list of DataObjects
@ In, inputIn, list, list of input dictionaries
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please define what those dictionaries look like (what do we have inside?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. dims, type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -39,6 +39,7 @@
from .ParetoFrontierPostProcessor import ParetoFrontier
from .MCSimporter import MCSImporter
from .EconomicRatio import EconomicRatio
from .RiskMeasuresDiscrete import RiskMeasuresDiscrete
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this "double" import (both in the __init__ and Factory.py).

See my PR about the validation gate, the double import is not needed as long as the imports are in the Factory.py. This is valid for all the Factories that have been "Restructured" with PR #1480

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know, we can do a quick clean up in another PR, since I do not see the double imports get removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but we can remove it from here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the __init__ got cleaned up in previous PR.

# Set default to 'dict', this is consistent with current post-processors
outType = kwargs.get('outType', 'dict')
inputDs.append(inp.asDataset(outType=outType))
elif isinstance(inp, Database):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these checks should be moved at the validation stage in the Models.py (I thought we already have it. Doesn't it work?). Databases are not possible for PP and they should be checked for all of them at once

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

dataDict['outVars'] = self.getVars('output')
dataDict['numberRealization'] = self.size
dataDict['name'] = self.name
dataDict['metaKeys'] = self.getVars('meta')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am conflicted here (I mean for the usage of this for PP). I do not think we should use dictionaries anymore in PP). The data should be provided as Dataset only. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can provide sufficiently clear examples for developers so that Datasets are easy for them to work with for the things they need to do; otherwise, I worry this could be a big barrier to external development, since the xarray datasets can be difficult until you've played with them a bit, at least for me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I leave the option to use either Dict or Dataset. Currently, we only have BasicStatistics to use Dataset. With this option, we can identify the PP that we need to convert to use Dataset, and convert them one by one. Otherwise, we can not merge the PP changes without converting all PPs to use Dataset. In addition, I'm not sure it is worth to convert our PPs to use Dataset if there is no need from our customer.

<Models>
<ExternalModel name='PythonModuleReduced' subType='' ModuleToLoad='modelTHreduced'>
<variables>pump1Time,pump2Time,valveTime,Tmax,outcome,pump1State,pump2State,valveState,failureTime</variables>
</ExternalModel>
<PostProcessor name="riskMeasuresDiscrete" subType="InterfacedPostProcessor">
<PostProcessor name="riskMeasuresDiscrete" subType="RiskMeasuresDiscrete">
<method>riskMeasuresDiscrete</method>
Copy link
Contributor

@joshua-cogliati-inl joshua-cogliati-inl Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method line still needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not needed anymore. removed.

Base automatically changed from wangc/convert_pp_to_interface to devel April 22, 2021 19:42
@wangcj05
Copy link
Collaborator Author

@alfoa This is ready for you to review. Several things to mention:

  1. I have split the post-processor documents (not for all the postprocessors, but I think we should. If you agree, I can do it in another PR)
  2. RavenOut PP is still exist in the manual, I think we do not have the RavenOut PP, and we should remove it, in the codes, tests, and documents.

@wangcj05 wangcj05 added RAVENv2.1 All tasks and defects that will go in RAVEN v2.1 and removed Do Not Merge labels Apr 23, 2021
@wangcj05
Copy link
Collaborator Author

@alfoa Any comments for this PR?

@moosebuild
Copy link

Job Test qsubs on b7266a0 : invalidated by @alfoa

@alfoa
Copy link
Collaborator

alfoa commented May 4, 2021

Approved.

Yes the RavenOut should be removed.

@moosebuild
Copy link

Job Test OpenSUSE Leap 15 on b7266a0 : invalidated by @wangcj05

@alfoa alfoa mentioned this pull request May 4, 2021
10 tasks
@alfoa
Copy link
Collaborator

alfoa commented May 4, 2021

Checklist passed...
Test OpenSUSE Leap 15 has still IT issue (@joshua-cogliati-inl do we have an eta for this?) but the tests have been run locally.

PR can be merged.

@alfoa alfoa merged commit 10eafae into devel May 4, 2021
@alfoa alfoa deleted the wangc/pp_plugin_api_fy21_v3 branch May 4, 2021 23:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RAVENv2.1 All tasks and defects that will go in RAVEN v2.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TASK] PostProcessor Plugin
5 participants