-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing blocklist policy for Sequence stages #524
Conversation
Thanks for the contribution! It looks like @mweilsalesforce is an internal user so signing the CLA is not required. However, we need to confirm this. |
Codecov Report
@@ Coverage Diff @@
## master #524 +/- ##
=========================================
Coverage ? 86.73%
=========================================
Files ? 347
Lines ? 11961
Branches ? 630
=========================================
Hits ? 10374
Misses ? 1587
Partials ? 0
Continue to review full report at Codecov.
|
@@ -149,7 +150,21 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore { | |||
} | |||
val inputsChanged = blocklistRemoved.map{ f => allUpdated.find(u => u.sameOrigin(f)).getOrElse(f) } | |||
val oldOutput = stg.getOutput() | |||
Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match { | |||
Try(stg match { | |||
case s: SequenceEstimator[_, _] if (inputsChanged.size > 0) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please create a helper function
@@ -149,7 +150,21 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore { | |||
} | |||
val inputsChanged = blocklistRemoved.map{ f => allUpdated.find(u => u.sameOrigin(f)).getOrElse(f) } | |||
val oldOutput = stg.getOutput() | |||
Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match { | |||
Try(stg match { | |||
case s: SequenceEstimator[_, _] if (inputsChanged.size > 0) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputsChanged.size > 0
= inputsChanged.nonEmpty
My code still need some improvement. |
case s @ (_ : SequenceEstimator[_, _] | _ : SequenceTransformer[_, _]) if !inputsChanged.isEmpty => { | ||
s.set(s.inputFeatures, inputsChanged.map(TransientFeature(_))).setOutputFeatureName(oldOutput.name) | ||
// Resetting Metadata | ||
s.setMetadata(new MetadataBuilder().build()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you cannot destroy the metadata on a sequence estimator like this - better call the set input and then call
val newMeta = s.getMetadata()
s.setMetadata(newMeta)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't fix my issue. When setting the input a second time, it does not remove the metadata related to features that were blocklisted. This is a problem for stages that rely on column metadata E.g SanityChecker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the problem is with a specific stage then I would suggest that you fix the getMetadata call to that stage so that it will recreate the metadata with the new input features taken into account - doing this will break everything that relies on metadata to work for example all the model info and feature importance caluculations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, but I don't have good knowledge of the consequences of changing the metadata.
Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match { | ||
Try(stg match { | ||
// For Sequence stages, we still want to keep them and remove the blocklisted inputs | ||
case s @ (_ : SequenceEstimator[_, _] | _ : SequenceTransformer[_, _]) if !inputsChanged.isEmpty => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!inputsChanged.isEmpty
=> inputsChanged.nonEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the metadata to empty is not the answer - you need to fix the metadata creation on the stage that is not working correctly
Weird. Not able to reproduce the bug anymore. It might have been fixed. Closing PR |
Related issues
Example : If a SequenceEstimator/Transformer with input features
Seq(f1, f2, f3)
has af1
as a blocklist, then, becauseSeq(f2, f3)
andSeq(f1, f2, f3)
don't have the same size, the SequenceEstimator/Transformer will be removed when updating the DAG.However those sequence stages should ignore if one or more original inputs are missing.
Describe the proposed solution
When updating the DAG, sequence stages with updated inputs with length different than 0 will be kept.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
This problem was addressed when we witnessed a result feature being part of the blocklist after updating the DAG. We acknowledge the possibility to change the policy in RawFeatureFilter.