Changing blocklist policy for Sequence stages #524

michaelweilsalesforce · 2020-10-26T21:31:54Z

Related issues
Example: if a SequenceEstimator/Transformer with input features Seq(f1, f2, f3) has f1 in the blocklist, the SequenceEstimator/Transformer is removed when the DAG is updated, because Seq(f2, f3) and Seq(f1, f2, f3) don't have the same size.
However, sequence stages should tolerate one or more of their original inputs being missing.

Describe the proposed solution
When updating the DAG, sequence stages whose updated input list is non-empty will be kept, with the blocklisted inputs removed.
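A minimal sketch of the proposed policy, using a simplified and hypothetical model of DAG stages (`Stage`, `FixedArityStage`, `SequenceStage` and `BlocklistPolicy.updateInputs` are illustrative names, not TransmogrifAI's API). Fixed-arity stages are dropped when blocklisting changes their input count; sequence stages are kept as long as at least one input survives.

```scala
// Hypothetical, simplified model of DAG stages; NOT the TransmogrifAI API.
sealed trait Stage { def name: String; def inputs: Seq[String] }
final case class FixedArityStage(name: String, inputs: Seq[String]) extends Stage
final case class SequenceStage(name: String, inputs: Seq[String]) extends Stage

object BlocklistPolicy {
  /** Returns the stage with blocklisted inputs removed, or None if the stage must be dropped. */
  def updateInputs(stage: Stage, blocklist: Set[String]): Option[Stage] = {
    val remaining = stage.inputs.filterNot(blocklist.contains)
    stage match {
      // Sequence stages accept any number of inputs: keep them while at least one input survives.
      case s: SequenceStage =>
        if (remaining.nonEmpty) Some(s.copy(inputs = remaining)) else None
      // Fixed-arity stages cannot run with a different number of inputs, so they are dropped.
      case f: FixedArityStage =>
        if (remaining.size == f.inputs.size) Some(f) else None
    }
  }
}
```

With this policy, a `SequenceStage` over Seq(f1, f2, f3) and a blocklist of Set(f1) is kept with inputs Seq(f2, f3), and it is only removed when all three inputs are blocklisted.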


Additional context
This problem was discovered when we observed a result feature in the blocklist after updating the DAG. We acknowledge that the policy could alternatively be changed in RawFeatureFilter.

salesforce-cla · 2020-10-26T21:32:00Z

Thanks for the contribution! It looks like @mweilsalesforce is an internal user so signing the CLA is not required. However, we need to confirm this.

codecov · 2020-10-26T21:56:14Z

Codecov Report

❗ No coverage uploaded for pull request base (master@13ad9cd).
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master     #524   +/-   ##
=========================================
  Coverage          ?   86.73%           
=========================================
  Files             ?      347           
  Lines             ?    11961           
  Branches          ?      630           
=========================================
  Hits              ?    10374           
  Misses            ?     1587           
  Partials          ?        0

Impacted Files	Coverage Δ
.../src/main/scala/com/salesforce/op/OpWorkflow.scala	`88.59% <100.00%> (ø)`
...sforce/op/stages/base/unary/UnaryTransformer.scala	`100.00% <0.00%> (ø)`
...op/stages/impl/tuning/OpTrainValidationSplit.scala	`100.00% <0.00%> (ø)`
...ala/com/salesforce/op/test/TempDirectoryTest.scala	`82.00% <0.00%> (ø)`
...esforce/op/stages/impl/CheckIsResponseValues.scala	`75.00% <0.00%> (ø)`
...ala/com/salesforce/op/utils/io/csv/CSVToAvro.scala	`87.87% <0.00%> (ø)`
...ala/com/salesforce/op/testkit/RandomIntegral.scala	`100.00% <0.00%> (ø)`
...com/salesforce/op/test/TestOpWorkflowBuilder.scala	`100.00% <0.00%> (ø)`
.../salesforce/op/stages/impl/tuning/DataCutter.scala	`97.22% <0.00%> (ø)`
...com/salesforce/op/testkit/ProbabilityOfEmpty.scala	`100.00% <0.00%> (ø)`
... and 338 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 13ad9cd...8c2ab5e.

tovbinm · 2020-10-26T22:01:32Z

core/src/main/scala/com/salesforce/op/OpWorkflow.scala

@@ -149,7 +150,21 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore {
 }
 val inputsChanged = blocklistRemoved.map{ f => allUpdated.find(u => u.sameOrigin(f)).getOrElse(f) }
 val oldOutput = stg.getOutput()
- Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match {
+ Try(stg match {
+ case s: SequenceEstimator[_, _] if (inputsChanged.size > 0) => {


Please create a helper function.

tovbinm · 2020-10-26T22:01:54Z

core/src/main/scala/com/salesforce/op/OpWorkflow.scala

@@ -149,7 +150,21 @@ class OpWorkflow(val uid: String = UID[OpWorkflow]) extends OpWorkflowCore {
 }
 val inputsChanged = blocklistRemoved.map{ f => allUpdated.find(u => u.sameOrigin(f)).getOrElse(f) }
 val oldOutput = stg.getOutput()
- Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match {
+ Try(stg match {
+ case s: SequenceEstimator[_, _] if (inputsChanged.size > 0) => {


`inputsChanged.size > 0` can be written as `inputsChanged.nonEmpty`.
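For reference, `size > 0`, `!isEmpty` and `nonEmpty` return the same result for any Scala collection. `nonEmpty` states the intent directly, and on a `List` it also avoids `size`, which has to walk the whole list. A small standalone check (the `EmptinessCheck` object is purely illustrative):

```scala
// Three equivalent emptiness checks discussed in review; `nonEmpty` is the idiomatic one.
object EmptinessCheck {
  def viaSize(xs: Seq[String]): Boolean = xs.size > 0        // on a List, size is O(n)
  def viaNotIsEmpty(xs: Seq[String]): Boolean = !xs.isEmpty  // correct, but a double negative
  def viaNonEmpty(xs: Seq[String]): Boolean = xs.nonEmpty    // O(1) and reads as intended
}
```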

michaelweilsalesforce · 2020-10-26T22:14:29Z

My code still needs some improvement.
However, do you agree with this change in principle?

michaelweilsalesforce · 2020-10-29T20:35:09Z

@nicodv @Jauntbox Please review. Thanks.

leahmcguire · 2020-10-30T18:56:05Z

core/src/main/scala/com/salesforce/op/OpWorkflow.scala

+ case s @ (_ : SequenceEstimator[_, _] | _ : SequenceTransformer[_, _]) if !inputsChanged.isEmpty => {
+ s.set(s.inputFeatures, inputsChanged.map(TransientFeature(_))).setOutputFeatureName(oldOutput.name)
+ // Resetting Metadata
+ s.setMetadata(new MetadataBuilder().build())


You cannot destroy the metadata on a sequence estimator like this. It is better to set the inputs first and then call:

val newMeta = s.getMetadata()
s.setMetadata(newMeta)

That doesn't fix my issue. Setting the inputs a second time does not remove the metadata related to the features that were blocklisted. This is a problem for stages that rely on column metadata, e.g., SanityChecker.

If the problem is with a specific stage, then I suggest fixing the getMetadata call on that stage so that it recreates the metadata with the new input features taken into account. Resetting the metadata to empty will break everything that relies on metadata to work, for example all the model info and feature importance calculations.
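A minimal sketch of the suggested direction, again using a hypothetical model rather than TransmogrifAI's real classes (`ColumnMeta` and `SketchSequenceStage` are illustrative, and the sketch assumes one output column per input feature for simplicity). Instead of clearing metadata, the stage derives it from its current inputs on every call, so re-setting the inputs after blocklisting drops the entries for removed features while keeping the ones downstream consumers rely on.

```scala
// Hypothetical model: one metadata entry per output column, tagged with the input feature that produced it.
final case class ColumnMeta(parentFeature: String, index: Int)

final class SketchSequenceStage(private var inputs: Seq[String]) {
  def setInputs(newInputs: Seq[String]): this.type = { inputs = newInputs; this }

  // Metadata is recomputed from the current inputs rather than cached, so it can never
  // describe columns from features that are no longer inputs of this stage.
  def getMetadata: Seq[ColumnMeta] =
    inputs.zipWithIndex.map { case (feature, i) => ColumnMeta(feature, i) }
}
```

The design choice is that metadata becomes a function of the inputs rather than separately stored state, which is what removes the need to reset it by hand.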

nicodv

Looks reasonable, but I don't know enough about the consequences of changing the metadata.

nicodv · 2020-10-30T22:46:44Z

core/src/main/scala/com/salesforce/op/OpWorkflow.scala

- Try(stg.setInputFeatureArray(inputsChanged).setOutputFeatureName(oldOutput.name).getOutput()) match {
+ Try(stg match {
+ // For Sequence stages, we still want to keep them and remove the blocklisted inputs
+ case s @ (_ : SequenceEstimator[_, _] | _ : SequenceTransformer[_, _]) if !inputsChanged.isEmpty => {


`!inputsChanged.isEmpty` can be written as `inputsChanged.nonEmpty`.

leahmcguire

Setting the metadata to empty is not the answer; you need to fix the metadata creation on the stage that is not working correctly.

michaelweilsalesforce · 2020-11-18T20:24:12Z

Weird: I am no longer able to reproduce the bug. It might have been fixed. Closing this PR.

Changing blocklist policy for Sequence stages

2dafc2a

michaelweilsalesforce requested review from gerashegalov, leahmcguire, nicodv and tovbinm as code owners October 26, 2020 21:31

salesforce-cla bot added the cla:missing label Oct 26, 2020

Merge branch 'master' into mw/SeRFF

1325913

tovbinm reviewed Oct 26, 2020


michaelweilsalesforce added the question label Oct 26, 2020

mweilsalesforce and others added 2 commits October 27, 2020 15:19

Changing pattern matching and conditions

0a3e312

Merge branch 'master' into mw/SeRFF

6b9b7e3

michaelweilsalesforce added ready for review and removed question labels Oct 28, 2020

leahmcguire reviewed Oct 30, 2020


nicodv reviewed Oct 30, 2020


nonEmpty instead

8b35172

leahmcguire requested changes Nov 9, 2020


Merge branch 'master' into mw/SeRFF

8c2ab5e

michaelweilsalesforce requested a review from Jauntbox as a code owner November 18, 2020 18:33

michaelweilsalesforce closed this Nov 18, 2020

tovbinm deleted the mw/SeRFF branch January 18, 2021 05:48
