Avoid having to have an implicit SparkSession in OpWorkFlow(Model) #468
Conversation
… moving job grouping down to reader/writer
Codecov Report
@@            Coverage Diff             @@
##           master     #468      +/-   ##
==========================================
+ Coverage   81.25%   86.99%   +5.74%
==========================================
  Files         345      345
  Lines       11616    11614       -2
  Branches      376      374       -2
==========================================
+ Hits         9438    10104     +666
+ Misses       2178     1510     -668
Continue to review full report at Codecov.
case Failure(error) => throw new RuntimeException(s"Failed to load Workflow from path '$path'", error)
case Success(wf) => wf
implicit val spark: SparkSession = this.sparkSession
JobGroupUtil.withJobGroup(OpStep.ModelIO) {
nit: we could just pass sparkSession explicitly instead of creating an implicit val.
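A hedged sketch of this suggestion, using only the names visible in the snippets in this thread (it assumes, as the saveImpl snippet below suggests, that JobGroupUtil.withJobGroup takes the SparkSession in a second, implicit-capable parameter list; not runnable outside a Spark project):

```scala
// Option 1 (current PR): introduce an implicit val so the second
// parameter list of withJobGroup is filled in implicitly.
implicit val spark: SparkSession = this.sparkSession
JobGroupUtil.withJobGroup(OpStep.ModelIO) {
  // ... perform the model load ...
}

// Option 2 (the nit): skip the implicit and pass the session explicitly.
JobGroupUtil.withJobGroup(OpStep.ModelIO) {
  // ... perform the model load ...
}(this.sparkSession)
```

Both forms resolve to the same call; the explicit form just avoids introducing a new implicit into the enclosing scope.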
* @param path path to save the model and its stages
* @param overwrite should overwrite the destination
*/
def save(model: OpWorkflowModel, path: String, overwrite: Boolean = true): Unit = {
The scope of this PR did not require it; we should not make breaking changes such as removing public methods.
Oh, I missed that one! Good catch, @gerashegalov.
@nicodv please revert this part of the change.
@gerashegalov / @tovbinm So we are OK with adding an implicit SparkSession to OpWorkflowModel.save()
instead? (And having inconsistent approaches between model loading/saving.)
@nicodv this is not the choice we have to make. We can make it consistent by just changing saveImpl in class OpWorkflowModelWriter:
override protected def saveImpl(path: String): Unit = {
  JobGroupUtil.withJobGroup(OpStep.ModelIO) {
    sc.parallelize(Seq(toJsonString(path)), 1)
      .saveAsTextFile(OpWorkflowModelReadWriteShared.jsonPath(path), classOf[GzipCodec])
  }(sparkSession)
}
If you do this and undo the save method move we should be ok
Ah, I overlooked the saveImpl method. New PR that reverts this: #470
Related issues
N/A
Describe the proposed solution
Avoid having to provide a SparkSession implicit to OpWorkflow.loadModel / OpWorkflowModel.save, to keep backward compatibility.
Describe alternatives you've considered
N/A
Additional context
Small follow-up on #467
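For context, a hypothetical usage sketch of the behavior this issue asks to preserve (method names are taken from the PR discussion above; exact signatures are assumed and would need checking against the TransmogrifAI API):

```scala
// Goal: callers should not need an implicit SparkSession in scope
// just to load or save a model -- the job grouping moved into the
// reader/writer, which already holds a session.
val model: OpWorkflowModel = workflow.loadModel("/models/my-model")  // no implicit required
model.save("/models/my-model-copy")                                  // no implicit required
```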