Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid having to have an implicit SparkSession in OpWorkFlow(Model) #468

Merged
merged 1 commit into from
Mar 27, 2020

Conversation

nicodv
Copy link
Contributor

@nicodv nicodv commented Mar 26, 2020

Related issues
N/A

Describe the proposed solution
Avoid having to provide a SparkSession implicit to OpWorkflow.loadModel/OpWorkflowModel.save to keep backward compatibility.

Describe alternatives you've considered
N/A

Additional context
Small follow-up on #467

@codecov
Copy link

codecov bot commented Mar 26, 2020

Codecov Report

Merging #468 into master will increase coverage by 5.74%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #468      +/-   ##
==========================================
+ Coverage   81.25%   86.99%   +5.74%     
==========================================
  Files         345      345              
  Lines       11616    11614       -2     
  Branches      376      374       -2     
==========================================
+ Hits         9438    10104     +666     
+ Misses       2178     1510     -668     
Impacted Files Coverage Δ
.../src/main/scala/com/salesforce/op/OpWorkflow.scala 88.11% <100.00%> (ø)
...main/scala/com/salesforce/op/OpWorkflowModel.scala 93.90% <100.00%> (+14.63%) ⬆️
...cala/com/salesforce/op/OpWorkflowModelReader.scala 95.58% <100.00%> (+11.53%) ⬆️
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 100.00% <100.00%> (ø)
...com/salesforce/op/features/types/FeatureType.scala 95.95% <0.00%> (+1.01%) ⬆️
...scala/com/salesforce/op/features/FeatureLike.scala 43.47% <0.00%> (+1.08%) ⬆️
.../op/features/types/FeatureTypeSparkConverter.scala 99.12% <0.00%> (+1.75%) ⬆️
...scala/com/salesforce/op/testkit/RandomStream.scala 100.00% <0.00%> (+2.38%) ⬆️
.../com/salesforce/op/local/MLeapModelConverter.scala 10.25% <0.00%> (+2.56%) ⬆️
.../scala/com/salesforce/op/dsl/RichTextFeature.scala 81.94% <0.00%> (+2.77%) ⬆️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da52ad9...6714e16. Read the comment docs.

@tovbinm tovbinm merged commit e8f8ec6 into master Mar 27, 2020
@tovbinm tovbinm deleted the ndv/jobgroups-2 branch March 27, 2020 03:26
case Failure(error) => throw new RuntimeException(s"Failed to load Workflow from path '$path'", error)
case Success(wf) => wf
implicit val spark: SparkSession = this.sparkSession
JobGroupUtil.withJobGroup(OpStep.ModelIO) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could just pass sparkSession explicitly instead of creating an implicit val.

* @param path path to save the model and its stages
* @param overwrite should overwrite the destination
*/
def save(model: OpWorkflowModel, path: String, overwrite: Boolean = true): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the scope of this PR did not require it, we should not do breaking changes such as removing public methods.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that one! good catch @gerashegalov

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicodv please revert this part of the change.

Copy link
Contributor Author

@nicodv nicodv Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov / @tovbinm So we are OK with adding an implicit SparkSession to OpWorkflowModel.save() instead? (And having inconsistent approaches between model loading/saving.)

Copy link
Contributor

@gerashegalov gerashegalov Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicodv this is not the choice we have to make. We can make it consistent by just changing saveImpl in class OpWorkflowModelWriter

  override protected def saveImpl(path: String): Unit = {
    JobGroupUtil.withJobGroup(OpStep.ModelIO) {
      sc.parallelize(Seq(toJsonString(path)), 1)
        .saveAsTextFile(OpWorkflowModelReadWriteShared.jsonPath(path), classOf[GzipCodec])
    }(sparkSession)
  }

If you do this and undo the save method move we should be ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I overlooked the saveImpl method. New PR that reverts this: #470

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants