Revert back to Spark 2.3 #399

Merged 36 commits into master from mt/revert-spark-2.4 on Sep 4, 2019

@tovbinm (Collaborator) commented Aug 30, 2019

Related issues
We are not ready for Spark 2.4 (#327)

Describe the proposed solution
Reverting to Spark 2.3 for now.
I will raise a separate PR with the Spark 2.4 upgrade so that it is ready to go when needed.

Describe alternatives you've considered
N/A

tovbinm and others added 30 commits May 30, 2019 13:48
… made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
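
The pruning behaviour described in that commit message can be illustrated with a toy sketch. This is not Spark's implementation: `Node` and `prune` are hypothetical names introduced here, and the tree is a plain Python structure, not a Spark ML model. It only shows the rule as stated, namely that a split whose two children end up as leaves with the same prediction is collapsed into a single leaf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a node with no children is a leaf."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def prune(node: Node) -> Node:
    """Collapse splits whose two children are leaves with equal predictions.

    Works bottom-up, so a collapse deep in the tree can make the parent
    split redundant as well.
    """
    if node.is_leaf:
        return node
    left, right = prune(node.left), prune(node.right)
    if left.is_leaf and right.is_leaf and left.prediction == right.prediction:
        # Both sides of the split agree, so the split carries no information.
        return Node(prediction=left.prediction)
    return Node(node.prediction, left, right)


# A split where both sides predict 1.0 disappears after pruning.
redundant = Node(0.0, Node(1.0), Node(1.0))
# A split where the sides disagree is kept.
useful = Node(0.0, Node(0.0), Node(1.0))

print(prune(redundant).is_leaf)
print(prune(useful).is_leaf)
```

Under this rule, a test that expects a split on a given feature (such as feature 'b' in the updated bucketizer test) needs data for which the two sides of the split predict different values.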
@codecov bot commented Aug 30, 2019

Codecov Report

Merging #399 into master will decrease coverage by 12.02%.
The diff coverage is 92.85%.

@@             Coverage Diff             @@
##           master     #399       +/-   ##
===========================================
- Coverage   86.89%   74.87%   -12.03%     
===========================================
  Files         337      337               
  Lines       11076    11054       -22     
  Branches      351      590      +239     
===========================================
- Hits         9625     8277     -1348     
- Misses       1451     2777     +1326
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...sforce/op/stages/impl/selector/ModelSelector.scala | 98.18% <ø> (ø) | ⬆️ |
| ...sforce/op/stages/OpPipelineStageReaderWriter.scala | 86.2% <ø> (ø) | ⬆️ |
| ...sql/execution/datasources/csv/CSVSchemaUtils.scala | 100% <ø> (ø) | ⬆️ |
| ...la/com/salesforce/op/utils/spark/RichDataset.scala | 87.09% <ø> (+0.94%) | ⬆️ |
| ...m/salesforce/op/stages/OpPipelineStageReader.scala | 59.09% <0%> (+2.56%) | ⬆️ |
| ...ce/op/stages/impl/classification/OpLinearSVC.scala | 77.27% <100%> (ø) | ⬆️ |
| ...ges/sparkwrappers/specific/OpPredictionModel.scala | 100% <100%> (ø) | ⬆️ |
| ...ala/com/salesforce/op/utils/io/csv/CSVToAvro.scala | 87.87% <100%> (ø) | ⬆️ |
| ...ages/impl/regression/OpDecisionTreeRegressor.scala | 53.84% <100%> (+3.84%) | ⬆️ |
| ...rce/op/stages/impl/regression/OpGBTRegressor.scala | 50% <100%> (-3.34%) | ⬇️ |
| ... and 105 more | | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51037a8...2f25962. Read the comment docs.
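
The figures in the coverage diff above can be re-derived from the hit and line counts. The short Python check below is a sketch: `coverage_pct` and `floor2` are helper names introduced here, and the rounding rule (flooring to two decimals) is an assumption inferred from the printed numbers, not documented Codecov behaviour. Under that assumption it would also account for the report showing 12.02% in the summary line and -12.03% in the table.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines / total lines."""
    return 100.0 * hits / lines


def floor2(x: float) -> float:
    """Floor to two decimals (assumed rounding rule, not confirmed)."""
    return math.floor(x * 100) / 100


# Counts copied from the report; hits + misses should equal lines.
master = {"lines": 11076, "hits": 9625, "misses": 1451}
pr_399 = {"lines": 11054, "hits": 8277, "misses": 2777}

for side in (master, pr_399):
    assert side["hits"] + side["misses"] == side["lines"]

before = coverage_pct(master["hits"], master["lines"])
after = coverage_pct(pr_399["hits"], pr_399["lines"])

print(floor2(before))          # 86.89
print(floor2(after))           # 74.87
print(floor2(after - before))  # -12.03 (signed change, floored)
print(floor2(before - after))  # 12.02 (magnitude of the decrease, floored)
```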

@gerashegalov (Contributor) left a comment

LGTM

@tovbinm tovbinm merged commit 95a77b1 into master Sep 4, 2019
@tovbinm tovbinm deleted the mt/revert-spark-2.4 branch September 4, 2019 05:10
@tovbinm tovbinm restored the mt/revert-spark-2.4 branch September 4, 2019 05:10
tovbinm added a commit that referenced this pull request Sep 4, 2019
@gerashegalov gerashegalov mentioned this pull request Sep 8, 2019
gerashegalov added a commit that referenced this pull request Sep 11, 2019
Bug fixes:
- Ensure correct metrics despite model failures on some CV folds [#404](#404)
- Fix flaky `ModelInsight` tests [#395](#395)
- Avoid creating `SparseVector`s for LOCO [#377](#377)

New features / updates:
- Model combiner [#385](#385)
- Added new sample for HousingPrices [#365](#365)
- Test to verify that custom metrics appear in model insight metrics [#387](#387)
- Add `FeatureDistribution` to `SerializationFormat`s [#383](#383)
- Add metadata to `OpStandadrdScaler` to allow for descaling [#378](#378)
- Improve json serde error in `evalMetFromJson` [#380](#380)
- Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354)
- Making model selectors robust to failing models [#372](#372)
- Use compact and compressed model json by default [#375](#375)
- Descale feature contribution for Linear Regression & Logistic Regression [#345](#345)

Dependency updates:   
- Update tika version [#382](#382)
@koertkuipers

Curious to know why we are not ready for Spark 2.4? I didn't observe any issues.

@tovbinm (Collaborator, Author) commented Sep 16, 2019

The main suite of products that use TransmogrifAI @ Salesforce requires Spark 2.3. Once they are ready to upgrade, we will move to 2.4.

@tovbinm tovbinm deleted the mt/revert-spark-2.4 branch September 16, 2019 17:28
nicodv added a commit that referenced this pull request Feb 19, 2020
* Revert "Revert back to Spark 2.3 (#399)"

This reverts commit 95a77b1.

* Update to Spark 2.4.3 and XGBoost 0.90

* special double serializer fix

* fix serialization

* fix serialization

* docs

* fixed missing value for test

* meta fix

* Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'

* fix params meta test

* Fixed failing xgboost test

* ident

* cleanup

* added dataframe reader and writer extensions

* added const

* cherrypick fixes

* added xgboost params + update models to use public predict method

* blarg

* double ser test

* update mleap and spark testing base

* Update README.md

* type fix

* bump minor version

* Update Spark version in the README

* bump version

* Update build.gradle

* Update pom.xml

* set correct json4s version

* upgrade helloworld deps

* upgrade notebook deps on TMog and Spark

* bump to version 0.7.0 for Spark update

* align helloworld dependencies

* align helloworld dependencies

* get -> getOrElse with exception

* fix helloworld compilation

* Spark 2.4.5

* Spark 2.4.5

* Spark 2.4.5

* Update OpTitanicSimple.ipynb

* Update OpIris.ipynb

* Revert "Spark 2.4.5"

This reverts commit b3c0a74.

* Revert "Spark 2.4.5"

This reverts commit f4ab3fd.

* Revert "Spark 2.4.5"

This reverts commit 50d9dfb.

* Revert "Update OpTitanicSimple.ipynb"

This reverts commit 3417972.

* Revert "Update OpIris.ipynb"

This reverts commit df38bcc.

Co-authored-by: Christopher Suchanek <[email protected]>
Co-authored-by: Kevin Moore <[email protected]>
Co-authored-by: Nico de Vos <[email protected]>
nicodv added a commit that referenced this pull request Jun 11, 2020
* Revert "Revert back to Spark 2.3 (#399)"

This reverts commit 95a77b1.

* Update to Spark 2.4.3 and XGBoost 0.90

* special double serializer fix

* fix serialization

* fix serialization

* docs

* fixed missing value for test

* meta fix

* Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'

* fix params meta test

* Fixed failing xgboost test

* ident

* cleanup

* added dataframe reader and writer extensions

* added const

* cherrypick fixes

* added xgboost params + update models to use public predict method

* blarg

* double ser test

* update mleap and spark testing base

* Update README.md

* type fix

* bump minor version

* Update Spark version in the README

* bump version

* Update build.gradle

* Update pom.xml

* set correct json4s version

* upgrade helloworld deps

* upgrade notebook deps on TMog and Spark

* bump to version 0.7.0 for Spark update

* align helloworld dependencies

* align helloworld dependencies

* get -> getOrElse with exception

* fix helloworld compilation

* style

* WIP release notes

* TMog version bump

* update release notes

* update release notes

* updates to changelog

* updates to changelog

* updates to changelog

* updates to changelog

* updates to changelog

* updates to changelog

* fix changelog

* fix changelog

* keep helloworld on 0.6.1 until release

Co-authored-by: Matthew Tovbin <[email protected]>
Co-authored-by: Matthew Tovbin <[email protected]>
Co-authored-by: Christopher Suchanek <[email protected]>
Co-authored-by: Kevin Moore <[email protected]>
Co-authored-by: Matthew Tovbin <[email protected]>
@salesforce-cla

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.

@salesforce-cla

Thanks for the contribution! It looks like @Jauntbox is an internal user so signing the CLA is not required. However, we need to confirm this.
