Regression training limit #413

Merged 19 commits on Oct 8, 2019
Commits
e4b8a92
refactored maxTrainingSample get and set function so that all classes…
AdamChit Sep 20, 2019
2170254
added downsampling logic if MaxTrainingSample reached
AdamChit Sep 20, 2019
dff09b9
added unit tests for downsampling in regression data splitter
AdamChit Sep 20, 2019
14c6b42
added integration tests to test downsampling from the model selector …
AdamChit Sep 20, 2019
722341b
style changes
AdamChit Oct 3, 2019
34d5bf1
changed the test to reduce run time
AdamChit Oct 3, 2019
8e2778d
Merge branch 'master' into achit/regression-training-limit
AdamChit Oct 3, 2019
433d483
test now checks all data splitter params
AdamChit Oct 4, 2019
0932810
Merge branch 'achit/regression-training-limit' of https://github.com/…
AdamChit Oct 4, 2019
2b02f8a
Update core/src/test/scala/com/salesforce/op/stages/impl/tuning/DataS…
AdamChit Oct 4, 2019
0521a37
Update core/src/test/scala/com/salesforce/op/stages/impl/regression/R…
AdamChit Oct 4, 2019
962e06f
added downSampleFraction default value and made style changes
AdamChit Oct 4, 2019
80a80d5
Merge branch 'achit/regression-training-limit' of https://github.com/…
AdamChit Oct 4, 2019
0ab4d9a
renamed test
AdamChit Oct 4, 2019
ef4327c
changed getDownSampleFraction to protected
AdamChit Oct 5, 2019
8ca0e78
name change based on RP comments
AdamChit Oct 5, 2019
8e67f27
added datacount to summary
AdamChit Oct 5, 2019
009706d
Trigger re-build
AdamChit Oct 7, 2019
cfbe22f
Trigger travis re-build
AdamChit Oct 7, 2019
changed the test to reduce run time
AdamChit committed Oct 3, 2019
commit 34d5bf1fd50b851547883b1abbc84229da146a35
@@ -134,11 +134,16 @@ class RegressionModelSelectorTest extends FlatSpec with TestSparkContext
implicit val e1 = Encoders.tuple(Encoders.scalaDouble, vectorEncoder)
val maxTrainingSample = 100
val sampleF = maxTrainingSample / dataCount.toDouble
val downSampleFraction = if (sampleF < 1) sampleF else 1
val downSampleFraction = if (maxTrainingSample < dataCount) sampleF else 1
val dataSplitter = DataSplitter(maxTrainingSample = maxTrainingSample, seed = seed, reserveTestFraction = 0.0)
val modelSelector = RegressionModelSelector.withCrossValidation(Option(dataSplitter), seed = seed)
val modelSelector =
RegressionModelSelector.withTrainValidationSplit(
modelTypesToUse = Seq(RMT.OpLinearRegression),
dataSplitter = Option(dataSplitter),
seed = seed)
val model = modelSelector.setInput(label, features).fit(data)
val metaData = ModelSelectorSummary.fromMetadata(model.getMetadata().getSummaryMetadata())

val modelDownSampleFraction = metaData.dataPrepParameters("downSampleFraction" )

modelDownSampleFraction shouldBe downSampleFraction
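
The expected value this test compares against can be reproduced without Spark. The following is a minimal sketch, not the DataSplitter implementation from this PR; `DownSampleSketch` and `expectedDownSampleFraction` are hypothetical names introduced only for illustration, and the rule they encode is taken from the test code shown in the diff.

```scala
// Hypothetical helper (not part of the PR): mirrors how the test derives the
// expected fraction from maxTrainingSample and the number of rows.
object DownSampleSketch {
  def expectedDownSampleFraction(maxTrainingSample: Int, dataCount: Long): Double = {
    require(maxTrainingSample > 0 && dataCount > 0, "counts must be positive")
    // Downsample only when the data exceeds the limit; otherwise keep every row.
    if (maxTrainingSample < dataCount) maxTrainingSample / dataCount.toDouble else 1.0
  }

  def main(args: Array[String]): Unit = {
    println(expectedDownSampleFraction(100, 1000L)) // limit exceeded: fraction below 1
    println(expectedDownSampleFraction(100, 50L))   // under the limit: fraction is 1.0
  }
}
```

With `maxTrainingSample = 100` as in the test, any dataset larger than 100 rows yields a fraction below 1, which is the value the test expects to find under `downSampleFraction` in the model's data preparation parameters.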