-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added check for multipicklist other class as not being binary in metadata #127
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just two very minor things
@@ -75,7 +75,8 @@ class OpVectorMetadata private | |||
newColumns: Array[OpVectorColumnMetadata] | |||
): OpVectorMetadata = OpVectorMetadata(name, newColumns, history) | |||
|
|||
val categoricalTypes = Seq(FeatureType.typeName[Binary], FeatureType.typeName[BinaryMap]) | |||
private val categoricalTypes = Seq(FeatureType.typeName[Binary], FeatureType.typeName[BinaryMap]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we calling this categoricalTypes
, when many of the actual categorical types (eg. PickList) are not included? Maybe binaryTypes
instead?
@@ -155,6 +161,7 @@ object OpVectorColumnMetadata { | |||
val DescriptorValueKey = "descriptor_value" | |||
val IndicesKey = "indices" | |||
val NullString = "NullIndicatorValue" | |||
val OtherString = "OTHER" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This string is also defined TransmogrifierDefaults
, can we make it so it's only defined in one place? I think it has to live here since it's in utils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the TransmogrifierDefaults to point to this
Codecov Report
@@ Coverage Diff @@
## master #127 +/- ##
==========================================
+ Coverage 86.15% 86.18% +0.02%
==========================================
Files 298 298
Lines 9672 9676 +4
Branches 340 542 +202
==========================================
+ Hits 8333 8339 +6
+ Misses 1339 1337 -2
Continue to review full report at Codecov.
|
@leahmcguire should we add in |
@@ -42,8 +43,11 @@ trait AttributeAsserts { | |||
* @param schema | |||
* @param expectedNominal Expected array of booleans. True if the field is nominal, false if not. | |||
*/ | |||
final def assertNominal(schema: StructField, expectedNominal: Array[Boolean]): Assertion = { | |||
final def assertNominal(schema: StructField, expectedNominal: Array[Boolean], output: Array[OPVector]): Assertion = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update docs for the assertNominal
function
val attributes = AttributeGroup.fromStructField(schema).attributes | ||
output.foreach{ | ||
x => x.value.toArray.zip(expectedNominal).foreach{ case (v , n) => if (n) (v == 0.0 || v == 1.0) shouldBe true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also add a check when n == false
? is so, then better:
for {
x <- output
(value, nominal) <- x.value.toArray.zip(expectedNominal)
} if (nominal) value should (be (0.0) or be (1.0)) else ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are no constraints on if n is false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it can be NaN, -/+Inf as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, some minor comments
Related issues
creation of spark attribute groups was mislabeling multipicklist other column as binary. this made tree models fail
Describe the proposed solution
multipicklist other is now a numeric not binary