-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporate name detection into SmartTextVectorizer #508
Conversation
…onoids instead of custom accumulators
… Still need to fix HLL serialization in Spark issue
…ix printing bug for NameDetectStats
…stead of moments of text length; Still need to fix no moments higher than the 1st being calculated
…Cleaned up test code
Right now, this is just compiling and passing tests so far. I still have to give it a pass to clean things up before it's ready for a real review |
Codecov Report
@@ Coverage Diff @@
## master #508 +/- ##
==========================================
+ Coverage 79.15% 86.73% +7.57%
==========================================
Files 347 347
Lines 11851 11897 +46
Branches 384 602 +218
==========================================
+ Hits 9381 10319 +938
+ Misses 2470 1578 -892
Continue to review full report at Codecov.
|
// val valueStats: Dataset[Array[TextStats]] = dataset.map( | ||
// _.map(TextStats.computeTextStats(_, shouldCleanText, shouldTokenizeForLengths, maxCard)).toArray | ||
// ) | ||
// val aggregatedStats: Array[TextStats] = valueStats.reduce(_ + _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor stuff - otherwise looks good
import com.salesforce.op.utils.spark.{OpVectorColumnMetadata, OpVectorMetadata} | ||
>>>>>>> 4d461814d1791ef7e6dd14b262cf6a8a303ffcde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fix merge conflict
Related issues
Re-opening of #456 on branch directly on TransmogrifAI, please see that PR for historical discussion (I'll close once this one gets merged in).
Describe the proposed solution
Adds parameters for ignoring fields identified as personal names to SmartTextVectorizer
Describe alternatives you've considered
N/A
Additional context
N/A