Added string transformers for substring search and valid email #265
Codecov Report
@@ Coverage Diff @@
## master #265 +/- ##
==========================================
+ Coverage 86.61% 86.61% +<.01%
==========================================
Files 315 317 +2
Lines 10345 10375 +30
Branches 346 560 +214
==========================================
+ Hits 8960 8986 +26
- Misses 1385 1389 +4
Codecov Report
@@ Coverage Diff @@
## master #265 +/- ##
==========================================
+ Coverage 86.66% 86.67% +<.01%
==========================================
Files 315 317 +2
Lines 10396 10403 +7
Branches 344 557 +213
==========================================
+ Hits 9010 9017 +7
Misses 1386 1386
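Codecov's headline number is hits divided by tracked lines, and misses are tracked lines minus hits. The sketch below reproduces the figures in both reports, assuming Codecov's default of rounding coverage down to two decimals. The master figure in the second report (9010 / 10396 ≈ 86.668%) is shown as 86.66%, which fits truncation rather than half-up rounding. `CoverageCheck` is a hypothetical helper written for this check, not part of Codecov.

```scala
object CoverageCheck {
  // Coverage = hits / tracked lines, as a percentage truncated (rounded down) to two decimals.
  def coveragePct(hits: Int, lines: Int): BigDecimal =
    (BigDecimal(hits) * 100 / BigDecimal(lines)).setScale(2, BigDecimal.RoundingMode.DOWN)

  // Misses are the tracked lines that were not hit.
  def misses(hits: Int, lines: Int): Int = lines - hits
}
```

For example, `CoverageCheck.coveragePct(9017, 10403)` gives 86.67, matching the head column of the second report.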
@@ -16,6 +16,7 @@ Use TransmogrifAI if you need a machine learning library to:
To understand the motivation behind TransmogrifAI check out these:
- [Open Sourcing TransmogrifAI: Automated Machine Learning for Structured Data](https://engineering.salesforce.com/open-sourcing-transmogrifai-4e5d0e098da2), a blog post by [@snabar](https://github.com/snabar)
- [Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions](https://www.youtube.com/watch?v=93vsqjfGPCw&feature=youtu.be&t=2800), a talk by [@tovbinm](https://github.com/tovbinm)
- [Low Touch Machine Learning](https://www.youtube.com/watch?v=PKTvo9X9Sjg), a talk by [@leahmcguire](https://github.com/leahmcguire)
👍
def toNGramSimilarity(
  that: FeatureLike[T],
  nGramSize: Int = NGramSimilarity.nGramSize,
  toLowerCase: Boolean = TextTokenizer.ToLowercase
Please update the doc on the shortcut.
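For whoever updates the shortcut's doc, a minimal sketch of what a character n-gram similarity computes may help. This is not TransmogrifAI's `NGramSimilarity` implementation, which may use a different measure. It assumes Jaccard similarity over sets of character n-grams, and the `NGramSketch` object, its helpers, and the default size of 3 are hypothetical; only the `nGramSize` and `toLowerCase` parameter names mirror the shortcut above.

```scala
object NGramSketch {
  // Contiguous character n-grams of s; a string no longer than n counts as a single gram,
  // so short (or empty) strings can still be compared.
  def nGrams(s: String, n: Int): Set[String] =
    if (s.length <= n) Set(s) else s.sliding(n).toSet

  // Jaccard similarity |A intersect B| / |A union B| of the two n-gram sets, always in [0, 1].
  // The default of 3 is illustrative only, not necessarily NGramSimilarity.nGramSize.
  def nGramSimilarity(a: String, b: String, nGramSize: Int = 3, toLowerCase: Boolean = true): Double = {
    require(nGramSize > 0, "nGramSize must be positive")
    val (x, y) = if (toLowerCase) (a.toLowerCase, b.toLowerCase) else (a, b)
    val ga = nGrams(x, nGramSize)
    val gb = nGrams(y, nGramSize)
    // Both sets are non-empty by construction, so the union is never empty.
    (ga intersect gb).size.toDouble / (ga union gb).size
  }
}
```

With lowercasing on, "Hello" and "hello" score 1.0; with `toLowerCase = false` they share only "ell" and "llo" out of four distinct grams, giving 0.5. That difference is the kind of behavior worth spelling out in the shortcut's doc.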
class ValidEmailTransformer(uid: String = UID[ValidEmailTransformer]) extends
  UnaryTransformer[Email, Binary](operationName = "isValidEmail", uid = uid) {
  override def transformFn: Email => Binary = (in: Email) => {
    if (in.isEmpty) None
Use Binary.empty; this would avoid an unnecessary allocation, i.e.
if (in.isEmpty) Binary.empty else Some(in.prefix.nonEmpty && in.domain.nonEmpty).toBinary
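A self-contained sketch of the suggested shape, for readers without TransmogrifAI on the classpath. `Email` and `Binary` here are hypothetical, simplified stand-ins for the real feature types, and the prefix/domain split rule (exactly one '@' with non-empty text on both sides) is an assumption, not necessarily how TransmogrifAI's `Email.prefix` and `Email.domain` behave. The point it illustrates is reusing one shared empty value instead of building a new wrapper per empty row.

```scala
// Hypothetical, simplified stand-ins for TransmogrifAI's Email and Binary feature types.
final case class Email(value: Option[String]) {
  def isEmpty: Boolean = value.isEmpty

  // Assumed split rule: exactly one '@', with non-empty text on both sides of it.
  private def parts: Option[(String, String)] = value.flatMap { s =>
    val at = s.indexOf('@')
    if (at > 0 && at == s.lastIndexOf('@') && at < s.length - 1) Some((s.substring(0, at), s.substring(at + 1)))
    else None
  }

  def prefix: Option[String] = parts.map(_._1)
  def domain: Option[String] = parts.map(_._2)
}

object Email {
  def apply(s: String): Email = Email(Option(s))
  val empty: Email = Email(None)
}

final case class Binary(value: Option[Boolean])

object Binary {
  // One shared empty instance, reused instead of allocating a new wrapper for every empty input.
  val empty: Binary = Binary(None)
}

object ValidEmailSketch {
  // Short-circuit empty input to Binary.empty; otherwise report whether both parts are present.
  val transformFn: Email => Binary = (in: Email) =>
    if (in.isEmpty) Binary.empty
    else Binary(Some(in.prefix.nonEmpty && in.domain.nonEmpty))
}
```

Under this assumed split rule, the `sample` in `ValidEmailTransformerTest` maps to false, true, false, false, empty, true, in order. Note that `real@stuff` passes, which is exactly the case questioned in the review.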
@RunWith(classOf[JUnitRunner])
class ValidEmailTransformerTest extends OpTransformerSpec[Binary, ValidEmailTransformer] {

  val sample = Seq(Email("abc"), Email("a@b"), Email("a@"), Email("@blah"), Email.empty, Email("real@stuff"))
real@stuff does not seem to be a valid domain to me. How robust do we want this validation code to be? Perhaps it's worth adding proper email domain validation? (Long awaited, btw ;)) https://github.com/salesforce/TransmogrifAI/blob/master/features/src/main/scala/com/salesforce/op/features/types/Text.scala#L85
The issue is that you can actually have an email domain like that. The only way we could really validate it is by using a service that checks whether the inbox exists...
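To make the trade-off in this thread concrete, here is a sketch of a stricter, purely syntactic check. It is a hypothetical heuristic (the `DomainCheckSketch` object and `isStrictlyValid` name are made up for illustration), not TransmogrifAI's behavior and not a full RFC 5322 parser. It requires exactly one '@', a non-empty prefix, and a domain of at least two dot-separated labels made of letters, digits, and hyphens.

```scala
object DomainCheckSketch {
  // One DNS-style label: 1 to 63 letters, digits or hyphens, not starting or ending with a hyphen.
  private val Label = "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
  // Two or more labels separated by dots, e.g. "example.com"; a bare "stuff" does not match.
  private val DottedDomain = s"$Label(?:\\.$Label)+".r

  // Exactly one '@', a non-empty prefix, and a dotted domain after the '@'.
  def isStrictlyValid(address: String): Boolean = {
    val at = address.indexOf('@')
    at > 0 && at == address.lastIndexOf('@') &&
      DottedDomain.pattern.matcher(address.substring(at + 1)).matches()
  }
}
```

This rejects `real@stuff` and accepts `real@stuff.com`. As the reply points out, a dotless domain can still be a working mail destination (for example a bare host name on a private network), so a stricter rule trades accepting bogus addresses for rejecting real ones; only checking the mailbox itself would settle it.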
No description provided.