`Component.from_text` not capturing all parts of text #139
It just comes down to the lexicon. The text is parsed in a very naive way, and it's up to the user to compile an appropriate lexicon for their task. That said, I think the default splitter `'with'` should prevent components from getting mixed like this, so that is a bug. The other thing here is that `'marl'` is not in the default lexicon, but `'mrl'` is (as an abbreviation). If we compile a more comprehensive list for the `'lithology'` part of the default lexicon, it's trivial to add it, so that could be an enhancement.
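To illustrate why splitters and lexicon coverage matter, here is a minimal standalone sketch of naive lexicon-based parsing. It is **not** striplog's implementation: the `LEXICON` dict, its keys and terms, and the functions `expand_abbreviations`, `split_components`, `parse_component`, and `components_from_text` are all hypothetical, chosen only to show the idea. Abbreviations such as `mrl` are expanded first, the text is split on splitter words like `with`, and each chunk is matched against the lexicon independently, so terms from different components cannot get mixed.

```python
import re

# Hypothetical mini-lexicon for illustration only; these are NOT striplog's
# default lexicon keys or terms.
LEXICON = {
    "lithology": ["sandstone", "siltstone", "shale", "limestone", "marl"],
    "colour": ["grey", "red", "green", "brown"],
    "abbreviations": {"mrl": "marl", "ss": "sandstone", "gy": "grey"},
    "splitters": ["with", "and"],
}


def expand_abbreviations(text, abbreviations):
    """Replace each whole word that is a known abbreviation with its full form."""
    def repl(match):
        word = match.group(0)
        return abbreviations.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", repl, text)


def split_components(text, splitters):
    """Split text on whole-word splitters so each chunk describes one component."""
    pattern = r"\b(?:" + "|".join(map(re.escape, splitters)) + r")\b"
    chunks = re.split(pattern, text, flags=re.IGNORECASE)
    return [c.strip(" ,;.") for c in chunks if c.strip(" ,;.")]


def parse_component(chunk, lexicon):
    """Return the first term found in the chunk for each lexicon category."""
    words = re.findall(r"[a-z]+", chunk.lower())
    component = {}
    for category in ("lithology", "colour"):
        for word in words:
            if word in lexicon[category]:
                component[category] = word
                break
    return component


def components_from_text(text, lexicon=LEXICON):
    """Expand abbreviations, split on splitters, then parse each chunk separately."""
    text = expand_abbreviations(text, lexicon["abbreviations"])
    chunks = split_components(text, lexicon["splitters"])
    parsed = (parse_component(chunk, lexicon) for chunk in chunks)
    return [c for c in parsed if c]


print(components_from_text("Grey sandstone with red mrl"))
```

Because the text is split on `with` before matching, the second component (`red marl`) survives as its own entry. Without the split, a parser that takes only the first match per category would return just grey sandstone. Likewise, if neither `marl` nor the `mrl` abbreviation were in the lexicon, the second component would be dropped entirely.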
This method on `Component` seems to work fine in some cases but not always. Here is an example:

`sample0`

yields:

while

`sample1`

yields: