Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Stanza option to TTR, Entity Grid #13

Merged
merged 2 commits into from
Apr 3, 2021

Conversation

brucewlee
Copy link
Contributor

Hello. First of all, thanks for making the amazing repo available.

I made the necessary changes to TTR and Entity Grid files, adding the Stanza option.
The major issue to tackle was the two models' different architectures and how one could access the token class.
I handled the differences in such architectures.

Quite notably, there's now an option in TTR and Entity Grid methods to choose Stanza by passing in (model="Stanza").
The model default is set to spaCy.

The examples are available in stanza_test.py.
spaCy and Stanza results differ by a slight margin.

@dpalmasan dpalmasan assigned dpalmasan and brucewlee and unassigned dpalmasan Mar 28, 2021
@dpalmasan dpalmasan self-requested a review March 28, 2021 17:42
@@ -19,7 +19,7 @@
change this.
"""

SPACY_UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN'])
UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Owner

@dpalmasan dpalmasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to update unit tests as there are ones failing ( Let me know if I can help with this), the rest looks good! Thanks for the PR!

@@ -63,7 +63,7 @@ class EntityGrid(object):
module. It only supports 2-transitions entity grid.
"""

def __init__(self, doc):
def __init__(self, doc, model="spacy"):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to make easier further improvements to internals, we might create an extra module with the constants, something like:

from enum import Enum


class SupportedModels(str, Enum):
    SPACY = "spacy"
    STANZA = "stanza"

Then in all instances we could use the enum. For example, the constructor arguments could be:

def __init__(self, doc, model_name="spacy"):
    # The following might do error checking on model name, will raise if not in Enum
    model = SupportedModels(model_name)

    # Checks could be done as
    if model == SupportedModels.SPACY:

In that manner, all instances could be replaced by something like model=SupportedModels

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, I'll look into it

stanza_test.py Outdated
@@ -0,0 +1,43 @@
from src.TRUNAJOD.entity_grid import EntityGrid
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@brucewlee
Copy link
Contributor Author

I made a pull request and some summary:

  1. TTR tests were failing due to the keyword and positional argument conflicts. This could be easily resolved by adding a positional argument to the testing function. The used NLP model still defaults to spaCy. On the user side, they won't have to think about the positional arguments unless they change the TTR segment value (0.72).

  2. Thanks for your suggestion on using the enum class. It simplifies a lot of internal matter. I implemented the function in utils.py. Other modules would then be able to call the class each time model verification is needed.

  3. I changed the file name from "stanza_test.py" to "stanza_example.py" in the root directory.

@dpalmasan
Copy link
Owner

Hello @brucewlee , this looks pretty good. Could I ask just one last thing I forgot to mention previously. Could you install the pre-commit package (https://pre-commit.com/), so basically you follow these steps:

  1. pip install pre-commit
  2. In the root folder (where the .pre-commit-config.yaml file lives) run: pre-commit install
  3. Then do a git commit --amend (this will just repush the last commit and will run pre-commit hooks)
  4. Check if there is any pre-commit hook failing

Thanks in advance!

@brucewlee
Copy link
Contributor Author

brucewlee commented Apr 3, 2021

Hello @brucewlee , this looks pretty good. Could I ask just one last thing I forgot to mention previously. Could you install the pre-commit package (https://pre-commit.com/), so basically you follow these steps:

  1. pip install pre-commit
  2. In the root folder (where the .pre-commit-config.yaml file lives) run: pre-commit install
  3. Then do a git commit --amend (this will just repush the last commit and will run pre-commit hooks)
  4. Check if there is any pre-commit hook failing

Thanks in advance!

Yup, I've tried this but I'm not sure what it does. Nothing seems to be failing nor changing.

@dpalmasan
Copy link
Owner

dpalmasan commented Apr 3, 2021

Okay, then we are good to go I guess. The pre-commit hooks are a set of checks that are run before committing, they just do some checks like running flake8, check import order, running the code formatter black to ensure code standardization.
If they are all passing, then we are ready to merge, thanks for all the help!

@dpalmasan dpalmasan merged commit cf655a6 into dpalmasan:master Apr 3, 2021
@dpalmasan
Copy link
Owner

I merged! thanks!

@brucewlee
Copy link
Contributor Author

NP :)

@dpalmasan dpalmasan linked an issue Apr 7, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate models built in Spacy to use Stanford models
2 participants