Adding Stanza option to TTR, Entity Grid #13

brucewlee · 2021-03-28T01:18:37Z

Hello. First of all, thanks for making the amazing repo available.

I made the necessary changes to the TTR and Entity Grid files to add a Stanza option.
The main issue was that the two models have different architectures, so the token class is accessed differently in each.
The updated code handles both cases.

Notably, the TTR and Entity Grid methods now let you choose Stanza by passing in model="Stanza".
The default model is spaCy.

The examples are available in stanza_test.py.
spaCy and Stanza results differ by a slight margin.
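
To illustrate the difference in token access mentioned above: a spaCy Doc iterates directly over tokens and stores the universal POS tag in `token.pos_`, while a Stanza Document nests words inside `doc.sentences` and stores the tag in `word.upos`. The following is a minimal sketch of a helper that hides this difference. The name `iter_universal_pos` is hypothetical and not taken from this PR, the lowercase model names follow the `model="spacy"` default shown in the diff, and the demo uses stand-in objects rather than real pipelines.

```python
from types import SimpleNamespace

UNIVERSAL_NOUN_TAGS = {"NOUN", "PRON", "PROPN"}


def iter_universal_pos(doc, model="spacy"):
    """Yield (text, universal POS tag) pairs from a parsed document.

    Hypothetical helper for illustration, not the code from this PR.
    """
    if model == "spacy":
        # spaCy: a Doc iterates directly over Token objects.
        for token in doc:
            yield token.text, token.pos_
    elif model == "stanza":
        # Stanza: a Document holds sentences, each holding Word objects.
        for sentence in doc.sentences:
            for word in sentence.words:
                yield word.text, word.upos
    else:
        raise ValueError(f"Unsupported model: {model!r}")


# Stand-in objects that only mimic the attributes used above
# (assumption: no real spaCy or Stanza pipeline is loaded here).
spacy_like = [
    SimpleNamespace(text="Maria", pos_="PROPN"),
    SimpleNamespace(text="sings", pos_="VERB"),
]
stanza_like = SimpleNamespace(
    sentences=[
        SimpleNamespace(
            words=[
                SimpleNamespace(text="Maria", upos="PROPN"),
                SimpleNamespace(text="sings", upos="VERB"),
            ]
        )
    ]
)

spacy_nouns = [
    text
    for text, pos in iter_universal_pos(spacy_like, "spacy")
    if pos in UNIVERSAL_NOUN_TAGS
]
stanza_nouns = [
    text
    for text, pos in iter_universal_pos(stanza_like, "stanza")
    if pos in UNIVERSAL_NOUN_TAGS
]
```

With real pipelines, the `doc` argument would be the object returned by calling the loaded spaCy or Stanza pipeline on a text.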

dpalmasan · 2021-03-28T17:43:09Z

src/TRUNAJOD/entity_grid.py

@@ -19,7 +19,7 @@
 change this.
 """

-SPACY_UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN'])
+UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN'])


dpalmasan

We might need to update the unit tests, as some of them are failing (let me know if I can help with this). The rest looks good! Thanks for the PR!

dpalmasan · 2021-03-28T17:52:03Z

src/TRUNAJOD/entity_grid.py

@@ -63,7 +63,7 @@ class EntityGrid(object):
 module. It only supports 2-transitions entity grid.
 """

- def __init__(self, doc):
+ def __init__(self, doc, model="spacy"):


I think that, to make further improvements to the internals easier, we could create an extra module with the constants, something like:

    from enum import Enum

    class SupportedModels(str, Enum):
        SPACY = "spacy"
        STANZA = "stanza"

Then we could use the enum in all instances. For example, the constructor could be:

    def __init__(self, doc, model_name="spacy"):
        # The following does error checking on the model name; it will raise if not in the Enum
        model = SupportedModels(model_name)
        # Checks could then be done as: if model == SupportedModels.SPACY: ...

That way, all instances could be replaced by something like model=SupportedModels

brucewlee

Alright, I'll look into it.
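
As a runnable illustration of the enum suggestion in this thread, here is a minimal sketch. `EntityGridSketch` is a hypothetical stand-in that only demonstrates model-name validation; it is not the real `EntityGrid` class and does not build a grid.

```python
from enum import Enum


class SupportedModels(str, Enum):
    SPACY = "spacy"
    STANZA = "stanza"


class EntityGridSketch:
    """Hypothetical stand-in; only shows model-name validation."""

    def __init__(self, doc, model_name="spacy"):
        # Looking up an Enum member by value raises ValueError
        # when the name is not one of the supported models.
        self.model = SupportedModels(model_name)
        self.doc = doc


grid = EntityGridSketch(doc=None, model_name="stanza")
is_stanza = grid.model == SupportedModels.STANZA

try:
    EntityGridSketch(doc=None, model_name="nltk")
    rejected = False
except ValueError:
    rejected = True
```

Note that lookup by value is case-sensitive, so with these member values `SupportedModels("Stanza")` would also raise `ValueError`. Because the enum mixes in `str`, a member also compares equal to its plain string value.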

dpalmasan · 2021-03-28T17:52:52Z

stanza_test.py

@@ -0,0 +1,43 @@
+from src.TRUNAJOD.entity_grid import EntityGrid


brucewlee · 2021-03-29T09:50:16Z

I made a pull request. Here is a summary:

The TTR tests were failing because of conflicts between keyword and positional arguments. This could be easily resolved by adding a positional argument to the testing function. The NLP model still defaults to spaCy, so users won't have to think about the positional arguments unless they change the TTR segment value (0.72).
Thanks for your suggestion to use an enum class. It simplifies a lot of the internals. I implemented it in utils.py, so other modules can use the class whenever model verification is needed.
I renamed the file from "stanza_test.py" to "stanza_example.py" in the root directory.
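
One way a keyword/positional conflict like the one described above can arise is when a new parameter is inserted ahead of an existing optional one, so that older positional calls bind their value to the wrong parameter. The sketch below is only an assumption about the general mechanism: the function name `lexical_measure`, its parameter names, and their order are hypothetical and are not the actual TTR signatures in TRUNAJOD.

```python
def lexical_measure(tokens, model_name="spacy", ttr_segment=0.72):
    """Hypothetical function; names and parameter order are assumptions."""
    if model_name not in ("spacy", "stanza"):
        raise ValueError(f"Unsupported model: {model_name!r}")
    return model_name, ttr_segment


# An older positional call that meant "segment = 0.5" now binds 0.5
# to model_name, which fails validation.
try:
    lexical_measure(["a", "b"], 0.5)
    broke = False
except ValueError:
    broke = True

# Passing the segment by keyword is unaffected by the new parameter.
result = lexical_measure(["a", "b"], ttr_segment=0.5)
```

Callers that rely on the defaults, or that pass the segment value by keyword, keep working unchanged in this sketch.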

dpalmasan · 2021-03-29T13:57:23Z

Hello @brucewlee, this looks pretty good. Could I ask one last thing that I forgot to mention earlier? Could you install the pre-commit package (https://pre-commit.com/)? Basically, follow these steps:

pip install pre-commit
In the root folder (where the .pre-commit-config.yaml file lives), run: pre-commit install
Then do a git commit --amend (this will just recreate the last commit and run the pre-commit hooks)
Check whether any pre-commit hook is failing

Thanks in advance!

brucewlee · 2021-04-03T02:37:47Z

Yup, I've tried this, but I'm not sure what it does. Nothing seems to be failing or changing.

dpalmasan · 2021-04-03T11:54:07Z

Okay, then I guess we are good to go. The pre-commit hooks are a set of checks that run before each commit, such as running flake8, checking the import order, and running the code formatter black to keep the code style consistent.
If they are all passing, then we are ready to merge. Thanks for all the help!

dpalmasan · 2021-04-03T11:58:32Z

I merged! thanks!

brucewlee · 2021-04-06T05:08:38Z

NP :)

Stanza option added to TTR, Entity Grid

8710035

dpalmasan assigned dpalmasan and brucewlee and unassigned dpalmasan Mar 28, 2021

dpalmasan self-requested a review March 28, 2021 17:42

dpalmasan reviewed Mar 28, 2021

test conflict resolve and enum implementation

a911578

dpalmasan merged commit cf655a6 into dpalmasan:master Apr 3, 2021

dpalmasan linked an issue Apr 7, 2021 that may be closed by this pull request

Migrate models built in Spacy to use Stanford models #4

Open
