-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue 2396 #3089
Fix issue 2396 #3089
Conversation
Really nice first PR, thanks! I noticed some ugliness with the original code as well --- unfortunately the implementation appears to be duplicated in the |
True! I hadn't seen the duplicated implementation inside a) Separate the function and call it from both b) seems cleaner to me. But in case you'd prefer a), where would you place this auxiliary function? |
Delegating to a span of the whole document is a clever solution, but I think I probably prefer a helper function in Moving this out into a helper function also lets us fix that closure, which I always found a bit ugly about the previous implementation. So, we want the function to be something like:
We should keep it as a private function so that we don't have to worry about validating the start and end offsets within it. From the span, if we call |
…om Doc.get_lca_matrix and Span.get_lca_matrix
Done! I first tried to just define a helper I also removed that closure within Finally, I guess that both helpers could be further "cythonized" resorting to TokenC, but this would mean renouncing to Let me know what you think :) |
Looks great, thanks! I think keeping that helper in the |
Hm, the span test is failing, though. Is the test wrong or the code? |
…ay instead of memoryview
It was a mistake in the code, but it should be fine now. |
Sweet, merging. |
* Test on #2396: bug in Doc.get_lca_matrix() * reimplementation of Doc.get_lca_matrix(), (closes #2396) * reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix() * tests Span.get_lca_matrix() as well as Doc.get_lca_matrix() * implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix * use memory view instead of np.ndarray in _get_lca_matrix (faster) * fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview * cleaner conditional, add comment
This PR adds a test for an untested case of `Span.get_lca_matrix`, and fixes a bug for that scenario, which I introduced in [this PR](#3089) (sorry!). ## Description The previous implementation of get_lca_matrix was failing for the case `doc[j:k].get_lca_matrix()` where `j > 0`. A test has been added for this case and the bug has been fixed. ### Types of change Bug fix ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
Description
This PR fixes issue #2396:
Doc.get_lca_matrix
had a bug in this line which caused wrong outputsfor some pairs of tokens.
Types of change
Bug fix
Explanation of the bug
I borrow the following example from @slavi (#2396):
The above mentioned line is:
Here, for tokens 3 ("test") and 5 ("spacy"), we are calling
__pairwise_lca
recursively with their heads ("created" and "for" respectively). But in doing so, we are skiping "test", which is the actual lowest common ancestor.Solution
The reimplementation consists on this simple algorihtm. It works as expected and is 3 times faster.
Checklist