Failure when rematching a large number of lists #304

Closed
adam-collins opened this issue May 8, 2024 · 2 comments

@adam-collins
Contributor

The admin "rematch all" button can fail on smaller machines when there are a large number of lists. This should not happen. A workaround has been implemented, but it is not ideal because it involves restarting Tomcat.

During a rematch all, existing name matches are removed first. The items are then iterated, with batches sent to the name matching service.

For this issue, make the following changes to rematch all:

  • Perform an update to existing matches only after a diff. This is instead of the removal of matches at the beginning of the rematch all.
  • Use smaller transaction boundaries and wait for the database flush to complete.
  • Enable parallel updates with a configurable number of threads.
  • Iterate over lists before individual list items. This is instead of iterating through list items only.
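
The four changes above can be sketched roughly as follows. This is only a minimal illustration in Python, not the application's code: `rematch_all`, `rematch_list`, `batches`, the `match_names` callable (a stand-in for the name matching service), the `commit` callback (a stand-in for committing and flushing one small transaction), and the item fields `rawName` and `match` are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List

# All names here are hypothetical; none of them come from the project.
Item = Dict[str, str]                        # e.g. {"rawName": ..., "match": ...}
Matcher = Callable[[List[str]], List[str]]   # stand-in for the name matching service


def batches(items: List[Item], size: int) -> Iterator[List[Item]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def rematch_list(items: List[Item], match_names: Matcher, batch_size: int,
                 commit: Callable[[], None]) -> int:
    """Rematch one list in small batches; return how many items were updated."""
    updated = 0
    for batch in batches(items, batch_size):
        new_matches = match_names([item["rawName"] for item in batch])
        for item, new_match in zip(batch, new_matches):
            # Diff first: only touch items whose match actually changed.
            if item.get("match") != new_match:
                item["match"] = new_match
                updated += 1
        # One small transaction per batch; `commit` is assumed to block
        # until the flush has completed.
        commit()
    return updated


def rematch_all(lists: Dict[str, List[Item]], match_names: Matcher,
                threads: int = 2, batch_size: int = 100,
                commit: Callable[[], None] = lambda: None) -> Dict[str, int]:
    """Iterate over lists (one worker task per list), then over each list's items."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {
            list_id: pool.submit(rematch_list, items, match_names, batch_size, commit)
            for list_id, items in lists.items()
        }
        # result() re-raises any exception from the worker, so failures surface here.
        return {list_id: future.result() for list_id, future in futures.items()}
```

Because each list is handled by exactly one task, no two threads write to the same list, and the `threads` argument plays the role of the configurable thread count.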
@hamzajaved-csiro hamzajaved-csiro self-assigned this May 28, 2024
@qifeng-bai
Contributor

#288

@qifeng-bai
Contributor

qifeng-bai commented Jun 5, 2024

Q: Iterate over lists before individual list items. This is instead of iterating through list items only.
A: nameExplorerService iterates over species items, not lists

Q: Perform an update to existing matches only after a diff. This is instead of the removal of matches at the beginning of the rematch all.
A: Got an answer from Simon Sherrin and Mahmoud: yes, the taxonConceptID changes when the Taxonomic Backbone is updated. The general rule is that you can't rely on taxonConceptIDs being persistent across updates to the Taxonomic Backbone (BIE reindex).

In this case, the taxonConceptID will change anyway, so when rematching we don't need to do a diff. We can just update the matches directly.
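
To illustrate the reasoning, here is a small Python sketch comparing the two strategies. The function names and the ID values are made up for the example and are not taken from the application. It assumes every taxonConceptID has changed after a backbone update, in which case the diff performs the same number of writes as a direct update and only adds a comparison per item.

```python
from typing import Dict, List

Item = Dict[str, str]  # hypothetical item shape: {"taxonConceptID": ...}


def update_with_diff(items: List[Item], new_ids: List[str]) -> int:
    """Write only the items whose ID changed; return the number of writes."""
    writes = 0
    for item, new_id in zip(items, new_ids):
        if item.get("taxonConceptID") != new_id:
            item["taxonConceptID"] = new_id
            writes += 1
    return writes


def update_directly(items: List[Item], new_ids: List[str]) -> int:
    """Overwrite every item without comparing; return the number of writes."""
    for item, new_id in zip(items, new_ids):
        item["taxonConceptID"] = new_id
    return len(items)


# Illustrative IDs only: after a backbone update, every ID is assumed to differ.
old_items_a = [{"taxonConceptID": "old-1"}, {"taxonConceptID": "old-2"}]
old_items_b = [{"taxonConceptID": "old-1"}, {"taxonConceptID": "old-2"}]
new_ids = ["new-1", "new-2"]

diff_writes = update_with_diff(old_items_a, new_ids)
direct_writes = update_directly(old_items_b, new_ids)
```

Under that assumption both calls leave the items in the same final state with the same number of writes, which is why the diff step can be skipped.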

@qifeng-bai qifeng-bai added this to the 5.2.0 milestone Jun 26, 2024
qifeng-bai added a commit that referenced this issue Jun 26, 2024
change logs:
#304 improve rematching performance