OutOfMemory errors should fail biocache index operations #130
The same symptoms occur when Cassandra goes down, as happened over the weekend. The following stack trace, which appears at the end of the full reindex when the merge is attempted, indicates that this issue has not yet been fixed. The write lock file still being in place indicates that the thread crashed before completing and that the index is in a potentially inconsistent state:
This error occurred more frequently, within minutes of starting an index process, when using the latest version of biocache-store on cassandra-b4 for the Complete Reindex job. I have reverted to the August 2016 copy until this issue is solved. The deployed version is tagged as last-known-stable in git: https://github.com/AtlasOfLivingAustralia/biocache-store/tree/last-known-stable
When OutOfMemory errors occur during biocache index operations, they do not cause the operation to fail. Instead, the affected thread terminates and leaves its partial index in a potentially inconsistent state. Once all of the threads have either terminated or completed, each thread that hit an OOM has left behind a lock file created by Solr that is never deleted, and this causes a deadlock when something later tries to acquire the lock. Even if the lock file is ignored, the index is incomplete and likely inconsistent, so the entire process would need to be run again.
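A minimal Java sketch of why the failure goes unnoticed, assuming the partial indexes are built on plain `Thread`s that are later `join()`ed (the class and method names here are hypothetical, not biocache-store code). An `Error` such as `OutOfMemoryError` kills only the thread that threw it, and `join()` still returns normally, so the caller cannot tell a crashed worker from a finished one unless it records failures, for example with an `UncaughtExceptionHandler`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Hypothetical sketch: a worker thread that dies with an Error does not
 * propagate it to the thread that joins it. An UncaughtExceptionHandler
 * records the failure so the caller can react instead of carrying on.
 */
public class SilentThreadDeath {

    /** Runs each task on its own thread and returns every Throwable that killed one. */
    static List<Throwable> runAndCollectFailures(List<Runnable> tasks) throws InterruptedException {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(task);
            // Without a handler, the default behaviour prints the stack trace
            // to stderr and the Error is otherwise lost to the caller.
            t.setUncaughtExceptionHandler((thread, error) -> failures.add(error));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // returns normally even if the thread died with an Error
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = List.of(
            () -> System.out.println("partial index 1 done"),
            () -> { throw new OutOfMemoryError("simulated OOM in partial index 2"); });
        List<Throwable> failures = runAndCollectFailures(tasks);
        System.out.println("failed threads: " + failures.size());
    }
}
```

The `OutOfMemoryError` here is thrown by hand to simulate the crash; a real OOM can occur at any allocation.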
This exception, and any other exception that cannot be handled while creating a full index, should terminate the index process early rather than attempting to merge the inconsistent partial Solr indexes.
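One way the requested fail-fast behaviour could look, sketched in Java under the assumption that each partial index is built by a `Callable` submitted to an `ExecutorService` and that the merge is a separate step (the `reindex` method and the merge callback are hypothetical placeholders, not biocache-store's API). `FutureTask` captures any `Throwable`, including `OutOfMemoryError`, and `Future.get()` rethrows it wrapped in an `ExecutionException`, so the first failure can cancel the remaining tasks and skip the merge:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical fail-fast reindex: if any partial-index task fails (including
 * with an Error such as OutOfMemoryError), cancel the rest and skip the merge.
 */
public class FailFastReindex {

    /** Returns true only if every partial index completed and the merge ran. */
    static boolean reindex(List<Callable<String>> partialIndexTasks, Runnable merge)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partialIndexTasks.size()));
        ExecutorCompletionService<String> completion = new ExecutorCompletionService<>(pool);
        List<Future<String>> futures = new ArrayList<>();
        try {
            for (Callable<String> task : partialIndexTasks) {
                futures.add(completion.submit(task));
            }
            // Take results in completion order so the first failure is seen promptly.
            for (int i = 0; i < futures.size(); i++) {
                try {
                    completion.take().get();
                } catch (ExecutionException e) {
                    // The task's Throwable (e.g. OutOfMemoryError) is the cause.
                    System.err.println("Aborting reindex, partial index failed: " + e.getCause());
                    futures.forEach(f -> f.cancel(true));
                    return false; // never merge inconsistent partial indexes
                }
            }
            merge.run(); // reached only when every partial index succeeded
            return true;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Callable<String>> tasks = List.of(
            () -> "partial-1",
            () -> { throw new OutOfMemoryError("simulated OOM"); });
        boolean merged = reindex(tasks, () -> System.out.println("merging partial indexes"));
        System.out.println("merged: " + merged);
    }
}
```

Because a genuine `OutOfMemoryError` can leave the whole JVM in an unreliable state, catching it may not always be enough. An alternative some deployments use is the HotSpot option `-XX:+ExitOnOutOfMemoryError`, which terminates the JVM on the first OOM so that the job fails outright instead of continuing to a merge.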
An example of a stack trace for one case is: