OutOfMemory errors should fail biocache index operations #130

Open
ansell opened this issue May 23, 2016 · 3 comments

Comments

@ansell
Contributor

ansell commented May 23, 2016

When OutOfMemoryErrors occur during biocache index operations, they do not cause the operation to fail. Instead, the affected thread terminates and leaves its partial index in a potentially inconsistent state. When all of the other threads eventually terminate or complete, each thread that died from an OOM still holds a lock file created by Solr that is never deleted, and this blocks the merge step when it tries to acquire the lock. Even if the lock file were ignored, the index would be incomplete and likely inconsistent, so the entire process would need to be run again.

This error, and any other exception that cannot be handled while creating a full index, should terminate the indexing process early instead of attempting to merge the inconsistent partial Solr indexes.
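A minimal sketch of the fail-fast behaviour described above, assuming the per-thread indexing work could be expressed as `Callable` tasks submitted to an `ExecutorService` (the current code starts raw threads, as the stack trace below shows). `IndexCoordinator`, `runThenMerge` and `IndexFailedException` are hypothetical names, not biocache-store APIs. The sketch relies on the fact that `FutureTask` catches any `Throwable`, including `OutOfMemoryError`, and rethrows it wrapped in an `ExecutionException` from `get()`, so the coordinator sees the first failure and skips the merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical coordinator (not a biocache-store API): runs one indexing task
// per thread and only merges if every task completed normally.
class IndexCoordinator {

    // Thrown instead of merging when any partial index may be inconsistent.
    static class IndexFailedException extends Exception {
        IndexFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static void runThenMerge(List<Callable<Void>> tasks, Runnable merge)
            throws IndexFailedException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        CompletionService<Void> completion = new ExecutorCompletionService<>(pool);
        List<Future<Void>> futures = new ArrayList<>();
        try {
            for (Callable<Void> task : tasks) {
                futures.add(completion.submit(task));
            }
            // take() returns tasks in completion order, so the first failure is
            // seen as soon as it happens rather than after earlier tasks finish.
            for (int i = 0; i < tasks.size(); i++) {
                Future<Void> done = completion.take();
                try {
                    // FutureTask catches every Throwable (OutOfMemoryError included)
                    // and rethrows it here wrapped in an ExecutionException.
                    done.get();
                } catch (ExecutionException e) {
                    for (Future<Void> other : futures) {
                        other.cancel(true); // interrupt workers that are still running
                    }
                    throw new IndexFailedException("Index thread failed; skipping merge", e.getCause());
                }
            }
        } finally {
            pool.shutdownNow();
        }
        // Only reached when every task returned normally.
        merge.run();
    }
}
```

`cancel(true)` only interrupts the remaining workers; indexing code that ignores interrupts keeps running until it finishes on its own, but the merge is still skipped because the exception is thrown first.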

An example stack trace for one such case:

Exception in thread "Thread-7" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.String.getBytes(String.java:916)
    at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:170)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:162)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at au.org.ala.biocache.index.SolrIndexDAO.indexFromMap(SolrIndexDAO.scala:639)
    at au.org.ala.biocache.index.IndexRunner$$anonfun$run$7.apply(IndexRecordMultiThreaded.scala:478)
    at au.org.ala.biocache.index.IndexRunner$$anonfun$run$7.apply(IndexRecordMultiThreaded.scala:466)
    at au.org.ala.biocache.persistence.CassandraPersistenceManager$$anonfun$pageOver$1.apply(CassandraPersistenceManager.scala:341)
    at au.org.ala.biocache.persistence.CassandraPersistenceManager$$anonfun$pageOver$1.apply(CassandraPersistenceManager.scala:333)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at au.org.ala.biocache.persistence.CassandraPersistenceManager.pageOver(CassandraPersistenceManager.scala:333)
    at au.org.ala.biocache.persistence.CassandraPersistenceManager.pageOverAll(CassandraPersistenceManager.scala:470)
    at au.org.ala.biocache.index.IndexRunner.run(IndexRecordMultiThreaded.scala:466)
    at java.lang.Thread.run(Thread.java:745)
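The bottom frames of this trace show `IndexRunner.run` executing directly on a `java.lang.Thread` ("Thread-7"), so the error is only printed by the default uncaught-exception handler and the thread silently dies. If the raw threads are kept, one hedged option is to install a per-thread `UncaughtExceptionHandler` that records the failure so the caller can check it after `join()`. `RawThreadIndexer`, `runAll` and the `biocache-thread-N` thread names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a biocache-store API) for code that keeps starting
// raw java.lang.Thread workers: it records each worker's uncaught Throwable
// so the caller can refuse to merge.
class RawThreadIndexer {

    // Starts every runner on its own thread, waits for all of them, and
    // returns any uncaught Throwable keyed by thread name (empty map = success).
    static Map<String, Throwable> runAll(List<Runnable> runners) throws InterruptedException {
        Map<String, Throwable> failures = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < runners.size(); i++) {
            Thread worker = new Thread(runners.get(i), "biocache-thread-" + i);
            // Replaces the default handler, which only prints "Exception in thread ...".
            worker.setUncaughtExceptionHandler(
                    (thread, error) -> failures.put(thread.getName(), error));
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            worker.join(); // the handler has already run when join() returns
        }
        return failures;
    }
}
```

A caller would then abort instead of merging whenever the returned map is non-empty. The handler runs in the dying thread before it terminates, so any failure is recorded by the time `join()` returns.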
@ansell
Contributor Author

ansell commented Nov 20, 2016

The same symptoms occur when Cassandra goes down, as happened on the weekend.

The following stack trace, from the end of a full reindex when the merge is attempted, indicates that this issue has not yet been fixed. The write lock file still being in place indicates that the thread crashed before completing and that its index is in a potentially inconsistent state:

2016-11-19 23:11:07,576 INFO : [BulkProcessor] - Merging index segments
2016-11-19 23:11:07,576 INFO : [IndexMergeTool] - Merging to directory:  /data/biocache-reindex/solr/merged
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-0/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-1/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-2/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-3/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-4/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-5/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-6/data/index
Directory included in merge: /data/biocache-reindex/solr-create/biocache-thread-7/data/index
2016-11-19 23:11:07,606 INFO : [IndexMergeTool] - Adding indexes...
Exception in thread "main" org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/data2/biocache-reindex/solr-create/biocache-thread-1/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:89)
    at org.apache.lucene.index.IndexWriter.acquireWriteLocks(IndexWriter.java:2472)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2526)
    at au.org.ala.biocache.index.IndexMergeTool$.merge(BulkProcessor.scala:257)
    at au.org.ala.biocache.index.BulkProcessor$.main(BulkProcessor.scala:164)
    at au.org.ala.biocache.cmd.CMD2$.main(CMD2.scala:134)
    at au.org.ala.biocache.cmd.CMD2.main(CMD2.scala)

@ansell
Contributor Author

ansell commented Feb 16, 2017

@ansell
Contributor Author

ansell commented Mar 7, 2017

This error was occurring more frequently, within minutes of starting an index process, when using the latest version of biocache-store on cassandra-b4 for the Complete Reindex job. I have reverted to the August 2016 copy until this issue can be solved. The deployed version is tagged as last-known-stable in git:

https://github.com/AtlasOfLivingAustralia/biocache-store/tree/last-known-stable
