
Indexing failure on cl layers with a single character value #372

Closed
adam-collins opened this issue Apr 16, 2020 · 12 comments
@adam-collins (Contributor)

Indexing is failing for contextual layer values with a single character.

For example, the value "0" with the layer discussed in AtlasOfLivingAustralia/layers-service#116 fails to index.
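A minimal sketch of the kind of guard that would produce this symptom (hypothetical illustration only, not the actual biocache-store code; names invented):

```scala
// Hypothetical illustration only -- not the real biocache-store code.
// A guard like this, meant to skip empty values, also drops legitimate
// single-character values such as the cl value "0":
def isIndexableBuggy(value: String): Boolean =
  value != null && value.length > 1   // wrongly rejects "0"

// The intent is only to reject null or empty values:
def isIndexableFixed(value: String): Boolean =
  value != null && value.nonEmpty     // accepts "0"

// isIndexableBuggy("0") == false  -> the record's cl value never reaches the index
// isIndexableFixed("0") == true
```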

@ansell (Contributor) commented Apr 16, 2020

There is nothing inherently special about that 0. I don't mind working around it by changing it to a longer description label before loading if that would avoid having to track down an issue in biocache-store.

@adam-collins (Contributor, Author)

This is also a problem for single character miscPropertiesColumn values and queryAssertionColumn values.

Fixing miscPropertiesColumn may make the index slightly larger; I estimate less than 1%.

Fixing for queryAssertionColumn should have no impact because it looks like it is not in use.

Fixing for cl will add up to 15 new values for up to 6 existing cl layers, including the referenced fire layer.

@adam-collins (Contributor, Author)

My preference is to reindex with #373.

An alternative is to edit the fire shapefile, do a full resample, then reindex.

@ansell (Contributor) commented Apr 16, 2020

I would also prefer merging and deploying the biocache-store fix, as it will also patch up the other cl layers you refer to that have been silently broken so far. I will merge, release, deploy, and reindex it today.

If it has unintended effects on existing indexed fields, the automatic swap-collection Jenkins job should pick it up and fail to swap.

@ansell (Contributor) commented Apr 16, 2020

On the index size concern, we have a buffer on disk of about 200GB on each node, which is effectively about 70GB for each of the three copies, so we should not expect to have issues there from some new small fields/values.

ansell added a commit that referenced this issue on Apr 16, 2020:

#372 Fix indexing of single character values within embedded JSON cas… (branch: …dexing_of_single_character_cl_values)
@ansell (Contributor) commented Apr 16, 2020

Released in biocache-store-2.4.7, deployed, and running complete reindex now:

http:https://aws-scjenkins.ala:9193/job/Complete%20Indexing/job/MASTER%20-%20Complete%20Re-index/628/

@ansell (Contributor) commented Apr 16, 2020

Running the complete reindex crashed ZooKeeper and Solr. This was the error on one node (http:https://aws-scjenkins.ala:9193/job/Complete%20Indexing/job/Complete%20Re-index/1917/console):

aws-bstore-4b 2020-04-16 18:05:54,898 INFO : [IndexRunner] - FINAL >>> cassandraTime(s)=1903, processingTime[8](s)=130919, solrTime[2](s)=23978, totalTime(s)=16322, index docs committed/in ram/ram MB=11842500/41250/477, mem free(Mb)=4321, mem total(Mb)=12288, queues (processing/lucene docs/commit batch) 487/0/1
aws-bstore-4b 2020-04-16 18:05:55,189 INFO : [IndexRunner] - Total indexing time for this thread 276.8357 minutes. Records indexed: 23898824
aws-bstore-4b 2020-04-16 18:11:44,034 INFO : [IndexLocalNode] - Indexing completed in 282.68463 minutes
aws-bstore-4b 2020-04-16 18:11:45,148 INFO : [IndexLocalNode] - Writing 133 new fields into updated schema: /data/solr/solr-create/biocache/conf/schema.xml
aws-bstore-4b 2020-04-16 18:11:45,315 INFO : [SolrIndexDAO] - Initialising the solr server aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 cloudserver:null solrServer:null
aws-bstore-4b 2020-04-16 18:11:56,054 ERROR: [IndexLocalNode] - failed to add new fields into SOLR: aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 within 10000 ms
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:183)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:117)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:107)
	at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:226)
	at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:131)
	at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:631)
	at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1084)
	at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1073)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
	at org.apache.solr.client.solrj.SolrClient.ping(SolrClient.java:926)
	at au.org.ala.biocache.index.SolrIndexDAO.init(SolrIndexDAO.scala:130)
	at au.org.ala.biocache.index.SolrIndexDAO.addFieldToSolr(SolrIndexDAO.scala:212)
	at au.org.ala.biocache.index.IndexLocalNode$$anonfun$importAdditionalFieldsToSOLR$1.apply(IndexLocalNode.scala:210)
	at au.org.ala.biocache.index.IndexLocalNode$$anonfun$importAdditionalFieldsToSOLR$1.apply(IndexLocalNode.scala:208)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at au.org.ala.biocache.index.IndexLocalNode.importAdditionalFieldsToSOLR(IndexLocalNode.scala:208)
	at au.org.ala.biocache.index.IndexLocalNode.indexRecords(IndexLocalNode.scala:126)
	at au.org.ala.biocache.tool.IndexLocalRecordsV2$.main(IndexLocalRecordsV2.scala:88)
	at au.org.ala.biocache.cmd.CMD2$.main(CMD2.scala:130)
	at au.org.ala.biocache.cmd.CMD2.main(CMD2.scala)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 within 10000 ms
	at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:233)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:175)
	... 20 more
aws-bstore-4b 2020-04-16 18:11:56,059 INFO : [IndexLocalNode] - Indexing complete. Records indexed: 23898824

This is the start of the errors on another node, which, unlike the first node, never reached the "Indexing complete." message (http:https://aws-scjenkins.ala:9193/job/Complete%20Indexing/job/Complete%20Re-index/1915/consoleFull):

aws-bstore-2b 2020-04-16 17:33:56,280 INFO : [IndexRunner] - FINAL >>> cassandraTime(s)=1449, processingTime[8](s)=115388, solrTime[2](s)=21135, totalTime(s)=14402, index docs committed/in ram/ram MB=10266250/102500/1027, mem free(Mb)=3296, mem total(Mb)=12288, queues (processing/lucene docs/commit batch) 381/0/1
aws-bstore-2b 2020-04-16 17:34:13,881 INFO : [IndexRunner] - Total indexing time for this thread 245.13452 minutes. Records indexed: 20683431
aws-bstore-2b 2020-04-16 17:34:57,896 INFO : [IndexLocalNode] - Indexing completed in 245.90057 minutes
aws-bstore-2b 2020-04-16 17:34:58,667 INFO : [IndexLocalNode] - Writing 201 new fields into updated schema: /data/solr/solr-create/biocache/conf/schema.xml
aws-bstore-2b 2020-04-16 17:34:58,900 INFO : [SolrIndexDAO] - Initialising the solr server aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 cloudserver:null solrServer:null
aws-bstore-2b 2020-04-16 17:34:59,031 ERROR: [CloudSolrClient] - Request to collection biocache failed due to (404) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:https://aws-sc3b.ala:8983/solr/biocache: No such path /schema/fields/_test_diary_27__number_infested, retry? 0
aws-bstore-2b 2020-04-16 17:34:59,033 INFO : [SolrIndexDAO] - Field not in schema: _test_diary_27__number_infested
aws-bstore-2b 2020-04-16 17:34:59,038 INFO : [SolrIndexDAO] - Adding field: _test_diary_27__number_infested
aws-bstore-2b 2020-04-16 17:35:03,605 ERROR: [CloudSolrClient] - Request to collection biocache failed due to (404) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:https://aws-sc1b.ala:8983/solr/biocache: No such path /schema/fields/_abundance, retry? 0
aws-bstore-2b 2020-04-16 17:35:03,606 INFO : [SolrIndexDAO] - Field not in schema: _abundance
aws-bstore-2b 2020-04-16 17:35:03,606 INFO : [SolrIndexDAO] - Adding field: _abundance
aws-bstore-2b 2020-04-16 17:35:06,096 ERROR: [CloudSolrClient] - Request to collection biocache failed due to (404) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:https://aws-sc3b.ala:8983/solr/biocache: No such path /schema/fields/_test_diary_9_step_number, retry? 0

I have put a complete shutdown on the Jenkins queue until we can look into it again tomorrow. If the fix isn't simple, we can revert to the previous version until this is fixed, although the previous deployment was a snapshot build, so the only reliable way to roll back is manually switching symlinks.
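For the ZooKeeper timeout in the first log above: the 10000 ms in the error is the client-side connect timeout used when the schema-update step builds its CloudSolrClient. If we want that step to be more tolerant of a busy cluster, the timeouts can be raised where the client is constructed; a rough sketch against the SolrJ 7.x setters (illustrative only, not the actual biocache-store wiring, and the 60 s value is arbitrary):

```scala
import java.util.Optional
import scala.collection.JavaConverters._
import org.apache.solr.client.solrj.impl.CloudSolrClient

// Sketch only: give ZooKeeper longer than the default 10 s to answer before
// the schema update gives up, e.g. while the nodes are still busy committing.
val zkHosts = List(
  "aws-zoo-a1.ala:2181", "aws-zoo-b1.ala:2181", "aws-zoo-b2.ala:2181",
  "aws-zoo-c1.ala:2181", "aws-zoo-c2.ala:2181")

val client = new CloudSolrClient.Builder(zkHosts.asJava, Optional.empty()).build()
client.setZkConnectTimeout(60000) // ms; the default 10000 is what appears in the error
client.setZkClientTimeout(60000)  // ms; ZooKeeper session timeout
client.setDefaultCollection("biocache")
client.connect()
```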

@adam-collins (Contributor, Author)

This error, the 404, is normal behaviour when testing whether a field already exists in the live SOLR schema.

Unfortunately, after adding 58 new fields to the SOLR index, SOLR stopped responding.
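For context, the per-field check-then-add it is doing looks roughly like this (a sketch using SolrJ's Schema API, not the exact biocache-store code; the field attributes are illustrative):

```scala
import scala.collection.JavaConverters._
import org.apache.solr.client.solrj.impl.CloudSolrClient
import org.apache.solr.client.solrj.request.schema.SchemaRequest

// Sketch only: probe the live schema for the field; the 404 is how the Schema
// API says "not there yet", which is why each ERROR line is followed by
// "Field not in schema" and then "Adding field".
def ensureField(client: CloudSolrClient, collection: String, name: String): Unit = {
  val exists =
    try { new SchemaRequest.Field(name).process(client, collection); true }
    catch { case _: Exception => false } // the 404 surfaces as an exception

  if (!exists) {
    val attrs = Map[String, AnyRef](
      "name"    -> name,
      "type"    -> "string",
      "stored"  -> java.lang.Boolean.TRUE,
      "indexed" -> java.lang.Boolean.TRUE)
    new SchemaRequest.AddField(attrs.asJava).process(client, collection)
  }
}
```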

@adam-collins (Contributor, Author)

Batch field additions appear to resolve this issue. I am still working on this.
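In other words, instead of one Schema API round trip per field, the missing fields go up in a single request. A sketch of the batched form using SolrJ's MultiUpdate (not the final patch; field attributes are illustrative):

```scala
import scala.collection.JavaConverters._
import org.apache.solr.client.solrj.impl.CloudSolrClient
import org.apache.solr.client.solrj.request.schema.SchemaRequest

// Sketch only: one MultiUpdate request carrying every missing field, rather
// than hundreds of individual add-field calls hammering SOLR in sequence.
def addFieldsInOneBatch(client: CloudSolrClient, collection: String,
                        newFields: Seq[String]): Unit = {
  val updates: Seq[SchemaRequest.Update] = newFields.map { name =>
    val attrs = Map[String, AnyRef](
      "name"    -> name,
      "type"    -> "string",
      "stored"  -> java.lang.Boolean.TRUE,
      "indexed" -> java.lang.Boolean.TRUE)
    new SchemaRequest.AddField(attrs.asJava)
  }
  new SchemaRequest.MultiUpdate(updates.asJava).process(client, collection)
}
```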

While not strictly necessary for it to function, the Jenkins job Create Solr Schema and Collection needs an update so that it uses the live SOLR collection biocache conf, rather than bstore[0] conf, as the new collection schema. Each bstore updates the SOLR collection biocache conf during indexing, so the individual bstore confs may differ from it. The job should fall back to bstore[0] conf when the SOLR collection biocache conf is unavailable, e.g. when indexing onto an empty SOLR instance.
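A rough sketch of that fallback logic (illustrative only; the ZooKeeper config name "biocache", the paths, and the timeout are assumptions, not the Jenkins job's actual implementation):

```scala
import java.nio.file.{Files, Path}
import org.apache.solr.common.cloud.{SolrZkClient, ZkConfigManager}

// Sketch only: prefer the live collection's conf (each bstore updates it during
// indexing); fall back to bstore[0]'s local conf when the live conf is not
// there, e.g. when indexing onto an empty SOLR instance.
def resolveSchemaConf(zkHosts: String, bstore0Conf: Path, workDir: Path): Path = {
  val zkClient = new SolrZkClient(zkHosts, 30000) // 30 s ZK timeout, arbitrary
  try {
    val configs = new ZkConfigManager(zkClient)
    if (configs.configExists("biocache")) {
      val target = workDir.resolve("biocache-conf")
      Files.createDirectories(target)
      configs.downloadConfigDir("biocache", target)
      target
    } else bstore0Conf
  } finally zkClient.close()
}
```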

@ansell (Contributor) commented Apr 20, 2020

Documenting (and possibly fixing) the wide array of locations that the solr schema is managed in and copied to is being tracked in #275.

It needs work, but may be superseded by whatever mechanism the infrastructure upgrade project adopts for its solr indexing step, which does live indexing rather than offline indexing + copy + zookeeper registration of the collection.

@ansell (Contributor) commented May 13, 2020

The crash while updating fields occurred again during the reindex this morning:

aws-bstore-2b 2020-05-14 07:46:10,480 INFO : [Cassandra3PersistenceManager] - All threads have completed paging
aws-bstore-2b 2020-05-14 07:46:10,508 INFO : [IndexRunner] - FINAL >>> cassandraTime(s)=1703, processingTime[8](s)=112736, solrTime[2](s)=20377, totalTime(s)=13892, index docs committed/in ram/ram MB=10258750/55000/615, mem free(Mb)=4593, mem total(Mb)=12288, queues (processing/lucene docs/commit batch) 333/0/94
aws-bstore-2b 2020-05-14 07:46:10,851 INFO : [IndexRunner] - Total indexing time for this thread 238.42458 minutes. Records indexed: 20711667
aws-bstore-2b 2020-05-14 07:47:17,555 INFO : [IndexLocalNode] - Indexing completed in 239.57076 minutes
aws-bstore-2b 2020-05-14 07:47:18,415 INFO : [IndexLocalNode] - Writing 380 new fields into updated schema: /data/solr/solr-create/biocache/conf/schema.xml
aws-bstore-2b 2020-05-14 07:47:18,587 INFO : [SolrIndexDAO] - Initialising the solr server aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 cloudserver:null solrServer:null
aws-bstore-2b 2020-05-14 07:47:18,704 ERROR: [CloudSolrClient] - Request to collection biocache failed due to (404) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:https://aws-sc2b.ala:8983/solr/biocache: No such path /schema/fields/_test_diary_5_test_number_mouldy, retry? 0

@adam-collins (Contributor, Author)

ansell closed this as completed on Jul 5, 2020