"not_in_range" failure while resharding database #4624
Thanks for the detailed report @jcoglan. Would you be able to share a few more logs, starting a bit before the first error and including a bit more after the ones shown, if they have any stack function names and line numbers? Do you see the string … anywhere in the logs? In remsh, what does … return? If possible, share the full _dbs document.
I wonder if this happened: after the first split (to Q=16), we ran the internal replicator to top off changes from the source to the targets. When doing so, we also pushed purges to the new targets; however, in the internal replicator we didn't pick them by the hash function of the target range, but copied them as-is when splitting (see couchdb/src/mem3/src/mem3_rep.erl, lines 317 to 344 at c75c31d).
On the next split (see couchdb/src/couch/src/couch_db_split.erl, lines 340 to 344 at c75c31d), if T2 -> T21|T22 is the split configuration and the purge info for DocId="a" doesn't belong to T2 to start with, we'd get the exception reported above in the issue.
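The theory above can be illustrated with a simplified Python sketch (the real code is Erlang in mem3; this only mirrors CouchDB's default CRC32-based placement, and all names here are illustrative). A doc id hashes into exactly one half of a split range, so a purge info copied verbatim to both targets is misplaced on one of them:

```python
# Simplified sketch of CouchDB's default shard placement: the doc id is
# hashed with CRC32 and must land inside a shard's [begin, end] range.
import zlib

RING_TOP = 2**32 - 1  # the hash ring covers the full 32-bit space


def doc_hash(doc_id: str) -> int:
    """CRC32 of the doc id, approximating mem3's default hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def in_range(doc_id: str, begin: int, end: int) -> bool:
    """True if the doc id hashes into the inclusive [begin, end] range."""
    return begin <= doc_hash(doc_id) <= end


def split_range(begin: int, end: int):
    """Split [begin, end] into two halves, as a Q -> 2Q shard split does."""
    mid = begin + (end - begin) // 2
    return (begin, mid), (mid + 1, end)


# A purge info copied verbatim to *both* halves of a split is misplaced on
# exactly one of them: its doc id hashes into only one half of the range.
t21, t22 = split_range(0, RING_TOP)
assert in_range("a", *t21) != in_range("a", *t22)
```

Splitting the misplaced half again then produces a purge info that hashes into neither of the new target ranges, which is the `not_in_range` crash seen above.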
To confirm this theory, the first step would be to double-check that the actual document IDs are on the shards they are supposed to be on. That is, we'd assert that … If this is the cause of the issue we saw above, then we could delete the extra purge info copies on the target shards, then set ….
Previously, the internal replicator (mem3_rep) replicated purge infos to/from all the target shards. Instead, it should push/pull changes only to the appropriate ranges, i.e. only where those purge infos belong based on the database's hash function. Users experienced this error as a failure in a database which contains purges and which was split twice in a row. For example, if a Q=8 database is split to Q=16, then split again from Q=16 to Q=32, the second split operation might fail with a `split_state:initial_copy ...{{badkey,not_in_range}` error. The misplaced purge infos would be noticed only during the second split, when the initial copy phase would crash because some purge infos do not hash to either of the two target ranges. Moreover, the crash would lead to repeated retries, which generated a huge job history log.

The fix consists of three improvements:
1) The internal replicator is updated to filter purge infos based on the db hash.
2) Account for the fact that some users' dbs might already contain misplaced purge infos. Since it's a known bug, we anticipate that error and ignore misplaced purge infos during the second shard split operation, with a warning emitted in the logs.
3) Make similar range errors fatal, and emit a clear error in the logs and job history so any future range errors are immediately obvious.

Fixes #4624
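The first improvement (filtering purge infos by the db hash) can be sketched as follows. This is an illustrative Python reduction of the idea, not the actual Erlang change in mem3_rep; the CRC32 hash and all names are assumptions for the sake of the example:

```python
# Sketch: keep only the purge infos whose doc id hashes into the
# target shard's [begin, end] range, instead of copying all of them.
import zlib


def doc_hash(doc_id: str) -> int:
    """CRC32 of the doc id, approximating mem3's default hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def filter_purge_infos(purge_infos, begin, end):
    """Drop purge infos that do not belong to the target range."""
    return [p for p in purge_infos if begin <= doc_hash(p["docid"]) <= end]


# Two hypothetical purge infos pushed during a split of the full ring
# into two halves: each one ends up on exactly one target, not both.
infos = [{"docid": "a", "purge_seq": 1}, {"docid": "b", "purge_seq": 2}]
mid = (2**32 - 1) // 2
lower = filter_purge_infos(infos, 0, mid)
upper = filter_purge_infos(infos, mid + 1, 2**32 - 1)
assert len(lower) + len(upper) == len(infos)
```

With this filter in place, no target ever holds a purge info outside its own range, so a later split has nothing misplaced to trip over.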
This PR should fix issue #4626.
While attempting to shard-split a q=16 database on a 3-node cluster, we found that all reshard jobs failed, and `GET /_reshard/jobs` stopped responding to requests. The logs reveal a `not_in_range` failure in `mem3_reshard_job`.

Description
We are attempting to reshard a database from q=8 to q=32, using the following script: https://gist.github.com/jcoglan/ad2b631664bc436c48e4274718a0acd6. This worked to get from q=8 to q=16, but failed the second step to get to q=32.
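For context, each split in such a script is typically created by POSTing a job to `/_reshard/jobs`. A minimal sketch of building that request body follows; the shard and node names are hypothetical placeholders, not values from this cluster:

```python
# Sketch of the JSON body for POST /_reshard/jobs to split one shard.
# Shard and node names below are illustrative; substitute your own.
import json

body = {
    "type": "split",
    "shard": "shards/00000000-1fffffff/mydb.1679000000",
    "node": "couchdb@node1.example.com",
}
payload = json.dumps(body)
# Send `payload` to /_reshard/jobs with Content-Type: application/json.
```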
`/_reshard` shows that all jobs failed. Also, `/_reshard/jobs` does not respond at all; the request hangs with no activity visible in the logs. We observed many messages like the following while the jobs were running:
The database's current shards are as follows:
And the database info looks like this:
Your Environment