Release candidate 123256 - 3 #2465
Commits on Oct 24, 2019
-
Avoid churning replication jobs if there is enough room to run pending jobs
When rescheduling jobs, make sure to stop only as many existing jobs as needed to make room for the pending jobs.
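The rule in this commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the scheduler's actual Erlang code; the function name `jobs_to_stop` and its parameters (`max_jobs`, `running`, `pending`) are hypothetical and chosen only for this example.

```python
def jobs_to_stop(max_jobs: int, running: int, pending: int) -> int:
    """Return how many running jobs must stop so the pending jobs can run.

    Hypothetical helper: max_jobs is the scheduler's slot limit, running and
    pending are the current job counts.
    """
    # Slots that are already free need no running job to be stopped.
    free_slots = max(max_jobs - running, 0)
    # Only pending jobs that do not fit into the free slots cause churn.
    shortfall = max(pending - free_slots, 0)
    # We can never stop more jobs than are currently running.
    return min(shortfall, running)
```

With 10 slots, 6 running jobs and 3 pending jobs nothing has to be stopped, because the 4 free slots already cover the pending jobs.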
Commit 1c2646a
-
Merge pull request #23 from cloudant/release-candidate-122026
Release candidate 122026
Commit ab14f20
Commits on Nov 8, 2019
-
Return detailed replication stats for running and pending jobs
Previously `_scheduler/docs` returned detailed replication statistics for completed jobs only. To get the same level of detail for running or pending jobs, users had to use `_active_tasks`, which is not optimal and required jumping between monitoring endpoints. The `info` field was originally meant to hold these statistics, but they were not implemented and it just returned `null` as a placeholder. With work for 3.0 finalizing, this might be a good time to add this improvement to avoid disturbing the API afterwards. Just updating `_scheduler/docs` was not quite enough, since replications started from the `_replicate` endpoint would not be visible there and users would still have to access `_active_tasks` to inspect them, so let's add the `info` field to `_scheduler/jobs` as well. After this update, all states and status details from `_active_tasks` and `_replicator` docs should be available under the `_scheduler/jobs` and `_scheduler/docs` endpoints.
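As a rough illustration of what a monitoring client gains from this change, the sketch below pulls the `info` value for running and pending entries out of an already-parsed response. The payload shape (a list of entries with `state` and `info` keys) and the helper name `info_by_state` are assumptions made for this example, not a description of the exact response format.

```python
from typing import Any


def info_by_state(
    items: list[dict[str, Any]],
    states: tuple[str, ...] = ("running", "pending"),
) -> list[tuple[str, Any]]:
    """Return (state, info) pairs for entries whose state is in `states`.

    `items` is assumed to be the parsed list of entries from a scheduler
    endpoint; each entry is assumed to carry `state` and `info` keys.
    """
    return [
        (item["state"], item.get("info"))
        for item in items
        if item.get("state") in states
    ]


# Hypothetical sample data, invented purely for illustration.
sample = [
    {"state": "running", "info": {"docs_written": 5, "doc_write_failures": 0}},
    {"state": "completed", "info": {"docs_written": 9, "doc_write_failures": 1}},
    {"state": "pending", "info": None},
]
active = info_by_state(sample)
```

Before this change a client would have had to fall back to `_active_tasks` for the first and third entries.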
Commit eaa2447
-
Commit bbabde2
-
Do not mark replication jobs as failed if doc processor crashes
Previously, if couch_replicator_doc_processor crashed, the job was marked as "failed". We now ignore that case. It's safe to do that since the supervisor will restart it anyway, and it will rescan all the docs again. Most of all, we want to prevent the job from becoming failed permanently and needing manual intervention to restart it.
Commit 5e3c208
-
Merge pull request #24 from cloudant/release-candidate-122285
Release candidate 122285
Commit 08f3008
Commits on Nov 9, 2019
-
* Detect dreyfus/hastings correctly
Commit 7a22691
Commits on Nov 10, 2019
-
Merge pull request #25 from cloudant/release-candidate-122285-2
Release candidate 122285-2
Commit 4c87c30
Commits on Nov 11, 2019
-
export get_servers_from_env/1 for ken
Also remove the tests that detect that background index building didn't happen, because it does now.
Commit e8c2992
-
Merge pull request #26 from cloudant/release-candidate-122285-3
Release candidate 122285-3
Commit 3e63d84
Commits on Nov 22, 2019
-
Commit 9864868
-
Merge pull request #27 from cloudant/release-candidate-122519
Release candidate 122519
Commit 6dc33db
Commits on Jan 9, 2020
-
Fix missing mango execution stats (part 1)
Adds message handlers to mango / all_docs / mrview fabric to receive an execution_stats message.
Commit 2ccfa79
-
Merge pull request #28 from cloudant/release-candidate-123256
Fix missing mango execution stats (part 1)
Commit 491b913
Commits on Jan 10, 2020
-
Use separate requests to write design when replicating
Design doc writes could fail on the target when replicating with non-admin credentials. Typically the replicator will skip over them and bump the `doc_write_failures` counter. However, that relies on the POST request returning a `200 OK` response. If the authentication scheme is implemented such that the whole request fails when some docs don't have enough permission to be written, then the replication job ends up crashing with an ugly exception and gets stuck retrying forever. In order to accommodate that scenario, write `_design` docs in their own separate requests, just like we write attachments.
Fixes: apache#2415
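The idea of keeping design docs out of the shared bulk request can be sketched as below. This is only an illustration in Python under stated assumptions: docs are plain dicts with an `_id` key, and `plan_requests` and the `"bulk"` / `"single"` labels are hypothetical names, not the replicator's real functions.

```python
from typing import Any

Doc = dict[str, Any]


def is_design_doc(doc: Doc) -> bool:
    """Design documents are identified by the `_design/` id prefix."""
    return str(doc.get("_id", "")).startswith("_design/")


def plan_requests(docs: list[Doc]) -> list[tuple[str, list[Doc]]]:
    """Group docs into one bulk request plus one request per design doc.

    Returns (kind, docs) tuples; a rejected design doc then fails only its
    own request instead of the whole batch.
    """
    regular = [d for d in docs if not is_design_doc(d)]
    design = [d for d in docs if is_design_doc(d)]
    requests: list[tuple[str, list[Doc]]] = []
    if regular:
        requests.append(("bulk", regular))
    # Each design doc travels alone, mirroring how attachments are written.
    requests.extend(("single", [d]) for d in design)
    return requests
```

A permission failure on one design doc can then be counted as a single write failure while the regular docs are still written.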
Commit c97e88e
Commits on Jan 17, 2020
-
Improve replicator error reporting
Previously many HTTP requests failed noisily with `function_clause` errors. Expect some of those failures and handle them better. There are mainly 3 types of improvements:
1) Error messages are shorter. Instead of `function_clause` with cryptic internal fun names, return a simple marker like `bulk_docs_failed`.
2) Include the error body if it was returned. Besides the error code, HTTP failures may contain useful information in the body to help debug the failure.
3) Do not log or include the stack trace in the message. The error names are enough to identify the place where they are generated, so avoid spamming the user and the logs with them. This is done by using `{shutdown, Error}` tuples to bubble up the error to the replication scheduler.
There is a small but related cleanup of removing source and target monitors. We would want to handle those errors better, but they are never triggered since we removed local replication endpoints recently.
Fixes: apache#2413
Commit 0388994
-
Properly account for replication stats when splitting bulk docs batches
Previously, if a batch of bulk docs had to be bisected in order to fit a lower max request size limit on the target, we only counted stats for the second batch. So it was possible to miss some `doc_write_failures` updates, which can be perceived as data loss by the customer. So we use the handy-dandy `sum_stats/2` function to sum the returned stats from both batches and return that.
Issue: apache#2414
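A minimal Python sketch of the fix described in this commit message: when a batch is bisected, the stats from both halves are added together instead of keeping only the second. The names `write_batch`, `send` and `max_docs` are hypothetical, and a document-count limit stands in for the target's request size limit; only the name `sum_stats` comes from the commit message, and its body here is an assumption.

```python
from typing import Callable

Stats = dict[str, int]


def sum_stats(a: Stats, b: Stats) -> Stats:
    """Add two stats dicts key by key, treating missing keys as 0."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in a.keys() | b.keys()}


def write_batch(docs: list, max_docs: int, send: Callable[[list], Stats]) -> Stats:
    """Send docs, bisecting batches that are too large for the target.

    `send` is a hypothetical callback that writes one batch and returns its
    stats. The stats of both halves are summed so none are lost.
    """
    if len(docs) <= max_docs or len(docs) <= 1:
        return send(docs)
    mid = len(docs) // 2
    first = write_batch(docs[:mid], max_docs, send)
    second = write_batch(docs[mid:], max_docs, send)
    # The bug was equivalent to returning only `second` here.
    return sum_stats(first, second)
```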
Commit f506ba2
-
Preserve replication job stats when jobs are re-created
Previously we made sure replication job statistics were preserved when the jobs were started and stopped by the scheduler. However, if a db node restarted or a user re-created the job, replication stats would be reset to 0. Some statistics like `docs_read` and `docs_written` are perhaps not as critical. However, `doc_write_failures` is. That is the indicator that some docs have not replicated to the target. Not preserving that statistic meant users could perceive data loss during replication: data was replicated successfully according to the replication job, with no write failures, the user deletes the source database, then some time later notices that some of their data is missing. These statistics were already logged in the checkpoint history, and we just had to initialize a stats object from them when a replication job starts. In that initialization code we pick the highest values from either the running scheduler or the checkpointed log. The reason is that the running stats could be higher if, say, the job was stopped suddenly and failed to checkpoint but the scheduler retained the data.
Fixes: apache#2414
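The "pick the highest values" initialization can be illustrated with a short Python sketch. The function name `init_stats` and the dict representation of the stats are assumptions for this example; the real code works on the replicator's own stats object.

```python
Stats = dict[str, int]


def init_stats(scheduler_stats: Stats, checkpoint_stats: Stats) -> Stats:
    """Build starting stats from the higher value of each counter.

    Counters missing from one source are treated as 0, so a value known to
    only the scheduler or only the checkpoint log is still preserved.
    """
    keys = scheduler_stats.keys() | checkpoint_stats.keys()
    return {
        k: max(scheduler_stats.get(k, 0), checkpoint_stats.get(k, 0))
        for k in keys
    }
```

Taking the maximum per counter means a job that stopped before checkpointing keeps the scheduler's higher numbers, while a job re-created after a node restart falls back to the checkpointed ones.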
Commit 6db8b57
-
Fix fabric worker failures for partition requests
Previously any failed node or rexi worker error resulted in requests failing immediately even though there were available workers to keep handling the request. This was because the progress check function didn't account for the fact that partition requests only use a handful of shards which, by design, do not complete the full ring. Here we fix both partition info queries and dreyfus search functionality. We follow the pattern from fabric and pass through a set of "ring options" that let the progress function know it is dealing with partitions instead of a full ring.
Commit 881e0e0