The replication job for CouchDB is stuck #5094
Can you clarify step 5? I assume you meant that document "penny-003" did not appear in couchdb-A? Further questions:
Hi @rnewson, thanks a lot for your quick reply! Yes, "penny-003" did not appear in couchdb-A.

1. I used the UI to configure the replications and did not keep the response. What I remember is 0 documents written and null pending. (I can try to reproduce that and copy the new result here for you later.)
2. No.
3. I did not use a filter on the replication; I just followed the UI instructions to set a source and a target.
4. I have some more discoveries about the .ini files that might cause the replication to fail.

This is a special case of data syncing. I carried out some tests replicating data from local to local (with another dummy name). For example, we have "sw360db", which is a small database, and "sw360changelogs", which is a large database. The replication job from the local existing "sw360db" database to a local new "penny1" database succeeded. The replication job from the local existing "sw360changelogs" database to a local new "penny2" database failed (0 documents written and null pending). I'm suspicious of the [replicator] configuration and am carrying out some tests to prove my assumptions.
Steps:
1. Create the one-time replication job (screenshot)
2. Check the replication job status (screenshot)
(screenshot) I got this error when creating the replication job.
Request: curl --request POST …
Response: { …
Ok, so it sounds like the replication simply crashes at the start, which explains the state of the databases; we don't get as far as knowing which documents on the source need replicating to the target. I don't see anything in your configuration that explains the issue. I see you are using https, but your configuration did not include the settings to enable that natively in CouchDB. Is there something else providing https, or did you just omit those settings from the configuration above? Can you test independently of the replicator? E.g. with ApacheBench, something like …
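A quick way to test connectivity independently of the replicator is to fetch a single row of the `_changes` feed directly, since that is the endpoint the replicator reads from. A minimal sketch; the host name below is a placeholder, not taken from this thread:

```python
from urllib.parse import urlencode


def changes_url(base, db, **params):
    """Build the _changes URL that the replicator polls on the source."""
    qs = urlencode(sorted(params.items()))
    return f"{base}/{db}/_changes" + (f"?{qs}" if qs else "")


# Substitute your own couchdb-B address and credentials.
print(changes_url("https://couch.example.com", "sw360changelogs", limit=1, since=0))
```

Fetching that URL with `curl -sv --max-time 60` should return almost immediately; if it times out the way the replicator does, the problem is connectivity rather than replication.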
Hi @rnewson, thanks for your suggestions. Actually we created the DNS record pointing to the IP address in a separate place, on the cloud configuration panel. I tried using ApacheBench; the response is attached (error 70007). For other databases the replication was quite fast, and I'm not sure whether the .ini params affect the efficiency. I also tried the local way to set up the replication job, but that failed as well.
Test results for sw360changelogs:
And I know replication might be difficult for big databases. Could we declare a small limit under the [replicator] section so the whole changes feed is fetched in smaller pieces, making it faster for us? Is setting this in the .ini files supported now? Thank you very much!
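For reference, the `[replicator]` section does expose batching, concurrency, and timeout knobs. A sketch using values I believe to be CouchDB's documented defaults; verify them against the configuration reference for your exact version before relying on them:

```ini
[replicator]
; documents fetched per batch by each worker
worker_batch_size = 500
; worker processes per replication job
worker_processes = 4
; HTTP connections per replication
http_connections = 20
; per-request timeout in milliseconds
connection_timeout = 30000
; retries for a failed request
retries_per_request = 5
```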
This does not sound like a replicator problem. Your CouchDB is not contactable at all; focus on testing connectivity with curl and ApacheBench, figure out why that is not working first, and then the replication problem will also be solved. There is no "local way" for replication; internally the replicator works the same way in either case.
Hi @rnewson, thanks a lot for the hints! (screenshots) The simple .ini file that works.
You say 4 minutes to respond; is that accurate? I would expect retrieving the entire changes response body to take time, but that is not a problem (the replicator processes it as a stream, no matter how long it is). The req_timedout error is about whether the response even starts, not whether it finishes. Try curl again and add …
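Since req_timedout is about whether the response starts at all, a time-to-first-byte measurement separates "large body that streams slowly" from "no response at all". A minimal standard-library sketch; point it at your own `_changes` URL:

```python
import time
import urllib.request


def time_to_first_byte(url, timeout=300):
    """Seconds until the first byte of the response body arrives."""
    t0 = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read(1)  # blocks until the body actually starts
    return time.monotonic() - t0
```

If this number is large (or the call times out) even with `limit=1`, the server is not starting the response, which matches the replicator's req_timedout behaviour.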
I wonder if it's the …
Thanks for pointing that out, I'll do some comparison tests then.
Hm, ok. One tip: you really should remove …
Sidebar: we should exclude the _changes response from that setting, or perhaps remove it entirely.
Ok, thanks a lot! Let me try the re-setup without the param.
Sorry I didn't notice it sooner, but thanks to @nickva for spotting it.
Description
The replication job for CouchDB is stuck.
Steps to Reproduce
couchdb-A: a version 3.3.3 cluster with 3 nodes on RHEL 9 VMs
couchdb-B: a version 3.3.3 cluster with 3 nodes on RHEL 9 VMs
1. Create a database in couchdb-A named "penny" and add two documents, {"_id":"penny-001"} and {"_id":"penny-002"}.
2. Configure a one-time replication job targeting a database "penny" in couchdb-B, pulling from couchdb-A; the 2 documents appear in "penny" on couchdb-B very quickly.
3. Delete the previous replication job.
4. Add one more document, {"_id":"penny-003"}, into "penny" on couchdb-B.
5. Reverse the replication direction: make couchdb-B the source and couchdb-A the target. Configure a one-time replication job from "penny" on couchdb-B to "penny" on couchdb-A; the document never appears in "penny" on couchdb-A, even after waiting a very long time.
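The steps above can also be exercised without the UI by POSTing a replication document to the `/_replicate` endpoint. A sketch of the step-5 job with placeholder host names (the UI builds an equivalent document behind the scenes):

```python
import json

# Placeholder hosts standing in for couchdb-B (source) and couchdb-A (target).
job = {
    "source": "https://couchdb-b.example.com/penny",
    "target": "https://couchdb-a.example.com/penny",
}
body = json.dumps(job, indent=2)
print(body)
```

POSTing this body (Content-Type: application/json) to `/_replicate` on the cluster running the job, or saving it as a document in `/_replicator` for a persistent job, returns a response whose `docs_written`/`history` fields make the "0 documents written" symptom easier to capture than a UI screenshot.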
Your Environment
I am testing on RHEL 9 VMs and am very puzzled by this replication failure. It seems something might be wrong in "couchdb-A" or "couchdb-B". I'm not sure whether we can replicate in either direction or should only follow one direction to sync data, but I think the official documentation says the replicator only follows the _changes API to get data, so we should be able to sync as long as we can get a response from _changes.
One discovery from my side (for couchdb-B) :
https://username:[email protected]/penny/_changes?feed=longpoll&filter=CustomFilters/RelCompLicFilter&limit=2000&since=0
{
"error": "not_found",
"reason": "missing"
}
This API call fails, so I think this could be the root cause of the failure syncing data from B to A, but I do not know how to fix it. I hope you can give us some valuable suggestions. Thank you very much!
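A `not_found`/`missing` response from `_changes` with a `filter=` parameter usually means the filter cannot be resolved: `filter=CustomFilters/RelCompLicFilter` requires a design document `_design/CustomFilters` on the source database containing a `RelCompLicFilter` entry under `filters`. A sketch of such a document; the filter body here is a placeholder, since the real predicate must come from your application:

```python
import json

# Skeleton of the design document the filter path expects to find.
design_doc = {
    "_id": "_design/CustomFilters",
    "filters": {
        # Placeholder predicate that passes every document.
        "RelCompLicFilter": "function(doc, req) { return true; }",
    },
}
print(json.dumps(design_doc))
```

You can check whether the document exists with a plain GET of `/penny/_design/CustomFilters` on couchdb-B; note also that the replication job described in this issue was configured without a filter, so this missing filter would only affect requests that name it explicitly.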
The .ini files we used are below; do any of the settings affect the replication behavior?