Replication job crashes #4023

Open
tudordumitriu opened this issue May 13, 2022 · 2 comments

Comments

@tudordumitriu

We are trying to migrate some DBs between two CouchDB servers. For some of the DBs things run smoothly, but for some bigger ones (15k docs at most) the replication jobs stop, and in _scheduler/docs we see only the following errors reported:
info: {error: "{worker_died,<0.1956.3>,{bad_return_value,{invalid_json,{1,invalid_json}}}}"}
error: "{worker_died,<0.1956.3>,{bad_return_value,{invalid_json,{1,invalid_json}}}}"
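When many replication jobs are listed, a small filter can pull out only the crashed ones and strip the `worker_died` wrapper from the reason. This is a sketch, assuming the JSON body of `GET /_scheduler/docs` has already been fetched and has the CouchDB 3.x shape (a `docs` array whose entries carry `doc_id`, `state` and `info`). The function name `failing_jobs` is hypothetical, and the regular expression only recognizes the `worker_died` shape shown above.

```python
import re
from typing import Any

# CouchDB reports worker crashes as Erlang terms, e.g.
# {worker_died,<0.1956.3>,{bad_return_value,{invalid_json,{1,invalid_json}}}}
WORKER_DIED = re.compile(r"^\{worker_died,(<[\d.]+>),(.*)\}$")


def failing_jobs(scheduler_docs: dict[str, Any]) -> list[dict[str, str]]:
    """Summarize entries of a _scheduler/docs response that carry an error."""
    summary = []
    for doc in scheduler_docs.get("docs", []):
        info = doc.get("info")
        # Prefer info.error; fall back to a top-level "error" field if present.
        error = (info.get("error") if isinstance(info, dict) else None) or doc.get("error")
        if not error:
            continue
        match = WORKER_DIED.match(str(error))
        summary.append({
            "doc_id": doc.get("doc_id", ""),
            "state": doc.get("state", ""),
            # Erlang pid of the crashed worker, if the error has that shape.
            "worker": match.group(1) if match else "",
            # The crash reason with the worker_died wrapper removed.
            "reason": match.group(2) if match else str(error),
        })
    return summary
```

Applied to the error above, it reports worker `<0.1956.3>` with the reason `{bad_return_value,{invalid_json,{1,invalid_json}}}`, which is the part worth searching for in the server logs.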

Description

The DBs are structurally the same, but I'd like to find out what the exact error is, or which document is causing it.
We have also checked the server logs, and the reported error is similar to the one above.

  • CouchDB version used: 3.2.0
  • Browser name and version: Chrome
  • Operating system and version: Ubuntu, Docker, K8S, Azure AKS
@nickva
Contributor

nickva commented May 13, 2022

@tudordumitriu a replication worker is one of the 4 (by default) processes spawned by each replication job. Each worker sends a POST /_revs_diff request to the target to find the missing revisions, then a GET with open_revs to the source to fetch those revisions, and finally a POST /_bulk_docs to the target to insert the docs. So any of those requests could have returned the invalid_json response.
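To reproduce the three steps by hand (for example with curl or a proxy, to see which response is the bad one), it helps to know what each request looks like. The helpers below are hypothetical and only build `(method, path, body)` tuples; the query parameters on the `open_revs` GET are chosen to resemble what a replicator sends, which is an assumption rather than a copy of CouchDB's internals.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical helpers: they build (method, path, body) tuples for the three
# requests a replication worker makes, so each can be replayed and inspected.


def revs_diff_request(target_db: str, revs: dict[str, list[str]]) -> tuple[str, str, str]:
    """Step 1: ask the target which of these {doc_id: [rev, ...]} it is missing."""
    return "POST", f"/{quote(target_db, safe='')}/_revs_diff", json.dumps(revs)


def open_revs_requests(source_db: str, revs_diff_response: dict) -> list[tuple[str, str, str]]:
    """Step 2: one GET per document to fetch the missing revisions from the source."""
    requests = []
    for doc_id, entry in revs_diff_response.items():
        query = urlencode({
            "revs": "true",
            "latest": "true",
            "open_revs": json.dumps(entry["missing"]),  # JSON array, URL-encoded
        })
        path = f"/{quote(source_db, safe='')}/{quote(doc_id, safe='')}?{query}"
        requests.append(("GET", path, ""))
    return requests


def bulk_docs_request(target_db: str, docs: list[dict]) -> tuple[str, str, str]:
    """Step 3: write the fetched revisions; new_edits=false keeps their original _rev."""
    body = json.dumps({"docs": docs, "new_edits": False})
    return "POST", f"/{quote(target_db, safe='')}/_bulk_docs", body
```

Replaying these one document at a time and saving each response body narrows the failure down to a single request and a single document.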

invalid_json often means that the connection was terminated abruptly: for example, a rate limit was reached, the connection timed out, or a maximum size was reached and the response was cut off. It's hard to say which of these (or some other error) happened without more logs or information. Would you be able to get more logs from the servers, or ideally capture the request/response bodies?
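Once a response body has been captured, a rough heuristic can separate a connection that was cut short from content that is genuinely not JSON: a body shorter than its Content-Length, or a parse error that occurs only because the input ran out, points to truncation; an error earlier in the text points to bad content. This is a sketch assuming Python's `json` module as the parser (not the parser CouchDB itself uses); `diagnose_body` is a hypothetical name.

```python
import json
from typing import Optional


def diagnose_body(body: bytes, content_length: Optional[int] = None) -> str:
    """Heuristically classify a response body as 'ok', 'truncated' or 'malformed'."""
    # Fewer bytes than the advertised Content-Length: cut off in transit.
    if content_length is not None and len(body) < content_length:
        return "truncated"
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError as err:
        # An incomplete multi-byte sequence at the very end also suggests truncation.
        return "truncated" if err.reason == "unexpected end of data" else "malformed"
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # The parser ran out of input: probably cut short.
        # An error before the end: the content itself is not valid JSON.
        return "truncated" if err.pos >= len(text) else "malformed"
    return "ok"
```

An HTML error page from a proxy, for instance, comes back as `malformed`, while a body that stops mid-object comes back as `truncated`.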

@tudordumitriu
Author

Hi @nickva
Thanks for the answer, but the problem was not the connection: there actually was a document that was invalid JSON from the server-to-server replication point of view. Let me explain.
First, we disabled all IP rate limiters, firewalls and so on, but nothing improved.
When we backed up the DB to a local DB, replication worked perfectly (the only differences were the URL and credentials).
But when we tried to replicate (by pull) from the target server, the job just stopped; the document may have been logged, but I never managed to find it.
Then, when we tried to replicate by push, from the source to the target, I noticed the document in the logs.
What is strange is that the document was created on an iOS platform (I suspect the iOS file paths) and was replicated from PouchDB to CouchDB and back to other PouchDB DBs (on various platforms) without a problem, including the local replication mentioned above.
The file is attached; I hope it helps.
Crash.zip
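The thread does not say what exactly was wrong with the attached document. If it is a string problem of the kind the iOS-path suspicion hints at, one hypothesis worth ruling out is an unpaired UTF-16 surrogate: some JSON parsers accept a lone `\uD83D` escape, while strict UTF-8 encoders reject the resulting string, so one side may write a document that another side cannot read. The sketch below assumes the documents have already been fetched (for example the `doc` fields of `GET /{db}/_all_docs?include_docs=true` rows) and decoded with Python's `json` module; it checks only this one failure mode, and the function names are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_strings(value: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every key and string value in a decoded JSON document."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}"
            yield child, key  # keys must be valid Unicode too
            yield from iter_strings(item, child)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from iter_strings(item, f"{path}[{index}]")


def not_utf8_encodable(text: str) -> bool:
    """True if text holds an unpaired UTF-16 surrogate, which strict UTF-8 rejects."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def suspicious_docs(docs: list[dict]) -> dict[str, list[str]]:
    """Map each document _id to the paths of strings that cannot be UTF-8 encoded."""
    report: dict[str, list[str]] = {}
    for doc in docs:
        bad = [path for path, text in iter_strings(doc) if not_utf8_encodable(text)]
        if bad:
            report[str(doc.get("_id", "<no _id>"))] = bad
    return report


# Python's json module decodes a lone "\ud83d" escape without complaint.
rows = json.loads('[{"_id": "ok", "path": "/var/mobile/a.jpg"},'
                  ' {"_id": "bad", "path": "/tmp/\\ud83d.jpg"}]')
print(suspicious_docs(rows))  # {'bad': ['$.path']}
```

An empty report does not clear a document; it only rules out this particular cause.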
