Why do deleted documents appear in the return result of executing the _find command immediately? #5090

lpcy · 2024-06-16T10:39:05Z

I delete a document with the DELETE command and then immediately execute _find; the deleted document still appears in the result. When I execute _find again later, it is gone.

Expected Behaviour

After the document is deleted, it should immediately disappear from the result of _find.

Your Environment

CouchDB version used: 3.3.3
Browser name and version: Chrome 126
Operating system and version: Windows 10

big-r81 · 2024-06-16T15:33:55Z

Do you have a minimal working example, like a script to reproduce this?

rnewson · 2024-06-16T16:56:44Z

This is possible for a period of time if N>1 (i.e., you have a cluster, not a standalone single node). Once the DELETE has happened at all N nodes, subsequent queries (assuming you didn't specify update=false or stale=ok) will not return that document. There can be a period where a DELETE has completed (i.e., you got a 200 OK response) but one or more nodes have not yet processed it; a _find at that time might get a response from one of those nodes (all queries are inherently R=1: they read just one of the copies).
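As a rough illustration of that window, here is a toy in-memory model in Python. It is not CouchDB code: the `Replica` class, the helper names, and the assumption that the delete is acknowledged after 2 of 3 copies have applied it are all made up for the sketch.

```python
class Replica:
    """One copy of the data; holds the ids of live (non-deleted) docs."""

    def __init__(self, doc_ids):
        self.live = set(doc_ids)


def delete_with_quorum(replicas, doc_id, w=2):
    """Apply the delete on the first `w` copies and treat it as acknowledged.

    Returns the copies that have not processed the delete yet.
    """
    for replica in replicas[:w]:
        replica.live.discard(doc_id)
    return replicas[w:]


def find_r1(replica):
    """A query answered by a single copy (R=1)."""
    return sorted(replica.live)


replicas = [Replica({"a", "b"}) for _ in range(3)]
lagging = delete_with_quorum(replicas, "a")

# The query happens to be answered by the copy that has not caught up.
stale_result = find_r1(lagging[0])

# Later, the remaining copy processes the delete as well.
for replica in lagging:
    replica.live.discard("a")
fresh_result = find_r1(lagging[0])

print(stale_result, fresh_result)
```

In this model the first query still lists `a` and the second does not, which is the behaviour described above for a cluster.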

lpcy · 2024-06-17T03:35:38Z

Thank you. I'm sorry, I realized that I made an error: the delete in my code is asynchronous.
That leaves me with a new question: is the tombstone information retained forever? Is there any way to clean it up? The approach I found by searching Google is to replicate to a new database while excluding deleted documents, which seems cumbersome. Currently, I am using a single node.

lpcy · 2024-06-17T03:38:23Z

I rewrote the script and found it to be working properly. Sorry, the problem was on my side: the delete call in my code is asynchronous.

rnewson · 2024-06-17T07:27:22Z

"Tombstone" is a loose term; more precisely, it is a document with the deleted flag set to true, and it may contain other data. Tombstones are preserved forever, just as non-deleted documents are, to ensure that replication works correctly. You can replicate with a filter to drop them (or any other subset of documents) as long as you're aware of that consequence.

Delete is not asynchronous (any more than doc create or update is). I was referring to the way we only wait for the first 2 of the total 3 responses in a cluster of 3 or more nodes, which seems not to apply in your case.

If this is a single node setup then your opening comment is a bit more interesting. When the DELETE response is returned, the document has been marked as deleted, and so any subsequent request should reflect that, including indexes (_view, _find, etc.). Are you querying with stale=ok or update=false parameters? Assuming not, how long is the delay between the deleted document appearing in results after deletion and it finally being gone?
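For reference, a minimal Python sketch of what such a query looks like. It only builds the `POST /{db}/_find` request without sending it; the base URL, the database name `mydb`, the selector, and both helper functions are assumptions made for illustration.

```python
import json
import urllib.request


def build_find_request(base_url, db, selector, update=True):
    """Build (but do not send) a POST /{db}/_find request."""
    body = {"selector": selector}
    if not update:
        # update=false asks for an answer from the index as it stands,
        # so a recent deletion may not be reflected in the result.
        body["update"] = False
    return urllib.request.Request(
        f"{base_url}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def allows_stale_results(request):
    """True if the request body opts out of refreshing the index first."""
    body = json.loads(request.data)
    return body.get("update") is False or body.get("stale") == "ok"


fresh = build_find_request("http://localhost:5984", "mydb", {"type": "note"})
stale = build_find_request(
    "http://localhost:5984", "mydb", {"type": "note"}, update=False
)
print(allows_stale_results(fresh), allows_stale_results(stale))
```

Passing either request to `urllib.request.urlopen` would send it. Only the second form is expected to be able to return a document that has already been deleted on a single node.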

lpcy · 2024-06-17T08:47:54Z

Thank you for your patient answer. What I meant is that my original problem came from calling DELETE asynchronously in JavaScript without waiting for it to finish, so _find ran at the same time and retrieved the old data. This issue can be ignored.
The tombstones you mentioned exist for replication, but if I have no replication requirements, is there a simple command to clear them? Does leaving them in place affect database performance?
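The race described at the start of this comment can be reproduced without a database. The sketch below is a Python asyncio analogue of the JavaScript situation, not the original script: `FakeDb` is a made-up in-memory stand-in, and the sleep calls only imitate request latency.

```python
import asyncio


class FakeDb:
    """In-memory stand-in for a database; not a CouchDB client."""

    def __init__(self, doc_ids):
        self.live = set(doc_ids)

    async def delete(self, doc_id):
        await asyncio.sleep(0.01)  # pretend the request takes a moment
        self.live.discard(doc_id)

    async def find(self):
        await asyncio.sleep(0)  # yield to the event loop once, then answer
        return sorted(self.live)


async def racy(db):
    # The delete is started but not awaited, so find() runs concurrently.
    pending = asyncio.create_task(db.delete("a"))
    result = await db.find()
    await pending
    return result


async def sequential(db):
    # Waiting for the delete to finish before querying removes the race.
    await db.delete("a")
    return await db.find()


racy_result = asyncio.run(racy(FakeDb({"a", "b"})))
sequential_result = asyncio.run(sequential(FakeDb({"a", "b"})))
print(racy_result, sequential_result)
```

In the racy version the query is answered before the delete has been applied, so `a` is still listed; in the sequential version it is not.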

rnewson · 2024-06-17T09:06:59Z

Ah, that makes sense, thank you for clarifying.

If you don't need to keep deleted documents, as you never replicate, you can use the purge endpoint. The main downside to keeping them is the disk space they will continue to occupy. This is quite small (assuming you used the DELETE method which also empties the document body) but it is not zero.
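A minimal Python sketch of a purge request, assuming the `POST /{db}/_purge` endpoint with a JSON body that maps each document id to the list of revisions to purge. The base URL, the database name `mydb`, the document id, and the revision string are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_purge_request(base_url, db, revs_by_id):
    """Build (but do not send) a POST /{db}/_purge request.

    `revs_by_id` maps a document id to the revisions to purge,
    e.g. the revision of the deleted document.
    """
    return urllib.request.Request(
        f"{base_url}/{db}/_purge",
        data=json.dumps(revs_by_id).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "doc-1" and "2-abc" are placeholder values, not real identifiers.
purge_request = build_purge_request(
    "http://localhost:5984", "mydb", {"doc-1": ["2-abc"]}
)
print(purge_request.full_url)
```

Sending it would be `urllib.request.urlopen(purge_request)`, with whatever authentication the server requires added to the headers.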

Alternative strategies:

If your data is temporal/time-based, you could make a database for distinct time periods (say, monthly), and when your oldest database contains only deleted documents you simply delete the entire database.
Periodically replicate the database to a new database but with a filter that rejects deleted documents, then switch usage to the new database.
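The second strategy could look roughly like the following Python sketch, which builds the JSON body for a `POST /_replicate` request. It assumes a replication `selector` that matches only documents without a `_deleted` field; the database names are placeholders, and a filter function in a design document would be another way to express the same rule.

```python
import json


def build_replication_body(source, target):
    """JSON body for a one-off replication that skips deleted documents."""
    return {
        "source": source,
        "target": target,
        "create_target": True,  # create the new database if it is missing
        # Assumption: deleted documents carry a `_deleted` field, so this
        # selector only lets live documents through.
        "selector": {"_deleted": {"$exists": False}},
    }


replication_body = build_replication_body("mydb", "mydb_compacted")
print(json.dumps(replication_body, indent=2))
```

Once the replication has finished, the application would be pointed at the new database and the old one deleted.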

lpcy · 2024-06-17T09:26:19Z

Thank you again for your answer, I've got it.

lpcy added the bug and needs-triage labels on Jun 16, 2024

lpcy closed this as not planned on Jun 17, 2024
