-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
soft-deletion for database #2666
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks so much to @nickva for reviewing on my wip PRs. It is quite helpful to give my immediate feedback to continue my work.
Thinking about this some more I wonder if it would make sense to separate enabling HCA from soft-deletion. Those are rather independent features. So for example, we'd have a PR to enable HCA. We'd test that make sure that works. Then, as a separate piece of work, we'd enable soft-deletion. Would that be reasonable? |
902ec20
to
dd7b9e5
Compare
Yes, it is totally reasonable. We can have 2 PRs, i.e. one is for enabling HCA, and another is for soft-deletion. The only point I have is where to call |
2b73bbd
to
58da795
Compare
bfde239
to
ee295f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some changes but overall looks quite good!
88cf83e
to
0a225e8
Compare
0a225e8
to
3d53721
Compare
3d53721
to
2f25251
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the JSON structure posted to _undelete
is a bit awkward:
{
"source": {
"db": "source_database_name",
"sourceTS": "2020-03-24T12:00:00"
},
"target": "target_database_name"
}
Nesting the source timestamp and db names seems a bit awkward and CouchDB traditionally uses underscores and not upper case in parameter names. How about something like:
{
"source": "source_database_name",
"source_timestamp": "2020-03-24T12:00:00",
"target": "target_database_name"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to hear from @ricellis and others about the URL patterns here. Currently we've got:
GET /_deleted_dbs
GET /_deleted_dbs/DbName
POST /_deleted_dbs/_undelete
I think the first two are fine but the _undelete
call seems a bit awkward. Maybe something like:
POST /_undelete_db
Also for the |
2f25251
to
8cbae8c
Compare
Agree that is probably better avoided. I was thinking that maybe we could keep the functionality in the {
"undelete": {
"source": "source_database_name",
"source_timestamp": "2020-03-24T12:00:00",
"target": "target_database_name"
}
} but ultimately I think that probably ends up more opaque than just using differently named endpoint @mikerhodes as the initial proposer of the Unrelated to the endpoint name the example shows no TZ data in the ISO-8601 timestamp, is that an accurate representation of the output/input we're expecting? |
There are 2 different formats for ISO-8601 timestamp, We can add more description in document to state the format we used here. |
Isn't this the same for adding a While I prefer keeping new
Alternatively you could have |
18c9d1c
to
031cb85
Compare
Just for simplification, the _deleted_dbs wasn't returning DbInfo blobs directly in the past. In order to get DbInfo blobs, need to call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a few minor issues.
src/fabric/src/fabric2_db.erl
Outdated
end. | ||
|
||
|
||
deleted_dbs_info(DbName, Options) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to after list_dbs_info so that it matches the order in the exports list.
src/fabric/src/fabric2_db.erl
Outdated
]} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor nit, but this should be a single line ]}}
at two indents.
src/fabric/src/fabric2_fdb.erl
Outdated
DeleteDbKey = erlfdb_tuple:pack(DeletedDbTupleKey, LayerPrefix), | ||
case erlfdb:wait(erlfdb:get(Tx, DeleteDbKey)) of | ||
not_found -> | ||
erlang:error({not_found}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You resolved this but you don't appear to have addressed the issue.
src/fabric/src/fabric2_fdb.erl
Outdated
DeleteDbKey = erlfdb_tuple:pack(DeletedDbTupleKey, LayerPrefix), | ||
case erlfdb:wait(erlfdb:get(Tx, DeleteDbKey)) of | ||
not_found -> | ||
erlang:error({not_found}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a note that if you pick the exceptions that we shouldn't be raising {not_found}
as a 1-tuple. It should be either not_found
or {not_found, missing}
to match the existing exceptions in other places.
src/fabric/src/fabric2_fdb.erl
Outdated
erlfdb:clear(Tx, DbKey), | ||
bump_db_version(Db); | ||
_Val -> | ||
erlang:error({deleted_database_exists, DbName}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message isn't quite right since we can have multiple deleted databases, just not at the same timestamp. How about {deletion_frequency_exceeded, DbName}
?
src/fabric/src/fabric2_util.erl
Outdated
iso8601_timestamp/0, | ||
do_recovery/0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor nit but lest move these above pmap/N
into their own section. Don't forget to move the implementations as well.
Also, a couple quick thoughts on the HTTP API. Looking at it a bit harder I realized that we've basically duplicated some functionality because these two requests should be functionally equivalent:
And I could see a convincing argument that the second form would be more "natural" within the existing APIs for views and so on. I've also heard some concerns that having the
|
I wasn't super fond of the chttpd tests for deleted dbs but not in a way specific enough to review very well. So I just took a quick pass through reformatting/styling etc on what looks better to me. Feel free to completely ignore this though as its purely style related. But in case you're interested, this is how I'd change things so that there's a lot less noise in that test module: https://gist.github.com/davisp/257ea1631d443031f5c3995b70093bbe |
Also, something I noticed while playing with the tests is that the |
A quick proposal with responses based on https://gist.github.com/mikerhodes/39c66576eee468a5aeaa23c5715aec9a I think this could be trivially adapted to make the responses more I guess we have |
dd4488c
to
eefd115
Compare
Thanks @davisp and @mikerhodes for your comments. Today I have one commit 1f439c4 which essentially implemented below API spec.
From the functional set exposed to users, I think that it is similar to what @mikerhodes drafted in [1]. The difference is the API spec. From user case and implementation's point of view, there are several points I want to mention.
User can gradually use the progressive approach like mentioned in [1] to get the detailed information of the deleted db.
However, looks that user most like should review information of the deletion time, doc number, disk size to decide to restore it or permanently delete it.
In theory, we can implement both of them to get information of deleted db. @davisp also found one situation where we have to return 200 and [] if the specified db is not in deleted_db list when selecting Hope this can give some explanation how we are getting to the current status of API spec. [1] https://gist.github.com/mikerhodes/39c66576eee468a5aeaa23c5715aec9a |
Can I see an example of the expected body now? If |
Yes, below is one example.
|
OK, so now the
That also removes the awkward naming around Also given Then the body would be just (or also with an optional
Although I think the parameter name in the body should match that used on the It seems like it would be nice to use paths consistently throughout as @mikerhodes proposed, but I can see the rationale for preferring to use |
So, apparently I'm the only one that's been assuming that this is a mirror of the @mikerhodes I don't think the @ricellis If we stay with the Consistency in the @jiangphcn I'm can't decide whether you're suggesting we support the |
hey @davisp I think that you are not the only one that's been assuming that this is a mirror of the GET _dbs_info API end point and working from that perspective. From #2666, you can see that I am for that and implemented with this approach. In addition, I don't support the suggestion about
|
eefd115
to
1f439c4
Compare
@davisp @jiangphcn Apologies if I missed that we were mirroring I didn't particularly think Anyway, I think we've reached a decent stopping point here (given that I think mostly I've just got everyone to justify an already good solution tbh rather than moving us forward), so let's move forward and not make @jiangphcn implement yet another thing 😉 Nice work all in all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiangphcn Everything looks pretty good for the most part. I do have a few cleanup suggestions to try and streamline a lot of the names and match existing bits for consistency. Its a Sunday so I got lazy and just did this in an editor rather than go through with comments.
db-softdeletion...davisp-db-softdeletion
I know that diff looks huge but its really not that bad. Most of it is just things like moving function export/implementations around. I did do a bit of fancy footwork in chttpd_misc.erl
to factor out the commonalities with the _dbs_info
handler. And there was a slight behavior change in that I renamed source_timestamp
to timestamp
but it suddenly occurs to me you might have changed the query string parameter for deletions. Either or is fine by me.
Everything looks quite good now other than the last sweep of cleanup changes. Also, I didn't do anything crazy like try and compile those commits. I just wanted to do enough to be clear what I was suggesting and what not. Also the fabric tests weren't changed which will need some work with those changes that removed the multiple versions of list_deleted_dbs/list_deleted_db_infos or whatever they were originally.
thanks @davisp and @mikerhodes for comments over weekend. I come with up one new commit |
2272dcd
to
3b0f7d5
Compare
Instead of automatically and immediately removing data and index in database after a delete operation, soft-deletion allows to restore the deleted data back to original state due to a “fat finger”or undesired delete operation, up to defined periods, such as 48 hours. Co-Authored-By: Paul J. Davis <[email protected]>
Co-Authored-By: Paul J. Davis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Soft deleteInstead of automatically and immediately removing the data and index in the database after a delete operation, soft delete restores the deleted data back to the original state after an accidental delete or undesired delete operation, up to a defined period, such as 48 hours.
|
Query parameters | Description |
---|---|
descending (boolean) |
Return the databases in descending order by key. Default is false. |
endkey (json) |
Stop returning databases when the specified key is reached. |
end_key (json) |
Alias for endkey parameter. |
key (json) |
Return the databases for the specified key. It can be database name or database name + timestamp. |
limit (number) |
Limit the number of the returned databases to the specified number. |
skip (number) |
Skip this number of databases before starting to return the results. Default is 0. |
startkey (json) |
Return databases starting with the specified key. |
start_key (json) |
Alias for startkey . |
Response Headers:
Content-Type –
application/json
text/plain; charset=utf-8
{: codeblock}
Code | Description |
---|---|
200 OK |
Request completed successfully. |
400 Bad Request |
Bad Request. Invalid payload in request. |
401 Unauthorized |
CouchDB Server Administrator privileges required. |
404 NotFound |
Invalid deleted database or timestamp. |
See the following example request:
GET /_deleted_dbs HTTP/1.1
Accept: application/json
Host: localhost:5984
{: codeblock}
See the following example response:
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 52
Content-Type: application/json
Date: Tue, 23 Mar 2020 06:57:48 GMT
Server: CouchDB (Erlang/OTP)
[
{
"cluster": {
"n": 0,
"q": 0,
"r": 0,
"w": 0
},
"compact_running": false,
"data_size": 193164408719,
"db_name": "invoices",
"disk_format_version": 0,
"disk_size": 46114703224,
"instance_start_time": "0",
"purge_seq": 0,
"update_seq": "981...uUQ",
"doc_del_count": 5564,
"doc_count": 9818541,
"sizes": {
"external": 193164408719,
"views": 34961621142
},
"deleted": true,
"timestamp": "2020-04-07T23:19:12Z"
},
{
"cluster": {
"n": 0,
"q": 0,
"r": 0,
"w": 0
},
"compact_running": false,
"data_size": 0,
"db_name": "contacts",
"disk_format_version": 0,
"disk_size": 0,
"instance_start_time": "0",
"purge_seq": 0,
"update_seq": "000000000000000000000000",
"doc_del_count": 0,
"doc_count": 0,
"sizes": {
"external": 2,
"views": 0
},
"deleted": true,
"timestamp": "2020-04-07T23:19:02Z"
},
]
{: codeblock}
POST /_deleted_dbs
{: #post-_deleted_dbs_undelete}
Send a POST
request to restore a deleted database.
Parameters:
None
Request Headers:
Accept –
application/json
text/plain
{
"undelete": {
"timestamp" : "2020-03-23T02:26:35Z",
"source" : "db01",
"target" : "db02"
}
}
- "timestamp" - timestamp when database was deleted
- "target" could be optional and default to the source database name.
{: codeblock}
Response Headers:
Content-Type –
application/json
text/plain; charset=utf-8
{: codeblock}
Response JSON object | Description |
---|---|
ok (boolean) |
Operation status. Available in case of success. |
error (string) |
Error type. Available if response code is 4xx. |
reason (string) |
Error description. Available if response code is 4xx. |
Code | Message |
---|---|
200 Undeleted |
Database undeleted successfully. |
400 Bad Request |
Bad Request. Invalid payload in request. |
401 Unauthorized |
CouchDB Server Administrator privileges required. |
404 NotFound |
Invalid deleted timestamp. |
412 Precondition Failed |
Database already exists. |
DELETE /_deleted_dbs/{db}
{: #get-_deleted_dbs-db}
Send a DELETE
request to permanently delete the database instance which was soft-deleted with the specified timestamp.
Parameters | Description |
---|---|
timestamp |
Timestamp when database was deleted |
Request Headers:
Content-Type –
application/json
{: codeblock}
Response Headers:
Content-Type –
application/json
text/plain; charset=utf-8
{: codeblock}
Response JSON object | Description |
---|---|
ok (boolean) |
Operation status. Available in case of success. |
error (string) |
Error type. Available if response code is 4xx. |
reason (string) |
Error description. Available if response code is 4xx. |
Code | Message |
---|---|
200 Deleted |
Database permanently deleted successfully. |
400 Bad Request |
Bad Request. Invalid payload in request. |
401 Unauthorized |
CouchDB Server Administrator privileges required. |
404 NotFound |
Invalid deleted database or timestamp. |
Overview
Instead of automatically and immediately removing data and index in database after a delete operation, soft-deletion allows to restore the deleted data back to original state due to a “fat finger”or undesired delete operation, up to defined periods, such as 48 hours.
In CouchDB 3.0, soft-deletion of database is implemented in [1]. The .couch file is renamed with the ..deleted.couch file after soft-deletion is enabled, and such file can be changed back to .couch for the purpose of restore. If restore is not needed and some specified period passed, the ..deleted.couch file can be deleted to achieve deletion of database permanently.
In CouchDB 4.0, with the introduction of FoundationDB, the data model and storage is changed. In order to support soft-deletion, we propose below solution and then implement them.
Testing recommendations
make check skip_deps+=couch_epi apps=fabric tests=crud_test_
make check skip_deps+=couch_epi apps=chttpd tests=deleted_dbs_test_
Related Issues or Pull Requests
#2707
Checklist
rel/overlay/etc/default.ini