Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

soft-deletion for database #2666

Merged
merged 2 commits into from
Apr 7, 2020
Merged

soft-deletion for database #2666

merged 2 commits into from
Apr 7, 2020

Conversation

jiangphcn
Copy link
Contributor

@jiangphcn jiangphcn commented Mar 16, 2020

Overview

Instead of automatically and immediately removing data and index in database after a delete operation, soft-deletion allows to restore the deleted data back to original state due to a “fat finger”or undesired delete operation, up to defined periods, such as 48 hours.

In CouchDB 3.0, soft-deletion of database is implemented in [1]. The .couch file is renamed with the ..deleted.couch file after soft-deletion is enabled, and such file can be changed back to .couch for the purpose of restore. If restore is not needed and some specified period passed, the ..deleted.couch file can be deleted to achieve deletion of database permanently.

In CouchDB 4.0, with the introduction of FoundationDB, the data model and storage is changed. In order to support soft-deletion, we propose below solution and then implement them.

Testing recommendations

make check skip_deps+=couch_epi apps=fabric tests=crud_test_

======================== EUnit ========================
Test database CRUD operations
  fabric2_db_crud_tests:36: crud_test_ (create_db)...[0.032 s] ok
  fabric2_db_crud_tests:37: crud_test_ (open_db)...[0.020 s] ok
  fabric2_db_crud_tests:38: crud_test_ (delete_db)...[0.027 s] ok
  fabric2_db_crud_tests:39: crud_test_ (recreate_db)...[0.064 s] ok
  fabric2_db_crud_tests:40: crud_test_ (undelete_db)...[0.038 s] ok
  fabric2_db_crud_tests:41: crud_test_ (remove_deleted_db)...[0.037 s] ok
  fabric2_db_crud_tests:42: crud_test_ (old_db_handle)...[0.154 s] ok
  fabric2_db_crud_tests:43: crud_test_ (list_dbs)...[0.023 s] ok
  fabric2_db_crud_tests:44: crud_test_ (list_dbs_user_fun)...[0.014 s] ok
  fabric2_db_crud_tests:45: crud_test_ (list_dbs_user_fun_partial)...ok
  fabric2_db_crud_tests:46: crud_test_ (list_deleted_dbs)...[0.029 s] ok
  fabric2_db_crud_tests:47: crud_test_ (list_deleted_dbs_user_fun)...[0.027 s] ok
  fabric2_db_crud_tests:48: crud_test_ (list_deleted_dbs_user_fun_partial)...ok
  fabric2_db_crud_tests:49: crud_test_ (list_dbs_info)...[0.049 s] ok
  fabric2_db_crud_tests:50: crud_test_ (list_dbs_info_partial)...ok
  fabric2_db_crud_tests:51: crud_test_ (list_dbs_tx_too_old)...[0.078 s] ok
  fabric2_db_crud_tests:52: crud_test_ (list_dbs_info_tx_too_old)...[0.757 s] ok
  fabric2_db_crud_tests:53: crud_test_ (list_deleted_dbs_info)...[0.060 s] ok
  fabric2_db_crud_tests:54: crud_test_ (get_info_wait_retry_on_tx_too_old)...[0.098 s] ok
  fabric2_db_crud_tests:55: crud_test_ (get_info_wait_retry_on_tx_abort)...[0.072 s] ok
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
  [done in 2.674 s]
=======================================================
  All 20 tests passed.

make check skip_deps+=couch_epi apps=chttpd tests=deleted_dbs_test_

======================== EUnit ========================
chttpd deleted dbs tests
  chttpd_deleted_dbs_test:72: should_return_error_for_unsupported_method...[0.107 s] ok
  chttpd_deleted_dbs_test:82: should_list_deleted_dbs...[0.052 s] ok
  chttpd_deleted_dbs_test:95: should_list_deleted_dbs_info...[0.022 s] ok
  chttpd_deleted_dbs_test:105: should_undelete_db...[0.040 s] ok
  chttpd_deleted_dbs_test:130: should_remove_deleted_db...[0.040 s] ok
  chttpd_deleted_dbs_test:145: should_undelete_db_to_target_db...[0.040 s] ok
  chttpd_deleted_dbs_test:172: should_not_undelete_db_to_existing_db...[0.034 s] ok
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
  [done in 0.357 s]
=======================================================
  All 7 tests passed.

Related Issues or Pull Requests

#2707

Checklist

Copy link
Contributor Author

@jiangphcn jiangphcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks so much to @nickva for reviewing on my wip PRs. It is quite helpful to give my immediate feedback to continue my work.

src/fabric/src/fabric2_fdb.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_fdb.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric_util.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_fdb.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_fdb.erl Outdated Show resolved Hide resolved
@nickva
Copy link
Contributor

nickva commented Mar 19, 2020

Thinking about this some more I wonder if it would make sense to separate enabling HCA from soft-deletion. Those are rather independent features. So for example, we'd have a PR to enable HCA. We'd test that make sure that works. Then, as a separate piece of work, we'd enable soft-deletion. Would that be reasonable?

@jiangphcn jiangphcn force-pushed the db-softdeletion branch 3 times, most recently from 902ec20 to dd7b9e5 Compare March 20, 2020 12:53
@jiangphcn
Copy link
Contributor Author

Thinking about this some more I wonder if it would make sense to separate enabling HCA from soft-deletion. Those are rather independent features. So for example, we'd have a PR to enable HCA. We'd test that make sure that works. Then, as a separate piece of work, we'd enable soft-deletion. Would that be reasonable?

Yes, it is totally reasonable. We can have 2 PRs, i.e. one is for enabling HCA, and another is for soft-deletion. The only point I have is where to call DbPrefixAllocator = erlfdb_hca:create(?ERLFDB_EXTEND(DbId, <<"hca">>)),. In https://github.com/cloudant-labs/couchdb-erlfdb/pull/16/files#diff-cf5acaebd8465ae3276370bbf8d7aaf1R383, it was supposed that this can be called once or limited times, but after moving to fabric2_fdb.erl , I stil try to find right place to call it.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some changes but overall looks quite good!

src/fabric/include/fabric2.hrl Outdated Show resolved Hide resolved
src/fabric/include/fabric2.hrl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_db.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_db.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_fdb.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_util.erl Outdated Show resolved Hide resolved
src/fabric/test/fabric2_db_crud_tests.erl Show resolved Hide resolved
src/chttpd/src/chttpd_db.erl Outdated Show resolved Hide resolved
src/chttpd/src/chttpd_misc.erl Outdated Show resolved Hide resolved
src/fabric/src/fabric2_db.erl Outdated Show resolved Hide resolved
@jiangphcn jiangphcn force-pushed the db-softdeletion branch 2 times, most recently from 88cf83e to 0a225e8 Compare March 24, 2020 08:15
@jiangphcn jiangphcn changed the title [WIP] soft-deletion for database soft-deletion for database Mar 24, 2020
@jiangphcn
Copy link
Contributor Author

thanks Paul @davisp for your review and comments. Could you please check 3d53721 I tried to use to address your comments?

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the JSON structure posted to _undelete is a bit awkward:

{
  "source": {
    "db": "source_database_name",
    "sourceTS": "2020-03-24T12:00:00"
  },
  "target": "target_database_name"
}

Nesting the source timestamp and db names seems a bit awkward and CouchDB traditionally uses underscores and not upper case in parameter names. How about something like:

{
  "source": "source_database_name",
  "source_timestamp": "2020-03-24T12:00:00",
  "target": "target_database_name"
}

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to hear from @ricellis and others about the URL patterns here. Currently we've got:

  1. GET /_deleted_dbs
  2. GET /_deleted_dbs/DbName
  3. POST /_deleted_dbs/_undelete

I think the first two are fine but the _undelete call seems a bit awkward. Maybe something like:

POST /_undelete_db

@davisp
Copy link
Member

davisp commented Mar 25, 2020

Also for the POST /_deleted_dbs/_undelete I'll also note that we do have a set of system database names that start with an underscore (_dbs/_nodes/_users). Obviously we don't have an _undelete database but it is technically mixing that API endpoint into the valid range of database names.

@ricellis
Copy link

Obviously we don't have an _undelete database but it is technically mixing that API endpoint into the valid range of database names.

Agree that is probably better avoided.

I was thinking that maybe we could keep the functionality in the _deleted_dbs namespace by POST /_deleted_dbs with something like this:

{
  "undelete": {
    "source": "source_database_name",
    "source_timestamp": "2020-03-24T12:00:00",
    "target": "target_database_name"
  }
}

but ultimately I think that probably ends up more opaque than just using differently named endpoint POST /_undelete_db so I'm OK with _undelete_db.

@mikerhodes as the initial proposer of the _deleted_dbs/_restore pattern are you OK with POST /_undelete_db?

Unrelated to the endpoint name the example shows no TZ data in the ISO-8601 timestamp, is that an accurate representation of the output/input we're expecting?

@jiangphcn
Copy link
Contributor Author

Unrelated to the endpoint name the example shows no TZ data in the ISO-8601 timestamp, is that an accurate representation of the output/input we're expecting?

There are 2 different formats for ISO-8601 timestamp, 1994-11-05T08:15:30-05:00 and 1994-11-05T13:15:30Z. They all correspond to November 5, 1994, 8:15:30 am, US Eastern Standard Time. The detail can be found at https://www.w3.org/TR/NOTE-datetime. We are using 1994-11-05T13:15:30Z where TZ information exists as well.

We can add more description in document to state the format we used here.

@mikerhodes
Copy link
Contributor

Also for the POST /_deleted_dbs/_undelete I'll also note that we do have a set of system database names that start with an underscore (_dbs/_nodes/_users). Obviously we don't have an _undelete database but it is technically mixing that API endpoint into the valid range of database names.

Isn't this the same for adding a /_undelete_db into the / namespace? In both cases you're basically creating a new reserved name in the list of databases, although there is certainly a lot of prior art for this in / so it wouldn't be as surprising and likely to be forgotten later.

While I prefer keeping new _ endpoints in / to a minimum, I can live with having a few new endpoints in the / namespace. Before we make the final call, one additional option which keeps things contained to a single new entry in / while removing the dbname vs. action ambiguity @davisp mentioned, and keeping things somewhat RESTful, would be to extract the dbs bit into a further sub-namespace:

  1. GET /_deleted_dbs/dbs
  2. GET /_deleted_dbs/dbs/DbName
  3. POST /_deleted_dbs/undelete

Alternatively you could have /_deleted/dbs (etc.) if you wanted to avoid the dbs repetition, though I feel that _deleted is a bit generic for my tastes.

@jiangphcn
Copy link
Contributor Author

Just for simplification, the _deleted_dbs wasn't returning DbInfo blobs directly in the past. In order to get DbInfo blobs, need to call GET _deleted_dbs/{db}/. But with new commit 031cb85, DbInfo Blobs is added as well for GET _deleted_dbs.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a few minor issues.

end.


deleted_dbs_info(DbName, Options) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved to after list_dbs_info so that it matches the order in the exports list.

Comment on lines 1118 to 1119
]}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit, but this should be a single line ]}} at two indents.

DeleteDbKey = erlfdb_tuple:pack(DeletedDbTupleKey, LayerPrefix),
case erlfdb:wait(erlfdb:get(Tx, DeleteDbKey)) of
not_found ->
erlang:error({not_found});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You resolved this but you don't appear to have addressed the issue.

DeleteDbKey = erlfdb_tuple:pack(DeletedDbTupleKey, LayerPrefix),
case erlfdb:wait(erlfdb:get(Tx, DeleteDbKey)) of
not_found ->
erlang:error({not_found});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a note that if you pick the exceptions that we shouldn't be raising {not_found} as a 1-tuple. It should be either not_found or {not_found, missing} to match the existing exceptions in other places.

erlfdb:clear(Tx, DbKey),
bump_db_version(Db);
_Val ->
erlang:error({deleted_database_exists, DbName})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message isn't quite right since we can have multiple deleted databases, just not at the same timestamp. How about {deletion_frequency_exceeded, DbName}?

Comment on lines 44 to 45
iso8601_timestamp/0,
do_recovery/0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit but lest move these above pmap/N into their own section. Don't forget to move the implementations as well.

@davisp
Copy link
Member

davisp commented Apr 2, 2020

Also, a couple quick thoughts on the HTTP API. Looking at it a bit harder I realized that we've basically duplicated some functionality because these two requests should be functionally equivalent:

GET /_deleted_dbs/DbName`
GET /_deleted_dbs?key=DbName`

And I could see a convincing argument that the second form would be more "natural" within the existing APIs for views and so on.

I've also heard some concerns that having the undelete vs delete semantics of our POST /_deleted_dbs setup makes auditing traffic hard/impossible without introspecting the request body which seems like a legitimate critique. How about if we tweaked things to be like such:

  • GET /_deleted_dbs - List deleted dbs infos as currently.
  • POST /_deleted_dbs - Only accepts the undelete JSON body
  • DELETE /_deleted_dbs/DbName?timestamp=2020-01-01T00:00:00 - Remove deleted db

/cc @mikerhodes @ricellis @nickva

@davisp
Copy link
Member

davisp commented Apr 2, 2020

I wasn't super fond of the chttpd tests for deleted dbs but not in a way specific enough to review very well. So I just took a quick pass through reformatting/styling etc on what looks better to me. Feel free to completely ignore this though as its purely style related. But in case you're interested, this is how I'd change things so that there's a lot less noise in that test module:

https://gist.github.com/davisp/257ea1631d443031f5c3995b70093bbe

@davisp
Copy link
Member

davisp commented Apr 2, 2020

Also, something I noticed while playing with the tests is that the GET /_deleted_dbs/DbName is not very ergonomic. Due to how things are implemented, if we don't have any deleted snapshots we still return a 200 response with an empty array [] as the body. I would have expected a 404 in that situation given that the DbName is part of the URL. I realize that's purely due to how things are implemented, but it suggests to me that the GET /_deleted_dbs?key=DbName approach is better since we'd be more likely to understand that a 200 and empty array returned is expected since its more like a view query where results might or might not exist.

@mikerhodes
Copy link
Contributor

mikerhodes commented Apr 3, 2020

  • The main problem I can see for GET /_deleted_dbs?key=DbName is that we lose the mirror with including the database name like we have for DELETE /_deleted_dbs/DbName?timestamp=2020-01-01T00:00:00.

  • I also wondered whether the timestamp is required for the DELETE? If so, passing it as part of the path would be more natural (which I'm aware was one of the original suggestions I wasn't so keen on 😁).

  • If we look to _all_dbs as a model (I pondered this as we're looking at databases and the API is much simpler to copy), it isn't correct that these are equivalent perhaps:

    GET /_deleted_dbs/DbName
    GET /_deleted_dbs?key=DbName
    

    Because these return different values in the existing API:

    GET /DbName
    GET /_all_dbs?key=DbName
    

    So there's a question here as to whether /_deleted_dbs is more like a _all_dbs or _all_docs? Are we talking databases or some other type of entity, that /_deleted_dbs is more like talking about a set of metadata about deleted databases rather than teh databases themselves.

A quick proposal with responses based on _all_dbs that is as REST as I could make it, with all the lessons from the discussion above tried to be taken onboard:

https://gist.github.com/mikerhodes/39c66576eee468a5aeaa23c5715aec9a

I think this could be trivially adapted to make the responses more _all_docs/view like if we wanted using the same paths.

I guess we have _dbs_info as a guide too, though that doesn't tell us what /_deleted_dbs/DbName should be...

@jiangphcn
Copy link
Contributor Author

jiangphcn commented Apr 3, 2020

Thanks @davisp and @mikerhodes for your comments. Today I have one commit 1f439c4 which essentially implemented below API spec.

GET /_deleted_dbs - List deleted dbs infos
GET /_deleted_dbs?key=dbName - List specified deleted db infos
POST /_deleted_dbs - Only accepts the undelete JSON body to *restore* the deleted db
DELETE /_deleted_dbs/DbName?timestamp=2020-01-01T00:00:00 - Remove deleted db

From the functional set exposed to users, I think that it is similar to what @mikerhodes drafted in [1]. The difference is the API spec.

From user case and implementation's point of view, there are several points I want to mention.

  1. GET /_deleted_dbs - List deleted dbs infos
    We directly return dbs info instead of the deleted dbname here. The same case is for GET /_deleted_dbs?key=dbName. This is mainly following the approach in _dbs_info. I think that most likely user is not only interested in dbname, but also in some detailed information about deleted database.
  {
    "key": "db01",
    "timestamp": "2020-04-03T12:09:15Z",
    "value": {
      "update_seq": "000000000000000000000000",
      "doc_del_count": 0,
      "doc_count": 0,
      "sizes": {
        "external": 2,
        "views": 0
      }
    }

User can gradually use the progressive approach like mentioned in [1] to get the detailed information of the deleted db.

GET /_deleted_dbs
GET /_deleted_dbs/DbName
GET /_deleted_dbs/DbName/2020-01-01T00:00:00

However, looks that user most like should review information of the deletion time, doc number, disk size to decide to restore it or permanently delete it.

  1. with regard to
GET /_deleted_dbs/DbName
GET /_deleted_dbs?key=DbName

In theory, we can implement both of them to get information of deleted db. @davisp also found one situation where we have to return 200 and [] if the specified db is not in deleted_db list when selecting GET /_deleted_dbs/DbName. This is not expected error code. So we switch to GET /_deleted_dbs?key=DbName.

Hope this can give some explanation how we are getting to the current status of API spec.

[1] https://gist.github.com/mikerhodes/39c66576eee468a5aeaa23c5715aec9a

@ricellis
Copy link

ricellis commented Apr 3, 2020

POST /_deleted_dbs - Only accepts the undelete JSON body to restore the deleted db

Can I see an example of the expected body now? If undelete is the only "action" for POST is there any reason to call it out separately?

@jiangphcn
Copy link
Contributor Author

jiangphcn commented Apr 3, 2020

POST /_deleted_dbs - Only accepts the undelete JSON body to restore the deleted db

Can I see an example of the expected body now? If undelete is the only "action" for POST is there any reason to call it out separately?

Yes, below is one example.

{
        "undelete": {
                "source_timestamp": "2020-04-01T07:13:42Z",
                "source": "db01"
        }
}

@ricellis
Copy link

ricellis commented Apr 3, 2020

OK, so now the POST can only do an undelete couldn't the body just be:

{
"source_timestamp": "2020-04-01T07:13:42Z",
"source": "db01",
}

That also removes the awkward naming around undelete that I know was bothering some people.

Also given
DELETE /_deleted_dbs/{dbName}
shouldn't we also
POST /_deleted_dbs/{dbName}
since they are operating on the same resource?

Then the body would be just (or also with an optional target):

{
"source_timestamp": "2020-04-01T07:13:42Z"
}

Although I think the parameter name in the body should match that used on the DELETE request as well; either both timestamp or both source_timestamp.

It seems like it would be nice to use paths consistently throughout as @mikerhodes proposed, but I can see the rationale for preferring to use key=dbname to be consistent with the view-like APIs.

@davisp
Copy link
Member

davisp commented Apr 3, 2020

So, apparently I'm the only one that's been assuming that this is a mirror of the GET _dbs_info API end point and working from that perspective.

@mikerhodes I don't think the _all_dbs approach is very useful. When would you ever want just a list of database names that have soft deletions or just a list of timestamps? Its RESTy in all the bad ways that lead us to adding another API endpoint that dumps all of that information in a single response like we already had to do for GET|POST /_dbs_info.

@ricellis If we stay with the POST _deleted_dbs then I don't think its a good idea to remove the "undelete" key as the API loses its self-documentation aspect. Plus we'd be hamstrung adding other operations in the future (not that I can think of any). If we want to revert to POST _undelete then I'd be fine dropping the JSON key.

Consistency in the "timestamp" key is a good idea. I'd vote for "timestamp" since there's no target timestamp to disambiguate and DELETE /_deleted_dbs/DbName?source_timestamp=foo reads awkwardly.

@jiangphcn I'm can't decide whether you're suggesting we support the GET /_deleted_dbs/DbName/Timestamp API or not. I don't think we should add alternative APIs for things that can be accomplished with a single view based API so I would vote against adding them. But you might also saying we don't need them, in which case we're in agreement and you can ignore me.

@jiangphcn
Copy link
Contributor Author

jiangphcn commented Apr 3, 2020

hey @davisp

I think that you are not the only one that's been assuming that this is a mirror of the GET _dbs_info API end point and working from that perspective. From #2666, you can see that I am for that and implemented with this approach.

In addition, I don't support the suggestion about the GET /_deleted_dbs/DbName/Timestamp API. As i mentioned before, it looks that user most likely should review information of the deletion time, doc number, disk size to decide to restore it or permanently delete it. It is convenient to return DbInfo one time instead of using the progressive approach like mentioned in [1] to get the detailed information of the deleted db.

GET /_deleted_dbs
GET /_deleted_dbs/DbName
GET /_deleted_dbs/DbName/2020-01-01T00:00:00

@mikerhodes
Copy link
Contributor

@davisp @jiangphcn Apologies if I missed that we were mirroring _dbs_info in our API. That makes things a bit clearer around use of key. Flipping between internal structures (DbInfo) and external APIs (/_dbs_info) probably got in the way of my understanding.

I didn't particularly think _all_dbs was that useful, but it seemed like we were not following any particular pattern so I just picked the one that required me to type fewest characters to demonstrate the idea (an idea which doesn't make much sense now I understand the motivation behind the older API, which I'm not sure anyone really spelled out in the mailing list or here before now, but I probably missed something somewhere 😞 ).

Anyway, I think we've reached a decent stopping point here (given that I think mostly I've just got everyone to justify an already good solution tbh rather than moving us forward), so let's move forward and not make @jiangphcn implement yet another thing 😉 Nice work all in all.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiangphcn Everything looks pretty good for the most part. I do have a few cleanup suggestions to try and streamline a lot of the names and match existing bits for consistency. Its a Sunday so I got lazy and just did this in an editor rather than go through with comments.

db-softdeletion...davisp-db-softdeletion

I know that diff looks huge but its really not that bad. Most of it is just things like moving function export/implementations around. I did do a bit of fancy footwork in chttpd_misc.erl to factor out the commonalities with the _dbs_info handler. And there was a slight behavior change in that I renamed source_timestamp to timestamp but it suddenly occurs to me you might have changed the query string parameter for deletions. Either or is fine by me.

Everything looks quite good now other than the last sweep of cleanup changes. Also, I didn't do anything crazy like try and compile those commits. I just wanted to do enough to be clear what I was suggesting and what not. Also the fabric tests weren't changed which will need some work with those changes that removed the multiple versions of list_deleted_dbs/list_deleted_db_infos or whatever they were originally.

@jiangphcn
Copy link
Contributor Author

jiangphcn commented Apr 6, 2020

thanks @davisp and @mikerhodes for comments over weekend. I come with up one new commit
e8de2fc where mainly covers the suggestion from Paul about code refactor. One thing is related to combine implementation of /_deleted_dbs and /_deleted_dbs?key. This mainly follows the way of _dbs_info while there is slight difference for /_deleted_dbs?key because the key in key/value pair for _deleted_dbs is not only dbname, but also includes timestamp. So i introduce the way of 527d281#diff-20d0112d14281f71e2bd09f75da80f27R286-R295 to find dbinfo for specified key.

@jiangphcn jiangphcn force-pushed the db-softdeletion branch 4 times, most recently from 2272dcd to 3b0f7d5 Compare April 7, 2020 00:45
jiangphcn and others added 2 commits April 7, 2020 11:23
Instead of automatically and immediately removing data and index in
database after a delete operation, soft-deletion allows to restore
the deleted data back to original state due to a “fat finger”or
undesired delete operation, up to defined periods, such as 48 hours.

Co-Authored-By: Paul J. Davis <[email protected]>
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@jiangphcn jiangphcn merged commit 6c1d7a9 into prototype/fdb-layer Apr 7, 2020
@jiangphcn jiangphcn deleted the db-softdeletion branch April 7, 2020 17:40
@jiangphcn
Copy link
Contributor Author

Soft delete

Instead of automatically and immediately removing the data and index in the database after a delete operation, soft delete restores the deleted data back to the original state after an accidental delete or undesired delete operation, up to a defined period, such as 48 hours.

GET /_deleted_dbs

Send a GET request to find a list of all the deleted databases in the {{site.data.keyword.cloudant_short_notm}} instance.

Request Headers:

Accept –

    application/json
	text/plain

{: codeblock}

Query parameters Description
descending (boolean) Return the databases in descending order by key. Default is false.
endkey (json) Stop returning databases when the specified key is reached.
end_key (json) Alias for endkey parameter.
key (json) Return the databases for the specified key. It can be database name or database name + timestamp.
limit (number) Limit the number of the returned databases to the specified number.
skip (number) Skip this number of databases before starting to return the results. Default is 0.
startkey (json) Return databases starting with the specified key.
start_key (json) Alias for startkey.

Response Headers:

Content-Type –
    application/json
    text/plain; charset=utf-8

{: codeblock}

Code Description
200 OK Request completed successfully.
400 Bad Request Bad Request. Invalid payload in request.
401 Unauthorized CouchDB Server Administrator privileges required.
404 NotFound Invalid deleted database or timestamp.

See the following example request:

GET /_deleted_dbs HTTP/1.1
Accept: application/json
Host: localhost:5984

{: codeblock}

See the following example response:

HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 52
Content-Type: application/json
Date: Tue, 23 Mar 2020 06:57:48 GMT
Server: CouchDB (Erlang/OTP)

[
  {
    "cluster": {
      "n": 0,
      "q": 0,
      "r": 0,
      "w": 0
    },
    "compact_running": false,
    "data_size": 193164408719,
    "db_name": "invoices",
    "disk_format_version": 0,
    "disk_size": 46114703224,
    "instance_start_time": "0",
    "purge_seq": 0,
    "update_seq": "981...uUQ",
    "doc_del_count": 5564,
    "doc_count": 9818541,
    "sizes": {
      "external": 193164408719,
      "views": 34961621142
    },
    "deleted": true,
    "timestamp": "2020-04-07T23:19:12Z"
  },
  {
    "cluster": {
      "n": 0,
      "q": 0,
      "r": 0,
      "w": 0
    },
    "compact_running": false,
    "data_size": 0,
    "db_name": "contacts",
    "disk_format_version": 0,
    "disk_size": 0,
    "instance_start_time": "0",
    "purge_seq": 0,
    "update_seq": "000000000000000000000000",
    "doc_del_count": 0,
    "doc_count": 0,
    "sizes": {
      "external": 2,
      "views": 0
    },
    "deleted": true,
    "timestamp": "2020-04-07T23:19:02Z"
  },
]

{: codeblock}

POST /_deleted_dbs

{: #post-_deleted_dbs_undelete}

Send a POST request to restore a deleted database.

Parameters:
None

Request Headers:

Accept –
    application/json
    text/plain

{
    "undelete": {
        "timestamp" : "2020-03-23T02:26:35Z",
        "source" : "db01",
        "target" : "db02"
    }
}
-  "timestamp" - timestamp when database was deleted
-  "target" could be optional and default to the source database name.

{: codeblock}

Response Headers:

Content-Type –
    application/json
    text/plain; charset=utf-8

{: codeblock}

Response JSON object Description
ok (boolean) Operation status. Available in case of success.
error (string) Error type. Available if response code is 4xx.
reason (string) Error description. Available if response code is 4xx.
Code Message
200 Undeleted Database undeleted successfully.
400 Bad Request Bad Request. Invalid payload in request.
401 Unauthorized CouchDB Server Administrator privileges required.
404 NotFound Invalid deleted timestamp.
412 Precondition Failed Database already exists.

DELETE /_deleted_dbs/{db}

{: #get-_deleted_dbs-db}

Send a DELETE request to permanently delete the database instance which was soft-deleted with the specified timestamp.

Parameters Description
timestamp Timestamp when database was deleted

Request Headers:

Content-Type –
    application/json

{: codeblock}

Response Headers:

Content-Type –
    application/json
    text/plain; charset=utf-8

{: codeblock}

Response JSON object Description
ok (boolean) Operation status. Available in case of success.
error (string) Error type. Available if response code is 4xx.
reason (string) Error description. Available if response code is 4xx.
Code Message
200 Deleted Database permanently deleted successfully.
400 Bad Request Bad Request. Invalid payload in request.
401 Unauthorized CouchDB Server Administrator privileges required.
404 NotFound Invalid deleted database or timestamp.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants