Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couchdb 3288 mixed cluster upgrade #495

Merged
merged 9 commits into from
Sep 27, 2017
Merged

Conversation

davisp
Copy link
Member

@davisp davisp commented Apr 27, 2017

Overview

This is preparatory work for adding pluggable storage engines. The motivation for this ground work is to enable users to perform rolling reboots of clusters. As such there are a few things that need to change to make that happen.

  1. Remove the public definition of #db{}
  2. Add a new fabric_rpc:create_db/2 that accepts an options list
  3. Create a template of the new #db{} for forwards compatibility

The removal of the #db{} definition is what's responsible for 99% of these changes. And while there is a decent amount of code churn here its nearly all quite straightforward changes from Db#db.name to couch_db:name(Db) type changes so theoretically this should be a pretty quick review.

The only somewhat difficult bits are when we were matching a #db{} record in a function head. For the most part this was just to extract fields from the record so those are easily reviewed. Though there are a few cases in the replicator that were using it to tell the different between #db{} and #httpd{}. For those we have to switch the order of clauses to pull out the #httpd{} logic first.

Testing recommendations

There's no behavior changes so the existing test suite covers the existing logic.

JIRA issue number

https://issues.apache.org/jira/browse/COUCHDB-3288

Related Pull Requests

I'm going to open a PR for COUCHDB-3287 which is the main PSE work that will be asking to merge into this PR. This is purely for the sake of review-ability. When we go to merge these we'll merge this PR first, and then merge 3287 into master. I expect the review of this work to take at least a couple weeks since it is quite a large change to the core bit of the database.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;



-record(new_pse_db, {
name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding vsn field and follow the convention that vsn is always a first field in the record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this I can see how it could be good for upgrades, I'll add it.

}).


-define(NEW_PSE_DB, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding vsn field and follow the convention that vsn is always a first field in the record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What convention is that? I know we pondered it once but I don't think I've ever actually seen that used anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this convention yet. I am just proposing to fix the index of a vsn field so we can write a generic get_record_version which would work with any versioned record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so its not a convention but you think its a good idea ;P I'm not against it but I wonder if we shouldn't make that change separate across a bunch of records rather than do it piece meal?

false -> get_index(Module, DbName, DDoc)
end;
get_index(Module, <<"shards/", _/binary>> = DbName, DDoc)
when is_binary(DbName), is_record(DDoc, doc) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to add is_binary(DbName) guard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None. Will remove it. Probably got pulled in from a different clause or before I added the shards/ prefix check.

@@ -53,8 +53,7 @@ compact(State) ->
{ok, Fd} = couch_mrview_util:open_file(CompactFName),
ESt = couch_mrview_util:reset_index(Db, Fd, State),

{ok, DbReduce} = couch_btree:full_reduce(Db#db.id_tree),
Count = element(1, DbReduce),
{ok, Count} = couch_db:get_doc_count(Db),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a problem. The reduce result could be a tuple of either size 2 or 3 (depending on age of view file). get_doc_count uses match so it would crash for tuples of size 2. See https://github.com/cloudant/dbcore/commit/a452a4ab3a46f88a203d0eaac69317e5069f7b1f#diff-0568f6a6952a18b42d6a57157a930333R255. We've found such views in production IRC (@eiri knows more details).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. We'll want to patch couch_db and then couch_bt_engine for PSE for 3287

@@ -59,11 +59,11 @@
db_uri(#httpdb{url = Url}) ->
couch_util:url_strip_password(Url);

db_uri(#db{name = Name}) ->
db_uri(Name);
db_uri(DbName) when is_binary(DbName) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a clause with when is_list(DbName) guard?
Is there any possibility that couch_db:name would return list instead of binary (for example for fake_db or something).
In such case the current implementation would crash with badmatch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there are a lot of places like this. Please ignore it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iilyak For clarity on these sorts of clause ordering changes I was trying to keep fairly consistent with the existing logic. Previously this either accepted a #db{} or a binary. Though the binary function clause/badmatch would have come from binary_to_list rather the db_uri. So there are subtle changes like that but the domain of acceptable values should be identical.

clustered_db(DbName, UserCtx, []).

clustered_db(DbName, UserCtx, SecProps) ->
{ok, #db{name = DbName, user_ctx = UserCtx, security = SecProps}}.
Copy link
Contributor

@nickva nickva May 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler just return the Db instead of {ok, Db} since it can never be anything but ok? Or can this return error at some point.

It will simplify some places where we have to use a separate line for {ok, Db} = couch_db:clustered_db(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that to mimic the couch_db:open/2 format. I've not got a huge preference either way though I'd lean towards keeping the same pattern for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense


is_db(#db{}) ->
true;
is_db(_) ->
Copy link
Contributor

@nickva nickva May 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a special clause for the PSE db record here? Something like is_tuple() then element(1, Db) =:= db

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narp. There are very few operations that we want to support on shared database records (and eventually I'd like to remove them but didn't want to mix PRs). But its limited on purpose to be name, user_ctx, and security objects that can be read.

StartTime ->
true = ets:insert(couch_dbs, Db),
true = ets:update_element(couch_dbs, DbName, {#entry.db, Db}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ets:update_element will return false if it could not find the entry and it would crash on a bad match, while previously ets:insert always returned true and inserted the entry in the table (even if it wasn't in the table already).

Is there any change the entry could be deleted between the call to ets:lookup and ets:update_element?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. We just asserted it was in the table when we extracted StartTime and this is the only process that modifies the table so it should be there. Or, rather, if its not there then we should crash hard cause ets has lost its mind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Makes sense

@davisp davisp force-pushed the COUCHDB-3288-mixed-cluster-upgrade branch 2 times, most recently from 1558cb7 to 18f5226 Compare May 12, 2017 15:33
@davisp
Copy link
Member Author

davisp commented May 12, 2017

Rebased onto master for the new replicator and couch_server work. Also squashed the few requested changes to their respective commits.

@nickva
Copy link
Contributor

nickva commented May 12, 2017

This is a nice cleanup of code even outside the context of PSE. More functions instead of records everywhere, couch_server looks more sane, some new helpers in couch_db module.

I ran couchdyno's replicator tests https://github.com/cloudant-labs/couchdyno/blob/master/README_tests.md

Then a few custom scripts which do concurrent db and doc creations. Here is one of them:

https://gist.github.com/nickva/d0e0813c7039ba824489fb017159f1be

It turns out 500 conflict errors is something we return on db creation sometimes, that was a surprise for me. But master behaves the same, so it's not an issue with this PR

Then with ./dev/run setup killed node3 and re-spawned it with master code and ran some tests on the mixed cluster.

Didn't find any issues during testing

+1

Nice work @davisp

@davisp davisp force-pushed the COUCHDB-3288-mixed-cluster-upgrade branch 6 times, most recently from fc3939c to 56ce55a Compare September 13, 2017 16:13
@davisp
Copy link
Member Author

davisp commented Sep 13, 2017

This has been rebased onto master as of now. Is anyone against me merging this to master in preparation for merging PSE in a week or two? Note that this PR is totes not scary as its mostly just moving code and removing the public definition of the #db{} record.

Copy link
Contributor

@iilyak iilyak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@wohali
Copy link
Member

wohali commented Sep 20, 2017

@davisp have at it, while we wait for Fauxton to be remediated, now is the perfect time to merge it.

Since we're getting ready to add API functions to couch_db.erl now is a
good time to clean up the exports list so that changes are more easily
tracked.

COUCHDB-3288
These functions were originally implemented in fabric_rpc.erl where they
really didn't belong. Moving them to couch_db.erl allows us to keep the
unit tests intact rather than just removing them now that the #db record
is being made private.

COUCHDB-3288
This removes introspection of the #db record by couch_server. While its
required for the pluggable storage engine upgrade, its also nice to
remove the hacky overloading of #db record fields for couch_server
logic.

COUCHDB-3288
This completes the removal of public access to the db record from the
couch application. The large majority of which is removing direct access
to the #db.name, #db.main_pid, and #db.update_seq fields.

COUCHDB-3288
davisp and others added 3 commits September 27, 2017 14:57
A mixed cluster (i.e., during a rolling reboot) will want to include
this commit in a release before deploying PSE code to avoid spurious
erros during the upgrade.

COUCHDB-3288
This change is to account for differences in the #db record when a
cluster is operating in a mixed version state (i.e., when running a
rolling reboot to upgrade).

There are only a few operations that are valid on #db records that are
shared between nodes so rather than attempt to map the entire API
between the old and new records we're limiting to just the required API
calls.

COUCHDB-3288
Previously attachment uploading from a PSE to non-PSE node would
fail as the attachment streaming API changed between version.

This commit handles downgrading attachment streams from PSE nodes so that
non-PSE nodes can write them.

COUCHDB-3288
@davisp davisp force-pushed the COUCHDB-3288-mixed-cluster-upgrade branch from 56ce55a to 49b05fe Compare September 27, 2017 19:57
@davisp davisp merged commit c07f3f5 into master Sep 27, 2017
@nickva nickva deleted the COUCHDB-3288-mixed-cluster-upgrade branch December 18, 2019 19:46
nickva pushed a commit to nickva/couchdb that referenced this pull request Sep 7, 2022
* minor bump to release notes
* Document Clouseau install process

Co-authored-by: Joan Touzet <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants