-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CouchDB map indexes on FDB #2060
CouchDB map indexes on FDB #2060
Conversation
src/couch_views/src/couch_views.erl
Outdated
process_args(#{} = Args) -> | ||
Args1 = maps:filter(fun (_, V) -> V /= undefined end, Args), | ||
|
||
maps:merge(#{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming the map inside here as 'Default' would clarify the intention here.
src/couch_views/src/couch_views.erl
Outdated
|
||
|
||
process_args(#{} = Args) -> | ||
Args1 = maps:filter(fun (_, V) -> V /= undefined end, Args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract this into a function for clarity? (remove_undefined_values/1, say).
View#mrview.id_num | ||
end, Views), | ||
|
||
lists:foreach(fun (Doc) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some batching here likely to be important for throughput.
|
||
|
||
get_seq_key(Sig, DbPrefix) -> | ||
erlfdb_tuple:pack({?DB_VIEWS, Sig, ?VIEW_UPDATE_SEQ}, DbPrefix). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and elsewhere, we might want a level of indirection rather than a 16 byte sig value in all keys.
5044095
to
c60306c
Compare
src/couch_views/src/couch_views.erl
Outdated
map_query(Db, DDoc, ViewName, Callback, Acc0, Args0) -> | ||
Args = process_args(Args0), | ||
#{name := DbName} = Db, | ||
{ok, Mrst} = couch_views_util:ddoc_to_mrst(DbName, DDoc), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we call this later in
couch_views_worker
:
get_indexing_info(JobData) ->
#{
<<"db_name">> := DbName,
<<"ddoc_id">> := DDocId
} = JobData,
{ok, Db} = fabric2_db:open(DbName, []),
{ok, DDoc} = fabric2_db:open_doc(Db, DDocId),
{ok, Mrst} = couch_views_util:ddoc_to_mrst(DbName, DDoc),
{ok, Db, Mrst}.
When you create the JobData in couch_views_jobs
, you already have access to Mrst:
create_job_data(Db, Mrst, LastSeq) ->
#{name := DbName} = Db,
#mrst{
idx_name = DDocId
} = Mrst,
#{
db_name => DbName,
ddoc_id => DDocId,
last_seq => LastSeq
}.
Curious why you don't just pass it in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking why not store the Mrst
in fdb for the worker?
tx_db := TxDb | ||
} = State, | ||
|
||
fabric2_db:fold_changes(TxDb, SinceSeq, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is inside a transaction. Is this a requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a requirement. But it is nice to get all the docs that we want to process in 1 transaction.
Thanks everyone for the feedback. I'm updated this PR based on the feedback. |
This exposes a single place where we can check for whether a given database or database name is a replicator or users database.
If a start or end key is not specified we still need to scope the range read to the given `RangePrefix`.
% License for the specific language governing permissions and limitations under | ||
% the License. | ||
|
||
-module(couch_views_encoding). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly this module is a replacement for couch_ejson_compare.erl
. The couch_ejson_compare.erl
is used from other applications as well (mango and fabric). Should it be more generic encoding module?
Feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can definitely be used in mango. How would we make more generic? Everything here would be used the same for any other index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we make more generic?
We could rename the module and move it into another app. I think it could be in one of:
couch
- might not be a good fit because we would have to adderlfdb
into list of applications incouch.app.src
couch_index
- it is a generic abstraction to work with all index types. Might not be a good fit because Paul considers introduction of couch_index a bad move.fabric
- could be a good place since it already includeserlfdb
.
Feel free to ignore this proposal. It is good enough for the first version. We can move it latter when we start using it from mango.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the ideas. Lets see how Mango goes and then we can look to change this.
% inclusive_end=false we have to set the | ||
% EndKeyDocId to <<255>> so that we don't | ||
% include matching rows. | ||
[{end_key_gt, {EndKey1, <<255>>}}]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious how hard it is to switch to fdb primitives here. We have erlfdb_key:first_greater_than/1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would need another function in erlfdb_key
to support this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iilyak You'll want to check out the code in fabric_fdb:get_fold_opts/2
which translates between CouchDB semantics and FoundationDB semantics.
?assertEqual(Expect, Result2). | ||
|
||
|
||
% should_give_ext_size_seq_indexed_test(Db) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it is commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy from couch_mrview tests. We will have to eventually implement seq_indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think seq_indexes ever worked in clustered couchdb so I don't necessarily know if we'll re-implement them or not. I'd probably remove the test since we'd need to re-implement that feature from scratch if/when we decide to add it back in.
undefined -> <<255>>; | ||
EK2 -> EK2 | ||
undefined -> | ||
<<RangePrefix/binary, 16#FF>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could switch to native FDB primitives here. Not sure how hard it would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm we already use erlfdb_key:first_greater_than(EndKey2)
below. Why do we need 16#FF
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to build this into the erlfdb api to be able to use. But I agree its a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Narps. This is application logic, not fdb logic. first_greater_than gives you the first greater than the key. 16#FF on the end key is preventing the read from going outside the prefix defined range which would be perfectly valid for foundationdb but invalid for our required semantics.
These key ranges are fairly subtle. A lot of the various sorts and range limits were figured out by staring at the actual hex encodings of keys in fdb. This is mostly due to CouchDB having a much more complicated sorting behavior than strict byte comparisons but also with some surprising "Doh!" moments when figuring out when and when not to handle document ids in view rows and the like.
{mod, {couch_views_app, []}}, | ||
{registered, [ | ||
couch_views_sup, | ||
couch_views_worker_server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did search the codebase and couldn't find a place where we register the couch_views_worker_server
name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. It was renamed to couch_views_server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by @garrensmith
acc := UserAcc0 | ||
} = Acc, | ||
|
||
% We're asserting there that this row is paired |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I don't really understand this function yet. I wanted to mention (so I don't forget to get back to it latter) that we have this comment in two places fold_rev
and fold_fwd
. The thing I don't understand yet is: Why we have it in different clauses:
- in
fold_fwd
we do it when#{next := val}
- in
fold_rev
we do it when#{next := key}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is a little tricky. I've changed it so that it is #{next := value}
for consistency. Though it wasn't a bug. What is actually happening here is that each map index row is saved as two rows in fdb. The first row stores the emitted key as a value and the second row stores the emitted value as the fdb value. So when we go forward through the index, we expect to get the key row first then the value row. THat is what the #{next := key}
is for. When we going in reverse, we expect to get the value first and then the key. So we can only call the user callback once we have both key and value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also added a comment here to explain this more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elixir and eunit tests pass with decent coverage.
Tested manually the basics: view updates, start end keys, inclusive end, descending, limits, update=lazy. Range check was a nice addition too.
Nice job @garrensmith and @davisp!
P.S. The only thing to change in the future, not part of this PR, is if we have lots of background indexers, with lots of jobs, say 500+, we might switch to a separate acceptor instead of having all max workers waiting in couch_jobs:accept. So we'd have N acceptors per indexer pod/container/node where N might be less than max_workers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Awesome job @garrensmith and @davisp. It was a great pleasure to read. Very good style.
This adds couch_views which builds map indexes and stores them in FDB. Co-authored-by: Paul J. Davis <[email protected]>
Overview
This adds the
couch_views
application which builds map indexes on top of FoundationDB. The RFC for this is here.This work is dependant on CouchDB Jobs #2051
Testing recommendations
Create map functions and test the results are the same as CouchDB 2.x
Related Issues or Pull Requests
Checklist
rel/overlay/etc/default.ini