
Fdb b+tree reduce #3018

Closed · wants to merge 1 commit

Conversation

garrensmith (Member)

Overview

Reduce on FDB using ebtree. This is a new reduce implementation that uses ebtree to perform the builtin reduce functions.
This requires that #3017 is merged first.

Testing recommendations

Related Issues or Pull Requests

Checklist

@garrensmith changed the base branch from master to prototype/fdb-layer July 21, 2020 11:56
@garrensmith force-pushed the fdb-btree-reduce branch 3 times, most recently from 736b4c6 to a0df7ba on July 22, 2020 14:32


create_val(Key, Val) ->
    KeySize = erlang:external_size(Key),
Member


If this is intended to be the billing size, it should use the couch_ejson_size:encoded_size/1 function as we do currently. Check with Eric, who moved the size calculation away from external_size last time.

Member Author


I've followed how the sizing is calculated in couch_views_indexer. There is a note there to change to couch_ejson_size:encoded_size/1, so we should probably change both in a separate PR.
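
For reference, a minimal sketch of what the snippet above could look like with the billing size switched to couch_ejson_size:encoded_size/1; the return shape shown is simplified and hypothetical, not this PR's code:

%% Hedged sketch only: swapping erlang:external_size/1 for
%% couch_ejson_size:encoded_size/1 as the billing size, per the
%% discussion above. The {TotalSize, Val} return shape is a
%% simplification, not this PR's actual create_val/2.
create_val(Key, Val) ->
    KeySize = couch_ejson_size:encoded_size(Key),
    ValSize = couch_ejson_size:encoded_size(Val),
    {KeySize + ValSize, Val}.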

@garrensmith marked this pull request as ready for review July 23, 2020 07:41
@garrensmith (Member Author)

This PR is ready for a first review. Some notes about the design:

  1. I don't store the map values in the b-tree, only the reduce results. That means the leaf nodes contain the reduce values and their sizes. I didn't want to store duplicates of the map k/vs in the b-tree; that doesn't make sense to me.

  2. I have a separate b-tree for each reduce function, so if a map function has multiple reduce functions, each reduce lives in its own b-tree (see the sketch after this list). I've done this to limit the size of the k/vs and nodes in the b-tree, so that we are less likely to exceed any of FDB's k/v size limits when we use a higher order for a b-tree.

  3. I don't have very good tests yet for the size calculation. I'm still trying to decide the best approach for that.
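
To make point 2 concrete, here is a rough sketch of one ebtree per reduce function; the ebtree:open/4 usage is an assumption about its API and the helpers are hypothetical, not code from this PR:

%% Minimal sketch of point 2 above: one ebtree per reduce function.
%% Assumes ebtree:open/4 accepts the txn handle, an FDB subspace prefix,
%% the tree order and a {reduce_fun, Fun} option; view_reduce_prefix/3
%% and reduce_fun_for/1 are hypothetical helpers.
open_reduce_trees(Tx, ViewId, ReduceFuns) ->
    Order = 50,
    lists:map(fun({Idx, RedSrc}) ->
        Prefix = view_reduce_prefix(Tx, ViewId, Idx),
        ebtree:open(Tx, Prefix, Order, [{reduce_fun, reduce_fun_for(RedSrc)}])
    end, lists:zip(lists:seq(1, length(ReduceFuns)), ReduceFuns)).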

Implements built-in reduces using ebtree.
@rnewson (Member) left a comment

I've left a few comments on the code itself but my main comments are on the design.

Your 1 and 2 strike me as the wrong approach, though I think I see your reasoning. The first item's claim to "store the reduce results" is misleading to the casual reader as, without something like ebtree and its storage of data on inner nodes, it's not possible to calculate useful intermediate reductions. What I think you're doing is reducing the k-vs emitted by a single document? If so, that is not something that CouchDB has done to date and it seems to have limited value, certainly in the common case that a map function emits one row per document.

A simple design that uses less space overall would be to insert the emitted keys and values directly into ebtree and pass in a reducer function that calculates each of the desired reductions specified, and store those in a tuple or a map. couch_views can then call ebtree's functions for lookup (?key=), ranges (for ?startkey=X&endkey=Y&reduce=false) and the various reduce functions as needed (group=true, group_level=X, reduce=true). This ensures we only store each distinct thing once and the logic gets much simpler.

As for exceeding the fdb size limits, that is a valid concern and we must tackle it head on. Once we're able to test this more easily (i.e., after this PR) we'll need to figure out the useful ranges for Order. I suspect we will also add the "chunking" that we do for doc bodies, that is, splitting a #node across multiple k-v entries if they get excessively large. I note that Adam has posted on the dev list with some thoughts about moving some of the node state out into its own rows. This occurred to me throughout the development of ebtree and is certainly worth attempting. The tree itself should be small; it's a design choice in ebtree that the emitted keys, values and intermediate reductions are stored inside the fdb value of the inner node. That can be changed fairly easily if warranted.
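
For context, the chunking being described is roughly this kind of thing; the key layout and helper names are hypothetical, not ebtree's or CouchDB's actual doc-body chunking code:

%% Illustration of the "chunking" idea above: split a large encoded node
%% value across several numbered FDB keys, in the same spirit as the
%% doc-body chunking. Names and layout here are hypothetical.
chunkify(NodePrefix, Bin, MaxChunkSize) ->
    Chunks = split_bin(Bin, MaxChunkSize),
    lists:zipwith(fun(Idx, Chunk) ->
        {erlfdb_tuple:pack({Idx}, NodePrefix), Chunk}
    end, lists:seq(0, length(Chunks) - 1), Chunks).

split_bin(Bin, Max) when byte_size(Bin) =< Max ->
    [Bin];
split_bin(Bin, Max) ->
    <<Chunk:Max/binary, Rest/binary>> = Bin,
    [Chunk | split_bin(Rest, Max)].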

Finally, on insertion generally, if it's possible to do even a small amount of batching we'll reap considerable performance rewards. For the case where we update the view atomically with the document, obviously that can't happen, but for new indexes it would be good if we would update 10 or 20 documents per fdb txn that involves an ebtree. Insert performance would be approximately 10x / 20x faster.
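
For illustration, the combined-reducer idea could look something like the sketch below; it assumes ebtree invokes its reduce_fun with a re-reduce flag as the second argument, and only covers _count and _sum:

%% Rough sketch of the suggestion above: a single ebtree over the emitted
%% rows, with one reducer that computes every builtin reduction at once
%% and stores them in a map. Assumes ebtree calls its reduce_fun as
%% Fun(KVs, false) on leaf key/values and Fun(Reductions, true) on
%% re-reduce.
combined_reduce_fun(KVs, false) ->
    Vals = [V || {_K, V} <- KVs],
    #{count => length(Vals), sum => lists:sum(Vals)};
combined_reduce_fun(Reductions, true) ->
    #{
        count => lists:sum([maps:get(count, R) || R <- Reductions]),
        sum => lists:sum([maps:get(sum, R) || R <- Reductions])
    }.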

tx := Tx
} = TxDb,

[View] = lists:filter(fun(View) -> View#mrview.id_num == ViewId end, Views),
Member


View = lists:keyfind(ViewId, #mrview.id_num, Views), is the idiomatic way to do this.

end.


% The reduce values are stored as keys in the b-tree
Member


Can you clarify here? I think the reduce value you've calculated outside of ebtree is the reduction over the k-v's emitted by a single document?

@garrensmith (Member Author)

Thanks for taking a look @rnewson

> Your 1 and 2 strike me as the wrong approach, though I think I see your reasoning. The first item's claim to "store the reduce results" is misleading to the casual reader as, without something like ebtree and its storage of data on inner nodes, it's not possible to calculate useful intermediate reductions. What I think you're doing is reducing the k-vs emitted by a single document? If so, that is not something that CouchDB has done to date and it seems to have limited value, certainly in the common case that a map function emits one row per document.
>
> A simple design that uses less space overall would be to insert the emitted keys and values directly into ebtree and pass in a reducer function that calculates each of the desired reductions specified, and store those in a tuple or a map. couch_views can then call ebtree's functions for lookup (?key=), ranges (for ?startkey=X&endkey=Y&reduce=false) and the various reduce functions as needed (group=true, group_level=X, reduce=true). This ensures we only store each distinct thing once and the logic gets much simpler.

Here is what I'm currently doing. If a document emits the following from a map function:

([1, 1], 1)
([1, 1], 2)
([3, 3], 1)

Then the reduce results for a _sum would be:

([1, 1], 3)
([3, 3], 1)

I would then store those values like this in ebtree

(([1, 1], doc_id), (KVSize, 3))
(([3, 3], doc_id), (KVSize, 1))

I know this is different to how CouchDB has done it previously, but I don't see what we would gain from storing the map index a second time. Querying the current map index should be faster than reading from the b-tree since we can do a range scan of the map index. I'm really not comfortable using ebtree for the map index. We would need to determine how performant that would be and what advantages we would get. I would really like to keep ebtree just for the reduce part, and I'm very cautious about the idea of using it outside of that. I think if we do, we start losing a lot of the functionality we get with FDB.
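
To make the per-document collapse above concrete, a sketch (the helper is hypothetical and the exact ebtree:insert arguments are assumptions, not this PR's code):

%% Illustration only: collapse one document's emitted rows with a _sum
%% before writing them to ebtree, matching the [1, 1] / [3, 3] example
%% above. Assumes ebtree:insert(Tx, Tree, Key, Value) returns the tree;
%% insert_per_doc_reductions/4 is a hypothetical helper.
insert_per_doc_reductions(Tx, Tree, DocId, MappedRows) ->
    Grouped = lists:foldl(fun({Key, Val}, Acc) ->
        maps:update_with(Key, fun(Sum) -> Sum + Val end, Val, Acc)
    end, #{}, MappedRows),
    maps:fold(fun(Key, Sum, TreeAcc) ->
        KVSize = erlang:external_size({Key, Sum}),
        ebtree:insert(Tx, TreeAcc, {Key, DocId}, {KVSize, Sum})
    end, Tree, Grouped).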

> Finally, on insertion generally, if it's possible to do even a small amount of batching we'll reap considerable performance rewards. For the case where we update the view atomically with the document, obviously that can't happen, but for new indexes it would be good if we would update 10 or 20 documents per fdb txn that involves an ebtree. Insert performance would be approximately 10x / 20x faster.

This is already done in couch_views_indexer.

@davisp mentioned this pull request Aug 12, 2020
@davisp mentioned this pull request Sep 18, 2020
@wohali changed the base branch from prototype/fdb-layer to main October 21, 2020 18:09
@garrensmith (Member Author)

No longer needed
