Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batch split #6

Merged
merged 15 commits into from
Dec 20, 2018
Prev Previous commit
Next Next commit
simplify
Signed-off-by: Connor1996 <[email protected]>
  • Loading branch information
Connor1996 committed Nov 28, 2018
commit c193676320c1ceec22be31cd0d250daf4db621fc
36 changes: 9 additions & 27 deletions text/2018-10-25-batch-split.md
Original file line number Diff line number Diff line change
Expand Up @@ -130,33 +130,15 @@ old command type `Split` still needs to be preserved.

### How to produce multiple split keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not overspecify the details in the RFC as doing so can bury the reviewer in minutiae.

IMO, this section is too detailed and not necessary.


This part mainly focuses on `SplitChecker`.

First of all, adjust `trait` so that it can return multiple split keys.

```rust
pub trait SplitChecker {
// ...

// before: fn split_key(&mut self) -> Option<Vec<u8>>
fn split_keys(&mut self) -> Vec<Vec<u8>>;

// before: fn approximate_split_key(&self, _: &Region, _: &DB) -> Result<Option<Vec<u8>>>
fn approximate_split_keys(&self, _: &Region, _: &DB) -> Result<Vec<Vec<u8>>>;
}
```

Then add one config `batch_split_limit` to limit the number of produced split
keys in a batch. If it is unlimited, for a once split check, it scans all over
the Region's range, and in some extreme case, this would cause performance issue.

Now we have four split-checkers: half, keys, size, and table. SizeChecker and
KeysChecker can be rewritten to produce multiple keys, and other checkers'
logic stays unchanged.

The general logic of SizeChecker and KeysChecker are similar. The only
difference between them is one splits Region based on size and the other splits
Region based on the number of keys. So here we mainly describe the logic of
First introducing one config `batch_split_limit` to limit the number of produced
split keys in a batch. If it is unlimited, for a once split check, it scans all
over the Region's range, and in some extreme case, this would cause performance
issue.

SizeChecker and KeysChecker can be rewritten to produce multiple keys, and the
general logic of SizeChecker and KeysChecker are similar. The only difference
between them is one splits Region based on size and the other splits Region
based on the number of keys. So here we mainly describe the logic of
SizeChecker:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the following two paragraphs are too verbose and not necessary.
These explanations are more suitable as comments in the code than in the RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the algorithm should discussed in RFC stage.


- before: it scans key-value pairs in a Region's range sequentially to
Expand Down