Add batch split #6

Merged (15 commits), Dec 20, 2018
some change
Signed-off-by: Connor1996 <[email protected]>
Connor1996 committed Oct 25, 2018
commit 49f184aae5638132fb4acfcdefec241a4fe70122
20 changes: 10 additions & 10 deletions text/2018-10-25-batch-split.md
@@ -4,7 +4,7 @@ Support `BatchSplit` feature that split one Region into multiple Regions at a ti

# Motivation

Currently, a split splits only one Region at a time. This can be too slow when sequential writes are fast: split speed cannot keep up with write speed, and slow splitting leads to large Regions. If a snapshot is then triggered for such a Region, it occupies a lot of IO and slows everything down. Large Regions are also hard to schedule when balancing hotspots, which makes performance even worse.

# Detailed design

@@ -56,7 +56,7 @@ message ReportBatchSplitResponse {

Add `AskBatchSplit` to replace `AskSplit`. It is called when TiKV has produced split keys for a Region and asks PD to allocate new `region_id`s and `peer_id`s for the Regions to be split off. `split_count` in `AskBatchSplitRequest` indicates the number of Regions to be generated, and `AskBatchSplitResponse` returns all newly allocated IDs to TiKV.
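
A minimal sketch of this allocation contract, under stated assumptions: `SplitIds`, `IdAllocator` and `ask_batch_split` are hypothetical names rather than PD's real types, IDs are assumed to come from one monotonically increasing counter, and each new Region is assumed to get one new `peer_id` per peer of the Region being split. With n split keys a Region becomes n + 1 Regions, one of which keeps the original `region_id`, so `split_count` would be n.

```rust
/// Hypothetical mirror of one entry in `AskBatchSplitResponse`:
/// the ID of one new Region plus the IDs of its new peers.
#[derive(Debug, PartialEq)]
struct SplitIds {
    new_region_id: u64,
    new_peer_ids: Vec<u64>,
}

/// Hypothetical stand-in for PD's monotonically increasing ID allocator.
struct IdAllocator {
    next: u64,
}

impl IdAllocator {
    fn alloc(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }
}

/// Allocates IDs for `split_count` new Regions split from a Region that has
/// `peer_count` peers (assumption: one new peer ID per existing peer).
fn ask_batch_split(alloc: &mut IdAllocator, split_count: usize, peer_count: usize) -> Vec<SplitIds> {
    let mut ids = Vec::with_capacity(split_count);
    for _ in 0..split_count {
        let new_region_id = alloc.alloc();
        let new_peer_ids: Vec<u64> = (0..peer_count).map(|_| alloc.alloc()).collect();
        ids.push(SplitIds { new_region_id, new_peer_ids });
    }
    ids
}

fn main() {
    let mut alloc = IdAllocator { next: 100 };
    // Two split keys => two new Regions; the Region being split has three peers.
    let ids = ask_batch_split(&mut alloc, 2, 3);
    println!("{:?}", ids);
}
```

Getting all IDs in one round trip is what lets a whole batch of split keys be applied without asking PD once per key.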

Add `ReportBatchSplit` to replace `ReportSplit`. It is called when TiKV finishes splitting a Region. `ReportBatchSplitRequest` carries the metas of all newly generated Regions so that PD can update its related information.
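
To illustrate what the reported Region metas describe, the sketch below cuts a Region's range `[start_key, end_key)` at sorted split keys into consecutive ranges. `RegionRange` and `split_ranges` are hypothetical names; real Region metas also carry IDs, peers and epochs, which are left out here.

```rust
/// Hypothetical, simplified Region meta: only the key range is kept.
#[derive(Debug, PartialEq)]
struct RegionRange {
    start_key: Vec<u8>,
    end_key: Vec<u8>,
}

/// Cuts `[start_key, end_key)` at each split key, in order.
/// Assumes `split_keys` are sorted and lie strictly inside the range.
fn split_ranges(start_key: &[u8], end_key: &[u8], split_keys: &[Vec<u8>]) -> Vec<RegionRange> {
    let mut ranges = Vec::with_capacity(split_keys.len() + 1);
    let mut current_start = start_key.to_vec();
    for key in split_keys {
        // Each split key ends the current range and starts the next one.
        ranges.push(RegionRange { start_key: current_start, end_key: key.clone() });
        current_start = key.clone();
    }
    // The last range keeps the original end key.
    ranges.push(RegionRange { start_key: current_start, end_key: end_key.to_vec() });
    ranges
}

fn main() {
    let keys = vec![b"c".to_vec(), b"f".to_vec()];
    for r in split_ranges(b"a", b"z", &keys) {
        println!(
            "{:?} -> {:?}",
            String::from_utf8_lossy(&r.start_key),
            String::from_utf8_lossy(&r.end_key)
        );
    }
}
```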

For compatibility, the old interfaces are not deleted but marked as deprecated.

@@ -126,18 +126,18 @@ pub trait SplitChecker {
}
```

Then add a config `batch_split_limit` to limit the number of split keys produced in one batch. Without this limit, a single split check would scan the whole range of the Region, which could cause performance issues in extreme cases.

There are currently four split checkers: half, keys, size and table. SizeChecker and KeysChecker can be rewritten to produce multiple split keys, while the logic of the other checkers stays unchanged.

The general logic of SizeChecker and KeysChecker is similar; the only difference is that one splits a Region based on size and the other based on the number of keys. So here we mainly describe the logic of SizeChecker:

- before: it scans the key-value pairs in a Region's range sequentially, accumulating their size as `total_size`, and stops once `total_size` reaches `region_max_size` or the scan reaches the end of the range. If `total_size` is still smaller than `region_max_size` at the end, the checker produces no split key; otherwise, it takes the key at which `total_size` reached `region_split_size` as the split key.
- after: it scans the key-value pairs in a Region's range sequentially, accumulating their size as `total_size`, and stops once `total_size` reaches `region_split_size * (batch_split_limit - 1) + region_max_size` or the scan reaches the end of the range. During the scan, it records a split key every `region_split_size`. After the scan finishes, it may discard the last split key if the size of the rest of the Region after that key is less than `region_max_size - region_split_size`. With this algorithm, if `batch_split_limit` is set to 1, TiKV behaves exactly as before, splitting without batching.
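
The "after" logic can be sketched as below. This is a simplified illustration, not TiKV's SizeChecker: the Region's key-value pairs are modeled as hypothetical `(key, entry_size)` pairs in key order, sizes are plain integers in arbitrary units, and `batch_split_keys` is a made-up name. The `main` function shows that `batch_split_limit = 1` reproduces the single-split result.

```rust
/// Simplified sketch of the batched SizeChecker logic.
/// `entries` stands in for the Region's key-value pairs in key order.
fn batch_split_keys(
    entries: &[(&str, u64)],
    region_split_size: u64,
    region_max_size: u64,
    batch_split_limit: usize,
) -> Vec<String> {
    let mut split_keys = Vec::new();
    // Size accumulated since the last recorded split key.
    let mut current_size = 0u64;

    for &(key, size) in entries {
        current_size += size;
        // Record a split key every `region_split_size`, up to the limit.
        // (In this sketch the entry at the split key is counted in the left part.)
        if split_keys.len() < batch_split_limit && current_size >= region_split_size {
            split_keys.push(key.to_string());
            current_size = 0;
        }
        // Stop early once the limit is reached and the tail has grown to
        // `region_max_size - region_split_size`, which roughly corresponds to
        // total_size reaching `region_split_size * (batch_split_limit - 1) + region_max_size`.
        if split_keys.len() >= batch_split_limit
            && current_size + region_split_size >= region_max_size
        {
            break;
        }
    }

    // Discard the last split key if the rest of the Region after it is too small.
    if current_size + region_split_size < region_max_size {
        split_keys.pop();
    }
    split_keys
}

fn main() {
    // Ten entries "a".."j" of size 1; split every 3 units, max Region size 5.
    let keys = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
    let entries: Vec<(&str, u64)> = keys.iter().map(|k| (*k, 1)).collect();
    // batch_split_limit = 1 behaves like the old single split.
    println!("{:?}", batch_split_keys(&entries, 3, 5, 1)); // ["c"]
    // batch_split_limit = 3 produces several split keys in one check.
    println!("{:?}", batch_split_keys(&entries, 3, 5, 3)); // ["c", "f"]
}
```

Because the scan stops at roughly `region_split_size * (batch_split_limit - 1) + region_max_size`, `batch_split_limit` also bounds how much data a single split check reads.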

### Compatibility concern

The general process in raftstore changes only a little: it mainly replaces `Split` with `BatchSplit`. One thing should be noted, though: during a rolling upgrade, PD's version control refuses `AskBatchSplit` requests, so batch split cannot be performed until all TiKV instances are bumped to the new version. To let TiKV know whether `AskBatchSplit` failed because of compatibility, we introduce a new error type for `ResponseHeader`:

```protobuf
enum ErrorType {
@@ -146,15 +146,15 @@ enum ErrorType {
}
```

So once TiKV gets an `AskBatchSplitResponse` with `ErrorType::INCOMPATIBLE_VERSION`, it uses the original `AskSplit` instead of `AskBatchSplit`, and all following steps degrade to the original way. That is why the original code path is not deleted.
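
The fallback decision can be sketched as follows. The closures stand in for the real `AskBatchSplit` and `AskSplit` RPCs, and `ErrorType`, `SplitPath` and `ask_split_with_fallback` are hypothetical names; only an `INCOMPATIBLE_VERSION` error triggers the degradation, while any other error is returned unchanged.

```rust
/// Hypothetical subset of the `ErrorType` carried in PD's `ResponseHeader`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorType {
    IncompatibleVersion,
    Unknown,
}

/// Which path the split request ended up taking in this sketch.
#[derive(Debug, PartialEq)]
enum SplitPath {
    /// PD accepted `AskBatchSplit`; one new Region ID per split key.
    Batch(Vec<u64>),
    /// PD rejected batch split as incompatible; the original `AskSplit` was used.
    Single(u64),
}

/// Tries `AskBatchSplit` first and degrades to `AskSplit` only when PD
/// reports `INCOMPATIBLE_VERSION`.
fn ask_split_with_fallback<B, S>(
    split_count: usize,
    ask_batch_split: B,
    ask_split: S,
) -> Result<SplitPath, ErrorType>
where
    B: FnOnce(usize) -> Result<Vec<u64>, ErrorType>,
    S: FnOnce() -> Result<u64, ErrorType>,
{
    match ask_batch_split(split_count) {
        Ok(ids) => Ok(SplitPath::Batch(ids)),
        Err(ErrorType::IncompatibleVersion) => ask_split().map(SplitPath::Single),
        // Other errors are not a compatibility problem, so no fallback.
        Err(e) => Err(e),
    }
}

fn main() {
    // Simulate PD refusing batch split during a rolling upgrade.
    let result = ask_split_with_fallback(3, |_| Err(ErrorType::IncompatibleVersion), || Ok(42));
    println!("{:?}", result); // Ok(Single(42))
}
```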

### Approximate split key

What we described above can ease the problem; however, scanning a large Region can also consume a lot of time and CPU. Tests show that large Regions can still easily show up even with batch split implemented, although splitting is sped up.

When a Region becomes large enough, it's more practical to divide it into smaller chunks quickly. This can be achieved via size estimation calculated from SST properties. The estimate may not be accurate, but that is acceptable for a large Region.

So if the size of a Region is larger than `region_max_size * batch_split_limit * 2`, TiKV uses an approximate way to produce split keys. The approximate way is quite similar to the algorithm described above, but instead of accumulating the size of every key-value pair, TiKV uses the approximate size of the Region and the number of SST property keys in the Region's range to estimate the average distance between two adjacent property keys, and produces a split key every `region_split_size / distance` property keys.
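
A minimal sketch of the approximate path, assuming the sampled SST property keys of the Region are available as a sorted slice. `use_approximate` and `approximate_split_keys` are hypothetical names, and the sketch deliberately leaves out `batch_split_limit` and the "discard the last split key" rule, so it only shows the distance-based stepping.

```rust
/// True when the Region is large enough for the approximate path,
/// i.e. larger than `region_max_size * batch_split_limit * 2`.
fn use_approximate(approximate_size: u64, region_max_size: u64, batch_split_limit: u64) -> bool {
    approximate_size > region_max_size * batch_split_limit * 2
}

/// Picks a split key every `region_split_size / distance` property keys,
/// where `distance` is the estimated size between two adjacent property keys.
fn approximate_split_keys(
    property_keys: &[&str],
    approximate_size: u64,
    region_split_size: u64,
) -> Vec<String> {
    if property_keys.is_empty() {
        return Vec::new();
    }
    // Estimated average size covered by one property key.
    let distance = approximate_size as f64 / property_keys.len() as f64;
    // Number of property keys per split; at least 1 so the loop advances.
    let step = ((region_split_size as f64 / distance) as usize).max(1);

    let mut split_keys = Vec::new();
    let mut i = step;
    while i < property_keys.len() {
        split_keys.push(property_keys[i].to_string());
        i += step;
    }
    split_keys
}

fn main() {
    // Hypothetical numbers: 10 property keys sampled from a Region of size 1000.
    let keys = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
    if use_approximate(1000, 450, 1) {
        // distance = 100, so a split key is taken every 300 / 100 = 3 keys.
        println!("{:?}", approximate_split_keys(&keys, 1000, 300)); // ["d", "g", "j"]
    }
}
```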

# Drawbacks
