-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use couch_rate
application for couch_view
#2662
Use couch_rate
application for couch_view
#2662
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given the code a decent read over and I think I understand the general design - but I could easily have missed some critical requirements for this. The concept behind couch_rater design is really good and will be really useful. I'm not 100% sure about the overall api though.
When I look at the couch_views_indexer file it feels to me like the couch_views_indexer is having to understand too much of how the rate_limiter works.
From what I understand a lot of the requirements this design is similar to how couch_eval
works and I wonder if it might be worth considering changing it a bit to match that design. In my mind I would want the couch_rate for couch_views to be something like:
// Creates new rate limiter. Internally couch_rate would look at the config to get the rate_limiter module for couch_views as well as know it needs to use the ets one for it.
RateLimiterState = couch_rate:new(couch_views);
// later for budget:
Limit = couch_rate:budget(RateLimitState),
// This would then internally use the ets module to get the latest state and use the correct rate liimt algorithm
That way couch_views has no idea how couch_rate works and it doesn't need to. But I also could be missing something for why we can't do that or should do that.
The other thing is how would we turn the rate_limiter off? If someone is happy for all views to go full speed even if it affects other views how would we do that?
src/couch_rate/src/couch_rate_pd.erl
Outdated
Result | ||
end; | ||
_ -> | ||
error("canot find limiter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/canot/cannot
src/couch_views/README.md
Outdated
* `window_size` - how many batches to consider in contention detector | ||
* `timer` - this is used for testing to fast forward time `fun() -> current_time_in_ms() end` | ||
* `target` - the amount in msec which we try to maintain for batch processing time | ||
* `underload_threshold` - a threshold bellow which we would try to increase the budget |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/bellow/below
src/couch_rate/src/couch_rate.erl
Outdated
|
||
-spec new(id(), module(), map()) -> couch_rate:limiter(). | ||
new(Id, Module, Options) -> | ||
#?LIMITER{id = Id, module = Module, state = Module:new(Id, Options)}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest what is the idea behind the #?LIMITER
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use couch_rate
here instead of #?LIMITER
. Previously the record was defined in couch_rate_limiter
module. It was too long to type. Usually in cases when record names are to long and given the fact that record is only used locally. People (on other erlang projects) tend to use #?MODULE
. I don't like that because MODULE
has nothing to do with record name. While I did review of passage code I noticed how author solves the same problem (see https://github.com/sile/passage/blob/master/src/passage_span.erl#L83). I stole that idea and use it when it suits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. Thanks for the explanation. That makes sense.
Are you proposing to make
How we would define default options for the case when rate_limiter_opts key is not configured? Would we disable rate_limiter?
Are you worried about configuration aspect or API aspect here? Because we would have to use
You would set |
So I was more thinking around improving the api. I would like everything that happens here: to be in |
Links in github appear to be broken. Can you provide the file name and line number of the fragment you refer to? |
@garrensmith I think I fixed all problems and fully incorporated your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new api is looking really great. I really like it. I've left a few comments but I think we almost there. Nice work
result() | ||
| {error, term()}. | ||
|
||
update(Id, ok, Fun) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ok
? Shouldn't it be the store state? I don't quite follow this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is a store state as returned from https://github.com/apache/couchdb/pull/2662/files/6ffebb3f932db171403aa7768a6c87038c765c07#diff-e3d46220ad28a76b3f69c12cfadda22bR52. I'll change ok
to _StoreState
and update the type spec for create_if_missing
and new
since it should be -> state().
instead of -> ok.
@garrensmith I updated the PR to include all your suggestions but one. The #2662 (comment) would be hard to implement and it would make API a bit ugly. |
@garrensmith The formatting issue should be fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 nice work
da24d74
to
c4ce375
Compare
@garrensmith I added one more commit since you last review. This 7fac8d1 fixes the case when budget is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes and then good to go.
|
||
|
||
fold_changes(State) -> | ||
#{ | ||
view_seq := SinceSeq, | ||
limit := Limit, | ||
tx_db := TxDb | ||
tx_db := TxDb, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for changing this around?
key_size_limit() -> | ||
config:get_integer("couch_views", "key_size_limit", ?KEY_SIZE_LIMIT). | ||
|
||
|
||
value_size_limit() -> | ||
config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this style change
couch_rate:wait(Limiter), | ||
update(Db, Mrst0, State0); | ||
Limit -> | ||
{Mrst2, State4} = update_txn(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change this to {Mrst1, State1}
@garrensmith updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. +1
aed11de
to
19674a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
dfd6fd1
to
85f81d8
Compare
test Stats : 65 : %{rw_ratio: 0.125, target: 3900, write_time: 100} (optimal)statistics
raw data
|
test Stats : 108 : %{rw_ratio: 0.05, target: 3900, write_time: 100} (overloaded)stats
raw data
|
test Stats : 22 : %{rw_ratio: 1.0, target: 400, write_time: 100} (underloaded)statistic
raw data
|
Overview
Current implementation uses global configuration option to determine the batch size for a background index build. This approach is sub-optimal because it is not taking the design doc write amplification factors into account. The amount of writes and therefore transaction time depends on multiple factors, such as:
This makes every database unique. Which requires independent control of a batch size. The idea is to implement a feedback control algorithm (rate limiter) to control batch size and delay between batches.
Testing recommendations
Simulation results
The test suite uses time simulation to get performance results without waiting forever. This statistics can be enabled using the following:
Checklist
rel/overlay/etc/default.ini