-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add godo's rate limiter configuration & retryable http client #546 #967
Add godo's rate limiter configuration & retryable http client #546 #967
Conversation
Fixed linting issues: imports at |
9501b6f
to
e67505b
Compare
Just noticed that I've accidentally commited the built binnary to git. Made a force push to remove it from history. |
FYI: I've removed the default rate limit of 4 req/s since it would slow down the plan of smaller projects that don't reach the rate limit often, it is more useful for large projects that always reach the limit, in other cases the retryable approach feels like a better solution. Still haven't enabled it by default, but I think it could be a good option, doing something like max 12 tries, min wait of 5s and max wait of 10s, as defaults would cover rate limit cases pretty well without prejudicing much the rarer occasions of server errors (5xx). |
Thank you for this submission. Our team needs time to review this and review the godo changes needed. We want to ensure we're providing the best experience for our users before we implement backoffs and retries. |
@mikeberezowski this issue represents a hard blocker with incredibly unwieldy workarounds, is there any timeline for the review on this, and the review on the godo changes, or would we be better served by reaching out to our Account Executive for that information? |
…er-and-http-retryable
…and-http-retryable
I've pushed this fork to the Terraform public registry (mainly for personal use, since it's impacting my current setup). Also just updated the docs for the newly added arguments (had a little coherence issue which I haven't notice at first). Also, as said before, this is a join of two different fixes, if you want I can open a new PR with only one of these (personally, I feel like the retryable option is the best, since it doesn't impact small projects in any way when enabled). I'm open to help and contribute in any improvements/fixes you find suitable. By the way, there was no updates to godo itself, I've used a configuration option for throttling that was already available in the client itself. For the retryable, the same thing applies, since godo allows you to pass the desired HTTP client on the constructor. |
@DanielHLelis Thank you so much for putting this pull request together. I'll be giving this a closer review, but after some conversation inside the team we are OK with the general approach here. I think that the ideal long-term solution is to implement something at the godo level as it has more knowledge of the DigitalOcean API and will be generally useful, but it is clear that there is demand for this from the community and we do not have a solid timeline for addressing it in godo. I agree with leaving this as an opt-in for the moment. This will allow the community to validate the approach, and provide feedback that we can use to eventually enable a solution by default. The configuration options are likely things we would want to expose in any solution we use ( One notable drawback to We do have some manual reties in the Terraform codebase. Before enabling it by default, something we'd want to better understand is how this interacts with existing user configurable timeouts. Thanks again! |
Thanks for the response. About the func digitalOceanAPIBackoff(min, max time.Duration, attemptNum int, resp *http.Response) time.Duration {
if resp != nil && resp.StatusCode == http.StatusTooManyRequests {
// Retrieve API's Rate Limit Reset unix timestamp
if s, ok := resp.Header["Ratelimit-Reset"]; ok {
if resetUnix, err := strconv.ParseInt(s[0], 10, 64); err == nil {
nowUnix := time.Now().Unix()
sleep := time.Second * time.Duration(resetUnix-nowUnix)
log.Printf("[INFO] Reached API Rate Limit, waiting: %s seconds", sleep)
// Cap sleep time to maximum configured value (to prevent too long wait times for mismatched clocks)
if sleep > max {
return max
}
// Avoid negative sleep times (generally indicates a mismatched clock)
if sleep > 0 {
return sleep
}
}
}
}
// Fallback to default backoff strategy
sleep := retryablehttp.DefaultBackoff(min, max, attemptNum, resp)
log.Printf("[INFO] API Error (not Rate Limit), waiting: %s seconds", sleep)
return sleep
} Then we could configure the client to use it by simply overwriting the ...
retryableClient.RetryWaitMax = time.Duration(c.HTTPRetryWaitMax * float64(time.Second))
retryableClient.Backoff = digitalOceanAPIBackoff // HERE, replaces the default backoff
client := retryableClient.StandardClient()
... The problem with that is the fact that the header returns the epoch timestamp of the first request to expire in the last 1 hour period, making it unsuitable to track the necessary wait time for the triggers at the 1 minute limits. Here is the branch with a demo of the beforementioned code, feature/retryable-http-custom-backoff, testing it I confirmed this behavior. About the interaction with other retries inside the code, the retryable client only attempt retries on some client-side errors, HTTP 429 and HTTP 5XX (except for 501). Because of that, it shouldn't interact with any other retries present inside the code, as those usually are done in the 4XX range (mainly 412). I also haven't found retries inside the godo code during the quick look I took at it (so I am not 100% sure about that). |
Right. That's part of why there is a somewhat larger project needed on the DigitalOcean side beyond just clients around all of this. I've kicked the tires on this a bit now, and it is working great to smooth out requests and preventing Terraform from choking on the burst limiting. Let's get this landed! I left a couple small questions/suggestions in line. |
I've implemented the first suggestions about completely avoiding the retryable http client when When it comes to the second suggestion about the floats, I didn't implement it yet. Since Terraform will accept integers on float fields, casting them automatically, it shouldn't be an issue on the user side. |
Do you think it might make sense to read responses from the headers, to guarantee you stay under the limit? https://docs.digitalocean.com/reference/api/intro/#rate-limit you can easily see, by reading the right header, how many requests are left before hitting limits. This could enable the user to hit the max allowable apply speed. |
I've tried it, as suggested by @andrewsomething, (there is even a demo up here in the PR), the problem is that those refer to the overall 1-hour rate limit. The most common issue when running the provider is reaching the burst limit of 250 req/m. For that we'd need different headers conveying the information of the burst limit, which aren't currently available. Overall, the current solution works out well to smooth out the requests and handle the rate limits, with the retryable approach being the most "optimal" in the sense of overall execution times and the throttle being able to ensure consistency (avoiding completely the rate limit) at the cost of increased execution times. It can be later adapted to support the new headers if and when they become available. |
That's a solid point - guess there's not much that can be done without the short-term burst limit. I like this work overall - I do hope it gets merged. It would make this provider usable again for me. |
@DanielHLelis Thanks again for all your work on this! We'll be looking to get a release cut with this soon. |
Hello, as showed in the #546 issue, for medium to large scale projects it is very common to hit the API rate limits, with in some projects it even preventing the execution of Terraform plan at all.
Here I have implementations of two proposed solutions, one implementing a configuration present in the
digitalocean/godo
client and other from @tback, as shown in the #820 PR.The first one is using the Static Rate Limit from
godo
, adding the configuration option to the schema, applying it in theGodoClient
. I also have set the default value to 4 req/s (240 req/m) which is just under the API's 250 req/m limit.The second one is configuring
godo
to use the Hashicorp'sgo-retryablehttp
, adding the configuration options to the schema (for max tries, min wait and max wait), with a default max tries of 0 (which keeps the default behaviour of no retries), min wait of 1s (which is the package's default) and max wait of 30s (which is also the package's default). In order to avoid confusion during configuration and keep things simpler, I've set the wait arguments to be floats indicating the time in seconds.Both solutions are present in separate branches of my fork,
feature/godo-rate-limiter
andfeature/retryable-http
, with this PR being a merge of both.Looking foward to help on resolving this issue and open to help on any adjustments.