Add godo's rate limiter configuration & retryable http client #546 #967

DanielHLelis · 2023-03-28T14:59:15Z

Hello, as showed in the #546 issue, for medium to large scale projects it is very common to hit the API rate limits, with in some projects it even preventing the execution of Terraform plan at all.

Here I have implementations of two proposed solutions, one implementing a configuration present in the digitalocean/godo client and other from @tback, as shown in the #820 PR.

The first one is using the Static Rate Limit from godo, adding the configuration option to the schema, applying it in the GodoClient. I also have set the default value to 4 req/s (240 req/m) which is just under the API's 250 req/m limit.

The second one is configuring godo to use the Hashicorp's go-retryablehttp, adding the configuration options to the schema (for max tries, min wait and max wait), with a default max tries of 0 (which keeps the default behaviour of no retries), min wait of 1s (which is the package's default) and max wait of 30s (which is also the package's default). In order to avoid confusion during configuration and keep things simpler, I've set the wait arguments to be floats indicating the time in seconds.

Both solutions are present in separate branches of my fork, feature/godo-rate-limiter and feature/retryable-http, with this PR being a merge of both.

Looking foward to help on resolving this issue and open to help on any adjustments.

DanielHLelis · 2023-03-28T15:26:28Z

Fixed linting issues: imports at digitalocean/config/config.go.

…and-http-retryable

DanielHLelis · 2023-03-28T18:04:24Z

Just noticed that I've accidentally commited the built binnary to git. Made a force push to remove it from history.

…er-and-http-retryable

DanielHLelis · 2023-03-29T15:24:54Z

FYI: I've removed the default rate limit of 4 req/s since it would slow down the plan of smaller projects that don't reach the rate limit often, it is more useful for large projects that always reach the limit, in other cases the retryable approach feels like a better solution.

Still haven't enabled it by default, but I think it could be a good option, doing something like max 12 tries, min wait of 5s and max wait of 10s, as defaults would cover rate limit cases pretty well without prejudicing much the rarer occasions of server errors (5xx).

mikeberezowski · 2023-03-31T19:15:05Z

Thank you for this submission. Our team needs time to review this and review the godo changes needed. We want to ensure we're providing the best experience for our users before we implement backoffs and retries.

lattwood · 2023-04-05T13:50:43Z

@mikeberezowski this issue represents a hard blocker with incredibly unwieldy workarounds, is there any timeline for the review on this, and the review on the godo changes, or would we be better served by reaching out to our Account Executive for that information?

…er-and-http-retryable

…and-http-retryable

DanielHLelis · 2023-04-05T15:23:09Z

I've pushed this fork to the Terraform public registry (mainly for personal use, since it's impacting my current setup). Also just updated the docs for the newly added arguments (had a little coherence issue which I haven't notice at first).

Also, as said before, this is a join of two different fixes, if you want I can open a new PR with only one of these (personally, I feel like the retryable option is the best, since it doesn't impact small projects in any way when enabled).

I'm open to help and contribute in any improvements/fixes you find suitable.

By the way, there was no updates to godo itself, I've used a configuration option for throttling that was already available in the client itself. For the retryable, the same thing applies, since godo allows you to pass the desired HTTP client on the constructor.

andrewsomething · 2023-04-05T17:02:03Z

@DanielHLelis Thank you so much for putting this pull request together. I'll be giving this a closer review, but after some conversation inside the team we are OK with the general approach here. I think that the ideal long-term solution is to implement something at the godo level as it has more knowledge of the DigitalOcean API and will be generally useful, but it is clear that there is demand for this from the community and we do not have a solid timeline for addressing it in godo.

I agree with leaving this as an opt-in for the moment. This will allow the community to validate the approach, and provide feedback that we can use to eventually enable a solution by default. The configuration options are likely things we would want to expose in any solution we use (http_retry_max, http_retry_wait_min, http_retry_wait_max). So we should be able to transparently replace the solution if needed without breaking existing user configs.

One notable drawback to hashicorp/go-retryablehttp is that it does not understand the ratelimit-reset header used by the DigitalOcean API. It only looks for Retry-After for its backoff implementation. So in some situations, rate limiting will still impact users. Though it should still help to smooth out requests that hit the burst limit of 5% of the hourly total per minute.

We do have some manual reties in the Terraform codebase. Before enabling it by default, something we'd want to better understand is how this interacts with existing user configurable timeouts.

Thanks again!

DanielHLelis · 2023-04-05T22:48:06Z

Thanks for the response. About the RateLimit-Reset header, it is possible to make the hashicorp/go-retryablehttp understand the header. To do that, we could overwrite the default backoff strategy with a custom one that parses the header, it would look something like this:

func digitalOceanAPIBackoff(min, max time.Duration, attemptNum int, resp *http.Response) time.Duration {
	if resp != nil && resp.StatusCode == http.StatusTooManyRequests {
		// Retrieve API's Rate Limit Reset unix timestamp
		if s, ok := resp.Header["Ratelimit-Reset"]; ok {
			if resetUnix, err := strconv.ParseInt(s[0], 10, 64); err == nil {
				nowUnix := time.Now().Unix()
				sleep := time.Second * time.Duration(resetUnix-nowUnix)

				log.Printf("[INFO] Reached API Rate Limit, waiting: %s seconds", sleep)

				// Cap sleep time to maximum configured value (to prevent too long wait times for mismatched clocks)
				if sleep > max {
					return max
				}

				// Avoid negative sleep times (generally indicates a mismatched clock)
				if sleep > 0 {
					return sleep
				}
			}
		}
	}

	// Fallback to default backoff strategy
	sleep := retryablehttp.DefaultBackoff(min, max, attemptNum, resp)
	log.Printf("[INFO] API Error (not Rate Limit), waiting: %s seconds", sleep)
	return sleep
}

Then we could configure the client to use it by simply overwriting the Backoff attribute like so:

	...
	retryableClient.RetryWaitMax = time.Duration(c.HTTPRetryWaitMax * float64(time.Second))
	retryableClient.Backoff = digitalOceanAPIBackoff // HERE, replaces the default backoff

	client := retryableClient.StandardClient()
	...

The problem with that is the fact that the header returns the epoch timestamp of the first request to expire in the last 1 hour period, making it unsuitable to track the necessary wait time for the triggers at the 1 minute limits. Here is the branch with a demo of the beforementioned code, feature/retryable-http-custom-backoff, testing it I confirmed this behavior.

About the interaction with other retries inside the code, the retryable client only attempt retries on some client-side errors, HTTP 429 and HTTP 5XX (except for 501). Because of that, it shouldn't interact with any other retries present inside the code, as those usually are done in the 4XX range (mainly 412). I also haven't found retries inside the godo code during the quick look I took at it (so I am not 100% sure about that).

digitalocean/config/config.go

digitalocean/provider.go

andrewsomething · 2023-04-06T20:47:54Z

The problem with that is the fact that the header returns the epoch timestamp of the first request to expire in the last 1 hour period, making it unsuitable to track the necessary wait time for the triggers at the 1 minute limits.

Right. That's part of why there is a somewhat larger project needed on the DigitalOcean side beyond just clients around all of this.

I've kicked the tires on this a bit now, and it is working great to smooth out requests and preventing Terraform from choking on the burst limiting. Let's get this landed! I left a couple small questions/suggestions in line.

…and-http-retryable

DanielHLelis · 2023-04-06T22:37:46Z

I've implemented the first suggestions about completely avoiding the retryable http client when http_retry_max is not greater than 0.

When it comes to the second suggestion about the floats, I didn't implement it yet. Since Terraform will accept integers on float fields, casting them automatically, it shouldn't be an issue on the user side.
If you still prefer it to be an int, just send me a reply so that I change it here.

benyanke · 2023-04-08T23:41:36Z

Do you think it might make sense to read responses from the headers, to guarantee you stay under the limit?

https://docs.digitalocean.com/reference/api/intro/#rate-limit

you can easily see, by reading the right header, how many requests are left before hitting limits. This could enable the user to hit the max allowable apply speed.

DanielHLelis · 2023-04-09T16:03:19Z

Do you think it might make sense to read responses from the headers, to guarantee you stay under the limit?

https://docs.digitalocean.com/reference/api/intro/#rate-limit

you can easily see, by reading the right header, how many requests are left before hitting limits. This could enable the user to hit the max allowable apply speed.

I've tried it, as suggested by @andrewsomething, (there is even a demo up here in the PR), the problem is that those refer to the overall 1-hour rate limit. The most common issue when running the provider is reaching the burst limit of 250 req/m. For that we'd need different headers conveying the information of the burst limit, which aren't currently available.

Overall, the current solution works out well to smooth out the requests and handle the rate limits, with the retryable approach being the most "optimal" in the sense of overall execution times and the throttle being able to ensure consistency (avoiding completely the rate limit) at the cost of increased execution times.

It can be later adapted to support the new headers if and when they become available.

benyanke · 2023-04-10T03:15:25Z

That's a solid point - guess there's not much that can be done without the short-term burst limit.

I like this work overall - I do hope it gets merged. It would make this provider usable again for me.

andrewsomething · 2023-04-11T15:33:55Z

@DanielHLelis Thanks again for all your work on this! We'll be looking to get a release cut with this soon.

DanielHLelis added 3 commits March 28, 2023 11:15

add godo's rate limit configuration to provider

75c80ed

set default rate limit to 240 req/m

a317e41

docs: add requests_per_second argument

c4ec327

DanielHLelis mentioned this pull request Mar 28, 2023

429 Too many requests or 429 API Rate limit exceeded #546

Closed

DanielHLelis added 2 commits March 28, 2023 15:00

use hashicorp's retryablehttp client on godo

a755376

Merge branch 'feature/retryable-http' into feature/godo-rate-limiter-…

e67505b

…and-http-retryable

DanielHLelis force-pushed the feature/godo-rate-limiter-and-http-retryable branch from 9501b6f to e67505b Compare March 28, 2023 18:02

DanielHLelis added 2 commits March 28, 2023 15:27

disable rate limiter by default

ca8f72c

Merge branch 'feature/godo-rate-limiter' into feature/godo-rate-limit…

3a5756f

…er-and-http-retryable

DanielHLelis added 4 commits April 5, 2023 12:10

docs: improve requests per second text

5ab080d

docs: fix and improve http_retry_max description

4ffc16f

Merge branch 'feature/godo-rate-limiter' into feature/godo-rate-limit…

5d62c41

…er-and-http-retryable

Merge branch 'feature/retryable-http' into feature/godo-rate-limiter-…

c634b5b

…and-http-retryable

andrewsomething reviewed Apr 6, 2023

View reviewed changes

digitalocean/config/config.go Outdated Show resolved Hide resolved

digitalocean/provider.go Show resolved Hide resolved

DanielHLelis added 3 commits April 6, 2023 19:25

Merge branch 'main' into feature/retryable-http

55f0c9d

configs: completely disable retryable client when max retries is 0

0bb19f1

Merge branch 'feature/retryable-http' into feature/godo-rate-limiter-…

66da242

…and-http-retryable

andrewsomething self-requested a review April 11, 2023 14:45

andrewsomething approved these changes Apr 11, 2023

View reviewed changes

andrewsomething merged commit f66b7e9 into digitalocean:main Apr 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add godo's rate limiter configuration & retryable http client #546 #967

Add godo's rate limiter configuration & retryable http client #546 #967

DanielHLelis commented Mar 28, 2023 •

edited

Loading

DanielHLelis commented Mar 28, 2023

DanielHLelis commented Mar 28, 2023

DanielHLelis commented Mar 29, 2023

mikeberezowski commented Mar 31, 2023

lattwood commented Apr 5, 2023

DanielHLelis commented Apr 5, 2023 •

edited

Loading

andrewsomething commented Apr 5, 2023

DanielHLelis commented Apr 5, 2023 •

edited

Loading

andrewsomething commented Apr 6, 2023 •

edited

Loading

DanielHLelis commented Apr 6, 2023 •

edited

Loading

benyanke commented Apr 8, 2023

DanielHLelis commented Apr 9, 2023

benyanke commented Apr 10, 2023

andrewsomething commented Apr 11, 2023

Add godo's rate limiter configuration & retryable http client #546 #967

Add godo's rate limiter configuration & retryable http client #546 #967

Conversation

DanielHLelis commented Mar 28, 2023 • edited Loading

DanielHLelis commented Mar 28, 2023

DanielHLelis commented Mar 28, 2023

DanielHLelis commented Mar 29, 2023

mikeberezowski commented Mar 31, 2023

lattwood commented Apr 5, 2023

DanielHLelis commented Apr 5, 2023 • edited Loading

andrewsomething commented Apr 5, 2023

DanielHLelis commented Apr 5, 2023 • edited Loading

andrewsomething commented Apr 6, 2023 • edited Loading

DanielHLelis commented Apr 6, 2023 • edited Loading

benyanke commented Apr 8, 2023

DanielHLelis commented Apr 9, 2023

benyanke commented Apr 10, 2023

andrewsomething commented Apr 11, 2023

DanielHLelis commented Mar 28, 2023 •

edited

Loading

DanielHLelis commented Apr 5, 2023 •

edited

Loading

DanielHLelis commented Apr 5, 2023 •

edited

Loading

andrewsomething commented Apr 6, 2023 •

edited

Loading

DanielHLelis commented Apr 6, 2023 •

edited

Loading