
429 Too many requests or 429 API Rate limit exceeded #546

Closed
chasebolt opened this issue Dec 30, 2020 · 24 comments
@chasebolt

Terraform Version

Terraform v0.13.5
+ provider registry.terraform.io/-/aws v3.22.0
+ provider registry.terraform.io/-/digitalocean v2.3.0
+ provider registry.terraform.io/digitalocean/digitalocean v2.3.0
+ provider registry.terraform.io/hashicorp/aws v3.22.0

Expected Behavior

Terraform to complete successfully

Actual Behavior

Terraform is getting errors back from the DO API stating either Too Many Requests or API Rate limit exceeded.

Steps to Reproduce

Manage 100 domain records and it will fail while trying to refresh the state.
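
As a rough illustration, a configuration along these lines is enough to trigger the errors, since refreshing state issues roughly one API read per record. This is a minimal sketch; the domain name, record count, and IP address are hypothetical:

```hcl
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
    }
  }
}

resource "digitalocean_domain" "example" {
  name = "example.com" # hypothetical domain
}

# ~100 records means ~100 read requests during state refresh, which can
# exhaust the per-minute limit once other resources and repeated runs
# are factored in.
resource "digitalocean_record" "host" {
  count  = 100
  domain = digitalocean_domain.example.id
  type   = "A"
  name   = "host-${count.index}"
  value  = "203.0.113.10" # hypothetical address
}
```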

Important Factoids

I am managing 10 domains with about 10-20 records each. I have dropped parallelism down to 1 and I still can't get a successful run without hitting one of these two errors. This isn't an absurdly large terraform run, and I am unable to get past just a handful of state refreshes before it errors out.

@zeppelinen

+1.
Same problem for me. I need to manage 200+ droplets.

@lfarnell
Contributor

Based on the docs:

> Requests through the API are rate limited per OAuth token. Current rate limits:
>
> - 5,000 requests per hour
> - 250 requests per minute (5% of the hourly total)

So one thing you can possibly do is reach out to support and see if they can increase the per-minute limit. The only other thing you might want to investigate is breaking the Terraform code down into a module per domain so that you won't hit the limit. I realize this might not be possible or suitable for your use case, but it's something to think about. A sketch of such a split is shown below.
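
For example, a per-domain split might look like the following. This is a minimal sketch with a hypothetical module path and inputs; the idea is that each domain lives in its own root configuration (and state), so a single run only refreshes that domain's records:

```hcl
# One root configuration per domain, each instantiating a shared
# (hypothetical) domain-records module, so `terraform plan` only
# refreshes the records of this one domain.
module "example_com_records" {
  source = "../modules/domain-records"

  domain = "example.com"
  records = {
    "www" = "203.0.113.10"
    "api" = "203.0.113.11"
  }
}
```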

@joaomnmoreira

+1.

Hundreds of domain records (same domain) to manage.

@ghomem

ghomem commented Aug 4, 2021

+1

This happens to me while managing a single large domain (~ 150 records)

@atmosx

atmosx commented Oct 14, 2021

module.spaces.digitalocean_cdn.assets: Refreshing state... [id=0ac3dd96-3612-4638-a802-10dadc4aecea]
╷
│ Error: Error retrieving DatabaseCluster: GET https://api.digitalocean.com/v2/databases/1cc4ed58-cd77-43c0-9a61-61062b2c12f7: 429 Too many requests
│
│   with module.databases.digitalocean_database_cluster.infra1,
│   on ../../modules/databases/main.tf line 11, in resource "digitalocean_database_cluster" "infra1":
│   11: resource "digitalocean_database_cluster" "infra1" {
│
╵
╷
│ Error: Error retrieving DatabaseCluster: GET https://api.digitalocean.com/v2/databases/33ce4a3f-1760-4e78-8110-5f4b356da009: 429 Too many requests
│
│   with module.databases.digitalocean_database_cluster.router1,
│   on ../../modules/databases/main.tf line 33, in resource "digitalocean_database_cluster" "router1":
│   33: resource "digitalocean_database_cluster" "router1" {
│
╵
╷
│ Error: Error retrieving Kubernetes cluster: GET https://api.digitalocean.com/v2/kubernetes/clusters/d52645d0-93e9-4594-aa9a-7b9dbecef8c5: 429 Too many requests
│
│   with module.kubernetes.digitalocean_kubernetes_cluster.app,
│   on ../../modules/kubernetes/kubernetes.tf line 2, in resource "digitalocean_kubernetes_cluster" "app":
│    2: resource "digitalocean_kubernetes_cluster" "app" {
│
╵
╷
│ Error: Error reading CDN: GET https://api.digitalocean.com/v2/cdn/endpoints/0ac3dd96-3612-4638-a802-10dadc4aecea: 429 Too many requests
│
│   with module.spaces.digitalocean_cdn.assets,
│   on ../../modules/spaces/main.tf line 44, in resource "digitalocean_cdn" "assets":
│   44: resource "digitalocean_cdn" "assets" {
│
╵
Releasing state lock. This may take a few moments...

I'm literally blocked here and I don't understand why. Our tf stack is not big by any means. DO has extremely low rate limits everywhere (e.g. using Loki with Spaces is impossible). We're getting random API errors from the k8s API and the Spaces API, and the CDN API returns 50x every so often. It's getting really frustrating.

@djmaze

djmaze commented Oct 14, 2021

Same problem here.

> So one thing you can possibly do is reach out to support and see if they can increase the limit for requests per minute.

I just tried that. The answer was simply that they are doing this "to protect the platform". So they won't increase the limits.

I now asked if they could at least implement something like a burst limit, which would allow exceeding the rate limit for a short period of time. That would probably help with use cases such as Terraform.

@atmosx

atmosx commented Oct 14, 2021

> Same problem here.
>
> > So one thing you can possibly do is reach out to support and see if they can increase the limit for requests per minute.
>
> I just tried that. The answer was simply that they are doing this "to protect the platform". So they won't increase the limits.
>
> I now asked if they could at least implement something like a burst limit, which would allow exceeding the rate limit for a short period of time. That would probably help with use cases such as Terraform.

I'm the only one having this problem in a team of three; my coworkers can plan. I have issued tokens that are used by applications, and I wonder if they're using the API and I'm hitting a global limit.

Since @djmaze and I are being hit by this at the same point in time, I wonder if there's something else going on.

@atmosx

atmosx commented Oct 14, 2021

@djmaze It works now; my limit was apparently reset a few minutes ago, but I'll keep an eye on it. Here's how you can check the rate limit:

curl -H "Authorization: Bearer $DIGITALOCEAN_ACCESS_TOKEN" -v -I "https://api.digitalocean.com/v2/images?private=true"

...
< ratelimit-limit: 5000
< ratelimit-remaining: 4998
< ratelimit-reset: 1634198529
...

API documentation: https://docs.digitalocean.com/reference/api/api-reference/#section/Introduction/Rate-Limit

@djmaze

djmaze commented Oct 14, 2021

I can reproduce this by running our Terraform plan twice in a row within one or two minutes: the first run works, the second one fails. It has always been like this and hasn't changed for me.

@atmosx

atmosx commented Oct 14, 2021

Posting this here in case anyone else hits the same issue. We're using a remote secrets store (Doppler), and the problem was that the entire team and a few apps were using a common token fetched from the secrets store.

tback added a commit to tback/terraform-provider-digitalocean that referenced this issue Mar 31, 2022
@gergelypolonkai

Today this hit me on the first API request while refreshing Terraform state, without me doing anything on DO except a Kubernetes “login”.

I saw @tback creating a possible fix for it; any chance it will be released soon? (The last release is from two days before the fix landed.)

@Lavode
Contributor

Lavode commented Feb 28, 2023

We also regularly run into the per-minute rate limiting of API calls, so we too would greatly appreciate a solution, be it in the form of an option for increased rate limits on DO's side, the Terraform provider automatically retrying failed calls, or even just adding client-side rate-limiting support to the provider. We'd much prefer a deployment taking a few minutes longer over it potentially failing.

Some background on our use-case, should it help: We regularly spawn on the order of 100 VMs for a few hours at a time, in order to profile distributed applications of ours. These deployments are automated with Terraform. Each VM we provision will lead to multiple API calls, as we also set up DNS records for each, assign it to a project, and so on.

As we regularly get rate limited this then requires multiple calls to terraform apply - spaced apart a few minutes each time - for the stack to be deployed fully. This is inconvenient when doing it manually, and a nightmare in CI.

@tback

tback commented Feb 28, 2023

I just installed the provider manually with a provider override, and it worked for me for as long as it took me to migrate away from DO.

@runako

runako commented Mar 3, 2023

+1. I would pay extra to not have to worry about rate limiting on the API, which I use to provision resources and therefore pay DigitalOcean more money. If there were an option to pay $X/mo for a much higher or unlimited API limit, I would choose that option right now. My other option is to spend time building rate limiting into my side of the app, which is going to take way more of my time than $X/mo.

@benyanke

I have similarly filed support requests, as my modestly sized Terraform stack is hitting these issues. Unfortunately, their own Terraform provider isn't usable on their platform.

Perhaps if backoff were added to the provider, runs would slow down rather than fail outright?

@DanielHLelis
Contributor

I've hit the same rate-limit problems (to the point where I couldn't even run a single plan anymore) and opened a PR with proposed changes that fix/mitigate this issue. If anyone would like to take a look at the fork: DanielHLelis/terraform-provider-digitalocean-ratelimit.
It limits the number of requests per second and/or uses HashiCorp's retryable HTTP client to retry requests after an error (like the 429).

To try it, you just need to install it and do the override as explained in the CONTRIBUTING.md; a sketch of such an override is shown below.

Hoping to get it into mainstream soon.
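
For reference, the override step looks roughly like this. This is a sketch of a ~/.terraformrc CLI configuration; the binary path is an assumption about where the fork was built, so see the fork's CONTRIBUTING.md for the exact instructions:

```hcl
# ~/.terraformrc: point Terraform at a locally built provider binary
# instead of the registry release.
provider_installation {
  dev_overrides {
    # Assumed path to the directory containing the locally built binary.
    "digitalocean/digitalocean" = "/home/user/go/bin"
  }

  # Install all other providers from the registry as usual.
  direct {}
}
```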

@benyanke

Can you link the PR here too?

@DanielHLelis
Contributor

> Can you link the PR here too?

Here: #967

@benyanke

benyanke commented Apr 8, 2023

Updating here to say that I spent over a week going back and forth with support, and they were not at all helpful. It makes me wonder if I should move my DNS elsewhere if DNS + IaC isn't going to be well supported.

andrewsomething pushed a commit that referenced this issue Apr 11, 2023

* add godo's rate limit configuration to provider

* set default rate limit to 240 req/m

* docs: add requests_per_second argument

* use hashicorp's retryablehttp client on godo

* disable rate limiter by default

* docs: improve requests per second text

* docs: fix and improve http_retry_max description

* configs: completely disable retryable client when max retries is 0
@andrewsomething
Member

andrewsomething commented Apr 21, 2023

We've just released version 2.28.0 of this provider. It adds experimental support for automatically retrying requests that fail with 429 or 500-level response codes. It can be enabled by setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration.

Please let us know if you have any feedback on this functionality. We will be looking to enable it by default in a future release.

Additionally, it adds support for configuring client-side rate-limiting to enforce quality of service. It can be enabled by setting the DIGITALOCEAN_REQUESTS_PER_SECOND environment variable or the requests_per_second argument in the provider configuration.
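
For example, both options can be set together in the provider block; the specific values below are illustrative, not recommendations:

```hcl
provider "digitalocean" {
  token = var.do_token

  # Retry requests that fail with 429 or 500-level responses
  # (experimental as of 2.28.0); equivalent to the
  # DIGITALOCEAN_HTTP_RETRY_MAX environment variable.
  http_retry_max = 10

  # Client-side rate limiting to stay under the API quota;
  # equivalent to DIGITALOCEAN_REQUESTS_PER_SECOND.
  requests_per_second = 4
}
```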

Thanks to @DanielHLelis for working with us on this!

@benyanke

Thanks! This seems to be working initially - I'll do more testing and report back.

@madalozzo

That's amazing, @DanielHLelis! I've run the latest release with the requests_per_second parameter and it appears to be working very well. I think I can now merge all my Terraform into one, eheh.

@jonathanheilmann

I reached the rate limit too; after setting requests_per_second, the error is gone. Looks like it's doing its job well :)

@andrewsomething
Member

andrewsomething commented Sep 11, 2023

With the recently released version 2.30.0 of this provider, we have now enabled retries by default. Setting the DIGITALOCEAN_HTTP_RETRY_MAX environment variable or the http_retry_max argument in the provider configuration to 0 will disable this behavior.
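
For example, to opt out of the new default behavior (a minimal sketch):

```hcl
provider "digitalocean" {
  # Disable the automatic retries enabled by default in 2.30.0.
  http_retry_max = 0
}
```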
