DigitalOcean API errors handling #794

Open
wojtekregis opened this issue Feb 18, 2022 · 5 comments

@wojtekregis

Is your feature request related to a problem? Please describe.

In recent weeks, api.digitalocean.com has been rather unstable, often returning Cloudflare's HTML error page with HTTP status 504. The day before yesterday it was out of service for several hours.

I have opened multiple tickets, only to be thanked time and again for my patience and understanding and asked to provide more logs, even though the problem is reproducible from DigitalOcean's own virtual machines by sending requests to DigitalOcean's API, which is proxied by Cloudflare.
I have no reason to doubt DO's support (Team Lead), who blamed Cloudflare for this problem, but weeks have passed since that statement and the errors are still very much present, if not more frequent.
The Terraform provider does not handle HTML pages returned by api.digitalocean.com well, and in most cases such errors leave the state broken and in need of manual repair. In extreme cases, a state file using the "local" backend disappeared from the ext4 filesystem entirely.

Describe the solution you'd like

The provider should handle API errors and HTML responses in such a way that the Terraform state stays consistent with the resources that have already been deployed.
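One way to keep state consistent is to treat a gateway error on a create request as ambiguous rather than as a plain failure: the API may have accepted the request before the proxy gave up. The Python sketch below is hypothetical and is not the provider's actual code (the provider itself is written in Go); `create_or_adopt`, `GatewayError`, and the two callbacks are illustrative names, and it assumes that the resource name uniquely identifies the resource within the account.

```python
from typing import Callable, Optional


class GatewayError(Exception):
    """Hypothetical error for a 502/503/504 or HTML response from the proxy."""


def create_or_adopt(
    name: str,
    create: Callable[[str], str],
    find_by_name: Callable[[str], Optional[str]],
) -> str:
    """Return the ID to record in state, even if the create call failed at the gateway.

    Assumes `name` uniquely identifies the resource (an assumption, not a documented guarantee).
    """
    try:
        return create(name)
    except GatewayError:
        # The request may have reached the API before the proxy timed out,
        # so look the resource up before reporting a failure.
        existing_id = find_by_name(name)
        if existing_id is not None:
            return existing_id
        raise  # nothing was found; surface the original gateway error
```

In a provider, the returned ID would be written to state right away, so the next `terraform apply` would see the existing resource instead of losing track of it.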

Describe alternatives you've considered

I have opened multiple tickets with DigitalOcean regarding API instability and waited close to 4 weeks for a solution.

Additional context

When Cloudflare is able to reach DigitalOcean and the API itself responds with HTTP 500 and a JSON body, the state is saved:

```
2022-02-07T11:04:59.525Z [INFO]  provider.terraform-provider-digitalocean_v2.16.0: 2022/02/07 11:04:59 [DEBUG] DigitalOcean API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 500 Internal Server Error
Content-Length: 59
Cf-Cache-Status: DYNAMIC
Cf-Ray: xxx-yyy
Content-Type: application/json
Date: Mon, 07 Feb 2022 11:04:59 GMT
Expect-Ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4412
Ratelimit-Reset: xxx
Server: cloudflare
Set-Cookie: __cf_bm=xxx
X-Gateway: Edge-Gateway
X-Request-Id: xxx
X-Response-From: service

{
 "id": "Internal Server Error",
 "message": "Server Error"
}
```
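When the API itself answers, the body is JSON with `id` and `message` fields, as in the log above, so it can be turned into a readable error. A minimal sketch, assuming the field names shown in this log; `parse_json_error` and `APIErrorDetails` are hypothetical names. Appending the `X-Request-Id` header to the message might help when support asks for more logs, though whether support can use that value is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class APIErrorDetails:
    status: int
    error_id: str
    message: str
    request_id: Optional[str] = None

    def __str__(self) -> str:
        text = f"HTTP {self.status} {self.error_id}: {self.message}"
        if self.request_id:
            text += f" (request id {self.request_id})"
        return text


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def parse_json_error(status: int, headers: Mapping[str, str], body: str) -> APIErrorDetails:
    # Assumes the body has the {"id": ..., "message": ...} shape seen in the log.
    payload = json.loads(body)
    return APIErrorDetails(
        status=status,
        error_id=payload.get("id", "unknown"),
        message=payload.get("message", ""),
        request_id=get_header(headers, "X-Request-Id"),
    )
```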
@gammons

gammons commented Sep 22, 2022

Hi @wojtekregis, apologies, I only just saw this issue. Do you know if this is still a problem? Is there a specific request we could try on our end to replicate it? 🙇

@cnunciato
Contributor

cnunciato commented Nov 16, 2022

FWIW, I've hit 5xx errors multiple times today while trying to create DB clusters:

```
  digitalocean:index:DatabaseCluster (cluster):
    error: 1 error occurred:
        * Error creating database cluster: POST https://api.digitalocean.com/v2/databases: 504 <!DOCTYPE html>
    <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
    <!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
    <!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
    <head>
    ...
```
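The 504 above is a different case from the earlier 500: the body is an HTML page from the proxy rather than a JSON error from the API, so there is no structured error to extract, and printing it verbatim produces the output shown. A minimal sketch of classifying a response by status code and `Content-Type` before deciding how to report it; the function name, the set of gateway status codes, and the choice to treat any non-JSON 5xx as transient are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Statuses treated as transient gateway failures in this sketch (an assumption).
GATEWAY_STATUSES = {502, 503, 504}


@dataclass(frozen=True)
class Classification:
    is_json: bool
    transient: bool
    summary: str


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # HTTP header names are case-insensitive.
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def classify_response(status: int, headers: Mapping[str, str], body: str) -> Classification:
    media_type = (get_header(headers, "Content-Type") or "").split(";")[0].strip().lower()
    is_json = media_type == "application/json"
    # A non-JSON 5xx most likely came from the proxy rather than the API.
    transient = status in GATEWAY_STATUSES or (status >= 500 and not is_json)
    if is_json:
        summary = f"HTTP {status}: {body.strip()}"
    else:
        # Report a one-line summary instead of the whole HTML page.
        summary = f"HTTP {status}: non-JSON response ({media_type or 'no Content-Type'}), likely from the proxy"
    return Classification(is_json, transient, summary)
```

A JSON 500 is deliberately not marked transient here: as a later comment in this thread shows, a 500 can also be caused by an invalid request, which no amount of retrying will fix.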

@keithgg

keithgg commented Nov 28, 2022

Adding a 👍 to this issue. It's been worse recently.

Usually you can get past the issue by waiting 10 minutes or so, but lately it is more persistent and lasts longer.

I experience the issue mostly with the Databases API (MongoDB in particular).
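Since waiting a few minutes often gets past the problem, a client could retry transient failures with exponential backoff inside a fixed time budget. A minimal sketch; `backoff_delays`, `retry`, and the example numbers (5 s base, 60 s cap, 600 s budget to mirror the "10 minutes or so") are assumptions. Retrying is only safe for idempotent requests such as reads; a create should first check whether the resource already exists, otherwise a retry after a gateway timeout could create a duplicate.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, budget: float) -> List[float]:
    """Delays of base, 2*base, 4*base, ... capped at `cap`, whose sum stays within `budget`."""
    if base <= 0 or cap <= 0:
        raise ValueError("base and cap must be positive")
    delays: List[float] = []
    total, delay = 0.0, base
    while total + delay <= budget:
        delays.append(delay)
        total += delay
        delay = min(delay * 2, cap)
    return delays


def retry(
    call: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, sleeping between attempts on transient errors; the last error propagates."""
    for delay in delays:
        try:
            return call()
        except Exception as exc:
            if not is_transient(exc):
                raise  # e.g. a 500 caused by an invalid spec: fail fast
            sleep(delay)
    return call()  # final attempt after the last delay
```

With the example numbers, `backoff_delays(5, 60, 600)` waits 5, 10, 20 and 40 seconds, then eight 60-second intervals, for 555 seconds in total. The injectable `sleep` keeps the helper testable without real waiting.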

@dscain

dscain commented Feb 1, 2023

I have been seeing the same issue since yesterday when trying to deploy a new App. Could it be related to #808?

The state also gets updated normally.

```
-----------------------------------------------------: timestamp=2023-02-01T12:51:28.254Z
2023-02-01T12:51:28.980Z [INFO] provider.terraform-provider-digitalocean_v2.26.0: 2023/02/01 12:51:28 [WARN] Invalid log level: "1". Defaulting to level: TRACE. Valid levels are: [TRACE DEBUG INFO WARN ERROR]: timestamp=2023-02-01T12:51:28.980Z
2023-02-01T12:51:28.981Z [INFO] provider.terraform-provider-digitalocean_v2.26.0: 2023/02/01 12:51:28 [DEBUG] DigitalOcean API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 500 Internal Server Error
Content-Length: 59
Cf-Cache-Status: DYNAMIC
Cf-Ray: xxx
Content-Type: application/json
Date: Wed, 01 Feb 2023 12:51:28 GMT
Ratelimit-Limit: 5000
Ratelimit-Remaining: 4990
Ratelimit-Reset: xxx
Server: cloudflare
Set-Cookie: xxx
X-Gateway: Edge-Gateway
X-Request-Id: xxx
X-Response-From: service

{
"id": "Internal Server Error",
"message": "Server Error"
}
```

@dscain

dscain commented Feb 1, 2023

I just found that in my case the problem was that I was trying to deploy an App whose spec had a "service" where the "repository" of the "image" property pointed to an image that did not exist. After fixing this in the .tf file, the deployment succeeded without a 500. I will log this in a separate ticket. Thanks!
