Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update retryOnAwsCode to Five Minutes #999

Merged
merged 1 commit into from
Jun 20, 2023
Merged

Update retryOnAwsCode to Five Minutes #999

merged 1 commit into from
Jun 20, 2023

Conversation

danaelhe
Copy link
Member

@danaelhe danaelhe commented Jun 20, 2023

Users have been experiencing errors being returned on successful spaces create requests that take longer than 1 minute. Github issue is #996

The only referenced function in resourceDigitalOceanBucketCreate that's using a 1 minute timeout is the retryOnAwsCode so updating that to 5 minutes to hopefully mitigate the errors on successful bucket creations.

Users have been experiencing errors being returned on successful spaces create requests that take longer than 1 minute. Github issue is #996 

The only child function that `resourceDigitalOceanBucketCreate` using a 1 minute timeout is the `retryOnAwsCode` so updating that to 5 minutes to hopefully mitigate the errors on successful bucket creations.
@andrewsomething
Copy link
Member

andrewsomething commented Jun 20, 2023

This is an odd one... The create method doesn't actually use this helper, and has its own 5 minute timeout for both the create call and polling:

err = resource.RetryContext(ctx, 5*time.Minute, func() *resource.RetryError {
log.Printf("[DEBUG] Trying to create new Spaces bucket: %q", name)
_, err := svc.CreateBucket(input)
if awsErr, ok := err.(awserr.Error); ok {
if awsErr.Code() == "OperationAborted" {
log.Printf("[WARN] Got an error while trying to create Spaces bucket %s: %s", name, err)
return resource.RetryableError(
fmt.Errorf("[WARN] Error creating Spaces bucket %s, retrying: %s", name, err))
}
}
if err != nil {
return resource.NonRetryableError(err)
}
return nil
})
if err != nil {
return diag.Errorf("Error creating Spaces bucket: %s", err)
}
err = resource.RetryContext(ctx, 5*time.Minute, func() *resource.RetryError {
_, err := svc.HeadBucket(&s3.HeadBucketInput{Bucket: aws.String(name)})
if awsErr, ok := err.(awserr.Error); ok {
if awsErr.Code() == "NotFound" {
log.Printf("[DEBUG] Waiting for Spaces bucket to be available: %q, retrying: %v", name, awsErr.Message())
return resource.RetryableError(err)
}
}
if err != nil {
return resource.NonRetryableError(err)
}
return nil
})
log.Println("Bucket created")

But I was actually able to reproduce this finally! Despite the 5 minute timeout shown above, it still times out in 1 minute. Looking at the debug logs:

digitalocean_spaces_bucket.test_space: Creating...
2023-06-20T15:56:33.012-0400 [INFO]  Starting apply for digitalocean_spaces_bucket.test_space
2023-06-20T15:56:33.012-0400 [DEBUG] digitalocean_spaces_bucket.test_space: applying the planned Create change
2023-06-20T15:56:33.014-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:33 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:33.014-0400
2023-06-20T15:56:33.014-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:33 [DEBUG] Trying to create new Spaces bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3": timestamp=2023-06-20T15:56:33.014-0400
2023-06-20T15:56:34.964-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:34 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:34.963-0400
2023-06-20T15:56:35.068-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 Bucket created: timestamp=2023-06-20T15:56:35.068-0400
2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Spaces put bucket ACL: {
  ACL: "private",
  Bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3"
}: timestamp=2023-06-20T15:56:35.069-0400
2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.069-0400
2023-06-20T15:56:35.185-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.185-0400
2023-06-20T15:56:35.293-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.293-0400
2023-06-20T15:56:35.669-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [TRACE] Waiting 500ms before next try: timestamp=2023-06-20T15:56:35.669-0400
2023-06-20T15:56:36.277-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:36 [TRACE] Waiting 1s before next try: timestamp=2023-06-20T15:56:36.277-0400
2023-06-20T15:56:37.389-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:37 [TRACE] Waiting 2s before next try: timestamp=2023-06-20T15:56:37.389-0400
2023-06-20T15:56:39.500-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:39 [TRACE] Waiting 4s before next try: timestamp=2023-06-20T15:56:39.500-0400
digitalocean_spaces_bucket.test_space: Still creating... [10s elapsed]
2023-06-20T15:56:43.617-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:43 [TRACE] Waiting 8s before next try: timestamp=2023-06-20T15:56:43.617-0400
2023-06-20T15:56:51.830-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:51 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:56:51.830-0400
digitalocean_spaces_bucket.test_space: Still creating... [20s elapsed]
2023-06-20T15:57:01.953-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:01 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:01.953-0400
digitalocean_spaces_bucket.test_space: Still creating... [30s elapsed]
2023-06-20T15:57:12.074-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:12 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:12.074-0400
digitalocean_spaces_bucket.test_space: Still creating... [40s elapsed]
2023-06-20T15:57:22.394-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:22 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:22.394-0400
digitalocean_spaces_bucket.test_space: Still creating... [50s elapsed]
2023-06-20T15:57:32.509-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:32 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:32.509-0400
digitalocean_spaces_bucket.test_space: Still creating... [1m0s elapsed]
2023-06-20T15:57:35.295-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:35 [WARN] WaitForState timeout after 1m0s: timestamp=2023-06-20T15:57:35.293-0400
2023-06-20T15:57:35.295-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:35 [WARN] WaitForState starting 30s refresh grace period: timestamp=2023-06-20T15:57:35.293-0400
2023-06-20T15:57:35.295-0400 [ERROR] provider.terraform-provider-digitalocean: Response contains error diagnostic: diagnostic_summary="NoSuchBucket: 
        status code: 404, request id: , host id: " tf_provider_addr=provider tf_rpc=ApplyResourceChange @module=sdk.proto diagnostic_severity=ERROR diagnostic_detail= tf_proto_version=5.3 tf_req_id=27a8b2e6-26c2-4685-8108-15dc9a7b82f4 tf_resource_type=digitalocean_spaces_bucket @caller=/home/asb/projects/go/src/github.com/digitalocean/terraform-provider-digitalocean/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 timestamp=2023-06-20T15:57:35.294-0400
2023-06-20T15:57:35.303-0400 [DEBUG] State storage *statemgr.Filesystem declined to persist a state snapshot
2023-06-20T15:57:35.303-0400 [ERROR] vertex "digitalocean_spaces_bucket.test_space" error: NoSuchBucket: 
        status code: 404, request id: , host id:
╷
│ Error: NoSuchBucket: 
│       status code: 404, request id: , host id: 
│ 
│   with digitalocean_spaces_bucket.test_space,
│   on main.tf line 28, in resource "digitalocean_spaces_bucket" "test_space":
│   28: resource "digitalocean_spaces_bucket" "test_space" {
│ 
╵

The polling for the new Spaces bucket is successful. It is the step that is applying the ACLs that fails:

2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Spaces put bucket ACL: {
  ACL: "private",
  Bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3"
}:

With this change, it eventually succeeded after 3m30s. I don't understand why Spaces is returning a successful response to HeadBucket and then a NoSuchBucket for PutBucketAcl, but increasing the time out here seems to work.

@danaelhe
Copy link
Member Author

This is an odd one... The create method doesn't actually use this helper, and has its own 5 minute timeout for both the create call and polling:

err = resource.RetryContext(ctx, 5*time.Minute, func() *resource.RetryError {
log.Printf("[DEBUG] Trying to create new Spaces bucket: %q", name)
_, err := svc.CreateBucket(input)
if awsErr, ok := err.(awserr.Error); ok {
if awsErr.Code() == "OperationAborted" {
log.Printf("[WARN] Got an error while trying to create Spaces bucket %s: %s", name, err)
return resource.RetryableError(
fmt.Errorf("[WARN] Error creating Spaces bucket %s, retrying: %s", name, err))
}
}
if err != nil {
return resource.NonRetryableError(err)
}
return nil
})
if err != nil {
return diag.Errorf("Error creating Spaces bucket: %s", err)
}
err = resource.RetryContext(ctx, 5*time.Minute, func() *resource.RetryError {
_, err := svc.HeadBucket(&s3.HeadBucketInput{Bucket: aws.String(name)})
if awsErr, ok := err.(awserr.Error); ok {
if awsErr.Code() == "NotFound" {
log.Printf("[DEBUG] Waiting for Spaces bucket to be available: %q, retrying: %v", name, awsErr.Message())
return resource.RetryableError(err)
}
}
if err != nil {
return resource.NonRetryableError(err)
}
return nil
})
log.Println("Bucket created")

But I was actually able to reproduce this finally! Despite the 5 minute timeout shown above, it still times out in 1 minute. Looking at the debug logs:

digitalocean_spaces_bucket.test_space: Creating...
2023-06-20T15:56:33.012-0400 [INFO]  Starting apply for digitalocean_spaces_bucket.test_space
2023-06-20T15:56:33.012-0400 [DEBUG] digitalocean_spaces_bucket.test_space: applying the planned Create change
2023-06-20T15:56:33.014-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:33 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:33.014-0400
2023-06-20T15:56:33.014-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:33 [DEBUG] Trying to create new Spaces bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3": timestamp=2023-06-20T15:56:33.014-0400
2023-06-20T15:56:34.964-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:34 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:34.963-0400
2023-06-20T15:56:35.068-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 Bucket created: timestamp=2023-06-20T15:56:35.068-0400
2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Spaces put bucket ACL: {
  ACL: "private",
  Bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3"
}: timestamp=2023-06-20T15:56:35.069-0400
2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.069-0400
2023-06-20T15:56:35.185-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.185-0400
2023-06-20T15:56:35.293-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Waiting for state to become: [success]: timestamp=2023-06-20T15:56:35.293-0400
2023-06-20T15:56:35.669-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [TRACE] Waiting 500ms before next try: timestamp=2023-06-20T15:56:35.669-0400
2023-06-20T15:56:36.277-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:36 [TRACE] Waiting 1s before next try: timestamp=2023-06-20T15:56:36.277-0400
2023-06-20T15:56:37.389-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:37 [TRACE] Waiting 2s before next try: timestamp=2023-06-20T15:56:37.389-0400
2023-06-20T15:56:39.500-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:39 [TRACE] Waiting 4s before next try: timestamp=2023-06-20T15:56:39.500-0400
digitalocean_spaces_bucket.test_space: Still creating... [10s elapsed]
2023-06-20T15:56:43.617-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:43 [TRACE] Waiting 8s before next try: timestamp=2023-06-20T15:56:43.617-0400
2023-06-20T15:56:51.830-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:51 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:56:51.830-0400
digitalocean_spaces_bucket.test_space: Still creating... [20s elapsed]
2023-06-20T15:57:01.953-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:01 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:01.953-0400
digitalocean_spaces_bucket.test_space: Still creating... [30s elapsed]
2023-06-20T15:57:12.074-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:12 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:12.074-0400
digitalocean_spaces_bucket.test_space: Still creating... [40s elapsed]
2023-06-20T15:57:22.394-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:22 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:22.394-0400
digitalocean_spaces_bucket.test_space: Still creating... [50s elapsed]
2023-06-20T15:57:32.509-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:32 [TRACE] Waiting 10s before next try: timestamp=2023-06-20T15:57:32.509-0400
digitalocean_spaces_bucket.test_space: Still creating... [1m0s elapsed]
2023-06-20T15:57:35.295-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:35 [WARN] WaitForState timeout after 1m0s: timestamp=2023-06-20T15:57:35.293-0400
2023-06-20T15:57:35.295-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:57:35 [WARN] WaitForState starting 30s refresh grace period: timestamp=2023-06-20T15:57:35.293-0400
2023-06-20T15:57:35.295-0400 [ERROR] provider.terraform-provider-digitalocean: Response contains error diagnostic: diagnostic_summary="NoSuchBucket: 
        status code: 404, request id: , host id: " tf_provider_addr=provider tf_rpc=ApplyResourceChange @module=sdk.proto diagnostic_severity=ERROR diagnostic_detail= tf_proto_version=5.3 tf_req_id=27a8b2e6-26c2-4685-8108-15dc9a7b82f4 tf_resource_type=digitalocean_spaces_bucket @caller=/home/asb/projects/go/src/github.com/digitalocean/terraform-provider-digitalocean/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 timestamp=2023-06-20T15:57:35.294-0400
2023-06-20T15:57:35.303-0400 [DEBUG] State storage *statemgr.Filesystem declined to persist a state snapshot
2023-06-20T15:57:35.303-0400 [ERROR] vertex "digitalocean_spaces_bucket.test_space" error: NoSuchBucket: 
        status code: 404, request id: , host id:
╷
│ Error: NoSuchBucket: 
│       status code: 404, request id: , host id: 
│ 
│   with digitalocean_spaces_bucket.test_space,
│   on main.tf line 28, in resource "digitalocean_spaces_bucket" "test_space":
│   28: resource "digitalocean_spaces_bucket" "test_space" {
│ 
╵

The polling for the new Spaces bucket is successful. It is the step that is applying the ACLs that fails:

2023-06-20T15:56:35.069-0400 [INFO]  provider.terraform-provider-digitalocean: 2023/06/20 15:56:35 [DEBUG] Spaces put bucket ACL: {
  ACL: "private",
  Bucket: "test-06c3b488-0fa4-11ee-a2a7-43d3c6fea7e3"
}:

With this change, it eventually succeeded after 3m30s. I don't understand why Spaces is returning a successful response to HeadBucket and then a NoSuchBucket for PutBucketAcl, but increasing the time out here seems to work.

Yes, the Spaces response is a head scratcher....but that's great you were able to reproduce it and confirm the change! Thanks Andrew!

@danaelhe danaelhe merged commit 073b8d5 into main Jun 20, 2023
2 checks passed
@danaelhe danaelhe deleted the spaces_retry branch June 20, 2023 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants