Terraform Test hanging during execution #35380

Closed
novekm opened this issue Jun 25, 2024 · 8 comments
Labels
bug new new issue not yet triaged waiting-response An issue/pull request is waiting for a response from the community

Comments

@novekm

novekm commented Jun 25, 2024

Terraform Version

Terraform v1.8.2
on darwin_arm64

Terraform Configuration Files

# Create a unit and e2e test that creates the minimum required resources and validates that the names of the users and groups match the expected values

# HINT: Make sure to run `terraform init` in this directory before running `terraform test`. Also, ensure you use constant values (e.g. string, number, bool, etc.) within your tests where at all possible or you may encounter errors.

variables {
  sso_groups = {
    TestGroup1 : {
      group_name        = "TestGroup1"
      group_description = "TestGroup1 IAM Identity Center Group"
    },
    TestGroup2 : {
      group_name        = "TestGroup2"
      group_description = "Test IAM Identity Center Group"
    },
  }

  // Create desired USERS in IAM Identity Center
  sso_users = {
    TestUser1 : {
      group_membership = ["TestGroup1", "TestGroup2", ]
      user_name        = "TestUser1"
      given_name       = "Test"
      family_name      = "User1"
      email            = "[email protected]"
    },
    TestUser2 : {
      group_membership = ["TestGroup2", ]
      user_name        = "TestUser2"
      given_name       = "Test"
      family_name      = "User2"
      email            = "[email protected]"
    },
  }

  // Create permissions sets backed by AWS managed policies
  permission_sets = {
    AdministratorAccess = {
      description          = "Provides AWS full access permissions.",
      session_duration     = "PT4H", // how long until session expires - this means 4 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/AdministratorAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
    ViewOnlyAccess = {
      description          = "Provides AWS view only permissions.",
      session_duration     = "PT3H", // how long until session expires - this means 3 hours. max is 12 hours
      aws_managed_policies = ["arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
  }

  // Assign users/groups access to accounts with the specified permissions
  account_assignments = {
    TestGroup1 : {
      principal_name  = "TestGroup1"                              // name of the user or group you wish to have access to the account(s)
      principal_type  = "GROUP"                                   // entity type (user or group) you wish to have access to the account(s). Valid values are "USER" or "GROUP"
      permission_sets = ["AdministratorAccess", "ViewOnlyAccess"] // permissions the user/group will have in the account(s)
      account_ids = [                                             // account(s) the group will have access to. Permissions they will have in account are above line
        "286510435300",
        # local.account2_account_id,
        # local.account3_account_id, // these are defined in a locals.tf file, example is in this directory
        # local.account4_account_id,
      ]
    },
    TestGroup2 : {
      principal_name  = "TestGroup2"
      principal_type  = "GROUP"
      permission_sets = ["ViewOnlyAccess"]
      account_ids = [
        "286510435300",
        # local.account2_account_id,
        # local.account3_account_id,
        # local.account4_account_id,
      ]
    },
  }
}

run "unit_tests" {
  command = plan

  # Check that the group_name for the "TestGroup1" group starts with "TestGroup1"
  assert {
    condition     = startswith(aws_identitystore_group.sso_groups["TestGroup1"].display_name, "TestGroup1")
    error_message = "The Identity Store Group name (${aws_identitystore_group.sso_groups["TestGroup1"].display_name}) didn't match the expected value."
  }

  # Check that the user_name for the "TestUser1" user starts with "TestUser1"
  assert {
    condition     = startswith(aws_identitystore_user.sso_users["TestUser1"].user_name, "TestUser1")
    error_message = "The Identity Store user name (${aws_identitystore_user.sso_users["TestUser1"].user_name}) didn't match the expected value."
  }
}

run "e2e_tests" {
  command = apply

  # Check that the group_name for the "TestGroup1" group starts with "TestGroup1"
  assert {
    condition     = startswith(aws_identitystore_group.sso_groups["TestGroup1"].display_name, "TestGroup1")
    error_message = "The Identity Store Group name (${aws_identitystore_group.sso_groups["TestGroup1"].display_name}) didn't match the expected value."
  }

  # Check that the user_name for the "TestUser1" user starts with "TestUser1"
  assert {
    condition     = startswith(aws_identitystore_user.sso_users["TestUser1"].user_name, "TestUser1")
    error_message = "The Identity Store user name (${aws_identitystore_user.sso_users["TestUser1"].user_name}) didn't match the expected value."
  }
}
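
The header comment says the names should match the expected values, while the assertions above only check a prefix with startswith. A minimal sketch of a stricter variant, assuming an exact match is what is wanted (the run block name unit_tests_exact_match is hypothetical; the resource address is the one used above):

```hcl
# Hypothetical stricter variant of the "unit_tests" run block.
# Assumes the module sets display_name directly from group_name.
run "unit_tests_exact_match" {
  command = plan

  # Fails unless display_name is exactly "TestGroup1", not merely prefixed by it.
  assert {
    condition     = aws_identitystore_group.sso_groups["TestGroup1"].display_name == "TestGroup1"
    error_message = "Expected group display_name \"TestGroup1\", got \"${aws_identitystore_group.sso_groups["TestGroup1"].display_name}\"."
  }
}
```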

Debug Output

(condensed)
2024-06-24T19:57:20.508-0400 [TRACE] provider.terraform-provider-aws_v5.52.0_x5: Served request: tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=ApplyResourceChange @module=sdk.proto tf_proto_version=5.6 tf_req_id=d95bab45-e5b2-67ca-fb3d-aa7c10d5382c tf_resource_type=aws_ssoadmin_permission_set @caller=github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov5/tf5server/server.go:878 timestamp=2024-06-24T19:57:20.508-0400
2024-06-24T19:57:20.508-0400 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/aws" is in the global cache
2024-06-24T19:57:20.508-0400 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for aws_ssoadmin_permission_set.pset["AdministratorAccess"]
2024-06-24T19:57:20.508-0400 [TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: removing state object for aws_ssoadmin_permission_set.pset["AdministratorAccess"]
2024-06-24T19:57:20.508-0400 [TRACE] vertex "aws_ssoadmin_permission_set.pset[\"AdministratorAccess\"] (destroy)": visit complete
2024-06-24T19:57:20.508-0400 [TRACE] vertex "data.aws_ssoadmin_instances.sso_instance (destroy)": starting visit (*terraform.NodeDestroyResourceInstance)
2024-06-24T19:57:20.508-0400 [TRACE] vertex "data.aws_ssoadmin_instances.sso_instance (destroy)": belongs to
2024-06-24T19:57:20.508-0400 [TRACE] NodeDestroyResourceInstance: removing state object for data.aws_ssoadmin_instances.sso_instance
2024-06-24T19:57:20.508-0400 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": starting visit (*terraform.graphNodeCloseProvider)
2024-06-24T19:57:20.508-0400 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": does not belong to any module instance
2024-06-24T19:57:20.508-0400 [TRACE] GRPCProvider: Close
2024-06-24T19:57:20.508-0400 [TRACE] vertex "data.aws_ssoadmin_instances.sso_instance (destroy)": visit complete
2024-06-24T19:57:20.509-0400 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-24T19:57:20.515-0400 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.52.0/darwin_arm64/terraform-provider-aws_v5.52.0_x5 pid=92706
2024-06-24T19:57:20.515-0400 [DEBUG] provider: plugin exited
2024-06-24T19:57:20.515-0400 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": visit complete
2024-06-24T19:57:20.515-0400 [TRACE] vertex "root": starting visit (*terraform.nodeCloseModule)
2024-06-24T19:57:20.515-0400 [TRACE] vertex "root": does not belong to any module instance
2024-06-24T19:57:20.515-0400 [TRACE] vertex "root": visit complete
2024-06-24T19:57:20.516-0400 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/aws" is in the global cache
2024-06-24T19:57:20.516-0400 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/awscc" is in the global cache
2024-06-24T19:57:20.516-0400 [DEBUG] TestFileRunner: completed apply for tests/create_users_and_groups_unit_and_e2e_tests_p2.tftest.hcl/e2e_tests
tests/create_users_and_groups_unit_and_e2e_tests_p2.tftest.hcl... fail

Failure! 2 passed, 2 failed.
2024-06-24T19:57:20.517-0400 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-24T19:57:20.517-0400 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-24T19:57:20.521-0400 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.52.0/darwin_arm64/terraform-provider-aws_v5.52.0_x5 pid=92600
2024-06-24T19:57:20.521-0400 [DEBUG] provider: plugin exited
2024-06-24T19:57:20.522-0400 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.52.0/darwin_arm64/terraform-provider-aws_v5.52.0_x5 pid=92663
2024-06-24T19:57:20.522-0400 [DEBUG] provider: plugin exited

Expected Behavior

Successful execution of the tests

Actual Behavior

terraform test is hanging during execution

❯ tft
tests/create_users_and_groups_unit_and_e2e_tests_p1.tftest.hcl... in progress
  run "unit_tests"... pass

I'm not sure if this is potentially due to cancelling a test mid-run. How does terraform test handle state management when a user has to cancel a test? I'm assuming it works the same as cancelling an apply, which would leave the resources in the AWS account.

Steps to Reproduce

  1. terraform test

Additional Context

For additional context, this test and the test that I cancelled (because it stalled) are being run in the same AWS account. How does terraform test handle state management when a user has to cancel a test?

Also here is the directory structure:

├── data.tf
├── examples
│   ├── create-users-and-groups
│   └── existing-users-and-groups
├── locals.tf
├── main.tf
├── outputs.tf
├── providers.tf
├── tests
│   ├── create_users_and_groups_unit_and_e2e_tests_p1.tftest.hcl
│   └── create_users_and_groups_unit_and_e2e_tests_p2.tftest.hcl
└── variables.tf

5 directories, 8 files

References

No response

@novekm novekm added bug new new issue not yet triaged labels Jun 25, 2024
@novekm
Author

novekm commented Jun 25, 2024

@liamcervante I think you helped me with some Terraform test stuff in the past - do you have any ideas why this might be happening?

@novekm
Author

novekm commented Jun 25, 2024

Update: I tried to test in a different project that was also having the same issue. The tests point to examples that are in an examples directory. This is the configuration of the test files:

01_create_users_and_groups.tftest.hcl

# run "unit_test" {
#   command = plan
#   module {
#     source = "./examples/create-users-and-groups"
#   }
# }

# run "e2e_test" {
#   command = apply
#   module {
#     source = "./examples/create-users-and-groups"
#   }
# }

02_existing_users_and_groups.tftest.hcl

# run "unit_test" {
#   command = plan
#   module {
#     source = "./examples/existing-users-and-groups"
#   }
# }

# run "e2e_test" {
#   command = apply
#   module {
#     source = "./examples/existing-users-and-groups"
#   }
# }

03_inline_policy.tftest.hcl

# run "unit_test" {
#   command = plan
#   module {
#     source = "./examples/inline-policy"
#   }
# }

# run "e2e_test" {
#   command = apply
#   module {
#     source = "./examples/inline-policy"
#   }
# }

04_google_workspace.tftest.hcl

run "unit_test" {
  command = plan
  module {
    source = "./examples/google-workspace"
  }
}

run "e2e_test" {
  command = apply
  module {
    source = "./examples/google-workspace"
  }
}

I tried the following steps to resolve:

  1. Log into the target AWS account and manually destroy the resources that were created before cancelling the test. I figured these were likely causing conflicts during subsequent runs of terraform test
  2. Assume an IAM role for a different AWS account and re-run terraform test. This got past the hang and failed as I expected, because in the test I'm trying to use a data source for a resource that doesn't exist in the other AWS account
  3. Finally, assume IAM role for original AWS account and re-run terraform test

These steps resolved the hang; however, it now looks like terraform test is still trying to run the tests that are commented out. As shown in the snippets above, the first three test files are entirely commented out, leaving only the last test. I assumed this would run only the last test (I did this while troubleshooting); however, the following is displayed in the terminal when running terraform test:

❯ tft
tests/01_create_users_and_groups.tftest.hcl... in progress
tests/01_create_users_and_groups.tftest.hcl... tearing down
tests/01_create_users_and_groups.tftest.hcl... pass
tests/02_existing_users_and_groups.tftest.hcl... in progress
tests/02_existing_users_and_groups.tftest.hcl... tearing down
tests/02_existing_users_and_groups.tftest.hcl... pass
tests/03_inline_policy.tftest.hcl... in progress
tests/03_inline_policy.tftest.hcl... tearing down
tests/03_inline_policy.tftest.hcl... pass
tests/04_google_workspace.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/04_google_workspace.tftest.hcl... tearing down
tests/04_google_workspace.tftest.hcl... pass

Success! 2 passed, 0 failed.

Is this expected? I assumed that tests with commented out/blank file contents would just be skipped. As nothing should be created in that case, the tearing down message is a bit confusing

@novekm
Author

novekm commented Jun 25, 2024

Update: after removing the comments from the above tests, it's hanging again (it has been 20+ minutes since I ran terraform test again):

❯ tft
tests/01_create_users_and_groups.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/01_create_users_and_groups.tftest.hcl... tearing down
tests/01_create_users_and_groups.tftest.hcl... pass
tests/02_existing_users_and_groups.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/02_existing_users_and_groups.tftest.hcl... tearing down
tests/02_existing_users_and_groups.tftest.hcl... pass
tests/03_inline_policy.tftest.hcl... in progress
tests/03_inline_policy.tftest.hcl... tearing down
tests/03_inline_policy.tftest.hcl... pass
tests/04_google_workspace.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/04_google_workspace.tftest.hcl... tearing down
tests/04_google_workspace.tftest.hcl... pass

Success! 6 passed, 0 failed.
❯ tft
tests/01_create_users_and_groups.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/01_create_users_and_groups.tftest.hcl... tearing down
tests/01_create_users_and_groups.tftest.hcl... pass
tests/02_existing_users_and_groups.tftest.hcl... in progress
  run "unit_test"... pass
  run "e2e_test"... pass
tests/02_existing_users_and_groups.tftest.hcl... tearing down
tests/02_existing_users_and_groups.tftest.hcl... pass
tests/03_inline_policy.tftest.hcl... in progress
  run "unit_test"... pass

It's stuck on the tests/03_inline_policy.tftest.hcl test. Initially it was hanging on the first test (tests/01_create_users_and_groups.tftest.hcl). Any ideas what could cause tests to hang? It would be helpful if it showed something like elapsed time, to more accurately determine how long tests have been running.

@liamcervante
Member

liamcervante commented Jun 25, 2024

Hi @novekm, are you able to share the Terraform configuration files that are being tested? I'd have a better idea of what might be happening then.

When cancelling a test, Terraform does attempt to clean up any resources that it created, but if you request a hard cancel (that is, pressing Ctrl-C twice) then it'll quit without waiting for confirmation that everything was deleted. In that case it should print a list of the resources that it couldn't confirm were actually deleted.

Terraform Test isn't particularly clever in the way it executes - you can replicate the behaviour of the test command simply by executing terraform apply yourself. You could try executing the troublesome configuration directly, using the arguments you've provided in the test file. If running terraform apply manually works, then we can say for sure that terraform test is doing something to cause a problem, otherwise running terraform apply might reveal something more about what is happening.
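
As a rough sketch of that manual reproduction for the first test file in this thread: the values from its variables block could be copied into a variable definitions file and passed to a plain apply. The file name repro.tfvars is hypothetical, and this assumes the root module declares sso_groups as an input variable (the test file sets it, so it presumably does). The other variables from the test file would be copied the same way.

```hcl
# repro.tfvars (hypothetical file name)
# Mirrors the sso_groups value from the variables block of the test file,
# so that a manual apply sees the same input the "e2e_tests" run block uses.
sso_groups = {
  TestGroup1 = {
    group_name        = "TestGroup1"
    group_description = "TestGroup1 IAM Identity Center Group"
  }
  TestGroup2 = {
    group_name        = "TestGroup2"
    group_description = "Test IAM Identity Center Group"
  }
}
```

Running terraform apply -var-file=repro.tfvars from the module root would then exercise the same configuration outside of terraform test, and terraform destroy -var-file=repro.tfvars would clean up afterwards.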

@liamcervante liamcervante added the waiting-response An issue/pull request is waiting for a response from the community label Jun 25, 2024
@novekm
Author

novekm commented Jun 25, 2024

Hi Liam, here are all the configuration files that are being tested: https://github.com/aws-ia/terraform-aws-iam-identity-center/tree/main/examples and here's the inline policy configuration:

data "aws_organizations_organization" "org" {}

# Create Inline Policy
# IMPORTANT - This policy has an explicit deny. This is used as an example only.
# Ensure you understand the impact of this policy before deploying.
data "aws_iam_policy_document" "restrictAccessInlinePolicy" {
  statement {
    sid = "Restrict"
    actions = [
      "*",
    ]
    effect = "Deny"
    resources = [
      "*",
    ]
    condition {
      test     = "NotIpAddress"
      variable = "aws:SourceIp"
      values = [
        // replace with your own IP address
        "0.0.0.0/0",
      ]
    }
    condition {
      test     = "Bool"
      variable = "aws:ViaAWSService"
      values = [
        "false"
      ]
    }
    condition {
      test     = "StringNotLike"
      variable = "aws:userAgent"
      values = [
        "*exec-env/CloudShell*"
      ]
    }
  }
}

# locals {
#   active_accounts = [for a in data.aws_organizations_organization.org.accounts : a if a.status == "ACTIVE"]
#   tags = {
#     "Owner" = "SRE Team"
#   }
# }


module "aws-iam-identity-center" {
  source = "../.." // local example
  # source = "aws-ia/iam-identity-center/aws" // remote example

  existing_sso_groups = {
    AWSControlTowerAdmins : {
      group_name = "AWSControlTowerAdmins"
    }
  }

  sso_groups = {
    Admin : {
      group_name        = "Admin"
      group_description = "Admin Group"
    },
    Dev : {
      group_name        = "Dev"
      group_description = "Dev Group"
    },
  }
  sso_users = {
    nuzumaki : {
      group_membership = ["Admin", "Dev", "AWSControlTowerAdmins"]
      user_name        = "nuzumaki"
      given_name       = "Naruto"
      family_name      = "Uzumaki"
      email            = "[email protected]"
    },
    suchiha : {
      group_membership = ["Dev", "AWSControlTowerAdmins"]
      user_name        = "suchiha"
      given_name       = "Sasuke"
      family_name      = "Uchiha"
      email            = "[email protected]"
    },
  }

  existing_permission_sets = {
    AWSAdministratorAccess : {
      permission_set_name = "AWSAdministratorAccess"
    },
  }

  permission_sets = {
    AdministratorAccess = {
      description          = "Provides full access to AWS services and resources",
      session_duration     = "PT3H",
      aws_managed_policies = ["arn:aws:iam::aws:policy/AdministratorAccess"]
      inline_policy        = data.aws_iam_policy_document.restrictAccessInlinePolicy.json
      tags                 = { ManagedBy = "Terraform" }
    },
    PowerUserAccess = {
      description          = "Provides full access to AWS services and resources, but does not allow management of Users and groups",
      session_duration     = "PT3H",
      aws_managed_policies = ["arn:aws:iam::aws:policy/PowerUserAccess"]
      tags                 = { ManagedBy = "Terraform" }
    },
    ViewOnlyAccess = {
      description          = "This policy grants permissions to view resources and basic metadata across all AWS services",
      session_duration     = "PT3H",
      aws_managed_policies = ["arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]
      managed_policy_arn   = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"

      permissions_boundary = {
        managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
      }
      tags = { ManagedBy = "Terraform" }
    },
    ReadOnlyAccess = {
      description          = "This policy grants permissions to view resources and basic metadata across all AWS services",
      session_duration     = "PT3H",
      aws_managed_policies = ["arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"]

      managed_policy_arn = "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess"
      tags               = { ManagedBy = "Terraform" }
    },
  }
  account_assignments = {
    Admin : {
      principal_name = "Admin"
      principal_type = "GROUP"
      principal_idp  = "INTERNAL"
      permission_sets = [
        "AdministratorAccess",
        "PowerUserAccess",
        "ViewOnlyAccess",
        // existing permission set
        "AWSAdministratorAccess",
      ]
      account_ids = [
        // replace with your own account id
        local.account1_account_id,
        # local.account2_account_id
        # local.account3_account_id
        # local.account4_account_id
      ]
    },
    Dev : {
      principal_name = "Dev"
      principal_type = "GROUP"
      principal_idp  = "INTERNAL"
      permission_sets = [
        "PowerUserAccess",
        "ViewOnlyAccess",
      ]
      account_ids = [
        // replace with your own account id
        local.account1_account_id,
        # local.account2_account_id
        # local.account3_account_id
        # local.account4_account_id
      ]
    },
  }
}

I'll rerun terraform apply and post the outcome.

@novekm
Author

novekm commented Jun 25, 2024

Hi @liamcervante, taking another look at this now. I ran terraform apply against the inline policy test and it was hanging there as well. It looks like when the test was cancelled, there were still a few duplicate resources present in the account that needed to be destroyed. I have destroyed the resources and now the test appears to complete successfully.

Related to this, is there a CLI command to output a list of the resources that must be manually destroyed if you have to cancel a terraform test? Also, is there a way to run the test with elapsed time, as is standard when running terraform apply?

@liamcervante
Member

is there a CLI command to output a list of the resources that must be manually destroyed if you have to cancel a terraform test?

Thanks for following up! Unfortunately, the destroy failure story isn't the best at the moment, so the only way to really discover the leftover resources is to look at the output of the test command that was cancelled. This is something that is in active discovery as we're looking at a better way to handle this. You can follow this issue for updates on this. In addition, you can get in touch with our Product Manager directly if you have any ideas about how you'd want this to work.

Also, is there a way to run the test with elapsed time, as is standard when running terraform apply?

There isn't a way to do this currently, but I think this should be a fairly straightforward request. Could you file this as a separate feature request in this repository? That way we can include it in our planning and prioritisation.

I think I can close this issue given you've resolved the problematic behaviour and the desired behaviour is captured in another ticket. Let me know if there's anything that I've missed!

@liamcervante liamcervante closed this as not planned Jun 26, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 27, 2024