terraform test: Token cache is not renewed in AzureRM during tearing down #35361

Closed
joselcaguilar opened this issue Jun 19, 2024 · 6 comments
Labels: bug, new (new issue not yet triaged), waiting-response (An issue/pull request is waiting for a response from the community)

Comments

@joselcaguilar

joselcaguilar commented Jun 19, 2024

Terraform Version

Terraform: 1.8.5

Terraform Configuration Files

...terraform config...
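
For context, a minimal sketch of the kind of configuration involved (not the reporter's actual files; names, SKUs and address ranges are illustrative and roughly follow the azurerm_virtual_network_gateway registry example):

# Illustrative only - a long-running resource similar to the one described in
# this issue. Provider authentication settings are omitted here.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "rg-vgw-test"
  location = "westeurope"
}

resource "azurerm_virtual_network" "example" {
  name                = "vnet-vgw-test"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet" # gateway subnets must use this exact name
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.255.0/27"]
}

resource "azurerm_public_ip" "example" {
  name                = "pip-vgw-test"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  allocation_method   = "Dynamic"
}

# Virtual network gateways typically take 20+ minutes to create and destroy,
# which is what pushes the teardown past the token's validity window.
resource "azurerm_virtual_network_gateway" "example" {
  name                = "vgw-test"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "VpnGw1"

  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.example.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }
}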

Debug Output

vgw_integration.tftest.hcl... in progress
  run "check_sku"... pass

vgw_integration.tftest.hcl... tearing down
Terraform encountered an error destroying resources created while executing
vgw_integration.tftest.hcl/check_sku.
╷
│ Error: building account: could not acquire access token to parse claims: clientCredentialsToken: received HTTP status 401 with response: {"error":"invalid_client","error_description":"AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-06-18T22:49:12.3407908Z, assertion valid from 2024-06-18T22:23:49.0000000Z, expiry time of assertion 2024-06-18T22:33:49.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: *** Correlation ID: *** Timestamp: 2024-06-18 22:49:12Z","error_codes":[700024],"timestamp":"2024-06-18 22:49:12Z","trace_id":"***","correlation_id":"***","error_uri":"https://login.microsoftonline.com/error?code=700024"}
│ 
│   with provider["registry.terraform.io/hashicorp/azurerm"],
│   on main.tf line 16, in provider "azurerm":
│   16: provider "azurerm" {
│ 

Expected Behavior

Renew the AAD token between the tests passing and the tearing down of the resources in terraform test.

Actual Behavior

The token is expired when we try to use terraform test with azurerm_virtual_network_gateway, but it will happen with any other resource which requires a long time to deploy/destroy. The terraform test command is executed in a CI/CD pipeline authenticated with AzCLI 2.61.0.

The test passes successfully, but the tearing down process fails due to AAD token expiration, because the token is not renewed when the tearing down process starts.

Steps to Reproduce

  1. Run terraform test -> The tearing down step doesn't work properly for resources which need a long time to deploy and destroy, so the test fails completely (see the sketch below for the shape of such a test).
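
As an illustration of the shape of such a test (not the reporter's actual vgw_integration.tftest.hcl; the run name matches the output above, but the assertion is assumed):

# Hypothetical vgw_integration.tftest.hcl - applies the configuration, checks
# the SKU, then Terraform tears everything down at the end of the file. The
# teardown is the phase that exceeds the token's validity window.
run "check_sku" {
  command = apply

  assert {
    condition     = azurerm_virtual_network_gateway.example.sku == "VpnGw1"
    error_message = "virtual network gateway was not created with the expected SKU"
  }
}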

Additional Context

It works as expected when terraform apply and terraform destroy are executed separately in a CI/CD pipeline.

References

No response

@joselcaguilar added the bug and new (new issue not yet triaged) labels Jun 19, 2024
@joselcaguilar changed the title from "terraform test: Token cache is not renewed in AzureRM when the resources are tearing down" to "terraform test: Token cache is not renewed in AzureRM during tearing down" Jun 19, 2024
@apparentlymart
Contributor

Hi @joselcaguilar! Thanks for opening this issue.

You mentioned that you have a working situation when you run apply and destroy separately in your automation. Is the token renewal something you implemented yourself in that case, or is it something being handled automatically for you?

If a remote system supports renewing a token then it's typically the provider plugin's responsibility to do that, since Terraform Core itself doesn't know anything about Azure and so doesn't know what its conventions are. With my question above I'm trying to understand whether token renewal is normally handled automatically by the azurerm provider, in which case we could investigate why that isn't working the same way when the azurerm provider is used in the test harness, or whether this renewal is something you are normally handling yourself outside of Terraform, in which case I'm not sure how we could adapt that model to the test harness, which is effectively running multiple plan/apply rounds in series using the same provider configuration each time.

@crw added the waiting-response (An issue/pull request is waiting for a response from the community) label Jun 19, 2024
@tombuildsstuff
Contributor

@apparentlymart

With my question above I'm trying to understand whether token renewal is normally handled automatically by the azurerm provider

FWIW the AzureRM Provider does automatically renew tokens as they expire; the only exception to that is when running in Azure DevOps, where the API only returns a token that's valid for 10m from when the job/stage has started - as such it'd be worth confirming whether @joselcaguilar is running this from within Azure DevOps or not, since I suspect that's the issue here.

When running terraform apply and terraform destroy separately, they'd (presumably) be separate stages, each of which gets a separate 10m token - which, if @joselcaguilar is running inside of Azure DevOps, is why this'd be working.

@joselcaguilar
Author

joselcaguilar commented Jun 20, 2024

Hey @apparentlymart @tombuildsstuff! Thanks for reaching out.

Yes, I'm using ADO Pipelines. To add context, Terraform is being executed using the AzureCLI@2 task, and the AzCLI version is 2.61.0 (latest).
If I execute the same TF config (i.e. the example in azurerm_virtual_network_gateway) with the following commands:

  • terraform test: Fails because the token is expired during the tearing down process. It fails after approx. 26 mins.
  • terraform apply and terraform destroy: In 2 different steps, they work as expected.

Furthermore, if I deploy other resources that need less time to be deployed/destroyed, such as Azure Key Vault using this sample config, from ADO too, both commands work successfully:

  • terraform test: Works as expected, completely.
  • terraform apply and terraform destroy: Works as expected.

Last but not least, I'm using OIDC auth with an SPN, so technically I have 1h before token expiration, which is much more than needed to apply and destroy the virtual_network_gateway.
So, my suggestion is to review terraform test or the AzureRM provider; I'm not fully sure which is responsible here when we want to use terraform test with resources that need a long time to apply and destroy, such as Virtual Network Gateway, SQL Managed Instance, etc.

The token is not being refreshed in terraform test between the deployment and tearing down processes.
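
For reference, a hedged sketch of the kind of OIDC-based provider configuration being described (illustrative values; the real setup may instead rely on ARM_* environment variables set by the pipeline task):

# Illustrative only - OIDC (workload identity federation) auth for azurerm.
# The ID token is typically injected by the pipeline (e.g. via ARM_OIDC_TOKEN),
# and it is that token's short validity that matters during the teardown phase.
provider "azurerm" {
  features {}

  use_oidc        = true
  client_id       = "00000000-0000-0000-0000-000000000000" # placeholder SPN client ID
  tenant_id       = "00000000-0000-0000-0000-000000000000" # placeholder tenant ID
  subscription_id = "00000000-0000-0000-0000-000000000000" # placeholder subscription ID
}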

@apparentlymart
Contributor

Thanks for all of that extra context.

The terraform test harness is doing effectively the same thing as running terraform apply a few times and then running terraform destroy. It does differ in some details, but I don't think those details are relevant to what we are discussing here.

One implication of that is that Terraform should already be reconfiguring the provider for each phase, because each phase gets its own provider plugin processes. The details are different when mocks are involved, but I'm assuming this test configuration is using the real provider or else this token question wouldn't arise.

When Terraform "configures" a provider, it sends the information taken from the provider block and the provider itself decides what to do with it, possibly also incorporating other ambient information like environment variables and configuration files. If that configuration includes enough information for the provider to automatically renew credentials then the provider can do that but Terraform Core is not aware of it and does not participate in it.

If the configuration includes only static credentials that the provider cannot renew then terraform test would provide those same credentials each time the provider gets configured, because it has no other information.

With all that said I find myself unsure about what exactly we could change here to improve the situation you described. If you configure the provider in a way that allows it to renew credentials then it should already work; if that isn't true then figuring out why would be the goal here. If you can't configure the Azure provider in a way that allows it to renew its credentials then that seems to be a show-stopper and I don't think there's anything we could change in Terraform Core to improve that situation, though I'm open to ideas if you think I'm missing something.

@manicminer
Member

Hi @joselcaguilar, thanks for providing the additional context. This is unfortunately a known limitation of Azure DevOps pipelines, specifically the Open ID Connect implementation it offers.

When using OIDC for authentication, the Azure Terraform providers receive an ID token which is provided by the vendor (in this case, ADO). We use that ID token to sign an assertion in order to acquire access tokens with which we can authorize requests to various Azure APIs. The ID token is our only means of authentication and so when the access token expires, we need to obtain a new one by sending a new assertion, again signed using the ID token.

However, ID tokens have a validity period (i.e. they have an exp claim), after which any signed assertion will not be accepted. Whilst most vendors either provide a long-lasting ID token, or they provide a mechanism with which to vend new tokens (e.g. GitHub), unfortunately Azure DevOps offers neither - the single ID token that is provided at the start of a pipeline task is valid for only 10 minutes.

Because of this, there is unfortunately no means for Terraform, or any Terraform provider (or any other software), to use OIDC to authenticate to any Azure API beyond this initial 10 minute window. Whilst we do cache access tokens for as long as possible, which typically confers a runtime of around 60 minutes in Azure Public, there is no workaround at this time. Our recommendation is not to use OIDC in Azure DevOps until this limitation is lifted, and instead use a service connection or some other statically provisioned credential (e.g. application/service principal with client certificate).
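
For illustration, a hedged sketch of the statically provisioned client certificate authentication mentioned above (paths and IDs are placeholders, not from this thread):

# Illustrative only - client certificate auth does not depend on a pipeline-issued
# ID token, so it is unaffected by the 10 minute ADO window. The password could
# alternatively be supplied via the ARM_CLIENT_CERTIFICATE_PASSWORD environment variable.
variable "certificate_password" {
  type      = string
  sensitive = true
}

provider "azurerm" {
  features {}

  client_id                   = "00000000-0000-0000-0000-000000000000" # placeholder
  tenant_id                   = "00000000-0000-0000-0000-000000000000" # placeholder
  subscription_id             = "00000000-0000-0000-0000-000000000000" # placeholder
  client_certificate_path     = "/path/to/certificate.pfx"             # placeholder path
  client_certificate_password = var.certificate_password
}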

I'm sorry that we don't have better news on this! Making OIDC usable in ADO pipelines will require changes by Microsoft. Should this ultimately be resolved by extending the validity of the ID token, the Azure providers will work without any other changes needed, and if ADO adopts a similar pattern to GitHub Actions, then it's likely the providers will also be able to use that without modification. For any other scenario, we will endeavor to support it and I expect we'll be able to do so quickly.

Thanks again for the detailed report. However, per the above, since this limitation lies with the ADO platform, I'm going to close this issue for the time being. You may be able to get further insight via your Azure Account Manager, or other support channel(s) you might have at your disposal.

@manicminer closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 20, 2024
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Jul 21, 2024