Terraform 0.14.6, bad provider configuration resolution #27785

Closed
Poil opened this issue Feb 16, 2021 · 4 comments
Labels
bug, v0.14 (issues, primarily bugs, reported against v0.14 releases), waiting for reproduction (unable to reproduce issue without further information)

Comments

@Poil

Poil commented Feb 16, 2021

Hi,

With Terraform 0.14.6, I have a problem when using the mysql provider in a module: it works on the first run, but on the second run I get this error during the plan stage: "Could not connect to server: dial tcp 127.0.0.1:3306: connect: connection refused".
With Terraform 0.13.6, I have no problem: the same code works.

Best regards

Terraform Version

Terraform v0.14.6
+ provider registry.terraform.io/hashicorp/azurerm v2.47.0
+ provider registry.terraform.io/hashicorp/random v3.0.1
+ provider registry.terraform.io/terraform-providers/mysql v1.9.0

Terraform Configuration Files

In my stack

module "mysql" {
  source = "/home/xxxx/git/projects/cloud/azure/terraform/modules/db-maria"
(...)

In my module

provider "mysql" {
  alias = "create-users"

  endpoint = format("%s:3306", azurerm_mariadb_server.mariadb_server.fqdn)
  username = local.administrator_login
  password = local.administrator_password

  tls = var.force_ssl
}

resource "random_password" "db_passwords" {
  count = var.create_databases_users ? length(var.databases_names) : 0

  special = false
  length  = 32
}

resource "mysql_user" "users" {
  count = var.create_databases_users ? length(var.databases_names) : 0

  provider = mysql.create-users

  user               = (var.enable_user_suffix ? format("%s_user", var.databases_names[count.index]) : var.databases_names[count.index])
  plaintext_password = random_password.db_passwords[count.index].result
  host               = "%"

  depends_on = [azurerm_mariadb_database.mariadb_db, azurerm_mariadb_firewall_rule.mariadb_fw_rule]
}

resource "mysql_grant" "roles" {
  count = var.create_databases_users ? length(var.databases_names) : 0

  provider = mysql.create-users

  user       = (var.enable_user_suffix ? format("%s_user", var.databases_names[count.index]) : var.databases_names[count.index])
  host       = "%"
  database   = var.databases_names[count.index]
  privileges = ["ALL"]

  depends_on = [mysql_user.users]
}

Steps to Reproduce

  1. terraform init
  2. terraform apply
  3. terraform apply

@Poil added the bug and new (new issue not yet triaged) labels on Feb 16, 2021
@Poil
Author

Poil commented Feb 16, 2021

I also hit the same bug if I use a submodule of my mysql module to create the users.

In my mysql module

provider "mysql" {
  endpoint = format("%s:3306", azurerm_mariadb_server.mariadb_server.fqdn)
  username = local.administrator_login
  password = local.administrator_password

  tls = var.force_ssl
}

module "users" {
  source = "./modules/db-users"

  for_each = toset(var.databases_names)

  enable_user_suffix = false
  user               = each.key
  database           = each.key
}

@apparentlymart
Contributor

Hi @Poil! Thanks for reporting this.

Unfortunately the configuration you wrote here is an example of something that Terraform has never reliably supported and should arguably be outright banned, but we've ended up leaving it in a "best effort" state out of a sense of pragmatism given that, as you've seen, it does sometimes work. However, that "sometimes" is very important: whether it works or not depends a lot on what else is going on in the same plan, and so that's why the Provider Configuration documentation simplifies this to say that this is not allowed at all:

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration).

azurerm_mariadb_server.mariadb_server.fqdn is an example of an attribute that can't be known until azurerm_mariadb_server.mariadb_server has been created for the first time, and so is subject to this limitation.
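
One way to stay within that documented limitation, sketched here under assumptions rather than taken from this thread, is to move the mysql resources into a separate configuration that is applied only after the server exists and that receives the endpoint as an input variable. The variable names below are hypothetical, as is the idea of a second configuration; only the provider arguments (endpoint, username, password, tls) and the provider source address come from the report above.

```hcl
# Hypothetical second root configuration, applied after the configuration
# that creates azurerm_mariadb_server.mariadb_server.
# All variable names are illustrative assumptions.

terraform {
  required_providers {
    mysql = {
      # Source address as shown in the "Terraform Version" output above.
      source = "terraform-providers/mysql"
    }
  }
}

variable "mysql_endpoint" {
  type        = string
  description = "host:port of the already-created server"
}

variable "administrator_login" {
  type = string
}

variable "administrator_password" {
  type      = string
  sensitive = true
}

variable "force_ssl" {
  # Assumed to be a string; the original module's type is not shown.
  type = string
}

provider "mysql" {
  # Every argument is an input variable, so each value is known
  # before the configuration is planned or applied.
  endpoint = var.mysql_endpoint
  username = var.administrator_login
  password = var.administrator_password
  tls      = var.force_ssl
}
```

With this layout the provider never depends on an attribute that can become unknown during the same plan, at the cost of running two separate applies.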

I expect that you saw this working originally due to the intersection of the following two facts:

  • The authors of the mysql provider have, again out of a sense of pragmatism, attempted to design around the requirement that a provider must be configured to do planning by making the provider defer trying to connect to the given endpoint until as late as possible, and in particular avoid doing anything relying on the MySQL server during the plan step.
  • When you initially apply a configuration like this, there aren't yet any existing resource instances belonging to that provider configuration and so Terraform doesn't make any calls to refresh their state prior to creating the plan. Refreshing is an operation that requires connecting to the MySQL server, so planning a configuration like you showed can only work if there aren't any resource instances to refresh.

The problem here is that these assumptions will not hold if anything you do later on causes azurerm_mariadb_server.mariadb_server.fqdn to become unknown again but the mysql provider configuration still has resource instances associated with it. That then gets Terraform into a bind, because it asks the mysql provider to refresh those prior to planning, which causes the mysql provider to attempt to connect to the server, but then the provider mistakenly tries to connect to 127.0.0.1:3306 because it selects that as a default value of endpoint to use when the chosen endpoint isn't available.

Resolving this will require introducing a new concept into Terraform to handle the situation where particular resource instances can't be planned at all on a particular run, and must be deferred until some other objects have been created first. One such model is described in the long-standing design proposal #4149, but since it's a significant change to Terraform's execution model, and therefore riddled with risk and unknowns, we've not been able to make any progress on it so far and would need to take the time for more detailed research.


In the meantime, I think the best we can do here is to try to determine why upgrading Terraform caused azurerm_mariadb_server.mariadb_server.fqdn to become unknown. I don't believe we intentionally changed anything that should make that true, and so there may be a more specific bug to be fixed related to that if we can figure out how to reproduce it.

One thing we could try to get more information is a targeted plan that excludes the resource instances belonging to the mysql provider configuration. That should hopefully produce a successful plan for azurerm_mariadb_server.mariadb_server, which would give a clue as to why it's being updated or replaced:

terraform plan -target=azurerm_mariadb_server.mariadb_server

My hope is that running the above will cause Terraform to successfully produce a partial plan rather than to fail with the error you saw. If so, it would be helpful if you could share that plan output here so we can see what the provider is proposing to change and hopefully also understand why it's doing that.

I'll note also that if you are able to successfully produce a targeted plan then it's likely that you could then apply that plan to escape the error situation you've encountered: the endpoint will then be known again and so a subsequent terraform plan should succeed. As #4149 notes, selective -target is often an effective way to force the sort of deferral behaviors that the issue proposes to do automatically. However, if you do apply the plan then it's unlikely we'll be able to do any further debugging, so this would be a tradeoff between getting the problem fixed quickly for you now vs. having a fuller explanation of the problem that might unearth a bug; ultimately we'll have to leave that tradeoff up to you, because we wouldn't want to ask you to stay in a blocked situation where you can't do your job just to get some more info.

@Poil
Author

Poil commented Feb 17, 2021

Hi,

Thanks for your detailed answer.

I downgraded my stack to 0.13.6; now, if I switch back to 0.14.6, the plan works (with and without targeting).
I will try tomorrow to duplicate my stack with only the AzureRM-MariaDB and MySQL resources.

Best regards,

@jbardin added the waiting for reproduction (unable to reproduce issue without further information) label and removed the new (new issue not yet triaged) label on Mar 10, 2021
@apparentlymart added the v0.14 (issues, primarily bugs, reported against v0.14 releases) label on Apr 9, 2021
@apparentlymart
Contributor

Returning to this old issue a long time later, I realize that I later opened #30937 which covers the same problem this one is describing. Since there's more context over there about what the problem is and how the Terraform team is planning to solve it, I'm going to close this issue to consolidate the discussion there.

@apparentlymart closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 16, 2024