Add "wait" and "retry" deployment options #1013

rshariy · 2020-11-26T01:53:57Z

ARM template deployment often fails with errors like:

"Another operation is in progress on the selected item. If there is an in-progress operation, please retry after it has finished."

"BMSUserErrorObjectLocked","message":"Another operation is in progress on the selected item."

Just to clarify: this is not a dependency issue. An ARM deployment may fail if, for example, you try to add a VM to a Recovery Services vault (RSV) while another VM is being added at the same time: for a few seconds the RSV will not accept new clients, and as a result your deployment fails.

I would like an option to pause and/or retry a deployment, perhaps by introducing "wait" and "retry" deployment conditions, e.g.:

resource blob 'Microsoft.Storage/storageAccounts/blobServices/containers@2019-06-01' = {
    wait: 30
    retry: 5
    name: '${stg.name}/default/logs'
}


alex-frankel · 2020-11-30T21:12:45Z

Understood. This is something we have been considering, but haven't scheduled the work yet. If you (or others) have other examples that you have run into, it would be great to capture those here.

I know RBAC replication (and replication delays in general) are another place where something like this would be helpful.
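For the RBAC replication case specifically, one mitigation that already exists is to declare `principalType` on the role assignment. Microsoft's role-assignment guidance recommends this when assigning a role to a newly created service principal or managed identity, so that the assignment does not fail while the new principal is still replicating. A minimal sketch, assuming a user-assigned identity created in the same file; `identityName` is a hypothetical parameter and the built-in Reader role is only an example.

```bicep
param location string = resourceGroup().location

// Hypothetical parameter for illustration.
param identityName string

// Built-in Reader role definition ID (example only).
var readerRoleId = 'acdd72a7-3385-48ef-bd42-f606fba81ae7'

resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: identityName
  location: location
}

resource readerAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Role assignment names must be GUIDs; guid() keeps the name deterministic across runs.
  name: guid(resourceGroup().id, uami.id, readerRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', readerRoleId)
    principalId: uami.properties.principalId
    // Declaring the principal type up front is the documented mitigation for
    // PrincipalNotFound errors caused by directory replication delay.
    principalType: 'ServicePrincipal'
  }
}
```

This only covers the "new principal not replicated yet" flavor of the problem; it does nothing for RP-side concurrency errors like the RSV one in the original report.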

anthony-c-martin · 2020-12-02T17:34:37Z

I know RBAC replication (and replication delays in general) are another place where something like this would be helpful.

@alex-frankel I'm assuming this is something we're planning on also addressing in the underlying platform? This feels like a leaky abstraction, not something that the end-user should have to deal with by adding delays.

alex-frankel · 2020-12-02T17:37:09Z

This feels like a leaky abstraction, not something that the end-user should have to deal with by adding delays.

Agreed. @bmoore-msft and I were also discussing this yesterday. Ideally, ARM will co-locate all the calls end-to-end so a user never has to think about this. Not sure if/when that will be possible, and this may be a necessary evil in the meantime.

bmoore-msft · 2020-12-02T22:27:09Z

The OP's issue doesn't sound like replication (it feels like concurrency), though I can see that you could potentially address both with something like retry. The problem in this case (or really in either case) is indefinite postponement. This feels like a problem with the RP: common operations returning frequent 400s instead of, perhaps, 429s.

The challenge with this workaround is that not only does the user have to fail first and then implement a non-deterministic workaround (which is expensive for the service), but it will also mask problems across ARM, RPs, and user code.

@rshariy - have you raised this issue with the RSV team? It doesn't appear to be an uncommon problem, and it seems like something the RSV team should address: either it shouldn't happen, or we're not helping customers figure out how to use RSV effectively.

rshariy · 2020-12-02T22:59:27Z

@bmoore-msft I raised a similar issue with the Azure Firewall product team about a year ago; the only solution we found was to use a PowerShell function to check the Azure Firewall status (making sure it is not "Updating") before kicking off a new ARM deployment to the firewall.

I just logged ticket 120120226003381 about the RSV issue; let's see what MS support comes up with.

alex-frankel · 2020-12-03T00:29:56Z

it will mask problems across ARM, RPs and user code.

This point is what gives us caution about implementing something like this. We have some potential solutions for dealing with the replication delay in particular that we will explore before introducing a wait.

@rshariy - please let us know the resolution of the case.

Agazoth · 2021-03-31T05:29:27Z

I have a main template that looks like this:

module kv 'keyvault.bicep' = {
  name: 'kvSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    enableSoftDelete: false
  }
}

module kvaccpol 'keyvaultaccesspolicy.bicep' = {
  name: 'kvAccPolSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    action: 'add'
    objectId: objectId
    access: keyVaultAccessPolicyAccess
  }
}

When that runs, the deployment breaks with:

{
   "error": {
     "code": "ParentResourceNotFound",
     "message": "Can not perform requested operation on nested resource. Parent resource 'kv-kvaccpoltest' not found."
   }
} (Code:NotFound)

Running the deployment again deploys the policy successfully.
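A likely cause worth ruling out: Bicep only infers a dependency when one declaration references another's symbol (for example, an output of `kv`). Here both modules receive the same `keyVaultName` value, but `kvaccpol` never references `kv`, so ARM is free to start both modules in parallel and the access policy can arrive before the vault exists. A sketch of the main template with an explicit dependency, assuming `keyvault.bicep`, `keyvaultaccesspolicy.bicep`, and the declarations of `rg`, `keyVaultName`, `objectId` and `keyVaultAccessPolicyAccess` are unchanged from the template above:

```bicep
module kv 'keyvault.bicep' = {
  name: 'kvSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    enableSoftDelete: false
  }
}

module kvaccpol 'keyvaultaccesspolicy.bicep' = {
  name: 'kvAccPolSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    action: 'add'
    objectId: objectId
    access: keyVaultAccessPolicyAccess
  }
  // Bicep cannot infer this dependency from the shared keyVaultName value,
  // so state it explicitly: the access policy module waits for the vault module.
  dependsOn: [
    kv
  ]
}
```

If `keyvault.bicep` exposed the vault name as an output (hypothetical; the thread does not show its outputs), passing that output to `kvaccpol` instead would create the same dependency implicitly.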

eja-git · 2021-04-14T21:23:24Z

I ran into a scenario where I'd like a wait. There isn't much code to show: I'm basically deploying a Function App and then want to output its default key for use in API Management. The problem is that the function app takes some time to spin up before the app keys are present...

resource functionApp 'Microsoft.Web/sites@2020-06-01' = {
  name: functionAppName
  location: location
  kind: 'functionapp'
...

output functionappdefaultkey string = listKeys('${functionApp.id}/host/default', functionApp.apiVersion).functionKeys.default

The workaround is to run the initial deployment of the function app twice.

bmoore-msft · 2021-04-19T15:57:20Z

@eja-git this isn't a "wait" scenario; it's a bug in the deployment engine's job scheduling: the listKeys job is scheduled too early. Fixing that scheduling is the fix for your particular scenario.

Pietervanhove · 2021-07-01T11:51:51Z

Hi,

I've logged the following issue projectkudu/kudu#3312 (comment) that could also benefit from the wait option during a deployment.

Best Regards
Pieter

azMantas · 2021-10-01T09:41:52Z

I am trying to simplify deploying firewall rule collection groups by using loadTextContent and then looping over each variable. Each workload-x.json file contains all the properties for one rule collection group.

var workloads = [
  json(loadTextContent('./workload-1.json'))
  json(loadTextContent('./workload-2.json'))
  json(loadTextContent('./workload-3.json'))
]

resource afwPolicy 'Microsoft.Network/firewallPolicies@2021-02-01' existing = {
  name: 'bicepRules'
}

resource collectionGroups 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = [for workload in workloads: {
  name: workload.name
  parent: afwPolicy
  properties: workload.properties
}]

Here is the error I get:

Rule Collection Group workload-2 can not be updated because Parent Firewall Policy bicepRules is in Updating state from previous operation

I am sure that a short delay between deployments would let us loop through the whole array.

SenthuranSivananthan · 2021-10-01T10:55:03Z

Only one Rule Collection Group can be updated at a time with Azure Firewall Policy. Since the update refreshes all of the connected Azure Firewall instances, the amount of time it takes to update is non-deterministic. Therefore you will need to serialize the deployment using the batchSize decorator.

Can you try:

@batchSize(1)
resource collectionGroups 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = [for workload in workloads: {
  name: workload.name
  parent: afwPolicy
  properties: workload.properties
}]

SQLDBAWithABeard · 2021-10-01T11:57:47Z

I have two scenarios that come to mind from recent experience.

An overarching enterprise-management-level policy is applied to a newly created resource that I reference in the next resource/module, causing the "Another operation is in progress" error. A retry would be useful here, as I have no control over or influence on the policies.

I have also faced situations where a newly created resource is not available when referenced immediately afterwards, which I assume is a replication/caching issue, as the next run works flawlessly.

wsucoug69 · 2021-11-08T15:08:25Z

My scenario includes creating a Cosmos DB account, which typically takes a few minutes and sometimes up to 10 minutes. In this case I am unable to use the resource output to set the connection string for use in subsequent modules, e.g. passing it into keyVault and functionAppSettings.

alex-frankel · 2021-11-08T16:16:45Z

My scenario includes creating a Cosmos Account, this typically takes a few minutes and sometimes up to 10 minutes.

@markjbrown - do you mind taking a look at this one? I'd expect the Cosmos account not to report complete until it is fully provisioned. @zapadoody -- do you happen to have the code sample of the repro and a correlation ID from when the error occurred?

markjbrown · 2021-11-08T16:38:47Z

For run-time deployment errors you should raise a support ticket as they are best equipped to diagnose specific errors with an activity id.

However, I am happy to look at an existing Bicep file to see if there are any issues.

I do have a sample on how to output the endpoint and key from a Cosmos account and input into appSettings for an App Service here if that helps.

https://github.com/Azure/azure-quickstart-templates/blob/master/quickstarts/microsoft.documentdb/cosmosdb-webapp/main.bicep

wsucoug69 · 2021-11-09T14:29:23Z

here's my cosmosAccount.bicep

param location string
param cosmosAccountName string
param cosmosDefaultConsistencyPolicy string 
param cosmosPrimaryRegion string
param cosmosSecondaryRegion string

var lowerCosmosAcctName = toLower(cosmosAccountName)
var locations = [
  {
    locationName: cosmosPrimaryRegion
    failoverPriority: 0
    isZoneRedundant: false
  }
  {
    locationName: cosmosSecondaryRegion
    failoverPriority: 1
    isZoneRedundant: false
  }
]

resource cosmosAccountResource 'Microsoft.DocumentDB/databaseAccounts@2021-06-15' = {
  name: lowerCosmosAcctName
  kind: 'GlobalDocumentDB'
  location: location
  properties: {
    locations: locations
    databaseAccountOfferType: 'Standard'
    enableAutomaticFailover: true
    consistencyPolicy: {
      defaultConsistencyLevel: cosmosDefaultConsistencyPolicy
    }
  }
}


output cosmosAccountResourceName string = cosmosAccountResource.name

here's the KeyVault.bicep

param location string 
param keyVaultName string
param productionPrincipalId string
param productionTenantId string
param stagingPrincipalId string
param stagingTenantId string

@secure()
param cosmosPrimaryConnectionString string

@secure()
param cosmosSecondaryConnectionString string

@secure()
param serviceStorageConnectionString string

@secure()
param appStorageConnectionString string


resource keyVault 'Microsoft.KeyVault/vaults@2019-09-01' = {
  name: keyVaultName
  location: location
  properties: {
    enabledForDeployment: true
    enabledForTemplateDeployment: true
    enabledForDiskEncryption: true
    tenantId: productionTenantId
    accessPolicies: [
      {
        tenantId: productionTenantId
        objectId: productionPrincipalId
        permissions: {
          secrets: [
            'get'
            'list'
          ]
        }
      }
      {
        tenantId: stagingTenantId
        objectId: stagingPrincipalId
        permissions: {
          secrets: [
            'get'
            'list'
          ]
        }
      }
    ]
    sku: {
      name: 'standard'
      family: 'A'
    }
  }  
}

resource cosmosPrimaryConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/cosmosPrimaryConnectionString'
  properties: {
    value: cosmosPrimaryConnectionString
  }
  dependsOn:[
    keyVault
  ]
}

resource cosmosSecondaryConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/cosmosSecondaryConnectionString'
  properties: {
    value: cosmosSecondaryConnectionString
  }
  dependsOn:[
    keyVault
  ]
}

resource serviceStorageConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/dbConnectionString'
  properties: {
    value: serviceStorageConnectionString
  }
  dependsOn:[
    keyVault
  ]
}

resource appStorageConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/appStorageConnectionString'
  properties: {
    value: appStorageConnectionString
  }
  dependsOn:[
    keyVault
  ]
}

output appStorageConnectionStringUri string = appStorageConnectionStringSecret.properties.secretUri
output serviceStorageConnectionStringUri string = serviceStorageConnectionStringSecret.properties.secretUri
output cosmosPrimaryConnectionStringUri string = cosmosPrimaryConnectionStringSecret.properties.secretUri
output cosmosSecondaryConnectionStringUri string = cosmosSecondaryConnectionStringSecret.properties.secretUri

and here's the main.bicep

/// cosmos db account, database and container module
module cosmosAccountMod '../cosmosAccount.bicep' = {
  name: 'cosmosAccount-${environmentName}-${buildNumber}'
  params: {
    cosmosAccountName: cosmosAccountName
    cosmosDefaultConsistencyPolicy: cosmosDefaultConsistencyPolicy
    cosmosPrimaryRegion: cosmosPrimaryRegion
    cosmosSecondaryRegion: cosmosSecondaryRegion
    location: location
  }
}

module cosmosDatabaseMod '../cosmosDbContainer.bicep' = {
  name: 'cosmosDBContainer-${environmentName}-${buildNumber}'
  params: {
    cosmosAccountName: cosmosAccountMod.outputs.cosmosAccountResourceName
    cosmosContainerName: cosmosContainerName
    cosmosDatabaseName: cosmosDatabaseName
    cosmosThroughput: cosmosThroughput
  }
  dependsOn: [
    cosmosAccountMod
  ]
}

// storage account module - storage for the tenants application 
module appStorageAccountMod '../storageAccount.bicep' = {
  name: 'appStorageAcctName-${environmentName}-${buildNumber}'
  params: {
    storageAcctName: appStorageAcctName
    storageSkuName: appStorageAcctSku
    location: location
  }
}

// app insights module
module appInsightsMod '../appInsights.bicep' = {
  name: 'appInsightsName-${environmentName}-${buildNumber}'
  params: {
    name: appInsightsName
    resourceGroupLocation: location
  }
}

// app service plan module
module appServicePlanMod '../appServicePlan.bicep' = {
  name: 'appServicePlan-${environmentName}-${buildNumber}'
  params: {
    appSvcPlanSku: appSvcPlanSku
    appSvcPlanTier: appSvcPlanTier
    appSvcPlanName: appSvcPlanName
    appPlanLocation: location
  }
}

// function app module
module functionAppMod '../functionApp.bicep' = {
  name: 'functionApp-${environmentName}-${buildNumber}'
  params: {
    appSvcPlanName: appSvcPlanName
    functionAppName: functionAppName
    location: location
  }
  dependsOn: [
    appStorageAccountMod
    appServicePlanMod
    cosmosAccountMod
  ]
}

// service storage account module - storage for the function app 
module serviceStorageAccountMod '../storageAccount.bicep' = {
  name: 'serviceStorageAcctName-${environmentName}-${buildNumber}'
  params: {
    storageAcctName: serviceStorageAcctName
    storageSkuName: serviceStorageAcctSku
    location: location
  }
}

// key vault module
module keyVaultMod '../keyVault.bicep' = {
  name: 'keyVaultName-${environmentName}-${buildNumber}'
  params: {
    keyVaultName: keyVaultName
    location: location
    cosmosPrimaryConnectionString: listConnectionStrings(resourceId('Microsoft.DocumentDB/databaseAccounts', cosmosAccountName), '2020-04-01').connectionStrings[0].connectionString
    cosmosSecondaryConnectionString: listConnectionStrings(resourceId('Microsoft.DocumentDB/databaseAccounts', cosmosAccountName), '2020-04-01').connectionStrings[1].connectionString
    productionPrincipalId: functionAppMod.outputs.productionPrincipalId
    productionTenantId: functionAppMod.outputs.productionTenantId
    stagingPrincipalId: functionAppMod.outputs.stagingPrincipalId
    stagingTenantId: functionAppMod.outputs.stagingTenantId
    serviceStorageConnectionString: serviceStorageAccountMod.outputs.storageAccountConnectionString
    appStorageConnectionString: appStorageAccountMod.outputs.storageAccountConnectionString
  }
  dependsOn:[
    functionAppMod
    cosmosAccountMod
    cosmosDatabaseMod
  ]
}

// function app settings module
module functionAppSettingMod '../functionAppSettings.bicep' = {
  name: 'functionAppSettings-${environmentName}-${buildNumber}'
  params: {
    appInsightsKey: appInsightsMod.outputs.appInsightsKey
    cosmosConnectionStringUri: keyVaultMod.outputs.cosmosPrimaryConnectionStringUri
    appStorageConnectionStringUri: keyVaultMod.outputs.appStorageConnectionStringUri
    serviceStorageConnectionStringUri: keyVaultMod.outputs.serviceStorageConnectionStringUri
    functionAppName: functionAppMod.outputs.prodSlotFunctionAppName
    functionAppStagingName: functionAppMod.outputs.stagingSlotFunctionAppName
  }
  dependsOn:[
    functionAppMod
    appInsightsMod
    cosmosAccountMod
    keyVaultMod
  ]
}

wsucoug69 · 2021-11-09T14:31:47Z

Also, to clarify: previously I was using the output in cosmosAccount.bicep, but I changed to the query approach to try and get away from the error. Thanks for the tip on raising the support ticket.

wsucoug69 · 2021-11-09T20:29:46Z

@alex-frankel Can you take a look at this? It seems the dependsOn is being satisfied by the acknowledgement (the Started and/or Accepted responses) rather than by Succeeded.

wsucoug69 · 2021-11-10T14:31:55Z

@alex-frankel any thoughts on the Bicep here? Also, I have opened a support case for this; if you need that ref #, let me know and I can send it directly.

markjbrown · 2021-11-10T18:23:07Z

The problem is this listConnectionStrings function. I've never seen it before. I tried testing in an ARM template and it doesn't work (not sure why the template didn't fail validation).

If you want to output the endpoint and keys, use the syntax below. To turn them into a connection string, just concatenate them with "AccountEndpoint=" and ";AccountKey=".

"[reference(resourceId('Microsoft.DocumentDB/databaseAccounts', variables('cosmosAccountName'))).documentEndpoint]"
"[listKeys(resourceId('Microsoft.DocumentDB/databaseAccounts', variables('cosmosAccountName')), '2021-04-15').primaryMasterKey]"

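For Bicep users, the same endpoint-plus-key approach can be written against resource symbols instead of `resourceId()`. A sketch, assuming it runs as its own module after both the Cosmos DB account and the key vault exist: an `existing` reference does not create a dependency on whatever deploys the resource, so the calling module still needs `dependsOn` on `cosmosAccountMod`. Writing the value straight into a Key Vault secret keeps the key out of deployment outputs.

```bicep
param keyVaultName string
param cosmosAccountName string

// Both resources are assumed to exist already (deployed by other modules).
resource cosmosAccount 'Microsoft.DocumentDB/databaseAccounts@2021-04-15' existing = {
  name: cosmosAccountName
}

resource keyVault 'Microsoft.KeyVault/vaults@2019-09-01' existing = {
  name: keyVaultName
}

// Bicep equivalent of reference(...).documentEndpoint and listKeys(...).primaryMasterKey,
// concatenated into a connection string and stored as a secret.
resource cosmosPrimaryConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  parent: keyVault
  name: 'cosmosPrimaryConnectionString'
  properties: {
    value: 'AccountEndpoint=${cosmosAccount.properties.documentEndpoint};AccountKey=${cosmosAccount.listKeys().primaryMasterKey}'
  }
}

// Only the secret URI leaves the module, not the key itself.
output cosmosPrimaryConnectionStringUri string = cosmosPrimaryConnectionStringSecret.properties.secretUri
```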
wsucoug69 · 2021-11-11T15:23:46Z

@markjbrown Apologies, and thank you for the assistance!!!

brwilkinson · 2023-04-17T08:18:38Z

Thank you @tejas-nagchandi keep us posted, if you are able to resolve, otherwise we can continue to investigate.

tejas-nagchandi · 2023-04-17T09:17:07Z

@brwilkinson: I tested with dependsOn as well, but the extension dependencies are not resolved.
My bicep:

module componentVM 'virtualMachines.bicep' = [for (vm, index) in component: {
  name: '${vmType}VM-${vm.name}'
  params: {
    location: location
    vmName: vmName[index].name
    zone: vm.zone
    subnet: subnet
    vmProperties: properties
    keyVaultName: keyVaultName
    vnetName: vnetName
    vnetResourceGroup: vnetResourceGroup
    infraEncryptionKeyId: infraEncryptionKeyId
    uaiForDiskid: uaiForDiskid
    uaiForVMid: uaiForVMid
    lbProperties: lbProperties
  }
}]

module protectVM 'protectedItems.bicep' = [for (vm, index) in component: {
  name: 'protect-${vm.name}'
  dependsOn: componentVM
  params: {
    location: location
    policyId: policyId
    vaultName: vaultName
    vmName: vm.name
    resourceSuffix: resourceSuffix
  }
}]

This deployment fails with message

"message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
"details": [
  {
    "code": "ResourceDeploymentFailure",
    "target": "/subscriptions/xxxxx-xxxx-xxxx-xxxx-xxxxxxxx/resourceGroups/xxxx-d-rg/providers/Microsoft.RecoveryServices/vaults/xxxx-rsv/backupFabrics/Azure/protectionContainers/iaasvmcontainer;iaasvmcontainerv2;xxxx-d-rg;xxxxxxx/protectedItems/vm;iaasvmcontainerv2;xxxx-d-rg;xxxxxxxx",
    "message": "The 'AzureAsyncOperationWaiting' resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "UserErrorGuestAgentStatusUnavailable",
        "message": "VM agent is unable to communicate with the Azure Backup Service."
      }]}
  ]}

This happens because the VM agent and VM extension deployments are initiated by policies set by the organization.

When I wait for these extensions to be "Provisioned Successfully" and then retry the failed deployment, the deployment goes smoothly and my VMs are included in the protectedItems of my Recovery Services vault.

brwilkinson · 2023-04-17T09:30:49Z

Hi @tejas-nagchandi

This is an interesting scenario that I see. I am not sure if you have uncovered a bug OR if it is actually doing the correct thing.

Let me explain.

dependsOn is actually an array of resource references.

E.g.

  dependsOn: [
    componentVM
  ]

In your case, since componentVM is itself an array, it looks correct; however, I am unsure whether it actually works correctly. I will do some more testing on that, but in the meantime can you please update the syntax as above?

You can also use the form below. The form above waits for all VMs to complete; the form below waits only for the matching VM iteration to complete. Both should work, but if you need a longer delay, use the form above, i.e. wait for ALL VMs to complete before deploying the VM protection.

 dependsOn: [
   componentVM[index]
 ]

If you still hit issues, consider also adding the following:

@batchSize(1)
module protectVM 'protectedItems.bicep' = [for (vm, index) in component: {
  name: 'protect-${vm.name}'
  dependsOn: [
    componentVM
  ]
  // params: unchanged from your original protectVM declaration
}]

Please let us know the outcome; hopefully you won't need the batchSize.

tejas-nagchandi · 2023-04-17T09:55:11Z

@brwilkinson: Same outcome after adding dependsOn as an array and the batchSize decorator.
I was actually expecting that, because dependsOn without the array brackets was already resolving to the correct dependency; I checked the compiled ARM template before the deployment as well.

The main issue is that the VM resource deployment is reported successful before the extensions (initiated by policy) are getting provisioned.

brwilkinson · 2023-04-17T09:55:32Z

@tejas-nagchandi
If your issue is still not resolved after the dependsOn change, I think it would be best to open a separate discussion for this topic of installing and configuring the backup agent. We can keep that topic outside of this issue/thread, then report back here on the outcome.

https://github.com/Azure/bicep/discussions

tejas-nagchandi · 2023-04-17T10:09:10Z

Sure @brwilkinson, I will open a separate discussion on this. Thanks for the quick response so far.

brwilkinson · 2023-04-18T10:25:55Z

Thank you @tejas-nagchandi for opening the separate discussion.

Installing and configuring the backup agent for VMs which are initiated by Azure policies #10462

We were able to determine that the conflict came from configuring the backup in Bicep as well as in Azure Policy. So the recommendation was to remove this configuration from Bicep and allow the policy to deploy the desired VM protection configuration.

tejas-nagchandi · 2023-04-20T09:07:08Z

@brwilkinson: The final solution is to take full control within Bicep so that the dependencies are easy to manage: rather than waiting for policies to kick in, include the extensions as well as the protectedItems in Bicep.

Kaloszer · 2023-04-21T11:40:13Z

Same issue when you have a Sentinel analytics rule whose query uses a newly created watchlist. Even though the watchlist resource is in dependsOn, the rule still fails initially, because it takes time for the watchlist to become available for querying (even after a successful deployment). A retry with a timer would help here.

brwilkinson · 2023-04-25T08:10:55Z

@Kaloszer can you share the info back on that other watchlist discussion?

bowlerma · 2023-08-07T08:59:28Z

We're hitting similar problems when deploying Azure SQL. We have a template that deploys a logical Azure SQL server and then performs a number of additional configuration steps, such as enabling auditing, adding an AD admin user, setting the connection policy, configuring firewall rules, and adding elastic pools. All of these child resources use 'dependsOn' to ensure that they run one after the other, in series, rather than in parallel.

Most of the time this works, but occasionally the template deployment fails with an 'Internal Server Error'. When we raise this with the Microsoft support team, they just tell us "The server is currently busy. Please wait a few minutes and try again." Retrying the template deployment doesn't always work, and there is no built-in mechanism to add this delay.

In this particular case, I'd have thought a better response would be a 429 rather than a 500, so that the deployment of each child resource could be retried automatically, with exponential backoff between retries.

It's little issues like this that make working with ARM such a frustrating experience. Just because something deployed OK once, there's no guarantee that it will deploy successfully the next time.

mdjx · 2023-09-03T15:11:47Z

When Entra Domain Services (previously Azure AD Domain Services / AADDS) is deployed via Bicep the deployment completes within Bicep, but the actual resource remains in the "Deploying" state in Azure for at least ~20 minutes longer.

A wait/retry mechanism would help ensure the service is fully provisioned before further deployments kick off that depend on it, or at least allow them to retry.
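Until a native wait exists, a workaround people commonly reach for is a deployment script that simply sleeps, with downstream resources depending on it. A sketch, with caveats: it is a fixed delay rather than a readiness check, the 20-minute default is taken from the comment above rather than any guarantee, and `deploymentScripts` provisions a container instance and storage account behind the scenes, so it adds time and cost. The resource name, `waitSeconds`, and the `azPowerShellVersion` value are assumptions; check the currently supported versions before using it.

```bicep
param location string = resourceGroup().location

// Hypothetical default: roughly the 20 minutes reported above.
@description('How long to pause, in seconds.')
@maxValue(1700)
param waitSeconds int = 1200

// A deployment script whose only job is to sleep. In a real template, give it a
// dependsOn on the resource being waited for; resources that must start after
// the pause then declare dependsOn: [ waitForProvisioning ].
resource waitForProvisioning 'Microsoft.Resources/deploymentScripts@2020-10-01' = {
  name: 'waitForProvisioning'
  location: location
  kind: 'AzurePowerShell'
  properties: {
    azPowerShellVersion: '10.0' // assumption: pick one from the supported list
    scriptContent: 'Start-Sleep -Seconds ${waitSeconds}'
    retentionInterval: 'PT1H' // required: how long script resources are kept afterwards
    timeout: 'PT30M' // must exceed the sleep, hence the @maxValue above
    cleanupPreference: 'OnSuccess'
  }
}
```

A deployment script whose properties have not changed is not re-executed on redeployment, so if the pause must happen on every run, set `forceUpdateTag` from a parameter whose default is `utcNow()`.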

Kaloszer · 2023-09-18T14:53:34Z

Yet another case: assigning more than one federated identity credential to a user-assigned managed identity (UAMI) within the same module:

Too many Federated Identity Credentials are written concurrently for the managed identity '/subscriptions/<sub>/resourcegroups/<rg>/providers/microsoft.managedidentity/userassignedidentities/<uami01'. Concurrent Federated Identity Credentials writes under the same managed identity are not supported. (Code: ConcurrentFederatedIdentityCredentialsWritesForSingleManagedIdentity)

PS:
The workaround is to deploy the second binding in another module that depends on the first one, but still...
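Within a single file, the same serialization can be expressed with a resource loop and `@batchSize(1)`, which makes ARM deploy one iteration at a time, so no two credential writes hit the identity concurrently. A sketch; the identity name, issuer, and subjects are placeholder values, not taken from the report above.

```bicep
param location string = resourceGroup().location
param identityName string

// Hypothetical credentials for illustration: one entry per federated credential.
param credentials array = [
  {
    name: 'github-main'
    issuer: 'https://token.actions.githubusercontent.com'
    subject: 'repo:contoso/app:ref:refs/heads/main'
  }
  {
    name: 'github-pr'
    issuer: 'https://token.actions.githubusercontent.com'
    subject: 'repo:contoso/app:pull_request'
  }
]

resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: identityName
  location: location
}

// @batchSize(1) deploys the loop iterations sequentially instead of in parallel,
// avoiding ConcurrentFederatedIdentityCredentialsWritesForSingleManagedIdentity.
@batchSize(1)
resource federatedCredentials 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = [for cred in credentials: {
  parent: uami
  name: cred.name
  properties: {
    issuer: cred.issuer
    subject: cred.subject
    audiences: [
      'api://AzureADTokenExchange'
    ]
  }
}]
```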

sserjeglobant · 2024-01-12T18:57:45Z

Hello, I would like to know whether you are continuing with this much-needed development. Here is another example of what is happening:

I have to create a VNet and multiple subnets.

I have one module for VNets and another module for subnets.

In main.bicep, I call each module as follows:

The VNet module, plus its parameters.

The subnet module, plus its parameters, with a dependsOn on the VNet module and a for loop that reads the object describing the subnets to create.

What happens is that sometimes, while subnet 0 is being created, the Azure deployment has not yet finished that operation, and when the request to create subnet 1 is sent, an error says that a previous operation is still in progress and the next subnet cannot be created, which breaks the deployment.

Does anyone have an idea how else I can solve this problem? Or maybe MS can help us with this valuable feature of adding wait times to modules.

SvenAelterman · 2024-01-13T05:32:06Z

This is a very different issue.

If you're expecting to be able to redeploy the module for your virtual network, you'll need to make sure you create your subnets with the virtual network, not separately (that's an anti-pattern). If you try to redeploy your virtual network only (no subnets) once you have created subnets and deployed resources in them, the deployment of the virtual network will attempt to delete your subnets, which is neither desired nor possible and will thus cause your virtual network deployment to fail.

If you are looking to deploy additional subnets in an existing virtual network (and will then never again deploy the virtual network unless you pull the full subnet configuration again), then you need to use the @batchSize(1) decorator in the subnet loop.
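A sketch of the first option: declaring the subnets inline, so the VNet and all of its subnets are sent as one resource definition rather than as separate, concurrent child deployments. Parameter names, address ranges, and the API version are illustrative assumptions.

```bicep
param vnetName string
param location string = resourceGroup().location
param addressPrefix string = '10.0.0.0/16'

// Hypothetical subnet list for illustration.
param subnetConfigs array = [
  {
    name: 'snet-app'
    addressPrefix: '10.0.1.0/24'
  }
  {
    name: 'snet-data'
    addressPrefix: '10.0.2.0/24'
  }
]

// Subnets live inside the VNet's own properties, so redeploying this file always
// sends the full subnet set and never tries to delete subnets that are in use.
resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        addressPrefix
      ]
    }
    subnets: [for subnet in subnetConfigs: {
      name: subnet.name
      properties: {
        addressPrefix: subnet.addressPrefix
      }
    }]
  }
}
```

For the second option (adding subnets to an existing VNet), the subnets would instead be a `Microsoft.Network/virtualNetworks/subnets` resource loop under an `existing` VNet, decorated with `@batchSize(1)` as described above.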

aslan-im · 2024-02-29T07:54:01Z

what is the status?

matzter · 2024-03-20T08:04:33Z

I have another, similar issue deploying a Front Door profile and a metricAlert in the same deployment.

'Microsoft.Cdn/profiles@2023-05-01'
'Microsoft.Insights/metricAlerts@2018-03-01'

The error is "Couldn't find a metric named OriginHealthPercentage"
And yes, the metricAlerts deployment is depending on the profile deployment.

devdeer-alex · 2024-04-27T13:02:27Z

Just to be clear here: Isn't that contradicting the statement from the documentation?

Repeatable results: Repeatedly deploy your infrastructure throughout the development lifecycle and have confidence your resources are deployed in a consistent manner. Bicep files are idempotent, which means you can deploy the same file many times and get the same resource types in the same state. You can develop one file that represents the desired state, rather than developing lots of separate files to represent updates.

mattias-fjellstrom · 2024-05-20T13:21:55Z

Whatever solution is planned for this, will it be Bicep-specific or will it be available in ARM-templates as well?

I encountered an issue with Azure Policy where I use a policy-set containing a number of policies that each enables a given Defender for Cloud plan (Storage, CosmosDB, ARM, etc) if it is not enabled for a given subscription (each policy uses the deployIfNotExists effect).
When I create a new subscription, these policies all run at the same time and some of them error out with a Conflict ... error message. As far as I understand, there seems to be no built-in retry operation in Azure Policy (I've waited a few hours to make sure). So this would be a good scenario for specifying a retry in the ARM template defined inline in the policy.

WhitWaldo · 2024-05-20T14:38:25Z

@mattias-fjellstrom Likely ARM-level, given that Bicep generates ARM templates under the hood (as evidenced by the artifacts in Azure after such a deployment).

alex-frankel · 2024-05-20T23:57:08Z

@WhitWaldo is correct!

mattias-fjellstrom · 2024-05-21T07:13:29Z

@WhitWaldo Very true, that makes sense 👍🏻

NickSpag · 2024-08-30T15:41:37Z

Has this been assigned or further discussed, @alex-frankel? We're an ISV with an Azure Managed Application in the Marketplace, so IaC-based environments are part of our CI/CD.

There are a few classes of errors where this would be helpful. To highlight one: just in the past few years we have regularly hit the metric alerts issue discussed here, where metrics aren't "ready". Once or twice a year it causes multi-day disruptions to our customer updates and development cycle, because the wait time needed is longer than anything we can orchestrate by manually pushing the alerts module further down the deployment chain.
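The workaround mentioned above, pushing the alerts module further down the deployment chain, can be sketched in Bicep with module-level `dependsOn`. The module paths and names are hypothetical. This only delays the alert deployment by however long the earlier modules take. It does not wait for the metrics to become available, which is why it breaks down when the required wait is longer.

```bicep
// Hypothetical module files and names, for illustration only.
module app 'modules/app.bicep' = {
  name: 'app'
}

module monitoring 'modules/monitoring.bicep' = {
  name: 'monitoring'
  dependsOn: [
    app
  ]
}

// Deployed last: listing the earlier modules in dependsOn delays the alerts,
// but nothing here checks that the metrics actually exist yet.
module alerts 'modules/alerts.bicep' = {
  name: 'alerts'
  dependsOn: [
    app
    monitoring
  ]
}
```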

I'm sure this proposal is extensive work and cuts against the spirit of a declarative DSL. The practical effect for our org, though, is that we're close to having to extend our entire deployment approach. We would add a packaged C#-based runner and/or network-connected DevOps pipelines into customer tenants, solely to get wait/retry functionality (and graceful failure, if I had a wish list).

Unfortunately the Resource Providers simply aren't reliable enough to depend on here and we need appropriate tools to account for that reality.

rshariy added the enhancement (New feature or request) label Nov 26, 2020

ghost added the Needs: Triage 🔍 label Nov 26, 2020

alex-frankel added the intermediate language (Related to the intermediate language) label and removed the Needs: Triage 🔍 label Nov 30, 2020

alex-frankel added the provider bug and revisit labels and removed the enhancement (New feature or request) and intermediate language (Related to the intermediate language) labels Dec 3, 2020

jeskew mentioned this issue Apr 28, 2023

Cant create a KeyVault and use its getSecret method in the same template #10562 (Open)

alex-frankel mentioned this issue May 4, 2023

Dependencies within a module seems to be ignored (even when using dependsOn) #10611 (Closed)

jongio mentioned this issue May 9, 2023

Add retries for sqlcmd in sqlserver.bicep Azure/azure-dev#2098 (Open)

WhitWaldo mentioned this issue Dec 7, 2023

Automatic resource name generation function #12651 (Closed)

WhitWaldo mentioned this issue May 5, 2024

Add Azure Event Grid Aspire Component dotnet/aspire#788 (Open)

alex-frankel modified the milestone: Dilithium May 16, 2024

Comments

rshariy commented Nov 26, 2020 (edited)

alex-frankel commented Nov 30, 2020

anthony-c-martin commented Dec 2, 2020

alex-frankel commented Dec 2, 2020

bmoore-msft commented Dec 2, 2020

rshariy commented Dec 2, 2020

alex-frankel commented Dec 3, 2020 (edited)

Agazoth commented Mar 31, 2021

eja-git commented Apr 14, 2021 (edited)

bmoore-msft commented Apr 19, 2021

Pietervanhove commented Jul 1, 2021 (edited by anthony-c-martin)

azMantas commented Oct 1, 2021

SenthuranSivananthan commented Oct 1, 2021

SQLDBAWithABeard commented Oct 1, 2021

wsucoug69 commented Nov 8, 2021

alex-frankel commented Nov 8, 2021

markjbrown commented Nov 8, 2021

wsucoug69 commented Nov 9, 2021 (edited by anthony-c-martin)

wsucoug69 commented Nov 9, 2021

wsucoug69 commented Nov 9, 2021 (edited)

wsucoug69 commented Nov 10, 2021

markjbrown commented Nov 10, 2021

wsucoug69 commented Nov 11, 2021

brwilkinson commented Apr 17, 2023

tejas-nagchandi commented Apr 17, 2023

brwilkinson commented Apr 17, 2023

dependsOn is actually an array of resource references.

tejas-nagchandi commented Apr 17, 2023

brwilkinson commented Apr 17, 2023

tejas-nagchandi commented Apr 17, 2023

brwilkinson commented Apr 18, 2023

tejas-nagchandi commented Apr 20, 2023

Kaloszer commented Apr 21, 2023

brwilkinson commented Apr 25, 2023

bowlerma commented Aug 7, 2023

mdjx commented Sep 3, 2023 (edited)

Kaloszer commented Sep 18, 2023 (edited)

sserjeglobant commented Jan 12, 2024

SvenAelterman commented Jan 13, 2024

aslan-im commented Feb 29, 2024

matzter commented Mar 20, 2024

devdeer-alex commented Apr 27, 2024 (edited)

mattias-fjellstrom commented May 20, 2024

WhitWaldo commented May 20, 2024

alex-frankel commented May 20, 2024

mattias-fjellstrom commented May 21, 2024

NickSpag commented Aug 30, 2024 (edited)
