[AVM Module Issue]: DNS Zones - Deployment History limit #1237
Thanks @aavdberg for submitting this issue. Important: a member of the @azure/avm-res-network-dnszone-module-owners-bicep or @azure/avm-res-network-dnszone-module-contributors-bicep team will review it soon!
Hi @aavdberg. Thanks for raising this issue. I am intrigued: are you supplying domains + records totalling more than 800? The 800 limit comes from the resource-group-level limits as noted here. Could it be that your deployment history is more than 800? In the CI environment we have a script to remove deployment history where we see failures, but there is a time delay between a deployment being removed from the visible view in the portal and the metadata being removed on the server side. @AlexanderSehr may have some more supporting information on this.
Hey @aavdberg, thanks for bringing this up. This is indeed a curious case, since there is an auto-cleanup mechanism for deployments at this scope; i.e., if you're closing in on the 800-deployments limit, it will start removing deployments on a FIFO basis. That means you somehow managed to create so many deployments that Azure is actually not able to remove them in time. This could, for example, happen in the extreme case of creating 800 deployments at once, but it can happen for fewer. You mention 'only' 70, which should only run into the aforementioned issue if 70 is higher than the threshold at which Azure starts removing deployments, which, as per the link that @ChrisSidebotham provided, is 700. I assume you deploy other resources alongside your DNS zones that may pump the number of deployments to above 100? @alex-frankel, would you happen to have any advice or a best practice in mind for avoiding this issue when deploying templates that make heavy use of deployments? @ChrisSidebotham, the script you're referring to is most important at the management group level, as there is no auto-cleanup to be found there. But we did in fact also use it at the subscription level in the past when we tested our entire library multiple times a day.
If the resource group has a CanNotDelete lock (as it does when using this module with WAF parameters), it prevents deletions from the deployment history.
@aavdberg - Do you have an update on this with regards to the above comments?
We have seventy domains that we deploy through an Azure DevOps pipeline. We now use @batchSize(1) to give the automatic cleanup time to do its thing, and then it works, but that makes the deployment take longer. We also tried a batch size of ten, but that fails as well because the automatic cleanup does not go fast enough. The domains get deployed to a resource group, and the scope in the Bicep file is also set to resource group. Every domain has several records, so every record in a domain gets a deployment created.

```bicep
// Module Public DNS Zone
@batchSize(1)
module publicdnszone 'br/public:avm/res/network/dns-zone:0.2.4' = [
  for dnszone in varPublicDnsZones: {
    name: 'deploy-PublicDnsZone-${dnszone.name}'
    params: {
      name: dnszone.name
      location: 'global'
      lock: {
        kind: 'None'
        name: 'lock'
      }
      a: dnszone.recordSets.?a
      aaaa: dnszone.recordSets.?aaaa
      caa: dnszone.recordSets.?caa
      cname: dnszone.recordSets.?cname
      mx: dnszone.recordSets.?mx
      ptr: dnszone.recordSets.?ptr
      soa: dnszone.recordSets.?soa
      srv: dnszone.recordSets.?srv
      txt: dnszone.recordSets.?txt
      tags: {
        createdBy: 'Bicep through Azure DevOps'
        solutionName: parSolutionName
      }
    }
  }
]
```

Hope this gives more clarity about the problem.
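For reference, a variable like `varPublicDnsZones` in the snippet above would presumably look something like the following. This is a hypothetical illustration only; the actual variable was not shared in the thread, and the exact record-set property names depend on the module's parameter schema:

```bicep
// Hypothetical sketch of varPublicDnsZones: the real variable was not shared.
// Zone names, IPs, and property names here are illustrative assumptions.
var varPublicDnsZones = [
  {
    name: 'contoso.com' // example domain
    recordSets: {
      a: [
        {
          name: 'www'
          ttl: 3600
          aRecords: [
            {
              ipv4Address: '203.0.113.10'
            }
          ]
        }
      ]
      // cname, mx, txt, ... optional per zone, hence the .? accessors above
    }
  }
]
```

With seventy such zones, each carrying several record types, every record-type array feeds a nested module inside the AVM module, which is what multiplies the deployment count.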
It does, thanks @aavdberg. And it does yet again surface two challenges with Deployment objects in Azure:
I guess what I'm trying to say is that there is always a trade-off. The good news is that the PG is aware and is working on speeding deployments up. ETA unknown, though.
@alex-frankel - Would you be able to assist with an ETA for the deployments work Alex mentioned above? Is there a better place for this issue regarding deployments than the current BRM repository?
@sydkar - Any update on moving this to the correct area?
@ChrisSidebotham No update right now; trying to get an ETA on the deployments speed-up work. I'll let you know by the end of the week.
Sorry about the delay on this one. Is @teemukom's comment accurate? If so, then that is likely the source of the issue, no? @aavdberg, can you take a look at the Deployment resources in the resource group (there is a "Deployments" tab in the table of contents of the resource group blade in the portal)? Does it eventually get pruned back down to 700? If so, then that suggests @AlexanderSehr is correct that we are not able to keep up with the pace of deployments getting created. The fact that the deployment works with … @aavdberg, if this is still occurring, can you share a recent correlation ID and we can investigate further?
I'm also curious what the var …
Sent you a ping in Teams, @alex-frankel.
Using @batchSize(1) slows the deployment down, which gives the deployment-history cleanup time to remove the old deployments.
Just got off a call with @aavdberg, and he is creating a large number of DNS records. With the current structure, each new record (even if the record is in a new module) will produce a unique deployment. All the deployments are ideally happening in parallel, which is why the deployment cleanup process is not able to keep up. @ChrisSidebotham / @AlexanderSehr / @jtracey93 -- can we consider refactoring this AVM module to not produce so many nested modules? While we would ideally be able to (and plan to) raise the limit, it won't happen in the short term, so refactoring the module is the only short-term fix that I feel is practical. The refactor would require creating all of the records in the entry-point file for the module instead of in a nested deployment. I realize this reduces the separation of concerns, but that might be a worthy trade-off to increase usability for the AVM consumer. Thoughts?
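As a rough sketch of the proposed refactor: record sets could be declared directly as child resources in the module's entry-point file, so a zone plus all of its records becomes a single deployment instead of one deployment per record type. The resource shapes below follow the `Microsoft.Network/dnsZones` ARM schema, but the parameter names (`name`, `aRecordSets`) are illustrative assumptions, not the module's actual interface:

```bicep
// Illustrative sketch only, not the actual AVM module refactor.
// Assumed parameters of the entry-point file:
param name string
param aRecordSets array = []

// The zone itself.
resource dnsZone 'Microsoft.Network/dnsZones@2018-05-01' = {
  name: name
  location: 'global'
}

// One child resource per A record set: still a loop, but declared inline,
// so it creates no additional nested deployment objects.
resource aRecords 'Microsoft.Network/dnsZones/A@2018-05-01' = [
  for record in aRecordSets: {
    parent: dnsZone
    name: record.name
    properties: {
      TTL: record.ttl
      ARecords: record.aRecords
    }
  }
]
```

The same pattern would repeat for AAAA, CNAME, MX, and the other record types. The cost, as noted above, is that the record-type logic no longer lives in its own independently testable module file.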
1 is maybe a bit ... low, and 50 or 100 may do the trick 😄
@AlexanderSehr, @aavdberg mentions that even with a batchSize of 10, he is running into this issue:
Can we consider refactoring this module?
Hola Mr @alex-frankel,

As you may know, coming from CARML, we've traditionally written modules in a way that aligns closely with the Azure REST API's structure (i.e., how resource types are nested in provider namespaces and how resource types may have child resource types), and treated child resource types as full-on modules that we published independently of their parent. Since we migrated the library to AVM, we don't do this anymore due to technical limitations in the backend, that is, certain manual updates that can make this quite painful for every new module. There is hope to work through this eventually, but it's what we have today. We've already seen several issues opened asking for the child modules, so there's definitely a need beyond what we're ourselves using at Microsoft and at customers.

Coming back to the issue at hand: the intuitive alignment with the API structure and the comprehensive publishing are the main drivers behind the design of the modules today, applied consistently. As a result, child resource types are always nested in their own module file and are tested against the same rule set as their parent (with the sole exception of the missing …).

Some additional thoughts: personally, I value consistency a lot, as its benefits usually massively outweigh the benefits of inconsistency. I can't, however, argue with the fact that in this case it comes at the very unfortunate cost of deployment speed and deployment cleanup. So much for my personal thoughts on the matter. I'm sure others have some thoughts too :)
Ultimately, your call :) I don't think the deployment speed fix is a meaningful fix to this issue. That being said, @anthony-c-martin, what is the status of the rollout of the performance improvements for nested modules?
That's true. The 'speed' is an issue by itself, while this issue is about the number of deployments and their cleanup. May have gotten a bit dragged away towards the end 😏 Will update. Regardless, it surely can be 'our' call as a collective. The information provided just paints the picture of why the structure is the way it is, and what my subjective thoughts on the matter are 😉. I'm sure there's a wide variety of opinions and maybe even more ideas (:
@aavdberg, as an aside, if you'd like to be onboarded to the Deployments performance improvement that @AlexanderSehr referenced, we can onboard you manually. Just email me your subscription ID(s). We are in the process of rolling it out to everyone, but we are going to do that slowly and progressively.
@alex-frankel, sent you a ping in Teams with the subscription ID.
Hi @alex-frankel, is the deployment performance improvement open to anyone?
Check for previous/existing GitHub issues
Issue Type?
I'm not sure
Module Name
avm/res/network/dns-zone
(Optional) Module Version
No response
Description
I am deploying DNS zones through the 'br/public:avm/res/network/dns-zone:0.2.4' module, but I am running into the following problem: I deploy seventy domains with records in them.
(Optional) Correlation Id
No response