Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Policy as Code against the planfile on the Update / Create #223

Open
lukegAtPaxos opened this issue Dec 8, 2023 · 22 comments · May be fixed by #226
Open

Support for Policy as Code against the planfile on the Update / Create #223

lukegAtPaxos opened this issue Dec 8, 2023 · 22 comments · May be fixed by #226
Labels
enhancement New feature or request needs:triage

Comments

@lukegAtPaxos
Copy link

What problem are you facing?

When running a terraform apply, validation of the plan file can be useful to ensure nothing unexpected is going to happen.

How could Official Terraform Provider help solve your problem?

Tools like conftest are used in other opensource tooling (Atlantis) and Hashicorp Sentinel in Terraform Cloud.

Would it be possible to block the Update / Create crossplane actions on a policy as code evaluation of the planfile.

@lukegAtPaxos lukegAtPaxos added enhancement New feature or request needs:triage labels Dec 8, 2023
@lukegAtPaxos lukegAtPaxos changed the title Support for OPA on the Update Support for Policy as Code against the planfile on the Update / Create Dec 8, 2023
@bobh66
Copy link
Collaborator

bobh66 commented Dec 9, 2023

You might be able to do something like this once #219 is merged. You could create the Workspace with managementPolicies: [Observe] which would run terraform plan and nothing else. We don't currently expose the output of that but it's probably feasible to do in some form. Then when you're happy with the output you could change the managementPolicies to [Observe, Create] or ['*'] to let it proceed. See https://docs.crossplane.io/latest/concepts/managed-resources/#managementpolicies

@lukegAtPaxos
Copy link
Author

Oh damn, I hadn't realised management policies weren't already implemented.

@caiofralmeida
Copy link
Contributor

Hi @lukegAtPaxos, the PR #219 has been merged.

@balu-ce
Copy link

balu-ce commented Jan 2, 2024

hi @bobh66 , iam making code changes like exposing the plan in gzip format either via config map or variable in status field, may i know which one will work and feasible

@bobh66
Copy link
Collaborator

bobh66 commented Jan 2, 2024

Hi @balu-ce - that's a good question. There are size limitations to both that might become an issue for really large terraform modules, but it would probably work for the majority of the use cases. Another possibility might be to create a Plan resource that gets created as a child of the Workspace resource. That doesn't solve any of the content size issues, it's just another way of representing the data.

I think I would lean toward using status for simplicity.

@balu-ce
Copy link

balu-ce commented Jan 3, 2024

@bobh66 we can increase the limit of CRD object if needed know ref

@balu-ce balu-ce linked a pull request Jan 5, 2024 that will close this issue
1 task
@balu-ce
Copy link

balu-ce commented Jan 5, 2024

@bobh66 Added changes, can you please review it

@lukegAtPaxos
Copy link
Author

lukegAtPaxos commented Jan 19, 2024

I've been using the provider a bit more recently and had some thoughts.

potentially the introduction of a PolicyConfig resource, much like the provider config it can be referenced by a workspace. In the terraform execution stage.

in https://github.com/upbound/provider-terraform/blob/main/internal/terraform/terraform.go the provided policy could be invoked on the plan, against something like conf test.

This would involve outputting a plan file, before the apply - and executing the policy engine. The apply could then be invoked from the stored plan file.

The result of a policy failure would be to cause the workspace to fail, bubbling up the failure in status.

curious what maintainers think, i can knock out a poc.

@negz
Copy link
Member

negz commented Jan 31, 2024

I'm not convinced that it's a good idea to try to implement a human-approves-plan workflow inside of provider-terraform's reconcile loop. In fact I'm pretty confident it's not.

What's the benefit of having an always-on control plane constantly monitoring your infrastructure if all it's going to do is update a plan that a human needs to review and tell it it's okay to apply?

I understand the desire for a dry run (e.g. crossplane/crossplane#1805) to build confidence before making a change to a configuration. Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

Edit: I feel similarly WRT automated tests. Why not run them before the configuration (i.e. CR) is ever applied to the control plane?

@balu-ce
Copy link

balu-ce commented Jan 31, 2024

I doubt that it will applicable for other native providers of crossplane but we need to reconsider it for terraform.

@negz
Copy link
Member

negz commented Jan 31, 2024

I doubt that it will applicable for other native providers of crossplane but we need to reconsider it for terraform.

Why?

@toastwaffle
Copy link
Contributor

@negz

Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

My concern would be that if we're not running terraform plan in exactly the same setup that provider-terraform runs terraform apply, we don't have any guarantees that the plan will reflect the changes that will be made. Especially with ProviderConfigs separating some of the config from the Workspace, it would be easy for the plan to be run with different providers, credentials, or a different terraform binary.

I agree that provider-terraform itself shouldn't be itself handle the approval workflow, but if it supported exposing the plan, you could fairly easily build an approval workflow on top of it using the management policies as Bob suggests above

@negz
Copy link
Member

negz commented Feb 13, 2024

you could fairly easily build an approval workflow on top of it using the management policies

It sounds like you're talking about an automated workflow, right? So some other system (not a human) validates the change? Do you have an example of what kind of validation such a system would run?

@toastwaffle
Copy link
Contributor

A combination of human and automated validations. One example would just be time based checking - e.g. changes should only be made between 10am and 4pm, Monday to Thursday, or only made during pre-approved maintenance windows.

I'd like to live in a world where we trusted ourselves and our systems to not require human approval of changes, but for production-critical environments we may be required to have a human review step as well.

@negz
Copy link
Member

negz commented Feb 13, 2024

I'd like to live in a world where we trusted ourselves and our systems to not require human approval of changes, but for production-critical environments we may be required to have a human review step as well.

Is that view compatible with the controller/reconciler "control plane" model? It almost seems like it's turning the control plane into a change notification system - i.e. something to let you know when drift has happened and ask you what to do about it.

Do you see this as something that would be relevant for other Crossplane providers, or only this Terraform one? Do you use/have plans to use other providers?

@toastwaffle
Copy link
Contributor

The way I see it, the control plane is responsible for actuating/enforcing an intent and correcting for drift; what we want to build on top of that is a set of safeguards between the changes being made by developers in source control, and a new version of the intent being "activated" (not that we intend to explicitly version things).

We have a lot of Terraform modules (occasionally several layers deep 😬 ) that are used in various combinations across many environments, and when a developer makes a change to a module, they're probably not going to be the one landing the change which bumps the module version and rolls that change and a bunch of others out to the fleet.

I don't think this is directly relevant to other Crossplane providers - the individual managed resource level is probably too granular for that sort of approval workflow - but I could see something similar being used for changes to compositions and claim parameters (given that a Terraform workspace can provision multiple cloud resources, I generally think of them as loosely equivalent to composite resources and compositions).

We do currently use provider-gcp, but only a tiny amount (probably 20-odd MRs, if that). Unfortunately we have a lot of history with Terraform, and I don't think there's any way we could justify rewriting all of our Terraform modules.

@bobh66
Copy link
Collaborator

bobh66 commented Feb 14, 2024

I don't know that I see any more work here for the provider beyond making the plan available for review in #226 . A higher layer can handle scheduling and approval of changes by manipulating claims and managementPolicies to control what changes get applied and when, and if the change gets applied automatically or requires approval.

@toastwaffle
Copy link
Contributor

Yes, absolutely, that PR is all we need for our purposes.

@lukegAtPaxos
Copy link
Author

lukegAtPaxos commented Feb 14, 2024

I'm not convinced that it's a good idea to try to implement a human-approves-plan workflow inside of provider-terraform's reconcile loop. In fact I'm pretty confident it's not.

What's the benefit of having an always-on control plane constantly monitoring your infrastructure if all it's going to do is update a plan that a human needs to review and tell it it's okay to apply?

I understand the desire for a dry run (e.g. crossplane/crossplane#1805) to build confidence before making a change to a configuration. Wouldn't it make more sense to shift that left, though? i.e. Check that the plan isn't going to do anything unexpected before the new configuration hits the control plane. For example by validating the new plan as part of a gitops workflow in CI (or even just by running terraform plan on a developer's laptop).

Edit: I feel similarly WRT automated tests. Why not run them before the configuration (i.e. CR) is ever applied to the control plane?

FWIW my original statement wasn't introducing a human approved step, but allowing a hook for policy automation to validate a plan file.

Policy in terraform can be written for many things - a policy could be written to ensure no AWS IAM Keys are being created for example - but at its root its validating the plan file, not offering a human approver the ability to approve.

@toastwaffle
Copy link
Contributor

@negz @bobh66 can we get a decision on this/#226 please?

@negz
Copy link
Member

negz commented Mar 25, 2024

I defer to @bobh66 and @ytsarev. While I created this provider, I haven't touched it in some time.

@bobh66
Copy link
Collaborator

bobh66 commented Mar 26, 2024

I understand the need for modeling changes and pre-approvals, but I agree with @negz that this sort of processing should be handled at a higher/different layer and that the provider should simply "do what it's told". I think it's better to handle this as part of crossplane/crossplane#5477 rather than do something provider-specific here. I'll update #266 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs:triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants