Prototype: Take arbitrary actions in response to events #35253

Draft: wants to merge 1 commit into main

apparentlymart (Member)

This is a prototype of a fresh take on my earlier proposal #6810, which proposed one way to fill a gap in Terraform's model: representing arbitrary non-CRUD actions that a remote API might offer. For example:

  • Invalidating a cache.
  • Restarting a service in-place as an attempt to clear a fault.
  • Sending a notification somewhere.
  • CRUD-like operations that aren't a good fit for Terraform's managed resource model, like creating a backup.

The general goal is to model anything that doesn't really make sense to treat as declarative, and then to allow both triggering those actions directly and describing situations where Terraform should take those actions automatically in reaction to other changes made declaratively.

The following describes the initial ideas I'm trying out here. I'm writing this before I've done much prototyping, so the details are likely to change as I learn from the prototyping exercise.

Actions

An "action" is an imperative operation that can be exposed either directly by a provider or indirectly through a module.

Terraform already uses the word "action" to refer to the built-in actions "create", "update", "delete", etc., so another way to think about it is that these are additional actions that extend Terraform's built-in behaviors to deal with situations its default model cannot represent.

When used alone, actions are something you can trigger manually when creating a plan. In that case, Terraform would perform the custom actions at the end of the apply phase, after all of the normally-planned actions have completed.
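A minimal Go sketch of that ordering, assuming a hypothetical `Plan` type that keeps the normally-planned changes and the manually-requested actions in separate lists (these names are illustrative, not Terraform's real internals):

```go
package main

import "fmt"

// Plan is a hypothetical stand-in for a Terraform plan: the usual
// declarative changes plus any actions the operator requested explicitly.
type Plan struct {
	Changes       []string // e.g. "update aws_s3_object.website"
	ManualActions []string // e.g. "invalidate cache"
}

// Apply returns the order in which steps would be performed: every
// normally-planned change first, then the manually-triggered actions.
func Apply(p Plan) []string {
	var steps []string
	for _, c := range p.Changes {
		steps = append(steps, "change: "+c)
	}
	// Manual actions run only after all planned changes have completed.
	for _, a := range p.ManualActions {
		steps = append(steps, "action: "+a)
	}
	return steps
}

func main() {
	p := Plan{
		Changes:       []string{"update aws_s3_object.website"},
		ManualActions: []string{"invalidate cache"},
	}
	for _, step := range Apply(p) {
		fmt.Println(step)
	}
}
```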

However, the more interesting way to use actions is to associate them with events so that Terraform can trigger them automatically based on other changes.

An action declared in a module wraps a sequence of actions defined either in providers or in child modules, allowing module authors to expose higher-level actions as part of their abstraction. For example, a provider-level action to "invalidate a CloudFront distribution's caches" would require specifying which distribution to use, while a module that encapsulates a CloudFront distribution could expose an action that invalidates the caches of that specific distribution, so the operator doesn't need to look up its ID manually.
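A minimal Go sketch of that layering, assuming hypothetical names throughout (`Action`, `invalidateDistribution`, `Sequence`, `websiteModule`); none of this is Terraform's or the AWS provider's real API. It shows a module-level action that wraps a provider-level action and binds the distribution ID so the operator doesn't have to supply it:

```go
package main

import (
	"fmt"
	"strings"
)

// Action is a hypothetical imperative operation that reports what it did.
type Action func() (string, error)

// invalidateDistribution stands in for a provider-level action: the caller
// must say which distribution to use. (Hypothetical, not a real provider API.)
func invalidateDistribution(distributionID string, paths []string) Action {
	return func() (string, error) {
		if distributionID == "" {
			return "", fmt.Errorf("distribution ID is required")
		}
		return fmt.Sprintf("invalidated %s on %s", strings.Join(paths, ","), distributionID), nil
	}
}

// Sequence composes several actions into one, as a module-level action
// might, stopping at the first error.
func Sequence(actions ...Action) Action {
	return func() (string, error) {
		var done []string
		for _, a := range actions {
			msg, err := a()
			if err != nil {
				return strings.Join(done, "; "), err
			}
			done = append(done, msg)
		}
		return strings.Join(done, "; "), nil
	}
}

// websiteModule mimics a module that encapsulates one distribution and so
// can expose an "invalidate caches" action with the ID already bound.
type websiteModule struct {
	distributionID string
}

func (m websiteModule) InvalidateCaches() Action {
	return Sequence(invalidateDistribution(m.distributionID, []string{"/*"}))
}

func main() {
	m := websiteModule{distributionID: "E123EXAMPLE"}
	msg, err := m.InvalidateCaches()()
	fmt.Println(msg, err)
}
```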

Events

The idea of "events" helps Terraform model the side effects of applying a plan in a way that allows actions to be triggered automatically.

Since Terraform is active only during the plan and apply steps, "events" are not real-time notifications. Instead, they are recognized during the plan phase, potentially causing additional actions to be added to the plan in response. During planning, the provider predicts the events that would occur during the apply phase; Terraform Core then plans any additional actions those events will cause, and those additional actions are included when applying the overall plan.

As with the previous form of this proposal, I imagine Terraform providing its own event types corresponding to the side effects Terraform itself performs: objects being created, destroyed, or updated, or just generally "changed" (any of the previous three).
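A minimal sketch of how these built-in event types might be derived from planned changes, with "changed" acting as a catch-all. `ChangeAction`, `EventKind`, `EventFor`, and `Matches` are hypothetical names, and the simplified action set deliberately omits cases such as replacement:

```go
package main

import "fmt"

// ChangeAction is a simplified, hypothetical version of the built-in
// actions Terraform plans for a resource instance.
type ChangeAction int

const (
	NoOp ChangeAction = iota
	Create
	Update
	Delete
)

// EventKind names the hypothetical built-in event types.
type EventKind string

const (
	Created   EventKind = "created"
	Updated   EventKind = "updated"
	Destroyed EventKind = "destroyed"
	Changed   EventKind = "changed" // matches any of the three above
)

// EventFor returns the specific event a planned change implies, if any.
func EventFor(a ChangeAction) (EventKind, bool) {
	switch a {
	case Create:
		return Created, true
	case Update:
		return Updated, true
	case Delete:
		return Destroyed, true
	default:
		return "", false // a no-op change produces no event
	}
}

// Matches reports whether a rule listening for want fires on event got.
func Matches(want, got EventKind) bool {
	if want == Changed {
		return got == Created || got == Updated || got == Destroyed
	}
	return want == got
}

func main() {
	ev, ok := EventFor(Update)
	fmt.Println(ev, ok, Matches(Changed, ev), Matches(Created, ev))
}
```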

This proposal also allows providers to define their own resource-type-specific event types, modelling more specific events that Terraform Core cannot recognize by itself. For example, the consul_key_prefix resource type in the hashicorp/consul provider might choose to announce a separate event for each individual key being changed, so that actions can be triggered for only a subset of them.

For provider-specific event types, the PlanResourceChange provider API function would be extended so that its response can include zero or more events implied by the planned change. Terraform would then use that information, together with the automation rules described in the next section, to decide which actions to plan.
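To make the per-key idea concrete, here is a sketch of a provider's planning logic for a key-prefix resource that announces one event per changed key. `PlanResponseWithEvents` is a hypothetical, cut-down stand-in extended with an `Events` field (the real PlanResourceChange response has no such field today), and the event type name `key_changed` is invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// ProviderEvent is a hypothetical provider-defined event.
type ProviderEvent struct {
	Type  string            // provider-chosen event type name
	Attrs map[string]string // details that rule filters could test
}

// PlanResponseWithEvents is a hypothetical planning response carrying the
// planned new state plus the events the provider predicts.
type PlanResponseWithEvents struct {
	PlannedState map[string]string
	Events       []ProviderEvent
}

// planKeyPrefix announces one "key_changed" event for each key whose value
// would be added, changed, or removed. Keys are sorted for stable output.
func planKeyPrefix(prior, proposed map[string]string) PlanResponseWithEvents {
	resp := PlanResponseWithEvents{PlannedState: proposed}
	keySet := map[string]struct{}{}
	for k := range prior {
		keySet[k] = struct{}{}
	}
	for k := range proposed {
		keySet[k] = struct{}{}
	}
	keys := make([]string, 0, len(keySet))
	for k := range keySet {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		oldV, hadOld := prior[k]
		newV, hasNew := proposed[k]
		if hadOld == hasNew && oldV == newV {
			continue // unchanged key: no event
		}
		resp.Events = append(resp.Events, ProviderEvent{
			Type:  "key_changed",
			Attrs: map[string]string{"key": k},
		})
	}
	return resp
}

func main() {
	prior := map[string]string{"app/a": "1", "app/b": "2"}
	proposed := map[string]string{"app/a": "1", "app/b": "3", "app/c": "4"}
	for _, ev := range planKeyPrefix(prior, proposed).Events {
		fmt.Println(ev.Type, ev.Attrs["key"])
	}
}
```

An automation rule could then filter on the `key` attribute to react to only some of the keys.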

Automation Rules

An automation rule is the glue between events and actions. I expect to follow the typical "if this, then that" model commonly used for high-level automations, where "this" is one or more events with optional custom filter expressions and "that" is a sequence of actions.
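A minimal sketch of such a rule, assuming hypothetical `Event` and `Rule` types. The `Filter` field is a Go function standing in for what would presumably be a configuration expression, and firing each rule at most once per plan is an assumption of this sketch rather than part of the proposal:

```go
package main

import "fmt"

// Event is a hypothetical event recognized during planning.
type Event struct {
	Kind     string // e.g. "changed"
	Resource string // e.g. "aws_s3_object.website"
}

// Rule is an "if this, then that" automation rule.
type Rule struct {
	On      string           // event kind to listen for, e.g. "changed"
	Address string           // resource the event must concern
	Filter  func(Event) bool // optional extra condition; nil means "always"
	Then    []string         // actions to plan, in order
}

// ActionsFor returns the actions that matching rules would add to the plan.
func ActionsFor(rules []Rule, events []Event) []string {
	var actions []string
	for _, r := range rules {
		for _, ev := range events {
			if ev.Kind != r.On || ev.Resource != r.Address {
				continue
			}
			if r.Filter != nil && !r.Filter(ev) {
				continue
			}
			actions = append(actions, r.Then...)
			break // assumption: each rule triggers at most once per plan
		}
	}
	return actions
}

func main() {
	rules := []Rule{{
		On:      "changed",
		Address: "aws_s3_object.website",
		Then:    []string{"invalidate_cache"},
	}}
	events := []Event{
		{Kind: "changed", Resource: "aws_s3_object.website"},
		{Kind: "created", Resource: "aws_s3_object.other"},
	}
	fmt.Println(ActionsFor(rules, events)) // [invalidate_cache]
}
```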

An example automation rule might be "if any instance of aws_s3_object.website is changed, trigger the hashicorp/aws provider's CloudFront cache invalidation action".

For that rule, Terraform would check during the plan phase whether any change is planned for aws_s3_object.website. If so, Terraform would then ask the provider to plan the "cache invalidation" action with the module author's configured arguments. Planning the action produces a planned result, similar to the "planned new state" for a managed resource instance change, but since actions are non-declarative the results would be marked as ephemeral values.
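The sketch below illustrates the ephemeral marking, assuming a hypothetical `Value` wrapper with an `Ephemeral` flag (not Terraform's real value type) and an invented `invalidation_id` result attribute. Only non-ephemeral values survive `Persistable`, which stands in for whatever would be saved:

```go
package main

import "fmt"

// Value is a hypothetical wrapper carrying an "ephemeral" mark; it is not
// Terraform's real value representation.
type Value struct {
	V         string
	Ephemeral bool
}

// PlannedAction is the hypothetical result of asking a provider to plan an action.
type PlannedAction struct {
	Name   string
	Args   map[string]Value
	Result map[string]Value // like a "planned new state", but ephemeral
}

// planInvalidate stands in for the provider planning a cache invalidation
// with the module author's configured arguments.
func planInvalidate(args map[string]string) PlannedAction {
	pa := PlannedAction{Name: "invalidate_cache", Args: map[string]Value{}, Result: map[string]Value{}}
	for k, v := range args {
		pa.Args[k] = Value{V: v}
	}
	// Actions are non-declarative, so their results are marked ephemeral.
	pa.Result["invalidation_id"] = Value{V: "(known after apply)", Ephemeral: true}
	return pa
}

// Persistable drops ephemeral values, keeping only what could be saved.
func Persistable(m map[string]Value) map[string]string {
	out := map[string]string{}
	for k, v := range m {
		if !v.Ephemeral {
			out[k] = v.V
		}
	}
	return out
}

func main() {
	pa := planInvalidate(map[string]string{"distribution_id": "E123EXAMPLE"})
	fmt.Println(len(Persistable(pa.Result)), Persistable(pa.Args)["distribution_id"])
}
```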

When planning the action, the provider might announce even more events that the action implies, which could then cause even more actions to be planned. That could continue indefinitely, so we'll need some way to curtail infinite or excessive recursive triggering. Perhaps that would be an application of the "deferred actions" ideas, generalized to arbitrary actions triggered by the provider.
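One simple way to bound the recursion, shown here only as an illustration and not as the deferred-actions approach suggested above, is to follow event-to-action chains for a fixed number of rounds and defer anything beyond that. `Trigger` and `PlanActions` are hypothetical:

```go
package main

import "fmt"

// Trigger maps an event to the actions it causes, and an action to the
// further events the provider says it implies. Both are hypothetical.
type Trigger struct {
	ActionsFor func(event string) []string
	EventsOf   func(action string) []string
}

// PlanActions follows event -> action -> event chains breadth-first,
// planning actions for up to maxDepth rounds and deferring the rest.
func PlanActions(t Trigger, initial []string, maxDepth int) (planned, deferred []string) {
	events := initial
	for depth := 0; len(events) > 0; depth++ {
		var next []string
		for _, ev := range events {
			for _, a := range t.ActionsFor(ev) {
				if depth >= maxDepth {
					// Past the limit: defer the action and stop following its events.
					deferred = append(deferred, a)
					continue
				}
				planned = append(planned, a)
				next = append(next, t.EventsOf(a)...)
			}
		}
		events = next
	}
	return planned, deferred
}

func main() {
	loop := Trigger{
		ActionsFor: func(ev string) []string {
			if ev == "changed" {
				return []string{"notify"}
			}
			return nil
		},
		// "notify" itself announces another "changed" event, which without
		// a limit would trigger "notify" forever.
		EventsOf: func(string) []string { return []string{"changed"} },
	}
	planned, deferred := PlanActions(loop, []string{"changed"}, 3)
	fmt.Println(len(planned), len(deferred)) // 3 1
}
```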

apparentlymart self-assigned this May 28, 2024
apparentlymart (Member, Author) commented May 30, 2024

#24451 describes a potential variation of this, which seems to boil down to an event that occurs in every plan/apply round, so any action triggered by it would effectively be taken on every apply.

However, that naive definition does mean that the configuration would effectively never converge, because there would always be at least one action to take. A more subtle interpretation would be that this additional event is triggered only if there's already at least one planned action, in which case it can potentially also trigger the notification action. The notification action would never be the only action in a plan in that case, which would then allow the configuration to converge.
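A small sketch contrasting the two interpretations, using a hypothetical `planRound` helper. A configuration has converged when a plan contains no steps, which the naive event never allows:

```go
package main

import "fmt"

// planRound returns every step a single plan would contain: the declarative
// changes plus the notification action if the per-round event fires.
// refined selects the proposed interpretation over the naive one.
func planRound(changes []string, refined bool) []string {
	steps := append([]string(nil), changes...)
	fires := true // naive: the event occurs in every plan/apply round
	if refined {
		fires = len(changes) > 0 // refined: only when something else is planned
	}
	if fires {
		steps = append(steps, "notify")
	}
	return steps
}

func main() {
	// Once infrastructure matches the configuration, there are no changes.
	fmt.Println(len(planRound(nil, false)), len(planRound(nil, true))) // 1 0
	fmt.Println(planRound([]string{"update aws_s3_object.website"}, true))
}
```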

(This design can deal only with the apply-time variation of what's described there, which was the main use case raised, and not with the init-time or plan-time "hooks". Performing arbitrary side effects during plan would be a more significant change to Terraform's execution model, which is out of scope for what I'm prototyping here. We've also typically considered that sort of thing the responsibility of automation wrapping Terraform rather than of Terraform itself, since the automation always knows what it's running and so can notify other systems when it's doing that.)
