diff --git a/docs/README.md b/docs/README.md index f0b050ab87..360cb41eaf 100644 --- a/docs/README.md +++ b/docs/README.md @@ -8,9 +8,14 @@ to [the main Terraform CLI documentation](https://www.terraform.io/docs/cli/inde ## Terraform Core Architecture Documents -* [Terraform Core Architecture Summary](./architecture.md): an overview of the - main components of Terraform Core and how they interact. This is the best - starting point if you are diving in to this codebase for the first time. +* [Modules Runtime Architecture Summary](./architecture.md): an overview of the + main components of Terraform Core related to planning and applying modules. + This is the best starting point if you are diving in to this codebase for the + first time. + +* [Stacks Runtime Architecture Summary](../internal/stacks/README.md): an + overview of the main components of Terraform Core related to planning and + applying stack configurations. * [Resource Instance Change Lifecycle](./resource-instance-change-lifecycle.md): a description of the steps in validating, planning, and applying a change diff --git a/internal/stacks/README.md b/internal/stacks/README.md new file mode 100644 index 0000000000..feed5ca64f --- /dev/null +++ b/internal/stacks/README.md @@ -0,0 +1,48 @@ +# Terraform Stacks functionality + +The Go packages under this directory together implement the Terraform Stacks +features. + +Terraform Stacks is an orchestration layer on top of zero or more trees of +Terraform modules, and so much of what you'll find here is analogous to +a top-level package that serves a similar purpose for individual Terraform +modules or trees of modules. + +The main components here are: + +- `stackaddrs`: A stacks-specific analog to the top-level package `addrs`, + containing types we use to refer to objects within the stacks language and + runtime, and some logic for navigating between different types of addresses. 
+ + This package builds on package `addrs`, since the stacks runtime wraps + the modules runtime. Therefore some of the stack-specific address types + incorporate more general address types from the other package. + +- `stackconfig`: Implements the loading, parsing, and static decoding for + the stacks language, analogous to the top-level package `configs` that + does similarly for Terraform's module language. + +- `stackplan` and `stackstate` together provide the models and + marshalling/unmarshalling logic for the Stacks variants of Terraform's + "plan" and "state" concepts. + +- `stackruntime` deals with the runtime behavior of stacks, including + the creation of plans based on a comparison between desired and actual state, + and then applying those plans. + + All of the dynamic behavior of the stacks language lives here. + +- `tfstackdata1` is a Go representation of an internal protocol buffers schema + used for preserving plan and state data between runs. These formats are + implementation details that external callers are not permitted to rely on. + + (The public interface is via the Terraform Core RPC API, which is + implemented in the sibling directory `rpcapi`.) + +## More Documentation + +The following are some more specific and therefore more detailed documents +about some particular parts of the implementation of the Terraform Stacks +features: + +* [Stacks Runtime internal architecture](./stackruntime/internal/stackeval/README.md) diff --git a/internal/stacks/stackruntime/internal/stackeval/README.md b/internal/stacks/stackruntime/internal/stackeval/README.md new file mode 100644 index 0000000000..37adee3c62 --- /dev/null +++ b/internal/stacks/stackruntime/internal/stackeval/README.md @@ -0,0 +1,361 @@ +# Terraform Stacks Runtime Internal Architecture + +This directory contains the guts of the Terraform Stacks language runtime. +The public API to this is in the package two levels above this one, +called `stackruntime`. 
+ +The following documentation is aimed at future maintainers of the code in +this package. There is no end-user documentation here. + +## Overview + +If you're arriving here familiar with the runtime of the traditional Terraform +language used for modules -- which we'll call the "modules runtime" in the +remainder of this document -- you will find that things work quite differently +in here. + +The modules runtime works by first explicitly building a dependency graph and +then performing a concurrent walk of that graph, visiting each node and asking +it to "evaluate" itself. "Evaluate" could mean something as simple as just +tweaking some in-memory data, or it could involve a time-consuming call to +a provider plugin. The nodes all collaborate via a shared mutable data structure +called `EvalContext`, which nodes use both to read from and modify the state, +plan, and other relevant metadata during evaluation. + +The stacks runtime is solving broadly the same problem -- scheduling the +execution of various calculations and side-effects into an appropriate order -- +but does so in a different way that relies on an _implicit_ data flow graph +constructed dynamically during evaluation. + +The evaluator does still have a sort of global "god object" that everything +belongs to, which is an instance of type `Main`. However, in this runtime +that object is the entry point to a tree of other objects that each encapsulate +the data only for a particular concept within the language, with data flowing +between them using method calls and return values. + +## Config Objects vs. Dynamic Objects + +There are various pairs of types in this package that represent a static object +in the configuration and dynamic instances of that object respectively. 
+ +For example, `InputVariableConfig` directly represents a `variable` block +from a `.tfstack.hcl` file, while `InputVariable` represents the possibly-many +dynamic instances of that object that can be caused by being within a stack +that was called using `for_each`. + +In general, the static types are responsible for "static validation"-type tasks, +such as checking whether expressions refer to instances of other configuration +objects where the configuration object itself doesn't even exist, let alone +any instances of it. The goal is to perform as many checks as possible as +static checks, because that allows us to give feedback about detected problems +as early as possible (during the validation phase), and also avoids redundantly +reporting errors for these problems multiple times when there are multiple +instances of the same problematic object. + +Dynamic types are therefore responsible for everything that needs to respond +to dynamic expression evaluation, and anything which involves interacting with +external systems. For example, creating a plan for a component must be dynamic +because it involves asking providers to perform planning operations that might +contact external servers over the network, and then anything which makes use +of the results from planning is itself a dynamic operation, transitively. + +## Calls vs. Instances + +A subset of the object types in this package have an additional distinction +aside from Config vs. Dynamic. + +`StackCall`, `Component`, and `Provider` all represent dynamic instances +of objects in the configuration that can themselves produce dynamic child +objects. `StackCallInstance`, `ComponentInstance`, and `ProviderInstance` +represent those specific instances. + +What all of these types have in common is that the configuration constructs +they represent each support a `for_each` argument for dynamically declaring +zero or more instances of the object. 
+ +The breakdown of responsibilities for this process has three parts. We'll +use components for the sake of example here, but the same breakdown applies +to stack calls and provider configurations too: + +* `ComponentConfig` represents the actual `component` block in the configuration, + and is responsible for verifying that the component declaration is even valid + regardless of any dynamic information. +* `Component` represents a dynamic instance of one of those `component` blocks, + in the context of a particular stack. This deals with the situation where + a component is itself inside a child stack that was called using a `stack` + block which had `for_each` set, and therefore there are multiple instances + of this component block even before we deal with the component block's _own_ + `for_each` argument. + + The `Component` type is responsible for evaluating the `for_each` expression. + + The `Component` type is also responsible for producing the value that + would be placed in scope to handle a reference like `component.foo`, + which it does by collecting up the results from each instance implied + by the `for_each` expression and returning them as a mapping. +* `ComponentInstance` represents just one of the instances produced by the + `component` block's own `for_each` expression. + + This type is therefore responsible for evaluating any of the arguments + that are permitted to refer to `each.key` or `each.value` and could + therefore vary between instances. It's also responsible for the main + dynamic behavior of components, which is creating plans, applying them, + and reporting their results. + +## Object Singletons + +Almost everything reachable from a `Main` object must be treated as a singleton, +because these objects contain the tracking information for asynchronous +work in progress and the results of previously-completed asynchronous work. 
+ +The guarantee of ensuring that each object is indeed treated as a singleton +is the responsibility of some other object which we consider the child to be +contained within. + +For example, the `Main` object itself is responsible for instantiating the +`Stack` object representing the main stack (aka the "root" stack) and then +remembering it so it can return the same object on future requests. However, +any child stacks are tracked inside the state of the root stack object, +and so the root stack is responsible for ensuring the uniqueness of those +across multiple calls. This continues down the tree, with every object +except the `Main` object being the responsibility of exactly one managing +parent. + +Failing to preserve this guarantee would cause duplicate work and potentially +inconsistent results, assuming that the work in question does not behave as a +pure function. To help future maintainers preserve the guarantee, there is +a convention that new instances of all of the model types in this package +are produced using an unexported function, such as `newStack`, and that +each of those functions must be called only from one other place within +the managing parent of each type. + +(Stacks themselves are a slight exception to this rule because the managing +parent of the main stack is `Main` while the managing parent of all other +stacks is the parent `Stack`. There must therefore be two callsites for +`newStack`, but they are written in such a way as to avoid trampling on each +other's responsibilities.) + +The actual singleton objects are retained in an unexported map inside the +managing parent. They are typically created only on first request from +some other caller, via a method of the managing parent. The resulting new object +is then saved in the map to be returned on future calls. + +Instances of the `...Config` types should typically be singleton per +`Main` object, because they are static by definition. 
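The managing-parent convention described above can be illustrated with a minimal Go sketch. The `stack` type, its `children` map, and the `childStack` method are hypothetical names invented for illustration; this is not the package's real code, only the shape of the pattern: an unexported constructor, an unexported map, and create-on-first-request.

```go
package main

import (
	"fmt"
	"sync"
)

// stack is a hypothetical stand-in for a model type such as Stack.
type stack struct {
	name string

	mu       sync.Mutex
	children map[string]*stack // singletons owned by this managing parent
}

// newStack is unexported. By convention it is called only from the
// managing parent: childStack for child stacks, and main (standing in
// for the Main object) for the root stack.
func newStack(name string) *stack {
	return &stack{name: name, children: make(map[string]*stack)}
}

// childStack returns the single object for the named child, creating it
// on first request and remembering it for all later requests.
func (s *stack) childStack(name string) *stack {
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.children[name]; ok {
		return existing
	}
	child := newStack(s.name + "." + name)
	s.children[name] = child
	return child
}

func main() {
	root := newStack("root")
	a := root.childStack("networking")
	b := root.childStack("networking")
	fmt.Println(a == b) // both requests yield the same pointer
}
```

Keeping the map and the constructor call together in one method means there is exactly one place to audit when checking that no second instance can be created.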
+ +Instances of dynamic types are actually only singleton per _evaluation phase_, +since e.g. the behavior of a `ComponentInstance` is different when we're trying +to create a plan than when we are trying to apply a plan previously created. +More on that in the next section. + +## Evaluation Phases + +Each `Main` object is typically instantiated for only one evaluation phase, +which from the external caller's perspective is controlled by which of the +factory functions they call. + +Internally we track evaluation phases as instances of `EvalPhase`, which +is a comparable type that we use internally to differentiate between the +singletons created for one phase and the singletons created for another. + +Since currently each `Main` has only one evaluation phase, this is actually +technically redundant: a `Main` instantiated for planning would produce +only objects for the `PlanPhase` phase. + +However, the implementation nonetheless tracks a separate pool of singletons +per phase and requires any operation that performs expression evaluation to +explicitly say which evaluation phase it's for, as some insurance both against +bugs that might otherwise be quite hard to track down and against possible +future needs that might call for us needing to blend work for multiple phases +into the same `Main` object for some reason. + +* `NewForValidating` returns a `Main` for `ValidatePhase`, which is capable + only of static validation and will fail any dynamic evaluation work. +* `NewForPlanning` returns a `Main` for `PlanPhase`, bound to a particular + prior state and planning options. +* `NewForApplying` returns a `Main` for `ApplyPhase`, bound to a particular + stack plan, which itself includes the usual stuff like the prior state, + the planned changes, input variable values that were specified during + planning, etc. 
+* `NewForInspecting` returns a `Main` for `InspectPhase`, which is a special + phase that is intended for implementing less-commonly-used utilities such + as something equivalent to `terraform console` but for Stacks. In this + case, the evaluator is bound only to a prior state, and just returns values + directly from that state without trying to plan or apply any changes. + + This phase is also handy for unit testing of parts of the runtime that + don't rely on external side-effects; many of the unit tests in this + package do their work in `InspectPhase`, particularly if testing an + object whose direct behavior does not vary based on the evaluation + phase. It's still important to test in other phases for operations whose + behavior varies by phase, of course! + +## Expression Evaluation + +The most important cross-cutting behavior in the language runtime is the +evaluation of user-provided expressions. The main function for that is +`EvalExpr`, but there's also `EvalBody` for evaluating all of the expressions +in a dynamic body at once, and extensions such as `EvalExprAndEvalContext` +which also returns some of the information that was used during evaluation +so that callers can produce more helpful diagnostic messages. + +The actual evaluation process involves two important concepts: + +- `EvaluationScope` is an interface implemented by objects that can have + expressions evaluated inside them. Each `Stack` is effectively a + "global scope", and then some child objects like `Component`, `StackCall`, + and `Provider` act as _child_ scopes which extend the global scope with + local context like `each.key`, `each.value`, and `self`. + + An evaluation scope's responsibility is to translate a `stackaddrs.Reference` + (a representation of an already-decoded reference expression) into an object + that implements `Referenceable`. + +- `Referenceable` is an interface implemented by objects that can be referred + to in expressions. 
For example, a reference expression like `var.foo` + should refer to an `InputVariable` object, and so `InputVariable` implements + `Referenceable` to decide the actual value to use for that reference. + + The responsibility of an implementation of this interface is simply to + return a `cty.Value` to insert into the expression scope for a particular + `EvalPhase`. For example, a `Component` object implements this interface + by returning an object containing all of the output values from the + component's plan when asked for `PlanPhase`, but returns the output values + from the final state instead when asked for `ApplyPhase`. + +Overall then, the expression evaluation process has the following main steps: + +1. Analyze the expression or collection of expressions to find all of the + HCL symbol references (`hcl.Traversal` values). +2. Use `stackaddrs.ParseReference` to try to raise the reference into one of + the higher-level address types, wrapped in a `stackaddrs.Reference`. + + We fail at this step for syntactically-invalid references, but this step + has no access to the dynamic symbol table so it cannot catch references to + objects that don't exist. +3. Pass the `stackaddrs.Reference` value to the caller's selected + `EvaluationScope` implementation, which checks whether the address refers + to an object that's actually declared, and if so returns that object. + This uses `EvaluationScope.ResolveExpressionReference`. + + This step fails if the reference is syntactically valid but refers to + something that isn't actually declared. + + Objects that expressions can refer to must implement `Referenceable`. +4. Call `ExprReferenceValue` on each of the collected `Referenceable` objects, + passing the caller's `EvalPhase`. + + That method must then return a `cty.Value`. 
If something has gone wrong + upstream that prevents returning a concrete value, the method should return + some kind of unknown value -- ideally with a type constraint, or + `cty.DynamicVal` as a last resort -- so that evaluation can continue + downstream just enough to let the call stacks all unwind and collect + all the error diagnostics up at the top. +5. Assemble all of the collected values into a suitably-shaped `hcl.EvalContext`, + attach the usual repertoire of available functions, and finally ask the + original expression to evaluate itself in that evaluation context. + + Failures can occur here if the expression itself is invalid in some way, + such as trying to add together values that cannot convert to number, or + other similar kinds of type/value expectation mismatch. + +## Checked vs. Unchecked Results + +Data flow between objects in a particular evaluator happens mostly on request. + +For example, if a `component` block contains a reference to `var.foo` then +as part of evaluating that expression the `Component` or `ComponentInstance` +object will (indirectly, through the expression evaluator) ask the +`InputVariable` object for `variable "foo"` to produce its value, and only +at that point will the `InputVariable` object begin the work of evaluating +that value, which could involve evaluating yet another expression, and so on. + +Because the flow of requests between objects is dynamic, and because many +different requesters can potentially ask for the same result via different +call paths, if an error or warning diagnostic is returned we need to make sure +_that_ propagates by only one return path to avoid returning the same +diagnostic message multiple times. + +To deal with that problem, operations that can return diagnostics are typically +split into two methods. One of them has a `Check` prefix, indicating that +it is responsible for propagating any diagnostics, and the other lacks the +prefix. 
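As a rough illustration of that split, here is a small Go sketch. The `setting` type and its `Result`/`CheckResult` methods are hypothetical, and the sketch substitutes `int` and `[]error` for the real value and diagnostics types purely to stay self-contained; the zero value plays the role of the inert placeholder.

```go
package main

import (
	"fmt"
	"strconv"
)

// setting is a hypothetical object with one fallible operation.
type setting struct {
	raw string
}

// CheckResult is the "checked" variant: it returns the result together
// with any diagnostics. Only one codepath should call this.
func (s *setting) CheckResult() (int, []error) {
	n, err := strconv.Atoi(s.raw)
	if err != nil {
		// Return an inert placeholder alongside the diagnostic so that
		// dependents can keep going without raising errors of their own.
		return 0, []error{fmt.Errorf("invalid setting %q: %w", s.raw, err)}
	}
	return n, nil
}

// Result is the "unchecked" variant: it wraps CheckResult and discards
// the diagnostics, so callers may receive a placeholder.
func (s *setting) Result() int {
	n, _ := s.CheckResult()
	return n
}

func main() {
	s := &setting{raw: "not-a-number"}

	// Two dependents use the unchecked variant and report nothing.
	fmt.Println(s.Result(), s.Result())

	// Exactly one caller uses the checked variant and reports problems.
	_, diags := s.CheckResult()
	for _, d := range diags {
		fmt.Println("error:", d)
	}
}
```

However many dependents ask for the result, the diagnostic surfaces only through the single checked call.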
+ +For example, `InputVariable` has both `Value` and `CheckValue`. The latter +returns `(cty.Value, tfdiags.Diagnostics)`, while the former just wraps the +latter and discards the diagnostics completely. + +This strategy assumes two important invariants: +- Every fallible operation can produce some kind of inert placeholder result + when it fails, which we can use to unwind everything else that's depending + on the result without producing any new errors. (or, in some cases, producing + a minimal amount of additional errors that each add more information than + the original one did, as a last resort when the ideal isn't possible). +- Only one codepath is responsible for calling the `Check...` variant of the + function, and everything else will use the unprefixed version and just + deal with getting a placeholder result sometimes. + +This is quite different than how we've dealt with diagnostics in other parts +of Terraform, and does unfortunately require some additional care under future +maintenance to preserve those invariants, but following the naming convention +across all of the object types will hopefully make these special rules easier +to learn and then maintain under future changes. + +In practice, the one codepath that calls the `Check...` variants is the +"walk" codepath, which is discussed in the next section. + +## Static and Dynamic "Walks" + +As discussed in the previous section, most results in the stacks runtime +are produced only when requested. That means that if no other object in +the configuration were to include an expression referring to `var.foo`, +it might never get any opportunity to evaluate itself and raise any errors +in its declaration or definition. 
+ +To make sure that every relevant object gets visited at least once, each of +the main evaluation phases (not `InspectPhase`) has at least one "walk" +associated with it, which navigates the entire tree of relevant objects +accessible from the `Main` object and calls a phase-specific method on +each one. + +There are two "walk drivers" that arrange for traversing different subsets +of the objects: +- The "static" walk is used for both `ValidatePhase` and `PlanPhase`, and + visits only the objects of `Config`-suffixed types, representing static + configuration objects. +- The "dynamic" walk is used for both `PlanPhase` and `ApplyPhase`, and + visits both the main dynamic objects (the ones of types with no special + suffix) and the objects of `Instance`-suffixed types that represent + dynamic instances of each configuration object. + +The "walk driver" decides which objects need to be visited, calling a callback +function for each object. Each phase calls a different method of each visited +object in its callback: +- `ValidatePhase` calls the `Validate` method of interface `Validatable`, + which is only allowed to return diagnostics and should not have any + externally-visible side-effects. +- `PlanPhase` calls the `PlanChanges` method of interface `Plannable`, + which can return an arbitrary number of "planned change" objects that + should be returned to the caller to contribute to the plan, and an arbitrary + number of diagnostics. +- `ApplyPhase` calls the `CheckApply` method of interface `ApplyChecker`, + which is responsible for collecting the results of apply actions that are + actually scheduled elsewhere, since the runtime wants a little more control + over the execution of the side-effect heavy apply actions. This returns an + arbitrary number of "applied change" objects that each represents a + mutation of the state, and an arbitrary number of diagnostics. 
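The division of labor between a walk driver and a phase-specific callback might look roughly like the following Go sketch. The `configObject` type and the `walk` and `validateAll` functions are hypothetical, the method signature is simplified to return plain strings, and the traversal is sequential here for brevity, which is an assumption of the sketch rather than a statement about the real driver.

```go
package main

import "fmt"

// configObject is a hypothetical stand-in for one object in the tree
// reachable from Main.
type configObject struct {
	name     string
	valid    bool
	children []*configObject
}

// Validate returns diagnostics only and has no side effects.
func (o *configObject) Validate() []string {
	if o.valid {
		return nil
	}
	return []string{o.name + ": invalid declaration"}
}

// walk plays the role of the walk driver: it decides which objects get
// visited and invokes the callback once for each of them.
func walk(root *configObject, visit func(*configObject)) {
	visit(root)
	for _, child := range root.children {
		walk(child, visit)
	}
}

// validateAll supplies the phase-specific callback, which chooses the
// method to call on each visited object and collects its diagnostics.
func validateAll(root *configObject) []string {
	var diags []string
	walk(root, func(o *configObject) {
		diags = append(diags, o.Validate()...)
	})
	return diags
}

func main() {
	root := &configObject{name: "root", valid: true, children: []*configObject{
		{name: "var.foo", valid: false}, // never referenced, still visited
		{name: "component.a", valid: true},
	}}
	for _, d := range validateAll(root) {
		fmt.Println(d)
	}
}
```

Because the driver visits every object regardless of whether anything refers to it, an unreferenced declaration still gets a chance to report its own problems.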
+ +Those who are familiar with Terraform's modules runtime might find this +"walk" idea roughly analogous to the process of building a graph and then +walking it concurrently while preserving dependencies. The stack runtime +walks are different in that they are instead walking the _tree_ of objects +accessible from `Main`, and they don't need to be concerned about ordering +because the dynamic data flow between the different objects -- where a method +of one object can block on the completion of a method of another -- causes a +suitable evaluation order automatically. + +The scheduling here is dynamic and emerges automatically from the control +flow. The runtime achieves this by having any operation that depends on +expensive or side-effect-ish work from another object pass the data using +the promises and tasks model implemented by +[package `promising`](../../../../promising/README.md).
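To give a feel for how an evaluation order can emerge from blocking data flow rather than from an explicit graph, here is a minimal Go sketch. It deliberately does not use or imitate the API of package `promising`; it substitutes the standard library's `sync.Once`, and the `lazy` type and `demo` function are hypothetical. It also makes no attempt to detect dependency cycles.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazy is a hypothetical result that is computed on first request.
// Concurrent requesters block until that first computation finishes.
type lazy struct {
	once    sync.Once
	compute func() int
	val     int
}

func (l *lazy) get() int {
	l.once.Do(func() { l.val = l.compute() })
	return l.val
}

// demo requests two results concurrently. Both depend on a shared base
// result, which is computed once, before either dependent can finish,
// without any explicit ordering being declared.
func demo() (sum int, baseCalls int32) {
	var calls int32
	base := &lazy{compute: func() int {
		atomic.AddInt32(&calls, 1)
		return 10
	}}
	left := &lazy{compute: func() int { return base.get() + 1 }}
	right := &lazy{compute: func() int { return base.get() + 2 }}

	results := make([]int, 2)
	var wg sync.WaitGroup
	for i, obj := range []*lazy{left, right} {
		wg.Add(1)
		go func(i int, obj *lazy) {
			defer wg.Done()
			results[i] = obj.get()
		}(i, obj)
	}
	wg.Wait()
	return results[0] + results[1], atomic.LoadInt32(&calls)
}

func main() {
	sum, calls := demo()
	fmt.Println(sum, calls) // prints "23 1"
}
```

The walk can therefore start every object in any order: whichever goroutine reaches the shared dependency first does the work, and the others wait for its result.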