---
authors: Forrest Marshall (forrest@goteleport.com)
state: draft
---

# RFD 90 - Upgrade System

## Required Approvers

* Engineering: @klizhentas && (@zmb3 || @rosstimothy || @espadolini)
* Product: (@klizhentas || @xinding33)

## What

System for automatic upgrades of teleport installations.

## Why

Teleport must be periodically updated in order to integrate security patches. Regular updates also ensure that users can take advantage of improvements in stability and performance.

Outdated teleport installations impose additional burdens on us and on our users. Teleport does not currently assist with upgrades in any way, and the burden of manual upgrades can be prohibitive. Reducing the friction of upgrades is beneficial both in terms of security and user experience. Doing this may also indirectly lower our own support load.

Upgrades may be particularly beneficial for deployments where instances may run on infrastructure that is not directly controlled by cluster administrators (teleport cloud being a prime example).

## Intro

### Suggested Reading

While not required, it is helpful to have some familiarity with [The Update Framework](https://theupdateframework.com/) when reading this RFD. TUF is a flexible framework for securing upgrade systems. It provides a robust framework for key rotation, censorship detection, package validation, and much more.

### High-Level Goals

1. Maintain or improve the security of teleport installations by keeping them better updated and potentially providing more secure paths to upgrade.
2. Improve the experience of teleport upgrade/administration by reducing the need for manual intervention.
3. Improve the auditability of teleport clusters by providing insight into, and policy enforcement for, the versioning of teleport installations.
4. Support a wide range of uses by providing a flexible and extensible set of tools with support for things like user-provided upgrade scripts and custom target selection.
5. Provide options for a wide range of deployment contexts (bare-metal, k8s, etc).
6. Offer a simple "batteries included" automatic upgrade option that requires minimal configuration and "just works" for most non-containerized environments.

### Abstract Model Overview

This document proposes a modular system capable of supporting a wide range of upgrade strategies, with the intention being that the default or "batteries included" upgrade strategy will be implemented primarily as a set of interchangeable components which can be swapped out and/or extended.

The proposed system consists of at least the following components:

- `version-directive`: A static resource that describes the desired state of versioning across the cluster. The directive includes matchers which allow the auth server to match individual teleport instances with both the appropriate installation target and the appropriate installation method. This resource may be periodically generated by teleport or by some custom external program. It may also be manually created by an administrator. See the [version directives](#version-directives) section for details.

- `version-controller`: An optional/pluggable component responsible for generating the `version-directive` based on some dynamic state (e.g. a server which publishes package versions and hashes). A builtin `version-controller` would simply be a configuration resource from the user's perspective. A custom/external version controller would be any program with the permissions necessary to update the `version-directive` resource.
  See the [version controllers](#version-controllers) section for details.

- *version reconciliation loop*: A control loop that runs in the auth server which compares the desired state as specified by the `version-directive` with the current state of teleport installations across the cluster. When mismatches are discovered, the appropriate `installer`s are run. See the [Version Reconciliation Loop](#version-reconciliation-loop) section for details.

- `installer`: A component capable of attempting to effect installation of a specific target on one or more teleport hosts. The auth server needs to know enough to at least start the process, but the core logic of a given installer may be an external command (e.g. the proposed `local-script` installer would cause each teleport instance in need of upgrade to run a user-supplied script locally and then restart). From the perspective of a user, an installer is a teleport configuration object (though that configuration object may only be a thin hook into the "real" `installer`). Whether or not the teleport instance being upgraded needs to understand the installer will vary depending on type. See the [Installers](#installers) section for details.

There is room in the above model for a lot more granularity, but it gives us a good framework for reasoning about how state is handled within the system. Version controllers generate version directives describing what releases should be running where and how to install them. The version reconciliation loop reconciles desired state with actual state, and invokes installers as needed. Installers attempt to effect installation of the targets they are given.

### Implementation Phases

Implementation will be divided into a series of phases consisting of 1 or more separate PRs/releases each:

- Setup: Changes required for the inventory status/control model, but not necessarily specific to the upgrade system.
- Notification-Only System: Optional phase intended to deliver value sooner at the cost of overall feature development time.
- Script-Based Installs MVP: Early MVP supporting only manual control and simple script-based installers for non-auth instances.
- TUF-Based System MVP: Fully functional but minimalistic upgrade system based on TUF (still excludes auth instances).
- Stability & Polish: Additional necessary features (including auth installs and local rollbacks). Represents the point at which the core upgrade system can be considered "complete".
- Extended Feature Set: A collection of nice-to-haves that we're pretty sure folks are going to want.

See the [implementation plan](#implementation-plan) section for a detailed breakdown of the required elements for each phase.

## Details

### Usage Scenarios

Hypothetical usage scenarios that we would like to be able to support.

#### Notification-Only Usecase

Cluster administrators may or may not be using teleport's own install mechanisms, but they want teleport to be able to inform them when instances are outdated, and possibly generate alerts if teleport is running a deprecated version and/or there is a newer security patch available.

In this case we want options for both displaying the suggested version (or just a "needs upgrade" badge) on inventory views, and also probably some kind of cluster-wide alert that can be made difficult to ignore (e.g. a message on login or a banner in the web UI). We also probably want an API that will support plugins that can emit alerts to external locations (e.g. a slack channel).
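
As a rough sketch of how small that plugin surface could be, consider the following. Every name here (`VersionAlert`, `AlertSource`, etc.) is hypothetical and does not correspond to an existing teleport API; the point is only that a notification plugin needs little more than "list the active version alerts and forward them somewhere":

```go
package notifier

import "context"

// VersionAlert is a hypothetical representation of a cluster versioning alert
// (e.g. "security patch available" or "version reached end of life").
type VersionAlert struct {
	Severity string // e.g. "sec", "eol"
	Message  string
	Count    int // number of affected instances
}

// AlertSource is the minimal surface such a plugin would need from the
// cluster: a way to list currently active version alerts.
type AlertSource interface {
	GetVersionAlerts(ctx context.Context) ([]VersionAlert, error)
}

// forwardAlerts fetches the active alerts and hands each one to a sink
// (e.g. a function that posts to a slack webhook).
func forwardAlerts(ctx context.Context, src AlertSource, notify func(VersionAlert) error) error {
	alerts, err := src.GetVersionAlerts(ctx)
	if err != nil {
		return err
	}
	for _, alert := range alerts {
		if err := notify(alert); err != nil {
			return err
		}
	}
	return nil
}
```
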
In this usecase teleport is serving up recommendations based on external state, so a client capable of discovery (e.g. the `tuf` version controller) is required, but the actual ability to effect installations may not be necessary.

#### Minimal Installs Usecase

Cluster administrators manually specify the exact versions/targets for the cluster, and have a specific in-house installation script that should be used for upgrades. The teleport cluster may not even have any access to the public internet. The installation process is essentially a black box from teleport's perspective. The target may even be an internally built fork of teleport.

In this case, we want to provide the means to specify the target version and desired script. Teleport should then be able to detect when an instance is not running the target that it ought to, and invoke the install script. Teleport should not care about the internals of how the script plans to perform the installation. Instances that require upgrades run the script, and may be required to perform a graceful restart if the script succeeds. The script may expect inputs (e.g. version string), and there may be different scripts to run depending on the nature of the specific node (e.g. `prod-upgrade`/`staging-upgrade`, or `apt-upgrade`/`yum-upgrade`), but things like selecting the correct processor architecture or OS compatibility are likely handled by the script.

In this minimal usecase teleport's role is primarily that of a coordinator. It detects when and where user-provided scripts should be run, and invokes them. All integrity checks are the responsibility of the user-provided script, or the underlying install mechanism that it invokes.

#### Automatic Installs Usecase

Cluster administrators opt into a mostly "set and forget" upgrade policy which keeps their teleport cluster up to date automatically. They may wish to stay at a specific major version, but would like to have patches and minor backwards-compatible improvements come in automatically. They want features like maintenance schedules to prevent a node from being upgraded when they need it available, and automatic rollbacks where nodes revert to their previous installation version if they are unhealthy for too long. They also want teleport to be able to upgrade itself without dependencies.

This usecase requires the same coordination powers as the minimal usecase, but also a lot more. Teleport needs to be able to securely and reliably detect new releases when they become available. Teleport needs to be able to evaluate new releases in the context of flexible upgrade policies and select which releases (and which targets within those releases) are appropriate and when they should be installed. Teleport needs to be able to download and verify installation packages, upgrade itself, and monitor the health of newly installed versions.

In this maximal usecase, teleport is responsible for discovery, selection, coordination, validation, and monitoring. Most importantly, teleport must do all of this in a secure and reliable manner. The potential fallout from bugs and vulnerabilities is greater than in the minimal usecase.

#### Plan/Apply Usecase

Cluster administrators want automatic discovery of new versions and the ability to trigger automatic installs, but they want manual control over when installs happen. They may also wish for additional controls/checks such as multiparty approval for upgrades and/or the ability to perform dry runs that attempt to detect potential problems early.

The core `plan`/`apply` usecase is mostly the same as automatic installs (minus the automatic part), but the more advanced workflows require additional features. Multiparty approval and dry runs both necessitate a concept of "pending" version directives, and dry runs require that all installers expose a dry run mode of some kind.

#### Hacker Usecase

Cluster administrators march to the beat of their own drum. They want to know the latest publicly available teleport releases, but skip all prime number patches. Nodes can only be upgraded if it is low tide in their region, the moon is waxing, and the ISS is on the other side of the planet. They want to use teleport's native download and verification logic, but they also need to start the downloaded binary in a sandbox first to ensure it won't trigger their server's self-destruct. If rollback is necessary, the rollback reason and timestamp need to be steganographically encoded into a picture of a turtle and posted to instagram.

This usecase has essentially the same requirements as the automatic installs usecase, with one addition: it necessitates *loose coupling* of components.

### Security

Due to the pluggable nature of the system proposed here, it is difficult to make *general* statements about the security model. This is because most of the responsibilities of an upgrade system (package validation, censorship resistance, malicious downgrade resistance, etc) are responsibilities that fall to the pluggable components. That being said, we can lay down some specific principles:

- Version controllers should have some form of censorship detection (e.g. the TUF controller verifies that the package metadata it downloads has been recently re-signed by a hot key to prove liveness). Teleport will provide a `stale_after` field for version directives so that failure to gather new state is warned about, but additional warnings generated by the controller itself are encouraged.

- Installers must fail if they are not provided with sufficient target information to ensure that the acquired package matches the target (e.g. if installation is delegated to an external package manager that is deemed trusted this might be as simple as being explicit about the version, but in the case of the TUF installer this means rejecting target specifications that don't include all required TUF metadata).

- We should encourage decentralized trust. The TUF-based system should leverage TUF's multisignature support to ensure that compromise of a single key cannot compromise installations. We should also provide tools to help those using custom installation mechanisms to avoid single-point failures as well (e.g. multiparty approval for pending `version-directive`s), and the ability to cross-validate their sources with the TUF validation metadata pulled from our repos.

- Teleport should have checks for invalid state-transitions independently of any specific controller.

#### TUF Security

We won't be reiterating all of the attack vectors that (correct) usage of TUF is intended to protect against. I suggest at least reading the [attacks](https://theupdateframework.github.io/specification/v1.0.28/index.html#goals-to-protect-against-specific-attacks) section of the specification. Instead we will zoom in on how we intend to use TUF for our purposes.

TUF provides a very good mechanism for securely getting detailed package metadata distributed to clients, including sufficient information to verify downloaded packages, and to ensure that censorship and tampering can be detected.
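
To make the "verify downloaded packages" part concrete: whatever the distribution channel, an installer consuming this metadata should refuse any artifact whose length or digest does not match the signed target. A minimal sketch of that check follows; names like `Target` and `VerifyArtifact` are hypothetical, not an existing teleport API:

```go
package installer

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Target is a hypothetical subset of the metadata an installer receives for a
// single artifact (e.g. as distributed via signed TUF targets metadata).
type Target struct {
	Version   string
	Length    int64
	SHA256Hex string
}

// VerifyArtifact rejects a downloaded artifact unless both its length and
// SHA-256 digest match the target metadata. Installers fail closed if the
// required fields are missing.
func VerifyArtifact(artifact []byte, t Target) error {
	if t.Length == 0 || t.SHA256Hex == "" {
		return fmt.Errorf("target %q is missing length/hash metadata; refusing to install", t.Version)
	}
	if int64(len(artifact)) != t.Length {
		return fmt.Errorf("artifact length %d does not match target metadata %d", len(artifact), t.Length)
	}
	want, err := hex.DecodeString(t.SHA256Hex)
	if err != nil {
		return fmt.Errorf("malformed sha256 in target metadata: %v", err)
	}
	sum := sha256.Sum256(artifact)
	if !bytes.Equal(sum[:], want) {
		return fmt.Errorf("artifact sha256 mismatch for version %s", t.Version)
	}
	return nil
}
```

Failing closed when the metadata is incomplete lines up with the installer requirements listed in the security principles above.
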
The trick to making sure a TUF-based system really lives up to the promise of the framework is to have a good model for how the TUF metadata is generated and signed in the first place. This is where we come to the heart of our specific security model.

We will leverage TUF's thresholded signature system and Go's ability to produce deterministic builds in order to establish isolated cross-checks that can independently produce the same TUF metadata for a given release. At a minimum, we will have two separate signers:

- Build Signer: Our existing build infrastructure will be extended to generate and sign TUF metadata for all release artifacts (or at least the subset that can be built deterministically).

- Verification Signer: A separate environment isolated from the main build system will independently build all deterministic artifacts. All metadata will be independently generated and signed by this system.

With this dual system in place, we can ensure that compromised build infrastructure cannot compromise the upgrade system (and be able to detect compromises essentially immediately). If we can manage to fully isolate the two environments such that no teleport team member has access to both environments, we should be able to secure the upgrade system from any single compromise short of a direct compromise of our public repositories.

All of the above presumes that no exploits are found in TUF itself, or its official Go library, such that TUF's core checks (multisignature verification, package/metadata validation, etc) could be directly or indirectly circumvented. The TUF spec has been audited multiple times, but the most recent audit as of the time of writing was performed in 2018 and did not cover the Go implementation specifically. In order to further mitigate potential TUF-related issues, we will wrap all download and TUF metadata retrieval operations in our own custom API with required TLS authentication. TUF metadata will be used only as an additional verification check, and will not be used to discover the identity from which a package should be downloaded (i.e. malicious TUF metadata won't be able to change _where_ we download a package from). The intent here will be to ensure that a vulnerability in TUF itself cannot be exploited without also compromising the TLS client and/or our own servers directly. This means we won't be taking advantage of TUF's ability to support unauthenticated mirrors, but since we have no immediate plans to support that feature anyhow, adding this further layer of security has no meaningful downside.

### Inventory Control Model

- Auth servers exert direct control over non-auth instance upgrades via a bidirectional GRPC control stream.
- Non-auth instances advertise detailed information about the current installation, and implement handlers for control stream messages that can execute whatever local component is required for running a given install method (e.g. executing a specific script if the `local-script` installer is in use).
- Each control stream is registered with a single auth server, so each auth server is responsible for triggering the upgrade of a subset of the server inventory. In order to reduce thundering herd effects, upgrades will be rolling with some reasonable default rate.
- Upgrade decisions are level-based. Remote downgrades and retries are an emergent property of a level-based system, and won't be given special treatment.
- The auth server may skip a directive that it recognizes as resulting in an incompatible change in version (e.g. skipping a full major version).
- By default, semver pre-release installations are not upgraded (e.g. `1.2.3-debug.2`).
- In order to avoid nearly doubling the amount of backend writes for existing large clusters (most of whose instances run only the ssh service), the existing "node" resource (which would be more accurately described as the `ssh_server` resource) will be repurposed to represent a server installation which may or may not be running an ssh service. Whether or not other services would also benefit from unification in this way can be evaluated on a case-by-case basis down the road.
- In order to support having a single control stream per teleport instance (rather than separate control streams for each service) we will need to refactor how instance certs are provisioned. Currently, separate certs are granted for each service running on an instance, with no single certificate ever encoding all the permissions granted by the instance's join token.

Hypothetical GRPC spec:

```protobuf
// InventoryService is a subset of the AuthService (broken out for readability).
service InventoryService {
  // InventoryControlStream is a bidirectional stream that handles presence and
  // control messages for peripheral teleport installations.
  rpc InventoryControlStream(stream ClientMessage) returns (stream ServerMessage);
}

// ClientMessage is sent from the client to the server.
message ClientMessage {
  oneof Msg {
    // Hello is always the first message sent.
    ClientHello Hello = 1;
    // Heartbeat periodically updates status.
    Heartbeat Heartbeat = 2;
    // LocalScriptInstallResult notifies of the result of a local-script install attempt.
    LocalScriptInstallResult LocalScriptInstallResult = 3;
  }
}

// ServerMessage is sent from the server to the client.
message ServerMessage {
  oneof Msg {
    // Hello is always the first message sent.
    ServerHello Hello = 1;
    // LocalScriptInstall instructs the client to perform a local-script
    // upgrade operation.
    LocalScriptInstall LocalScriptInstall = 2;
  }
}

// ClientHello is the first message sent by the client and contains
// information about the client's version, identity, and claimed capabilities.
// The client's certificate is used to validate that it has *at least* the capabilities
// claimed by its hello message. Subsequent messages are evaluated by the limits
// claimed here.
message ClientHello {
  // Version is the currently running teleport version.
  string Version = 1;
  // ServerID is the unique ID of the server.
  string ServerID = 2;
  // Installers is a list of supported installers (e.g. `local-script`).
  repeated string Installers = 3;
  // ServerRoles is a list of teleport server roles (e.g. ``).
  repeated string ServerRoles = 4;
}

// Heartbeat periodically updates status.
message Heartbeat {
  // TODO
}

// ServerHello is the first message sent by the server.
message ServerHello {
  // Version is the currently running teleport version.
  string Version = 1;
}

// LocalScriptInstall instructs a teleport instance to perform a local-script
// installation.
message LocalScriptInstall {
  // Target is the install target metadata.
  map<string, string> Target = 1;
  // Env is the script env variables.
  map<string, string> Env = 2;
  // Shell is the optional shell override.
  string Shell = 3;
  // Script is the script to be run.
  string Script = 4;
}

// LocalScriptInstallResult informs the auth server of the result of a local-script
// installer running. This is a best-effort message since some local-script installers
// may restart the process as part of the installation.
message LocalScriptInstallResult {
  bool Success = 1;
  string Error = 2;
}
```
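
To illustrate how a non-auth instance might consume this stream, here is a rough client-side sketch. It assumes Go types generated from the draft spec above (names like `ServerMessage_LocalScriptInstall` follow the usual protoc-gen-go conventions but are assumptions, not a finalized API), and it reports results on a best-effort basis since the script may restart the process:

```go
package inventory

import (
	"context"
	"os"
	"os/exec"
)

// handleControlStream is a sketch of the client side of the hypothetical
// InventoryControlStream. It runs local-script installs as instructed and
// reports the outcome on a best-effort basis.
func handleControlStream(ctx context.Context, stream InventoryService_InventoryControlStreamClient) error {
	for {
		msg, err := stream.Recv()
		if err != nil {
			return err // stream closed or broken; caller is expected to re-establish it
		}
		install, ok := msg.Msg.(*ServerMessage_LocalScriptInstall)
		if !ok {
			continue // e.g. ServerHello; nothing to do in this sketch
		}
		shell := install.LocalScriptInstall.Shell
		if shell == "" {
			shell = "/bin/sh"
		}
		// Run the user-supplied script with the provided env.
		cmd := exec.CommandContext(ctx, shell, "-c", install.LocalScriptInstall.Script)
		cmd.Env = os.Environ()
		for k, v := range install.LocalScriptInstall.Env {
			cmd.Env = append(cmd.Env, k+"="+v)
		}
		result := &LocalScriptInstallResult{Success: true}
		if runErr := cmd.Run(); runErr != nil {
			result.Success = false
			result.Error = runErr.Error()
		}
		// Best-effort: the script may have already restarted this process.
		_ = stream.Send(&ClientMessage{Msg: &ClientMessage_LocalScriptInstallResult{LocalScriptInstallResult: result}})
	}
}
```
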
### Inventory Status and Visibility

We face some non-trivial constraints when trying to track the status and health of ongoing installations. These aren't problems per se, but they are important to keep in mind:

- Teleport instances are ephemeral and can be expected to disappear quite regularly, including mid-install. As such, we can't make a hard distinction between a node disappearing due to normal churn, and a node disappearing due to a critical issue with the install process.

- Backend state related to teleport instances is not persistent. A teleport instance should have its associated backend state cleaned up in a reasonable amount of time, and the auth server should handle instances for which no backend state exists gracefully.

- The flexible/modular nature of the upgrade system means that there is a very significant benefit to minimizing the complexity of a component's interface/contract. E.g. a `local-script` installer that just runs an arbitrary script is much easier for a user to deal with than one that must expose discrete download/run/finalize/rollback steps.

- Ordering in distributed systems is hard.

With the above in mind, let's look at some basic ideas for how to track installation state:

- Immediately before triggering a local install against a server, the auth server must update that server's corresponding backend resource with some basic info about the install attempt (time, installer, current version, target version, etc). The presence of this information does not guarantee that an install attempt was ever made (e.g. the auth server might have crashed after writing, but before sending).

- Auth servers will use CompareAndSwap operations when updating server resources to avoid overwriting concurrent updates from other auth servers. This is important because we don't want two auth servers to send install messages to the same instance in quick succession, and we also don't want to accidentally lose information related to install attempts.

- An instance *may*, but is not required to, send various status updates related to an install attempt after it has been triggered. As features are added to the upgrade system (e.g. local rollbacks), new messages with special meanings can be added to improve the reliability and safety of rollouts.

- Auth servers will make inferences based on the available information attached to server inventory resources to roughly divide them into the following states:

  - `VersionParity`: server advertises the correct version (or no version directive matches the server) and the server was not recently sent any install messages.

  - `NeedsInstall`: server advertises a different version than the one specified in its matching version directive, and no recent install attempts have been made.

  - `InstallTriggered`: install was triggered recently enough that it is unclear what the result is.

  - `RecentInstall`: server was recently sent a local install message, and is now advertising a version matching the target of that message. Whether recency in this case should be measured in time, number of heartbeats, or some combination of both is an open question, but it is likely that we'll need to tolerate some overlap where heartbeats advertising two different versions are interleaved. We should try to limit this possibility, but eliminating it completely is unreasonable.

  - `ChurnedDuringInstall`: server appears to have gone offline immediately before, during, or immediately after an installation.
    It is impossible to determine whether this was caused by the install attempt, but for a given environment there is some portion/rate of churn that, if exceeded, is likely significant.

  - `ImplicitInstallFault`: server is online but seems to have failed to install the new version for some reason. It's possible that the server never got the install message, or that it performed a full install and rollback but could not update its status for some reason.

  - `ExplicitInstallFault`: server is online and seems to have failed to install the new version for some reason, but has successfully emitted at least one error message. For a `local-script` installer this likely just means that the script had a non-zero exit code, but for a builtin installer we may have a failure message with sufficient information to be programmatically actionable (e.g. `Rollback` vs `DownloadFailed`).

- By aggregating the counts of servers in the above states by target, version, and installer, the auth servers can generate health metrics to assess the state of an ongoing rollout, potentially halting it if some threshold is reached (e.g. `max_churn`).

Hypothetical inventory view:

```
$ tctl inventory ls
Server ID                            Version Services    Status
------------------------------------ ------- ----------- -----------------------------------------------
eb115c75-692f-4d7d-814e-e6f9e4e94c01 v0.1.2  ssh,db      installing -> v1.2.3 (17s ago)
717249d1-9e31-4929-b113-4c64fa2d5005 v1.2.3  ssh,app     online (32s ago)
bbe161cb-a934-4df4-a9c5-78e18b599601 v0.1.2  ssh         churned during install -> v1.2.3 (6m ago)
5e6d98ef-e7ec-4a09-b3c5-4698b10acb9e v0.1.2  k8s         online, must install >= v1.2.2 (eol) (38s ago)
751b8b44-5f96-450d-b76a-50504aa47e1f v1.2.3  ssh         online (14s ago)
3e869f3f-8caa-4df3-aa5c-0a85e884a240 v1.2.3  db          offline (12m ago)
166dc9b9-fc85-44a0-96ca-f4bec069aa92 v1.2.1  k8s         online, must install >= v1.2.2 (sec) (12s ago)
f67dbc3a-2eff-42c8-87c2-747ee1eedb56 v1.2.1  proxy       online, install soon -> v1.2.3 (46s ago)
9db81c94-558a-4f2d-98f9-25e0d1ec0214 v1.2.2  k8s         online, install recommended -> v1.2.3 (20s ago)
5247f33a-1bd1-4227-8c6e-4464fee2c585 v1.2.3  auth        online (21s ago)
...

Warning: 1 instance(s) need upgrade due to newer security patch (sec).
Warning: 1 instance(s) need upgrade due to having reached end of life (eol).
```
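
The aggregation above relies on inferring these states from heartbeat data plus the recorded install attempts, rather than anything the instance reports explicitly. A rough sketch of what that inference might look like (all type and field names are hypothetical, and the recency window is deliberately left abstract):

```go
package inventory

// InstallState is the inferred status of an instance with respect to its
// matched target. The variants mirror the states described above.
type InstallState int

const (
	VersionParity InstallState = iota
	NeedsInstall
	InstallTriggered
	RecentInstall
	ChurnedDuringInstall
	ImplicitInstallFault
	ExplicitInstallFault
)

// serverRecord is a hypothetical merged view of the persistent backend
// resource and live control-stream status for one instance.
type serverRecord struct {
	Online          bool
	AdvertisedVer   string
	TargetVer       string // empty if no directive matches this server
	LastAttemptVer  string // target of the most recent recorded install attempt, if any
	AttemptRecent   bool   // the attempt falls within the "recency" window
	ReportedFailure bool   // the instance sent an explicit failure message
}

// classify infers an InstallState from the available evidence. The recency
// window (time, heartbeat count, or both) is left as an open question.
func classify(s serverRecord) InstallState {
	switch {
	case !s.Online && s.AttemptRecent:
		return ChurnedDuringInstall
	case s.TargetVer == "" || (s.AdvertisedVer == s.TargetVer && !s.AttemptRecent):
		return VersionParity
	case s.ReportedFailure:
		return ExplicitInstallFault
	case s.AttemptRecent && s.AdvertisedVer == s.LastAttemptVer:
		return RecentInstall
	case s.AttemptRecent:
		return InstallTriggered
	case s.LastAttemptVer != "":
		return ImplicitInstallFault
	default:
		return NeedsInstall
	}
}
```
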
Some kind of status summary should also exist for the version-control system as a whole. I'm still a bit uncertain about how this should be formatted and what all should be in it, but key points like the current versioning source, targets, and installers should be covered, as well as stats on recent installs/faults/churns:

```
$ tctl version-control status
Directive:
  Source:    tuf/default
  Status:    active
  Promotion: auto

Installers:
  Kind         Name        Status  Recent Installs Installing Faults Churned
  ------------ ----------- ------- --------------- ---------- ------ -------
  tuf          default     enabled 6               2          1      1
  local-script apt-install enabled 3               2          -      2

Inventory Summary:
  Current Version Target Version Count Recent Installs Installing Faults Churned
  --------------- -------------- ----- --------------- ---------- ------ -------
  v1.2.3          v2.3.4         12    -               4          1      3
  v2.3.4          -              10    9               -          -      -
  v3.4.5-beta.1   -              2     -               -          -      -
  v0.1.2          -              1     -               -          -      -

Critical Versioning Alerts:
  Version Alert                             Count
  ------- --------------------------------- -----
  v1.2.3  Security patch available (v2.3.4) 12
  v0.1.2  Version reached end of life       1
```

### Version Reconciliation Loop

The version reconciliation loop is a level-triggered control loop that is responsible for determining and applying state-transitions in order to make the current inventory versioning match the desired inventory versioning. Each auth server runs its own version reconciliation loop which manages the server control streams attached to that auth server.

The core job of the version reconciliation loop is fairly intuitive (compare desired state to actual state, and launch installers to correct the difference). To get a better idea of how it should work in practice, we need to look at the caveats that make it more complex:

- We need to use a rolling update strategy with a configurable rate, which means that not all servers eligible for installation will actually have installation triggered on a given iteration. The version directive may change mid-rollout, so simply blocking the loop on a given directive until it has been fully applied isn't reasonable.

- We need to monitor cluster-wide health of ongoing installations and pause installations if we see excess failures/churn, which means that aggregating information about failures is a key part of the reconciliation loop's job.

- We should avoid triggering installs against servers that recently made an install attempt (regardless of success/failure), and we should also avoid sending install messages to servers that just connected or are in the process of graceful shutdown. This means that a server's eligibility for installation is a combination of both persistent backend records and "live" control stream status.

Given the above, the reconciliation loop is a bit more complex, but still falls into three distinct phases:

1. Setup: Load cluster-level upgrade system configuration, the active `version-directive`, churn/fault stats, etc.

2. Reconciliation: Match servers to target and installer, and categorize them by their current install eligibility given recent install attempts, control stream status, etc.

3. Application: Determine the number of eligible servers that will actually be slated for install given the current target rollout rate, update their backend states with a summary of the install attempt that is about to be made (skipping servers which had their installation status concurrently updated), and pass them off to installer-specific logic.

As much as possible, we want the real "decision making" power to rest with the `version-controller` rather than the version reconciliation loop. That being said, the version reconciliation loop will have some internal rules that it evaluates to make sure that directives, as applied to the current server inventory, do not result in any invalid state-transitions (e.g. it will refuse to change the target arch for a given server, or skip a major version).
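
A skeleton of a single pass of that per-auth-server loop, organized around the three phases above, might look roughly like the following. All helpers (directive matching, rollout stats, CAS updates, installer dispatch) are hypothetical stand-ins for the behavior described in this section:

```go
package versioncontrol

import "context"

// reconcileOnce sketches one iteration of the version reconciliation loop.
func (a *AuthServer) reconcileOnce(ctx context.Context) error {
	// Phase 1: Setup. Load upgrade-system config, the active directive, and
	// current churn/fault stats.
	directive, err := a.getActiveVersionDirective(ctx)
	if err != nil || directive == nil {
		return err // nothing to do without a directive
	}
	stats := a.loadRolloutStats(ctx)
	if stats.ChurnRate() > directive.MaxChurn() {
		return nil // rollout paused until health recovers
	}

	// Phase 2: Reconciliation. Match locally connected instances to a
	// (target, installer) pair and filter down to install-eligible servers.
	var eligible []matchedServer
	for _, srv := range a.localControlStreams() {
		match, ok := directive.Match(srv)
		if !ok {
			continue // no compatible target/installer for this server
		}
		if srv.Advertises(match.Target.Version) || srv.RecentInstallAttempt() || srv.JustConnected() || srv.ShuttingDown() {
			continue
		}
		eligible = append(eligible, matchedServer{srv: srv, match: match})
	}

	// Phase 3: Application. Respect the rolling rate, record each attempt via
	// CompareAndSwap (skipping servers that were concurrently updated), then
	// hand off to installer-specific logic.
	for _, m := range capToRolloutRate(eligible, directive.RolloutRate()) {
		if err := a.recordInstallAttemptCAS(ctx, m); err != nil {
			continue // another auth server updated this server first
		}
		a.runInstaller(ctx, m)
	}
	return nil
}
```
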
### Version Directives

#### The Version Directive Resource

The `version-directive` resource is the heart of the upgrade system. It is a static resource that describes the current desired state of the cluster and how to get to that state. This is achieved through a series of matchers which are used to pair servers with installation targets and installers. At its core, a `version-directive` can be thought of as a function of the form `f(server) -> optional(target, installer)`.

Installation targets are arbitrary attribute mappings that must *at least* contain `version`, but may contain any additional information as well. Certain metadata is understood by teleport (e.g. `fips:yes|no`, `arch:amd64|arm64|...`), but additional metadata (e.g. `sha256sum:12345...`) is simply passed through to the installer.

The target to be used for a given server is the first target that *is not incompatible* (i.e. no attempt to find the "most compatible" target is made). A target is incompatible with a server if that server's version cannot safely upgrade/downgrade to the target version, *or* if the target specifies a build attribute that differs from a build attribute of the current installation (e.g. `fips:yes` when the current build is `fips:no`). We don't require that all build attributes are present since not all systems require knowledge of said attributes. It is the responsibility of an installer to fail if it is not provided with sufficient target attributes to perform the installation safely (e.g. the `tuf` installer would fail if the target passed to it did not contain the expected length and hash data).

The first compatible installer from the installer list will be selected. Compatibility will be determined *at least* by the version of the instance, as older instances may not support all installer types. How rich the compatibility checks should be here is an open question. I am wary of being too "smart" about it (per-installer selectors, pre-checking expected attributes, etc), as too much customization may result in configurations that are harder to review and more likely to silently misbehave.

Within the context of installation target matching, version compatibility for a given server is defined as any version within the inclusive range of `vN.0.0` through `vN+1.*`, where `N` is the current major version of the server. Stated another way, upgrades may keep the major version the same, or increment it by one major version. Downgrades may revert as far back as the earliest release of the current major version. Downgrades to an earlier major version are not supported.

All matchers in the `version-directive` resource are lists of matchers that are checked in sequence, with the first matching entry being selected. If a server matches a specific sub-directive, but no installation targets and/or installers in that sub-directive are compatible, that server has no defined `(target, installer)` tuple.
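
Expressed as code, the compatibility window above boils down to a simple major-version check. This is a sketch assuming versions are normalized with a leading `v` (as required by `golang.org/x/mod/semver`); the function name is hypothetical:

```go
package directive

import (
	"strconv"
	"strings"

	"golang.org/x/mod/semver"
)

// targetCompatible reports whether a server currently running `current` may
// be moved to `target` under the rule above: the target's major version must
// equal N or N+1, where N is the current major version. This also rules out
// downgrades below vN.0.0, since those would require an earlier major version.
func targetCompatible(current, target string) bool {
	if !semver.IsValid(current) || !semver.IsValid(target) {
		return false
	}
	// By default, pre-release installations (e.g. v1.2.3-debug.2) are left alone.
	if semver.Prerelease(current) != "" {
		return false
	}
	curMajor, err := strconv.Atoi(strings.TrimPrefix(semver.Major(current), "v"))
	if err != nil {
		return false
	}
	tgtMajor, err := strconv.Atoi(strings.TrimPrefix(semver.Major(target), "v"))
	if err != nil {
		return false
	}
	return tgtMajor == curMajor || tgtMajor == curMajor+1
}
```
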
Beyond matching installation targets to servers, the `version-directive` also supports some basic time constraints to assist in scheduling, and a `stale_after` field which will be used by teleport to determine if the directive is old enough to start emitting warnings about it (especially useful if directives are generated by external plugins which might otherwise fail silently).

Example `version-directive` resource:

```yaml
# version directive is a singleton resource that is either supplied by a user,
# or periodically generated by a version controller (e.g. tuf, plugin, etc).
# this represents the desired state of the cluster, and is used to guide a control
# loop that matches install targets to appropriate nodes and installers.
kind: version-directive
version: v1
metadata:
  name: version-directive
spec:
  nonce: 2
  status: enabled
  version_controller: static-config
  config_id:
  stale_after:
```