teleport/rfd/0074-kube-secret-storage.md
Tiago Silva 380f485987
RFD 74: Kubernetes Secret for Agent's Identity storage (#12958)
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
2022-07-29 10:29:48 +00:00

46 KiB
Raw Blame History

authors state
Tiago Silva (tiago.silva@goteleport.com) implemented

RFD 74 - Teleport Kube-Agent credential storage in Kubernetes Native Secrets

Required Approvers

  • Engineering: @r0mant
  • Product: @klizhentas || @xinding33
  • Security: @reedloden

What

Teleport Kubernetes Agent support for dynamic short-lived tokens relying only on native Kubernetes Secrets for identity storage.

Why

When a Teleport Agent wants to join a Teleport Cluster, it needs to share the invite token with the cluster for initial authentication. The invite token can be:

  • Short-lived token (the token will expire after a low TTL and cannot be reused after that time).
  • Long-lived/static token (the usage of long-lived tokens is discouraged for security reasons).

After sharing the invite token with the Teleport cluster, the agent receives its identity from the Auth service. Identity certificates are mandatory for accessing the Teleport Cluster and must be stored for accesses without reusing the invite token.

Kubernetes Pods are, by definition, expected to be stateless. This means that each time a Pod is recycled because it was restarted, deleted, upgraded, or moved to another node, the state that was written to its filesystem is lost.

One way to overcome this problem is to use Persistent Volumes. PV is a Kubernetes feature that mounts a storage volume in the container filesystem, whose lifecycle is independent of the Pod that mounts it. Kubernetes PV storage has its own drawbacks. Persistent Volumes are very distinct between different cloud vendors. Some providers lock volumes to be mounted in the same zone they were created, meaning that if Teleport agent Pod is recycled, it must be maintained in the same zone it was created. This creates operational issues and for on-premises deployments might be difficult to manage.

Another possibility is that the agent might use the invite token each time the pod is recycled, meaning that it issues a join request to the cluster every time it starts and receives a new identity from the Auth service. This means that the invite token must be a static/long-lived token, otherwise after a while the agent could not register himself in the cluster because the invite token expired. This approach is not recommended and might cause security flaws because the join token can be stolen or guessed.

One solution that might address all the issues referenced above is to allow the agent to use Kubernetes secrets as storage backend for process state. This allows not only that the agent is able to run stateless depending only on native objects generally available in any Kubernetes cluster, but also that the agent might support dynamic short-lived invite tokens with no dependency on external storage.

Details

Currently, Teleport Kubernetes Agent joins the cluster based on an invitation token generated by Teleport Auth Service. The invite token is generated by:

$ tctl nodes add --roles=kube,app,db... --ttl=10m --format=json 

Invite tokens allow agents to receive from Teleport a set of signed SSH and TLS certificate-key pairs that can be used to authenticate and register in the cluster, but they also limit the roles that the agent is allowed to use. Each token can be reused several times until it expires, allowing each agent to generate a separate Identity for each role it is running. The agent must store Identity certificates for later reuse in case the invite token expires.

The current RFD outlines the details for extending credentials storage for Teleport agents in regular Kubernetes secret objects.

Secret creation and lifecycle

The agent creates, reads and updates the Kubernetes Secrets.

The agent will never issue a delete request to destroy the secret. Helms post-delete hook achieves this once the operator executes helm uninstall.

Secret content

The agent will store multiple identity, replacement, and state in a single secret. When the agent updates the secret, it will have the following structure:

apiVersion: v1
stringData: 
            # {role} might be kube, app,... if multiple roles are defined for the agent then, 
            #multiple entries are created each one holding its identity
            /ids/{role}/current: {
                "kind": "identity",
                "version": "v2",
                "metadata": {
                    "name": "current"
                },
                "spec": {
                    "key": "{key_content}",
                    "ssh_cert": "{ssh_cert_content}",
                    "tls_cert": "{tls_cert_content}",
                    "tls_ca_certs": ["{tls_ca_certs}"],
                    "ssh_ca_certs": ["{ssh_ca_certs}"]
                }
            }
             # State is important if restart/rollback happens during the CA rotation phase.
             # it holds the current status of the rotation
            /states/{role}/state: {
                "kind": "state",
                "version": "v2",
                ...
            }
            
            # during CA rotation phase, the new keys are stored if agent is restarted or rotation has to rollback
            /ids/{role}/replacement: {
                "kind": "identity",
                "version": "v2",
                "metadata": {
                    "name": "replacement"
                },
                "spec": {
                    "key": "{key_content}",
                    "ssh_cert": "{ssh_cert_content}",
                    "tls_cert": "{tls_cert_content}",
                    "tls_ca_certs": ["{tls_ca_certs}"],
                    "ssh_ca_certs": ["{ssh_ca_certs}"]
                }
            }
    
kind: Secret
metadata:
  name: {$TELEPORT_REPLICA_NAME}-state
  namespace: {$KUBE_NAMESPACE}

Where:

  • ssh_cert is a PEM encoded SSH host cert.
  • key is a PEM encoded private key.
  • tls_cert is a PEM encoded x509 client certificate.
  • tls_ca_certs is a list of PEM encoded x509 certificate of the certificate authority of the cluster.
  • ssh_ca_certs is a list of SSH certificate authorities encoded in the authorized_keys format.
  • TELEPORT_REPLICA_NAME is the teleport agent replica name. In Kubernetes this variable exposes the POD name TELEPORT_REPLICA_NAME=metadata.name.
  • KUBE_NAMESPACE is the Kubernetes namespace in which the agent was installed. It is exposed by an ENV variable.

RBAC Changes

The Teleport agent service account must be able to create, read and update secrets within the namespace, therefore, Helm must create a new namespace role and attach it to the agent service account with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: { .Release.Name }-secrets
  namespace: { .Release.Namespace }
rules:
- apiGroups: [""]
  # objects is "secrets"
  resources: ["secrets"]
  verbs: ["create", "get", "update","watch", "list"]

The RBAC only allows the service account to read and update the secrets created under { .Release.Namespace }.

Teleport Changes

Kube Secret as storage backend

Teleport will automatically detect when it's running in Kubernetes and use Secrets to store the process state. Auth server data, audit events and session recording will be unaffected and keep using their configured backends.

The Kubernetes Secret backend storage is responsible for managing the Kubernetes secret, i.e. creating the secret if it does not exist, reading and updating it.

If the identity secret exists in Kubernetes and has the node identity on it, the storage engine will parse and return the keys to the Agent, so it can use them to authenticate in the Teleport Cluster. If the cluster access operation is successful, the agent will be available for usage, but if the access operation fails because the Teleport Auth does not validate the node credentials, the Agent will log an error providing insightful information about the failure cause.

In case of the identity secret is not present or is empty, the agent will try to join the cluster with the invite token provided. If the invite token is valid (has details in the Teleport Cluster and did not expire yet), Teleport Cluster will reply with the agent identity. Given the identity, the agent will write it in the secret for future usage.

Otherwise, if the invite token is not valid or has expired, the Agent could not join the cluster, and it will stop and log a meaningful error message.

The following diagram shows the behavior when using Kubernetes Secret backend storage.

                                                              ┌─────────┐                                        ┌────────┐          ┌──────────┐                    
                                                              │KubeAgent│                                        │Teleport│          │Kubernetes│                    
                                                              └────┬────┘                                        └───┬────┘          └────┬─────┘                    
                                                                   ────┐                                             │                    │                          
                                                                       │ init procedure                              │                    │                          
                                                                   <───┘                                             │                    │                          
                                                                   │                                                 │                    │                          
                                                                   │                          Get Secret Data        │                   ┌┴┐                         
                                                                   │────────────────────────────────────────────────────────────────────>│ │                         
                                                                   │                                                 │                   │ │                         
                                                                   │                                                 │                   │ │                         
         ╔══════╤══════════════════════════════════════════════════╪═════════════════════════════════════════════════╪═══════════════════╪═╪════════════════════════╗
         ║ ALT  │  Identity data is present in Secret              │                                                 │                   │ │                        ║
         ╟──────┘                                                  │                                                 │                   │ │                        ║
         ║                                                         │                        returns secret data      │                   │ │                        ║
         ║                                                         │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │                        ║
         ║                                                         │                                                 │                   │ │                        ║
         ║                                                         │ Joining the cluster with identity from secret  ┌┴┐                  │ │                        ║
         ║                                                         │───────────────────────────────────────────────>│ │                  │ │                        ║
         ║                                                         │                                                │ │                  │ │                        ║
         ║                                                         │                                                │ │                  │ │                        ║
         ║                   ╔══════╤══════════════════════════════╪════════════════════════════════════════════════╪═╪═════════════╗    │ │                        ║
         ║                   ║ ALT  │  successful case             │                                                │ │             ║    │ │                        ║
         ║                   ╟──────┘                              │                                                │ │             ║    │ │                        ║
         ║                   ║                                     │Node successfully authenticated and registered  │ │             ║    │ │                        ║
         ║                   ║                                     │in the cluster                                  │ │             ║    │ │                        ║
         ║                   ║                                     │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│ │             ║    │ │                        ║
         ║                   ╠═════════════════════════════════════╪════════════════════════════════════════════════╪═╪═════════════╣    │ │                        ║
         ║                   ║ [identity signed by a different Auth server]                                         │ │             ║    │ │                        ║
         ║                   ║                                     │Node identity signed by a different Auth Server │ │             ║    │ │                        ║
         ║                   ║                                     │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│ │             ║    │ │                        ║
         ║                   ║                                     │                                                │ │             ║    │ │                        ║
         ║                   ║      ╔════════════════════════════╗ ────┐                                            │ │             ║    │ │                        ║
         ║                   ║      ║unable to join the cluster ░║     │ failure state.                             │ │             ║    │ │                        ║
         ║                   ║      ║logs the error              ║ <───┘                                            │ │             ║    │ │                        ║
         ║                   ╚══════╚════════════════════════════╝═╪════════════════════════════════════════════════╪═╪═════════════╝    │ │                        ║
         ╠═════════════════════════════════════════════════════════╪═════════════════════════════════════════════════════════════════════╪═╪════════════════════════╣
         ║ [Identity data is not present in Secret]                │                                                 │                   │ │                        ║
         ║                                                         │                           returns error         │                   │ │                        ║
         ║                                                         │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │                        ║
         ║                                                         │                                                 │                   │ │                        ║
         ║                                                         │               Sends invite code                ┌┴┐                  │ │                        ║
         ║                                                         │───────────────────────────────────────────────>│ │                  │ │                        ║
         ║                                                         │                                                │ │                  │ │                        ║
         ║                                                         │                                                │ │                  │ │                        ║
         ║         ╔══════╤════════════════════════════════════════╪════════════════════════════════════════════════╪═╪══════════════════╪═╪══════════════╗         ║
         ║         ║ ALT  │  successful case                       │                                                │ │                  │ │              ║         ║
         ║         ╟──────┘                                        │                                                │ │                  │ │              ║         ║
         ║         ║                                               │             returns node identity              │ │                  │ │              ║         ║
         ║         ║                                               │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│ │                  │ │              ║         ║
         ║         ║                                               │                                                │ │                  └┬┘              ║         ║
         ║         ║                                               │                  Updates secret data with Identity                   │               ║         ║
         ║         ║                                               │─────────────────────────────────────────────────────────────────────>│               ║         ║
         ║         ║                                               │                                                │ │                   │               ║         ║
         ║         ║                                               │               joins the cluster                │ │                   │               ║         ║
         ║         ║                                               │───────────────────────────────────────────────>│ │                   │               ║         ║
         ║         ╠═══════════════════════════════════════════════╪════════════════════════════════════════════════╪═╪═══════════════════╪═══════════════╣         ║
         ║         ║ [invite code expired]                         │                                                │ │                   │               ║         ║
         ║         ║                                               │          invalid invite token error            │ │                   │               ║         ║
         ║         ║                                               │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│ │                   │               ║         ║
         ║         ║                                               │                                                │ │                   │               ║         ║
         ║         ║      ╔══════════════════════════════════════╗ ────┐                                            │ │                   │               ║         ║
         ║         ║      ║unable to join the cluster           ░║     │ failure state.                             │ │                   │               ║         ║
         ║         ║      ║because the invite might be expired.  ║ <───┘                                            │ │                   │               ║         ║
         ║         ║      ║logs the error                        ║ │                                                │ │                   │               ║         ║
         ║         ╚══════╚══════════════════════════════════════╝═╪════════════════════════════════════════════════╪═╪═══════════════════╪═══════════════╝         ║
         ╚═════════════════════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════╪═════════════════════════╝
                                                                   │                                                 │                    │                          
                                                                   │                                                 │                    │                          
                                                    ╔═══════╤══════╪═════════════════════════════════════════════════╪════════════════════╪═══════════════╗          
                                                    ║ LOOP  │  CA  rotation                                          │                    │               ║          
                                                    ╟───────┘      │                                                 │                    │               ║          
                                                    ║              │              Rotate certificates                │                    │               ║          
                                                    ║              │<───────────────────────────────────────────────>│                    │               ║          
                                                    ║              │                                                 │                    │               ║          
                                                    ║              │                New certificates                 │                    │               ║          
                                                    ║              │<─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │                    │               ║          
                                                    ║              │                                                 │                    │               ║          
                                                    ║              │                     Update secret content       │                    │               ║          
                                                    ║              │─────────────────────────────────────────────────────────────────────>│               ║          
                                                    ╚══════════════╪═════════════════════════════════════════════════╪════════════════════╪═══════════════╝          
                                                              ┌────┴────┐                                        ┌───┴────┐          ┌────┴─────┐                    
                                                              │KubeAgent│                                        │Teleport│          │Kubernetes│                    
                                                              └─────────┘                                        └────────┘          └──────────┘                    

Backend storage

Kubernetes Secret storage will be a transparent backend storage that can be plugged into ProcessStorage structure. Currently, ProcessStorage is responsible for handling the Identity and State storage. It expects a backend.Backend, but it only requires a subset of it. The idea is to define a new interface IdentityBackend with the required methods and use it only for backend storage.

// ProcessStorage is a backend for local process state,
// it helps to manage rotation for certificate authorities
// and keeps local process credentials - x509 and SSH certs and keys.
type ProcessStorage struct {
    identityStorage IdentityBackend
    Backend backend.Backend
}

// IdentityBackend implements abstraction over local or remote storage backend methods
// required for Identity/State storage.
// As in backend.Backend, Item keys are assumed to be valid UTF8, which may be enforced by the
// various Backend implementations.
type IdentityBackend interface {
	// Create creates item if it does not exist
	Create(ctx context.Context, i backend.Item) (*backend.Lease, error)
	// Put puts value into backend (creates if it does not
	// exists, updates it otherwise)
	Put(ctx context.Context, i backend.Item) (*backend.Lease, error)
	// Get returns a single item or not found error
	Get(ctx context.Context, key []byte) (*backend.Item, error)
}

During the startup procedure, the agent identifies that it is running inside Kubernetes. The identification can be achieved by checking the presence of the service account mount path /var/run/secrets/kubernetes.io. Although Kubernetes has an option that can disable this mount path, automountServiceAccountToken: false, this option is always true in our helm chart since we require it for handling the secrets via Kubernetes API and if the service account is not present, Teleport must avoid using secrets and fallback to local storage.

If the agent detects that it is running in Kubernetes, it instantiates an identity backend for Kubernetes Secret. This backend creates a client with configuration provided by restclient.InClusterConfig(), which uses the service account token mounted in the pod. With this, the agent can operate the secret by creating, updating, and reading the secret data.

// NewProcessStorage returns a new instance of the process storage.
func NewProcessStorage(ctx context.Context, path string) (*ProcessStorage, error) {
  var (
    identityStorage IdentityBackend
    err error
  )

  if path == "" {
    return nil, trace.BadParameter("missing parameter path")
  }

  litebk, err := lite.NewWithConfig(ctx, lite.Config{
    Path:      path,
    EventsOff: true,
    Sync:      lite.SyncFull,
  })

  if kubernetes.InKubeCluster() {
    identityStorage, err = kubernetes.New()
    if err != nil {
      return nil, trace.Wrap(err)
    }
    // TODO: add compatibility layer
  } else {
    identityStorage = litebk
  }

  return &ProcessStorage{Backend: litebk, identityStorage: identityStorage}, nil
}

For a compatibility layer, if the secret does not exist in Kubernetes, but locally we have the SQLite database, this means storage is enabled, and the agent had already joined the cluster in the past. Hereby, the agent can inherit the credentials stored in the database and write them in Kubernetes secret (more details).

The storage must also store the events of type kind=state since they are used during the CA rotation phase. They are helpful if the pod restarts, and has to rollback to the previous identity or finish the process and replace the old identity with the new one.

CA Rotation

CA rotation feature allows the cluster operator to force the cluster to recreate and re-issue certificates for users and agents.

The CA rotation process is independent of the agent role and storage and has the following behavior: during the procedure, agents and users receive new keys signed by the newest CA from Teleport Auth. While the cluster is transitioning, the keys signed by the old CA are considered valid in order for the agent to be able to receive its new identity certificates from Teleport Auth. During the whole process, the CA rotation can be rollbacked, and if this happens the new certificate becomes invalid. This action can happen because another agent has issues or by decision of the operator. Given this, the new identity keys and old identity keys are stored separately in the process storage until the rotation process finishes successfully.

Once the new identity is received by the agent, it will store its state, indicating it is under a rotation event, and the new identity in the storage. The state under /states/{role}/state key and the identity under /ids/{role}/replacement. This is mandatory so the agent, after restart, can resume the operation if not finished yet or rollback. If the process fails, the state is rewritten in order to inform the rotation has finished. Otherwise, if the process finishes successfully the process will replace the identity under /states/{role}/current with the new identity and replace the state content.

Switching from local to Kubernetes Secret storage will not change the behavior, it just changes the storage location and storage structure.

Limitations

High availability can only be achieved using Statefulsets and not by Deployments. This means the Helm chart has to switch to Statefulset objects even if previously was running as Deployment. This change is required because at least one invariant must be kept across restarts to correctly map each agent pod and its identity Secret. The invariant used is the Statefulset pod name {{ .Release.Name }}-{0...replicas}}.

Given this, it is required to expose the $TELEPORT_REPLICA_NAME environment variable to each pod in order to the backend storage be able to write the identity separately. The values for $TELEPORT_REPLICA_NAME are populated by Kubernetes with the pod name by setting it to use the fieldPath: metadata.name value.

If the action is an upgrade and previously the agent was running as Deployment, Helm chart will keep the Deployment (it upgrades it to the version the operator chooses) in order to maintain the cluster accessible and install, as well, the Statefulset. After the upgrade process finishes the Helm Chart will install a hook that will wait for the StatefulSet pods to be ready and force the deletion of the Deployment. Since Pods created from Deployments have random names each time a pod is recreated a new secret is generated creating a lot of garbage. In order to prevent that, when running in Deployment mode, the agent will clean the secret it created using a preStop hook.

If the agent was previously running, secrets were already created, and the operator wants to upgrade it to use a different Auth service or proxy it is required that he must wipe every secret ({$RELEASE_NAME-0|$RELEASE_NAME-1|...|$RELEASE_NAME-replicas}-state). This will prevent the agent to reuse the credentials signed by a different CA, otherwise the cluster authentication will fail.

Upgrade from PRE-RFD versions

When upgrading from PRE-RFD into versions that implement this RFD, we have two scenarios based on pre-existing configuration.

A description of each case's caveats is available below.

  1. Storage was available: Statefulset -> Statefulset

    When storage is available, it means that identity and state were previously stored in the local database in PV. This means that the agent is able to read the local file and store that information in a new Kubernetes Secret, deleting the local database once the operation was successful.

    Once the secret is stored, the agent no longer requires the PV storage since nothing is stored there. At this point the operator can upgrade the Helm Chart to have storage.enabled=false.

    A descriptive comment and deprecation must be added to the storage section detailing the upgrade process. First, it's required in order to read the agent identity from local storage, and later it can be disabled.

  2. Storage was not available: Deployment -> Statefulset

    If storage was not available, Helm chart installed the assets as a Deployment. Due to limitations, the current RFD forces the usage of Statefulset. This means that for this case, if no action is taken Helm chart will switch from Deployment to a Statefulset. Helm manages this change by destroying the Deployment object and creating the new Statefulset. During this transition, even if the operator has the PodDisruptionBudget object enabled, the Kubernetes cluster might become inaccessible from Teleport for some time because there is no guarantee that the system has at least one agent replica running. So in order to prevent the downtime, Helm chart will keep the Deployment if it already exists in the cluster, but it will also install in parallel the Statefulset. Once the upgrade finishes, Helm chart will run a post-upgrade hook that waits until the StatefulSet pods are ready and deletes the Deployment after that. The hook will run the following commands:

        $ kubectl rollout status --watch --timeout=600s statefulset/{{ .Release.Name }}
        $ kubectl delete deployment/{ .Release.Name } -n { .Release.Namespace }
    

    Regarding the invite token, since it was running as deployment, it should be using a long-lived/static token as described in the case above, and it should not create any issue when joining the cluster with that invite token if it is still valid.

Helm Chart Differences

File templates/config.yaml

{{- $logLevel := (coalesce .Values.logLevel .Values.log.level "INFO") -}}
{{- if .Values.teleportVersionOverride -}}
  {{- $_ := set . "teleportVersion" .Values.teleportVersionOverride -}}
{{- else -}}
  {{- $_ := set . "teleportVersion" .Chart.Version -}}
{{- end -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
{{- if .Values.extraLabels.config }}
  labels:
  {{- toYaml .Values.extraLabels.config | nindent 4 }}
{{- end }}
  {{- if .Values.annotations.config }}
  annotations:
    {{- toYaml .Values.annotations.config | nindent 4 }}
  {{- end }}
data:
  teleport.yaml: |
    teleport:
      auth_token: "/etc/teleport-secrets/auth-token"
      auth_servers: ["{{ required "proxyAddr is required in chart values" .Values.proxyAddr }}"]

File templates/deployment.yaml

#
# Warning to maintainers, any changes to this file that are not specific to the Deployment need to also be duplicated
# in the statefulset.yaml file.
#
-{{- if not .Values.storage.enabled }}
+{{- $deployment := lookup "v1" "Deployment" .Release.Namespace .Release.Name -}}
+{{- if ( $deployment ) }}
{{- $replicaCount := (coalesce .Values.replicaCount .Values.highAvailability.replicaCount "1") }}
...
-        {{- if .Values.extraEnv }}
-        env:
-          {{- toYaml .Values.extraEnv | nindent 8 }}
-        {{- end }}
+        env:
+          - name: TELEPORT_REPLICA_NAME
+            valueFrom:
+              fieldRef:
+                 fieldPath: metadata.name
+          - name: KUBE_NAMESPACE
+            valueFrom:
+              fieldRef:
+                 fieldPath: metadata.namespace
+         {{- if .Values.extraEnv }}
+          {{- toYaml .Values.extraEnv | nindent 8 }}
+        {{- end }}

File templates/statefulset.yaml

#
# Warning to maintainers, any changes to this file that are not specific to the StatefulSet need to also be duplicated
# in the deployment.yaml file.
#
-{{- if .Values.storage.enabled }}
{{- $replicaCount := (coalesce .Values.replicaCount .Values.highAvailability.replicaCount "1") }}
...
-        {{- if .Values.extraEnv }}
-        env:
-          {{- toYaml .Values.extraEnv | nindent 8 }}
-        {{- end }}
+        env:
+          - name: TELEPORT_REPLICA_NAME
+            valueFrom:
+              fieldRef:
+                 fieldPath: metadata.name
+          - name: KUBE_NAMESPACE
+            valueFrom:
+              fieldRef:
+                 fieldPath: metadata.namespace
+         {{- if .Values.extraEnv }}
+          {{- toYaml .Values.extraEnv | nindent 10 }}
+        {{- end }}

# remove PV storage if storage is not enabled
...
+{{- if .Values.storage.enabled }}
        - mountPath: /var/lib/teleport
          name: "{{ .Release.Name }}-teleport-data"
+{{- end }}
...
+{{- if .Values.storage.enabled }}
  volumeClaimTemplates:
  - metadata:
      name: "{{ .Release.Name }}-teleport-data"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: {{ .Values.storage.storageClassName }}
      resources:
        requests:
          storage: {{ .Values.storage.requests }}
+{{- end }}
{{- end }}

File values.yaml

Change storage comments

File post-upgrade-job.yaml

+{{- $deployment := lookup "v1" "Deployment" .Release.Namespace .Release.Name -}}
+{{- if ( $deployment ) }}
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}"
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
      - name: post-upgrade-job
        image: "alpine:3.3"
        command: ["/bin/bash","-c", "kubectl rollout status --watch --timeout=600s statefulset/{{ .Release.Name }} && kubectl delete deployment/{ .Release.Name } -n { .Release.Namespace }"]
+{{- end}}

UX

If the agent detects that it is running inside Kubernetes, it will enable, by default, the Kube secret storage instead of local storage. This means that it will not require any additional configuration or operation by the end-user.

To install the agent on Kubernetes using Helm, the command remains the same as before, as one can see below.

$ helm install/upgrade teleport-agent  teleport/teleport-kube-agent \
  --create-namespace \
  --namespace teleport-agent \
  --set roles=kube \
  --set proxyAddr=${PROXY_ENDPOINT?} \
  --set authToken=${JOIN_TOKEN?} \
  --set kubeClusterName=${KUBERNETES_CLUSTER_NAME?}

Security

The change with the greatest impact is that the agent's service account has read and write permissions for any existing secret in the namespace where it was installed. If the SA token is stolen or a hijack happens in the Teleport agent, the attacker can use it to read any secrets in that namespace. We recommend that Teleport is installed in a separate namespace to minimize this risk.

This issue happens because, currently, Kubernetes does not allow the specification of resourceNames in RBAC roles with wildcards. If supported in the future, we can limit the read/list/update actions when handling secrets for the Teleport Service account. Otherwise, another possible solution might be to keep the identity stored in a single Secret instead of a secret per pod, allowing read and write operations only for that secret.

Another possible security concern is Secret data encryption. By default, standard Kubernetes clusters store objects in etcd as plaintext, but most cloud vendors ship Kubernetes Clusters with encryption at REST for objects, including Secrets. Although they offer such a solution, it is possible to add application-level encryption using a KMS for managing sensitive data such as Secrets. Application-layer secret encryption provides an additional layer of security for sensitive data since it protects against attackers who gain access to an offline copy of etcd.

The encryption and decryption of Secrets are solely the responsibility of the Kubernetes API server. When a user creates a Secret, the Kubernetes API server generates a random Data-Encryption-Key (DEK) and encrypts the Secret data. Then, it sends the DEK into the KMS plugin to encrypt it with Key-Encryption-Key (KEK). Finally, the object is stored together with the encrypted DEK. Once a client requests a secret, the Kubernetes API server contacts the KMS plugin to decrypt the encrypted DEK. When the decrypted DEK returns from the KMS plugin, the API server decrypts the secret payload and returns it, unencrypted, to the client.

If the cluster already has a KMS solution implemented, Kubernetes API server already manages secret encryption when a Teleport agent creates or reads its secrets. Otherwise, if the KMS solution configuration happened after the Teleport generated its secrets, the user has to annotate every secret in the cluster KMS encryption time to force the Kubernetes API server to encrypt them. From an agent perspective, this requires no changes since logic runs at the Kubernetes API server level.

Links