teleport/lib/service/kubernetes.go
Tiago Silva 249a4c5595
Adds Azure AKS auto-discovery (#16633)
This PR adds a watcher that automatically discovers `kube_cluster` resources for Azure AKS clusters. Given a user with access to the Azure cloud, the auto-discovery service scans the cloud and registers all clusters available in AKS.

Once the discovery service creates a `kube_cluster` in the Auth Server, the Kubernetes Service starts serving it. The credentials used to access the cluster depend on how the AKS cluster is configured:

# Authentication 
## Local Accounts

If the AKS cluster auth is based on local accounts created during the provisioning phase of the cluster, the agent will use the [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint. 

This endpoint returns a `kubeconfig` fully populated with user credentials that Teleport can use to access the cluster.

## AZ Active Directory

When Azure Active Directory (AD) integration is enabled, Azure allows login with AD users and forces the login to use dynamic, short-lived user tokens. These tokens are generated by calling `credentials.GetToken` with a fixed scope, `6dae42f8-4368-4678-94ff-3960e28e3630`, and with the cluster's `tenant_id`. The token contains the user details as well as the `group_ids` used to match authorization rules.

```go
// getAzureToken generates an authentication token for clusters with AD enabled.
// It stores the token in clientCfg and returns the token's expiry time.
func (a *aKSClient) getAzureToken(ctx context.Context, tenantID string, clientCfg *rest.Config) (time.Time, error) {
	const (
		azureManagedClusterScope = "6dae42f8-4368-4678-94ff-3960e28e3630"
	)
	cred, err := a.azIdentity(&azidentity.DefaultAzureCredentialOptions{
		TenantID: tenantID,
	})
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}

	cliAccessToken, err := cred.GetToken(ctx, policy.TokenRequestOptions{
		// azureManagedClusterScope is a fixed scope that identifies azure AKS managed clusters.
		Scopes: []string{azureManagedClusterScope},
	},
	)
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}
	// reset the old exec provider credentials
	clientCfg.ExecProvider = nil
	clientCfg.BearerToken = cliAccessToken.Token

	return cliAccessToken.ExpiresOn, nil
}
```

# Authorization

## Local Accounts
The [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint returns credentials with enough permissions for Teleport to enroll the cluster.

## AZ AD 

### Azure RBAC

When Azure RBAC mode is enabled, the cluster authorization is based on rules specified in the Azure Identity permissions. 

The Azure group associated with the Azure identity that the Teleport process runs as must be granted the following permissions:

```json
{
    "Name": "AKS Teleport Discovery Permissions",
    "Description": "Required permissions for Teleport auto-discovery.",
    "Actions": [],
    "NotActions": [],
    "DataActions": [
      "Microsoft.ContainerService/managedClusters/pods/read",
      "Microsoft.ContainerService/managedClusters/users/impersonate/action",
      "Microsoft.ContainerService/managedClusters/groups/impersonate/action",
      "Microsoft.ContainerService/managedClusters/serviceaccounts/impersonate/action",
      "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectaccessreviews/write",
      "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectrulesreviews/write"
    ],
    "NotDataActions": [],
    "assignableScopes": [
        "/subscriptions/{subscription_id}"
    ]
}
```

If the role is correctly specified, the Azure authentication service automatically grants access to any cluster within `subscription_id` without any other definition. If it is incorrectly configured, an error is returned and Teleport cannot gain access to the cluster.
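
As a rough illustration, an operator could check a role definition for the data actions listed above before assigning it. The `roleDefinition` type and the `missingDataActions` helper are hypothetical and are not part of Teleport or the Azure SDK. The sketch compares strings exactly and does not expand wildcard actions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredDataActions mirrors the DataActions from the role definition above.
var requiredDataActions = []string{
	"Microsoft.ContainerService/managedClusters/pods/read",
	"Microsoft.ContainerService/managedClusters/users/impersonate/action",
	"Microsoft.ContainerService/managedClusters/groups/impersonate/action",
	"Microsoft.ContainerService/managedClusters/serviceaccounts/impersonate/action",
	"Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectaccessreviews/write",
	"Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectrulesreviews/write",
}

// roleDefinition is a hypothetical, minimal view of the JSON document above.
type roleDefinition struct {
	Name        string   `json:"Name"`
	DataActions []string `json:"DataActions"`
}

// missingDataActions parses a role definition and returns every required
// data action that it does not list verbatim.
func missingDataActions(raw []byte) ([]string, error) {
	var role roleDefinition
	if err := json.Unmarshal(raw, &role); err != nil {
		return nil, fmt.Errorf("parsing role definition: %w", err)
	}
	granted := make(map[string]bool, len(role.DataActions))
	for _, action := range role.DataActions {
		granted[action] = true
	}
	var missing []string
	for _, action := range requiredDataActions {
		if !granted[action] {
			missing = append(missing, action)
		}
	}
	return missing, nil
}

func main() {
	raw := []byte(`{"Name": "partial", "DataActions": ["Microsoft.ContainerService/managedClusters/pods/read"]}`)
	missing, err := missingDataActions(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, action := range missing {
		fmt.Println("missing:", action)
	}
}
```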


### Kubernetes RBAC

If Azure RBAC integration is disabled, authorization to the cluster is handled by Kubernetes RBAC. This is done by matching the Azure identity principals (`group_ids`) with `Role` and `ClusterRole` objects that live in the AKS cluster. This mode requires a correctly configured `ClusterRole` and `ClusterRoleBinding` in each cluster to enroll.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-role
rules:
- apiGroups:
  - ""
  resources:
  - users
  - groups
  - serviceaccounts
  verbs:
  - impersonate
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - "authorization.k8s.io"
  resources:
  - selfsubjectaccessreviews
  - selfsubjectrulesreviews
  verbs:
  - create
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: teleport-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: teleport-role
subjects:
- kind: Group
  name: {group_name}
  apiGroup: rbac.authorization.k8s.io
```

#### `ClusterRole` and `ClusterRoleBinding` configured

If cluster operators or a previous Teleport run have already configured access to the cluster, no further action is required.

#### Cluster `aks:ListClusterAdminCredentials` returns valid credentials

If the Teleport process has access to [`aks:ListClusterAdminCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-admin-credentials?tabs=HTTP) and the endpoint returns valid cluster admin credentials, Teleport automatically creates the `ClusterRole` and `ClusterRoleBinding` objects in the cluster, bound to the `group_id` listed in the access token. To obtain the `group_id`, Teleport parses the token's JWT claims and extracts the first element.

If the objects are created successfully, Teleport can access the cluster. Otherwise, it falls back to the `aks:BeginRunCommand` method to try to configure access for itself.

#### Cluster `aks:BeginRunCommand` returns valid credentials

When this mode is reached, Teleport tries to run a `kubectl` command against the cluster to configure the `ClusterRole` and `ClusterRoleBinding`. `aks:BeginRunCommand` allows any user with access to that endpoint to run arbitrary commands in the cluster (commands cannot be validated). Teleport uses it as a last resort to configure access for itself.

If the command fails, Teleport cannot gain access to the cluster and an error is returned.

# UX

Currently, to discover AKS resources and have them dynamically served by the `kubernetes_service`, one can define the following configuration:

```yaml
discovery_service:
  enabled: true
  azure:
  - subscriptions: ["*"]
    types: ["aks"]
    regions: ["*"]
    tags:
      '*': '*'

kubernetes_service:
  enabled: true
  resources:
  - labels:
      '*': '*'
```

# Future work
- Support AWS dynamic authentication

Part of #16135, #13376  
Related to  #12048, #16276, #16281
2022-10-11 21:37:50 +00:00


/*
Copyright 2020 Gravitational, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package service

import (
	"net"
	"net/http"

	"github.com/gravitational/trace"
	"github.com/sirupsen/logrus"

	"github.com/gravitational/teleport"
	apidefaults "github.com/gravitational/teleport/api/defaults"
	"github.com/gravitational/teleport/api/types"
	"github.com/gravitational/teleport/lib/auth"
	"github.com/gravitational/teleport/lib/events"
	kubeproxy "github.com/gravitational/teleport/lib/kube/proxy"
	"github.com/gravitational/teleport/lib/labels"
	"github.com/gravitational/teleport/lib/reversetunnel"
	"github.com/gravitational/teleport/lib/services"
	"github.com/gravitational/teleport/lib/utils"
)

func (process *TeleportProcess) initKubernetes() {
	log := process.log.WithFields(logrus.Fields{
		trace.Component: teleport.Component(teleport.ComponentKube, process.id),
	})

	process.registerWithAuthServer(types.RoleKube, KubeIdentityEvent)
	process.RegisterCriticalFunc("kube.init", func() error {
		conn, err := process.waitForConnector(KubeIdentityEvent, log)
		if conn == nil {
			return trace.Wrap(err)
		}

		if err := process.initKubernetesService(log, conn); err != nil {
			warnOnErr(conn.Close(), log)
			return trace.Wrap(err)
		}
		return nil
	})
}

func (process *TeleportProcess) initKubernetesService(log *logrus.Entry, conn *Connector) (retErr error) {
	// clean up unused descriptors passed for proxy, but not used by it
	defer func() {
		if err := process.closeImportedDescriptors(teleport.ComponentKube); err != nil {
			log.WithError(err).Warn("Failed closing imported file descriptors.")
		}
	}()
	cfg := process.Config

	// Create a caching auth client.
	accessPoint, err := process.newLocalCacheForKubernetes(conn.Client, []string{teleport.ComponentKube})
	if err != nil {
		return trace.Wrap(err)
	}

	teleportClusterName := conn.ServerIdentity.ClusterName
	proxyGetter := reversetunnel.NewConnectedProxyGetter()

	// This service can run in 2 modes:
	// 1. Reachable (by the proxy) - registers with auth server directly and
	//    creates a local listener to accept proxy conns.
	// 2. Not reachable ("IoT mode") - creates a reverse tunnel to a proxy and
	//    handles registration and incoming connections through that.
	//
	// The listener exposes incoming connections over either mode.
	var listener net.Listener
	var agentPool *reversetunnel.AgentPool
	switch {
	// Filter out cases where both listen_addr and tunnel are set or both are
	// not set.
	case conn.UseTunnel() && !cfg.Kube.ListenAddr.IsEmpty():
		return trace.BadParameter("either set kubernetes_service.listen_addr if this process can be reached from a teleport proxy or point teleport.proxy_server to a proxy to dial out, but don't set both")
	case !conn.UseTunnel() && cfg.Kube.ListenAddr.IsEmpty():
		// TODO(awly): if this process runs auth, proxy and kubernetes
		// services, the proxy should be able to route requests to this
		// kubernetes service. This means either always connecting over a
		// reverse tunnel (with a performance penalty), or somehow passing the
		// connections in-memory between proxy and kubernetes services.
		//
		// For now, as a lazy shortcut, kubernetes_service.listen_addr is
		// always required when running in the same process with a proxy.
		return trace.BadParameter("set kubernetes_service.listen_addr if this process can be reached from a teleport proxy or point teleport.proxy_server to a proxy to dial out")

	// Start a local listener and let proxies dial in.
	case !conn.UseTunnel() && !cfg.Kube.ListenAddr.IsEmpty():
		log.Debug("Turning on Kubernetes service listening address.")
		listener, err = process.importOrCreateListener(ListenerKube, cfg.Kube.ListenAddr.Addr)
		if err != nil {
			return trace.Wrap(err)
		}
		defer func() {
			if retErr != nil {
				warnOnErr(listener.Close(), log)
			}
		}()

	// Dialed out to a proxy, start servicing the reverse tunnel as a listener.
	case conn.UseTunnel() && cfg.Kube.ListenAddr.IsEmpty():
		// create an adapter, from reversetunnel.ServerHandler to net.Listener.
		shtl := reversetunnel.NewServerHandlerToListener(reversetunnel.LocalKubernetes)
		listener = shtl
		agentPool, err = reversetunnel.NewAgentPool(
			process.ExitContext(),
			reversetunnel.AgentPoolConfig{
				Component:            teleport.ComponentKube,
				HostUUID:             conn.ServerIdentity.ID.HostUUID,
				Resolver:             conn.TunnelProxyResolver(),
				Client:               conn.Client,
				AccessPoint:          accessPoint,
				HostSigner:           conn.ServerIdentity.KeySigner,
				Cluster:              teleportClusterName,
				Server:               shtl,
				FIPS:                 process.Config.FIPS,
				ConnectedProxyGetter: proxyGetter,
			})
		if err != nil {
			return trace.Wrap(err)
		}
		if err = agentPool.Start(); err != nil {
			return trace.Wrap(err)
		}
		defer func() {
			if retErr != nil {
				agentPool.Stop()
			}
		}()
		log.Info("Started reverse tunnel client.")
	}

	var dynLabels *labels.Dynamic
	if len(cfg.Kube.DynamicLabels) != 0 {
		dynLabels, err = labels.NewDynamic(process.ExitContext(), &labels.DynamicConfig{
			Labels: cfg.Kube.DynamicLabels,
			Log:    log,
		})
		if err != nil {
			return trace.Wrap(err)
		}
		dynLabels.Sync()
		go dynLabels.Start()
		defer func() {
			if retErr != nil {
				dynLabels.Close()
			}
		}()
	}

	lockWatcher, err := services.NewLockWatcher(process.ExitContext(), services.LockWatcherConfig{
		ResourceWatcherConfig: services.ResourceWatcherConfig{
			Component: teleport.ComponentKube,
			Log:       log,
			Client:    conn.Client,
		},
	})
	if err != nil {
		return trace.Wrap(err)
	}

	// Create the kube server to service listener.
	authorizer, err := auth.NewAuthorizer(teleportClusterName, accessPoint, lockWatcher)
	if err != nil {
		return trace.Wrap(err)
	}
	tlsConfig, err := conn.ServerIdentity.TLSConfig(cfg.CipherSuites)
	if err != nil {
		return trace.Wrap(err)
	}

	// asyncEmitter makes sure that sessions do not block
	// in case if connections are slow
	asyncEmitter, err := process.newAsyncEmitter(conn.Client)
	if err != nil {
		return trace.Wrap(err)
	}
	streamer, err := events.NewCheckingStreamer(events.CheckingStreamerConfig{
		Inner:       conn.Client,
		Clock:       process.Clock,
		ClusterName: teleportClusterName,
	})
	if err != nil {
		return trace.Wrap(err)
	}
	streamEmitter := &events.StreamerAndEmitter{
		Emitter:  asyncEmitter,
		Streamer: streamer,
	}

	var publicAddr string
	if len(cfg.Kube.PublicAddrs) > 0 {
		publicAddr = cfg.Kube.PublicAddrs[0].String()
	}

	kubeServer, err := kubeproxy.NewTLSServer(kubeproxy.TLSServerConfig{
		ForwarderConfig: kubeproxy.ForwarderConfig{
			Namespace:                     apidefaults.Namespace,
			Keygen:                        cfg.Keygen,
			ClusterName:                   teleportClusterName,
			Authz:                         authorizer,
			AuthClient:                    conn.Client,
			StreamEmitter:                 streamEmitter,
			DataDir:                       cfg.DataDir,
			CachingAuthClient:             accessPoint,
			HostID:                        cfg.HostUUID,
			Context:                       process.ExitContext(),
			KubeconfigPath:                cfg.Kube.KubeconfigPath,
			KubeClusterName:               cfg.Kube.KubeClusterName,
			KubeServiceType:               kubeproxy.KubeService,
			Component:                     teleport.ComponentKube,
			LockWatcher:                   lockWatcher,
			CheckImpersonationPermissions: cfg.Kube.CheckImpersonationPermissions,
			PublicAddr:                    publicAddr,
		},
		TLS:                  tlsConfig,
		AccessPoint:          accessPoint,
		LimiterConfig:        cfg.Kube.Limiter,
		OnHeartbeat:          process.onHeartbeat(teleport.ComponentKube),
		GetRotation:          process.getRotation,
		ConnectedProxyGetter: proxyGetter,
		ResourceMatchers:     cfg.Kube.ResourceMatchers,
		StaticLabels:         cfg.Kube.StaticLabels,
		DynamicLabels:        dynLabels,
		CloudLabels:          process.cloudLabels,
	})
	if err != nil {
		return trace.Wrap(err)
	}
	defer func() {
		if retErr != nil {
			warnOnErr(kubeServer.Close(), log)
		}
	}()

	process.RegisterCriticalFunc("kube.serve", func() error {
		if conn.UseTunnel() {
			log.Info("Starting Kube service via proxy reverse tunnel.")
			utils.Consolef(cfg.Console, log, teleport.ComponentKube,
				"Kubernetes service %s:%s is starting via proxy reverse tunnel.",
				teleport.Version, teleport.Gitref)
		} else {
			log.Infof("Starting Kube service on %v.", listener.Addr())
			utils.Consolef(cfg.Console, log, teleport.ComponentKube,
				"Kubernetes service %s:%s is starting on %v.",
				teleport.Version, teleport.Gitref, listener.Addr())
		}
		process.BroadcastEvent(Event{Name: KubernetesReady, Payload: nil})
		err := kubeServer.Serve(listener)
		if err != nil {
			if err == http.ErrServerClosed {
				return nil
			}
			return trace.Wrap(err)
		}
		return nil
	})

	// Cleanup, when process is exiting.
	process.OnExit("kube.shutdown", func(payload interface{}) {
		if asyncEmitter != nil {
			warnOnErr(asyncEmitter.Close(), log)
		}
		// Clean up items in reverse order from their initialization.
		if payload != nil {
			// Graceful shutdown.
			warnOnErr(kubeServer.Shutdown(payloadContext(payload, log)), log)
			agentPool.Stop()
			agentPool.Wait()
		} else {
			// Fast shutdown.
			warnOnErr(kubeServer.Close(), log)
			agentPool.Stop()
		}
		warnOnErr(listener.Close(), log)
		warnOnErr(conn.Close(), log)

		if dynLabels != nil {
			dynLabels.Close()
		}

		log.Info("Exited.")
	})
	return nil
}