teleport/lib/service
Tiago Silva 249a4c5595
Adds Azure AKS auto-discovery (#16633)
This PR adds a watcher for automatic `kube_cluster` discovery of Azure AKS clusters. Given a user with access to the Azure cloud, the auto-discovery service scans the cloud and registers all clusters available in AKS.

Once the discovery service creates a `kube_cluster` in the Auth Server, the Kubernetes Service starts serving it. The credentials used to access the cluster depend on how the AKS cluster is configured:

# Authentication 
## Local Accounts

If the AKS cluster auth is based on local accounts created during the provisioning phase of the cluster, the agent will use the [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint. 

This endpoint returns a `kubeconfig` fully populated with user credentials that Teleport can use to access the cluster.

## Azure Active Directory

When Azure Active Directory (AD) integration is enabled, Azure allows login with AD users and forces the login to happen with dynamic, short-lived user tokens. These tokens are generated by calling `credentials.GetToken` with a fixed scope, `6dae42f8-4368-4678-94ff-3960e28e3630`, and with the cluster's `tenant_id`. The token contains the user details as well as the `group_ids` that are matched against authorization rules.

```go
// getAzureToken generates an authentication token for clusters with AD enabled.
func (a *aKSClient) getAzureToken(ctx context.Context, tenantID string, clientCfg *rest.Config) (time.Time, error) {
	const (
		azureManagedClusterScope = "6dae42f8-4368-4678-94ff-3960e28e3630"
	)
	cred, err := a.azIdentity(&azidentity.DefaultAzureCredentialOptions{
		TenantID: tenantID,
	})
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}

	cliAccessToken, err := cred.GetToken(ctx, policy.TokenRequestOptions{
		// azureManagedClusterScope is a fixed scope that identifies Azure AKS managed clusters.
		Scopes: []string{azureManagedClusterScope},
	})
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}
	// reset the old exec provider credentials
	clientCfg.ExecProvider = nil
	clientCfg.BearerToken = cliAccessToken.Token

	return cliAccessToken.ExpiresOn, nil
}
```

# Authorization

## Local Accounts
The [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint returns credentials with enough permissions for Teleport to enroll the cluster.

## Azure AD

### Azure RBAC

When Azure RBAC mode is enabled, the cluster authorization is based on rules specified in the Azure Identity permissions. 

The Azure group associated with the Azure identity that the Teleport process runs as must be granted the following permissions:

```json
{
    "Name": "AKS Teleport Discovery Permissions",
    "Description": "Required permissions for Teleport auto-discovery.",
    "Actions": [],
    "NotActions": [],
    "DataActions": [
        "Microsoft.ContainerService/managedClusters/pods/read",
        "Microsoft.ContainerService/managedClusters/users/impersonate/action",
        "Microsoft.ContainerService/managedClusters/groups/impersonate/action",
        "Microsoft.ContainerService/managedClusters/serviceaccounts/impersonate/action",
        "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectaccessreviews/write",
        "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectrulesreviews/write"
    ],
    "NotDataActions": [],
    "assignableScopes": [
        "/subscriptions/{subscription_id}"
    ]
}
```

If the permissions are correctly specified, the Azure authentication service automatically grants access to any cluster within `subscription_id` without any other definition. If they are incorrectly configured, an error is returned and Teleport cannot gain access to the cluster.


### Kubernetes RBAC

If Azure RBAC integration is disabled, authorization to the cluster is handled by Kubernetes RBAC. This is done by matching the Azure identity principals (`group_ids`) against the `Role` and `ClusterRole` objects that live in the AKS cluster. This mode requires that the `ClusterRole` and `ClusterRoleBinding` below exist and are correctly configured in each cluster to enroll.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-role
rules:
- apiGroups:
  - ""
  resources:
  - users
  - groups
  - serviceaccounts
  verbs:
  - impersonate
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - "authorization.k8s.io"
  resources:
  - selfsubjectaccessreviews
  - selfsubjectrulesreviews
  verbs:
  - create
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: teleport-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: teleport-role
subjects:
- kind: Group
  name: {group_name}
  apiGroup: rbac.authorization.k8s.io
```

#### `ClusterRole` and `ClusterRoleBinding` configured

If a cluster operator or a previous Teleport run has already configured access to the cluster, no further action is required because Teleport already has access to it.

#### Cluster `aks:ListClusterAdminCredentials` returns valid credentials

If the Teleport process has access to [`aks:ListClusterAdminCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-admin-credentials?tabs=HTTP) and the endpoint returns valid cluster admin credentials, Teleport automatically creates the `ClusterRole` and `ClusterRoleBinding` objects in the cluster, bound to the `group_id` listed in the access token. To extract the `group_id` from the token, Teleport parses the JWT claims and extracts the first element.

If the objects are created successfully, Teleport can access the cluster. Otherwise, it falls back to the `aks:BeginRunCommand` method to try to configure access for itself.

#### Cluster `aks:BeginRunCommand` returns valid credentials

In this mode, Teleport tries to run a `kubectl` command against the cluster to configure the `ClusterRole` and `ClusterRoleBinding`. `aks:BeginRunCommand` allows any user with access to that endpoint to run arbitrary commands in the cluster (the commands cannot be validated), so Teleport uses it only as a last resort to configure access for itself.

If the command fails, Teleport cannot gain access to the cluster and an error is returned.

# UX

Currently, to discover AKS clusters and have them dynamically served by the `kubernetes_service`, one can define the following configuration:

```yaml
discovery_service:
  enabled: true
  azure:
  - subscriptions: ["*"]
    types: ["aks"]
    regions: ["*"]
    tags:
      '*': '*'

kubernetes_service:
  enabled: true
  resources:
  - labels:
      '*': '*'
```
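To illustrate how the wildcard selectors in this configuration might be evaluated, here is a simplified sketch. It is not Teleport's matcher: `matchAny` and `matchLabels` are hypothetical helpers, and the sketch assumes that selector values are single strings and that an empty selector matches nothing.

```go
package main

import "fmt"

const wildcard = "*"

// matchAny reports whether value is allowed by a list such as
// `subscriptions` or `regions`, where "*" allows every value.
func matchAny(patterns []string, value string) bool {
	for _, p := range patterns {
		if p == wildcard || p == value {
			return true
		}
	}
	return false
}

// matchLabels reports whether a resource's labels satisfy a selector such as
// `tags` or `labels`. The pair "*": "*" matches every resource. Otherwise
// each selector key must be present, with an equal value or a "*" value.
func matchLabels(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false // assumption: an empty selector selects nothing
	}
	if selector[wildcard] == wildcard {
		return true
	}
	for key, want := range selector {
		got, ok := labels[key]
		if !ok || (want != wildcard && got != want) {
			return false
		}
	}
	return true
}

func main() {
	clusterTags := map[string]string{"env": "prod"}

	fmt.Println(matchAny([]string{"*"}, "eastus"))                         // true
	fmt.Println(matchLabels(map[string]string{"*": "*"}, clusterTags))     // true
	fmt.Println(matchLabels(map[string]string{"env": "dev"}, clusterTags)) // false
}
```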

# Future work
- Support AWS dynamic authentication

Part of #16135, #13376
Related to #12048, #16276, #16281
2022-10-11 21:37:50 +00:00

| File | Last commit | Date |
| --- | --- | --- |
| acme.go | Introduce app server and app resources (#8140) | 2021-09-09 14:19:02 -07:00 |
| cfg.go | Adds Azure AKS auto-discovery (#16633) | 2022-10-11 21:37:50 +00:00 |
| cfg_test.go | Remove RestartThreshold and related constants (#13722) | 2022-06-23 16:59:09 +00:00 |
| connect.go | Introduce config v3, add auth_server and proxy_server, remove auth_addresses (#15761) | 2022-09-28 15:30:15 +00:00 |
| db.go | Add Cassandra/Scylla database support (#15895) | 2022-10-10 12:37:51 +02:00 |
| db_test.go | Update golangci-lint to 1.49.0 (#16507) | 2022-09-19 22:38:59 +00:00 |
| desktop.go | Introduce config v3, add auth_server and proxy_server, remove auth_addresses (#15761) | 2022-09-28 15:30:15 +00:00 |
| discovery.go | Adds Azure AKS auto-discovery (#16633) | 2022-10-11 21:37:50 +00:00 |
| info.go | Better signal handling and pools for gzip. | 2018-02-19 10:57:26 -08:00 |
| kubernetes.go | Adds Azure AKS auto-discovery (#16633) | 2022-10-11 21:37:50 +00:00 |
| listeners.go | Remove centralised port allocation for tests (#13658) | 2022-07-20 12:04:54 +10:00 |
| proxy_settings.go | Introduce config v3, add auth_server and proxy_server, remove auth_addresses (#15761) | 2022-09-28 15:30:15 +00:00 |
| service.go | Add Kubernetes Cluster Connection Tester (#16899) | 2022-10-10 17:27:29 +00:00 |
| service_test.go | Add Cassandra/Scylla database support (#15895) | 2022-10-10 12:37:51 +02:00 |
| signals.go | Refactor Supervisor.WaitForEvent (#14940) | 2022-07-28 13:34:27 +00:00 |
| state.go | Move prometheus collectors from utils to metrics (#15288) | 2022-08-09 17:35:19 +00:00 |
| state_test.go | Revert readyz changes (#12244) | 2022-04-26 22:16:55 +00:00 |
| supervisor.go | Refactor Supervisor.WaitForEvent (#14940) | 2022-07-28 13:34:27 +00:00 |
| validateconfig.go | Introduce config v3, add auth_server and proxy_server, remove auth_addresses (#15761) | 2022-09-28 15:30:15 +00:00 |
| validateconfig_test.go | Introduce config v3, add auth_server and proxy_server, remove auth_addresses (#15761) | 2022-09-28 15:30:15 +00:00 |