This PR presents a watcher for automatic `kube_cluster` discovery for GCP GKE clusters. Given an identity with access to the GCP cloud, the auto-discovery service will scan the cloud and register all clusters available in Kubernetes Engine.
Once the discovery service creates a `kube_cluster` on the Auth Server, the Kubernetes Service will start serving it. The credentials used to access the cluster are short-lived and generated through Google OAuth2 associated with the GCP Service Account configured for the Kubernetes Service.
The GCP Service Account must have the following role definition attached:
```yaml
description: 'GKE Auto-Discovery'
includedPermissions:
- container.clusters.impersonate
- container.clusters.get
- container.clusters.list
- container.pods.get
- container.selfSubjectAccessReviews.create
- container.selfSubjectRulesReviews.create
name: projects/{projectID}/roles/GKEKubernetesAutoDisc
stage: GA
title: GKEKubernetesAutoDisc
```
Part of #16135, #13376
Related to #12048, #16276, #16281, #16633, #14991
* Detect if HTTP_PROXY/HTTPS_PROXY is specified in `tsh proxy ssh`
* Support HTTPS_PROXY in `tsh proxy ssh`
* Ensure we tidy up connections on failure
* Linting
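
The proxy detection listed above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `proxyFromEnv` is a hypothetical helper, and checking the variables in one fixed order is a simplifying assumption (Go's own proxy handling picks the variable based on the request scheme).

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// proxyFromEnv (hypothetical helper) returns the first proxy URL found in the
// conventional environment variables, or nil if none is set. The getenv
// parameter is injected so the lookup can run without touching the real
// environment.
func proxyFromEnv(getenv func(string) string) (*url.URL, error) {
	for _, name := range []string{"HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"} {
		raw := strings.TrimSpace(getenv(name))
		if raw == "" {
			continue
		}
		// Values such as "proxy.example.com:3128" carry no scheme; assume http.
		if !strings.Contains(raw, "://") {
			raw = "http://" + raw
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("invalid %s value: %w", name, err)
		}
		return u, nil
	}
	return nil, nil
}

func main() {
	u, err := proxyFromEnv(os.Getenv)
	switch {
	case err != nil:
		fmt.Println("proxy configuration error:", err)
	case u == nil:
		fmt.Println("no proxy configured; dial directly")
	default:
		fmt.Println("dial through proxy at", u.Host)
	}
}
```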
This PR presents a watcher for automatic `kube_cluster` discovery for AWS EKS clusters. Given a user with access to the AWS cloud, the auto-discovery service will scan the cloud and register all clusters available in EKS.
Once the discovery service creates a `kube_cluster` on the Auth Server, the Kubernetes Service will start serving it. The credentials used to access the cluster are short-lived - TTL is 15m - and are generated through the IAM role associated with the process running the `kubernetes_service`.
All clusters must allow access to the Teleport service by configuring the IAM role in the `configmap/aws-auth` present in the `kube-system` namespace. Without this access, the clusters will not be served correctly.
# Future work
- Support GCP auto-discovery
Part of #16135, #13376
Related to #12048, #16276, #16281, #16633
In “tsh login” show only alerts with on-login label.
In “tsh status” show only alerts with “high” severity, which license warnings should match.
In all “tctl” commands show only alerts with “high” severity.
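
A minimal sketch of the two filters described above, under stated assumptions: the `alert` type, the `severity` constants, and the `on-login` label key with value `yes` are all hypothetical stand-ins, not Teleport's actual types or label names.

```go
package main

import "fmt"

type severity int

const (
	severityLow severity = iota
	severityMedium
	severityHigh
)

// alert is a hypothetical stand-in for a cluster alert.
type alert struct {
	Message  string
	Severity severity
	Labels   map[string]string
}

// onLoginLabel is an assumed label key; the real key may differ.
const onLoginLabel = "on-login"

// loginAlerts keeps only alerts carrying the on-login label ("tsh login").
func loginAlerts(all []alert) []alert {
	var out []alert
	for _, a := range all {
		if a.Labels[onLoginLabel] == "yes" {
			out = append(out, a)
		}
	}
	return out
}

// highSeverityAlerts keeps only high-severity alerts ("tsh status", "tctl").
func highSeverityAlerts(all []alert) []alert {
	var out []alert
	for _, a := range all {
		if a.Severity >= severityHigh {
			out = append(out, a)
		}
	}
	return out
}

// sampleAlerts returns demo data used by main.
func sampleAlerts() []alert {
	return []alert{
		{Message: "license expires soon", Severity: severityHigh},
		{Message: "upgrade available", Severity: severityLow, Labels: map[string]string{onLoginLabel: "yes"}},
		{Message: "maintenance window", Severity: severityMedium},
	}
}

func main() {
	all := sampleAlerts()
	fmt.Println("tsh login: ", len(loginAlerts(all)), "alert(s)")
	fmt.Println("tsh status:", len(highSeverityAlerts(all)), "alert(s)")
}
```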
This PR presents a watcher for automatic `kube_cluster` discovery for Azure AKS clusters. Given a user with access to the Azure cloud, the auto-discovery service will scan the cloud and register all clusters available in AKS.
Once the discovery service creates a `kube_cluster` on the Auth Server, the Kubernetes Service will start serving it. The credentials used to access the cluster depend on the AKS cluster's configuration:
# Authentication
## Local Accounts
If the AKS cluster auth is based on local accounts created during the provisioning phase of the cluster, the agent will use the [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint.
This endpoint returns a `kubeconfig` fully populated with user credentials that Teleport can use to access the cluster.
## AZ Active Directory
When AZ active directory integration is enabled, Azure allows login with AD users. Azure forces the login to happen with dynamic short-lived user tokens. These tokens are generated by calling `credentials.GetToken` with a fixed Scope: `6dae42f8-4368-4678-94ff-3960e28e3630` and with the cluster's `tenant_id`. The token contains the user details as well as `group_ids` to match with authorization rules.
```go
// getAzureToken generates an authentication token for clusters with AD enabled.
// On success it stores the token in clientCfg and returns the token expiry.
func (a *aKSClient) getAzureToken(ctx context.Context, tenantID string, clientCfg *rest.Config) (time.Time, error) {
	const (
		azureManagedClusterScope = "6dae42f8-4368-4678-94ff-3960e28e3630"
	)
	cred, err := a.azIdentity(&azidentity.DefaultAzureCredentialOptions{
		TenantID: tenantID,
	})
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}
	cliAccessToken, err := cred.GetToken(ctx, policy.TokenRequestOptions{
		// azureManagedClusterScope is a fixed scope that identifies azure AKS managed clusters.
		Scopes: []string{azureManagedClusterScope},
	},
	)
	if err != nil {
		return time.Time{}, trace.Wrap(ConvertResponseError(err))
	}
	// reset the old exec provider credentials
	clientCfg.ExecProvider = nil
	clientCfg.BearerToken = cliAccessToken.Token
	return cliAccessToken.ExpiresOn, nil
}
```
# Authorization
## Local Accounts
The [`aks:ListClusterUserCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP) endpoint returns credentials with enough permissions for Teleport to enroll the cluster.
## AZ AD
### Azure RBAC
When Azure RBAC mode is enabled, the cluster authorization is based on rules specified in the Azure Identity permissions.
The AZ group associated with the AZ identity the Teleport Process is running has to define the following permissions:
```json
{
  "Name": "AKS Teleport Discovery Permissions",
  "Description": "Required permissions for Teleport auto-discovery.",
  "Actions": [],
  "NotActions": [],
  "DataActions": [
    "Microsoft.ContainerService/managedClusters/pods/read",
    "Microsoft.ContainerService/managedClusters/users/impersonate/action",
    "Microsoft.ContainerService/managedClusters/groups/impersonate/action",
    "Microsoft.ContainerService/managedClusters/serviceaccounts/impersonate/action",
    "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectaccessreviews/write",
    "Microsoft.ContainerService/managedClusters/authorization.k8s.io/selfsubjectrulesreviews/write"
  ],
  "NotDataActions": [],
  "assignableScopes": [
    "/subscriptions/{subscription_id}"
  ]
}
```
If correctly specified, the Azure authentication service automatically grants access to any cluster within `subscription_id` without any other definition. If it is incorrectly configured, an error is returned and Teleport cannot gain access to the cluster.
### Kubernetes RBAC
If AZ RBAC integration is disabled, authorization to the cluster is handled by Kubernetes RBAC. This is done by matching the AZ identity principals (`group_ids`) against the `Role` and `ClusterRole` objects that live in the AKS cluster. This mode requires that the `ClusterRole` and `ClusterRoleBinding` exist and are correctly configured for each cluster to enroll.
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: teleport-role
rules:
- apiGroups:
  - ""
  resources:
  - users
  - groups
  - serviceaccounts
  verbs:
  - impersonate
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - "authorization.k8s.io"
  resources:
  - selfsubjectaccessreviews
  - selfsubjectrulesreviews
  verbs:
  - create
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: teleport-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: teleport-role
subjects:
- kind: Group
  name: {group_name}
  apiGroup: rbac.authorization.k8s.io
```
#### `ClusterRole` and `ClusterRoleBinding` configured
If cluster operators or a previous Teleport run have already configured access to the cluster, no further action is required since Teleport already has access to the cluster.
#### Cluster `aks:ListClusterAdminCredentials` returns valid credentials
If the Teleport process has access to [`aks:ListClusterAdminCredentials`](https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-admin-credentials?tabs=HTTP) and the endpoint returns valid cluster admin credentials, Teleport will automatically create the `ClusterRole` and `ClusterRoleBinding` objects in the cluster configured to the `group_id` that is listed in the access token. In order to extract the `group_id` from the token, Teleport parses the JWT claims and extracts the first element.
If the object creation succeeds, Teleport can access the cluster; otherwise, it falls back to the `aks:BeginRunCommand` method to try to configure access for itself.
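
A rough sketch of the claim extraction described above, using only the standard library. The claim name `groups` and the helper names are assumptions for illustration, and the sketch only decodes the payload; it does not verify the token signature.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// firstGroupID decodes the payload of a JWT and returns the first entry of
// its group list. The "groups" claim name is an assumption for illustration.
// No signature verification is performed here.
func firstGroupID(token string) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", fmt.Errorf("decoding payload: %w", err)
	}
	var claims struct {
		Groups []string `json:"groups"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return "", fmt.Errorf("parsing claims: %w", err)
	}
	if len(claims.Groups) == 0 {
		return "", errors.New("token carries no group IDs")
	}
	return claims.Groups[0], nil
}

// makeUnsignedToken builds a fake token around a JSON payload for demos.
func makeUnsignedToken(payloadJSON string) string {
	return "header." + base64.RawURLEncoding.EncodeToString([]byte(payloadJSON)) + ".signature"
}

func main() {
	token := makeUnsignedToken(`{"groups":["group-a","group-b"]}`)
	id, err := firstGroupID(token)
	fmt.Println(id, err)
}
```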
#### Cluster `aks:BeginRunCommand` returns valid credentials
When we reach this mode, Teleport tries to run a `kubectl` command against the cluster to configure the `ClusterRole` and `ClusterRoleBinding`. `aks:BeginRunCommand` allows any user with access to that endpoint to run arbitrary commands in the cluster (commands cannot be validated). Teleport will use it as a last resort to configure access for itself.
If the command fails, Teleport cannot gain access to the cluster and an error is returned.
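
The three modes above form an ordered fallback chain, which might be sketched as below. The `accessStrategy` type, the strategy names, and `errDenied` are hypothetical; the real code calls the Kubernetes and Azure APIs instead of stub functions.

```go
package main

import (
	"errors"
	"fmt"
)

// accessStrategy is a hypothetical wrapper around one way of gaining access.
type accessStrategy struct {
	Name string
	Try  func() error
}

// errDenied is a placeholder failure used by the demo strategies.
var errDenied = errors.New("access denied")

// configureAccess tries each strategy in order and returns the name of the
// first one that succeeds. If all fail, the last error is returned.
func configureAccess(strategies []accessStrategy) (string, error) {
	if len(strategies) == 0 {
		return "", errors.New("no access strategies configured")
	}
	var lastErr error
	for _, s := range strategies {
		if err := s.Try(); err != nil {
			lastErr = fmt.Errorf("%s: %w", s.Name, err)
			continue
		}
		return s.Name, nil
	}
	return "", fmt.Errorf("all strategies failed, last error: %w", lastErr)
}

func main() {
	used, err := configureAccess([]accessStrategy{
		{Name: "existing-rbac", Try: func() error { return errDenied }},
		{Name: "admin-credentials", Try: func() error { return errDenied }},
		{Name: "run-command", Try: func() error { return nil }},
	})
	fmt.Println(used, err)
}
```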
# UX
Currently, to discover AKS resources and have them dynamically served by the `kubernetes_service`, one can define the following configuration:
```yaml
discovery_service:
  enabled: true
  azure:
  - subscriptions: ["*"]
    types: ["aks"]
    regions: ["*"]
    tags:
      '*': '*'
kubernetes_service:
  enabled: true
  resources:
    labels:
      '*': '*'
```
# Future work
- Support AWS dynamic authentication
Part of #16135, #13376
Related to #12048, #16276, #16281
Address TODOs, add deprecation warnings and remove as many U2F code references
as possible.
Existing behavior is kept unaltered: it's still possible to inform Teleport of
old U2F AppIDs and U2F configurations are still silently converted to WebAuthn.
There's no reason to break that, so we don't.
Most server-side references to SecondFactorU2F are removed, but client-side
references remain: this makes it possible to interop newer clients with old
clusters (something else may break, but hopefully not this part).
Closes #10375.
* Add initial version of installer
* Resolve comments
- Use aws waiters when checking commands
- Use SSMRunRequest rather than passing instances
- General comments
* Resolve comments, (rebase) pass scriptname parameter
This resolves comments regarding running on multiple ec2 instances at
once by adding state to the instances cache to check if the instance
is known about and how far into installation it is
* Revert cache
* Don't cache on non discovery nodes
* Resolve some comments
* Move discovery out to its own service
* Add a `discovery_service` section
* Fix messed up conflict merge
* Make starting a standalone discovery agent work
* Resolve comments
* Resolve comments
- use a regular events.Emitter
- resolve a thousand typos :)
* Resolve comments
* resolve comments, fix a bad merge
* Fail when a non ec2 matcher type is configured
* fix lint-go
* Resolve comments
* Resolve comments, add initial test (currently broken)
* Fix log string so only 1 pair of [] are used
* Chunk instances for sending commands
* add 'isInitialized' to watchers
* Add test for chunked discovery, log output
* lints
* explicitly set matcher.Tags to "*":"*" if it's unset
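
The "Chunk instances for sending commands" step in the list above could look roughly like this. It is a sketch only: `chunkInstances` is a hypothetical helper and the chunk size in `main` is an arbitrary value chosen for the demo.

```go
package main

import "fmt"

// chunkInstances (hypothetical helper) splits a list of instance IDs into
// consecutive batches of at most size elements. A non-positive size yields
// no batches.
func chunkInstances(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	ids := []string{"i-1", "i-2", "i-3", "i-4", "i-5"}
	// Each batch would be sent as one command request.
	for n, batch := range chunkInstances(ids, 2) {
		fmt.Printf("batch %d: %v\n", n, batch)
	}
}
```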
* WebAPI: update user traits
Web API only supports updating the roles property for a given User.
This PR adds the possibility of updating User's traits
- Logins
- DB Users
- DB Names
- Kube Users
- Kube Groups
- Windows Logins
- AWS Role ARNs
It only updates a trait if the request contains a non-nil value for that trait's
list.
It deduplicates the trait's list before applying it.
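
The update rule above (skip nil lists, deduplicate the rest) can be illustrated with a short sketch. `applyTrait` and `dedupe` are hypothetical names, and treating a non-nil empty list as "clear the trait" is an assumption.

```go
package main

import "fmt"

// dedupe removes repeated values while keeping first-seen order.
func dedupe(values []string) []string {
	seen := make(map[string]struct{}, len(values))
	out := make([]string, 0, len(values))
	for _, v := range values {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

// applyTrait (hypothetical helper) returns the trait list to store. A nil
// update leaves the current value untouched; a non-nil update replaces it
// after deduplication (assumption: an empty non-nil list clears the trait).
func applyTrait(current, update []string) []string {
	if update == nil {
		return current
	}
	return dedupe(update)
}

func main() {
	logins := []string{"root"}
	fmt.Println(applyTrait(logins, nil))
	fmt.Println(applyTrait(logins, []string{"alice", "bob", "alice"}))
}
```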
* Send Teleport version when adding remote cluster
* Send Database CA only if remote cluster is v10+
* Fix existing test and new one for v9 trusted CA validation
* Add DatabaseCAMinVersion constant and replace hardcoded versions.
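
A simplified sketch of the version gate described in these commits. The constant `databaseCAMinMajor` mirrors the "v10+" wording, and the helper names are hypothetical; the sketch compares only the major version and treats a missing or unparseable version as an older cluster, both of which are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// databaseCAMinMajor is an assumed constant mirroring "v10+".
const databaseCAMinMajor = 10

// majorVersion extracts the major component from strings like "10.1.2" or
// "v11.0.0-dev".
func majorVersion(version string) (int, error) {
	trimmed := strings.TrimPrefix(strings.TrimSpace(version), "v")
	parts := strings.SplitN(trimmed, ".", 2)
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, fmt.Errorf("invalid version %q: %w", version, err)
	}
	return major, nil
}

// shouldSendDatabaseCA reports whether the remote cluster is new enough to
// receive the Database CA. An empty or unparseable version is treated as an
// older cluster (assumption).
func shouldSendDatabaseCA(remoteVersion string) bool {
	major, err := majorVersion(remoteVersion)
	if err != nil {
		return false
	}
	return major >= databaseCAMinMajor
}

func main() {
	for _, v := range []string{"9.3.7", "10.0.0", ""} {
		fmt.Printf("%q -> send Database CA: %v\n", v, shouldSendDatabaseCA(v))
	}
}
```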
Implement passwordless cluster settings, as described by RFD 52: Passwordless
[1]. See https://github.com/gravitational/teleport/pull/11716 for recent RFD
changes.
Add the "passwordless" connector and allow cluster admins to configure it as
the cluster default.
teleport.yaml:
```yaml
auth_service:
  authentication:
    type: local
    second_factor: on
    webauthn:
      rp_id: example.com
    passwordless: true # enables/disables passwordless
    connector_name: passwordless # sets passwordless as the default
```
`tsh` obeys the connector name if present. User overrides are possible via `tsh
--auth=local` or `tsh --auth=passwordless`. The `tsh --pwdless` flag is now
removed, as discussed in recent RFD changes.
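
The precedence described above (explicit `--auth` flag first, then the cluster's connector name) might be sketched like this. `resolveConnector` is a hypothetical helper, and falling back to `local` when neither is set is an assumption.

```go
package main

import "fmt"

// resolveConnector (hypothetical helper) picks the connector tsh would use:
// an explicit --auth value wins, otherwise the cluster default applies, and
// "local" is the assumed fallback when neither is set.
func resolveConnector(authFlag, clusterDefault string) string {
	if authFlag != "" {
		return authFlag
	}
	if clusterDefault != "" {
		return clusterDefault
	}
	return "local"
}

func main() {
	fmt.Println(resolveConnector("", "passwordless"))      // cluster default
	fmt.Println(resolveConnector("local", "passwordless")) // user override
	fmt.Println(resolveConnector("", ""))                  // assumed fallback
}
```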
Web API `/ping` and `/ping/{connector}` endpoints are changed to support
"passwordless" and to provide other necessary information.
#9160
[1] https://github.com/gravitational/teleport/blob/master/rfd/0052-passwordless.md#cluster-settings
* Add passwordless toggle to AuthPreferenceSpecV2 proto
* Update generated protos
* Add AuthPreference fields and validation
* Reserve system connector names
* Distinguish between type=local and local connector
* Add passwordless option to teleport.yaml
* Do /ping and /ping/{connector} changes
* Make tsh respect the passwordless connector, remove --pwdless
Alias the "u2f" second factor mode to "webauthn", effectively sunsetting U2F in
favor of WebAuthn.
The change effectively disables "U2F mode" server-side, making Teleport use
WebAuthn instead. This is in line with our compatibility promise, as Teleport
8.x clients are already WebAuthn-capable (and thus have no problems talking to
the cluster).
I have cleaned up a good chunk of U2F references in lib/web and lib/client, plus
a few other places. Changes to lib/auth are just what is necessary to get the tests
back to good standing. There is more work to be done, but this seems enough for
a single PR.
#10375
* Remove "Disabled" field from types.Webauthn
* Update generated protos
* Treat second_factor "u2f" as "webauthn"
* Remove references to Webauthn.Disabled
* Remove U2F from lib/web/
* Remove U2F from lib/client/
* Remove U2F from lib/auth/ (partially)
* Fix issues after rebase on master
* Fix typo
* Add certificate renewal bot
This adds a new `tbot` tool to continuously renew a set of
certificates after registering with a Teleport cluster using a
similar process to standard node joining.
This makes some modifications to user certificate generation to allow
for certificates that can be renewed beyond their original TTL, and
exposes new gRPC endpoints:
* `CreateBotJoinToken` creates a join token for a bot user
* `GenerateInitialRenewableUserCerts` exchanges a token for a set of
certificates with a new `renewable` flag set
A new `tctl` command, `tctl bots add`, creates a bot user and calls
`CreateBotJoinToken` to issue a token. A bot instance can then be
started using a provided command.
* Cert bot refactoring pass
* Use role requests to split renewable certs from end-user certs
* Add bot configuration file
* Use `teleport.dev/bot` label
* Remove `impersonator` flag on initial bot certs
* Remove unnecessary `renew` package
* Misc other cleanup
* Do not pass through `renewable` flag when role requests are set
This adds additional restrictions on when a certificate's `renewable`
flag is carried over to a new certificate. In particular, it now also
denies the flag when either role requests are present, or the
`disallowReissue` flag has been previously set.
In practice `disallow-reissue` would have prevented any undesired
behavior but this improves consistency and resolves a TODO.
* Various tbot UX improvements; render SSH config
* Fully flesh out config template rendering
* Fix rendering for SSH configuration templates
* Added `String()` impls for destination types
* Improve certificate renewal logging; show more detail
* Properly fall back to default (all) roles
* Add mode hints for files
* Add/update copyright headers
* Add stubs for tbot init and watch commands
* Add gRPC endpoints for managing bots
* Add `CreateBot`, `DeleteBot`, and `GetBotUsers` gRPC endpoints
* Replace `tctl bot (add|rm|ls)` implementations with gRPC calls
* Define a few new constants, `DefaultBotJoinTTL`, `BotLabel`,
`BotGenerationLabel`
* Fix outdated destination flag in example tbot command
* Bugfix pass for demo
* Fixed a few nil pointer derefs when using config from CLI args
* Properly create destination if `--destination-dir` flag is used
* Remove improper default on CLI flag
* `DestinationConfig` is now a list of pointers
* Address first wave of review feedback
Fixes the majority of smaller issues caught by reviewers, thanks all!
* Add doc comments for bot.go functions
* Return the token TTL from CreateBot
* Split initial user cert issuance from `generateUserCerts()`
Issuing initial renewable certificate ended up requiring a lot of
hacks to skip checks that prevented anonymous bots from getting
certs even though we'd verified their identity elsewhere (via token).
This reverts all those hacks and splits initial bot cert logic into a
dedicated `generateInitialRenewableUserCerts()` function which should
make the whole process much easier to follow.
* Set bot traits to silence log messages
* tbot log message consistency pass
* Resolve lints
* Add config tests
* Remove CreateBotJoinToken endpoint
Users should instead use the CreateBot/DeleteBot endpoints.
* Create a fresh private key for every impersonated identity renewal
* Hide `config` subcommand
* Rename bot label prefix to `teleport.internal/`
* Use types.NewRole() to create bot roles
* Clean up error handling in custom YAML unmarshallers
Also, add notes about the supported YAML shapes.
* Fetch proxy host via gRPC Ping() instead of GetProxies()
* Update lib/auth/bot.go
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
* Fix some review comments
* Add renewable certificate generation checks (#10098)
* Add renewable certificate generation checks
This adds a new validation check for renewable certificates that
maintains a renewal counter as both a certificate extension and a
user label. This counter is used to ensure only a single certificate
lineage can exist: for example, if a renewable certificate is stolen,
only one copy of the certificate can be renewed as the generation
counter will not match.
When renewing a certificate, first the generation counter presented
by the user (via their TLS identity) is compared to a value stored
with the associated user (in a new `teleport.dev/bot-generation`
label field). If they aren't equal, the renewal attempt fails.
Otherwise, the generation counter is incremented by 1, stored to the
database using a `CompareAndSwap()` to ensure atomicity, and set on
the generated certificate for use in future renewals.
* Add unit tests for the generation counter
This adds new unit tests to exercise the generation counter checks.
Additionally, it fixes two other renewable cert tests that were
failing.
* Remove certRequestGeneration() function
* Emit audit event when cert generations don't match
* Fully implement `tctl bots lock`
* Show bot name in `tctl bots ls`
* Lock bots when a cert generation mismatch is found
* Make CompareFailed responses from validateGenerationLabel() more actionable
* Update lib/services/local/users.go
Co-authored-by: Nic Klaassen <nic@goteleport.com>
* Backend changes for tbot IoT and AWS joining (#10360)
* backend changes
* add token permission check
* pass ctx from caller
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
* fix comment typo
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
* use UserMetadata instead of Identity in RenewableCertificateGenerationMismatch event
* Client changes for tbot IoT joining (#10397)
* client changes
* delete replaced APIs
* delete unused tbot/auth.go
* add license header
* don't unnecessarily fetch host CA
* log fixes
* s/tunnelling/tunneling/
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
* auth server addresses may be proxies
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
* comment typo fix
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
* move *Server methods out of auth_with_roles.go (#10416)
Co-authored-by: Tim Buckley <tim@goteleport.com>
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
* Address another batch of review feedback
* Address another batch of review feedback
Add `Role.SetMetadata()`, simplify more `trace.WrapWithMessage()`
calls, clear some TODOs and lints, and address other misc feedback
items.
* Fix lint
* Add missing doc comments to SaveIdentity / LoadIdentity
* Remove pam tag from tbot build
* Update note about bot lock deletion
* Another pass of review feedback
Ensure all requestable roles exist when creating a bot, adjust the
default renewable cert TTL down to 1 hour, and check types during
`CompareAndSwapUser()`
Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
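
The generation counter check described in the commit notes above can be illustrated with an in-memory sketch. The `userStore` type and its `compareAndSwap` method are hypothetical stand-ins for the backend; the real implementation stores the counter in a user label and on the certificate.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errGenerationMismatch = errors.New("renewable certificate generation mismatch")

// userStore is a hypothetical in-memory stand-in for the backend that holds
// each bot user's current generation counter.
type userStore struct {
	mu          sync.Mutex
	generations map[string]uint64
}

func newUserStore() *userStore {
	return &userStore{generations: make(map[string]uint64)}
}

// compareAndSwap atomically replaces the stored counter with next, but only
// if the stored value still equals expected.
func (s *userStore) compareAndSwap(user string, expected, next uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.generations[user] != expected {
		return false
	}
	s.generations[user] = next
	return true
}

// renew validates the generation presented by the certificate and returns
// the generation to embed in the new certificate.
func renew(s *userStore, user string, presented uint64) (uint64, error) {
	next := presented + 1
	if !s.compareAndSwap(user, presented, next) {
		return 0, errGenerationMismatch
	}
	return next, nil
}

func main() {
	store := newUserStore()
	gen, err := renew(store, "bot-example", 0)
	fmt.Println(gen, err) // legitimate holder renews
	_, err = renew(store, "bot-example", 0)
	fmt.Println(err) // a stolen copy presenting the old generation is rejected
}
```

Because only one holder can present the current counter value, a second renewal with a stale generation fails, which is the point at which the commits above emit an audit event and lock the bot.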
This adds an RPC, `UpsertKubeServiceV2`, to replace `UpsertKubeService`. Currently, the Kubernetes service does not have a keepalive heartbeat, unlike the app, db, and Windows services. This change brings its functionality in line with the others and allows future use of the keepalive to track connected agents via Prometheus metrics.
WebAuthn configuration works as follows:
- If there is an explicit WebAuthn config, use it.
- Otherwise, try to fall back to the U2F config, copying/deriving what we can from it.
Falling back to U2F allows users to easily migrate to WebAuthn (in fact, if second_factor is "on" then WebAuthn takes over from U2F).
See the "UX and configuration"[1] section of the RFD for reference.
[1] 0fc785dbff/rfd/9999-webauthn-support.md (ux-and-configuration)
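
The fallback rule above might be sketched as follows. The struct types and `effectiveWebauthn` are hypothetical, and deriving the relying party ID from the host of the U2F app ID is an assumption about what "copying/deriving" means here.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// u2fConfig and webauthnConfig are hypothetical, trimmed-down config types.
type u2fConfig struct {
	AppID string
}

type webauthnConfig struct {
	RPID string
}

// effectiveWebauthn returns the explicit WebAuthn config when present.
// Otherwise it derives one from the U2F config, using the app ID's host as
// the relying party ID (assumption for illustration).
func effectiveWebauthn(w *webauthnConfig, u *u2fConfig) (*webauthnConfig, error) {
	if w != nil {
		return w, nil
	}
	if u == nil {
		return nil, errors.New("neither webauthn nor u2f is configured")
	}
	parsed, err := url.Parse(u.AppID)
	if err != nil {
		return nil, fmt.Errorf("parsing u2f app_id: %w", err)
	}
	host := parsed.Hostname()
	if host == "" {
		return nil, fmt.Errorf("cannot derive rp_id from app_id %q", u.AppID)
	}
	return &webauthnConfig{RPID: host}, nil
}

func main() {
	cfg, err := effectiveWebauthn(nil, &u2fConfig{AppID: "https://example.com:3080"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rp_id:", cfg.RPID)
}
```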
* Add protos for WebAuthn configuration
* Add validation for types.Webauthn
* Add Webauthn to FileConfig
* goimports api/types/authentication_authpreference_test.go
* Add missing license headers
* Added support for connecting API client through tunnel proxy and web proxy addresses (with identity file).
* Added concurrent dialing logic to dial several possible dialing combinations and seamlessly return the first client to connect.
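
The concurrent dialing idea could be sketched as below: start every candidate dialer at once and return the first success. The `dialFunc` type, the string "connection", and the helper names are hypothetical; a real client would also need to close any connections that lose the race.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dialFunc is a hypothetical dialer; the string stands in for a client.
type dialFunc func(ctx context.Context) (string, error)

// dialFirst runs all dialers concurrently and returns the first success.
// Remaining dialers are cancelled through the shared context.
func dialFirst(ctx context.Context, dialers []dialFunc) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		conn string
		err  error
	}
	// Buffered so that late dialers never block after we return.
	results := make(chan result, len(dialers))
	for _, d := range dialers {
		go func(d dialFunc) {
			conn, err := d(ctx)
			results <- result{conn, err}
		}(d)
	}

	lastErr := errors.New("no dialers provided")
	for range dialers {
		r := <-results
		if r.err == nil {
			return r.conn, nil
		}
		lastErr = r.err
	}
	return "", lastErr
}

// firstReachable is a convenience wrapper using a background context.
func firstReachable(dialers ...dialFunc) (string, error) {
	return dialFirst(context.Background(), dialers)
}

func failingDialer(name string) dialFunc {
	return func(context.Context) (string, error) {
		return "", fmt.Errorf("%s: connection refused", name)
	}
}

func okDialer(name string, delay time.Duration) dialFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
			return name, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

func main() {
	conn, err := firstReachable(
		failingDialer("auth-direct"),
		okDialer("web-proxy", 10*time.Millisecond),
	)
	fmt.Println(conn, err)
}
```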
* mfa: add new second_factor options "on" and "optional"
"on" means that 2FA is required for all users, either TOTP or U2F.
"optional" means that 2FA is supported for all users, but not required.
Only users with MFA devices registered will be prompted for 2FA on
login.
Login with either supported method uses the same API as the U2F
login, which now also accepts TOTP. The API endpoints are
still named after "u2f"; I'll rename those in a future PR (in a
backwards-compatible way).
* Apply suggestions from code review
Co-authored-by: Gus Luxton <gus@gravitational.com>
Co-authored-by: a-palchikov <deemok@gmail.com>
* Address review feedback
Co-authored-by: Gus Luxton <gus@gravitational.com>
Co-authored-by: a-palchikov <deemok@gmail.com>
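
The semantics of the "on" and "optional" modes in the mfa commit above can be summarized in a small sketch. `shouldPromptMFA` is a hypothetical helper, and the "off" case is an assumption added for completeness.

```go
package main

import "fmt"

// shouldPromptMFA (hypothetical helper) reports whether a login must include
// a second factor. "on" requires it for every user; "optional" prompts only
// users who have registered an MFA device. "off" is an assumed third mode.
func shouldPromptMFA(mode string, hasDevices bool) (bool, error) {
	switch mode {
	case "on":
		return true, nil
	case "optional":
		return hasDevices, nil
	case "off":
		return false, nil
	default:
		return false, fmt.Errorf("unsupported second_factor mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"on", "optional"} {
		for _, hasDevices := range []bool{false, true} {
			prompt, _ := shouldPromptMFA(mode, hasDevices)
			fmt.Printf("mode=%s devices=%v -> prompt=%v\n", mode, hasDevices, prompt)
		}
	}
}
```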