In `auth.Context`, the `Identity` field used to contain the original
caller identity and the `User` field contained the mapped local user. The
two differ when the request comes from a remote trusted cluster.
Lots of code assumed that `auth.Context.Identity` contained the local
identity and used roles/traits from there.
To prevent this confusion, populate `auth.Context.Identity` with the
*mapped* identity, and add `auth.Context.UnmappedIdentity` for callers
that actually need it.
One caller that needs `UnmappedIdentity` is the k8s proxy. It uses that
identity to generate an ephemeral user cert. Using the local mapped
identity in that case would make the downstream server (e.g.
kubernetes_service) treat it as a real local user, which doesn't
exist in the backend and causes trouble.
The `ProcessKubeCSR` endpoint on the auth server was also updated to
understand unmapped remote identities.
Co-authored-by: Andrew Lytvynov <andrew@goteleport.com>
Fixes #5708
OSS users lose connection to leaf clusters after upgrading the root cluster (but not the leaf clusters).
Teleport 6.0 switches users to the `ossuser` role, which breaks the implicit admin-to-admin cluster mapping.
The fix downgrades the `admin` role to be less privileged in OSS.
This specific error happens when there are no k8s clusters registered,
which is common. Don't include the original error in the log because it
includes the entire stack trace.
* Transferred user endpoints/handlers from e
* Transferred and refactored endpoints/handlers for roles, trusted clusters, and github connectors from e
* Export the ok() func so e can use it
* Silence rbac auth connector access denials on first check failure
* Update e-ref
With the introduction of `kube_listen_addr`, some users are confused about
how to set a public address for k8s access that's different from
`public_addr` of the proxy. `kube_public_addr` removes that confusion
and more closely resembles the other proxy endpoints.
This config:
```yaml
proxy_service:
  kube_listen_addr: 0.0.0.0:3026
  kube_public_addr: kube.example.com:3026
```
translates to the old format:
```yaml
proxy_service:
  kubernetes:
    enabled: yes
    listen_addr: 0.0.0.0:3026
    public_addr: kube.example.com:3026
```
Implements RFD #7: https://github.com/gravitational/teleport/blob/master/rfd/0007-rbac-oss.md
OSS users can use roles. Some FedRAMP-related role options
are limited to Enterprise.
All users are migrated to a new role "ossuser".
This role is a limited-access role, downgrading all users
from the OSS "admin" role.
All trusted clusters are mapped to "ossuser" as well.
The GitHub connector maps teams to generated roles.
For a transition period, the `tctl users add alice` format works
alongside `tctl users add alice --roles=admin`, but prints
a warning.
Cluster labels were added in 5.0 to restrict access to trusted clusters.
Enforce this restriction on `tsh login leafName` (aka `GenerateUserCerts`).
Note: access check is already enforced on actual user connections
(ssh/k8s/etc) and listing of trusted clusters (`tsh clusters`). You
cannot bypass authz to actually connect to that cluster.
* auth: add RequireSessionMFA to roles and enforce it
Enforcement kicks in when at least one role that grants access requires
it. Right now, clients don't request MFA-verified certs, so if a
role sets the field, it won't be usable. The next PR will add client
logic to request the special certs.
* Address review feedback
* mfa: add new second_factor options "on" and "optional"
"on" means that 2FA is required for all users, either TOTP or U2F.
"optional" means that 2FA is supported for all users, but not required.
Only users with MFA devices registered will be prompted for 2FA on
login.
Login with both supported methods uses the same API as the U2F
login; it now supports TOTP in addition. The API endpoints are
still named after "u2f"; I'll rename those in a future PR (in a
backwards-compatible way).
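For example, requiring 2FA for all users would look roughly like this in teleport.yaml (a sketch; "on"/"optional" are the new values described above):
```yaml
auth_service:
  authentication:
    type: local
    # "on" requires either TOTP or U2F for every login;
    # "optional" only prompts users who have MFA devices registered.
    second_factor: "on"
```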
* Apply suggestions from code review
Co-authored-by: Gus Luxton <gus@gravitational.com>
Co-authored-by: a-palchikov <deemok@gmail.com>
* Address review feedback
Co-authored-by: Gus Luxton <gus@gravitational.com>
Co-authored-by: a-palchikov <deemok@gmail.com>
* Add gzip middleware for compressing static assets
* Only set Content-Type if it has not been set explicitly before; simplify isCompressedImageRequest with a strings function
* Use a pool for our gzip writers to improve memory efficiency
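A minimal sketch of the approach, assuming a standard net/http handler chain (illustrative names, not Teleport's actual code):
```go
package web

import (
	"compress/gzip"
	"io"
	"net/http"
	"strings"
	"sync"
)

// Pool gzip writers so each request reuses one instead of allocating
// a new writer, reducing GC pressure.
var gzipPool = sync.Pool{
	New: func() interface{} { return gzip.NewWriter(io.Discard) },
}

type gzipResponseWriter struct {
	http.ResponseWriter
	gz *gzip.Writer
}

func (w gzipResponseWriter) Write(b []byte) (int, error) {
	return w.gz.Write(b)
}

// makeGzipHandler wraps next, compressing responses for clients that
// advertise gzip support in Accept-Encoding.
func makeGzipHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		gz := gzipPool.Get().(*gzip.Writer)
		defer gzipPool.Put(gz) // runs after Close, returning the writer to the pool
		gz.Reset(w)
		defer gz.Close()
		w.Header().Set("Content-Encoding", "gzip")
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, gz: gz}, r)
	})
}
```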
Device uniqueness is checked on `GetUsers`, so if a duplicate name
appears, any operations touching users will fail.
Check device name uniqueness on `UpsertMFADevice` to avoid this.
Also, swap the OTP device creation order on user signup/reset: only
upsert the device after validating the token.
* Fixes the scp logic to take target directory into account in sink mode.
Also expose the channel error in the scp client so the error is more
visible to the user. The old behavior would only output 'exit code n'
if anything broke.
Fixes https://github.com/gravitational/teleport/issues/5497.
* Silence 'wait: remote command exited without exit status or exit signal' error when interrupting the scp session. Leave a TODO to fix properly in a future PR
* Address review comments
* auth: API for requesting per-connection certificates
See https://github.com/gravitational/teleport/blob/master/rfd/0014-session-2FA.md#api
This API is a wrapper around GenerateUserCerts with a few differences:
- performs an MFA check before generating a cert
- enforces a single usage (ssh/k8s/db for now)
- embeds client IP in the cert
- marks a cert to distinguish from regular user certs
- enforces a 1min TTL
* Apply suggestions from code review
Co-authored-by: a-palchikov <deemok@gmail.com>
After adding several U2F tokens with `tsh mfa add`, you can now `tsh
login` using any of those tokens.
Two caveats:
1. The MFA method you get prompted for on login depends on the
`second_factor` config field on the auth server. There isn't yet an
option to require _either_ TOTP or U2F, even if you have both kinds
registered.
2. Web logins still need updating.
Also a few small unrelated changes:
- remove u2f-host binary presence check and docs
- hide `tsh mfa` commands until the feature is complete
* Use fake clock consistently in units tests.
* Split web session management into two interfaces and implement them separately for clear separation
* Split session management into New/Validate to make it apparent where sessions are created and where existing sessions are managed. Remove ttlmap in favor of a simple map and handle expirations explicitly.
Add web session management to gRPC server for the cache.
* Reintroduce web sessions APIs under a getter interface.
* Add SubKind to WatchKind for gRPC and add conversions from/to protobuf. Fix web sessions unit tests.
* lib/web: create/insert session context in ValidateSession if the session has not yet been added to session cache.
lib/cache: add event filter for web session in auth cache.
lib/auth: propagate web session subkind in gRPC event.
* Add implicit migrations for legacy web session key path for queries.
* Integrate web token in lib/web
* Add a bearer token when upserting a web session
* Fix tests. Use fake clock wherever possible.
* Converge session cache handling in lib/web
* Clean up and add doc comments where necessary
* Use correct form of sessions/tokens controller for ServerWithRoles. Use fake time in web tests
* Converge the web sessions/tokens handling in lib/auth to match the old behavior w.r.t access checking (e.g. implicit handling of the local user identity).
* Use cached reads and waiters only when necessary. Query sessions/tokens using best-effort - first looking in the cache and falling back to a proxy client
* Properly propagate events about deletes for values with subkind.
* Update to retrofit changes after recent teleport API refactorings
* Update comment on removing legacy code to move the deadline to 7.x
* Do not close the resources on the session when it expires - this defeats the purpose of this PR.
Also avoid a race between closing the cached clients and an existing reference to the session by letting the session linger for longer before removing it.
* Move web session/token request structs to the api client proto package
* Only set HTTP fs on the web handler if the UI is enabled
* Properly tear down web session test by releasing resources at the end. Fix the web UI assets configuration by removing DisableUI and instead use the presence of assets (HTTP file system) as an indicator that the web UI has been enabled.
* Decrease the expired session cache clean up threshold to 2m. Only log the expiration error message for errors other than not found
* Add test for terminal disconnect when using two proxies in HA mode
* mfa: implement management commands in tsh
New commands are:
- tsh mfa ls
- tsh mfa add
- tsh mfa rm
There are 2 problems intentionally left in this PR to keep it small:
1. TOTP registration requires the user to manually enter the secret in the
app. When there's free time, I'll add platform-specific QR code display
to make this easier.
2. U2F authentication only checks one of the registered devices. This is
a limitation of the u2f-host binary, which can't check multiple devices
at once (even if spawning multiple u2f-host commands in parallel). In
the next PR, I'll replace u2f-host with a Go library that supports this.
* Address review feedback
Add 3 new RPCs for the auth server:
- AddMFADevice
- DeleteMFADevice
- GetMFADevices
All RPCs act on the user calling them, rather than specifying the user
in parameters. It's one less thing to validate and also prevents authz
bugs where one user messes with another user's MFA devices.
Add and Delete RPCs are streaming both ways, to allow MFA using an
existing device (prevents MFA bypass) and a challenge/response
registration used in U2F and TOTP. This approach makes the challenge
bound to the RPC connection and doesn't require backend storage.
Each user can now have multiple devices. This commit only changes the
backend structure to support it; the client and API haven't been updated
yet.
Also added a migration for existing MFA data on auth server startup.
This is just a refactoring without functional changes. Pull all the u2f
handling spread across multiple client and server packages into one
place.
Also clean up an obsolete vendored dependency, unrelated to this PR.
* Fix truncated audit-log when using DynamoDB
This is a fix for #4977. Teleport will continue to query DynamoDB until
the response doesn't contain a `LastEvaluatedKey` anymore, which signals
the end of the result set.
Co-authored-by: Alexey Kontsevoy <biz.kovoy@gmail.com>
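The pagination loop looks roughly like this (a sketch using aws-sdk-go; names and inputs are illustrative, not the actual Teleport code):
```go
package dynamoevents

import "github.com/aws/aws-sdk-go/service/dynamodb"

// queryAll keeps issuing Query calls until DynamoDB stops returning a
// LastEvaluatedKey, which signals the end of the result set.
func queryAll(svc *dynamodb.DynamoDB, input *dynamodb.QueryInput) ([]map[string]*dynamodb.AttributeValue, error) {
	var items []map[string]*dynamodb.AttributeValue
	for {
		out, err := svc.Query(input)
		if err != nil {
			return nil, err
		}
		items = append(items, out.Items...)
		if len(out.LastEvaluatedKey) == 0 {
			return items, nil
		}
		// Resume the next page where the previous one stopped.
		input.ExclusiveStartKey = out.LastEvaluatedKey
	}
}
```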
* When exporting a kubeconfig, optionally overwrite the target file
Running `tctl auth sign --out=filepath` or `tsh login --out=filepath`
might overwrite the target `filepath` regardless of its existing
contents.
Make the tools prompt the user before overwriting (by default) and
provide a flag to always force the overwrite (for automation).
Without overwrites, writing kubeconfig could fail if parsing the
existing file as a kubeconfig fails.
Proxy protocol is sometimes used by load balancers to communicate the
real client IP address. Re-use the detection/parsing code from
lib/multiplexer on all k8s listeners (proxy and kubernetes_service).
If "NewServerContext" returns an error, then the error is logged using
the returned context which is nil causing a panic.
This change always uses the logger attached to the server instead.
The logic of `auth.Register` is adapted so that it attempts the proxy-mode connection first when the first configured auth server uses port `defaults.HTTPListenPort` (3080).
* Add -p flag to scp
* Add support for preserving access/modification times on files/directories when copying files between hosts.
* lib/sshutils/scp: add time statting for directories
* Add directory handling for scp
* Rewrite scp tests with testify
* Address review comments
- detect whether k8s support is on based on proxy advertising a k8s port
- make sure proxy advertised k8s port is updated on re-login
- don't touch user's kubeconfig if k8s support is disabled in proxy
* Add AccessRequest access to userACL
* Define requestable roles
* Update UI test plan to include access request
* Edit testplan and fix whitespace issue
* When renewing session, set expiry to the shortest time
This commit fixes #5177
The initial implementation uses the dir backend as a cache, which is OK
for small clusters but will be a problem with many proxies.
This implementation uses Go's autocert, which is quite limited
compared to Caddy's certmagic or lego: autocert has no OCSP stapling
and no cache locking, for example.
However, it is much simpler and has no dependencies, and it will be
easier to extend to use the Teleport backend as a cert cache.
```yaml
proxy_service:
  public_addr: ['example.com']
  # ACME - automatic certificate management environment.
  #
  # It provisions certificates for domains and
  # valid subdomains in the public_addr section.
  #
  # A subdomain is valid if there is a registered application for it.
  # For example, app.example.com will get a cert if app is a registered
  # application; cookie.example.com will not.
  #
  # Teleport's ACME support uses the TLS-ALPN-01 challenge:
  #
  # https://letsencrypt.org/docs/challenge-types/#tls-alpn-01
  #
  acme:
    # By default acme is disabled.
    enabled: true
    # Use a custom URI, for example staging is
    #
    # https://acme-staging-v02.api.letsencrypt.org/directory
    #
    # Default is the letsencrypt.org production URL:
    #
    # https://acme-v02.api.letsencrypt.org/directory
    uri: ''
    # Set email to receive alerts and other correspondence
    # from your certificate authority.
    email: 'alice@example.com'
```
* Make k8s error responses decodable by kubectl
`kubectl` expects a k8s `Status` object in error responses.
Intercept generic handler errors and forwarder errors, and wrap them in
a `Status` object.
* Use strict teleport.yaml validation in warning mode
Strict YAML validation catches the cases where a valid config key is
placed in the wrong location in the config. These errors were not
caught by the old validation.
A failure is always reported, but startup only fails when both the old
and new validations fail. This lets users fix their configs during the
6.0 release; we will start enforcing strict validation in 7.0.
Example:
```yaml
auth_service:
data_dir: "/foo" # this field must live under "teleport:", not "auth_service:"
```
Output:
```
$ teleport start -c teleport-invalid.yaml
ERRO "Teleport configuration is invalid: yaml: unmarshal errors:\n line 6: field data_dir not found in type config.Auth." config/fileconf.go:303
ERRO This error will be enforced in the next Teleport release. config/fileconf.go:304
[AUTH] Auth service 5.0.0-dev:v4.4.0-alpha.1-262-g307040886-dirty is starting on 0.0.0.0:3025.
... continues startup ...
```
* Remove newlines from YAML error
The HTTP request context is canceled when the client disconnects. Using
this context in the session recorder prevents it from uploading the
session when it's finished.
Use the server context instead, to prevent lost recordings.
* Add logger attributes to be able to propagate logger from tests for identifying tests
* Add test case for Server's DeepCopy.
* Update test to using the testing package directly. Update dependency after upstream PR.
* kube: emit audit events using process context
Using the request context can prevent audit events from getting emitted
if the client disconnected and the request context was closed.
We shouldn't be losing audit events like that.
Also, log all response errors from exec handler.
* kube: cleanup forwarder code
Rename a few config fields to be more descriptive.
Avoid embedding unless necessary, to keep the package API clean.
* kube: cache only user certificates, not the entire session
The expensive part that we need to cache is the client certificate.
Making a new one requires a round-trip to the auth server, plus entropy
for crypto operations.
The rest of clusterSession contains request-specific state, and only
adds problems if cached.
For example: clusterSession stores a reference to a remote teleport
cluster (if needed); caching requires extra logic to invalidate the
session when that cluster disappears (or tunnels drop out). Same problem
happens with kubernetes_service tunnels.
Instead, the forwarder now picks a new target for each request from the
same user, providing a kind of "load-balancing".
* Init session uploader in kubernetes service
It's started in all other services that upload sessions (app/proxy/ssh),
but was missing here. Because of this, the session storage directory for
async uploads wasn't created on disk, which caused interactive sessions
to fail.
* Update logrus package to fix data races
* Introduce a logger that uses the test context to log the messages so they are output if a test fails for improved trouble-shooting.
* Revert introduction of test logger - simply leave logger configuration at debug level outputting to stderr during tests.
* Run integration test for e as well
* Use make with a cap and append to only copy the relevant roles.
* Address review comments
* Update integration test suite to use test-local logger that would only output logs iff a specific test has failed - no logs from other test cases will be output.
* Revert changes to InitLoggerForTests API
* Create a new logger instance when applying defaults or merging with file service configuration
* Introduce a local logger interface to be able to test file configuration merge.
* Fix kube integration tests w.r.t log
* Move goroutine profile dump into a separate func to handle parameters consistently for all invocations
Without this, deleted kube_services linger in the backend and show up as
obsolete kubernetes clusters in tsh.
Ideally, this TTL logic should be enforced centrally, but I'd like to
fix the bug first, and do a larger refactoring later.
* benchmark package
* use default config if path is not specified
* progressiveBench as a config method
* implement a main.go approach to run progressive tests
* make teleport client, run specified benchmark
* function and method descriptions
* make teleport client
* testing
* change interface method signatures
* dry up bench.go code, move producer goroutines to own function
* output formatting
* remove yaml
* fix linter errors
* remove print
* PR suggested changes, moved export latency profile functionality to the benchmark package
* PR fixes
* method description
* update testing
* linter
* docs and example
* PR suggestion changes
* fix coord omission bug
* remove benchmark struct
* remove threads, using open system
* recover in run
* close channel, check if open with each execution
* update testing, pr suggestions
* add more instructions to readme
* update example.go
* pass back context
* use SyncBuffer
* export response and service histograms
* update readme, exporting profiles section
* return from execute()
* export singular latency profile
* export response profile
* Revert "export response profile"
This reverts commit 5a21cb034c.
* export response profile
* update branch
* format example.go
* remove threads
* update example.go
* update branch
* goimports
* add signal handler & update docs
* PR suggestions
* exit out of interactive session
* revert execute
* PR suggestion
* run command on non-interactive instead of nil
Streaming requests, like `kubectl logs -f` will slowly write response
data over time. The `http.ResponseWriter` wrapper we added for capturing
the response code didn't propagate `http.Flusher` interface and
prevented the forwarder library from periodically flushing response
contents.
This caused `kubectl logs -f` results to be delayed, delivered in
batches as some internal buffer filled up.
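The fix amounts to having the wrapper forward Flush calls, roughly like this (an illustrative sketch, not the exact Teleport code):
```go
package httplib

import "net/http"

// statusWriter captures the response code while still exposing the
// underlying writer's http.Flusher, so streaming responses can be
// flushed periodically instead of piling up in a buffer.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// Flush propagates to the wrapped writer, if it supports flushing.
func (w *statusWriter) Flush() {
	if f, ok := w.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}
```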
This sets a useful server IP when no advertise_ip is set. Previously,
the address was taken from the listener, and is usually "0.0.0.0:3022"
or "[::]:3022".
Also, add some test cases in utils for IPv6 handling.
- 'tsh kube login' fetches the latest list of kube clusters instead of
only using existing kubeconfig contexts.
This makes 'tsh kube login' succeed when a kube cluster was added
after last 'tsh login'.
- 'tsh kube ls' no longer wrongly marks selected clusters, if they
weren't generated by tsh.
- 'tctl rm' now works with kube_service objects.
- 'tsh login' now updates kubeconfig entries when a login session is
already active
- 'teleport.yaml' now uses 'labels' and 'commands' for RBAC labels on
kubernetes_service; this is consistent with ssh and app services.
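For example, the kubernetes_service labels would look something like this (a sketch assuming the same shape as ssh_service labels):
```yaml
kubernetes_service:
  enabled: yes
  # Static labels.
  labels:
    env: prod
  # Dynamic labels, periodically refreshed from command output.
  commands:
  - name: hostname
    command: [hostname]
    period: 1m
```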
Updated default admin role to support reading services.KindProxy. This
is needed by "tctl" when using credentials from ~/.tsh to generate the
join message.
Added fields:
- kube users/groups
- pod name/namespace
- container name/image
- node name
Container image and node name need to be fetched from the k8s API, they
are not known from just the client request. This fetch is optional, and
if it fails (like due to permission errors), those fields will be
missing.
Since kubernetes_service can talk to k8s API and proxy_service can't,
all session events are now emitted by kubernetes_service and skipped by
the proxy (used to be the other way around).
The `KubernetesClusters` field in `ServerSpecV2` used to be a
`[]string`:
https://github.com/gravitational/teleport/pull/4354/files#diff-50ec8b71306e75db3cb193b581cdd51139b03f90e23e7804cbef7edf712bbfac
Later, it was changed to `[]*services.KubernetesCluster`, which is
incompatible when parsing.
Unfortunately, the string version slipped into 4.4. When upgrading to
5.0, teleport fails to parse the old server object at startup and
crashes.
Rename the JSON tag from `kubernetes_clusters` to `kube_clusters` to
distinguish the different versions of this field when parsing. The old
`kubernetes_clusters` will just be ignored.
The runtime of our current parsing code grows exponentially with nested
selectors (e.g. '{{a.b.c.d.e.f}}'), mostly due to memory churn from
slice allocations. With 100,000 levels of selectors, parsing takes ~80s
on my machine.
If an attacker can submit these expressions for parsing, they can DoS
the auth server with relatively small payloads (<1MB).
All real-world expressions are <10 AST nodes deep. Add a sanity check of
1000 levels to protect against malicious inputs.
We can optimize the code later on, but it's not very useful for
real-world performance.
This commit fixes #4695.
Teleport in async recording mode sends all events to disk,
and uploads them to the server later.
It uploads some events synchronously to the audit log so
they show up in the global event log right away.
However, if the auth server is slow, the fanout blocks the session.
This commit makes the fanout of these events nonblocking and
never-failing, so sessions will not hang unless the disk writes hang.
It adds a backoff period and a timeout after which some events will be
lost, but the session will continue without blocking.
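The nonblocking fanout boils down to a pattern like this (an illustrative sketch, not the actual implementation):
```go
package events

import (
	"log"
	"time"
)

// emitNonblocking tries to fan out an audit event, dropping it after
// the backoff timeout instead of blocking the session.
func emitNonblocking(events chan<- interface{}, event interface{}, timeout time.Duration) {
	select {
	case events <- event:
	case <-time.After(timeout):
		// The event is lost, but the session keeps going.
		log.Printf("dropped audit event after %v timeout", timeout)
	}
}
```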
Improves the reliability and correctness of the cache via
various small improvements, including preventing reads of
partially initialized/reset state, and delaying watcher
init events until unhealthy states recover.
Fixes an issue where reads could result in missing or
inconsistent results.
When the user does not have a session and tries to access a proxied
application at its FQDN, Teleport does best-effort resolution.
This fix changes what happens when the user has a session but the
session is expired: the user was being redirected to the login page;
now the behavior is in sync with the no-session case, doing best-effort
resolution.
A proxy running in pre-5.0 mode (e.g. with local kubeconfig) should
register an entry in `tsh kube clusters`.
After upgrading to 5.0, without migration to kubernetes_service, all the
new `tsh kube` commands will work as expected.
Added a validation check that ensures application names are valid DNS
subdomains. This is because an application name can potentially be used
in the DNS name of the application if either a public address is not
provided or the application is accessed via a trusted cluster.
* Add labels to KubernetesCluster resources
Plumb from config to the registered object, keep dynamic labels updated.
* Check kubernetes RBAC
Checks are in some CRUD operations on the auth server and in the
kubernetes forwarder (both proxy or kubernetes_service).
The logic is essentially copy-paste of the TAA version.
1. `tsh kube clusters` - lists registered kubernetes clusters
note: this only includes clusters connected via `kubernetes_service`
2. `tsh kube credentials` - returns TLS credentials for a specific kube
cluster; this is a hidden command used as an exec plugin for kubectl
3. `tsh kube login` - switches the kubectl context to one of the
registered clusters; roughly equivalent to `kubectl config
use-context`
When updating kubeconfigs, tsh now uses the exec plugin mode:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
This means that on each kubectl run, kubectl will execute tsh with
special arguments to get the TLS credentials.
Using tsh as exec plugin allows us to put a login prompt when certs
expire. It also lets us lazy-initialize TLS certs for kubernetes
clusters.
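The generated kubeconfig user entry looks roughly like this (a sketch; the exact argument names are illustrative):
```yaml
users:
- name: teleport-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: tsh
      # kubectl invokes tsh on every run to fetch fresh TLS credentials.
      args: ["kube", "credentials", "--kube-cluster=mycluster"]
```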
Updated storage configuration to not only apply to DynamoDB in the
backend package, but also DynamoDB in the events package. This allows
configuring continuous backups and auto scaling for the events table.
Pass the target address (in the case of application access,
services.LocalNode) in the Dial request to the reverse tunnel subsystem,
instead of filling it in within the reverse tunnel subsystem.
This change has several parts: cluster registration, cache updates,
routing and a new tctl flag.
> cluster registration
Cluster registration means adding `KubernetesClusters` to `ServerSpec`
for servers with `KindKubeService`.
`kubernetes_service` instances will parse their kubeconfig or local
`kube_cluster_name` and add them to their `ServerSpec` sent to the auth
server. They are effectively declaring that "I can serve k8s requests
for k8s cluster X".
> cache updates
This is just cache plumbing for `kubernetes_service` presence, so that
other teleport processes can fetch all kube services. It was missed
in the previous PR implementing CRUD for `kubernetes_service`.
> routing
Now the fun part - routing logic. This logic lives in
`/lib/kube/proxy/forwarder.go` and is shared by both `proxy_service`
(with kubernetes integration enabled) and `kubernetes_service`.
The target k8s cluster name is passed in the client cert, along with k8s
users/groups information.
`kubernetes_service` only serves requests for its direct k8s cluster
(from `Forwarder.creds`) and doesn't route requests to other teleport
instances.
`proxy_service` can serve requests:
- directly to a k8s cluster (the way it works pre-5.0)
- to a leaf teleport cluster (also same as pre-5.0, based on
`RouteToCluster` field in the client cert)
- to a `kubernetes_service` (directly or over a tunnel)
The last two modes require the proxy to generate an ephemeral client TLS
cert to do an outbound mTLS connection.
> tctl flag
A new `--kube-cluster-name` flag for `tctl auth sign --format=kubernetes`,
which allows generating client certs for a non-default k8s cluster name
(as long as it's registered in the cluster).
I used this for testing, but it could be used for automation too.
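Example usage (cluster and user names are illustrative):
```bash
$ tctl auth sign --format=kubernetes --kube-cluster-name=gke-prod \
    --user=alice --out=kubeconfig
```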
Various improvements related to extending the dynamic access
API, including:
- Support for users with no statically defined roles.
- Unify trait mapping logic (e.g. claims_to_roles) across
the connector types.
- Support for matcher syntax and claims_to_roles mappings when
configuring which roles a user is able to request.
- Allow tsh or the web UI to automatically generate wildcard
access requests when dictated by role configuration.
- Allow RBAC configuration to attach annotations to pending
access requests which can be consumed by plugins.
- Allow plugins to attach annotations to approvals/denials
which appear in the audit log, and may also be looked up
later to determine additional info about a resolution.
- Support prompts, request reasons, and approval/denial
reasons for access requests.
* lib/web: update logging to go through a package-level logger.
Updates https://github.com/gravitational/teleport/issues/4110.
* Unify uses of package-level logger. Update e
* Fix linter warning and tests
* Address review comments
Fixes #3604
This commit adds support for the cluster_labels
role parameter, limiting access to remote clusters by label.
A new `tctl update rc` command provides an interface to set labels on
remote clusters.
Consider two clusters: `one` (root) and `two` (leaf).
```bash
$ tsh clusters
Cluster Name Status
------------ ------
one online
two online
```
Create the trusted cluster join token with labels:
```bash
$ tctl tokens add --type=trusted_cluster --labels=env=prod
```
Every cluster joined using this token will inherit env:prod labels.
Alternatively, update remote cluster labels using the `tctl update rc`
command. Letting remote clusters propagate their own labels
creates a problem of rogue clusters updating their labels to bad values.
Instead, the administrator of the root cluster controls the labels
using the remote clusters API, without fear of override:
```bash
$ tctl get rc
kind: remote_cluster
metadata:
  name: two
status:
  connection: online
  last_heartbeat: "2020-09-14T03:13:59.35518164Z"
version: v3
```
```bash
$ tctl update rc/two --set-labels=env=prod
cluster two has been updated
```
```bash
$ tctl get rc
kind: remote_cluster
metadata:
  labels:
    env: prod
  name: two
status:
  connection: online
  last_heartbeat: "2020-09-14T03:13:59.35518164Z"
```
Update the role to deny access to prod env:
```yaml
kind: role
metadata:
  name: dev
spec:
  allow:
    logins: [root]
    node_labels:
      '*': '*'
    # Cluster labels control which clusters the user can connect to. The wildcard ('*')
    # means any cluster. If no role in the role set uses cluster labels and the cluster
    # is not labeled, the cluster labels check is not applied. Otherwise, cluster labels
    # are always enforced. This makes the feature backwards-compatible.
    cluster_labels:
      'env': 'staging'
  deny:
    # Cluster labels control which clusters the user can connect to. The wildcard ('*')
    # means any cluster. By default none are set in deny rules, to preserve backwards
    # compatibility.
    cluster_labels:
      'env': 'prod'
```
```bash
$ tctl create -f dev.yaml
```
Cluster two is now invisible to users with the `dev` role.
```bash
$ tsh clusters
Cluster Name Status
------------ ------
one online
```
Added support for an identity-aware, RBAC-enforcing, mutually
authenticated web application proxy to Teleport.
* Updated services.Server to support application servers.
* Updated services.WebSession to support application sessions.
* Added CRUD RPCs for "AppServers".
* Added CRUD RPCs for "AppSessions".
* Added RBAC support using labels for applications.
* Added JWT signer as a services.CertAuthority type.
* Added support for signing and verifying JWT tokens.
* Refactored dynamic label and heartbeat code into standalone packages.
* Added application support to web proxies and new "app_service" to
proxy mutually authenticated connections from proxy to an internal
application.
* Implement kubernetes_service registration and startup
The new service now starts, registers (locally or via a join token) and
heartbeats its presence to the auth server.
This service can handle k8s requests (like a proxy) but cannot route
them to remote teleport clusters. Proxies will be responsible for routing those.
The client (tsh) will not go to this service until proxy routing is
implemented. I manually tweaked the server address in kubeconfig to test it.
You can also run `tctl get kube_service` to list all registered
instances. The self-reported info is currently limited - only listening
address is set.
* Address review feedback
This is a shorthand for the larger kubernetes section:
```
proxy_service:
  kube_listen_addr: "0.0.0.0:3026"
```
It is equivalent to:
```
proxy_service:
  kubernetes:
    enabled: yes
    listen_addr: "0.0.0.0:3026"
```
This shorthand is meant to be used with the new `kubernetes_service`:
https://github.com/gravitational/teleport/pull/4455
It reduces confusion when both `proxy_service` and `kubernetes_service`
are configured in the same process.
* Make k8s permissions test optional
There are several legitimate cases where it can fail:
- root proxy running inside k8s but without access to local k8s cluster
- root proxy running with a dummy kubeconfig that we recommended in the
past
Leave a ForwarderConfig flag to enforce this check; it will be useful
later in kubernetes_service, which should always have the right permissions.
This commit fixes #4598
Config with multiple event backends was crashing on 4.4:
```yaml
storage:
  audit_events_uri: ['dynamodb://streaming', 'stdout://', 'dynamodb://streaming2']
```
The uploader retries more slowly on network errors and picks up the
pace after any upload has succeeded.
Records that were corrupted will never get uploaded; previously, the
uploader would create streams for them indefinitely, clogging the auth
server with streams. Now the uploader writes a marker for bad session
uploads and does not attempt to re-upload them.
* Fix local etcd test failures when etcd is not running
* Add kubernetes_service to teleport.yaml
This plumbs config fields only, they have no effect yet.
Also, remove `cluster_name` from `proxy_config.kubernetes`. This field
will only exist under `kubernetes_service` per
https://github.com/gravitational/teleport/pull/4455
* Handle IPv6 in kubernetes_service and rename label fields
* Disable k8s cluster name defaulting in user TLS certs
Need to implement service registration first.
Most users won't need this, so the behavior is optional. Default system
configs will usually trigger a password prompt, which is why this
feature is disabled by default.
`require` is a sister package to `assert` that terminates the test on
failure. `assert` records the failure but lets the test proceed, which
is unintuitive.
Also update all existing tests to match.
This commit fixes #4511.
The fanout watcher's Close method was cancelling
the context, but was not removing the watcher
from the fanout list.
The GRPC server was not releasing memory buffers associated
with the streams after clients disconnect.
Goroutines associated with the GRPC server were closed,
but buffers remained in memory:
https://github.com/gravitational/teleport/issues/4511
https://github.com/grpc/grpc-go/issues/3728#issuecomment-695883580
In Go, a child context created with context.WithValue(parent)
references the parent context, and the parent context references
the child context back.
When the parent context is closed, it removes the child references,
but the child keeps referencing the parent context.
If the child context is leaked, objects associated
with the parent context are not garbage collected.
The GRPC UnaryInterceptor created context.WithValue(ctx, User)
to add a user and passed this context to methods.
The WatchEvents GRPC server created a services.Fanout watcher
that referenced the child context.
The fanout watcher's Close method did not remove the watcher
from the fanout buffer list, causing the leak.
IoT nodes could not reconnect in the rollback state because
the cert authority was missing the new SSH public key.
IoT nodes were authenticating using the new certificate
and were rejected.
This commit fixes #4508
Gogoproto is not compatible with APIv2 protoc-gen-go.
Track the issue here: https://github.com/gogo/protobuf/issues/678
Meanwhile, this commit switches to google protobuf to unmarshal the firebase struct.
Add a missing EmitAuditEvent method whose absence caused teleport to
crash with the firestore events backend.
Prior to https://github.com/gravitational/teleport/pull/3811, if users
wanted to run a root proxy without k8s clusters but leaf proxies with
k8s, they had to put a dummy `kubeconfig_file` on the root proxy.
The permissions self-test added in
https://github.com/gravitational/teleport/pull/3812 didn't take that
into account.
So, users who keep the old workaround and upgrade to 4.4 will see their
proxies fail to start. To recover, they have to realize that
`kubeconfig_file` can be removed.
In the "catch all" handler of k8s proxy, emit a new event KubeRequest
containing relevant request info.
Do best-effort parsing of the URL path to extract API fields like
namespace, resource name and kind.
Events related to discovery are suppressed due to their spamminess.
Previously, we needed:
- create on namespaces
- impersonate on all users/groups/service accounts
- list pods in kube-system namespace (via teleport-ci-test-group)
- exec/portforward on kube-dns pod in kube-system namespace (via teleport-ci-test-group)
Now, we need:
- create on namespaces
- create on pods in namespace teletest
- impersonate on all users/groups
- get/exec/portforward on pod test-pod in namespace teletest (via teleport-ci-test-group)
Unfortunately, `resourceNames` in RBAC doesn't work with `create` verbs,
so we can't scope down impersonation to just the right users/groups.
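In k8s RBAC terms, the namespaced part of the new test permissions looks roughly like this (a hypothetical Role; the cluster-scoped impersonation rules would live in a ClusterRole):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: teleport-ci-test
  namespace: teletest
rules:
# resourceNames doesn't work with "create", so pod creation
# can't be scoped to a specific pod name.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["pods"]
  resourceNames: ["test-pod"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  resourceNames: ["test-pod"]
  verbs: ["create"]
```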
K8s cluster name is specified during login (for now) and gets plumbed to
a new extension on the TLS cert. The name is validated against all
known k8s clusters reported via proxy heartbeats. If no name is
provided, the extension remains empty.
The name in the cert will get used by proxies for routing, once we fully
support multiple k8s clusters per teleport cluster.
This was tested with direct and github login flows.
First, this is unexpected behavior. If `tsh` fails using the identity
file, it should tell the user why and exit, instead of masking it.
Second, this can lead to a segfault, since the `TeleportClient` isn't
fully initialized for logins (e.g. uses a half-initialized Agent).
When running 'tctl auth sign' as an admin, we override the TTL on
roles/logins to allow making long-lived creds.
We didn't do that for k8s users/groups and silently filtered them out.
This change makes them consistent.
The cluster name from this field plus all clusters from kubeconfig are
stored on the auth server via heartbeats.
This info will later be used to route k8s requests back to proxies.
Updates https://github.com/gravitational/teleport/issues/3952
Matchers use a similar syntax to Expressions, but behave differently:
- Expressions get evaluated - they interpolate some values and return a
final string.
- Matchers check whether some string matches a value
Matchers implement the same logic as utils.SliceMatchesRegex and add 2
new functions:
- {{regexp.match("foo")}} - match input against a raw regex
- {{regexp.not_match("foo")}} - same as match, but inverts the result
No need to handle literal expressions (e.g. without "{{foo.bar}}"
substitutions) at the higher level. Something like "foo" is a valid
expression which always returns "foo" regardless of traits.
This helps with ELB and similar L5 load balancers that don't respect
TCP-level keep-alives. ELB for example kills connections after 60s of no
application traffic.
With this PR, you can leave a `kubectl exec` session open indefinitely
without any activity.
Use the reverse tunnel endpoint, similar to IoT nodes, to connect to the
auth server. Also add an `--insecure` flag, similar to tsh, for testing
with self-signed certs on the proxy.
This commit introduces a GRPC API for streaming sessions.
It adds structured events and sync streaming
that avoids storing events on disk.
You can find the design in the rfd/0002-streaming.md RFD.
Adds support for Concurrent Session Control and a new
semaphore API. Roles now support two new configuration
options, `max_ssh_connections` and `max_ssh_sessions`,
which correspond to the total number of authenticated
ssh connections per cluster, and the number of ssh sessions
within a connection respectively. Attempting to exceed
these limits generates variants of the `session.rejected`
audit event and causes the connection/session to be
rejected.
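As a sketch, based on the option names above (exact placement in the role spec is illustrative):
```yaml
kind: role
metadata:
  name: limited-sessions
spec:
  options:
    # At most two authenticated ssh connections per cluster...
    max_ssh_connections: 2
    # ...and at most five ssh sessions within each connection.
    max_ssh_sessions: 5
```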
* Split remote cluster watching from reversetunnel.AgentPool
Separating the responsibilities:
- AgentPool takes a proxy (or LB) endpoint and manages a pool of agents
for it (each agent is a tunnel to a unique proxy process behind the
endpoint)
- RemoteClusterTunnelManager polls the auth server for a list of trusted
clusters and manages a set of AgentPools, one for each trusted cluster
Previously, AgentPool did both of the above.
Also, bundling some cleanup in the area:
- better error when dialing through tunnel and directly both fail
- rename RemoteKubeProxy to LocalKubernetes to better reflect the
meaning
- remove some dead code and simplify config structs
* reversetunnel: factor out track.Key
ClusterName is the same for all Agents in an AgentPool. track.Tracker
needs to only track proxy addresses.
* Always collect metrics about top backend requests
Previously, this was only done in debug mode, which left some tabs in
`tctl top` empty when the auth server was not in debug mode.
* backend: use an LRU cache for top requests in Reporter
This LRU cache tracks the most frequent recent backend keys. All keys in
this cache map to existing labels in the requests metric. Any evicted
keys are also deleted from the metric.
This will keep an upper limit on our memory usage while still always
reporting the most active keys.
* DynamoDB: Build the http transport from defaults before manipulating parameters; this allows the transport to be pre-populated with proxy information set via the HTTPS_PROXY/NO_PROXY environment variables.