This commit fixes #5177
The initial implementation uses the dir backend as a cache. This is OK
for small clusters, but will be a problem for deployments with many proxies.
This implementation uses Go's autocert, which is quite limited
compared to Caddy's certmagic or lego.
For example, autocert has no OCSP stapling and no locking for the cache.
However, it is much simpler and has no dependencies.
It will be easier to extend to use the Teleport backend as a cert cache.
```yaml
proxy_service:
  public_addr: ['example.com']
  # ACME - automatic certificate management environment.
  #
  # It provisions certificates for domains and
  # valid subdomains in public_addr section.
  #
  # The subdomains are valid if there is a registered application.
  # For example, app.example.com will get a cert if app is a registered
  # application access app. The subdomain cookie.example.com will not.
  #
  # Teleport acme is using TLS-ALPN-01 challenge:
  #
  # https://letsencrypt.org/docs/challenge-types/#tls-alpn-01
  #
  acme:
    # By default acme is disabled.
    enabled: true
    # Use a custom URI, for example staging is
    #
    # https://acme-staging-v02.api.letsencrypt.org/directory
    #
    # Default is letsencrypt.org production URL:
    #
    # https://acme-v02.api.letsencrypt.org/directory
    uri: ''
    # Set email to receive alerts and other correspondence
    # from your certificate authority.
    email: 'alice@example.com'
```
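The subdomain rule described in the config comments can be sketched as a small host check. This is only an illustration under stated assumptions: `hostAllowed`, `publicAddrs` and `apps` are hypothetical names, not Teleport's API, and the real check may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// hostAllowed reports whether a certificate may be requested for host.
// publicAddrs and apps are hypothetical inputs standing in for the proxy's
// public_addr list and the set of registered application names.
func hostAllowed(host string, publicAddrs []string, apps map[string]bool) bool {
	host = strings.ToLower(host)
	for _, addr := range publicAddrs {
		addr = strings.ToLower(addr)
		if host == addr {
			return true
		}
		// A subdomain is accepted only when it is a single label
		// that names a registered application.
		if sub := strings.TrimSuffix(host, "."+addr); sub != host {
			if !strings.Contains(sub, ".") && apps[sub] {
				return true
			}
		}
	}
	return false
}

func main() {
	addrs := []string{"example.com"}
	apps := map[string]bool{"app": true}
	for _, h := range []string{"example.com", "app.example.com", "cookie.example.com"} {
		fmt.Printf("%s allowed=%v\n", h, hostAllowed(h, addrs, apps))
	}
}
```

A check of this shape is what one would plug into a host policy callback so that certificates are never requested for arbitrary names.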
* Update logrus package to fix data races
* Introduce a logger that uses the test context to log the messages so they are output if a test fails, for improved troubleshooting.
* Revert introduction of test logger - simply leave logger configuration at debug level outputting to stderr during tests.
* Run integration test for e as well
* Use make with a cap and append to only copy the relevant roles.
* Address review comments
* Update integration test suite to use test-local logger that would only output logs iff a specific test has failed - no logs from other test cases will be output.
* Revert changes to InitLoggerForTests API
* Create a new logger instance when applying defaults or merging with file service configuration
* Introduce a local logger interface to be able to test file configuration merge.
* Fix kube integration tests w.r.t log
* Move goroutine profile dump into a separate func to handle parameters consistently for all invocations
This sets a useful server IP when no advertise_ip is set. Previously,
the address was taken from the listener and was usually "0.0.0.0:3022"
or "[::]:3022".
Also, add some test cases in utils for IPv6 handling.
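A minimal sketch of the idea, assuming a fallback IP has already been determined by some other means. `replaceUnspecifiedHost` and `fallbackIP` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// replaceUnspecifiedHost returns listenAddr with its host replaced by
// fallbackIP when the listener is bound to an unspecified address
// (0.0.0.0 or [::]) or to no host at all. fallbackIP is a hypothetical
// parameter; how the real value is discovered is not shown here.
func replaceUnspecifiedHost(listenAddr, fallbackIP string) (string, error) {
	host, port, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if host == "" || (ip != nil && ip.IsUnspecified()) {
		// JoinHostPort adds brackets for IPv6 literals.
		return net.JoinHostPort(fallbackIP, port), nil
	}
	return listenAddr, nil
}

func main() {
	for _, addr := range []string{"0.0.0.0:3022", "[::]:3022", "192.168.1.10:3022"} {
		out, err := replaceUnspecifiedHost(addr, "10.0.0.5")
		fmt.Println(addr, "->", out, err)
	}
}
```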
Added fields:
- kube users/groups
- pod name/namespace
- container name/image
- node name
Container image and node name need to be fetched from the k8s API because
they are not known from the client request alone. This fetch is optional:
if it fails (for example, due to permission errors), those fields will be
missing.
Since kubernetes_service can talk to k8s API and proxy_service can't,
all session events are now emitted by kubernetes_service and skipped by
the proxy (used to be the other way around).
Our current parsing code runtime grows exponentially with nested
selectors (e.g. '{{a.b.c.d.e.f}}'), mostly due to memory churn from
slice allocations. With 100,000 levels of selectors, parsing takes ~80s
on my machine.
If an attacker can submit these expressions for parsing, they can DoS
the auth server with relatively small payloads (<1MB).
All real-world expressions are <10 AST nodes deep. Add a sanity check of
1000 levels to protect against malicious inputs.
We can optimize the code later on, but doing so would do little for
real-world performance.
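A depth check of this kind can be sketched as follows. It assumes, purely for illustration, that the expression is parsed into a `go/ast` tree. `exprDepth`, `validateExpr` and `maxASTDepth` are hypothetical names, and the sketch measures depth after parsing rather than aborting early.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"strings"
)

// maxASTDepth mirrors the 1000-level sanity limit described above.
const maxASTDepth = 1000

// exprDepth parses src and returns the depth of the deepest AST node.
func exprDepth(src string) (int, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return 0, err
	}
	depth, deepest := 0, 0
	// ast.Inspect calls the callback with nil after visiting a node's children,
	// which lets us track the current nesting level.
	ast.Inspect(expr, func(n ast.Node) bool {
		if n == nil {
			depth--
			return true
		}
		depth++
		if depth > deepest {
			deepest = depth
		}
		return true
	})
	return deepest, nil
}

// validateExpr rejects expressions nested deeper than maxASTDepth.
func validateExpr(src string) error {
	d, err := exprDepth(src)
	if err != nil {
		return err
	}
	if d > maxASTDepth {
		return fmt.Errorf("expression depth %d exceeds the limit of %d", d, maxASTDepth)
	}
	return nil
}

func main() {
	fmt.Println(validateExpr("external.email"))
	fmt.Println(validateExpr("a" + strings.Repeat(".b", 2000)))
}
```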
Fixes #3604
This commit adds support for the cluster_labels
role parameter, which limits access to remote clusters by label.
The new `tctl update rc` command provides an interface to set labels on remote clusters.
Consider two clusters: `one` (root) and `two` (leaf).
```bash
$ tsh clusters
Cluster Name Status
------------ ------
one          online
two          online
```
Create the trusted cluster join token with labels:
```bash
$ tctl tokens add --type=trusted_cluster --labels=env=prod
```
Every cluster joined using this token will inherit the env:prod label.
Alternatively, update remote cluster labels using the `tctl update rc`
command. Letting remote clusters propagate their own labels
would allow rogue clusters to update their labels to bad values.
Instead, the administrator of the root cluster controls the labels
using the remote clusters API without fear of override:
```bash
$ tctl get rc
kind: remote_cluster
metadata:
  name: two
status:
  connection: online
  last_heartbeat: "2020-09-14T03:13:59.35518164Z"
version: v3
```
```bash
$ tctl update rc/two --set-labels=env=prod
cluster two has been updated
```
```bash
$ tctl get rc
kind: remote_cluster
metadata:
  labels:
    env: prod
  name: two
status:
  connection: online
  last_heartbeat: "2020-09-14T03:13:59.35518164Z"
```
Update the role to deny access to prod env:
```yaml
kind: role
metadata:
  name: dev
spec:
  allow:
    logins: [root]
    node_labels:
      '*': '*'
    # Cluster labels control what clusters user can connect to. The wildcard ('*') means
    # any cluster. If no role in the role set is using labels and cluster is not labeled,
    # the cluster labels check is not applied. Otherwise, cluster labels are always enforced.
    # This makes the feature backwards-compatible.
    cluster_labels:
      'env': 'staging'
  deny:
    # cluster labels control what clusters user can connect to. The wildcard ('*') means
    # any cluster. By default none is set in deny rules to preserve backwards compatibility
    cluster_labels:
      'env': 'prod'
```
```bash
$ tctl create -f dev.yaml
```
Cluster `two` is now invisible to users with the `dev` role.
```bash
$ tsh clusters
Cluster Name Status
------------ ------
one          online
```
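The allow/deny evaluation for labeled clusters can be sketched roughly as below. This is a simplified illustration: `Labels`, `matches` and `canAccess` are hypothetical names, and the sketch does not model the backwards-compatibility rule for unlabeled clusters or any special handling of the root cluster.

```go
package main

import "fmt"

// Labels maps label names to values; "*" acts as a wildcard.
type Labels map[string]string

// matches reports whether every selector entry is satisfied by target.
// An empty selector matches nothing in this sketch.
func matches(selector, target Labels) bool {
	if len(selector) == 0 {
		return false
	}
	if selector["*"] == "*" {
		return true
	}
	for key, want := range selector {
		got, ok := target[key]
		if !ok || (want != "*" && want != got) {
			return false
		}
	}
	return true
}

// canAccess checks deny rules before allow rules, so a deny match always wins.
func canAccess(allow, deny, cluster Labels) bool {
	if matches(deny, cluster) {
		return false
	}
	return matches(allow, cluster)
}

func main() {
	allow := Labels{"env": "staging"}
	deny := Labels{"env": "prod"}
	fmt.Println("prod cluster:", canAccess(allow, deny, Labels{"env": "prod"}))
	fmt.Println("staging cluster:", canAccess(allow, deny, Labels{"env": "staging"}))
}
```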
Added support for an identity-aware, RBAC-enforcing, mutually
authenticated web application proxy to Teleport.
* Updated services.Server to support application servers.
* Updated services.WebSession to support application sessions.
* Added CRUD RPCs for "AppServers".
* Added CRUD RPCs for "AppSessions".
* Added RBAC support using labels for applications.
* Added JWT signer as a services.CertAuthority type.
* Added support for signing and verifying JWT tokens.
* Refactored dynamic label and heartbeat code into standalone packages.
* Added application support to web proxies and new "app_service" to
proxy mutually authenticated connections from proxy to an internal
application.
* Implement kubernetes_service registration and startup
The new service now starts, registers (locally or via a join token) and
heartbeats its presence to the auth server.
This service can handle k8s requests (like a proxy), but not requests to
remote teleport clusters. Proxies will be responsible for routing those.
The client (tsh) will not go to this service until proxy routing is
implemented. I manually tweaked the server address in kubeconfig to test it.
You can also run `tctl get kube_service` to list all registered
instances. The self-reported info is currently limited - only listening
address is set.
* Address review feedback
The uploader now retries more slowly on network errors and picks up the
pace again after any upload has succeeded.
Corrupted records will never upload successfully. Previously,
the uploader would create streams for them indefinitely, clogging the
auth server. Now the uploader writes a marker for bad session uploads
and does not attempt to reupload them.
`require` is a sister package to `assert` that terminates the test on
failure. `assert` records the failure but lets the test proceed, which
is unintuitive.
Also update all existing tests to match.
Matchers use a similar syntax to Expressions, but behave differently:
- Expressions get evaluated - they interpolate some values and return a
final string.
- Matchers check whether some string matches a value.
Matchers implement the same logic as utils.SliceMatchesRegex and add 2
new functions:
- {{regexp.match("foo")}} - match input against a raw regex
- {{regexp.not_match("foo")}} - same as match, but inverts the result
No need to handle literal expressions (e.g. without "{{foo.bar}}"
substitutions) at the higher level. Something like "foo" is a valid
expression which always returns "foo" regardless of traits.
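The two regexp matchers can be sketched as below. This is an illustration only: `Matcher`, `regexpMatcher`, `notMatcher` and `newMatcher` are hypothetical names. Go's `MatchString` is unanchored, so the examples anchor the pattern explicitly; whether the actual implementation anchors patterns is not stated here.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher checks whether an input string matches some value.
type Matcher interface {
	Match(in string) bool
}

// regexpMatcher matches the input against a compiled regular expression.
type regexpMatcher struct {
	re *regexp.Regexp
}

func (m regexpMatcher) Match(in string) bool { return m.re.MatchString(in) }

// notMatcher inverts the result of the wrapped Matcher.
type notMatcher struct {
	Matcher
}

func (m notMatcher) Match(in string) bool { return !m.Matcher.Match(in) }

// newMatcher compiles raw and optionally inverts the result,
// mirroring regexp.match and regexp.not_match.
func newMatcher(raw string, invert bool) (Matcher, error) {
	re, err := regexp.Compile(raw)
	if err != nil {
		return nil, err
	}
	var m Matcher = regexpMatcher{re: re}
	if invert {
		m = notMatcher{Matcher: m}
	}
	return m, nil
}

func main() {
	match, _ := newMatcher("^foo$", false)
	notMatch, _ := newMatcher("^foo$", true)
	for _, in := range []string{"foo", "foobar"} {
		fmt.Printf("%q match=%v not_match=%v\n", in, match.Match(in), notMatch.Match(in))
	}
}
```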
This commit introduces a gRPC API for streaming sessions.
It adds structured events and sync streaming
that avoids storing events on disk.
The design is described in the RFD at rfd/0002-streaming.md.
Adds support for Concurrent Session Control and a new
semaphore API. Roles now support two new configuration
options, `max_ssh_connections` and `max_ssh_sessions`
which correspond to the total number of authenticated
ssh connections per cluster, and the number of ssh sessions
within a connection, respectively. Attempting to exceed
these limits generates variants of the `session.rejected`
audit event and causes the connection/session to be
rejected.
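The limit check can be illustrated with a small in-process counter. This is only a local sketch: the semaphore API described above is cluster-wide, and `limiter`, `newLimiter` and `errLimitReached` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errLimitReached is returned when no more slots are available.
var errLimitReached = errors.New("limit reached")

// limiter is a counting semaphore that rejects instead of blocking.
type limiter struct {
	mu   sync.Mutex
	max  int
	held int
}

func newLimiter(max int) *limiter { return &limiter{max: max} }

// Acquire takes a slot or returns errLimitReached when the limit is hit.
func (l *limiter) Acquire() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held >= l.max {
		return errLimitReached
	}
	l.held++
	return nil
}

// Release frees a previously acquired slot.
func (l *limiter) Release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.held > 0 {
		l.held--
	}
}

func main() {
	// A hypothetical role with max_ssh_connections set to 2.
	conns := newLimiter(2)
	for i := 1; i <= 3; i++ {
		fmt.Printf("connection %d: %v\n", i, conns.Acquire())
	}
}
```

Rejecting instead of blocking matches the behavior described above, where exceeding a limit causes the connection or session to be refused.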
* return only trace.NotFound
* suppressed error message when user does not have ~/.tsh directory
* return only trace.NotFound
* handled permission issue and file path not found error
* formatting
* nit
* Make milv happy
* Revert "Make milv happy"
This reverts commit 5a24e9f725.
Co-authored-by: Ben Arent <ben@gravitational.com>
List of fixed items:
```
integration/helpers.go:1279:2 gosimple S1000: should use for range instead of for { select {} }
integration/integration_test.go:144:5 gosimple S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:173:5 gosimple S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:296:28 gosimple S1019: should use make(chan error) instead
integration/integration_test.go:570:41 gosimple S1019: should use make(chan interface{}) instead
integration/integration_test.go:685:40 gosimple S1019: should use make(chan interface{}) instead
integration/integration_test.go:759:33 gosimple S1019: should use make(chan string) instead
lib/auth/init_test.go:62:2 gosimple S1021: should merge variable declaration with assignment on next line
lib/auth/tls_test.go:1658:22 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/backend/dynamo/dynamodbbk.go:420:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/dynamo/dynamodbbk.go:656:12 gosimple S1039: unnecessary use of fmt.Sprintf
lib/backend/etcdbk/etcd.go:458:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/firestore/firestorebk.go:407:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:317:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:336:6 gosimple S1004: should use !bytes.Equal(value, expected.Value) instead
lib/backend/memory/memory.go:365:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/memory/memory.go:376:5 gosimple S1004: should use !bytes.Equal(existingItem.Value, expected.Value) instead
lib/backend/test/suite.go:327:10 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/client/api.go:1410:9 gosimple S1003: should use strings.ContainsRune(name, ':') instead
lib/client/api.go:2355:32 gosimple S1019: should use make([]ForwardedPort, len(spec)) instead
lib/client/keyagent_test.go:85:2 gosimple S1021: should merge variable declaration with assignment on next line
lib/client/player.go:54:33 gosimple S1019: should use make(chan int) instead
lib/config/configuration.go:1024:52 gosimple S1019: should use make(services.CommandLabels) instead
lib/config/configuration.go:1025:44 gosimple S1019: should use make(map[string]string) instead
lib/config/configuration.go:930:21 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleNode) instead
lib/config/configuration.go:931:22 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleAuthService) instead
lib/config/configuration.go:932:23 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleProxy) instead
lib/service/supervisor.go:387:2 gosimple S1001: should use copy() instead of a loop
lib/tlsca/parsegen.go:140:9 gosimple S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:140:9 gosimple S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:167:40 gosimple S1010: should omit second index in slice, s[a:len(s)] is identical to s[a:]
lib/utils/certs.go:204:5 gosimple S1004: should use !bytes.Equal(certificateChain[0].SubjectKeyId, certificateChain[0].AuthorityKeyId) instead
lib/utils/parse/parse.go:116:45 gosimple S1003: should use strings.Contains(variable, "}}") instead
lib/utils/parse/parse.go:116:6 gosimple S1003: should use strings.Contains(variable, "{{") instead
lib/utils/socks/socks.go:192:10 gosimple S1025: should use String() instead of fmt.Sprintf
lib/utils/socks/socks.go:199:10 gosimple S1025: should use String() instead of fmt.Sprintf
lib/web/apiserver.go:1054:18 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/web/apiserver.go:1954:9 gosimple S1039: unnecessary use of fmt.Sprintf
tool/tsh/tsh.go:1193:14 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
```
This code is not caught by linters because it's exported and they assume
there are some external users.
Since teleport is relatively self-contained, we can tell for sure
whether something is called or not.
Fixed findings:
```
lib/sshutils/server_test.go:163:2: SA4006: this value of `clt` is never used (staticcheck)
clt, err := ssh.Dial("tcp", srv.Addr(), &cc)
^
lib/sshutils/server_test.go:91:3: SA5001: should check returned error before deferring ch.Close() (staticcheck)
defer ch.Close()
^
lib/shell/shell_test.go:33:2: SA4006: this value of `shell` is never used (staticcheck)
shell, err = GetLoginShell("non-existent-user")
^
lib/cgroup/cgroup_test.go:111:2: SA9003: empty branch (staticcheck)
if err != nil {
^
lib/cgroup/cgroup_test.go:119:2: SA5001: should check returned error before deferring service.Close() (staticcheck)
defer service.Close()
^
lib/client/keystore_test.go:138:2: SA4006: this value of `keyCopy` is never used (staticcheck)
keyCopy, err = s.store.GetKey("host.a", "bob")
^
lib/client/api.go:1604:3: SA4004: the surrounding loop is unconditionally terminated (staticcheck)
return makeProxyClient(sshClient, m), nil
^
lib/backend/test/suite.go:156:2: SA4006: this value of `err` is never used (staticcheck)
result, err = s.B.GetRange(ctx, prefix("/prefix/c/c1"), backend.RangeEnd(prefix("/prefix/c/cz")), backend.NoLimit)
^
lib/utils/timeout_test.go:84:2: SA1019: t.Dial is deprecated: Use DialContext instead, which allows the transport to cancel dials as soon as they are no longer needed. If both are set, DialContext takes priority. (staticcheck)
t.Dial = func(network string, addr string) (net.Conn, error) {
^
lib/utils/websocketwriter.go:83:3: SA4006: this value of `err` is never used (staticcheck)
utf8, err = w.encoder.String(string(data))
^
lib/utils/loadbalancer_test.go:134:2: SA4006: this value of `out` is never used (staticcheck)
out, err = Roundtrip(frontend.String())
^
lib/utils/loadbalancer_test.go:209:2: SA4006: this value of `out` is never used (staticcheck)
out, err = RoundtripWithConn(conn)
^
lib/srv/forward/sshserver.go:582:3: SA4004: the surrounding loop is unconditionally terminated (staticcheck)
return
^
lib/service/service.go:347:4: SA4006: this value of `err` is never used (staticcheck)
i, err = auth.GenerateIdentity(process.localAuth, id, principals, dnsNames)
^
lib/service/signals.go:60:3: SA1016: syscall.SIGKILL cannot be trapped (did you mean syscall.SIGTERM?) (staticcheck)
syscall.SIGKILL, // fast shutdown
^
lib/config/configuration_test.go:184:2: SA4006: this value of `conf` is never used (staticcheck)
conf, err = ReadFromFile(s.configFileBadContent)
^
lib/config/configuration.go:129:2: SA5001: should check returned error before deferring reader.Close() (staticcheck)
defer reader.Close()
^
lib/kube/kubeconfig/kubeconfig_test.go:227:2: SA4006: this value of `err` is never used (staticcheck)
tlsCert, err := ca.GenerateCertificate(tlsca.CertificateRequest{
^
lib/srv/sess.go:720:3: SA4006: this value of `err` is never used (staticcheck)
result, err := s.term.Wait()
^
lib/multiplexer/multiplexer_test.go:169:11: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
_, err = fmt.Fprintf(conn, proxyLine.String())
^
lib/multiplexer/multiplexer_test.go:221:11: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
_, err = fmt.Fprintf(conn, proxyLine.String())
^
```
All changes should be no-ops, except for
`integration/integration_test.go`.
The integration test was ignoring `recordingMode` test case parameter
and always used `RecordAtNode`. When switching to `recordingMode`, test
cases with `RecordAtProxy` fail with a confusing error about missing
user agent. Filed https://github.com/gravitational/teleport/issues/3606
to track that separately and unblock enabling `structcheck` linter.
`SyncBuffer` has a goroutine running `io.Copy` to read from the
underlying pipe. Close stops the pipe, but doesn't wait for the last
chunk of data to be written by `io.Copy` to the buffer.
Both `Bytes` and `String` assume that the buffer received no further
writes after `Close`.
Add explicit synchronization between `io.Copy` goroutine and `Close`.
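A minimal sketch of the synchronization, assuming the buffer is fed through an `io.Pipe`. The field names and the `done` channel are illustrative and may not match the actual `SyncBuffer`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// SyncBuffer copies everything written to it into an in-memory buffer
// from a background goroutine.
type SyncBuffer struct {
	reader *io.PipeReader
	writer *io.PipeWriter
	buf    *bytes.Buffer
	done   chan struct{} // closed when the io.Copy goroutine has finished
}

func NewSyncBuffer() *SyncBuffer {
	r, w := io.Pipe()
	b := &SyncBuffer{reader: r, writer: w, buf: &bytes.Buffer{}, done: make(chan struct{})}
	go func() {
		defer close(b.done)
		_, _ = io.Copy(b.buf, r)
	}()
	return b
}

func (b *SyncBuffer) Write(p []byte) (int, error) { return b.writer.Write(p) }

// Close stops the pipe and waits for io.Copy to flush the last chunk,
// so String never observes a partially written buffer.
func (b *SyncBuffer) Close() error {
	err := b.writer.Close()
	<-b.done
	return err
}

// String must only be called after Close.
func (b *SyncBuffer) String() string { return b.buf.String() }

func main() {
	b := NewSyncBuffer()
	fmt.Fprint(b, "hello")
	_ = b.Close()
	fmt.Println(b.String())
}
```

Without the wait on `done`, a pipe write can return once the reader has taken the data but before `io.Copy` has appended it to the buffer.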
Spring cleaning!
A very mechanical cleanup using several linters (unused, deadcode,
structcheck). Build and tests still pass so no behavior should be
affected.
This commit fixes #3369, refs #3374
It adds support for the kubernetes_users section in roles,
allowing the Teleport proxy to impersonate user identities.
It also extends the variable interpolation syntax by adding
a suffix and prefix to variables and the `email.local` function.
Example:
```yaml
kind: role
version: v3
metadata:
  name: admin
spec:
  allow:
    # extract email local part from the email claim
    logins: ['{{email.local(external.email)}}']
    # impersonate a kubernetes user with IAM prefix
    kubernetes_users: ['IAM#{{external.email}}']
  # the deny section uses the identical format as the 'allow' section.
  # the deny rules always override allow rules.
  deny: {}
```
Some notes on email.local behavior:
* This is the only function supported in the template variables for now
* If email.local encounters an invalid email address,
it will interpolate to an empty value, which will be removed from the
resulting output.
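The behavior described in these notes can be sketched with the standard `net/mail` package. `emailLocal` and `interpolate` are hypothetical helper names; whether the actual implementation parses addresses this way is an assumption.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// emailLocal returns the local part of an email address,
// or an empty string when the address is invalid.
func emailLocal(email string) string {
	addr, err := mail.ParseAddress(email)
	if err != nil {
		return ""
	}
	i := strings.LastIndex(addr.Address, "@")
	if i < 0 {
		return ""
	}
	return addr.Address[:i]
}

// interpolate wraps each value with prefix and suffix,
// dropping values that interpolated to an empty string.
func interpolate(prefix, suffix string, values []string) []string {
	out := make([]string, 0, len(values))
	for _, v := range values {
		if v == "" {
			continue
		}
		out = append(out, prefix+v+suffix)
	}
	return out
}

func main() {
	emails := []string{"alice@example.com", "not-an-email"}
	var locals []string
	for _, e := range emails {
		locals = append(locals, emailLocal(e))
	}
	fmt.Println(interpolate("", "", locals))
	fmt.Println(interpolate("IAM#", "", emails[:1]))
}
```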
Changes in impersonation behavior:
* By default, if no `kubernetes_users` is set, which is the majority of cases,
the user will impersonate themselves, which is the backwards-compatible behavior.
* As long as at least one `kubernetes_users` is set, the forwarder will start
limiting the list of users the client is allowed to impersonate.
* If the user's role set does not include the actual user name, it will be rejected
(otherwise there would be no way to exclude the user from the list).
* If the `kubernetes_users` role set includes only one user
(quite frequently that's the real intent), teleport will default to it,
otherwise it will refuse to select.
This will enable the use case when `kubernetes_users` has just one field to
link the user identity with the IAM role, for example `IAM#{{external.email}}`.
* Previous versions of the forwarding proxy were denying all external
impersonation headers; this commit allows 'Impersonate-User' and
'Impersonate-Group' header values that are allowed by the role set.
* Previous versions of the forwarding proxy ignored the 'Deny' section of the roles
when applied to impersonation; this commit fixes that. Roles with deny
kubernetes_users and kubernetes_groups sections will not allow
impersonation of those users and groups.
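The user selection rules above can be sketched as a single function. This is an interpretation, not the forwarder's code: `pickKubeUser` is a hypothetical name, deny rules are omitted, and treating an empty `kubernetes_users` list as "only the user's own name is allowed" is an assumption.

```go
package main

import "fmt"

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// pickKubeUser chooses the Kubernetes user to impersonate.
// self is the Teleport user name, allowed is the interpolated
// kubernetes_users list, and requested is the Impersonate-User
// header value ("" when the client did not send one).
func pickKubeUser(self string, allowed []string, requested string) (string, error) {
	// Assumption: with no kubernetes_users, only the user's own name is permitted.
	if len(allowed) == 0 {
		allowed = []string{self}
	}
	if requested != "" {
		if !contains(allowed, requested) {
			return "", fmt.Errorf("impersonation of %q is not allowed", requested)
		}
		return requested, nil
	}
	if len(allowed) == 1 {
		return allowed[0], nil
	}
	return "", fmt.Errorf("%d kubernetes_users are allowed and none was requested", len(allowed))
}

func main() {
	fmt.Println(pickKubeUser("alice", nil, ""))
	fmt.Println(pickKubeUser("alice", []string{"IAM#alice@example.com"}, ""))
	fmt.Println(pickKubeUser("alice", []string{"dev", "ops"}, ""))
	fmt.Println(pickKubeUser("alice", []string{"dev", "ops"}, "ops"))
}
```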
Added package cgroup to orchestrate cgroups. Only cgroup2 is supported,
because cgroup2 cgroups have unique IDs that can be correlated with
BPF events.
Added bpf package that contains three BPF programs: execsnoop,
opensnoop, and tcpconnect. The bpf package starts and stops these
programs as well as correlating their output with Teleport sessions
and emitting them to the audit log.
Added support for Teleport to re-exec itself before launching a shell.
This allows Teleport to start a child process, capture its PID, place
the PID in a cgroup, and then let the process continue. Once the process
has continued, it can be tracked by its cgroup ID.
Reduced the total number of connections to a host so Teleport does not
quickly exhaust all file descriptors. Exhaustion happens very quickly
when disk events, which occur at a very high rate, are emitted to the
audit log.
Added tarballs for exec sessions. Updated session.start and session.end
events with additional metadata. Updated the format of session tarballs
to include enhanced events.
Added file configuration for enhanced session recording. Added code to
start up enhanced session recording and pass package to SSH nodes.
When the remote address is an IP address, we cannot directly convert
byte array to string. Otherwise, we will see hex strings like
"\xac\x11\x01\xc8:443".
See more details about the bug in:
https://community.gravitational.com/t/ssh-dynamic-port-forwarding-not-working/432
The above reported issue is resolved after applying the patch.
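The difference between the two conversions can be shown with the standard `net` package. `formatAddr` and its `isIP` flag are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// formatAddr renders a destination given as raw bytes plus a port.
// When isIP is true the bytes are a 4- or 16-byte IP address and must be
// formatted with net.IP.String; otherwise they hold a textual host name.
func formatAddr(host []byte, isIP bool, port int) string {
	h := string(host)
	if isIP {
		h = net.IP(host).String()
	}
	return net.JoinHostPort(h, strconv.Itoa(port))
}

func main() {
	raw := []byte{0xac, 0x11, 0x01, 0xc8}
	// Direct conversion yields the unreadable form from the bug report.
	fmt.Printf("%q\n", string(raw)+":443")
	// Formatting the bytes as an IP yields a usable address.
	fmt.Println(formatAddr(raw, true, 443))
}
```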
Drop TLS_RSA_WITH_AES_128_GCM_SHA256 and TLS_RSA_WITH_AES_256_GCM_SHA384
from the default ciphersuites because they are banned by HTTP/2, which
breaks gRPC clients. For more information see:
https://tools.ietf.org/html/rfc7540#appendix-A.
This has a nice side effect of Teleport now only supporting ciphersuites
that support PFS.
Note that both ciphersuites can still be added back in file
configuration.
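As an illustration of the filtering, the sketch below removes static-RSA suites from a list by name using `crypto/tls`. `dropStaticRSA` and `exampleSuites` are hypothetical, and the list shown is not Teleport's actual default set.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// exampleSuites is an illustrative list, not the real default set.
var exampleSuites = []uint16{
	tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
	tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
	tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
}

// dropStaticRSA removes suites that use static RSA key exchange
// (names beginning with "TLS_RSA_"), which provide no forward secrecy.
func dropStaticRSA(ids []uint16) []uint16 {
	out := make([]uint16, 0, len(ids))
	for _, id := range ids {
		if strings.HasPrefix(tls.CipherSuiteName(id), "TLS_RSA_") {
			continue
		}
		out = append(out, id)
	}
	return out
}

func main() {
	for _, id := range dropStaticRSA(exampleSuites) {
		fmt.Println(tls.CipherSuiteName(id))
	}
}
```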
Validate all incoming events (and archives) to ensure that the server ID
within the event matches the x509 identity of the connected host. This
check makes sure nodes can only submit events for themselves.
In addition, make sure session recordings to disk or S3 cannot be
overwritten.