The HTTP request context is canceled when the client disconnects. Using
this context in the session recorder prevents the recorder from uploading the
session once it has finished.
Use the server context instead to prevent lost recordings.
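A minimal sketch of the idea, using illustrative names rather than the actual Teleport types: derive the recorder's context from the server's lifetime, not from the request.

```go
package forwarder

import (
	"context"
	"net/http"
)

// Server owns a long-lived context that is only canceled on shutdown.
// The names here are illustrative, not the actual Teleport types.
type Server struct {
	closeCtx context.Context
}

func (s *Server) handleSession(w http.ResponseWriter, r *http.Request) {
	// Wrong: r.Context() is canceled as soon as the client disconnects,
	// which would also abort the final session upload.
	//
	// Right: use the server's close context so the recorder can finish
	// uploading even after the HTTP request is gone.
	recorderCtx := s.closeCtx
	_ = recorderCtx // pass recorderCtx, not r.Context(), to the session recorder
}
```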
* Add logger attributes so the logger can be propagated from tests and used to identify them
* Add test case for Server's DeepCopy.
* Update test to use the testing package directly. Update dependency after upstream PR.
* kube: emit audit events using process context
Using the request context can prevent audit events from being emitted
if the client disconnects and the request context is closed.
We shouldn't lose audit events like that.
Also, log all response errors from exec handler.
* kube: cleanup forwarder code
Rename a few config fields to be more descriptive.
Avoid embedding unless necessary, to keep the package API clean.
* kube: cache only user certificates, not the entire session
The expensive part that we need to cache is the client certificate.
Making a new one requires a round-trip to the auth server, plus entropy
for crypto operations.
The rest of clusterSession contains request-specific state, and only
adds problems if cached.
For example: clusterSession stores a reference to a remote teleport
cluster (if needed); caching requires extra logic to invalidate the
session when that cluster disappears (or tunnels drop out). Same problem
happens with kubernetes_service tunnels.
Instead, the forwarder now picks a new target for each request from the
same user, providing a kind of "load-balancing".
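A rough sketch of the caching idea, with illustrative names rather than the actual forwarder types: keep only the per-user client certificate, with a TTL, and resolve the target cluster or tunnel fresh on each request.

```go
package forwarder

import (
	"crypto/tls"
	"sync"
	"time"
)

// certCacheEntry holds the expensive part: a client certificate minted via
// a round-trip to the auth server.
type certCacheEntry struct {
	cert    tls.Certificate
	expires time.Time
}

type certCache struct {
	mu    sync.Mutex
	certs map[string]certCacheEntry // keyed by Teleport username (illustrative)
}

func (c *certCache) get(user string) (tls.Certificate, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.certs[user]
	if !ok || time.Now().After(e.expires) {
		return tls.Certificate{}, false // miss: mint a new cert via the auth server
	}
	return e.cert, true
}
```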
* Init session uploader in kubernetes service
It's started in all other services that upload sessions (app/proxy/ssh),
but was missing here. Because of this, the session storage directory for
async uploads wasn't created on disk and caused interactive sessions to
fail.
* Update logrus package to fix data races
* Introduce a logger that uses the test context to log messages so they are output when a test fails, for easier troubleshooting.
* Revert introduction of test logger - simply leave logger configuration at debug level outputting to stderr during tests.
* Run integration test for e as well
* Use make with a cap and append to only copy the relevant roles.
* Address review comments
* Update integration test suite to use a test-local logger that only outputs logs if a specific test has failed - no logs from other test cases will be output.
* Revert changes to InitLoggerForTests API
* Create a new logger instance when applying defaults or merging with file service configuration
* Introduce a local logger interface to be able to test file configuration merge.
* Fix kube integration tests w.r.t. logging
* Move goroutine profile dump into a separate func to handle parameters consistently for all invocations
Without this, deleted kube_services linger in the backend and show up as
obsolete kubernetes clusters in tsh.
Ideally, this TTL logic should be enforced centrally, but I'd like to
fix the bug first, and do a larger refactoring later.
* benchmark package
* use default config if path is not specified
* progressiveBench as a config method
* implement a main.go approach to run progressive tests
* make teleport client, run specified benchmark
* function and method descriptions
* make teleport client
* testing
* change interface method signatures
* dry up bench.go code, move producer goroutines to own function
* output formatting
* remove yaml
* fix linter errors
* remove print
* PR suggested changes, moved export latency profile functionality to the benchmark package
* PR fixes
* method description
* update testing
* linter
* docs and example
* PR suggestion changes
* fix coord omission bug
* remove benchmark struct
* remove threads, using open system
* recover in run
* close channel, check if open with each execution
* update testing, pr suggestions
* add more instructions to readme
* update example.go
* pass back context
* use SyncBuffer
* export response and service histograms
* update readme, exporting profiles section
* return from execute()
* export singular latency profile
* export response profile
* Revert "export response profile"
This reverts commit 5a21cb034c.
* export response profile
* update branch
* format example.go
* remove threads
* update example.go
* update branch
* goimports
* add signal handler & update docs
* PR suggestions
* exit out of interactive session
* revert execute
* PR suggestion
* run command on non-interactive instead of nil
Streaming requests, like `kubectl logs -f` will slowly write response
data over time. The `http.ResponseWriter` wrapper we added for capturing
the response code didn't propagate the `http.Flusher` interface and
prevented the forwarder library from periodically flushing response
contents.
This caused `kubectl logs -f` results to be delayed, delivered in
batches as some internal buffer filled up.
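A minimal sketch of the fix, assuming a status-capturing wrapper along the lines described (names are illustrative): the wrapper also has to forward Flush to the underlying writer.

```go
package forwarder

import "net/http"

// responseStatusRecorder captures the status code for auditing.
type responseStatusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *responseStatusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Flush propagates http.Flusher so streaming responses (e.g. kubectl logs -f)
// are delivered as they are written instead of sitting in a buffer.
func (r *responseStatusRecorder) Flush() {
	if f, ok := r.ResponseWriter.(http.Flusher); ok {
		f.Flush()
	}
}
```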
This sets a useful server IP when no advertise_ip is set. Previously,
the address was taken from the listener, and is usually "0.0.0.0:3022"
or "[::]:3022".
Also, add some test cases in utils for IPv6 handling.
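An illustrative sketch of the substitution (not the exact utils code): when the listener host is a wildcard, advertise a routable interface address instead.

```go
package utils

import "net"

// guessAdvertiseAddr is a sketch: replace a wildcard listener host
// ("0.0.0.0" or "::") with the first non-loopback IPv4 interface address.
func guessAdvertiseAddr(listenAddr string) (string, error) {
	host, port, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if ip == nil || !ip.IsUnspecified() {
		return listenAddr, nil // already a concrete address
	}
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return "", err
	}
	for _, a := range addrs {
		if ipn, ok := a.(*net.IPNet); ok && !ipn.IP.IsLoopback() && ipn.IP.To4() != nil {
			return net.JoinHostPort(ipn.IP.String(), port), nil
		}
	}
	return listenAddr, nil
}
```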
- 'tsh kube login' fetches the latest list of kube clusters instead of
only using existing kubeconfig contexts.
This makes 'tsh kube login' succeed when a kube cluster was added
after last 'tsh login'.
- 'tsh kube ls' no longer wrongly marks selected clusters, if they
weren't generated by tsh.
- 'tctl rm' now works with kube_service objects.
- 'tsh login' now updates kubeconfig entries when a login session is
already active
- 'teleport.yaml' now uses 'labels' and 'commands' for RBAC labels on
kubernetes_service; this is consistent with ssh and app services.
Updated default admin role to support reading services.KindProxy. This
is needed by "tctl" when using credentials from ~/.tsh to generate the
join message.
Added fields:
- kube users/groups
- pod name/namespace
- container name/image
- node name
Container image and node name need to be fetched from the k8s API; they
are not known from just the client request. This fetch is optional, and
if it fails (like due to permission errors), those fields will be
missing.
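An illustrative best-effort lookup using client-go (not the exact Teleport code); any error simply leaves the optional fields empty rather than failing the event.

```go
package events

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// podDetails sketches the optional fetch: resolve container image and node
// name from the pod object, returning empty values on any error so the audit
// event is still emitted.
func podDetails(ctx context.Context, client kubernetes.Interface, namespace, pod string) (image, node string) {
	p, err := client.CoreV1().Pods(namespace).Get(ctx, pod, metav1.GetOptions{})
	if err != nil {
		return "", "" // e.g. RBAC forbids reading pods: fields stay empty
	}
	if len(p.Spec.Containers) > 0 {
		image = p.Spec.Containers[0].Image
	}
	return image, p.Spec.NodeName
}
```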
Since kubernetes_service can talk to k8s API and proxy_service can't,
all session events are now emitted by kubernetes_service and skipped by
the proxy (used to be the other way around).
The `KubernetesClusters` field in `ServerSpecV2` used to be a
`[]string`:
https://github.com/gravitational/teleport/pull/4354/files#diff-50ec8b71306e75db3cb193b581cdd51139b03f90e23e7804cbef7edf712bbfac
Later, it was changed to `[]*services.KubernetesCluster`, which is
incompatible when parsing.
Unfortunately, the string version slipped into 4.4. When upgrading to
5.0, teleport fails to parse the old server object at startup and
crashes.
Rename the JSON tag from `kubernetes_clusters` to `kube_clusters` to
distinguish the different versions of this field when parsing. The old
`kubernetes_clusters` will just be ignored.
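A simplified sketch of the shape involved (not the full ServerSpecV2 definition): the new field lives under a different JSON tag, so old 4.4 data under `kubernetes_clusters` is skipped instead of breaking unmarshaling.

```go
package services

// Simplified for illustration only.
type KubernetesCluster struct {
	Name string `json:"name"`
}

type ServerSpecV2 struct {
	// In 4.4 this was effectively:
	//   KubernetesClusters []string `json:"kubernetes_clusters,omitempty"`
	// The new slice of objects uses a different tag, so the old string
	// slice found in stored data is ignored when parsing.
	KubeClusters []*KubernetesCluster `json:"kube_clusters,omitempty"`
}
```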
Our current parsing code's runtime grows exponentially with nested
selectors (e.g. '{{a.b.c.d.e.f}}'), mostly due to memory churn from
slice allocations. With 100,000 levels of selectors, parsing takes ~80s
on my machine.
If an attacker can submit these expressions for parsing, they can DoS
the auth server with relatively small payloads (<1MB).
All real-world expressions are <10 AST nodes deep. Add a sanity check of
1000 levels to protect against malicious inputs.
We can optimize the code later on, but it's not very useful for real-world
performance.
This commit fixes #4695.
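A sketch of the guard, assuming the selectors parse into a chain of selector expressions (the actual parser may differ):

```go
package parse

import (
	"fmt"
	"go/ast"
)

const maxASTDepth = 1000

// checkDepth sketches the sanity limit: walk the selector chain
// (a.b.c.d...) and reject expressions nested deeper than maxASTDepth.
func checkDepth(expr ast.Expr, depth int) error {
	if depth > maxASTDepth {
		return fmt.Errorf("expression exceeds the maximum allowed depth of %d", maxASTDepth)
	}
	if sel, ok := expr.(*ast.SelectorExpr); ok {
		return checkDepth(sel.X, depth+1)
	}
	return nil
}
```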
Teleport in async recording mode sends all events to disk,
and uploads them to the server later.
It uploads some events synchronously to the audit log so
they show up in the global event log right away.
However, if the auth server is slow, the fanout blocks the session.
This commit makes the fanout of these events fast, non-blocking,
and never failing, so sessions will not hang unless the disk
writes hang.
It adds a backoff period and timeout after which some
events may be lost, but the session will continue without blocking.
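Roughly, the fast-path fanout now looks like this sketch (the types, channel, and timeout are illustrative, not the actual events API):

```go
package events

import (
	"log"
	"time"
)

// AuditEvent stands in for the real event type.
type AuditEvent struct {
	Type string
}

// emitNonBlocking hands the event to the audit fanout, but gives up after a
// short timeout so a slow auth server cannot stall the session. The event is
// only dropped from the fast path; the async disk recording still has it.
func emitNonBlocking(fanout chan<- AuditEvent, e AuditEvent, timeout time.Duration) {
	select {
	case fanout <- e:
	case <-time.After(timeout):
		log.Printf("dropped audit event %q after %v", e.Type, timeout)
	}
}
```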
Improves the reliability and correctness of the cache via
various small improvements, including preventing reads of
partially initialized/reset state and delaying watcher
init events until unhealthy states recover.
Fixes an issue where reads could result in missing or
inconsistent results.
When the user does not have a session and tries to access a proxied
application at its FQDN, Teleport does best-effort resolution.
This fix changes what happens when the user has a session but the
session is expired. Previously, the user was redirected to the login
page; now the behavior matches the no-session case and does
best-effort resolution.
A proxy running in pre-5.0 mode (e.g. with local kubeconfig) should
register an entry in `tsh kube clusters`.
After upgrading to 5.0, without migration to kubernetes_service, all the
new `tsh kube` commands will work as expected.
Added a validation check that ensures application names are valid DNS
subdomains. This is because an application name can potentially be used
in the DNS name of the application if either a public address is not
provided or the application is accessed via a trusted cluster.
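An illustrative check (not the exact validator Teleport uses): RFC 1123 subdomains are dot-separated labels of lowercase alphanumerics and hyphens, each starting and ending with an alphanumeric.

```go
package app

import (
	"regexp"
	"strings"
)

var dnsLabel = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidSubdomain sketches the validation: each label is at most 63
// characters and the whole name fits in 253 characters.
func isValidSubdomain(name string) bool {
	if name == "" || len(name) > 253 {
		return false
	}
	for _, label := range strings.Split(name, ".") {
		if len(label) > 63 || !dnsLabel.MatchString(label) {
			return false
		}
	}
	return true
}
```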
* Add labels to KubernetesCluster resources
Plumb labels from config to the registered object and keep dynamic labels updated.
* Check kubernetes RBAC
Checks are in some CRUD operations on the auth server and in the
kubernetes forwarder (both proxy or kubernetes_service).
The logic is essentially copy-paste of the TAA version.
1. `tsh kube clusters` - lists registered kubernetes clusters
note: this only includes clusters connected via `kubernetes_service`
2. `tsh kube credentials` - returns TLS credentials for a specific kube
cluster; this is a hidden command used as an exec plugin for kubectl
3. `tsh kube login` - switches the kubectl context to one of the
registered clusters; roughly equivalent to `kubectl config
use-context`
When updating kubeconfigs, tsh now uses the exec plugin mode:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
This means that on each kubectl run, kubectl will execute tsh with
special arguments to get the TLS credentials.
Using tsh as exec plugin allows us to put a login prompt when certs
expire. It also lets us lazy-initialize TLS certs for kubernetes
clusters.
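For reference, an exec-plugin user entry built with client-go's clientcmd API looks roughly like the sketch below; the exact tsh arguments shown are illustrative, not necessarily the ones tsh writes.

```go
package kubeconfig

import (
	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

// execAuthInfo sketches the kind of user entry tsh writes: kubectl runs the
// configured command on every invocation to obtain TLS credentials.
func execAuthInfo(tshPath, proxy, kubeCluster string) *clientcmdapi.AuthInfo {
	return &clientcmdapi.AuthInfo{
		Exec: &clientcmdapi.ExecConfig{
			APIVersion: "client.authentication.k8s.io/v1beta1",
			Command:    tshPath,
			Args: []string{
				"kube", "credentials",
				"--proxy=" + proxy, // flag names here are illustrative
				"--kube-cluster=" + kubeCluster,
			},
		},
	}
}
```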