* shrink/remove keystore interface
This commit introduces the keystore.Manager type to handle all
interaction between CA and the keystore backends.
Why:
* reduces the code that needs to be implemented per keystore backend to
only the necessary operations
* separate concerns of managing key material and handling CA data
structures
* define interfaces where they're used, not implemented
* delete net 245 lines of code
* reduce keystore.KeyStore stutter
The appaccess TestTCP* tests are highly reliant on time. This has been
reduced (but not eliminated) by using a fakeClock and a channel for
signaling monitor triggered connection closures.
* Fix flaky basic auth dialer test
* Block on sending error to waiters in ProxyAuthorizer, to avoid racing
against the waiter.
* Change the test to start the node in a separate goroutine and
manipulate the auth creds while it is still attempting to connect.
This will allow us to significantly speed up the test since we can
verify that the proxy authorizer is rejecting bad credentials but then
change the credentials to be valid afterwards, allowing the node to
succeed in registering and avoid long wait for it to fail to register.
* Use buffered chan
* Move the stopall defer up in case the test fails earlier
* Remove extra zero count check
Application access connection monitoring has been introduced so that, when a
lock is created, application access connections will be interrupted until the
lock has been cleared. This includes web sockets and TCP applications.
* Ensure invalid tunnel agent connections get closed
Connections from reverse tunnel agents were being marked
as invalid by the proxy under certain conditions but would
ultimately never be closed. This could lead to scenarios where
the agent thought things were fine but the proxy considered
that agent unhealthy and unroutable.
Pruning of invalid connections used to occur when a proxy
tried to retrieve a connection for that tunnel. This also
further muddied the point in time at which the proxy could
close a connection as it never explicitly stopped tracking
the connection and closed it at the same time.
To remedy this, connections are explicitly closed by the proxy
and removed from the mapping to stop tracking immediately. In order
to prevent a connection that is servicing an active connection
from being closed the proxy now tracks which connections have
sessions. Closing does not occur when there are any active
sessions to prevent them from being force terminated.
When the proxy receives a heartbeat from an agent it now restores
the connection to a valid state. In the event that too many heart
beats have been missed for an agent, the proxy will now terminate
the connection, again only if it is not serving any sessions.
Fixes#15911
* Update proxy handler and authorizer mocks
* Use a condition variable to properly sync checks for the last error received
* Fix test to check for all 3 nodes registering correctly
* Update naming of LastError to be more descriptive
* Refactor authorizer to not use a condition variable
* Remove 3rd node to speed up test
* Make test more robust
* Remove the reset count func
* Close client conns before new requests to be sure
* Connection -> Request terminology
* Change test to not mutate the environment variable, but instead manipulate the auth proxy credentials
* this way we can be sure that the test will work correctly when the credentials match. If we mutate env, we don't know whether the callers are still holding a dialer using the old env variable
* Remove extra node
* fix lint
* Fix req waiting
* Change wording in debug message
* fix comment
* Update integration/helpers/proxy.go
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
* Fix defer func
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
* Add X-Forwarded-SSL and X-Forwarded-Port to appaccess.
Application Access now adds in X-Forwarded-Ssl and X-Forwarded-Port headers.
Tests have been added and adjusted to look for these new headers as well.
* Update lib/srv/app/header_rewriter.go
Co-authored-by: Ryan Clark <ryan.clark@goteleport.com>
* Update integration/appaccess/fixtures.go
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
* Remove common.XForwardedPort
* Change order of websocket delegates.
* Make ReservedHeaders more future-proofed.
* Add local proxy middleware for db cert checking
* Use tls conversion util instead of inline
* Add middleware to local proxy config
* Add middleware configuration in tsh
* Use route to database check and set defaults func
* Dont trigger normal db login flow if using local proxy tunnel
* Split out adding client creds into helper func for testing
* Add integration test for local proxy tunnel db cert middleware
* Add unit test for local proxy middleware
* Update comment
* Make middleware on new conn block
* godoc
* Make any cert check error trigger cert renewal in local proxy middleware
* Move dbcertchecker into lib/client
* Remove unneeded mutex in local proxy and unused func in lib/utils
* Make local proxy middleware integration test more robust
* Print message before mfa prompt in proxy tunnel
* Add before prompt option to test
* Remove unneeded comment
* Change local proxy messages to be more clear
* Pass local proxy opts by reference
* Pass certs in opts instead of cert/key file path
This is so we can check if the error is recoverable while preparing
local proxy options. A tunneled local proxy can ignore the error because
it does not rely on cert files - it can just renew its certs if
necessary.
* Move db route checking back to tsh
* Fix lint err
* Fix typo and print the hint to same writer as the mfa prompt
Application sessions were previously only logged when launching an application
session via the UI, and not from the `tsh app login` command. This has been
corrected. The AppName and AppURI are now passed in as part of the gRPC
request to the auth server, which is then used to emit the audit event.
* Introduce proto types for ProvisionTokenV3
* Add methods to ProvisionTokenV3 to support ProvisionToken iface
* Start building v3 support into the client
* add support for mashalling and unmarshalling ProvisionTokenV3
* Start unit testing ProvisionTokenV3
* Remove oneof to support yaml marshal/unmarshal
* Client should try V3 methods and fallback to v2
* More tests
* Fix join tests
* Fix integration tests
* Switch integration tests to use v3 spec
* Switch iam tests to use ProvisionTokenV3
* Change ec2 join tests to use V3 tokens
* Fix events tests for V3 token
* support ProvisionTokenV3 within API client events handler
* Explicitly specify JoinMethod
* Tidy up final usage of NewProvisionTokenV2FromSpec in tests
* Improve proto docs on ProvisionTokenV3
* Fix bot join tests
* Clarify error message for invalid join method
* Adjust resource version comment
* Fix comments and return error rather than bool in V2() method
* Catch incompatible conversions case
* Include V2 ProvisionToken in tests and add appropriate DELETE IN notes
* Fix linter warnings/unit test failures
* Use nolint rather than lint:ignore
* Add more DELETE IN notes
* Run goimports on join_ec2_test.go
* Address PR comments from tim.
* Add more deprecation/delete in notices
* Improve godoc comments on checkAndSetDEfaults for provider config
* Simplify implementation by dropping client-ahead compatability
* Add some support for client-ahead but with conversion to v3
* Update code comments to include responsible party
* Rename `Role` to `RoleARN` in EC2 configuration for clarity
* Fix tests for Role -> RoleARN rename
* Move MustCreateProvisionToken out of API and into test packages
* Properly go imports files
* Reduce number of auth dials for tsh commands
One of the major areas of latency for `tsh ssh` is creating multiple
auth clients. Since the auth client is lazy and only actually performs
the dial on first use we can create an auth client once and simply
reuse it. This is done by adding an `auth.ClientI` to `ProxyClient`
which is created via `connectToProxy`. All attempts to connect to
the current auth server via the `ProxyClient` will be given the
cached `auth.ClientI`.
The new method of retrieving the current auth client also allowed
to remove a number of calls to `GetSites` which were used to obtain
the current cluster name. The local profile already contains the name
of the cluster and calls to `GetSites` were unnecessary. All instances
which relied on the site name now retrieve from information that the
`ProxyClient` already has.
In an effort to reduce ambiguity and confusion `CurrentClusterAccessPoint`
and `ClusterAccessPoint` were also removed. AccessPoint denotes that
you are connecting to a cache, but the `ProxyClient` is always going
to be hitting the auth server directly. The two have been replaced
with `CurrentCluster` and `ConnectToCluster`, which they were merely
wrappers for anyhow.
Control master functionality is currently broken in proxy recording
mode. We're aware of the issue and will disable the test until we
are able to fix the underlying issue.
Updates #16224
* Add Yubikey PrivateKey implementation for use by Teleport clients.
- Add yubikey login logic, reusing previously stored private keys.
- Fix identity file decoding with PIV keys, which sign ecdsa certificates.
- Add libpcsclite-dev pre-req for building on linux.
- Remove unnecessary keys.Signer interface and move its functionality to keys.PrivateKey.
- Move retry and jitter utils to new api/utils/retryutils package.
Update `duo-labs/webauthn` up to `20220122034320`, which is the latest version
we can get without dipping into dependency hell (`etcd` and `opentelemetry` woes
ensue after [2365c59d9f][1]).
`tstranex` could be dropped for a while now (we moved on to WebAuthn-like
interfaces for mocks). `cfssl` was only imported due to what I assume was an
IDE mishap.
I've elected to keep `fxamacker/cbor`, instead of trying to move to
[webauthncbor][2]. fxamacker is solid, past v0, seems more appropriate for
client-side libs and still backs webauthncbor.
There are no updates for `flynn/hid` and `flynn/u2f`.
Release notes for fxamacker/cbor:
https://github.com/fxamacker/cbor/releases/tag/v2.4.0.
[1]: 2365c59d9f
[2]: https://pkg.go.dev/github.com/duo-labs/webauthn@v0.0.0-20220815211337-00c9fb5711f5/protocol/webauthncbor
* Drop tstranex/u2f dependency
* Drop direct dependency to cloudflare/cfssl
* Update fxamacker/cbor/v2 to v2.4.0
* Update duo-labs/webauthn to 2022-01-22
* Fix: Make sure all credentials are set in the user
* Simplify: Drop now unnecessary AuthenticationSelection copy
Update metalinter, fix a few lint warnings and replace deprecated linters.
`deadcode`, `structcheck` and `varcheck` are abandoned and now replaced by [`unused`][1].
Since 1.19, `go fmt` reformats godocs according to https://go.dev/doc/comment. I've done a bulk-reformatting of the codebase to keep the linter happy. Backporting is mostly harmless (the exception being `lib/services/role_test.go`, that for some reason breaks the _old_ linter using the new format).
[1]: https://golangci-lint.run/usage/linters/
* Bump golangci-lint version
* Replace abandoned linters
* Fix bodyclose on lib/auth/github.com
* Fix bodyclose on lib/kube/proxy/streamproto/proto_test.go
* Fix bodyclose on lib/srv/alpnproxy/proxy_test.go
* Fix bodyclose on lib/web/conn_upgrade_test.go
* Silence staticcheck on lib/kube/proxy/forwarder_test.go
* Silence staticcheck on lib/utils/certs_test.go
* Address BuildNameToCertificate deprecation warnings
* Run `go fmt ./...`
* Run `go fmt ./...` on api/
* Ignore formatting in role_test.go
* Remove redundant initializers in lib/srv/uacc/
* Update e/
* Fix incorrect use of loop variables
This commit fixes a few occurrences of loop variables being
incorrectly used in the context of Go-routines or (most frequently)
parallel tests. To fix the issues, we create a local copy of the range
variables before the parallel tests (or Go-routine), as suggested in
the documentation of the `testing` package:
https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks
Issues were found using the `loopvarcapture` linter.
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
* fix TestTraceProvider/spans_exported_with_gRPC+TLS
* run TestSSH serially
* operator: Conserve 'created_by' data in user spec
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Tim Ross <tim.ross@goteleport.com>
Co-authored-by: Hugo Hervieux <hugo.hervieux@goteleport.com>
Following on from #13658, this patch removes more (but unfortunately not
all) usages of the deprecated, list-based port-allocation scheme.
This patch:
1. Updates the integration test `TeleInstance` fixture to use injected
listeners rather than static ports when creating a new proxy node in
a cluster,
2. Updates tests affected by (1) to pre-allocate and inject listeners,
including handling caching the listener FDs between proxy restarts
3. Removed unnecessary port allocations when creating LoadBalancer
fixtures, and
4. Moved the remaining list-base port allocation functions out of helpers
and back into integrations and made private. These functions should
never be used by more than one test package concurrently or there is a
very high chance of a port collision. Rather than just write that rule
down in the comments, I have contained the deprecated code into the
affected package made the compiler enforce the rule for us.
See-Also: #12421
See-Also: #13658
See-Also: #14408
Making all of our integration tests run in entirely parallel requires
a large engineering effort to enforce test isolation and remove all race
conditions between tests.
A lower-effort alternative may be to split apart the various test suites
into their own Go packages, and test those packages in parallel, even if
the tests inside are still executed serially. Auditing the test suites
for races on system-level resources (e.g. files, ports) is much easier
than chasing down every p[ossible race in the testing system.
This patch acts as a trial run, breaking a fairly well-defined and
self-contained test suite out into its own package. Note that the goal of
this change is not necessarily to shave minutes off the build (although
that would be nice), but to act as an illustration of how other, less
well-formed test suites might be broken apart.
See-Also: #12421
See-Also: #14408
Primary Changes:
- Remove reliance on Private Key PEM:
- Update native and keygen packages to return PrivateKey instead of PEM key
- Add new PrivateKey interface which implements crypto.Signer
- Replace PEM encoded private key usage where possible
- Replace calls to tls.(Load)X509KeyPair with keys.(Load)X509KeyPair in
client packages
Minor Changes:
- Remove unused agent.AddedKey return from LoadKey
- Simplify sshutils and removed unused code paths
- Add ecdsa and ed25519 key support
* transport: Rewrite headers, including JWTs, for websockets.
Applications can otherwise 401 on websocket requests, as they do not
present any authentication headers.
docs: Fix the reserved JWT header name.
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
* Add test for JWT header in websocket apps
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Alex Vandiver <alex@chmrr.net>
Adds a wrapper around `ssh.Session` which injects tracing context
in a similar manner to the `ssh.Client` wrapper. All usages of
`ssh.Session` have now been replaced and have the appropriate
`context.Context` passed along
Part of #12241
## What
First part of the Kubernetes [Discovery RFD](https://github.com/gravitational/teleport/pull/13376/) to introduce a Kubernetes server per cluster.
This PR introduces a separate Kubernetes server that uses the already introduced `KubernetesClusterV3`.
## Compatibility
In previous versions, Kubernetes Clusters were part of regular `ServerV2` resource and this refactoring deprecates the `ServerV2` usage but keeps them for compatibility with previous version.
Everything is backward compatible, so v10 kubernetes agents and trusted clusters can connect fine.
## Next steps
Once this is merged, a new PR will introduce dynamic registration for Kubernetes Clusters discovered through EKS Discovery.
* Embed auth.Cache in auth.Server
* Hit the backend during Auth initialization
* Bypass the cache when rotating CAs
* Services.UpsertTrustedCluster is different
* Bypass the cache in waitForTunnelConnections
* Fix infinite recursion
* More cache bypassing during init and rotations
* Rename Services to Uncached in auth.Server
* Further cleanups
* Don't start the auth cache immediately
* Go back to Services rather than Uncached
* Comments and a missing method
* Add context.Context to session.Service interface
Updates GetSessions, GetSession, CreateSession, UpdateSesion, and
DeleteSession to take a context.Context. All call paths are updated
to properly pass along a real context instead of relying on a
to eliminate context.TODOs.
This commit adds the Teleport operator. The operator reconciles
TeleportUsers and TeleportRoles Kubernetes resources with Users and
Roles Teleport resources.
When desktop access is enabled, the TeleportReady event will not
be emitted until the WindowsDesktopReadyEvent is emitted, and it
turns out we have *never* emitted a WindowsDesktopReadyEvent.
This is likely due to desktop access being copied from kube access
since the very beginning. The same issue was recently fixed for
kube access in #9418.
Ports used by the unit tests have been allocated by pulling them out of a list, with no guarantee that the port is not actually in use. This central allocation point also means that tests cannot be split into separate packages to be run in parallel, as the ports allocated between the various packages will be allocated multiple times and end up intermittently clashing.
There is also no guarantee, even when the tests are run serially, that the ports will not clash with services already running on the machine.
This patch (largely) replaces the use of this centralised port allocation with pre-created listeners injected into the test via the file descriptor import mechanism use by Teleport to pass open ports to child processes.
There are still some cases where the old port allocation system is still in use. I felt this was already getting beyond the bounds of sensibly reviewable, so I have left those for a further PR after this.
See-Also: #12421
See-Also: #14408
Fixes an issue where the agentpool backoff channel would be redefined
each time an event was received while waiting for the backoff to complete.
This could lead to a longer backoff period than expected.
Waits for each resource to connect individually by splitting up the test into
multiple runs ran in parallel
* configure golangci-lint misspell to check for anglicized spellings
* Americanize spellings
* fix aws constant value with british spelling 🇬🇧
* update api types with americanized spellings
* use american spellings .cloudbuild/scripts
* Start postgres without TLS when multiplexing is disabled
* Add integration test for starting postgres with --insecure-no-tls
* Fix dupe postgres listener mistake
* Log the actual address of listeners
* Remove unnecessary error checking
As a prelude to breaking individual integration test suites out into
their own packages (in order to make them more amenable to running
in parallel), this patch extracts the common test fixtures and places
them in a common `helpers` package.
This will allow the integration test package to share common
infrastructure and vocabulary once they are split out.
Create spans for all public facing TeleportClient,
ProxyClient, and NodeClient methods. This makes
correlating spans easier to reason about when
looking at `tsh` traces. As a result of creating
spans, some additional context propagation is
required as well to ensure that spans are linked
properly.
This also removes the unused `quiet` argument from
`ConnectToCluster`. It's usage was not consistent
by existing callers, and it was ignored, so in order
to avoid confusion in future calls, it was removed.
#12241
This change adds IP-based validation for SSH certificates.
There's new option in role definition:
kind: role
metadata:
name: dev
spec:
options:
pin_source_ip: true
When that is set to true client IP must be the same when generating certificates and using them. It uses source_address critical option that should be supported by both teleport and sshd and only applies to certificates we send to user (like in tsh login), we don't pin IP in certificates issued for web UI as they can't leak.
This change also omits machine ID (it uses different code path) - it will be added in separate PR.
Most of the lines changed are from regenerating types.proto, change itself is not that big
Relates #11719
This change adds the --all/-R flag to tsh ls, tsh apps ls, tsh db ls, and tsh kube ls, which lets tsh list resources from across all clusters and logged in proxies.
* Add CSRF mitigations
This commit includes two fixes:
1. Enforce an application/json Content-Type server-side.
2. When checking the bearer token, verify that the user
associated with the token matches the user associated
with the cookie.
* Fix TEL-Q122-13: Access Requests Denial Of Service Via Request Reason (#125) (#127)
* Ignore input when data flow is off in TermManager
When data flow is disabled in TermManager (at the beginning or when TermManager.Off was called) we should ignore all input we receive (currently we buffer it)
* Agent forwarding socket security fix.
Co-authored-by: Lisa Kim <lisa@goteleport.com>
Co-authored-by: Joel <jwejdenstal@icloud.com>
Co-authored-by: Przemko Robakowski <przemko@przemko-robakowski.pl>
* Add support for automatic user provisioning
* Add UID parker to reexec
* Add a `teleport park` subcommand that does nothing
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
After the merge of https://github.com/gravitational/teleport/pull/12674 we no longer use the following configuration:
```yaml
teleport:
ca_signature_algo: "rsa-sha2-512"
```
As we now rely upon the `x/crypto` package to choose the signing algorithm (it defaults to `rsa-sha2-512`)
**Demo**
If we set `ca_signature_algo` (the value is irrelevant) and start `teleport` we get:
```shell
root@marco:/workspace# teleport start --debug
2022-06-02T09:33:58Z WARN ca_signing_algo config option is deprecated and will be ignored, we'll always default to rsa-sha2-512. config/configuration.go:348
2022-06-02T09:33:58Z INFO Generating new host UUID: b001159a-10e0-49a7-b4dc-61c73fbe9e42. service/service.go:726
...
```
Fixes#12905
* Add client side circuit breaker to auth clients
In order to apply back pressure we can utilize a circuit breaker that
monitors error responses from auth server. When tripped it will prevent
all outbound requests to auth for a period of time. This can also help
prevent a potential thundering heard when auth is in an unhealthy state.
By default the circuit breaker will only be tripped if 90% of the
requests made in the monitoring interval fail.
The instance metadata client added in #12593 significantly slows down integration tests. This change adds a disabled client to integration tests to improve performance.
This adds proxy peering support. A configurable setting that allows for agents
to connect to a subset of proxies and be reachable through any proxy in the
cluster. This is achieved by creating grpc connections between each proxy
server. Client connections can then be passed between proxies to the desired
agent.
This change fixes a bug in EC2 labels (#12593) involving concurrent writes to the labels map. This is fixed by making EC2.Get() return a copy instead of the actual label map.
When the client connects to teleport with invalid credentials (eg
expired ones) it will retry multiple times until the context deadline is
reached.
When it happens, we receive the generic error: context deadline
exceeded.
However, we can ask for the latest connection error, one which will give
us more information on why it happened.
To ask for this extra error we need to add the following
grpc.DialOption: grpc.WithReturnConnectionError()
After doing this, we will get the errors that happenned when trying to
connect to the grpc Server.
This should help us debug possible connection problems.
We had to refactor a little bit the way we handle the parallel
connection attempts to receive all the connection errors from the
multiple flows.
This commit upgrades the version of x/crypto we use, to the current latest
`go get -u golang.org/x/crypto`
We also replaced the deprecated variables and updated the tests to match the
current default KEX Algos
The x/crypto didn't support RSA-SHA2 algos, so we developed our own algorithm
signer. This is no longer the case, and after upgrading x/crypto to 20220518 we
can safely remove the custom code we have.
From OpenSSH 8.8+, it works if we explicitly add the older algorithm
Somthing like this: `./ssh -vvv -oPubkeyAcceptedAlgorithms=+ssh-rsa-cert-v01@openssh.com teleportadmin@moon.marco.mydemo`
* Add tracing instrumentation for ssh clients/servers
Add tracing context to the existing ProxyHelloSignature to provide
span information across ssh connections. To add span context per
ssh session on top of new connections, the same tracing context is
passed in the first global request of the session.
In order to ensure that tracing context is pulled from and inserted
into the proper context.Context, some interfaces and methods were
changed to take one as the first argument.
* run HSM tests in parallel
* add missing punctuation to commit
Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
* Improve CertAuthorityWatcher
CertAuthorityWatcher and its usage are refactored to allow for
all the following:
- eliminate retransmission of the same CAs
- reduce memory usage by having one local watcher per proxy
- adds the ability to filter only the CAs that are desired
- reduce the time required to send the first CAs
watchCertAuthorities now compares all CAs it receives from the
watcher with the previous CA of the same type and only sends to
the remote site if they are not identical. This is to reduce
unnecessary network traffic which can be problematic for a
root cluster with a larger number of leafs.
The CertAuthorityWatcher is refactored to leverage a fanout
to emit events to any number of watchers, each subscription
can be for a subset of the configured CA types. The proxy
now has only one CertAuthorityWatcher that is passed around
similarly to the LockWatcher. This reduces the memory usage
for proxies, which prior to this has one local CAWatcher per
remote site.
updateCertAuthorities no longer waits on the utils.Retry it
is provided with before starting to watch CAs. By doing this
the proxy no longer has to wait ~8 minutes before it even
starts to watch CAs.
* Update golangci-lint
To accomodate the recent Go 1.18 upgrade
* Fix new lint warnings as a result of linter upgrade
* Set golangci-lint to Go 1.18 mode
golangci-lint will automatically skip linters that don't have support
for Go 1.18.
See: https://github.com/golangci/golangci-lint/issues/2649
* Remove unused backend wrapper from Cache
* Remove double printShutdownStatus
* Fix readyz race condition
* Test coverage for the readyz.monitor fix
* Close listeners immediately in proxy.shutdown
* Use and handle net.ErrClosed correctly
This adapts utils.IsUseOfClosedNetworkError to check for net.ErrClosed
even inside trace.Aggregate errors, makes it so that we always return
something that would pass errors.Is(err, net.ErrClosed) when returning
from a (net.Listener).Accept(), and handles closed listeners within our
various Serve() loops so that we don't hit spurious backoff waits while
shutting down.
* Close listeners early and emitters late
* Test coverage for the proxy listener changes
* Revert some errors back to trace.ConnectionProblem
* Reduce PR scope to just the proxy, add comments
* Improve error logging.
Teleport now will try to extract MySQL server version from initial handshake package instead of sending `8.0.0-Teleport` every time. This string can be overridden by new configuration option `mysql.server_version`. On DB service start Teleport will also try to fetch the current version from MySQL/MariaDB instance. After that the server version will be updated on every successful connection to keep it up to date.
Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>
* Revert "Make `PortList.Pop()` thread-safe (#11799)"
This reverts commit a17337d1a1.
* Revert "Ensure stateOK is reported only when all components have sent updates (#11249)"
This reverts commit b749302e2c.
* Revert "Throw startup error if `TeleportReadyEvent` is not emitted (#11725)"
This reverts commit 933e247287.
* Revert "Fix ProxyKube not reporting its readiness (#12150)"
This reverts commit 6cdcfe7721.
* Speed up TestAppServersHA
Allow test cases to be run in parrallel and allow app servers to
be spawned in parrallel to reduce test time from ~99s to ~20s.
* feat: delete app web sessions during logout
* Apply suggestions from code review
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
* refactor(auth): add VerbList action to delete user app sessions
Co-authored-by: Roman Tkachenko <roman@goteleport.com>
This deprecation was kind of a pain, because x509.CertPool becomes
a black box - there is no public API to determine how many certs
have been added to the pool. To account for this, some of our method
signatures needed to be updated to report the number of certs that
were added.
* Throw startup error if `TeleportReadyEvent` is not emitted
Before this commit, the `TeleportReadyEvent` was only waited for when a
process reload occurred. Thus, if a bug exists in the code that emits
this event (as it's currently the case since the `MetricsReady` and
`WindowsDesktopReady` events are never emitted), such a bug may go
unnoticed for a while.
This commit ensures that the `TeleportReadyEvent` is always waited for
on startup, and throws an error if the event is not emitted (after some
timeout).
This commit also:
- removes the `MetricsReady` event (as this is not produced by a
component that sends heartbeats, which is the case of every other
event required by the `TeleportReadyEvent` event mapping)
- ensures that `WindowsDesktopReady` event is emitted
- refactors some of the code in `lib/service/supervisor.go`
- moves the event mapping registration to a new `registerTeleportReadyEvent` function
Introduce Database Certificate Authority. New CA is used by Database Access to sign database certificates making them independent from Host CA.
Co-authored-by: Marek Smoliński <marek@goteleport.com>
This change updates NO_PROXY handling to allow blocking specific host:port combinations, rather than just the host. It also adds a special case for downgrading requests to plain HTTP when --insecure is true and the request goes through a plain HTTP proxy at localhost (i.e. HTTP_PROXY=http://localhost).
* Always use in-memory caches
This also cleans up now-useless fields and constants related to on-disk
caches.
* Remove the cache tombstone mechanism
As we're never reopening the same cache backend twice, this is no longer
useful.
* Warn if a cache directory exists on disk
We can't remove it automatically because we might be in the middle of an
upgrade with a old version of Teleport still running.
* Fix goroutine and memory leak in watchCertAuthorities
The CA Watcher was blocking both on writing to a channel when the watcher
was closed and on HTTP calls that had no request timeout or context passed
to cause cancellation.
All resourceWatcher implementations that had a bug which may cause them to block
on writing to a channel forever were fixed by selecting on the write and ctx.Done.
Adding context.Context to all Get/Put/Post/Delete methods on the auth HTTPClient to
force callers to propagate context. Prior all calls used context.TODO which
prevents requests from being properly cancelled.
Add context propagation to RotateCertAuthority, RotateExternalCertAuthority,
GetCertAuthority, GetCertAuthorities. This is needed to get the correct ctx
from the CertAtuhorityWatcher all the way down to the HTTPClient that makes
the call.
Closes#10648
* Fix Reverse Tunnels Not Properly reconnecting
Nodes were not generating new agents which prevented reverse tunnels
from being re-established after a collapse. Ensuring we always
release the lease if the agent pool is unable to add a new agent.
* add tunnel collapse reconnect test
Integration tests were creating the users SSH and x509 certificates at
the time of server creation, specifically at the time of the
"TeleInstance" creation.
If any component within the "TeleInstance" was slow to start, for example
if the proxy watcher had to be reset as can be seen below, the users
certificate would often be expired by the time the SSH request was issued
which would lead to the SSH server rejecting the connection and the test
failing with an unexpected error.
For most tests this is not an issue, because test certificates are valid
for 24 hours. However "TestDisconnection" scopes the TTL of certificates
down (often times to about 2 seconds) to test different certificate
lifetime disconnection scenarios which would lead to this test being flaky.
Updated integration test logic to instead "issue" the users certificate at
the time of client creation instead of server creation.
`TestProxyReverseTunnel` has been consistently failing with the following errors.
--- FAIL: TestProxyReverseTunnel (11.76s)
sshserver_test.go:1127:
Error Trace: sshserver_test.go:1127
Error: Received unexpected error:
timeout waiting for announce to be sent
Test: TestProxyReverseTunnel
FAIL
FAIL github.com/gravitational/teleport/lib/srv/regular 68.907s
--- FAIL: TestProxyReverseTunnel (23.43s)
assertion_compare.go:332:
Error Trace: sshserver_test.go:1144
Error: "5.775333527" is not less than "5"
Test: TestProxyReverseTunnel
Messages: []
FAIL
FAIL github.com/gravitational/teleport/lib/srv/regular 96.861s
These both appear to be timing related. By removing `t.Parallel()` or
increasing the timeouts it was possible to stabilize this test and have it
consistently pass. However, looking at what `TestProxyReverseTunnel` actually
tested, I don't think it can actually be removed completely. I've outlined
what it tested and why this is no longer necessary and why we should remove it.
* Reverse tunnels can be established
This test was written prior to integration tests existing. Teleport now has
integration tests that are more extensive, robust, and stable which cover
reverse tunnel functionality like `TwoClustersProxy`, `TwoClustersTunnel`, and
`TrustedTunnelNode`. The only thing they were missing that
`TestProxyReverseTunnel` had was checking `LastConnected` time. This PR has
been updated to add that to integration tests.
* Connectivity can be established over reverse tunnels
Similar to the above, we have more extensive, robust, and stable integration
test coverage for establishing connectivity over a reverse tunnel now.
The only bit of functionality that integration don't appear to have is
connecting by DNS name _and_ IP address at sshserver_test.go#L1066-L1067.
However, we do now have a dedicated unit test for this in
`TestProxySubsys_getMatchingServer`.
* Labels are synchronized
While this test does schenonize dynamic labels, it never actually checks if
they were synchronized correctly. That functionality was removed many years
ago in https://github.com/gravitational/teleport/pull/250.
We do now have unit test coverage for dynamic labels at
lib/labels/labels_test.go.
* Dynamically resolve reverse tunnel address
The reverse tunnel address is currently a static string that is
retrieved from config and passed around for the duration of a
services lifetime. When the `tunnel_public_address` is changed
on the proxy and the proxy is then restarted, all established
reverse tunnels over the old address will fail indefinintely.
As a means to get around this, #8102 introduced a mechanism
that would cause nodes to restart if their connection to the
auth server was down for a period of time. While this did
allow the nodes to pickup the new address after the nodes
restarted it was meant to be a stop gap until a more robust
solution could be applid.
Instead of using a static address, the reverse tunnel address
is now resolved via a `reversetunnel.Resolver`. Anywhere that
previoulsy relied on the static proxy address now will fetch
the actual reverse tunnel address via the webclient by using
the Resolver. In addition this builds on the refactoring done
in #4290 to further simplify the reversetunnel package. Since
we no longer track multiple proxies, all the left over bits
that did so have been removed to accomodate using a dynamic
reverse tunnel address.
* Replace cluster periodics with watchers
Remove periodically sending locks and certificate authorities to leaf clusters. Instead
we can rely on the watcher system to only deliver resources to leaf clusters when changes
occur.
Fixes#8817
Actually tracking down the cause of a failure in the integration tests can
be hard:
* It's hard to get an overall summary of what failed
* The tests sometimes emit no output before timing out, meaning any
diagnostic info is lost
* The emitted logs are too voluminous for a human to parse
* The emitted logs can present information out of order
* It's often hard to tell where the output from one test ends
and the next one begins
This patch attempts to address these concerns without attempting to rewrite
any of the underlying teleport logging.
* It improves the render-tests script to (optionally) report progress per-
test, rather than on a per-package basis. My working hypothesis on the
tests that time out with no output is that go test ./integration is
waiting for the entire set of integration tests tests to be complete
before reporting success or failure. Reporting on a per-test cycle gives
faster feedback and means that any timed-out builds should give at least
some idea of where they are stuck.
* Adds the render-tests filter to the integration and integration-root make
targets. This will show an overall summary of test results, as well as
- Discarding log output from passing tests to increase signal-to-noise
ratio, and
- Strongly delimiting the output from each failed test, making failures
easier to find.
* Removes the notion of a failure-only logger in favour of post-processing
the log events with render-tests. The failure-only logger catches log
output from the tests and only forwards it to the console if the test
fails. Unfortunately, not all log output is guaranteed to pass through
this logger (some teleport packages do not honour the configured logger,
and reports from the go race detector certainly don't), meaning some
output is presented at the time it happens, and other output is batched
and displayed at the end of the test. This makes working out what
happened where harder than it need be.
In addition, this patch also promotes the render-tests script into a fully-
fledged program, with appropriate makefile targets, make clean support, etc.
It is now also more robust in the face on non-JSON output from go test
(which happens if a package fails to compile).
This PR likely won't fix any of the flakiness, but should help in
debugging:
- Wait for TeleInstance's process to close
- Don't delete the data directory before the process is shut down
- Include site name in logging to make it easier to distinguish which
site is shutting down
- Don't call StopAll twice (it was deferred and run manually)
- Include cluster name in log output
- Remove unused TestWrapper left over from the Check framework
Move cache and resourceWatcher watchers from a 10s retry to a jittered backoff retry up to ~1min. Replace the
reconnectToAuthService interval with a retry to add jitter and backoff there as well for when a node restarts due to
changes introduced in #8102.
Fixes#6889.
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.
Fixes#9003 and #9244.
This new feature in Go 1.17 automatically restores the environment
variable to its previous value when a test ends, making it simpler
to set up the environment for tests and less likely that we accidentally
leave behind global state.
Also convert some of the remaining uses of check to standard Go tests.
Part of this change is implementing a "no secrets" policy for CI. Given that
we have to support CI for arbitrary external contributors, and
it is easy to craft a malicious PR that exfiltrates secrets during a CI build
any test that runs under CI must be able to do so without any injected secrets.
This means that several of the test we currently run under Drone will not be run on GCB, at least as part of the regular CI. The plan is to create a separate task that periodically runs tests that require external credentials (e.g. Kube tests, various backend data stores, etc.) in a more secure way and report failures asynchronously. And while these tests will not run under CI, the should still be built under CI so that required changes are caught during review.
Implements RFD 45 / "where" conditions for active sessions[1].
In few words, the purpose of the RFD is to allow the creation of roles that
permits users to only join a subset of active sessions (for example, only their
own sessions).
Implementation goes a bit further than the RFD, allowing the conditions to be
applied to `update` and `delete` verbs as well.
Originally implemented by @andrejtokarcik (#8568), tweaks by @codingllama.
[1] https://github.com/gravitational/teleport/blob/master/rfd/0045-ssh_session-where-condition.md
* Implement where conditions for active sessions list/read
* actionWithConditionForList => actionForListWithCondition
* Make Context-exposed sessions follow the RFD API
* Add tests for "where" conditions on active sessions
* Fix typos
* Fix typos and spacing
* Rename "parties" to "participants" in the context session
* Update RFD to reflect PR changes
Update RFD to reflect PR changes
Specifically, mark as implemented and rename `parties` to `participants`.
* Push list authz logic to ServerWithRoles, obsolete cond
* Remove cond from GetSessions signature
* Simplify cast in lib.utils.Fields.GetString
* Add TODO to refactor SearchSessionEvents / stored sessions
Co-authored-by: Andrej Tokarčík <andrej@goteleport.com>
Fixes#7606, where a node doesn't notice when the tunnel port changes.
Imagine you have a cluster with a node connected in via a tunnel through a proxy `proxy.example.com` on port `3024`
Now change the proxy config so that `tunnel_public_address` is `proxy.example.com:4024`. You either restart the proxy, or reload the proxy config with a `SIGHUP`.
...and then the node
a) loses its connection to auth (because the tunnel is gone), and
b) _doesn't reconnect_, because even though the proxy address hasn't changed,
the node has cached the old tunnel_public_address and keeps trying to connect
to that.
You can always manually restart the node to have it reconnect, but that would be a pain if you have thousands of nodes.
In order to not have to manually restart all nodes, this change implements a check for a connection failures to the auth server, and re-starts the node if there are multiple connection failures in a given period of time. The check as-implemented piggybacks on the node's "common.rotate" service, which can already restart the node in certain circumstances, and uses the success of the periodic rotation sync as a proxy for the health of the node's connection to the auth server.
See-Also: #7606
Some integration tests modify global "constants" to speed up test
execution (e.g. shortening polling intervals). This is occasionally
tripping the Go data race detector, so I have added explicit
serialisation to reading and writing these global settings.
These values are only ever changed in a test environment, and there
should be zero contention for them in a non-test environment.
The race condition detector is being tripped by a concurrent `Write` and
`Close` in the `PipeNetCon` in several integration tests. This is a naive
fix to serialize the write and close operations to resolve the race
condition.
The affected tests were also not handling asynchronous error reporting
correctly (i.e. it's not legal to call `require.XYZ()` from a goroutine
other than the one executing the test function.). This patch introduces
some plumbing to marshal asynchronous errors back into the main test
routine before failing the test.
* Add RBAC for Windows desktop access
This commit adds RBAC checks for Windows Desktops as described in
RFD 33 and RFD 34:
- add Windows desktop logins & labels to role definition
- introduce new file config for host labels based on a regexp match
- auth server API performs access checking for Windows desktop resources
- add RDP client callback to authorize the user
- support user/role locks
- respect the client idle timeout setting
Note: in cases where an connection is terminated to to RBAC, the web UI
currently displays "websocket connection failed" because the connection
is closed from the server. We'll need to follow up with a nice error
message for the client side to improve the UX here.
Other changes:
* Remove OSS RBAC migration marked for deletion
* Stop creating a default admin role
* add wildcard desktop access to the preset access role
Updates #7761
This commit contains 2 changes:
1. Rename GenerateServerKeys to GenerateHostCerts.
This is a more accurate name and consistent with the existing
GenerateUserCerts endpoint.
2. Change the request type to include a single role, rather than a
list of roles. We only ever allowed a single role in the list
anyway, so this change will prevent future mis-use of the API.
Note: a side effect of this change is we now have two similar endpoints:
- GenerateHostCert: old API that generates SSH cert only
- GenerateHostCerts: a newer API that generates SSH and TLS certs
To avoid making this change too big, we'll aim to deprecate
GenerateHostCert in the future.
* Require that public TLS and SSH keys are provided to register via token
The original behavior attempted to make providing public keys optional,
and would generate keys if they were not provided. This had several
problems:
- The auth server is generating private keys for nodes and is
potentially able to share them over the network.
- The return value for keys.Key would sometimes be set and sometimes
be empty (the key is only set if the auth server generated it and
knows what the key is)
- We only ever relied on this behavior as a shortcut in test code.
In the production code this behavior was never used (and actually
never worked due to a bug that would overwrite and discard the
generated private key)
This commit requires that public keys are always provided, ensuring
that the private key is generated locally and never known by the
auth server.
It also results in a cleaner error message when either or both of the
public keys are missing from the request.
* Address review comments
* Fix tests that relied on certs being generated
* Connect proxy <-> windows_desktop_service <-> RDP server
Link together the proxy (websocket), service (mTPS) and RDP client. Pass
target desktop UUID via SNI on the TLS connection from the proxy.
* Use client CAs to validate incoming desktop_service connections
* Send binary frames on desktop websocket
Introduce new make targets to check and add license headers to files
("make lint-license" and "make fix-license"). License checking is now a part of
"make lint" as well.
Initial attempts used goheader, but it caused "make lint-go" to become about 9x
slower (if not more), plus it only targets go files. Google's addlicense is fast
enough and targets however many file types we want.
Existing files that were missing licenses got the header added, using the
current year as the license date.
* Introduce lint-license and fix-license make targets
* Ignore generated files
* Add license to go files
* Replace irregular licenses with standard copyright/license
* Add license to proto files
* Install addlicense in build.assets Dockerfile
Teleport will fail to start when when a k8s cluster is unavailable when
using the kubeconfig in a `kubernetes_service` configuration. This means
that a single missing cluster can disrupt _all_ of the configured
clusters, even if the others are online.
This change makes failing the cluster credential enumeration a
per-k8s-cluster warning, rather than a stop-the-world error.
It also expands the testing shims inside the k8s proxy to allow more
sophisticated mocked scenarios, in order to test the above.
See-Also: #7215
Adds the ability to block network traffic on SSH sessions.
The deny/allow lists of IPs are specified in teleport.yaml file.
Supports both IPv4 and IPv6 communication.
This feature currently relies on enhanced recording for
cgroup management so that needs to be enabled as well.
-- Design rationale:
This patch uses Linux Security Module (LSM) hooks, specifically
security_socket_connect and security_socket_sendmsg, to control
egress traffic. The LSM provides two advantages over socket filtering
program types.
- It's executed early enough that the task information is available.
This makes it easy to report PID, COMM, etc.
- It becomes a model for extending restrictions beyond networking.
The set of enforced cgroups is stored in a BPF hash map and the
deny/allow lists are stored in BPF trie maps. An IP address is
first checked against the allow list. If found, it's checked for
an override in the deny list. The policy is default deny. However,
the absence of the NetworkRestrictions API object is allow all.
IPv4 addresses are additionally registered in IPv6 trie (as mapped)
to account for dual stacks. However it is unclear if this is sufficient
as 4-to-6 transition methods utilize a multitude of translation and
tunneling methods.
Prior to this change, TCP forwarding over SSH could only be disallowed
by user-based rules, rather than by individual target nodes.
This change adds:
* the`port_forwarding` key to the yaml SSH config block, with a boolean value
* Plumbing to pipe the resulting config value through to the SSH server
* A predicate check in the SSH server to [dis]allow port forwarding based on the setting.
This change also:
* adds a common way for integration tests to await the establishment of an SSH session
* refactors several integration tests to use this new method rather than manually waiting
* adds some marshaling code to move errors from spawned goroutines back into the
main test routine in verifySessionJoin()
See-Also: Issue #6783
* hsm: migrate CA storage schema
Migrate types.CertAuthorityV2 schema according to
https://github.com/gravitational/teleport/blob/master/rfd/0025-hsm.md#backend-storage
Includes proto changes, types.CertAuthority wrapper changes and data
migration.
Note that we keep and update the old fields for backwards-compatibility.
If a cluster is upgraded to v7 and then downgraded back to v6,
everything should keep working.
* Address review feedback
* Remove JSON schema validation
Removing JSON schema validation from all resource unmarshalers.
--- what JSON schema gets us
Looking at the JSON schema spec and our usage, here are the supposed benefits:
- type validation - make sure incoming data uses the right types for the right fields
- required fields - make sure that mandatory fields are set
- defaulting - set defaults for fields
- documentation - schema definition for our API objects
Note that it does _not_ do:
- fail on unknown fields in data
- fail on a required field with an empty value
--- what replaces it
Based on the above, it may seem like JSON schema provides value.
But it's not the case, let's break it down one by one:
- type validation - unmarshaling JSON into a typed Go struct does this
- required fields - only checks that the field was provided, doesn't actually check that a value is set (e.g. `"name": ""` will pass the `required` check)
- so it's pretty useless for any real validation
- and we already have a separate place for proper validation - `CheckAndSetDefaults` methods
- defaulting - done in `CheckAndSetDefaults` methods
- `Version` is the only annoying field, had to add it in a bunch of objects
- documentation - protobuf definitions are the source of truth for our API schema
--- the benefits
- performance - schema validation does a few rounds of `json.Marshal/Unmarshal` in addition to actual validation; now we simply skip all that
- maintenance - no need to keep protobuf and JSON schema definitions in sync anymore
- creating new API objects - one error-prone step removed
- (future) fewer dependencies - we can _almost_ remove the Go libraries for schema validation (one transient dependency keeping them around)
* Remove services.SkipValidation
No more JSON schema validation so this option is a noop.
In an attempt to make it easier to
1) navigate the integration test output,
2) find the cause of test failures, and
3) run individual tests, make it easier to run individual
integration tests from the command line,
...this change ports some of the OSS integration tests away from
GoCheck and implements them in terms of the standard `testing`
package.
The main changes are:
* Test suites are now constructed as a normal Test function
with many subtests.
* The GoCheck assertions have been replaced with equivalent
assertions from `testify/require`, for example:
`c.Assert(err, check.IsNil)`
becomes
`require.NoError(t, err)`
... and so on
* Avoid test flake by ensuring the gRPC server is shutdown gracefully before closing the audit log
* Fix lint warnings. Nove tunnel server's Close to earlier to close the proxy watcher and release grpc traffic
* Use graceful shutdown selectively until all tests have improved support for it
* Move session recorder clean up to session.Close
* Always use graceful shutdown for TLS.
Prior to this change, `tsh` will only ever forward the internal key
agent managed by `tsh` to a remote machine.
This change allows a user to specify that `tsh` should forward either
the `tsh`-internal keystore, or the system key agent at `$SSH_AUTH_SOCK`.
This change also brings the `-A` command-line option into line with
OpenSSH.
For more info refer to RFD-0022.
See-Also: #1571
Currently, an app's target FQDN can be obtained only using the endpoint
for creating new app sessions. The OAuth-style back-and-forth redirects
between the app launcher and the app itself are therefore forced to
generate an unnecessary additional app session just to resolve the FQDN.
The new endpoint introduced here allows to resolve such FQDNs by
invoking a dedicated endpoint.
* Attempt to isolate and improve state handling of a NodeSession.
* Add terminal close for kube terminal tests
* Address review comments
* Small tweaks
Co-authored-by: Andrew Lytvynov <andrew@goteleport.com>