Application sessions were previously only logged when launching an application
session via the UI, and not from the `tsh app login` command. This has been
corrected. The AppName and AppURI are now passed in as part of the gRPC
request to the auth server, which is then used to emit the audit event.
* Introduce proto types for ProvisionTokenV3
* Add methods to ProvisionTokenV3 to support ProvisionToken iface
* Start building v3 support into the client
* Add support for marshalling and unmarshalling ProvisionTokenV3
* Start unit testing ProvisionTokenV3
* Remove oneof to support yaml marshal/unmarshal
* Client should try V3 methods and fallback to v2
* More tests
* Fix join tests
* Fix integration tests
* Switch integration tests to use v3 spec
* Switch iam tests to use ProvisionTokenV3
* Change ec2 join tests to use V3 tokens
* Fix events tests for V3 token
* support ProvisionTokenV3 within API client events handler
* Explicitly specify JoinMethod
* Tidy up final usage of NewProvisionTokenV2FromSpec in tests
* Improve proto docs on ProvisionTokenV3
* Fix bot join tests
* Clarify error message for invalid join method
* Adjust resource version comment
* Fix comments and return error rather than bool in V2() method
* Catch incompatible conversions case
* Include V2 ProvisionToken in tests and add appropriate DELETE IN notes
* Fix linter warnings/unit test failures
* Use nolint rather than lint:ignore
* Add more DELETE IN notes
* Run goimports on join_ec2_test.go
* Address PR comments from tim.
* Add more deprecation/delete in notices
* Improve godoc comments on checkAndSetDefaults for provider config
* Simplify implementation by dropping client-ahead compatibility
* Add some support for client-ahead but with conversion to v3
* Update code comments to include responsible party
* Rename `Role` to `RoleARN` in EC2 configuration for clarity
* Fix tests for Role -> RoleARN rename
* Move MustCreateProvisionToken out of API and into test packages
* Properly go imports files
* Reduce number of auth dials for tsh commands
One of the major areas of latency for `tsh ssh` is creating multiple
auth clients. Since the auth client is lazy and only actually performs
the dial on first use we can create an auth client once and simply
reuse it. This is done by adding an `auth.ClientI` to `ProxyClient`
which is created via `connectToProxy`. All attempts to connect to
the current auth server via the `ProxyClient` will be given the
cached `auth.ClientI`.
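A minimal sketch of the caching idea follows. All names here (`proxyClient`, `authClient`, `dialAuth`, `currentCluster`) are hypothetical stand-ins, not Teleport's actual types, and `sync.Once` is an assumption used to show "dial once, reuse afterwards"; the real change relies on the auth client being lazy.

```go
package main

import (
	"fmt"
	"sync"
)

// authClient stands in for auth.ClientI; hypothetical, for illustration only.
type authClient struct{ cluster string }

// proxyClient is a hypothetical stand-in for ProxyClient. It dials the auth
// server at most once and hands the same client to every caller.
type proxyClient struct {
	cluster string

	once   sync.Once
	client *authClient
	err    error
	dials  int // counts dials so the caching is observable
}

// dialAuth simulates the expensive dial to the auth server.
func (p *proxyClient) dialAuth() (*authClient, error) {
	p.dials++
	return &authClient{cluster: p.cluster}, nil
}

// currentCluster returns the cached auth client, dialing on first use only.
func (p *proxyClient) currentCluster() (*authClient, error) {
	p.once.Do(func() {
		p.client, p.err = p.dialAuth()
	})
	return p.client, p.err
}

// dialsAfter reports how many dials happen for n lookups (-1 on error).
func dialsAfter(n int) int {
	p := &proxyClient{cluster: "example"}
	for i := 0; i < n; i++ {
		if _, err := p.currentCluster(); err != nil {
			return -1
		}
	}
	return p.dials
}

func main() {
	fmt.Println("dials for 5 lookups:", dialsAfter(5))
}
```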
The new method of retrieving the current auth client also allowed us
to remove a number of calls to `GetSites` which were used to obtain
the current cluster name. The local profile already contains the name
of the cluster and calls to `GetSites` were unnecessary. All instances
which relied on the site name now retrieve from information that the
`ProxyClient` already has.
In an effort to reduce ambiguity and confusion `CurrentClusterAccessPoint`
and `ClusterAccessPoint` were also removed. AccessPoint denotes that
you are connecting to a cache, but the `ProxyClient` is always going
to be hitting the auth server directly. The two have been replaced
with `CurrentCluster` and `ConnectToCluster`, which they were merely
wrappers for anyhow.
Control master functionality is currently broken in proxy recording
mode. We're aware of the issue and will disable the test until we
are able to fix the underlying issue.
Updates #16224
* Add Yubikey PrivateKey implementation for use by Teleport clients.
- Add yubikey login logic, reusing previously stored private keys.
- Fix identity file decoding with PIV keys, which sign ecdsa certificates.
- Add libpcsclite-dev pre-req for building on linux.
- Remove unnecessary keys.Signer interface and move its functionality to keys.PrivateKey.
- Move retry and jitter utils to new api/utils/retryutils package.
Update `duo-labs/webauthn` up to `20220122034320`, which is the latest version
we can get without dipping into dependency hell (`etcd` and `opentelemetry` woes
ensue after [2365c59d9f][1]).
`tstranex` could be dropped for a while now (we moved on to WebAuthn-like
interfaces for mocks). `cfssl` was only imported due to what I assume was an
IDE mishap.
I've elected to keep `fxamacker/cbor`, instead of trying to move to
[webauthncbor][2]. fxamacker is solid, past v0, seems more appropriate for
client-side libs and still backs webauthncbor.
There are no updates for `flynn/hid` and `flynn/u2f`.
Release notes for fxamacker/cbor:
https://github.com/fxamacker/cbor/releases/tag/v2.4.0.
[1]: 2365c59d9f
[2]: https://pkg.go.dev/github.com/duo-labs/webauthn@v0.0.0-20220815211337-00c9fb5711f5/protocol/webauthncbor
* Drop tstranex/u2f dependency
* Drop direct dependency to cloudflare/cfssl
* Update fxamacker/cbor/v2 to v2.4.0
* Update duo-labs/webauthn to 2022-01-22
* Fix: Make sure all credentials are set in the user
* Simplify: Drop now unnecessary AuthenticationSelection copy
Update metalinter, fix a few lint warnings and replace deprecated linters.
`deadcode`, `structcheck` and `varcheck` are abandoned and now replaced by [`unused`][1].
Since 1.19, `go fmt` reformats godocs according to https://go.dev/doc/comment. I've done a bulk-reformatting of the codebase to keep the linter happy. Backporting is mostly harmless (the exception being `lib/services/role_test.go`, that for some reason breaks the _old_ linter using the new format).
[1]: https://golangci-lint.run/usage/linters/
* Bump golangci-lint version
* Replace abandoned linters
* Fix bodyclose on lib/auth/github.com
* Fix bodyclose on lib/kube/proxy/streamproto/proto_test.go
* Fix bodyclose on lib/srv/alpnproxy/proxy_test.go
* Fix bodyclose on lib/web/conn_upgrade_test.go
* Silence staticcheck on lib/kube/proxy/forwarder_test.go
* Silence staticcheck on lib/utils/certs_test.go
* Address BuildNameToCertificate deprecation warnings
* Run `go fmt ./...`
* Run `go fmt ./...` on api/
* Ignore formatting in role_test.go
* Remove redundant initializers in lib/srv/uacc/
* Update e/
* Fix incorrect use of loop variables
This commit fixes a few occurrences of loop variables being
incorrectly used in the context of Go-routines or (most frequently)
parallel tests. To fix the issues, we create a local copy of the range
variables before the parallel tests (or Go-routine), as suggested in
the documentation of the `testing` package:
https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks
Issues were found using the `loopvarcapture` linter.
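The fix pattern can be sketched as below. The `collect` helper is hypothetical and only illustrates the local copy; before Go 1.22 the range variable was shared across iterations, so a goroutine or parallel subtest that captured it directly could observe a later value.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect starts one goroutine per item and returns what each goroutine saw,
// sorted so the result does not depend on scheduling.
func collect(items []string) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []string
	)
	for _, item := range items {
		item := item // local copy: each goroutine captures its own variable
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			out = append(out, item)
		}()
	}
	wg.Wait()
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"}))
}
```

The same `tc := tc` line applies before `t.Run` when the subtest body calls `t.Parallel()`.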
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
* fix TestTraceProvider/spans_exported_with_gRPC+TLS
* run TestSSH serially
* operator: Conserve 'created_by' data in user spec
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Tim Ross <tim.ross@goteleport.com>
Co-authored-by: Hugo Hervieux <hugo.hervieux@goteleport.com>
Following on from #13658, this patch removes more (but unfortunately not
all) usages of the deprecated, list-based port-allocation scheme.
This patch:
1. Updates the integration test `TeleInstance` fixture to use injected
listeners rather than static ports when creating a new proxy node in
a cluster,
2. Updates tests affected by (1) to pre-allocate and inject listeners,
including handling caching the listener FDs between proxy restarts,
3. Removes unnecessary port allocations when creating LoadBalancer
fixtures, and
4. Moves the remaining list-based port allocation functions out of helpers
and back into integrations and makes them private. These functions should
never be used by more than one test package concurrently or there is a
very high chance of a port collision. Rather than just write that rule
down in the comments, I have contained the deprecated code in the
affected package and made the compiler enforce the rule for us.
See-Also: #12421
See-Also: #13658
See-Also: #14408
Making all of our integration tests run entirely in parallel requires
a large engineering effort to enforce test isolation and remove all race
conditions between tests.
A lower-effort alternative may be to split apart the various test suites
into their own Go packages, and test those packages in parallel, even if
the tests inside are still executed serially. Auditing the test suites
for races on system-level resources (e.g. files, ports) is much easier
than chasing down every possible race in the testing system.
This patch acts as a trial run, breaking a fairly well-defined and
self-contained test suite out into its own package. Note that the goal of
this change is not necessarily to shave minutes off the build (although
that would be nice), but to act as an illustration of how other, less
well-formed test suites might be broken apart.
See-Also: #12421
See-Also: #14408
Primary Changes:
- Remove reliance on Private Key PEM:
- Update native and keygen packages to return PrivateKey instead of PEM key
- Add new PrivateKey interface which implements crypto.Signer
- Replace PEM encoded private key usage where possible
- Replace calls to tls.(Load)X509KeyPair with keys.(Load)X509KeyPair in
client packages
Minor Changes:
- Remove unused agent.AddedKey return from LoadKey
- Simplify sshutils and removed unused code paths
- Add ecdsa and ed25519 key support
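As a rough illustration of a private key type built on `crypto.Signer` rather than PEM bytes, consider the sketch below. The `privateKey` type and its helpers are hypothetical and much smaller than the real `keys.PrivateKey`; Ed25519 is used only because it keeps the example short.

```go
package main

import (
	"crypto"
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// privateKey is a hypothetical wrapper. Embedding crypto.Signer lets callers
// use Public and Sign without ever handling PEM-encoded key material.
type privateKey struct {
	crypto.Signer
}

// generate creates a new Ed25519-backed privateKey.
func generate() (*privateKey, error) {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	return &privateKey{Signer: priv}, nil
}

// signAndVerify signs msg through the crypto.Signer interface and checks the
// signature against the key's public half.
func signAndVerify(msg []byte) (bool, error) {
	key, err := generate()
	if err != nil {
		return false, err
	}
	// Ed25519 signs the message directly, so no pre-hash is selected.
	sig, err := key.Sign(rand.Reader, msg, crypto.Hash(0))
	if err != nil {
		return false, err
	}
	pub, ok := key.Public().(ed25519.PublicKey)
	if !ok {
		return false, fmt.Errorf("unexpected public key type %T", key.Public())
	}
	return ed25519.Verify(pub, msg, sig), nil
}

func main() {
	ok, err := signAndVerify([]byte("hello"))
	fmt.Println(ok, err)
}
```

Because the wrapper only needs a `crypto.Signer`, a hardware-backed signer could be slotted in without changing callers.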
* transport: Rewrite headers, including JWTs, for websockets.
Applications can otherwise 401 on websocket requests, as they do not
present any authentication headers.
docs: Fix the reserved JWT header name.
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
* Add test for JWT header in websocket apps
Signed-off-by: Roman Tkachenko <roman@goteleport.com>
Co-authored-by: Alex Vandiver <alex@chmrr.net>
Adds a wrapper around `ssh.Session` which injects tracing context
in a similar manner to the `ssh.Client` wrapper. All usages of
`ssh.Session` have now been replaced and have the appropriate
`context.Context` passed along.
Part of #12241
## What
First part of the Kubernetes [Discovery RFD](https://github.com/gravitational/teleport/pull/13376/) to introduce a Kubernetes server per cluster.
This PR introduces a separate Kubernetes server that uses the already introduced `KubernetesClusterV3`.
## Compatibility
In previous versions, Kubernetes Clusters were part of the regular `ServerV2` resource. This refactoring deprecates that `ServerV2` usage but keeps it for compatibility with previous versions.
Everything is backward compatible, so v10 kubernetes agents and trusted clusters can connect fine.
## Next steps
Once this is merged, a new PR will introduce dynamic registration for Kubernetes Clusters discovered through EKS Discovery.
* Embed auth.Cache in auth.Server
* Hit the backend during Auth initialization
* Bypass the cache when rotating CAs
* Services.UpsertTrustedCluster is different
* Bypass the cache in waitForTunnelConnections
* Fix infinite recursion
* More cache bypassing during init and rotations
* Rename Services to Uncached in auth.Server
* Further cleanups
* Don't start the auth cache immediately
* Go back to Services rather than Uncached
* Comments and a missing method
* Add context.Context to session.Service interface
Updates GetSessions, GetSession, CreateSession, UpdateSession, and
DeleteSession to take a context.Context. All call paths are updated
to properly pass along a real context in order
to eliminate context.TODOs.
This commit adds the Teleport operator. The operator reconciles
TeleportUsers and TeleportRoles Kubernetes resources with Users and
Roles Teleport resources.
When desktop access is enabled, the TeleportReady event will not
be emitted until the WindowsDesktopReadyEvent is emitted, and it
turns out we have *never* emitted a WindowsDesktopReadyEvent.
This is likely due to desktop access being copied from kube access
since the very beginning. The same issue was recently fixed for
kube access in #9418.
Ports used by the unit tests have been allocated by pulling them out of a list, with no guarantee that the port is not actually in use. This central allocation point also means that tests cannot be split into separate packages to be run in parallel, as the ports allocated between the various packages will be allocated multiple times and end up intermittently clashing.
There is also no guarantee, even when the tests are run serially, that the ports will not clash with services already running on the machine.
This patch (largely) replaces the use of this centralised port allocation with pre-created listeners injected into the test via the file descriptor import mechanism used by Teleport to pass open ports to child processes.
There are still some cases where the old port allocation system is still in use. I felt this was already getting beyond the bounds of sensibly reviewable, so I have left those for a further PR after this.
See-Also: #12421
See-Also: #14408
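The core of the pre-created listener approach can be sketched as follows. The helper names are hypothetical, and the sketch covers only listener creation, not Teleport's file descriptor import mechanism. Binding to port 0 lets the kernel pick a free port, and holding the listener open reserves that port for the test.

```go
package main

import (
	"fmt"
	"net"
)

// newListener asks the kernel for a free loopback port by binding to port 0.
func newListener() (net.Listener, error) {
	return net.Listen("tcp", "127.0.0.1:0")
}

// distinctPorts opens n listeners, keeps them all open until it returns, and
// reports whether every listener received a different port.
func distinctPorts(n int) (bool, error) {
	seen := make(map[int]bool)
	for i := 0; i < n; i++ {
		l, err := newListener()
		if err != nil {
			return false, err
		}
		// Deferred on purpose: every listener stays open for the whole call.
		defer l.Close()
		port := l.Addr().(*net.TCPAddr).Port
		if seen[port] {
			return false, nil
		}
		seen[port] = true
	}
	return true, nil
}

func main() {
	ok, err := distinctPorts(5)
	fmt.Println(ok, err)
}
```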
Fixes an issue where the agentpool backoff channel would be redefined
each time an event was received while waiting for the backoff to complete.
This could lead to a longer backoff period than expected.
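The class of bug can be illustrated with a small sketch (hypothetical names, not the agentpool code). The backoff channel is created once, before the loop. Calling `time.After` inside the `select` instead would start a fresh timer every time an event arrived.

```go
package main

import (
	"fmt"
	"time"
)

// waitForBackoff drains events until the backoff elapses and returns how long
// it waited. The backoff channel is created once, outside the loop.
func waitForBackoff(backoff time.Duration, events <-chan struct{}) time.Duration {
	start := time.Now()
	done := time.After(backoff) // created once, not per iteration
	for {
		select {
		case <-events:
			// Handle the event; the pending backoff is left untouched.
		case <-done:
			return time.Since(start)
		}
	}
}

// backoffCompletesOnTime sends an event every 10ms while waiting on a 50ms
// backoff and reports whether the wait ended in a reasonable time.
func backoffCompletesOnTime() bool {
	events := make(chan struct{})
	stop := make(chan struct{})
	defer close(stop)
	go func() {
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				select {
				case events <- struct{}{}:
				case <-stop:
					return
				}
			case <-stop:
				return
			}
		}
	}()
	elapsed := waitForBackoff(50*time.Millisecond, events)
	return elapsed >= 50*time.Millisecond && elapsed < 2*time.Second
}

func main() {
	fmt.Println("backoff completed on time:", backoffCompletesOnTime())
}
```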
Waits for each resource to connect individually by splitting up the test into
multiple runs executed in parallel.
* configure golangci-lint misspell to check for anglicized spellings
* Americanize spellings
* fix aws constant value with british spelling 🇬🇧
* update api types with americanized spellings
* use american spellings .cloudbuild/scripts
* Start postgres without TLS when multiplexing is disabled
* Add integration test for starting postgres with --insecure-no-tls
* Fix dupe postgres listener mistake
* Log the actual address of listeners
* Remove unnecessary error checking
As a prelude to breaking individual integration test suites out into
their own packages (in order to make them more amenable to running
in parallel), this patch extracts the common test fixtures and places
them in a common `helpers` package.
This will allow the integration test package to share common
infrastructure and vocabulary once they are split out.
Create spans for all public facing TeleportClient,
ProxyClient, and NodeClient methods. This makes
correlating spans easier to reason about when
looking at `tsh` traces. As a result of creating
spans, some additional context propagation is
required as well to ensure that spans are linked
properly.
This also removes the unused `quiet` argument from
`ConnectToCluster`. Its usage was not consistent
across existing callers, and it was ignored, so in order
to avoid confusion in future calls, it was removed.
#12241
This change adds IP-based validation for SSH certificates.
There's a new option in the role definition:
kind: role
metadata:
  name: dev
spec:
  options:
    pin_source_ip: true
When this is set to true, the client IP must be the same when generating certificates and when using them. It uses the `source-address` critical option, which should be supported by both teleport and sshd, and only applies to certificates we send to the user (like in tsh login). We don't pin the IP in certificates issued for the web UI, as they can't leak.
This change does not cover Machine ID, which uses a different code path; that will be added in a separate PR.
Most of the changed lines come from regenerating types.proto; the change itself is not that big.
Relates #11719
This change adds the --all/-R flag to tsh ls, tsh apps ls, tsh db ls, and tsh kube ls, which lets tsh list resources from across all clusters and logged in proxies.
* Add CSRF mitigations
This commit includes two fixes:
1. Enforce an application/json Content-Type server-side.
2. When checking the bearer token, verify that the user
associated with the token matches the user associated
with the cookie.
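The first fix can be sketched as an HTTP middleware. This is an illustration only: `requireJSON` and the 415 status code are assumptions, not the actual Teleport handler. The check helps because an HTML form cannot submit `application/json` without triggering a CORS preflight.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireJSON rejects requests whose Content-Type is not application/json.
// Parameters such as "; charset=utf-8" are accepted.
func requireJSON(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mediaType, _, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
		if err != nil || mediaType != "application/json" {
			http.Error(w, "unsupported content type", http.StatusUnsupportedMediaType)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends a POST with the given Content-Type through the middleware
// and returns the resulting status code.
func statusFor(contentType string) int {
	h := requireJSON(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader("{}"))
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("application/json"), statusFor("text/plain"))
}
```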
* Fix TEL-Q122-13: Access Requests Denial Of Service Via Request Reason (#125) (#127)
* Ignore input when data flow is off in TermManager
When data flow is disabled in TermManager (at the beginning or when TermManager.Off was called) we should ignore all input we receive (currently we buffer it).
* Agent forwarding socket security fix.
Co-authored-by: Lisa Kim <lisa@goteleport.com>
Co-authored-by: Joel <jwejdenstal@icloud.com>
Co-authored-by: Przemko Robakowski <przemko@przemko-robakowski.pl>
* Add support for automatic user provisioning
* Add UID parker to reexec
* Add a `teleport park` subcommand that does nothing
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
After the merge of https://github.com/gravitational/teleport/pull/12674 we no longer use the following configuration:
```yaml
teleport:
ca_signature_algo: "rsa-sha2-512"
```
We now rely on the `x/crypto` package to choose the signing algorithm (it defaults to `rsa-sha2-512`).
**Demo**
If we set `ca_signature_algo` (the value is irrelevant) and start `teleport` we get:
```shell
root@marco:/workspace# teleport start --debug
2022-06-02T09:33:58Z WARN ca_signing_algo config option is deprecated and will be ignored, we'll always default to rsa-sha2-512. config/configuration.go:348
2022-06-02T09:33:58Z INFO Generating new host UUID: b001159a-10e0-49a7-b4dc-61c73fbe9e42. service/service.go:726
...
```
Fixes #12905
* Add client side circuit breaker to auth clients
In order to apply back pressure we can utilize a circuit breaker that
monitors error responses from auth server. When tripped it will prevent
all outbound requests to auth for a period of time. This can also help
prevent a potential thundering herd when auth is in an unhealthy state.
By default the circuit breaker will only be tripped if 90% of the
requests made in the monitoring interval fail.
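A simplified sketch of the idea follows. The `breaker` type and its fields are hypothetical, it is not safe for concurrent use, and it evaluates the failure ratio over a call count rather than a real monitoring interval.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTripped = errors.New("circuit breaker tripped")

// breaker is a hypothetical, single-goroutine circuit breaker.
type breaker struct {
	threshold float64          // failure ratio that trips the breaker
	minCalls  int              // calls required before the ratio is evaluated
	cooldown  time.Duration    // how long requests are rejected once tripped
	now       func() time.Time // injectable clock

	calls, failures int
	trippedUntil    time.Time
}

// do runs fn unless the breaker is tripped, and records the outcome.
func (b *breaker) do(fn func() error) error {
	if b.now().Before(b.trippedUntil) {
		return errTripped // reject without contacting the server
	}
	err := fn()
	b.calls++
	if err != nil {
		b.failures++
	}
	if b.calls >= b.minCalls && float64(b.failures)/float64(b.calls) >= b.threshold {
		b.trippedUntil = b.now().Add(b.cooldown)
		b.calls, b.failures = 0, 0
	}
	return err
}

// rejectsAfter reports whether the next call is rejected after the given
// numbers of successful and failed calls.
func rejectsAfter(successes, failures int) bool {
	fixed := time.Unix(0, 0)
	b := &breaker{
		threshold: 0.9,
		minCalls:  10,
		cooldown:  time.Minute,
		now:       func() time.Time { return fixed },
	}
	for i := 0; i < successes; i++ {
		_ = b.do(func() error { return nil })
	}
	for i := 0; i < failures; i++ {
		_ = b.do(func() error { return errors.New("auth unavailable") })
	}
	return errors.Is(b.do(func() error { return nil }), errTripped)
}

func main() {
	fmt.Println(rejectsAfter(0, 10), rejectsAfter(5, 5))
}
```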
The instance metadata client added in #12593 significantly slows down integration tests. This change adds a disabled client to integration tests to improve performance.
This adds proxy peering support, a configurable setting that allows agents
to connect to a subset of proxies and be reachable through any proxy in the
cluster. This is achieved by creating gRPC connections between each proxy
server. Client connections can then be passed between proxies to the desired
agent.
This change fixes a bug in EC2 labels (#12593) involving concurrent writes to the labels map. This is fixed by making EC2.Get() return a copy instead of the actual label map.
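A minimal sketch of returning a copy instead of the internal map is shown below. The `labels` type is a hypothetical stand-in for the EC2 label holder, and the mutex is an assumption about how the internal map is guarded.

```go
package main

import (
	"fmt"
	"sync"
)

// labels is a hypothetical stand-in for the EC2 label holder.
type labels struct {
	mu sync.RWMutex
	m  map[string]string
}

// Get returns a copy, so callers never share the internal map with writers.
func (l *labels) Get() map[string]string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	out := make(map[string]string, len(l.m))
	for k, v := range l.m {
		out[k] = v
	}
	return out
}

// callerMutationLeaks reports whether mutating the returned map changes the
// holder's internal state.
func callerMutationLeaks() bool {
	l := &labels{m: map[string]string{"env": "prod"}}
	got := l.Get()
	got["env"] = "dev"
	return l.Get()["env"] != "prod"
}

func main() {
	fmt.Println("mutation leaks:", callerMutationLeaks())
}
```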
When the client connects to teleport with invalid credentials (eg
expired ones) it will retry multiple times until the context deadline is
reached.
When it happens, we receive the generic error: context deadline
exceeded.
However, we can ask for the latest connection error, one which will give
us more information on why it happened.
To ask for this extra error we need to add the following
grpc.DialOption: grpc.WithReturnConnectionError()
After doing this, we will get the errors that happened when trying to
connect to the grpc Server.
This should help us debug possible connection problems.
We had to slightly refactor the way we handle the parallel
connection attempts to receive all the connection errors from the
multiple flows.
This commit upgrades the version of x/crypto we use to the current latest:
`go get -u golang.org/x/crypto`
We also replaced the deprecated variables and updated the tests to match the
current default KEX algorithms.
x/crypto previously didn't support RSA-SHA2 algorithms, so we developed our own
algorithm signer. This is no longer the case, and after upgrading x/crypto to 20220518 we
can safely remove the custom code we have.
With OpenSSH 8.8+, it works if we explicitly add the older algorithm.
Something like this: `./ssh -vvv -oPubkeyAcceptedAlgorithms=+ssh-rsa-cert-v01@openssh.com teleportadmin@moon.marco.mydemo`
* Add tracing instrumentation for ssh clients/servers
Add tracing context to the existing ProxyHelloSignature to provide
span information across ssh connections. To add span context per
ssh session on top of new connections, the same tracing context is
passed in the first global request of the session.
In order to ensure that tracing context is pulled from and inserted
into the proper context.Context, some interfaces and methods were
changed to take one as the first argument.
* run HSM tests in parallel
* add missing punctuation to commit
Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
* Improve CertAuthorityWatcher
CertAuthorityWatcher and its usage are refactored to allow for
all the following:
- eliminate retransmission of the same CAs
- reduce memory usage by having one local watcher per proxy
- adds the ability to filter only the CAs that are desired
- reduce the time required to send the first CAs
watchCertAuthorities now compares all CAs it receives from the
watcher with the previous CA of the same type and only sends to
the remote site if they are not identical. This is to reduce
unnecessary network traffic which can be problematic for a
root cluster with a larger number of leafs.
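The comparison step might look roughly like the sketch below. The `sender` type is hypothetical, and CAs are reduced to byte slices keyed by type purely for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// sender remembers the last CA sent per type and skips identical resends.
type sender struct {
	last map[string][]byte
	sent int
}

// maybeSend "sends" ca only when it differs from the previous CA of the same
// type, and reports whether a send happened.
func (s *sender) maybeSend(caType string, ca []byte) bool {
	if prev, ok := s.last[caType]; ok && bytes.Equal(prev, ca) {
		return false // identical to what the remote site already has
	}
	s.last[caType] = append([]byte(nil), ca...) // keep a private copy
	s.sent++
	return true
}

// sendsFor feeds a sequence of host CA updates through a fresh sender and
// returns how many were actually sent.
func sendsFor(updates ...string) int {
	s := &sender{last: make(map[string][]byte)}
	for _, u := range updates {
		s.maybeSend("host", []byte(u))
	}
	return s.sent
}

func main() {
	fmt.Println(sendsFor("v1", "v1", "v2"))
}
```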
The CertAuthorityWatcher is refactored to leverage a fanout
to emit events to any number of watchers, each subscription
can be for a subset of the configured CA types. The proxy
now has only one CertAuthorityWatcher that is passed around
similarly to the LockWatcher. This reduces the memory usage
for proxies, which prior to this had one local CAWatcher per
remote site.
updateCertAuthorities no longer waits on the utils.Retry it
is provided with before starting to watch CAs. By doing this
the proxy no longer has to wait ~8 minutes before it even
starts to watch CAs.
* Update golangci-lint
To accommodate the recent Go 1.18 upgrade
* Fix new lint warnings as a result of linter upgrade
* Set golangci-lint to Go 1.18 mode
golangci-lint will automatically skip linters that don't have support
for Go 1.18.
See: https://github.com/golangci/golangci-lint/issues/2649
* Remove unused backend wrapper from Cache
* Remove double printShutdownStatus
* Fix readyz race condition
* Test coverage for the readyz.monitor fix
* Close listeners immediately in proxy.shutdown
* Use and handle net.ErrClosed correctly
This adapts utils.IsUseOfClosedNetworkError to check for net.ErrClosed
even inside trace.Aggregate errors, makes it so that we always return
something that would pass errors.Is(err, net.ErrClosed) when returning
from a (net.Listener).Accept(), and handles closed listeners within our
various Serve() loops so that we don't hit spurious backoff waits while
shutting down.
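A stripped-down Serve loop that treats a closed listener as a normal shutdown could look like this. The `serve` function is a hypothetical sketch, not one of Teleport's Serve() loops, and it ignores `trace.Aggregate` unwrapping.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// serve accepts connections until the listener is closed. A closed listener
// means shutdown, so it returns nil instead of backing off and retrying.
func serve(l net.Listener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			if errors.Is(err, net.ErrClosed) {
				return nil
			}
			return err
		}
		conn.Close()
	}
}

// shutdownIsClean starts serve, closes the listener, and reports whether
// serve returned without an error.
func shutdownIsClean() (bool, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	done := make(chan error, 1)
	go func() { done <- serve(l) }()
	if err := l.Close(); err != nil {
		return false, err
	}
	return <-done == nil, nil
}

func main() {
	ok, err := shutdownIsClean()
	fmt.Println(ok, err)
}
```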
* Close listeners early and emitters late
* Test coverage for the proxy listener changes
* Revert some errors back to trace.ConnectionProblem
* Reduce PR scope to just the proxy, add comments
* Improve error logging.
Teleport will now try to extract the MySQL server version from the initial handshake packet instead of sending `8.0.0-Teleport` every time. This string can be overridden by the new configuration option `mysql.server_version`. On DB service start, Teleport will also try to fetch the current version from the MySQL/MariaDB instance. After that, the server version will be updated on every successful connection to keep it up to date.
Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>
* Revert "Make `PortList.Pop()` thread-safe (#11799)"
This reverts commit a17337d1a1.
* Revert "Ensure stateOK is reported only when all components have sent updates (#11249)"
This reverts commit b749302e2c.
* Revert "Throw startup error if `TeleportReadyEvent` is not emitted (#11725)"
This reverts commit 933e247287.
* Revert "Fix ProxyKube not reporting its readiness (#12150)"
This reverts commit 6cdcfe7721.
* Speed up TestAppServersHA
Allow test cases to be run in parallel and allow app servers to
be spawned in parallel to reduce test time from ~99s to ~20s.