Move cache and resourceWatcher watchers from a 10s retry to a jittered backoff retry up to ~1min. Replace the
reconnectToAuthService interval with a retry to add jitter and backoff there as well for when a node restarts due to
changes introduced in #8102.
Fixes #6889.
Some integration tests modify global "constants" to speed up test
execution (e.g. shortening polling intervals). This is occasionally
tripping the Go data race detector, so I have added explicit
serialisation to reading and writing these global settings.
These values are only ever changed in a test environment, and there
should be zero contention for them in a non-test environment.
This commit contains 2 changes:
1. Rename GenerateServerKeys to GenerateHostCerts.
This is a more accurate name and consistent with the existing
GenerateUserCerts endpoint.
2. Change the request type to include a single role, rather than a
list of roles. We only ever allowed a single role in the list
anyway, so this change will prevent future misuse of the API.
Note: a side effect of this change is we now have two similar endpoints:
- GenerateHostCert: old API that generates SSH cert only
- GenerateHostCerts: a newer API that generates SSH and TLS certs
To avoid making this change too big, we'll aim to deprecate
GenerateHostCert in the future.
Teleport fails to start if a k8s cluster referenced by the kubeconfig in
a `kubernetes_service` configuration is unavailable. This means
that a single missing cluster can disrupt _all_ of the configured
clusters, even if the others are online.
This change makes failing the cluster credential enumeration a
per-k8s-cluster warning, rather than a stop-the-world error.
It also expands the testing shims inside the k8s proxy to allow more
sophisticated mocked scenarios, in order to test the above.
See-Also: #7215
* hsm: migrate CA storage schema
Migrate types.CertAuthorityV2 schema according to
https://github.com/gravitational/teleport/blob/master/rfd/0025-hsm.md#backend-storage
Includes proto changes, types.CertAuthority wrapper changes and data
migration.
Note that we keep and update the old fields for backwards-compatibility.
If a cluster is upgraded to v7 and then downgraded back to v6,
everything should keep working.
* Address review feedback
Prior to this change, `tsh` would only ever forward the internal key
agent managed by `tsh` to a remote machine.
This change allows a user to specify that `tsh` should forward either
the `tsh`-internal keystore, or the system key agent at `$SSH_AUTH_SOCK`.
This change also brings the `-A` command-line option into line with
OpenSSH.
For more info refer to RFD-0022.
See-Also: #1571
```diff
~/.tsh/
└── keys
├── one.example.com --> Proxy hostname
│ ├── certs.pem --> TLS CA certs for the Teleport CA
│ ├── foo --> RSA Private Key for user "foo"
│ ├── foo.pub --> Public Key
- │ ├── foo-cert.pub --> SSH certificate for proxies and nodes
│ ├── foo-x509.pem --> TLS client certificate for Auth Server
+ │ ├── foo-ssh --> SSH certs for user "foo"
+ │ │ ├── root-cert.pub --> SSH cert for Teleport cluster "root"
+ │ │ └── leaf-cert.pub --> SSH cert for Teleport cluster "leaf"
```
When `-J` is provided, this also loads/reissues the SSH cert for the cluster associated with the jumphost's certificate. Fixes #5637.
* auth: API for requesting per-connection certificates
See https://github.com/gravitational/teleport/blob/master/rfd/0014-session-2FA.md#api
This API is a wrapper around GenerateUserCerts with a few differences:
- performs an MFA check before generating a cert
- enforces a single usage (ssh/k8s/db for now)
- embeds client IP in the cert
- marks a cert to distinguish from regular user certs
- enforces a 1min TTL
* Apply suggestions from code review
Co-authored-by: a-palchikov <deemok@gmail.com>
* Update logrus package to fix data races
* Introduce a logger that uses the test context to log messages, so they are output if a test fails, for easier troubleshooting.
* Revert introduction of test logger - simply leave logger configuration at debug level outputting to stderr during tests.
* Run integration test for e as well
* Use make with a cap and append to only copy the relevant roles.
* Address review comments
* Update integration test suite to use test-local logger that would only output logs iff a specific test has failed - no logs from other test cases will be output.
* Revert changes to InitLoggerForTests API
* Create a new logger instance when applying defaults or merging with file service configuration
* Introduce a local logger interface to be able to test file configuration merge.
* Fix kube integration tests w.r.t log
* Move goroutine profile dump into a separate func to handle parameters consistently for all invocations
Added support for an identity aware, RBAC enforcing, mutually
authenticated, web application proxy to Teleport.
* Updated services.Server to support application servers.
* Updated services.WebSession to support application sessions.
* Added CRUD RPCs for "AppServers".
* Added CRUD RPCs for "AppSessions".
* Added RBAC support using labels for applications.
* Added JWT signer as a services.CertAuthority type.
* Added support for signing and verifying JWT tokens.
* Refactored dynamic label and heartbeat code into standalone packages.
* Added application support to web proxies and new "app_service" to
proxy mutually authenticated connections from proxy to an internal
application.
This commit introduces a gRPC API for streaming sessions.
It adds structured events and sync streaming
that avoids storing events on disk.
The design is described in the rfd/0002-streaming.md RFD.
* Split remote cluster watching from reversetunnel.AgentPool
Separating the responsibilities:
- AgentPool takes a proxy (or LB) endpoint and manages a pool of agents
for it (each agent is a tunnel to a unique proxy process behind the
endpoint)
- RemoteClusterTunnelManager polls the auth server for a list of trusted
clusters and manages a set of AgentPools, one for each trusted cluster
Previously, AgentPool did both of the above.
Also, bundling some cleanup in the area:
- better error when dialing through tunnel and directly both fail
- rename RemoteKubeProxy to LocalKubernetes to better reflect the
meaning
- remove some dead code and simplify config structs
* reversetunnel: factor out track.Key
ClusterName is the same for all Agents in an AgentPool. track.Tracker
needs to only track proxy addresses.
This allows users to manually switch to a different algorithm by:
- setting the config file field
- running "tctl auth rotate"
If config file field is not set, existing signing algorithm of the CA is
preserved.
Store the signing algorithm alongside the CA private key. When reading old
CAs that don't have it set, default to the UNKNOWN proto enum value, which
corresponds to the old SHA1-based signing alg.
The only time you get a SHA2 signature is when creating a fresh cluster
and generating a new CA. This can be disabled in the config.
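A rough sketch of the defaulting rules described above. The enum and function names are hypothetical and do not match the generated proto code; the only facts taken from the description are that the zero value (UNKNOWN) means the old SHA1-based algorithm and that rotation keeps the existing algorithm unless the config field is set.

```go
package main

import "fmt"

// SigningAlg is a hypothetical stand-in for the proto enum.
type SigningAlg int

const (
	AlgUnknown SigningAlg = iota // zero value: what CAs stored before this change decode to
	AlgSHA1
	AlgSHA256
	AlgSHA512
)

// effectiveAlg maps the stored value to the algorithm actually used:
// UNKNOWN is treated as the old SHA1-based algorithm.
func effectiveAlg(stored SigningAlg) SigningAlg {
	if stored == AlgUnknown {
		return AlgSHA1
	}
	return stored
}

// algAfterRotation picks the algorithm recorded for a rotated CA: the
// configured value if the config field is set, otherwise the existing one.
func algAfterRotation(existing SigningAlg, configured *SigningAlg) SigningAlg {
	if configured != nil {
		return *configured
	}
	return existing
}

func main() {
	var old SigningAlg // an old CA with no algorithm stored
	fmt.Println(effectiveAlg(old) == AlgSHA1)

	want := AlgSHA512
	fmt.Println(algAfterRotation(old, &want) == AlgSHA512)
	fmt.Println(algAfterRotation(old, nil) == AlgUnknown)
}
```

Making UNKNOWN the zero value means existing CAs need no migration: they decode to UNKNOWN and keep signing the way they always did.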
Our auth middleware already attaches a TLS identity as context value.
Plumb contexts through and extract the username when recording events.
If the received context doesn't have an identity attached, use "system"
as username.
Lots of noise here due to missing context.Context plumbing :(
We should eventually plumb contexts to all those RPC interfaces.
Updates #3816
Changes agent channel setup behavior to be consistent with
OpenSSH by having servers lazily request agent channels
when they are needed, rather than immediately starting a
single connection-wide channel as soon as forwarding is
requested. Fixes an issue introduced in #3613 which
caused OpenSSH clients to hang on exit due to a persistent
agent channel.
List of fixed items:
```
integration/helpers.go:1279:2 gosimple S1000: should use for range instead of for { select {} }
integration/integration_test.go:144:5 gosimple S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:173:5 gosimple S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:296:28 gosimple S1019: should use make(chan error) instead
integration/integration_test.go:570:41 gosimple S1019: should use make(chan interface{}) instead
integration/integration_test.go:685:40 gosimple S1019: should use make(chan interface{}) instead
integration/integration_test.go:759:33 gosimple S1019: should use make(chan string) instead
lib/auth/init_test.go:62:2 gosimple S1021: should merge variable declaration with assignment on next line
lib/auth/tls_test.go:1658:22 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/backend/dynamo/dynamodbbk.go:420:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/dynamo/dynamodbbk.go:656:12 gosimple S1039: unnecessary use of fmt.Sprintf
lib/backend/etcdbk/etcd.go:458:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/firestore/firestorebk.go:407:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:317:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:336:6 gosimple S1004: should use !bytes.Equal(value, expected.Value) instead
lib/backend/memory/memory.go:365:5 gosimple S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/memory/memory.go:376:5 gosimple S1004: should use !bytes.Equal(existingItem.Value, expected.Value) instead
lib/backend/test/suite.go:327:10 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/client/api.go:1410:9 gosimple S1003: should use strings.ContainsRune(name, ':') instead
lib/client/api.go:2355:32 gosimple S1019: should use make([]ForwardedPort, len(spec)) instead
lib/client/keyagent_test.go:85:2 gosimple S1021: should merge variable declaration with assignment on next line
lib/client/player.go:54:33 gosimple S1019: should use make(chan int) instead
lib/config/configuration.go:1024:52 gosimple S1019: should use make(services.CommandLabels) instead
lib/config/configuration.go:1025:44 gosimple S1019: should use make(map[string]string) instead
lib/config/configuration.go:930:21 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleNode) instead
lib/config/configuration.go:931:22 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleAuthService) instead
lib/config/configuration.go:932:23 gosimple S1003: should use strings.Contains(clf.Roles, defaults.RoleProxy) instead
lib/service/supervisor.go:387:2 gosimple S1001: should use copy() instead of a loop
lib/tlsca/parsegen.go:140:9 gosimple S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:140:9 gosimple S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:167:40 gosimple S1010: should omit second index in slice, s[a:len(s)] is identical to s[a:]
lib/utils/certs.go:204:5 gosimple S1004: should use !bytes.Equal(certificateChain[0].SubjectKeyId, certificateChain[0].AuthorityKeyId) instead
lib/utils/parse/parse.go:116:45 gosimple S1003: should use strings.Contains(variable, "}}") instead
lib/utils/parse/parse.go:116:6 gosimple S1003: should use strings.Contains(variable, "{{") instead
lib/utils/socks/socks.go:192:10 gosimple S1025: should use String() instead of fmt.Sprintf
lib/utils/socks/socks.go:199:10 gosimple S1025: should use String() instead of fmt.Sprintf
lib/web/apiserver.go:1054:18 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
lib/web/apiserver.go:1954:9 gosimple S1039: unnecessary use of fmt.Sprintf
tool/tsh/tsh.go:1193:14 gosimple S1024: should use time.Until instead of t.Sub(time.Now())
```
Original finding list:
```
tool/tctl/common/node_command.go:163:3: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(string(out))
^
tool/tctl/common/status_command.go:110:2: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(view())
^
tool/tctl/common/status_command.go:126:3: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(view())
^
tool/tctl/common/token_command.go:201:3: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(tokensView())
^
tool/tctl/common/token_command.go:207:3: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(string(data))
^
tool/tctl/common/user_command.go:248:2: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(string(out))
^
tool/tctl/common/user_command.go:294:3: SA1006: printf-style function with dynamic format string and no further arguments should use print-style function instead (staticcheck)
fmt.Printf(string(out))
^
integration/helpers.go:200:2: SA4006: this value of `err` is never used (staticcheck)
cryptoPubKey, err := sshutils.CryptoPublicKey(cfg.Pub)
^
integration/helpers.go:399:3: SA4006: this value of `roles` is never used (staticcheck)
roles = append(roles, role)
^
integration/helpers.go:597:4: SA4006: this value of `roles` is never used (staticcheck)
roles = append(roles, role)
^
integration/helpers.go:599:4: SA4006: this value of `roles` is never used (staticcheck)
roles = user.Roles
^
integration/integration_test.go:1625:2: SA4006: this value of `err` is never used (staticcheck)
adminsRole, err := services.NewRole(mainAdmins, services.RoleSpecV3{
^
integration/integration_test.go:2185:2: SA4006: this value of `output` is never used (staticcheck)
output, err = runCommand(main, []string{"echo", "hello world"}, cfg, 1)
^
integration/integration_test.go:2340:2: SA4006: this value of `output` is never used (staticcheck)
output, err = runCommand(main, []string{"echo", "hello world"}, cfgProxy, 1)
^
integration/kube_integration_test.go:154:2: SA4006: this value of `err` is never used (staticcheck)
role, err := services.NewRole("kubemaster", services.RoleSpecV3{
^
integration/kube_integration_test.go:321:2: SA4006: this value of `err` is never used (staticcheck)
role, err := services.NewRole("kubemaster", services.RoleSpecV3{
^
integration/kube_integration_test.go:366:2: SA4006: this value of `err` is never used (staticcheck)
role, err := services.NewRole("kubemaster", services.RoleSpecV3{
^
integration/kube_integration_test.go:386:2: SA4006: this value of `err` is never used (staticcheck)
pods, err := s.CoreV1().Pods(kubeSystemNamespace).List(metav1.ListOptions{
^
integration/kube_integration_test.go:465:2: SA4006: this value of `err` is never used (staticcheck)
mainRole, err := services.NewRole("main-kube", services.RoleSpecV3{
^
integration/kube_integration_test.go:579:2: SA4006: this value of `err` is never used (staticcheck)
pods, err := proxyClient.CoreV1().Pods(kubeSystemNamespace).List(metav1.ListOptions{
^
integration/kube_integration_test.go:727:2: SA4006: this value of `err` is never used (staticcheck)
mainRole, err := services.NewRole("main-kube", services.RoleSpecV3{
^
integration/kube_integration_test.go:840:2: SA4006: this value of `err` is never used (staticcheck)
pods, err := proxyClient.CoreV1().Pods(kubeSystemNamespace).List(metav1.ListOptions{
^
integration/kube_integration_test.go:1008:2: SA4006: this value of `err` is never used (staticcheck)
role, err := services.NewRole("kubemaster", services.RoleSpecV3{
^
tool/teleport/common/teleport_test.go:83:2: SA4006: this value of `cmd` is never used (staticcheck)
cmd, conf := Run(Options{
^
tool/teleport/common/teleport_test.go:91:2: SA4006: this value of `cmd` is never used (staticcheck)
cmd, conf = Run(Options{
^
tool/tsh/tsh.go:170:2: SA4006: this value of `cmdLine` is never used (staticcheck)
cmdLine := []string{}
^
integration/helpers.go:399:11: SA4010: this result of append is never used, except maybe in other appends (staticcheck)
roles = append(roles, role)
^
integration/helpers.go:597:12: SA4010: this result of append is never used, except maybe in other appends (staticcheck)
roles = append(roles, role)
^
integration/integration_test.go:1092:7: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(b.Tunnel.GetSites()) < 2 && len(b.Tunnel.GetSites()) < 2 {
^
integration/integration_test.go:1426:6: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(main.Tunnel.GetSites()) < 2 && len(main.Tunnel.GetSites()) < 2 {
^
integration/integration_test.go:1691:6: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(main.Tunnel.GetSites()) < 2 && len(main.Tunnel.GetSites()) < 2 {
^
integration/integration_test.go:1895:6: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(main.Tunnel.GetSites()) < 2 && len(main.Tunnel.GetSites()) < 2 {
^
integration/kube_integration_test.go:548:6: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(main.Tunnel.GetSites()) < 2 && len(main.Tunnel.GetSites()) < 2 {
^
integration/kube_integration_test.go:814:6: SA4000: identical expressions on the left and right side of the '&&' operator (staticcheck)
for len(main.Tunnel.GetSites()) < 2 && len(main.Tunnel.GetSites()) < 2 {
^
```
Changes the lifetime of agent forwarding to be scoped
to the underlying ssh connection, instead of the
specific ssh channel which initially passed the agent
forwarding request.
TeleInstance manages an auth server and a set of proxies/nodes.
TeleInstance.Stop only stops the auth server. A bunch of tests used it
assuming it also cleans up any running nodes.
This has caused a lot of log spam from failing heartbeats and generally
wasted CPU cycles.
Rename Stop to StopAuth to make its purpose more obvious. Add
TeleInstance.StopAll that cleans up everything, suitable for deferring
in tests.
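A toy model of the difference between the two methods, assuming a much simpler instance type than the real TeleInstance (all types and fields here are hypothetical):

```go
package main

import "fmt"

// process is a stand-in for a running auth server, proxy or node.
type process struct {
	name    string
	running bool
}

func (p *process) stop() { p.running = false }

// instance is a simplified stand-in for the test helper.
type instance struct {
	auth  *process
	nodes []*process
}

// StopAuth stops only the auth server; nodes keep running.
func (i *instance) StopAuth() { i.auth.stop() }

// StopAll stops every node and then the auth server.
func (i *instance) StopAll() {
	for _, n := range i.nodes {
		n.stop()
	}
	i.StopAuth()
}

// runningCount reports how many processes are still up.
func (i *instance) runningCount() int {
	count := 0
	if i.auth.running {
		count++
	}
	for _, n := range i.nodes {
		if n.running {
			count++
		}
	}
	return count
}

func main() {
	inst := &instance{
		auth:  &process{name: "auth", running: true},
		nodes: []*process{{name: "node-1", running: true}, {name: "node-2", running: true}},
	}
	defer func() {
		inst.StopAll()
		fmt.Println("after StopAll:", inst.runningCount())
	}()
	inst.StopAuth()
	fmt.Println("after StopAuth:", inst.runningCount())
}
```

In the model, StopAuth leaves both nodes running (and, in the real tests, heartbeating against a dead auth server), which is the leak the rename is meant to make visible.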
Top-level `make lint` rule that scans everything and a CI-specific rule
for Jenkins.
Currently only enable "unused", since it's reliable. The list will
expand.
Also clean up stragglers that somehow slipped through in #3552.
Updates #3551
This commit fixes #3252.
The 4.2 security patches introduced a regression: leaf clusters ignore role
mapping and attempt to use role names coming from the identity of the root
cluster whenever the GetNodes method is used.
This commit reverts the logic while ensuring that the original
fix is preserved: traits and groups are updated on the user object.
The integration test has been extended to guard against this regression in the future.
If an attacker can force a username change at an IdP, upon second login,
the services.User object of the original user can be updated with new
roles and traits. If these new roles and traits differ, the original
user can have their privileges raised (or lowered).
To mitigate this, encode roles and traits within the certificate and use
these when fetching roles to make RBAC decisions. If roles and traits are
not encoded within a certificate (for example, for old-style SSH
certificates), then fall back to using the services.User object and log a
warning.