* Refactor component heartbeat callbacks
Consolidate the OK/degraded broadcasts so the same logic isn't
duplicated for each component.
* Periodically update discovered desktops
Fixes #8644
* Allow customizing the desktop search
With this change, we support a discovery base DN other than '*',
and add support for further filtering the results with additional
LDAP filters.
Additionally, we filter out group managed service accounts, which
show up in LDAP searches for (objectClass=computer) despite not
being computers. This is mostly harmless, as the service accounts
aren't present in DNS, so Teleport just ignores them. It does, however,
log a DNS error message that could be confusing, so we explicitly
filter these out just to be safe. This was discovered when testing
on AWS managed AD, which creates a gMSA for DNS.
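As a rough illustration of the filtering described above, the following Go sketch combines a base computer filter, a gMSA exclusion, and any user-supplied filters into a single LDAP AND expression. The function name `combineFilters`, the constant names, and the exact exclusion clause are assumptions made for illustration; they are not taken from the code in this change.

```go
package main

import "strings"

// Hypothetical constants: the exact filters used by the change are not
// shown in the commit message.
const (
	computerFilter = "(objectClass=computer)"
	// gMSAs carry the msDS-GroupManagedServiceAccount object class in
	// Active Directory, so negating it excludes them from the results.
	excludeGMSA = "(!(objectClass=msDS-GroupManagedServiceAccount))"
)

// combineFilters ANDs the built-in filters with any additional
// user-provided LDAP filters. It does not validate the extra filters.
func combineFilters(extra []string) string {
	var sb strings.Builder
	sb.WriteString("(&")
	sb.WriteString(computerFilter)
	sb.WriteString(excludeGMSA)
	for _, f := range extra {
		sb.WriteString(f)
	}
	sb.WriteString(")")
	return sb.String()
}
```

Calling `combineFilters([]string{"(location=Oakland)"})` would append the extra clause inside the same `(&...)` group, so every clause has to match for a desktop to be discovered.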
`t.Setenv`, a new feature in Go 1.17, automatically restores the environment
variable to its previous value when a test ends, making it simpler
to set up the environment for tests and less likely that we accidentally
leave behind global state.
Also convert some of the remaining uses of check to standard Go tests.
Fixes #7606, where a node doesn't notice when the tunnel port changes.
Imagine you have a cluster with a node connected via a tunnel through a proxy `proxy.example.com` on port `3024`.
Now change the proxy config so that `tunnel_public_address` is `proxy.example.com:4024`. You either restart the proxy, or reload the proxy config with a `SIGHUP`.
...and then the node
a) loses its connection to auth (because the tunnel is gone), and
b) _doesn't reconnect_, because even though the proxy address hasn't changed,
the node has cached the old tunnel_public_address and keeps trying to connect
to that.
You can always manually restart the node to have it reconnect, but that would be a pain if you have thousands of nodes.
To avoid manually restarting every node, this change adds a check for connection failures to the auth server and restarts the node if there are multiple connection failures in a given period of time. The check as implemented piggybacks on the node's "common.rotate" service, which can already restart the node in certain circumstances, and uses the success of the periodic rotation sync as a proxy for the health of the node's connection to the auth server.
See-Also: #7606
* Connect to LDAP on port 636 (LDAPS)
Rather than connecting to LDAP on the insecure port and attempting
to upgrade the connection to TLS, simply connect on the LDAPS port.
* WIP
* check connection state
* check connection state
* printing certs
* confirming cert verf fails
* checkpoint
* updates config and new ldap client to make use of ldaps
* change VerifyCA to SkipVerifyCA
* checking for proper certificate in applyWindowsDesktopConfig
* fixes yaml value
* CR
* error wording change
* allow the user to use the system cert pool
* var change
* removing todo and renaming some fields
* refactors error messages and simplifies logic
Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
Some integration tests modify global "constants" to speed up test
execution (e.g. shortening polling intervals). This is occasionally
tripping the Go data race detector, so I have added explicit
serialisation to reading and writing these global settings.
These values are only ever changed in a test environment, and there
should be zero contention for them in a non-test environment.
Since our LDAP-based desktop discovery is not very configurable,
we opt to have it disabled by default.
Teleport will log a warning if desktop discovery is disabled and there
are no statically defined Windows desktops. In this case, the Windows
Desktop Service will simply sit idle, as there will be no desktops
available to connect to.
This commit implements the above, but also paves the way towards
a more flexible discovery system (described in the RFD).
We introduce a new config section:
    discovery:
      base_dn: '*'
      filters:
      - filter1
      - filter2
For now, the only valid value for base_dn is the wildcard, which
instructs Teleport to search from the domain root. Additionally,
Teleport validates any provided filters but does not yet apply
them when performing the search.
Future updates will allow for changing the base DN to something
more specific and filtering the results with LDAP filters.
* Add RBAC for Windows desktop access
This commit adds RBAC checks for Windows Desktops as described in
RFD 33 and RFD 34:
- add Windows desktop logins & labels to role definition
- introduce new file config for host labels based on a regexp match
- auth server API performs access checking for Windows desktop resources
- add RDP client callback to authorize the user
- support user/role locks
- respect the client idle timeout setting
Note: in cases where a connection is terminated due to RBAC, the web UI
currently displays "websocket connection failed" because the connection
is closed from the server. We'll need to follow up with a nice error
message for the client side to improve the UX here.
Other changes:
* Remove OSS RBAC migration marked for deletion
* Stop creating a default admin role
* add wildcard desktop access to the preset access role
Updates #7761
* PIV authentication for RDP
This uncomfortably large change fully implements smartcard PIV
authentication for RDP clients using the Teleport CA:
- PIV applet implementation in emulated RDP smartcard
- generating Windows-compatible certificates using Teleport CA with a
dedicated RPC
- generating dummy CRLs for Teleport CA and publishing them via LDAP
Windows requires a CRL for any smartcard login certificate, so we
can't avoid that. But we can avoid making it public: the CRL can live in
Active Directory instead of a public endpoint of a Teleport service.
Here, we use LDAP to publish the CRL on startup, valid for a year.
There are a few unhandled cases in the current implementation:
- LDAP server certificate is not validated when upgrading to TLS
- multiple active CAs (with HSMs) are not supported, only one CRL is
published
- CA rotation is not supported, CRL is not re-published on rotation
All of the above issues will be handled in future PRs as this one is
already too large.
* Address review feedback
* Fix linter errors
This change was originally intended to improve the reliability of the Firestore unit tests. These tests run against a local Firestore emulator, and after much wailing and gnashing of teeth, I have come to the conclusion that the emulator is not a good match for what we need it to do.
The Firestore tests are currently providing negative value, so I have disabled their execution. They can be enabled by defining the `TELEPORT_FIRESTORE_TEST` environment var (in the same manner as the `etcd` tests). This way, the tests will still be compiled so we can at least detect any obviously breaking changes.
Also, prior to this patch the DynamoDB tests were not compiled unless the `dynamodb` build tag was set. To bring the DynamoDB backend into line with the others, the DynamoDB tests are now automatically compiled, but are skipped by default during a test run. They can be enabled by defining the `TELEPORT_DYNAMODB_TEST` environment var.
This patch still contains some Firestore test cleanup and additional commentary from the original patch, but that is not the main point of the change.
* Connect proxy <-> windows_desktop_service <-> RDP server
Link together the proxy (websocket), service (mTLS) and RDP client. Pass
target desktop UUID via SNI on the TLS connection from the proxy.
* Use client CAs to validate incoming desktop_service connections
* Send binary frames on desktop websocket
Introduce new make targets to check and add license headers to files
("make lint-license" and "make fix-license"). License checking is now a part of
"make lint" as well.
Initial attempts used goheader, but it caused "make lint-go" to become about 9x
slower (if not more), plus it only targets go files. Google's addlicense is fast
enough and targets however many file types we want.
Existing files that were missing licenses got the header added, using the
current year as the license date.
* Introduce lint-license and fix-license make targets
* Ignore generated files
* Add license to go files
* Replace irregular licenses with standard copyright/license
* Add license to proto files
* Install addlicense in build.assets Dockerfile
Boilerplate for a new service and API objects:
- windows_desktop_service config section
- service registration and heartbeats
- static host registration and heartbeats
- caching, permissions, etc
- "tctl get" support
For new connections the service aborts after authentication, since the
RDP client implementation is not ready yet (pending in
https://github.com/gravitational/teleport/pull/7824).
Tested that the service starts, registers (both over a tunnel and
directly) and creates the API objects.
* Revert "Send web idle timeout with new web session response (#7839)"
which contains a bug where web idle timeout returns zero despite settings
* Retrieving web idle timeout in auth service and setting it with new web
session fixes the bug
Teleport will fail to start if a k8s cluster is unavailable when
using the kubeconfig in a `kubernetes_service` configuration. This means
that a single missing cluster can disrupt _all_ of the configured
clusters, even if the others are online.
This change makes failing the cluster credential enumeration a
per-k8s-cluster warning, rather than a stop-the-world error.
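The shape of this change, warn and continue instead of aborting, can be sketched as follows. The `credsLoader` type, `loadAllCreds`, and `errUnavailable` are hypothetical names used only for this illustration, and credentials are modelled as plain strings.

```go
package main

import (
	"errors"
	"log"
)

// errUnavailable is a hypothetical error for an unreachable cluster.
var errUnavailable = errors.New("cluster unavailable")

// credsLoader is a hypothetical function type that fetches credentials
// for one named cluster.
type credsLoader func(cluster string) (string, error)

// loadAllCreds loads credentials for every cluster. A failure for one
// cluster is logged as a warning and skipped, so the remaining clusters
// stay usable.
func loadAllCreds(clusters []string, load credsLoader) map[string]string {
	creds := make(map[string]string)
	for _, c := range clusters {
		cred, err := load(c)
		if err != nil {
			log.Printf("WARN: skipping kubernetes cluster %q: %v", c, err)
			continue
		}
		creds[c] = cred
	}
	return creds
}
```

With three clusters of which one is down, the returned map contains the two healthy clusters and the failure shows up only in the log.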
It also expands the testing shims inside the k8s proxy to allow more
sophisticated mocked scenarios, in order to test the above.
See-Also: #7215
Fixed two issues that were degrading the performance of the Web UI.
The first issue was that when an "Authorizer" was being created at
process startup by Auth Service, it was by-passing the cache and always
hitting the backend directly. All services have been updated to now use
a cached access point.
The second issue was that the Web UI was not using the local cache when
fetching the list of roles for a user. The Web UI has been updated to
now use the local cached access point.
Adds the ability to block network traffic on SSH sessions.
The deny/allow lists of IPs are specified in teleport.yaml file.
Supports both IPv4 and IPv6 communication.
This feature currently relies on enhanced recording for
cgroup management so that needs to be enabled as well.
-- Design rationale:
This patch uses Linux Security Module (LSM) hooks, specifically
security_socket_connect and security_socket_sendmsg, to control
egress traffic. The LSM provides two advantages over socket filtering
program types.
- It's executed early enough that the task information is available.
This makes it easy to report PID, COMM, etc.
- It becomes a model for extending restrictions beyond networking.
The set of enforced cgroups is stored in a BPF hash map and the
deny/allow lists are stored in BPF trie maps. An IP address is
first checked against the allow list. If found, it's checked for
an override in the deny list. The policy is default deny. However,
if the NetworkRestrictions API object is absent, all traffic is allowed.
IPv4 addresses are additionally registered in the IPv6 trie (as mapped)
to account for dual stacks. However it is unclear if this is sufficient
as 4-to-6 transition methods utilize a multitude of translation and
tunneling methods.
Multiple routines were fighting over the global logrus `Logger`
instance, causing the race detector to trip roughly once in every 10
test runs.
This patch addresses this race condition by supplying each of the
competing processes an entirely separate logger, and ensuring that
these log instances are plumbed through to the code that would otherwise
trip the race detector.
* Adds the idle_timeout_message to the auth_service config file block
* Plumbs the value through to the session monitor
* Writes the message to stderr when a session times out due to inactivity
* Adds some machinery to the test helpers to configure appropriate tests
See-Also: #6091
Prior to this change, TCP forwarding over SSH could only be disallowed
by user-based rules, rather than by individual target nodes.
This change adds:
* the `port_forwarding` key to the yaml SSH config block, with a boolean value
* Plumbing to pipe the resulting config value through to the SSH server
* A predicate check in the SSH server to [dis]allow port forwarding based on the setting.
This change also:
* adds a common way for integration tests to await the establishment of an SSH session
* refactors several integration tests to use this new method rather than manually waiting
* adds some marshaling code to move errors from spawned goroutines back into the
main test routine in verifySessionJoin()
See-Also: Issue #6783
* Use cmp.Equal instead of manual Equals methods
Equals methods can get out-of-sync with the fields added in structs they
compare. Using `cmp.Equal` handles that, removes a ton of code and makes
it more explicit when specific fields are excluded from comparison.
* Use gogoproto equal plugin for comparing proto values
This will be faster than reflect-based go-cmp.
* Init web handler with auth server feature flags on proxy init
* Retrieve auth server features by calling Ping when connecting
to auth svc which contains the server feature flags in the response