This addresses some observations from @stevenGravy about a couple places
in the docs where Enterprise-specific installation steps are missing.
- Edit the Machine ID Getting Started guide: Use the `install-linux.mdx`
partial for download instructions, since these include instructions
for Enterprise users.
- Add Enterprise instructions to Helm guides that only include OSS
installation instructions.
* Allow custom trace exporter for tsh
Trace forwarding via `tsh --trace` only works to date if Auth is
configured with the `tracing_service` enabled. In all other scenarios
the traces are still forwarded to Auth but are silently dropped.
This makes it difficult to capture valuable traces from customers
with latency issues as they are first required to set up a Telemetry
backend and enable tracing in their cluster.
A new `--trace-exporter` flag is added to `tsh` to make it possible
to direct traces from `tsh` to a file or local instance of Jaeger
without having to modify their Teleport cluster. The URL must follow
the same semantics as the config file equivalent.
One important caveat is that **only** the `tsh` spans will be captured.
Any corresponding `teleport` spans are exported according to the
`tracing_service`. While this only paints half the picture, it is
still a good indicator of where `tsh` may be experiencing latency.
An example usage to send traces to local files:
```bash
tsh --trace --trace-exporter=file:///some/path/traces ssh user@foo
```
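As a rough illustration of how an exporter URL could be classified by scheme, here is a minimal sketch. The `exporterKind` helper and the set of accepted schemes (`file`, `grpc`, `http`, `https`) are assumptions made for this example, not the actual `tsh` implementation.

```go
package main

import (
	"fmt"
	"net/url"
)

// exporterKind is a hypothetical helper that classifies a --trace-exporter
// value by its URL scheme. The accepted schemes are an assumption.
func exporterKind(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid exporter URL %q: %w", raw, err)
	}
	switch u.Scheme {
	case "file":
		// file:///some/path/traces has an empty host and an absolute path.
		if u.Path == "" {
			return "", fmt.Errorf("file exporter %q needs a path", raw)
		}
		return "file:" + u.Path, nil
	case "grpc", "http", "https":
		if u.Host == "" {
			return "", fmt.Errorf("network exporter %q needs a host", raw)
		}
		return "network:" + u.Host, nil
	default:
		return "", fmt.Errorf("unsupported exporter scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{"file:///some/path/traces", "grpc://localhost:4317", "traces"} {
		kind, err := exporterKind(raw)
		fmt.Println(raw, "->", kind, err)
	}
}
```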
The `RegisterWithAuthServer` and `SetExpectedInstanceRole` functions have
been made public for use by the Teleport Enterprise server. Additionally, a
new config option has been added to require more "Ready" events before the
Teleport instance is considered Ready, so that enterprise features can add
to this list.
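The idea of waiting on an extensible list of "Ready" events can be sketched as follows. The `readiness` type and the event names are hypothetical and only illustrate the concept; they are not the names used in the Teleport code.

```go
package main

import "fmt"

// readiness tracks which "Ready" events are still outstanding.
// The type and the event names used below are hypothetical.
type readiness struct {
	pending map[string]struct{}
}

// newReadiness registers every event that must be seen before the
// instance counts as ready. Extra features simply pass extra names.
func newReadiness(expected ...string) *readiness {
	r := &readiness{pending: make(map[string]struct{}, len(expected))}
	for _, name := range expected {
		r.pending[name] = struct{}{}
	}
	return r
}

// observe records an event and reports whether every expected event
// has now been seen.
func (r *readiness) observe(event string) bool {
	delete(r.pending, event)
	return len(r.pending) == 0
}

func main() {
	events := []string{"AuthReady", "ProxyReady", "EnterpriseFeatureReady"}
	r := newReadiness(events...)
	for _, ev := range events {
		fmt.Printf("%s -> ready=%v\n", ev, r.observe(ev))
	}
}
```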
A few of our tests that run certificate rotation fail with an error similar to this:
```
can only switch to phase update_clients from init, the current phase is
```
This happens when a duplicated private key is returned and set as a new key during CA rotation. During the CA rotation, we look for duplicated keys 962e5a25bc/lib/auth/rotate.go (L218) and try to rotate all of them at once. Because some of our tests still use `testauthority`, which only has 4 keys, there is a high chance that the same key will be returned twice, which causes the issue.
I swapped the logic to use a bigger pool with keys introduced in https://github.com/gravitational/teleport/pull/18750. I also removed the randomization part that, after some testing, I discovered was also causing some problems.
We could probably refactor the code in many tests now to make it simpler, but I want to keep this change as simple as possible. Otherwise, backporting bigger changes to older branches takes a long time.
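To illustrate why dropping randomization avoids duplicates, here is a minimal sketch of a pool that hands out pre-generated keys strictly in order. The `keyPool` type is hypothetical, and returning an error on exhaustion (rather than reusing keys) is an assumption of this example, not necessarily what the test helper does.

```go
package main

import (
	"fmt"
	"sync"
)

// keyPool is a hypothetical stand-in for a pool of pre-generated test keys.
// Keys are handed out sequentially, so no key is returned twice.
type keyPool struct {
	mu   sync.Mutex
	keys []string
	next int
}

// get returns the next unused key, or an error once the pool is exhausted.
func (p *keyPool) get() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.next >= len(p.keys) {
		return "", fmt.Errorf("key pool exhausted after %d keys", len(p.keys))
	}
	k := p.keys[p.next]
	p.next++
	return k, nil
}

func main() {
	pool := &keyPool{keys: []string{"key-1", "key-2", "key-3"}}
	for i := 0; i < 4; i++ {
		k, err := pool.get()
		fmt.Println(k, err)
	}
}
```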
At one point, we had moved the Ansible Server Access guide to the
Machine ID section and added a redirect. Since we restored the Ansible
Server Access guide, the redirect is no longer correct.
* Add a guide to deploying an HA cluster
Closes #16751
This is a general guide that prefaces our HA deployment guides.
Also adds an introduction to the "deploy-a-cluster" section.
* Respond to some PR feedback
- Add mentions of specific services (Auth and Proxy) where it would help
with clarity.
- Edit port tables to remove ports that shouldn't be publicly exposed.
- Clarify the supported backends.
- Clarify that Let's Encrypt is not required for TLS credential
management.
* Respond to more PR feedback
- Add context around DNS records
- Use "Layer 4" instead of "Layer Four"
- Explain optional ports more explicitly
- Fix spacing issues
- Indicate that you can use an S3-compatible object store
- Clarify cert-fetching behavior for applications
- Describe separate port configs depending on whether TLS Routing is
enabled, and add a brief section re: whether to enable TLS Routing
- Expand the Teleport configuration section to accommodate TLS Routing
and separate listeners
* Add more context to the "Deploy a Cluster" intro
* Small language tweaks
* Add an image
* Respond to zmb3 feedback
- Use "cluster state backend" and "session recording backend"
- Mention cross-zone load balancing
- Link to the Backends Reference instead of including example backend
configurations
- Use v3 for example configs
- Fix example config indentation
* Add CTA for Teleport Cloud and forScopes
* DatabaseService resource: client and server CRUD
In the context of Teleport Discover, we must be able to know whether there's
any DatabaseService available to proxy a given Database resource.
If there's none available, we will offer a script for the user to run
and install a DatabaseService which proxies the desired Database
resource.
By DatabaseService, we mean the process that the Teleport binary manages
when the `teleport.yaml` config has the following section:
```yaml
db_service:
  enabled: "yes"
```
To accomplish this we are creating a new resource: DatabaseService.
The UI will fetch all DatabaseServices and check if there's any
ResourceMatcher that matches the DatabaseLabels.
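The matching check described above could look roughly like the sketch below. The `matcher` type, the `"*"` wildcard handling, and the rule that an empty matcher matches nothing are simplifying assumptions for illustration; they are not copied from the Teleport matching code.

```go
package main

import "fmt"

// matcher is a hypothetical, simplified resource matcher: each label key
// maps to the list of accepted values, with "*" acting as a wildcard.
type matcher map[string][]string

func contains(values []string, want string) bool {
	for _, v := range values {
		if v == want {
			return true
		}
	}
	return false
}

// matches reports whether every label required by the matcher is satisfied
// by the database labels. An empty matcher matches nothing (assumption).
func matches(m matcher, labels map[string]string) bool {
	if len(m) == 0 {
		return false
	}
	for key, accepted := range m {
		if key == "*" {
			if contains(accepted, "*") {
				continue
			}
			return false
		}
		got, ok := labels[key]
		if !ok {
			return false
		}
		if !contains(accepted, got) && !contains(accepted, "*") {
			return false
		}
	}
	return true
}

// anyMatch reports whether at least one matcher accepts the labels.
func anyMatch(matchers []matcher, labels map[string]string) bool {
	for _, m := range matchers {
		if matches(m, labels) {
			return true
		}
	}
	return false
}

func main() {
	matchers := []matcher{{"env": {"prod", "staging"}}, {"team": {"*"}}}
	fmt.Println(anyMatch(matchers, map[string]string{"env": "prod"}))
	fmt.Println(anyMatch(matchers, map[string]string{"env": "dev"}))
}
```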
A previous PR created the DatabaseService resource.
This PR creates all the necessary CRUD operations:
- Add Service to manage DatabaseServices resources in the backend.
- Add GetAll, Upsert, Delete and DeleteAll operations to Client.
- Add DatabaseService support to ListResources.
- Add WebAPI endpoint to list DatabaseServices using ListResources.
The next PR will add the heartbeat mechanism to the DatabaseService
process.
Currently only supports a minimal build of teleport on ARM32 and ARM64,
but the intent is to expand it until it becomes the single source of
truth for the linux buildboxes for all supported architectures.
* Fix Flaky TestTerminalRouting tests
Subtests that fail resolution cause the server to close the websocket
before the test closes it on cleanup, which results in
`tls: failed to send closeNotify alert` errors. Failed subtests now
ensure that any errors returned from closing the websocket pass the
`utils.IsOKNetworkError` check.
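A minimal sketch of this kind of tolerant cleanup check follows. `isBenignCloseError` is a hypothetical stand-in for `utils.IsOKNetworkError`; the set of errors it accepts (`io.EOF`, `net.ErrClosed`) is an assumption and may differ from the real helper.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

// Example errors used to exercise the check below.
var (
	errAlreadyClosed = fmt.Errorf("close websocket: %w", net.ErrClosed)
	errUnexpected    = errors.New("unexpected failure")
)

// isBenignCloseError is a hypothetical stand-in for utils.IsOKNetworkError.
// It accepts errors that only indicate the peer already closed the connection.
func isBenignCloseError(err error) bool {
	return err == nil || errors.Is(err, io.EOF) || errors.Is(err, net.ErrClosed)
}

func main() {
	for _, err := range []error{nil, errAlreadyClosed, errUnexpected} {
		fmt.Printf("%v -> benign=%v\n", err, isBenignCloseError(err))
	}
}
```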
Wire device authentication into `tsh`, so it attempts to acquire device
certificates after user login. This affects direct logins (`tsh login`),
indirect logins (RetryWithRelogin) and Connect.
If authentication fails (non-Enterprise cluster, device not enrolled, etc.) `tsh`
proceeds as usual, but the final user certificate won't contain device
extensions.
gravitational/teleport.e#514
* Describe enabling services in the config reference
Closes #4214
Add a section to the configuration reference indicating which Teleport
services must be enabled/disabled explicitly so they run when Teleport
starts.
The configuration reference wasn't divided into H2s before this, so I
have organized the long Admonition that begins the section into H2s as
well.
* Respond to PR feedback
Co-authored-by: Steven Martin <steven@goteleport.com>
* Add support for GHES joining
* Add tests for GHES ID Token validation
* Add test covering Auth server github join with GHES override
* Use correct ctx
* Name interface method parameters for clarity
* Use "scheme" instead of "proto"
* Improve docs, validation and add tests
Web sessions were dialing with the server hostname in cases where
the server UUID was known and should have been used, which resulted
in #19415. All sessions launched via the "Connect" button from the
UI are guaranteed to use the server UUID. Manual connections via
the UI attempt to find a matching host and use the UUID, but may fall
back to using the hostname/IP instead if multiple matches are found.
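The resolution rule for manual connections can be sketched like this. The `server` struct and `resolveTarget` function are hypothetical, and falling back to the typed host when nothing matches is an assumption of this example.

```go
package main

import "fmt"

// server is a hypothetical, minimal view of a registered node.
type server struct {
	UUID     string
	Hostname string
	Addr     string
}

// resolveTarget returns what to dial for a user-typed host: the UUID when
// exactly one server matches, otherwise the host as typed.
func resolveTarget(host string, candidates []server) string {
	var matched []server
	for _, s := range candidates {
		if s.Hostname == host || s.Addr == host {
			matched = append(matched, s)
		}
	}
	if len(matched) == 1 {
		return matched[0].UUID
	}
	// Zero or multiple matches: fall back to the hostname/IP and let the
	// dial path report any ambiguity.
	return host
}

func main() {
	servers := []server{
		{UUID: "uuid-1", Hostname: "foo", Addr: "10.0.0.1"},
		{UUID: "uuid-2", Hostname: "bar", Addr: "10.0.0.2"},
		{UUID: "uuid-3", Hostname: "bar", Addr: "10.0.0.3"},
	}
	fmt.Println(resolveTarget("foo", servers))
	fmt.Println(resolveTarget("bar", servers))
}
```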
When resolving servers by hostname or address `ListResources` is
now used directly to populate the `SearchKeywords` field to limit
the number of servers returned to only those fuzzily matching the
server. Prior to this **all** servers in the cluster were fetched
and filtered out by the web api.
The error displayed when ambiguous hosts are found has also been
improved from:
```shell
disconnected
err-node-is-ambiguous
```
to the following, which more closely mirrors the error returned by
`tsh`:
```shell
disconnected
error: ambiguous host could match multiple nodes
Hint: try addressing the node by unique id (ex: user@node-id)
```
Closes #19415
In preparation for moving some builds from Drone to GHA, we need some
way for Drone to invoke a workflow in GHA and await the result. Handles
timeouts and workflow inputs as well.
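The "invoke and await with a timeout" part can be sketched as a polling loop bounded by a context. The `awaitResult` function, the `pollFunc` signature, and the simulated workflow in `demo` are all hypothetical; a real tool would query the GitHub API inside the poll function.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollFunc is a hypothetical callback that reports the workflow state.
type pollFunc func(ctx context.Context) (conclusion string, done bool, err error)

// awaitResult polls until the workflow finishes or the context expires.
func awaitResult(ctx context.Context, interval time.Duration, poll pollFunc) (string, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		conclusion, done, err := poll(ctx)
		if err != nil {
			return "", err
		}
		if done {
			return conclusion, nil
		}
		select {
		case <-ctx.Done():
			return "", fmt.Errorf("waiting for workflow: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// demo simulates a workflow that completes after pollsNeeded polls.
func demo(pollsNeeded, timeoutMillis int) (string, error) {
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	calls := 0
	return awaitResult(ctx, time.Millisecond, func(context.Context) (string, bool, error) {
		calls++
		if calls >= pollsNeeded {
			return "success", true, nil
		}
		return "", false, nil
	})
}

func main() {
	fmt.Println(demo(3, 1000))
	fmt.Println(demo(1000000, 20))
}
```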
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Anton Miniailo <anton@goteleport.com>
* Add a guide to exporting events to Splunk
Closes #13158
Since there are multiple guides to exporting audit events now, this also
adds a new section of the docs for these guides.
Also fixes a tiny error in the Elastic Stack guide.
* Respond to PR feedback
* Fix linter issues
* Respond to PR feedback
- Use consistent naming
- Mention the `--ttl` flag for `tctl auth sign`
- Mention Machine ID
- Add a quick Troubleshooting section
- Fix `chmod` command
- Add systemd configs (also added this to the Elastic Stack Event
Handler guide. I didn't do this to the Fluentd guide because the
structure of the guide is different from the Splunk guide's).
* Prevent "session.start" from being overwritten by "session.exec"
The `session.exec` event was not being passed through the session
recorder, which resulted in said event having an event index of 0.
This caused the original `session.start` event which also has an
`eid` of 0 to be overwritten by the `session.exec` event.
By emitting the `session.exec` event via the same mechanism as the
`session.start` event it gets a proper event index and no longer
overwrites the `session.start`.
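The overwrite can be reproduced with a small model in which events are stored by index. The `event`, `recorder`, and `store` names are hypothetical and only model the behavior described above; they are not the Teleport types.

```go
package main

import "fmt"

// event is a hypothetical, minimal audit event.
type event struct {
	Type  string
	Index int64
}

// recorder assigns each event the next index in the session stream.
type recorder struct{ next int64 }

func (r *recorder) emit(typ string) event {
	e := event{Type: typ, Index: r.next}
	r.next++
	return e
}

// store keeps one event per index, so a later event with the same index
// replaces the earlier one.
func store(events ...event) map[int64]string {
	byIndex := make(map[int64]string)
	for _, e := range events {
		byIndex[e.Index] = e.Type
	}
	return byIndex
}

// buggy bypasses the recorder for session.exec, leaving its index at 0.
func buggy() map[int64]string {
	rec := &recorder{}
	return store(rec.emit("session.start"), event{Type: "session.exec"})
}

// fixed emits both events through the recorder.
func fixed() map[int64]string {
	rec := &recorder{}
	return store(rec.emit("session.start"), rec.emit("session.exec"))
}

func main() {
	fmt.Println("bypassing recorder:", buggy())
	fmt.Println("through recorder:  ", fixed())
}
```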
Closes #13622
In a previous PR (#19363) we created a new resource type:
DatabaseService.
Its only spec property for now is the ResourceMatcher field.
This field should mirror what we offer in the configuration within
`db_service.resources` from `teleport.yaml`.
Even though the current implementation of
`types.DatabaseService.ResourceMatchers` is convertible to/from
`services.ResourceMatchers` (because the latter's only field
is a list of labels), we would incur breaking changes later on if
`services.ResourceMatchers` got new fields.
This new resource is only in master (not yet released) and the backport
to v11 must include this change to prevent a breaking change from
happening in the future.
* Reduce latency of `tsh ls -R`
Listing nodes across clusters was done one cluster at a time. To
improve latency the same mechanism used by `tsh db ls -R` was copied
to ensure listing happens in parallel with an upper limit.
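Listing in parallel with an upper limit can be sketched with a buffered channel used as a semaphore. The `listAll` function and the `fakeList` callback are hypothetical; the sketch only shows the bounded-concurrency pattern, not the code shared with `tsh db ls -R`.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeList is a hypothetical per-cluster lister used for the demo.
func fakeList(cluster string) ([]string, error) {
	if cluster == "" {
		return nil, fmt.Errorf("empty cluster name")
	}
	return []string{cluster + "-node-1", cluster + "-node-2"}, nil
}

// listAll lists every cluster concurrently, running at most limit calls at
// once. It returns the collected results and the first error seen, if any.
func listAll(clusters []string, limit int, list func(string) ([]string, error)) (map[string][]string, error) {
	if limit < 1 {
		limit = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
		results  = make(map[string][]string, len(clusters))
		sem      = make(chan struct{}, limit) // bounds in-flight calls
	)
	for _, cluster := range clusters {
		cluster := cluster
		sem <- struct{}{} // blocks while limit calls are already running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			nodes, err := list(cluster)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = fmt.Errorf("listing %s: %w", cluster, err)
				}
				return
			}
			results[cluster] = nodes
		}()
	}
	wg.Wait()
	return results, firstErr
}

func main() {
	res, err := listAll([]string{"root", "leaf-a", "leaf-b"}, 2, fakeList)
	fmt.Println(len(res), err)
}
```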
* Enable nolintlint linter
* Fix nolint comments in the api package
* Fix RDP client comment
* Address review comment
Co-authored-by: Alan Parra <alan.parra@goteleport.com>
* Allow unused for nolintlint linter
* Remove redundant casting
* Add comment on why allowed unused is enabled
Co-authored-by: Alan Parra <alan.parra@goteleport.com>
In the context of Discover, we must be able to know when a DatabaseService process is running in order to give feedback to the user.
The DatabaseService process is the process that the teleport binary manages when `teleport.yaml` has
```yaml
db_service:
  enabled: "yes"
```
To do so, we are creating a new resource: DatabaseService.
It will be similar to the way DatabaseServers work, using the same heartbeat mechanism.
To ease the review, we'll only add the RPC calls in this PR. Following PRs will add:
- CRUD management over RPC
- DatabaseService heartbeat
- webapi endpoint
Part of #19032