Flaky tests in the teleport integration suite uncovered a problem.
The main cluster can rotate its certificate authority
and try to dial the remote cluster with new credentials
before the remote cluster has fetched the new CA to trust.
To fix this, the "update_clients" phase was split into two phases:
* Init and Update clients
The Init phase does nothing on the main cluster except generate
new certificate authorities that are trusted but not yet used in the
cluster.
This phase exists to give remote clusters the opportunity
to update their list of trusted certificate authorities
of the main cluster before the main cluster reconnects with new clients
in the "Update clients" phase.
CA rotation could not be performed on data migrated
from 2.5 versions, because that data does not have the rotation
property.
This commit fixes the problem by making the rotation
property optional.
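
One way to picture an optional rotation property is a pointer field that
stays nil when older data omits it. The sketch below is only an illustration:
the caSpec and rotation types, the JSON field names and the "standby" default
are assumptions made for this example, not Teleport's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rotation is a hypothetical stand-in for the rotation property.
type rotation struct {
	State string `json:"state"`
}

// caSpec is a hypothetical CA record. Rotation is a pointer so that
// records written before the property existed still decode cleanly.
type caSpec struct {
	ClusterName string    `json:"cluster_name"`
	Rotation    *rotation `json:"rotation,omitempty"`
}

// rotationState returns the rotation state of a serialized CA record.
// A missing rotation property is treated as "standby" (an assumed default).
func rotationState(data []byte) (string, error) {
	var spec caSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return "", err
	}
	if spec.Rotation == nil {
		return "standby", nil
	}
	return spec.Rotation.State, nil
}

func main() {
	legacy := []byte(`{"cluster_name":"main"}`)
	current := []byte(`{"cluster_name":"main","rotation":{"state":"init"}}`)
	for _, data := range [][]byte{legacy, current} {
		state, err := rotationState(data)
		if err != nil {
			fmt.Println("decode error:", err)
			return
		}
		fmt.Println("rotation state:", state)
	}
}
```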
* Cache services.ClusterConfig within srv.ServerContext for the duration
of a connection.
* Create a single websocket between the browser and the proxy for all
terminal bytes and events.
This commit fixes #1741
* If the bolt backend was used as the default,
the new teleport keeps using it as the default to prevent
regressions on start.
* Otherwise, dir backend is used as a default.
This commit fixes #1803, fixes #1889
* Adds support for public_addr for Proxy and Auth
* Parameter advertise_ip now supports host:port format
* Fixes incorrect output for tctl get proxies
* Fixes duplicate output of some error messages.
When multiple requests for session
event data were issued to the auth server
at the same time, multiple downloads
were started, and sometimes partial data
was returned.
This commit serializes downloads of a session
within the same auth server.
This commit implements #1860
During the rotation procedure, the issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.
The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.
An operator can launch CA rotation in auto or manual mode.
In manual mode, the operator moves the cluster between rotation states
and watches the states of the components to sync.
In auto mode, state transitions happen automatically
on a specified schedule.
The design documentation is embedded in the code:
lib/auth/rotate.go
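
The idea of explicit rotation phases with checked transitions can be sketched
as below. This is a simplified illustration and not the code in
lib/auth/rotate.go: only the "init" and "update_clients" phase names appear in
these commit messages, while the "standby" phase, the transition table and the
advance function are assumptions made for the example.

```go
package main

import "fmt"

// phase is the explicit rotation state stored with a certificate authority.
type phase string

const (
	phaseStandby       phase = "standby" // assumed name for "no rotation in progress"
	phaseInit          phase = "init"
	phaseUpdateClients phase = "update_clients"
)

// transitions maps every phase to the only phase allowed to follow it
// in this simplified model.
var transitions = map[phase]phase{
	phaseStandby:       phaseInit,
	phaseInit:          phaseUpdateClients,
	phaseUpdateClients: phaseStandby,
}

// advance returns target if it directly follows current, and an error
// otherwise, so that no component can skip a phase.
func advance(current, target phase) (phase, error) {
	if transitions[current] != target {
		return current, fmt.Errorf("invalid transition %q -> %q", current, target)
	}
	return target, nil
}

func main() {
	current := phaseStandby
	// In manual mode an operator would request each step; in auto mode
	// a schedule would request them.
	for _, target := range []phase{phaseInit, phaseUpdateClients, phaseStandby} {
		updated, err := advance(current, target)
		if err != nil {
			fmt.Println(err)
			return
		}
		current = updated
		fmt.Println("now in phase", current)
	}
}
```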
Fixes #1836
When events.DiskSessionLogger.Finalize() was called
twice, it panicked.
At the same time, it turned out that the old buffering
logic is obsolete, as teleport always writes to disk,
so it was removed.
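
A common way to make a finalizer safe to call twice in Go is sync.Once. The
sketch below is not the actual change to events.DiskSessionLogger: the
sessionLogger type and its fields are hypothetical, and the closed channel
only stands in for a cleanup step that panics when repeated.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionLogger is a hypothetical logger whose cleanup must run only once.
type sessionLogger struct {
	once      sync.Once
	done      chan struct{} // closing a channel twice panics in Go
	finalized int           // counts how often the cleanup really ran
}

func newSessionLogger() *sessionLogger {
	return &sessionLogger{done: make(chan struct{})}
}

// Finalize releases resources. Repeated calls are no-ops because
// sync.Once runs the cleanup function a single time.
func (l *sessionLogger) Finalize() error {
	l.once.Do(func() {
		close(l.done)
		l.finalized++
	})
	return nil
}

func main() {
	l := newSessionLogger()
	_ = l.Finalize()
	_ = l.Finalize() // would panic on the double close without sync.Once
	fmt.Println("cleanup runs:", l.finalized)
}
```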
This fixes the race with systemd reload.
P - parent, C - child
During a live reload scenario,
the following happens:
P -> forks C
P -> blocks on pipe read
C -> writes to pipe
C -> writes pid file
P <- reads message from pipe
P <- shuts down
However, there is a race:
P -> forks C
P -> blocks on pipe read
C -> writes to pipe
P <- reads message from pipe
P <- shuts down
C -> writes pid file
In this case the parent process exits
before the child process writes the new pid file,
which makes systemd think that the main process
is down and stop both processes.
This fix changes the sequence to:
P -> forks C
P -> blocks on pipe read
C -> writes pid file
C -> writes to pipe
P <- reads message from pipe
P <- shuts down
to make sure the race can't happen any more.
This commit allows the teleport parent process to track
the status of the forked child process using os.Pipe.
The child process signals success to the parent process by writing
to the pipe.
This allows HUP and USR2 handling to be more intelligent, as it
can now detect the failure or success of the child process.
This PR improves session recording:
* Nodes and proxies always buffer recorded sessions
to disk during the session, which improves performance
and makes the recording more resilient to network failures.
* The async uploader running on a proxy or node always uploads the
session tarball to the audit log server.
* The audit log server is the only component uploading
to S3 or any other API.
ignore (Go's runtime respects SIG_IGN, btw, by not setting a handler).
If the handler is reset unconditionally, no Go code can ask to be
notified of Interrupt signal as the system default handler obviously
knows nothing about Go code.
fixes #1785, fixes #1776
This commit fixes several issues with output:
The first teleport start now prints output
matching the quickstart guide and sets default
console logging to ERROR.
The SIGCHLD handler now only collects
the PIDs of processes forked during live restart
to avoid confusing other wait calls that
would otherwise have no process status left to collect.