Flaky tests in the Teleport integration suite uncovered a problem.
The main cluster can rotate its certificate authority
and try to dial the remote cluster with new credentials
before the remote cluster has fetched the new CA to trust.
To fix this, the "update_clients" phase was split into two phases:
* Init
* Update clients
The Init phase does nothing on the main cluster except generate
new certificate authorities that are trusted but not yet used in the
cluster.
This phase gives remote clusters an opportunity
to add the main cluster's new certificate authorities to their
list of trusted CAs before the main cluster reconnects with new clients
in the "Update clients" phase.
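The ordering above can be sketched as a tiny state machine. This is a hypothetical model, not Teleport's code: the Phase type, the trust/dial flags, and the phase names "standby" and "update_servers" are assumptions; only "init" and "update_clients" come from the text. It shows why dialing with new credentials is only safe once the remote cluster already trusts the new CA.

```go
package main

import "fmt"

// Phase is a hypothetical rotation phase. "init" and "update_clients" come from
// the description above; "standby" and "update_servers" are assumed names.
type Phase string

const (
	PhaseStandby       Phase = "standby"
	PhaseInit          Phase = "init"
	PhaseUpdateClients Phase = "update_clients"
	PhaseUpdateServers Phase = "update_servers"
)

// phaseRules records, per phase, whether the new CA is published as trusted
// and whether main cluster clients already dial with credentials it signed.
type phaseRules struct {
	TrustNewCA  bool
	DialWithNew bool
}

var rules = map[Phase]phaseRules{
	PhaseStandby:       {TrustNewCA: false, DialWithNew: false},
	PhaseInit:          {TrustNewCA: true, DialWithNew: false}, // trusted, not used
	PhaseUpdateClients: {TrustNewCA: true, DialWithNew: true},
	PhaseUpdateServers: {TrustNewCA: true, DialWithNew: true}, // assumed
}

// next returns the phase that follows p in this sketch.
func next(p Phase) (Phase, error) {
	switch p {
	case PhaseStandby:
		return PhaseInit, nil
	case PhaseInit:
		return PhaseUpdateClients, nil
	case PhaseUpdateClients:
		return PhaseUpdateServers, nil
	}
	return "", fmt.Errorf("no transition modeled after %q", p)
}

// canDial reports whether a remote cluster would accept the main cluster's
// connection: either the old credentials are still in use, or the remote
// side has already fetched and trusts the new CA.
func canDial(p Phase, remoteTrustsNewCA bool) bool {
	return !rules[p].DialWithNew || remoteTrustsNewCA
}

func main() {
	p := PhaseStandby
	for i := 0; i < 2; i++ {
		var err error
		if p, err = next(p); err != nil {
			panic(err)
		}
		fmt.Printf("%s: trust new CA=%v, safe before remote sync=%v\n",
			p, rules[p].TrustNewCA, canDial(p, false))
	}
}
```

Splitting the old single phase means the only phase that needs remote trust (update_clients) is always preceded by one that grants it (init).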
CA rotation could not be performed on data migrated
from version 2.5, because those records do not have the rotation
property.
This commit fixes the problem by making the rotation
property optional.
* Cache services.ClusterConfig within srv.ServerContext for the duration
of a connection.
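A minimal sketch of per-connection caching, assuming a ServerContext is created once per connection. The types, the GetClusterConfig method, and the fetch callback are hypothetical simplifications of services.ClusterConfig and srv.ServerContext, not their real definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// ClusterConfig is a hypothetical stand-in for services.ClusterConfig.
type ClusterConfig struct {
	SessionRecording string
}

// ServerContext is a hypothetical stand-in for srv.ServerContext; one is
// created per connection.
type ServerContext struct {
	fetch func() (*ClusterConfig, error) // e.g. a call to the auth server

	once   sync.Once
	config *ClusterConfig
	err    error
}

// GetClusterConfig calls fetch at most once per ServerContext and returns the
// cached result (including a cached error) for the rest of the connection.
func (c *ServerContext) GetClusterConfig() (*ClusterConfig, error) {
	c.once.Do(func() {
		c.config, c.err = c.fetch()
	})
	return c.config, c.err
}

func main() {
	calls := 0
	ctx := &ServerContext{fetch: func() (*ClusterConfig, error) {
		calls++
		return &ClusterConfig{SessionRecording: "node"}, nil
	}}
	for i := 0; i < 3; i++ {
		if _, err := ctx.GetClusterConfig(); err != nil {
			panic(err)
		}
	}
	fmt.Println("fetches:", calls)
}
```

Caching the error as well is a deliberate choice in this sketch: a connection that failed to load the config keeps failing rather than hammering the auth server; retrying instead would be equally reasonable.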
* Create a single websocket between the browser and the proxy for all
terminal bytes and events.
This commit fixes #1741
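Carrying both terminal bytes and session events over one websocket needs some way to tell them apart. The sketch below assumes a hypothetical JSON envelope with a type tag; the field names, the "raw"/"event" tags, and the encoding are illustrations only, and the websocket transport itself is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Frame types for the hypothetical envelope.
const (
	TypeRaw   = "raw"   // terminal bytes
	TypeEvent = "event" // session events (resize, end, ...)
)

// Envelope wraps every websocket message. encoding/json base64-encodes
// the []byte payload, so arbitrary terminal bytes survive the trip.
type Envelope struct {
	Type    string `json:"t"`
	Payload []byte `json:"p"`
}

// encode builds one websocket message.
func encode(typ string, payload []byte) ([]byte, error) {
	return json.Marshal(Envelope{Type: typ, Payload: payload})
}

// dispatch decodes one message and routes it by type.
func dispatch(msg []byte, onRaw, onEvent func([]byte)) error {
	var env Envelope
	if err := json.Unmarshal(msg, &env); err != nil {
		return err
	}
	switch env.Type {
	case TypeRaw:
		onRaw(env.Payload)
	case TypeEvent:
		onEvent(env.Payload)
	default:
		return fmt.Errorf("unknown frame type %q", env.Type)
	}
	return nil
}

func main() {
	msg, err := encode(TypeRaw, []byte("ls -la\r"))
	if err != nil {
		panic(err)
	}
	err = dispatch(msg,
		func(b []byte) { fmt.Printf("terminal: %q\n", b) },
		func(b []byte) { fmt.Printf("event: %s\n", b) })
	if err != nil {
		panic(err)
	}
}
```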
* If the bolt backend was used as the default,
new versions of Teleport continue using it as the default to prevent
regressions on startup.
* Otherwise, the dir backend is used as the default.
This commit fixes #1803, fixes #1889
* Adds support for public_addr for Proxy and Auth
* The advertise_ip parameter now supports the host:port format
* Fixes incorrect output for tctl get proxies
* Fixes duplicate output of some error messages.
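Accepting both "host" and "host:port" for advertise_ip can be done with net.SplitHostPort plus a fallback. In this sketch the parseAdvertiseAddr name and the caller-supplied default port are assumptions; it is not Teleport's parser.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// parseAdvertiseAddr normalizes "host" or "host:port" to "host:port".
// defaultPort is supplied by the caller (an assumption of this sketch).
func parseAdvertiseAddr(addr string, defaultPort int) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		// No port given (bare IPv6 literals like "::1" or "[::1]" also land
		// here): treat the whole value, minus brackets, as the host.
		host, port = strings.Trim(addr, "[]"), strconv.Itoa(defaultPort)
	}
	if host == "" {
		return "", fmt.Errorf("advertise address %q has no host", addr)
	}
	p, err := strconv.Atoi(port)
	if err != nil || p < 1 || p > 65535 {
		return "", fmt.Errorf("advertise address %q has invalid port %q", addr, port)
	}
	// JoinHostPort adds brackets around IPv6 hosts.
	return net.JoinHostPort(host, port), nil
}

func main() {
	for _, in := range []string{"10.0.0.1", "10.0.0.1:3080", "::1"} {
		out, err := parseAdvertiseAddr(in, 3022)
		fmt.Println(in, "->", out, err)
	}
}
```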
If multiple requests for session
event data were issued to the auth server
at the same time, multiple download requests
were initiated, and sometimes partial data
was returned.
This commit serializes downloads of the same session
within a single auth server.
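A minimal sketch of per-session serialization within one process, assuming a per-session mutex and a "done" marker. The downloader type and its download callback (fetching a recording from external storage) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// downloader serializes downloads per session within one auth server
// process. The download callback is a hypothetical stand-in.
type downloader struct {
	mu       sync.Mutex             // guards locks and done
	locks    map[string]*sync.Mutex // one mutex per session ID (never pruned here)
	done     map[string]bool        // sessions already fully downloaded
	download func(sessionID string) error
}

func newDownloader(download func(string) error) *downloader {
	return &downloader{
		locks:    map[string]*sync.Mutex{},
		done:     map[string]bool{},
		download: download,
	}
}

// lockFor returns the mutex for a session, creating it on first use.
func (d *downloader) lockFor(sessionID string) *sync.Mutex {
	d.mu.Lock()
	defer d.mu.Unlock()
	l, ok := d.locks[sessionID]
	if !ok {
		l = &sync.Mutex{}
		d.locks[sessionID] = l
	}
	return l
}

// ensure makes sure a session is downloaded. Concurrent callers for the same
// session wait for the first download instead of starting their own, so
// readers that go through ensure never see a partial download.
func (d *downloader) ensure(sessionID string) error {
	l := d.lockFor(sessionID)
	l.Lock()
	defer l.Unlock()

	d.mu.Lock()
	already := d.done[sessionID]
	d.mu.Unlock()
	if already {
		return nil
	}
	if err := d.download(sessionID); err != nil {
		return err // not marked done, so the next caller retries
	}
	d.mu.Lock()
	d.done[sessionID] = true
	d.mu.Unlock()
	return nil
}

func main() {
	calls := 0
	d := newDownloader(func(string) error { calls++; return nil })
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := d.ensure("session-1"); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("downloads:", calls)
}
```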
This commit implements #1860
During the rotation procedure, the issuing TLS and SSH
certificate authorities are regenerated, and all internal
components of the cluster re-register to get new
credentials.
The rotation procedure is based on a distributed
state machine algorithm: certificate authorities have
an explicit rotation state, and all parts of the cluster sync
their local state machines by following transitions between phases.
An operator can launch CA rotation in auto or manual mode.
In manual mode, the operator moves the cluster between rotation states
and watches the states of the components to sync.
In auto mode, state transitions happen automatically
on a specified schedule.
The design documentation is embedded in the code:
lib/auth/rotate.go
Fixes #1836
Calling events.DiskSessionLogger.Finalize() twice
caused a panic.
It also turned out that the old buffering
logic was obsolete, as Teleport always writes to disk,
so it was removed.
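The text does not say what exactly panicked; a common cause of a double-Finalize panic in Go is closing a channel twice. Assuming that cause, sync.Once makes the call idempotent. sessionLogger below is a hypothetical stand-in for events.DiskSessionLogger.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionLogger is a hypothetical stand-in for events.DiskSessionLogger.
type sessionLogger struct {
	done      chan struct{} // closed by Finalize to stop background work
	finalized sync.Once
}

func newSessionLogger() *sessionLogger {
	return &sessionLogger{done: make(chan struct{})}
}

// Finalize may be called any number of times. A bare close(l.done) would
// panic with "close of closed channel" on the second call.
func (l *sessionLogger) Finalize() {
	l.finalized.Do(func() {
		close(l.done)
	})
}

// isFinalized reports whether Finalize has run, without blocking.
func (l *sessionLogger) isFinalized() bool {
	select {
	case <-l.done:
		return true
	default:
		return false
	}
}

func main() {
	l := newSessionLogger()
	l.Finalize()
	l.Finalize() // safe: the second call is a no-op
	fmt.Println("finalized:", l.isFinalized())
}
```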
This fixes the race with systemd reload.
P - parent, C - child
During a live reload,
the following happens:
P -> forks C
P -> blocks on pipe read
C -> writes to pipe
C -> writes pid file
P <- reads message from pipe
P <- shuts down
However, there is a race:
P -> forks C
P -> blocks on pipe read
C -> writes to pipe
P <- reads message from pipe
P <- shuts down
C -> writes pid file
In this case, the parent process exits
before the child process writes the new pid file,
which makes systemd think that the main process
is down and stop both processes.
This fix changes the sequence to:
P -> forks C
P -> blocks on pipe read
C -> writes pid file
C -> writes to pipe
P <- reads message from pipe
P <- shuts down
to make sure the race can't happen any more.
This commit allows the Teleport parent process to track
the status of the forked child process using os.Pipe.
The child process signals success to the parent process by writing
to the pipe.
This makes HUP and USR2 handling more intelligent, as the parent
can now detect the failure or success of the child process.
This PR improves session recording:
* Nodes and proxies always buffer recorded sessions
to disk during the session, which improves performance
and makes the recording more resilient to network failures.
* An async uploader running on the proxy or node always uploads the
session tarball to the audit log server.
* The audit log server is the only component uploading
to S3 or any other API.
ignore (incidentally, Go's runtime respects SIG_IGN by not setting a handler).
If the handler is reset unconditionally, no Go code can ask to be
notified of the Interrupt signal, as the system default handler
knows nothing about Go code.
fixes #1785, fixes #1776
This commit fixes several issues with output:
* The first teleport start now prints output
matching the quickstart guide and sets the default
console logging level to ERROR.
* The SIGCHLD handler now only collects the
PIDs of processes forked during live restart,
to avoid confusing other wait calls that
would otherwise find no process status left to collect.
Large directories with on-disk recordings
take a long time to migrate, so this patch
makes the migration asynchronous.
Do not use file modification time for audit log
search; parse the file name instead.
Updates #1755
Design
------
This commit adds support for pluggable event and
session recording storage, and adds several plugins.
If external session recording storage
is used, nodes or proxies (depending on the configuration)
store the session recordings locally and
then upload them in the background.
Non-print session events are always sent to the
remote auth server as usual.
If remote event storage is used, auth
servers download recordings from it during playback.
DynamoDB event backend
----------------------
A transient DynamoDB backend is added for event
storage. Events are stored with a default TTL of 1 year.
External lambda functions should be used
to forward events from DynamoDB.
Setting the audit_table_name parameter in the storage section
turns on the DynamoDB backend.
The table is created automatically.
S3 sessions backend
-------------------
If audit_sessions_uri is set to s3://bucket-name,
the node or proxy (depending on the recording mode)
will start uploading the recorded sessions
to the bucket.
If the bucket does not exist, Teleport will
attempt to create it with versioning and encryption
turned on by default.
Teleport turns on server-side encryption for the tarballs
using an aws:kms key.
File sessions backend
---------------------
If audit_sessions_uri is set to file:///folder,
Teleport will start writing tarballs to this folder instead
of sending records to the file server.
This is helpful for plugin writers who can use FUSE or NFS-mounted
storage to handle the data.
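Both backends are selected by the scheme of audit_sessions_uri. A minimal sketch using net/url; the sessionsTarget type and parseSessionsURI function are hypothetical, and bucket creation and file writing are left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// sessionsTarget is a hypothetical description of where recordings go.
type sessionsTarget struct {
	Kind   string // "s3" or "file"
	Bucket string // s3 only
	Dir    string // file only
}

// parseSessionsURI interprets audit_sessions_uri values such as
// "s3://bucket-name" and "file:///folder".
func parseSessionsURI(raw string) (sessionsTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return sessionsTarget{}, err
	}
	switch u.Scheme {
	case "s3":
		if u.Host == "" {
			return sessionsTarget{}, fmt.Errorf("%q: missing bucket name", raw)
		}
		return sessionsTarget{Kind: "s3", Bucket: u.Host}, nil
	case "file":
		if u.Path == "" {
			return sessionsTarget{}, fmt.Errorf("%q: missing directory", raw)
		}
		return sessionsTarget{Kind: "file", Dir: u.Path}, nil
	default:
		return sessionsTarget{}, fmt.Errorf("%q: unsupported scheme %q", raw, u.Scheme)
	}
}

func main() {
	for _, raw := range []string{"s3://bucket-name", "file:///folder"} {
		t, err := parseSessionsURI(raw)
		fmt.Printf("%s -> %+v %v\n", raw, t, err)
	}
}
```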
Working dynamic configuration.