This commit implements #1860
During the rotation procedure, the issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.
The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.
An operator can launch CA rotation in auto or manual mode.
In manual mode, the operator moves the cluster between rotation states
and watches the states of the components to sync.
In auto mode, state transitions happen automatically
on a specified schedule.
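A minimal sketch of the state-machine idea follows. The phase names and their order are hypothetical and chosen only for illustration; the actual phases and transitions are described in lib/auth/rotate.go and may differ.

```go
package main

import "fmt"

// Phase is a hypothetical rotation phase name used for illustration only.
type Phase string

const (
	Standby       Phase = "standby"
	Init          Phase = "init"
	UpdateClients Phase = "update_clients"
	UpdateServers Phase = "update_servers"
)

// next lists the single allowed forward transition out of each phase.
// The order is an assumption made for this sketch.
var next = map[Phase]Phase{
	Standby:       Init,
	Init:          UpdateClients,
	UpdateClients: UpdateServers,
	UpdateServers: Standby,
}

// Advance moves from the current phase to the target phase and rejects
// any transition that is not the next step in the sequence.
func Advance(current, target Phase) (Phase, error) {
	if next[current] != target {
		return current, fmt.Errorf("invalid transition %q -> %q", current, target)
	}
	return target, nil
}

func main() {
	phase := Standby
	// In manual mode an operator would request each step; in auto mode
	// a scheduler would request the same steps on a timer.
	for _, target := range []Phase{Init, UpdateClients, UpdateServers, Standby} {
		var err error
		if phase, err = Advance(phase, target); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("cluster moved to", phase)
	}
}
```

Each component can run the same check locally, so a component that missed a step refuses to jump ahead instead of silently skipping a phase.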
The design documentation is embedded in the code:
lib/auth/rotate.go
Updates #1755
Design
------
This commit adds support for pluggable events and
session recordings and adds several plugins.
If external session recording storage
is used, nodes or proxies, depending on configuration,
store the session recordings locally and
then upload the recordings in the background.
Non-print session events are always sent to the
remote auth server as usual.
If remote events storage is used, auth
servers download recordings from it during playback.
DynamoDB event backend
----------------------
A transient DynamoDB backend is added for events
storage. Events are stored with a default TTL of 1 year.
External lambda functions should be used
to forward events from DynamoDB.
The audit_table_name parameter in the storage section
turns on the DynamoDB backend.
The table will be created automatically.
S3 sessions backend
-------------------
If audit_sessions_uri is set to s3://bucket-name,
the node or proxy, depending on recording mode,
will start uploading the recorded sessions
to the bucket.
If the bucket does not exist, Teleport will
attempt to create a bucket with versioning and encryption
turned on by default.
Teleport will turn on bucket-side encryption for the tarballs
using an aws:kms key.
File sessions backend
---------------------
If audit_sessions_uri is set to file:///folder,
Teleport will start writing tarballs to this folder instead
of sending records to the file server.
This is helpful for plugin writers, who can use FUSE or NFS
mounted storage to handle the data.
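The choice between the S3 and file backends depends on the scheme of audit_sessions_uri. A minimal sketch of that dispatch, assuming a hypothetical `sessionBackend` type and `parseSessionsURI` helper (neither is taken from the Teleport code base):

```go
package main

import (
	"fmt"
	"net/url"
)

// sessionBackend describes where recordings go. The type and its
// fields are hypothetical.
type sessionBackend struct {
	Kind   string // "s3" or "file"
	Target string // bucket name or directory path
}

// parseSessionsURI picks a backend based on the URI scheme.
func parseSessionsURI(raw string) (sessionBackend, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return sessionBackend{}, err
	}
	switch u.Scheme {
	case "s3":
		// For s3://bucket-name the bucket is the host component.
		return sessionBackend{Kind: "s3", Target: u.Host}, nil
	case "file":
		// For file:///folder the directory is the path component.
		return sessionBackend{Kind: "file", Target: u.Path}, nil
	default:
		return sessionBackend{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{"s3://bucket-name", "file:///folder"} {
		b, err := parseSessionsURI(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s backend, target %s\n", raw, b.Kind, b.Target)
	}
}
```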
Working dynamic configuration.
This commit introduces signal handling.
The parent teleport process is now capable of forking
a child process and passing listener file descriptors
to the child.
The parent process can then shut down gracefully
by tracking the number of current connections and
closing listeners once that number drops to 0.
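The connection-tracking part of graceful shutdown can be sketched as a counter that wakes a waiter when it reaches zero. The `connTracker` type below is hypothetical and only illustrates the idea; it is not the commit's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// connTracker counts active connections. The type is hypothetical.
type connTracker struct {
	mu     sync.Mutex
	idle   *sync.Cond
	active int
}

func newConnTracker() *connTracker {
	t := &connTracker{}
	t.idle = sync.NewCond(&t.mu)
	return t
}

// Add records a newly accepted connection.
func (t *connTracker) Add() {
	t.mu.Lock()
	t.active++
	t.mu.Unlock()
}

// Done records a closed connection and wakes waiters when none remain.
func (t *connTracker) Done() {
	t.mu.Lock()
	t.active--
	if t.active == 0 {
		t.idle.Broadcast()
	}
	t.mu.Unlock()
}

// Active returns the current number of tracked connections.
func (t *connTracker) Active() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.active
}

// WaitIdle blocks until the number of active connections reaches 0.
func (t *connTracker) WaitIdle() {
	t.mu.Lock()
	for t.active > 0 {
		t.idle.Wait()
	}
	t.mu.Unlock()
}

func main() {
	tracker := newConnTracker()
	for i := 0; i < 3; i++ {
		tracker.Add()
		go tracker.Done() // simulate a connection finishing later
	}
	tracker.WaitIdle()
	fmt.Println("no active connections, closing listeners")
}
```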
Here are the signals handled:
* USR2 signal will cause the parent to fork
a child process and pass listener file descriptors to it.
The child process will close unused file descriptors
and will bind to the used ones.
At this moment two processes, the parent
and the forked child, will be serving requests.
After looking at the traffic and the log files, the
administrator can shut down either the parent process
or, if it is not functioning as expected,
the child process.
* TERM, INT signals will trigger graceful process shutdown.
Auth, node and proxy processes will wait until the number
of active connections goes down to 0 and will exit after that.
* KILL, QUIT signals will cause immediate non-graceful
shutdown.
* HUP signal combines USR2 and TERM signals in a convenient
way: the parent process will fork a child process and
self-initiate graceful shutdown. This is more convenient
than the USR2/TERM sequence, but less agile and robust:
if the connection to the parent process drops and
the new process exits with an error, administrators
can lock themselves out of the environment.
Additionally, the boltdb backend has to be phased out,
as it does not support reads/writes by two concurrent
processes. This required refactoring the dir
backend to use file locking to allow inter-process
collaboration on read/write operations.
Support configuration for web and reverse tunnel
proxies to listen on the same port.
* Default config is not changed, for backwards compatibility.
* If the administrator configures web and reverse tunnel
addresses to be on the same port, multiplexing is turned on.
* In trusted clusters configuration, reverse_tunnel_addr
defaults to web_addr.
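The rules above can be sketched as two small helpers: one that applies the reverse_tunnel_addr default and one that decides whether multiplexing is needed. The `proxyConfig` type, the helper names, and the example addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// proxyConfig is a hypothetical, simplified view of the proxy settings.
type proxyConfig struct {
	WebAddr           string
	ReverseTunnelAddr string
}

// tunnelAddrOrDefault falls back to the web address when no reverse
// tunnel address is given, as in the trusted clusters rule above.
func tunnelAddrOrDefault(webAddr, tunnelAddr string) string {
	if tunnelAddr == "" {
		return webAddr
	}
	return tunnelAddr
}

// multiplexEnabled reports whether both listeners share a port.
func multiplexEnabled(cfg proxyConfig) (bool, error) {
	_, webPort, err := net.SplitHostPort(cfg.WebAddr)
	if err != nil {
		return false, err
	}
	_, tunnelPort, err := net.SplitHostPort(cfg.ReverseTunnelAddr)
	if err != nil {
		return false, err
	}
	return webPort == tunnelPort, nil
}

func main() {
	cfg := proxyConfig{WebAddr: "0.0.0.0:3080"}
	cfg.ReverseTunnelAddr = tunnelAddrOrDefault(cfg.WebAddr, "")
	on, err := multiplexEnabled(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("multiplexing enabled:", on)
}
```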
This commit introduces mutual TLS authentication
for the auth server API.
The auth server multiplexes HTTP over SSH (the existing
protocol) and HTTP over TLS (the new protocol)
on the same listening socket.
Nodes and users authenticate with Teleport 2.5.0
using mutual TLS, except in backwards-compatibility
cases.
Instead of quietly changing behavior because the `DEBUG` environment variable was set to
true, Teleport now explicitly requires the scary --insecure flag to enable
this behavior.
This is to support Teleconsole/Telecast features, namely:
- When a user is added programmatically, it's actually returned.
- When a server is being created, it will not create users if
they exist already; instead it will just sign their public keys.
Teleport configuration now has a new field: NoAudit (false by default,
which means audit is always on).
When this option is set, Teleport will not record events and will not
record sessions.
It's implemented by adding "DiscardLogger", which implements the same
interface as the real logger and is plugged into the system instead.
NOTE: this option is not exposed in teleport in any way: no config file,
no switch, etc. I quickly needed it for Telecast.
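The DiscardLogger approach is the null-object pattern. A minimal sketch, assuming a hypothetical one-method `AuditLog` interface and an in-memory stand-in for the real logger (the actual interface is larger and is not shown here):

```go
package main

import "fmt"

// AuditLog is a hypothetical, simplified stand-in for the real logger interface.
type AuditLog interface {
	EmitEvent(eventType string, fields map[string]interface{}) error
}

// memoryLog stands in for the real logger and keeps event types in memory.
type memoryLog struct {
	events []string
}

func (m *memoryLog) EmitEvent(eventType string, _ map[string]interface{}) error {
	m.events = append(m.events, eventType)
	return nil
}

// DiscardLogger satisfies the same interface but drops everything.
type DiscardLogger struct{}

func (DiscardLogger) EmitEvent(string, map[string]interface{}) error {
	return nil
}

// newAuditLog picks the implementation based on the NoAudit setting.
func newAuditLog(noAudit bool) AuditLog {
	if noAudit {
		return DiscardLogger{}
	}
	return &memoryLog{}
}

func main() {
	for _, noAudit := range []bool{false, true} {
		lg := newAuditLog(noAudit)
		if err := lg.EmitEvent("example.event", nil); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("NoAudit=%v uses %T\n", noAudit, lg)
	}
}
```

Because callers depend only on the interface, no call site needs a NoAudit check; the choice is made once where the logger is constructed.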