Commit graph

160 commits

Author SHA1 Message Date
Sasha Klizhentas 8356ae6a74 Use in-memory cache for the auth server API.
This commit expands the usage of the caching layer
for auth server API:

* Introduces in-memory cache that is used to serve all
Auth server API requests. This is done to achieve scalability
on 10K+ node clusters, where each node fetches certificate authorities,
roles, users and join tokens. It is not possible to scale
DynamoDB backend or other backends to 10K reads per second
on a single shard or partition. The solution is to introduce
an in-memory cache of the backend state that is always used
for reads.

* In-memory cache has been expanded to support all resources
required by the auth server.

* Experimental `tctl top` command has been introduced to display
common single node metrics.

Replace SQLite Memory Backend with BTree

The SQLite in-memory backend suffered from
high tail latencies under load (up to 8 seconds
at the 99.9th percentile on load configurations).

This commit replaces the SQLite memory caching
backend with an in-memory BTree backend that
brought tail latencies down to 2 seconds (99.9th percentile)
and improved overall performance.
2019-04-12 14:23:09 -07:00
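The read path described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Backend` interface, `mapBackend`, and `Cache` are hypothetical names, a plain map stands in for the BTree, and the sketch does not handle writes made by other processes.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical stand-in for a slow persistent store such as DynamoDB.
type Backend interface {
	Put(key, value string) error
	GetAll() (map[string]string, error)
}

// mapBackend is a toy Backend that counts how often it is read.
type mapBackend struct {
	mu    sync.Mutex
	data  map[string]string
	reads int
}

func (b *mapBackend) Put(key, value string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data[key] = value
	return nil
}

func (b *mapBackend) GetAll() (map[string]string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.reads++
	out := make(map[string]string, len(b.data))
	for k, v := range b.data {
		out[k] = v
	}
	return out, nil
}

// Cache keeps a full in-memory copy of the backend state and serves every read from it.
type Cache struct {
	mu      sync.RWMutex
	backend Backend
	items   map[string]string
}

// NewCache loads the backend state once, at start-up.
func NewCache(b Backend) (*Cache, error) {
	items, err := b.GetAll()
	if err != nil {
		return nil, err
	}
	return &Cache{backend: b, items: items}, nil
}

// Get is answered from memory and never touches the backend.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

// Put writes through to the backend, then updates the in-memory copy.
func (c *Cache) Put(key, value string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.backend.Put(key, value); err != nil {
		return err
	}
	c.items[key] = value
	return nil
}

func main() {
	b := &mapBackend{data: map[string]string{"role/admin": "v1"}}
	c, err := NewCache(b)
	if err != nil {
		panic(err)
	}
	// Simulate 10K nodes fetching the same resource.
	for i := 0; i < 10000; i++ {
		c.Get("role/admin")
	}
	fmt.Println("backend reads:", b.reads)
}
```

In this sketch the backend is read once regardless of how many clients call `Get`, which is the property the commit message relies on for large clusters.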
Russell Jones ae074ede36 Always validate certificate (or key) algorithm.
Added utils.CertChecker that wraps a ssh.CertChecker. The new
certificate checker first checks if the certificate is a valid
certificate for Teleport. At the moment that is 2048-bit RSA. It then calls
the underlying certificate checker to perform the requested validation.
2019-03-19 17:47:53 -07:00
Russell Jones ac9af87dfb Emit data transfer events.
Created *utils.TrackingConn that wraps the server side net.Conn and is
used to track how much data is transmitted and received over the
net.Conn. At the close of a connection (close of a *srv.ServerContext)
the total data transmitted and received is emitted to the Audit Log.
2019-03-08 19:22:20 +00:00
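A byte-counting wrapper of the kind this commit describes can be sketched as below. The `trackingConn` type here is an illustrative stand-in, not the actual `*utils.TrackingConn`, and the audit log emission is replaced by simply returning the totals.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
)

// trackingConn wraps a net.Conn and counts bytes in both directions.
type trackingConn struct {
	net.Conn
	received    atomic.Uint64
	transmitted atomic.Uint64
}

// Read counts bytes received from the peer.
func (c *trackingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	c.received.Add(uint64(n))
	return n, err
}

// Write counts bytes transmitted to the peer.
func (c *trackingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	c.transmitted.Add(uint64(n))
	return n, err
}

// Stat returns the totals; a real server would emit them when the connection closes.
func (c *trackingConn) Stat() (rx, tx uint64) {
	return c.received.Load(), c.transmitted.Load()
}

// exchange runs a tiny ping/pong over an in-memory pipe and reports the server-side totals.
func exchange() (uint64, uint64, error) {
	client, server := net.Pipe()
	tc := &trackingConn{Conn: server}

	go func() {
		defer client.Close()
		client.Write([]byte("ping"))
		io.ReadFull(client, make([]byte, 6))
	}()

	if _, err := io.ReadFull(tc, make([]byte, 4)); err != nil {
		return 0, 0, err
	}
	if _, err := tc.Write([]byte("pong!!")); err != nil {
		return 0, 0, err
	}
	tc.Close()
	rx, tx := tc.Stat()
	return rx, tx, nil
}

func main() {
	rx, tx, err := exchange()
	if err != nil {
		panic(err)
	}
	fmt.Printf("received=%d transmitted=%d\n", rx, tx)
}
```

Embedding `net.Conn` keeps every other method (deadlines, addresses, `Close`) delegating to the wrapped connection, so only `Read` and `Write` need overriding.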
Russell Jones 7a62b25921 Validate host certificates in both tsh as well as the recording proxy.
Add IP addresses to host certificate.
2018-12-12 16:33:03 -08:00
Russell Jones ec7e53370d Fix formatting issues. 2018-10-19 16:25:01 -07:00
Sasha Klizhentas a451d626cd Add fast unmarshal and skip schema validation 2018-09-28 11:00:36 -07:00
Sasha Klizhentas b7242e2ad7 Faster implementation for checking host cert.
This commit improves checking of the host
certificate by reducing the number of times
the reverse tunnel polls the database and
by not fetching all certificate authorities at once.
2018-09-28 11:00:36 -07:00
Sasha Klizhentas e84bf10889 Batch get for tunnel connection, remote cluster
Use batch fetch for tunnel connections
and remote cluster objects to speed up
teleport in scenarios with many trusted clusters.
2018-09-28 11:00:36 -07:00
Sasha Klizhentas 02a33675ed Detect remote cluster by SNI name
This commit improves performance of teleport with
hundreds of connected trusted clusters.

TLS handshake protocol expects server to send a
list of trusted certificate authorities to the client
and client must present certificate signed by those.

In Teleport's current implementation, every remote cluster
client is signed by a local certificate and is not cross
signed.

Auth server now expects clients to announce the
remote cluster they are connecting from using SNI.

Auth server will send only certificate authorities
of the cluster announced via SNI.

Alternative idea is to cross sign the certificate
of the client of the remote cluster. We will explore
this idea in the next releases.

This commit also removes unnecessary reads
from the database to check the remote server status
that slows down user interface and other clients.

This is done at the expense of proxies showing
servers as offline if this individual
proxy does not have the connection, although
it's a small UI price to pay for not reading
the database, as proxy will eventually
get the connection thanks to the discovery
protocol.
2018-09-28 11:00:36 -07:00
Sasha Klizhentas 08ac5959f4 Remove verbose line
This commit fixes #2218
2018-09-14 13:22:09 -07:00
Sasha Klizhentas 3fd997bc92 Consider HTTP proxy setting in reverse tunnel 2018-08-31 14:38:13 -07:00
Sasha Klizhentas ff5cfc6b43 Add support for no_proxy environment variable
NO_PROXY or no_proxy environment variables
can be set to override variable HTTP_PROXY
or HTTPS_PROXY. Current implementation
is taken from Go standard library.
2018-08-20 15:53:19 -07:00
Sasha Klizhentas 1f3b4e2c96 Kubernetes configuration, fetch proxy settings.
This commit moves proxy kubernetes configuration
to a separate nested block to provide more fine
grained settings:

```yaml
auth:
  kubernetes_ca_cert_path: /tmp/custom-ca
proxy:
  enabled: yes
  kubernetes:
    enabled: yes
    public_addr: [custom.example.com:port]
    api_addr: kubernetes.example.com:443
    listen_addr: localhost:3026
```

1. Kubernetes config section is explicitly enabled
and disabled. It is disabled by default.

2. Public address in kubernetes section
is propagated to tsh profile

The other part of the commit updates Ping
endpoint to send proxy configuration back to
the client, including kubernetes public address
and ssh listen address.

Clients update their profile according to the configuration
received from the proxy.
2018-08-06 11:57:36 -07:00
Sasha Klizhentas 031168bbd4 Add readyz endpoint and clusters metrics.
This commit fixes #1610.

A new readyz endpoint is added to the existing
/metrics and /healthz endpoints activated by
the --diag-addr flag:

`teleport start --diag-addr=127.0.0.1:1234`

Readyz endpoint will report 503 if node or
proxy failed to connect to the cluster and 200 OK
otherwise.

Additional Prometheus gauges report connection
count for trusted and remote clusters:

```
remote_clusters{cluster="one"} 1
remote_clusters{cluster="two"} 1

trusted_clusters{cluster="one",state="connected"} 0
trusted_clusters{cluster="one",state="connecting"} 0
trusted_clusters{cluster="one",state="disconnected"} 0
trusted_clusters{cluster="one",state="discovered"} 1
trusted_clusters{cluster="one",state="discovering"} 0
```
2018-07-20 19:01:15 -07:00
Sasha Klizhentas 66fa34bcde Add framework for trusted cluster K8s access 2018-06-22 12:56:58 -07:00
Russell Jones a62102c3e8 Add ability to detect when a proxy has been removed forever to discovery
protocol.
2018-06-21 23:14:52 +00:00
Sasha Klizhentas cece4be212 Initial implementation of Kubernetes support
This commit updates #1986.

This is an initial, experimental implementation that will
be updated with tests and edge cases prior to the production 2.7.0 release.

Teleport proxy adds support for Kubernetes API protocol.
Auth server uses Kubernetes API to receive certificates
issued by Kubernetes CA.

Proxy intercepts and forwards API requests to the Kubernetes
API server and captures live session traffic, making
recordings available in the audit log.

Tsh login now updates kubeconfig configuration to use
Teleport as a proxy server.
2018-06-03 12:55:13 -07:00
Russell Jones 367f1572d0 Set ciphers in reversetunnel server. 2018-06-01 21:31:46 +00:00
Sasha Klizhentas 3e144cb900 Teleport certificate authority rotation.
This commit implements #1860

During the rotation procedure, the issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.

The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.

Operator can launch CA rotation in auto or manual modes.

In manual mode the operator moves the cluster between rotation states
and watches the states of the components to sync.

In auto mode state transitions are happening automatically
on a specified schedule.

The design documentation is embedded in the code:

lib/auth/rotate.go
2018-04-30 12:58:57 -07:00
Russell Jones 6be8af16c5 Removed deprecated code and refactored tests to use
golang.org/x/crypto.
2018-04-05 23:14:20 +00:00
Russell Jones bfb4c41891 Refactor code to use updated interfaces for golang.org/x/crypto/ssh. 2018-04-05 22:49:46 +00:00
Sasha Klizhentas bad1b0498d External events and sessions storage.
Updates #1755

Design
------

This commit adds support for pluggable events and
sessions recordings and adds several plugins.

If external sessions recording storage
is used, nodes or proxies depending on configuration
store the session recordings locally and
then upload the recordings in the background.

Non-print session events are always sent to the
remote auth server as usual.

If remote events storage is used, auth
servers download recordings from it during playbacks.

DynamoDB event backend
----------------------

Transient DynamoDB backend is added for events
storage. Events are stored with default TTL of 1 year.

External lambda functions should be used
to forward events from DynamoDB.

Parameter audit_table_name in storage section
turns on dynamodb backend.

The table will be auto created.

S3 sessions backend
-------------------

If audit_sessions_uri is specified to s3://bucket-name
node or proxy depending on recording mode
will start uploading the recorded sessions
to the bucket.

If the bucket does not exist, teleport will
attempt to create a bucket with versioning and encryption
turned on by default.

Teleport will turn on bucket-side encryption for the tarballs
using aws:kms key.

File sessions backend
---------------------

If audit_sessions_uri is specified to file:///folder
teleport will start writing tarballs to this folder instead
of sending records to the file server.

This is helpful for plugin writers who can use fuse or NFS
mounted storage to handle the data.

Working dynamic configuration.
2018-03-15 12:42:43 -07:00
Russell Jones b139f72cab Create single instance of keygen per process. Use cache of precomputed
certificates when using recording proxy.
2018-02-15 21:23:30 +00:00
Sasha Klizhentas 68b65f5b24 Teleport signal handling and live reload.
This commit introduces signal handling.
Parent teleport process is now capable of forking
the child process and passing listeners file descriptors
to the child.

Parent process then can gracefully shutdown
by tracking the number of current connections and
closing listeners once the number goes to 0.

Here are the signals handled:

* USR2 signal will cause the parent to fork
a child process and pass listener file descriptors to it.
Child process will close unused file descriptors
and will bind to the used ones.

At this moment two processes - the parent
and the forked child process will be serving requests.
After looking at the traffic and the log files,
administrator can either shut down the parent process
or the child process if the child process is not functioning
as expected.

* TERM, INT signals will trigger graceful process shutdown.
Auth, node and proxy processes will wait until the number
of active connections goes down to 0 and will exit after that.

* KILL, QUIT signals will cause immediate non-graceful
shutdown.

* HUP signal combines USR2 and TERM signals in a convenient
way: parent process will fork a child process and
self-initiate graceful shutdown. This is more convenient
than the USR2/TERM sequence, but less agile and robust:
if the connection to the parent process drops and
the new process exits with an error, administrators
can lock themselves out of the environment.

Additionally, boltdb backend has to be phased out,
as it does not support read/writes by two concurrent
processes. This required refactoring of the dir
backend to use file locking to allow inter-process
collaboration on read/write operations.
2018-02-13 15:18:47 -08:00
Russell Jones 6ef46821d0 During proxy transport, if no remote Auth Servers are found, log a
warning and exit.
2018-01-17 21:30:24 +00:00
Sasha Klizhentas b82336ae06 Use gzip for session recordings, fixes #1579
* Session recordings are created with gzip compression.
* Migration compresses old recordings and converts to new format.
2018-01-15 13:34:01 -08:00
Russell Jones 1c65651658 In-memory forwarding servers now have a random server_id. 2018-01-12 19:33:23 +00:00
Sasha Klizhentas ef473d809e Join address for web, reverse tunnel, fixes #1544
Support configuration for web and reverse tunnel
proxies to listen on the same port.

* Default config is not changed for backwards compatibility.
* If administrator configures web and reverse tunnel
addresses to be on the same port, multiplexing is turned on
* In trusted clusters configuration reverse_tunnel_addr
defaults to web_addr.
2018-01-05 16:20:56 -08:00
Sasha Klizhentas 71c15e5835 Add support for NFS-friendly log protocol.
* Session events are delivered in continuous
batches in a guaranteed order with every event
and print event ordered from session start.

* Each auth server writes to a separate folder
on disk to make sure that no two processes write
to the same file at a time.

* When retrieving sessions, auth servers fetch
and merge results recorded by each auth server.

* Migrations and compatibility modes are in place
for older clients not aware of the new format,
but compatibility mode is not NFS friendly.

* On disk migrations are launched automatically
during auth server upgrades.
2018-01-04 18:54:37 -08:00
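The "fetch and merge results recorded by each auth server" step could be sketched as below, assuming every event carries an index counted from session start. The `sessionEvent` fields and `mergeEvents` are hypothetical; the commit message does not describe the actual file layout or event schema.

```go
package main

import (
	"fmt"
	"sort"
)

// sessionEvent is an illustrative event record.
// Index is the event's position counted from session start.
type sessionEvent struct {
	Index  int
	Server string
	Data   string
}

// mergeEvents combines the per-auth-server event lists of one session
// and restores the global order using the event index.
func mergeEvents(perServer map[string][]sessionEvent) []sessionEvent {
	var all []sessionEvent
	for _, events := range perServer {
		all = append(all, events...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Index < all[j].Index })
	return all
}

func main() {
	// Each auth server wrote to its own folder, so each holds part of the session.
	perServer := map[string][]sessionEvent{
		"auth-1": {{Index: 0, Server: "auth-1", Data: "session.start"}, {Index: 3, Server: "auth-1", Data: "session.end"}},
		"auth-2": {{Index: 1, Server: "auth-2", Data: "print"}, {Index: 2, Server: "auth-2", Data: "resize"}},
	}
	for _, e := range mergeEvents(perServer) {
		fmt.Printf("%d %s (%s)\n", e.Index, e.Data, e.Server)
	}
}
```

Because no two servers write to the same file, merging only needs the per-event index; no locking across the shared filesystem is required in this sketch.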
Sasha Klizhentas e114fbd46c Add support for remote_cluster, implements #1526
This commit adds remote cluster resource that specifies
connection and trust of the remote trusted cluster to the local
cluster. Deleting remote cluster resource deletes trust
established between clusters on the local cluster side
and terminates all reverse tunnel connections.

Migrations make sure that remote cluster resources exist
after upgrade of the auth server.
2017-12-28 17:48:30 -08:00
Sasha Klizhentas 0130c6aa41 Mutual TLS Auth server and clients.
This commit introduced mutual TLS authentication
for auth server API server.

Auth server multiplexes HTTP over SSH - existing
protocol and HTTP over TLS - new protocol
on the same listening socket.

Nodes and users authenticate with 2.5.0 Teleport
using mutual TLS except in backwards-compatibility
cases.
2017-12-27 11:37:19 -08:00
Russell Jones 3bfe61dc0b Added integration tests and minor fixes. 2017-12-19 17:40:05 -08:00
Russell Jones a56b0870a7 Added the ability to generate host certificates to tctl. 2017-12-09 19:37:49 +00:00
Russell Jones 37ab1596c4 Updated reverse tunnel to allow use of forwarding server. 2017-12-09 19:29:20 +00:00
mricher b58cb051e8 Correct various typos
This was fixed running the `misspell` linter in fix mode using
`gometalinter`. The exact command I ran was :
```
gometalinter --vendor --disable-all -E misspell --linter='misspell:misspell -w {path}:^(?P<path>.*?\.go):(?P<line>\d+):(?P<col>\d+):\s*(?P<message>.*)$' ./...
```

Some typos were fixed by hand on top of it.
2017-10-20 10:20:26 +02:00
Sasha Klizhentas 7b87c73f6b fix cluster name fix 2017-10-19 00:36:32 +00:00
Russell Jones 3634291bd9 Add ClusterName to discovery request. 2017-10-19 00:36:03 +00:00
Sasha Klizhentas 039249507d update according to code review comments 2017-10-13 19:26:49 -07:00
Sasha Klizhentas 4b36d77f31 remove data race on channel close 2017-10-13 10:21:10 -07:00
Sasha Klizhentas 6471bc32da fix data race 2017-10-13 09:11:13 -07:00
Sasha Klizhentas b2ed270bb6 fix data race and update lock file digest 2017-10-12 17:38:58 -07:00
Sasha Klizhentas e461b4e6bd fix tests 2017-10-12 16:51:18 -07:00
Sasha Klizhentas 7b82e31150 add fast and slow pace tickers 2017-10-11 17:23:03 -07:00
Sasha Klizhentas e82ac5601a tweak the docs 2017-10-10 09:26:33 -07:00
Sasha Klizhentas aa62a1d627 document the discovery algo 2017-10-09 19:59:14 -07:00
Sasha Klizhentas f12024031a more work on logging and stats 2017-10-09 18:58:24 -07:00
Sasha Klizhentas a55116dd00 fixes before revendoring 2017-10-09 15:56:18 -07:00
Sasha Klizhentas eb4cfa12d9 refactoring complete 2017-10-08 23:07:39 -07:00
Sasha Klizhentas bb5f77854e before refactoring 2017-10-08 18:07:01 -07:00
Sasha Klizhentas d3f05872cc fix some backend problems 2017-10-08 12:42:16 -07:00