Commit graph

100 commits

Author SHA1 Message Date
Andrew Lytvynov 0a03d3b70d Ensure that all integration.TeleInstance processes get cleaned up
TeleInstance manages an auth server and a set of proxies/nodes.
TeleInstance.Stop only stops the auth server. A bunch of tests used it
assuming it also cleans up any running nodes.
This has caused a lot of log spam from failing heartbeats and generally
wasted CPU cycles.

Rename it to Stop to StopAuth to make it's purpose more obvious. Add
TeleInstance.StopAll that cleans up everything, suitable for deferring
in tests.
2020-04-17 21:25:47 +00:00
Andrew Lytvynov d1ea40d074 Enable linters: deadcode,goimports,govet,typecheck
And fix the relevant findings for these linters.

Also, set extra flags for `golangci-lint run` to make sure no findings
are suppressed.
2020-04-17 17:46:51 +00:00
Andrew Lytvynov b994920aa8
Add make rules for linting (#3563)
Top-level `make lint` rule that scans everything and a CI-specific rule
for Jenkins.
Currently only enable "unused", since it's reliable. The list will
expand.

Also clean up stragglers that somehow slipped through in #3552.

Updates #3551
2020-04-10 11:37:09 -07:00
Vladimir Kochnev 690052ec13 Expose auth server in integration tests.
For plugins to connect to auth server, its public_addr should be set.
2020-01-30 11:21:10 -08:00
Sasha Klizhentas 83d0f7e7bb Fix role mapping for trusted clusters
This commit fixes #3252

Security patches 4.2 introduced a regression - leaf clusters ignore role mapping
and attempt to use role names coming from identity of the root cluster
whenever GetNodes method was used.

This commit reverts back the logic, however it ensures that the original
fix is preserved - traits and groups are updated on the user object.

Integration test has been extended to avoid the regression in the future.
2020-01-15 12:57:33 -08:00
Russell Jones 9135a5ade7 Use roles and traits in certificate for RBAC.
If an attacker can force a username change at an IdP, upon second login,
the services.User object of the original user can be updated with new
roles and traits. If these new roles and traits differ, the original
user can have their privileges raised (or lowered).

To mitigate this, encode roles and traits within the certificate and use
these when fetching roles to make RBAC decisions. If roles and traits are
not encoded within an certificate (for example for old style SSH
certificates then fallback to using the services.User object and log a
warning.
2019-09-03 13:44:20 -07:00
Sasha Klizhentas 838e75468c Add support for ProxyJump.
This commit implements #2543

In SSH terms ProxyJump is a shortcut for SSH client
connecting the proxy/jumphost and requesting .port forwarding to the
target node.

This commit adds support for direct-tcpip request support
in teleport proxy service that is an alias to the existing proxy
subsystem and reuses most of the code.

This commit also adds support to "route to cluster" metadata
encoded in SSH certificate making it possible to have client
SSH certificates to include the metadata that will cause the proxy
to route the client requests to a specific cluster.

`tsh ssh -J proxy:port ` is supported in a limited way:

Only one jump host is supported (-J supports chaining
that teleport does not utilise) and tsh will return with error
in case of two jumphosts: -J a,b will not work.

In case if `tsh ssh -J user@proxy` is used, it overrides
the SSH proxy coming from the tsh profile and port-forwarding
is used instead of the existing teleport proxy subsystem
2019-07-26 10:58:11 -07:00
Forrest Marshall cda5a39d28 Add TLS certs to user identity files
- Updates the identity files exported by `tctl auth sign` to include the
user's TLS certificate, as well as the set of available TLS root CA
certs.

- Adds a new GRPC-based auth server method, `GenerateUserCerts`, which
exports both SSH and TLS based certs.
2019-07-03 12:54:03 -07:00
Russell Jones 1bf2bbc5fd Configurable timeout for tunnel offline threshold.
Added support for a configurable offline threshold based off the keep
alive interval and max count for marking a connection from an agent as
offline.
2019-05-22 09:36:18 -07:00
Sasha Klizhentas e3ca4df5fc Simplify IOT reverse tunnel logic.
In case of IOT (whenever teleport nodes are
connecting to the proxy), there is no need
to create ReverseTunnel objects in the backend,
as there is always one reverse tunnel per node.

This commit removes the logic that created
reverse tunnel object in the backed in IOT cases
and refactors some other parts of the code.
2019-05-03 10:51:06 -07:00
Russell Jones 6d1c16f745 Added support for nodes dialing back to cluster.
Updated services.ReverseTunnel to support type (proxy or node). For
proxy types, which represent trusted cluster connections, when a
services.ReverseTunnel is created, it's created on the remote side with
name /reverseTunnels/example.com. For node types, services.ReverseTunnel
is created on the main side as /reverseTunnels/{nodeUUID}.clusterName.

Updated services.TunnelConn to support type (proxy or node). For proxy
types, which represent trusted cluster connections, tunnel connections
are created on the main side under
/tunnelConnections/remote.example.com/{proxyUUID}-remote.example.com.
For nodes, tunnel connections are created on the main side under
/tunnelConnections/example.com/{proxyUUID}-example.com. This allows
searching for tunnel connections by cluster then allows easily creating
a set of proxies that are missing matching services.TunnelConn.

The reverse tunnel server has been updated to handle heartbeats from
proxies as well as nodes. Proxy heartbeat behavior has not changed.
Heartbeats from nodes now add remote connections to the matching local
site. In addition, the reverse tunnel server now proxies connection to
the Auth Server for requests that are already authenticated (a second
authentication to the Auth Server is required).

For registration, nodes try and connect to the Auth Server to fetch host
credentials. Upon failure, nodes now try and fallback to fetching host
credentials from the web proxy.

To establish a connection to an Auth Server, nodes first try and connect
directly, and if the connection fails, fallback to obtaining a
connection to the Auth Server through the reverse tunnel. If a
connection is established directly, node startup behavior has not
changed. If a node establishes a connection through the reverse tunnel,
it creates an AgentPool that attempts to dial back to the cluster and
establish a reverse tunnel.

When nodes heartbeat, they also heartbeat if they are connected directly
to the cluster or through a reverse tunnel. For nodes that are connected
through a reverse tunnel, the proxy subsystem now directs the reverse
tunnel server to establish a connection through the reverse tunnel
instead of directly.

When sending discovery requests, the domain field has been replaced with
tunnelID. The tunnelID field is either the cluster name (same as before)
for proxies, or {nodeUUID}.example.com for nodes.
2019-04-26 15:41:45 -07:00
Sasha Klizhentas 8356ae6a74 Use in-memory cache for the auth server API.
This commit expands the usage of the caching layer
for auth server API:

* Introduces in-memory cache that is used to serve all
Auth server API requests. This is done to achieve scalability
on 10K+ node clusters, where each node fetches certificate authorities,
roles, users and join tokens. It is not possible to scale
DynamoDB backend or other backends on 10K reads per seconds
on a single shard or partition. The solution is to introduce
an in-memory cache of the backend state that is always used
for reads.

* In-memory cache has been expanded to support all resources
required by the auth server.

* Experimental `tctl top` command has been introduced to display
common single node metrics.

Replace SQLite Memory Backend with BTree

SQLite in memory backend was suffering from
high tail latencies under load (up to 8 seconds
in 99.9%-ile on load configurations).

This commit replaces the SQLite memory caching
backend with in-memory BTree backend that
brought down tail latencies to 2 seconds (99.9%-ile)
and brought overall performance improvement.
2019-04-12 14:23:09 -07:00
Russell Jones ae074ede36 Always validate certificate (or key) algorithm.
Added utils.CertChecker that wraps a ssh.CertChecker. The new
certificate checker first checks if the certificate is a valid
certificate for Teleport. At the moment that is 2048-bit RSA then calls
the underlying certificate checker to perform the requested validation.
2019-03-19 17:47:53 -07:00
Russell Jones 7a62b25921 Validate host certificates in both tsh as well as the recording proxy.
Add IP addresses to host certificate.
2018-12-12 16:33:03 -08:00
Sasha Klizhentas f40df845db Events and GRPC API
This commit introduces several key changes to
Teleport backend and API infrastructure
in order to achieve scalability improvements
on 10K+ node deployments.

Events and plain keyspace
--------------------------

New backend interface supports events,
pagination and range queries
and moves away from buckets to
plain keyspace, what better aligns
with DynamoDB and Etcd featuring similar
interfaces.

All backend implementations are
exposing Events API, allowing
multiple subscribers to consume the same
event stream and avoid polling database.

Replacing BoltDB, Dir with SQLite
-------------------------------

BoltDB backend does not support
having two processes access the database at the
same time. This prevented Teleport
using BoltDB backend to be live reloaded.

SQLite supports reads/writes by multiple
processes and makes Dir backend obsolete
as SQLite is more efficient on larger collections,
supports transactions and can detect data
corruption.

Teleport automatically migrates data from
Bolt and Dir backends into SQLite.

GRPC API and protobuf resources
-------------------------------

GRPC API has been introduced for
the auth server. The auth server now serves both GRPC
and JSON-HTTP API on the same TLS socket and uses
the same client certificate authentication.

All future API methods should use GRPC and HTTP-JSON
API is considered obsolete.

In addition to that some resources like
Server and CertificateAuthority are now
generated from protobuf service specifications in
a way that is fully backward compatible with
original JSON spec and schema, so the same resource
can be encoded and decoded from JSON, YAML
and protobuf.

All models should be refactored
into new proto specification over time.

Streaming presence service
--------------------------

In order to cut bandwidth, nodes
are sending full updates only when changes
to labels or spec have occured, otherwise
new light-weight GRPC keep alive updates are sent
over to the presence service, reducing
bandwidth usage on multi-node deployments.

In addition to that nodes are no longer polling
auth server for certificate authority rotation
updates, instead they subscribe to event updates
to detect updates as soon as they happen.

This is a new API, so the errors are inevitable,
that's why polling is still done, but
on a way slower rate.
2018-12-10 17:20:24 -08:00
Cove Schneider 8b299e9c28 spelling cleanup 2018-11-15 12:44:51 -08:00
Russell Jones 97074076cb Split public_addr into web_proxy_addr and ssh_proxy_addr. 2018-08-31 16:33:54 -07:00
Russell Jones 3d9c34f1f0 Don't pass and clone client *tls.Config, instead pass cipher suites and
create new *tls.Config. Add test coverage for this.
2018-08-21 17:09:57 -07:00
Russell Jones 617b35128c Made GetNodes identity aware to only return nodes which that user has
access to. In debug mode, made RBAC failures more verbose.
2018-08-20 16:10:35 -07:00
Sasha Klizhentas 1f3b4e2c96 Kubernetes configuration, fetch proxy settings.
This commit moves proxy kubernetes configuration
to a separate nested block to provide more fine
grained settings:

```yaml
auth:
  kubernetes_ca_cert_path: /tmp/custom-ca
proxy:
  enabled: yes
  kubernetes:
    enabled: yes
    public_addr: [custom.example.com:port]
    api_addr: kuberentes.example.com:443
    listen_addr: localhost:3026
```

1. Kubernetes config section is explicitly enabled
and disabled. It is disabled by default.

2. Public address in kubernetes section
is propagated to tsh profile

The other part of the commit updates Ping
endpoint to send proxy configuration back to
the client, including kubernetes public address
and ssh listen address.

Clients updates profile accordingly to configuration
received from the proxy.
2018-08-06 11:57:36 -07:00
Russell Jones ce1c7476b9 Updated dir backend to a flat keyspace. Added UpsertItems endpoint to
all backends to support bulk insertion. Added UpsertNodes endpoint,
which is used by the state cache to speed up GetNodes.
2018-07-13 20:12:34 +00:00
Sasha Klizhentas 95dcc8bbe7 Introduce disconnect client logic.
This commit implements #1935, fixes #2038

Auth server now supports global
defaults for timeout behavior:

```
auth_service:
  client_idle_timeout:  15m
  disconnect_expired_cert: no
```

New role options were introduced:

```
kind: role
version: v3
metadata:
  name: intern
  spec:
    options:
    # these two settings override the global ones:
    client_idle_timeout:  1m
    disconnect_expired_cert: yes
```
2018-07-12 19:00:13 -07:00
Sasha Klizhentas e570b24eeb Update tsh login to select clusters.
The following changes have been introduced
to tsh login behavior:

1. tsh login now accepts cluster name
as an optional positional argument:

$ tsh login clustername

2. If tsh login is called without arguments
and the current credentials are valid,
tsh login now prints status, previous behavior
always forced login:

$ tsh login
... print status if logged in...

2. If tsh login is called with the proxy
equal to current, tsh login selects cluster,
otherwise it will re-login to another proxy:

$ tsh login one
... selected cluster one

$ tsh login two
... selected cluster two

$ tsh login --proxy=example.com three
... selected cluster three because
proxy is the same

$ tsh login --proxy=acme.example.com four
...will switch to proxy acme.example.com
and cluster four
2018-06-25 18:28:46 -07:00
Russell Jones a62102c3e8 Add ability to detect when a proxy has been removed forever to discovery
protocol.
2018-06-21 23:14:52 +00:00
Sasha Klizhentas 03069a2aad Kubernetes proxy integration tests.
This PR contains Kubernetes proxy
integration tests and associated internal changes.
2018-06-14 16:47:52 -07:00
Sasha Klizhentas 074961892a Precompute keys only for auth and proxies.
This commit fixes #1886

Previously the code was precomputing keys
even for SSH nodes, that do not need precomputed
private keys pool.
2018-05-04 13:41:13 -07:00
Sasha Klizhentas 3e144cb900 Teleport certificate authority rotation.
This commit implements #1860

During the the rotation procedure issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.

The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.

Operator can launch CA rotation in auto or manual modes.

In manual mode operator moves cluster bewtween rotation states
and watches the states of the components to sync.

In auto mode state transitions are happening automatically
on a specified schedule.

The design documentation is embedded in the code:

lib/auth/rotate.go
2018-04-30 12:58:57 -07:00
Russell Jones 6be8af16c5 Removed depreciated code and re-factored tests to use
golang.org/x/crypto.
2018-04-05 23:14:20 +00:00
Russell Jones 9454d0133a Create context once either "session" or "direct-tcpip" channel has been
opened in the forwarding server.
2018-04-02 15:04:48 -07:00
Sasha Klizhentas bad1b0498d External events and sessions storage.
Updates #1755

Design
------

This commit adds support for pluggable events and
sessions recordings and adds several plugins.

In case if external sessions recording storage
is used, nodes or proxies depending on configuration
store the session recordings locally and
then upload the recordings in the background.

Non-print session events are always sent to the
remote auth server as usual.

In case if remote events storage is used, auth
servers download recordings from it during playbacks.

DynamoDB event backend
----------------------

Transient DynamoDB backend is added for events
storage. Events are stored with default TTL of 1 year.

External lambda functions should be used
to forward events from DynamoDB.

Parameter audit_table_name in storage section
turns on dynamodb backend.

The table will be auto created.

S3 sessions backend
-------------------

If audit_sessions_uri is specified to s3://bucket-name
node or proxy depending on recording mode
will start uploading the recorded sessions
to the bucket.

If the bucket does not exist, teleport will
attempt to create a bucket with versioning and encryption
turned on by default.

Teleport will turn on bucket-side encryption for the tarballs
using aws:kms key.

File sessions backend
---------------------

If audit_sessions_uri is specified to file:///folder
teleport will start writing tarballs to this folder instead
of sending records to the file server.

This is helpful for plugin writers who can use fuse or NFS
mounted storage to handle the data.

Working dynamic configuration.
2018-03-15 12:42:43 -07:00
Sasha Klizhentas 68b65f5b24 Teleport signal handling and live reload.
This commit introduces signal handling.
Parent teleport process is now capable of forking
the child process and passing listeners file descriptors
to the child.

Parent process then can gracefully shutdown
by tracking the amount of current connections and
closing listeners once the amount goes to 0.

Here are the signals handled:

* USR2 signal will cause the parent to fork
a child process and pass listener file descriptors to it.
Child process will close unused file descriptors
and will bind to the used ones.

At this moment two processes - the parent
and the forked child process will be serving requests.
After looking at the traffic and the log files,
administrator can either shut down the parent process
or the child process if the child process is not functioning
as expected.

* TERM, INT signals will trigger graceful process shutdown.
Auth, node and proxy processes will wait until the amount
of active connections goes down to 0 and will exit after that.

* KILL, QUIT signals will cause immediate non-graceful
shutdown.

* HUP signal combines USR2 and TERM signals in a convenient
way: parent process will fork a child process and
self-initate graceful shutdown. This is a more convenient
than USR2/TERM sequence, but less agile and robust
as if the connection to the parent process drops, but
the new process exits with error, administrators
can lock themselves out of the environment.

Additionally, boltdb backend has to be phased out,
as it does not support read/writes by two concurrent
processes. This had required refactoring of the dir
backend to use file locking to allow inter-process
collaboration on read/write operations.
2018-02-13 15:18:47 -08:00
Russell Jones cefe4ff7f1 Increased delay when in TestTwoClusters and some more debugging logs in
integration tests.
2018-02-13 12:36:52 -08:00
Russell Jones 9a3aa9999c Wait for *Ready events for all Start* functions for TeleInstance.
Increase delay to 250 millisecond between runCommands.
2018-02-12 17:26:44 -08:00
Russell Jones 1af26fe9c1 Wrap connection used in HTTP CONNECT proxy to first read any buffered
data then ready from the underlying connection.
2018-02-09 21:33:08 +00:00
Russell Jones f2b8bbd1c1 Added *Ready events that indicate a service has started. Wait on these
events in integration events before starting a test.
2018-02-06 16:52:46 -08:00
Sasha Klizhentas bb9b00e451 Cache recently accessed items.
Introduce cache for items that were accessed
by proxies and nodes within 2 second window to reduce
load on database under high load.
2018-01-31 16:35:18 -08:00
Russell Jones b3d4d36fde Added cert_format to role as well as tsh to control how a certificate is
generated.
2018-01-09 14:57:35 -08:00
Sasha Klizhentas ef473d809e Join address for web, reverse tunnel, fixes #1544
Support configuration for web and reverse tunnel
proxies to listen on the same port.

* Default config are not changed for backwards compatibility.
* If administrator configures web and reverse tunnel
addresses to be on the same port, multiplexing is turned on
* In trusted clusters configuration reverse_tunnel_addr
defaults to web_addr.
2018-01-05 16:20:56 -08:00
Sasha Klizhentas 0130c6aa41 Mutual TLS Auth server and clients.
This commit introduced mutual TLS authentication
for auth server API server.

Auth server multiplexes HTTP over SSH - existing
protocol and HTTP over TLS - new protocol
on the same listening socket.

Nodes and users authenticate with 2.5.0 Teleport
using TLS mutual TLS except backwards-compatibility
cases.
2017-12-27 11:37:19 -08:00
Russell Jones 3bfe61dc0b Added integration tests and minor fixes. 2017-12-19 17:40:05 -08:00
Russell Jones 37ab1596c4 Updated reverse tunnel to allow use to forwarding server. 2017-12-09 19:29:20 +00:00
Russell Jones 7018852c5d Added forwarding SSH server. 2017-12-04 17:01:52 -08:00
Russell Jones 4765e32473 Updated ClusterConfig to V3. 2017-10-26 12:34:51 -07:00
Russell Jones 432a7ad787 Added services.ClusterConfig resource which controls where (and if) a
session is recorded.
2017-10-25 21:09:21 +00:00
Sasha Klizhentas e461b4e6bd fix tests 2017-10-12 16:51:18 -07:00
Sasha Klizhentas 0290cccb57 integration tests for proxies 2017-10-12 10:35:46 -07:00
Ev Kontsevoy 93f7dd3bf9 Better handling of "development mode"
Instead of quietly changing behavior because `DEBUG` envar was set to
true, Teleport now explicitly requires scary --insecure flag to enable
this behavior.
2017-09-10 13:45:14 -07:00
Russell Jones c543067001 Removed namespaces and expires from user interface. 2017-08-30 18:11:13 +00:00
Sasha Klizhentas 8b81a0c384 Migrate to golang/dep for dependency management
Update following packages:

* Replace Sirupsen/log with sirupsen/log everywhere
* Update etcd client to 3.2.4
* Update docker/term to moby/term
* Update kr/pty to v1.0.0 release
* Update K8s client to 2.0
2017-08-22 15:30:30 -07:00
Russell Jones b4c805fe23 Re-factored cluster configuration. 2017-08-07 17:20:16 -07:00