Commit graph

106 commits

Author SHA1 Message Date
Sasha Klizhentas 8356ae6a74 Use in-memory cache for the auth server API.
This commit expands the usage of the caching layer
for auth server API:

* Introduces in-memory cache that is used to serve all
Auth server API requests. This is done to achieve scalability
on 10K+ node clusters, where each node fetches certificate authorities,
roles, users and join tokens. It is not possible to scale
DynamoDB backend or other backends to 10K reads per second
on a single shard or partition. The solution is to introduce
an in-memory cache of the backend state that is always used
for reads.

* In-memory cache has been expanded to support all resources
required by the auth server.

* Experimental `tctl top` command has been introduced to display
common single node metrics.

Replace SQLite Memory Backend with BTree

SQLite in memory backend was suffering from
high tail latencies under load (up to 8 seconds
in 99.9%-ile on load configurations).

This commit replaces the SQLite memory caching
backend with an in-memory BTree backend that
brought tail latencies down to 2 seconds (99.9%-ile)
and brought overall performance improvement.
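
A minimal sketch of the read path described in this commit, assuming a hypothetical `Backend` interface and `Cache` type (neither is Teleport's actual API). The backend is read once to fill the cache and every later read is served from memory; a plain map stands in for the BTree here.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical stand-in for a slow remote store such as DynamoDB.
type Backend interface {
	GetAll() map[string]string
}

// countingBackend is a demo backend that records how often it is read.
type countingBackend struct {
	data  map[string]string
	reads int
}

func (b *countingBackend) GetAll() map[string]string {
	b.reads++
	out := make(map[string]string, len(b.data))
	for k, v := range b.data {
		out[k] = v
	}
	return out
}

// Cache keeps an in-memory copy of the backend state and serves all reads from it.
type Cache struct {
	mu    sync.RWMutex
	items map[string]string
}

// NewCache loads the backend state exactly once.
func NewCache(b Backend) *Cache {
	return &Cache{items: b.GetAll()}
}

// Get is answered from memory and never touches the backend.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

// Put applies an update, for example one delivered by a backend event.
func (c *Cache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

func main() {
	b := &countingBackend{data: map[string]string{"/roles/admin": "allow-all"}}
	c := NewCache(b)
	for i := 0; i < 10000; i++ {
		c.Get("/roles/admin")
	}
	fmt.Println("backend reads:", b.reads)
}
```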
2019-04-12 14:23:09 -07:00
Russell Jones 1a2a8bf66d Re-vendor github.com/gravitational/kingpin. 2019-02-05 10:31:24 -08:00
Sasha Klizhentas f40df845db Events and GRPC API
This commit introduces several key changes to
Teleport backend and API infrastructure
in order to achieve scalability improvements
on 10K+ node deployments.

Events and plain keyspace
--------------------------

New backend interface supports events,
pagination and range queries
and moves away from buckets to
plain keyspace, which better aligns
with DynamoDB and Etcd featuring similar
interfaces.
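
To illustrate what a range query over a plain keyspace can look like, here is a small sketch over a sorted slice of keys. The key layout (`/nodes/default/a`) and the helper names `rangeEnd` and `getRange` are assumptions for illustration, not the backend interface introduced by this commit; the prefix-to-range-end trick is the one commonly used with etcd-style stores.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeEnd returns the smallest key that sorts after every key with the
// given prefix, by incrementing the last byte that is not 0xff.
func rangeEnd(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] < 0xff {
			b[i]++
			return string(b[:i+1])
		}
	}
	return "" // empty or all-0xff prefix: no upper bound
}

// getRange returns up to limit keys in [start, end) from a sorted key list.
// An empty end means "until the last key"; limit <= 0 means "no limit".
func getRange(sorted []string, start, end string, limit int) []string {
	lo := sort.SearchStrings(sorted, start)
	hi := len(sorted)
	if end != "" {
		hi = sort.SearchStrings(sorted, end)
	}
	if hi < lo {
		hi = lo
	}
	if limit > 0 && hi-lo > limit {
		hi = lo + limit
	}
	return sorted[lo:hi]
}

func main() {
	keys := []string{"/nodes/default/a", "/nodes/default/b", "/roles/admin", "/tokens/t1"}
	prefix := "/nodes/"
	fmt.Println(getRange(keys, prefix, rangeEnd(prefix), 0))
}
```

Pagination follows from the same call: pass a `limit` and use the last returned key (plus a trailing zero byte) as the next `start`.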

All backend implementations
expose the Events API, allowing
multiple subscribers to consume the same
event stream and avoid polling database.
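
The fan-out idea can be sketched with channels. The `Event` and `Broadcaster` types below are hypothetical and only show how several subscribers can consume one stream without polling; they are not the Events API itself.

```go
package main

import (
	"fmt"
	"sync"
)

// Event describes one backend change; the fields are illustrative.
type Event struct {
	Type string // e.g. "put" or "delete"
	Key  string
}

// Broadcaster fans a single event stream out to every subscriber.
type Broadcaster struct {
	mu       sync.Mutex
	watchers []chan Event
}

// Subscribe registers a new watcher with the given channel buffer size.
func (b *Broadcaster) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.watchers = append(b.watchers, ch)
	b.mu.Unlock()
	return ch
}

// Emit delivers the event to all watchers in subscription order.
// In this sketch a watcher with a full buffer blocks the emitter;
// a real implementation needs a policy for slow consumers.
func (b *Broadcaster) Emit(e Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.watchers {
		ch <- e
	}
}

func main() {
	b := &Broadcaster{}
	first, second := b.Subscribe(2), b.Subscribe(2)
	b.Emit(Event{Type: "put", Key: "/nodes/default/a"})
	fmt.Println(<-first, <-second)
}
```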

Replacing BoltDB, Dir with SQLite
-------------------------------

BoltDB backend does not support
having two processes access the database at the
same time. This prevented Teleport instances
using the BoltDB backend from being live reloaded.

SQLite supports reads/writes by multiple
processes and makes Dir backend obsolete
as SQLite is more efficient on larger collections,
supports transactions and can detect data
corruption.

Teleport automatically migrates data from
Bolt and Dir backends into SQLite.

GRPC API and protobuf resources
-------------------------------

GRPC API has been introduced for
the auth server. The auth server now serves both GRPC
and JSON-HTTP API on the same TLS socket and uses
the same client certificate authentication.
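
One common way to serve both protocols from a single listener is to dispatch on the HTTP version and content type. The sketch below shows that pattern with stand-in handlers; the commit does not say how the auth server does its dispatch, so `multiplex`, `named` and `route` are illustrative names only.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// multiplex sends HTTP/2 requests with a gRPC content type to grpcHandler
// and everything else to jsonHandler.
func multiplex(grpcHandler, jsonHandler http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		isGRPC := r.ProtoMajor == 2 &&
			strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc")
		if isGRPC {
			grpcHandler.ServeHTTP(w, r)
			return
		}
		jsonHandler.ServeHTTP(w, r)
	})
}

// named returns a stand-in handler that only writes its own name.
func named(name string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, name)
	})
}

// route pushes one synthetic request through the multiplexer and reports
// which stand-in handler answered.
func route(protoMajor int, contentType string) string {
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	req.ProtoMajor = protoMajor
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	multiplex(named("grpc"), named("json")).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(route(2, "application/grpc"))
	fmt.Println(route(1, "application/json"))
}
```

Because both handlers sit behind the same TLS listener, the same client certificate checks can apply to either path.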

All future API methods should use GRPC; the HTTP-JSON
API is considered obsolete.

In addition to that some resources like
Server and CertificateAuthority are now
generated from protobuf service specifications in
a way that is fully backward compatible with
original JSON spec and schema, so the same resource
can be encoded and decoded from JSON, YAML
and protobuf.

All models should be refactored
into new proto specification over time.

Streaming presence service
--------------------------

In order to cut bandwidth, nodes
are sending full updates only when changes
to labels or spec have occurred, otherwise
new light-weight GRPC keep alive updates are sent
over to the presence service, reducing
bandwidth usage on multi-node deployments.
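
A minimal sketch of the decision a node makes on each heartbeat, assuming a hypothetical `Server` struct and `Heartbeat` helper (the real presence service messages are not shown in this commit message): a full update is sent only when the announced state differs from the last one sent.

```go
package main

import (
	"fmt"
	"reflect"
)

// Server is an illustrative subset of what a node announces.
type Server struct {
	Name   string
	Addr   string
	Labels map[string]string
}

// Heartbeat remembers the last full update that was sent.
type Heartbeat struct {
	last *Server
}

// Next reports which message to send for the current state:
// "full" when anything changed since the last full update,
// "keepalive" otherwise.
func (h *Heartbeat) Next(cur Server) string {
	if h.last != nil && reflect.DeepEqual(*h.last, cur) {
		return "keepalive"
	}
	saved := cur
	if cur.Labels != nil {
		// Copy the labels so that later in-place edits are detected.
		saved.Labels = make(map[string]string, len(cur.Labels))
		for k, v := range cur.Labels {
			saved.Labels[k] = v
		}
	}
	h.last = &saved
	return "full"
}

func main() {
	h := &Heartbeat{}
	node := Server{Name: "node-1", Addr: "10.0.0.1:3022", Labels: map[string]string{"env": "prod"}}
	fmt.Println(h.Next(node))
	fmt.Println(h.Next(node))
	node.Labels["env"] = "staging"
	fmt.Println(h.Next(node))
}
```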

In addition to that nodes are no longer polling
auth server for certificate authority rotation
updates, instead they subscribe to event updates
to detect updates as soon as they happen.

This is a new API, so errors are inevitable.
That is why polling is still done, but
at a much slower rate.
2018-12-10 17:20:24 -08:00
Russell Jones e77c8f5a54 Re-vendor github.com/gravitational/roundtrip. 2018-10-12 17:42:37 -07:00
Alexey Kontsevoy ab86a567ec add new resource - License 2018-09-12 16:17:28 -04:00
Sasha Klizhentas bcc25f971f Upgrade etcd backend
The new Etcd backend uses GRPC API v3;
dependencies were updated accordingly.
2018-09-10 15:58:05 -07:00
Sasha Klizhentas dce45f1c4d Additional licensing hooks 2018-08-10 11:12:05 -07:00
Russell Jones 116bd4d08e Vendor github.com/Microsoft/go-winio. 2018-08-03 11:06:08 -07:00
Russell Jones a4b070c750 Fixed vendoring issues.
* Changed import path from github.com/moby/moby to canonical path of
  github.com/docker/docker.
* Updated dependency for github.com/docker/docker/pkg/term.
* Updated dependency for github.com/Azure/go-ansiterm.
2018-07-25 13:51:50 -07:00
Sasha Klizhentas e595c3793d Log events to multiple destinations
This commit implements #2070

```yaml
teleport:
  storage:
    type: dir
    audit_events_uri:  [file:///var/lib/teleport/events, dynamodb://test_grv8_events]
    audit_sessions_uri: s3://testgrv8records
```
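
As a rough illustration of how such URIs can be told apart, the sketch below splits each one into a backend kind and a target using `net/url`. The `destination` helper is hypothetical; how Teleport itself interprets these values is not described in this commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// destination splits an audit URI into its backend kind (the URI scheme)
// and its target: the path for file URIs, the host part otherwise.
func destination(uri string) (kind, target string, err error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", "", err
	}
	if u.Scheme == "" {
		return "", "", fmt.Errorf("missing scheme in %q", uri)
	}
	if u.Scheme == "file" {
		return u.Scheme, u.Path, nil
	}
	return u.Scheme, u.Host, nil
}

func main() {
	uris := []string{
		"file:///var/lib/teleport/events",
		"dynamodb://test_grv8_events",
		"s3://testgrv8records",
	}
	for _, uri := range uris {
		kind, target, err := destination(uri)
		fmt.Println(kind, target, err)
	}
}
```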
2018-07-16 18:34:13 -07:00
Russell Jones 5ae6195d79 Use protobufs to communicate between proxy and web client. 2018-07-16 14:44:50 -07:00
Sasha Klizhentas 273b96bd87 Add prune settings and remove unused files 2018-06-29 16:23:59 -07:00
Sasha Klizhentas 026e8e4383 Fix proxying long polling requests.
Fixes #2039

This commit fixes long polling cases with teleport
for K8s that did not work because flush was not
called during io.Copy commands.
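
The general shape of such a fix can be sketched as a writer wrapper that flushes after every write, so each chunk reaches the client while the copy is still running. This is an illustration of the technique, not the code from this commit; `flushWriter`, `copyWithFlush`, `countingWriter` and `demo` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// flushWriter flushes the underlying writer after each Write.
type flushWriter struct {
	w io.Writer
	f http.Flusher
}

func (fw flushWriter) Write(p []byte) (int, error) {
	n, err := fw.w.Write(p)
	fw.f.Flush()
	return n, err
}

// copyWithFlush behaves like io.Copy but flushes after every chunk
// when the destination supports http.Flusher.
func copyWithFlush(dst io.Writer, src io.Reader) (int64, error) {
	if f, ok := dst.(http.Flusher); ok {
		dst = flushWriter{w: dst, f: f}
	}
	return io.Copy(dst, src)
}

// countingWriter is a demo destination that counts Flush calls.
type countingWriter struct {
	buf     bytes.Buffer
	flushes int
}

func (c *countingWriter) Write(p []byte) (int, error) { return c.buf.Write(p) }
func (c *countingWriter) Flush()                      { c.flushes++ }

// demo copies the chunks through copyWithFlush and reports the copied
// data together with the number of flushes that happened.
func demo(chunks ...string) (string, int) {
	readers := make([]io.Reader, len(chunks))
	for i, c := range chunks {
		readers[i] = strings.NewReader(c)
	}
	dst := &countingWriter{}
	if _, err := copyWithFlush(dst, io.MultiReader(readers...)); err != nil {
		panic(err)
	}
	return dst.buf.String(), dst.flushes
}

func main() {
	out, flushes := demo("event-1\n", "event-2\n", "event-3\n")
	fmt.Printf("copied %d bytes with %d flushes\n", len(out), flushes)
}
```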
2018-06-29 15:34:43 -07:00
Sasha Klizhentas cece4be212 Initial implementation of Kubernetes support
This commit updates #1986.

This is an initial, experimental implementation that will
be updated with tests and edge cases prior to the production 2.7.0 release.

Teleport proxy adds support for Kubernetes API protocol.
Auth server uses Kubernetes API to receive certificates
issued by Kubernetes CA.

Proxy intercepts and forwards API requests to the Kubernetes
API server and captures live session traffic, making
recordings available in the audit log.

Tsh login now updates kubeconfig configuration to use
Teleport as a proxy server.
2018-06-03 12:55:13 -07:00
Russell Jones 87010f5239 Revendor github.com/gravitational/roundtrip and enable sanitizer on it. 2018-06-02 00:38:46 +00:00
Russell Jones f10c024458 Validate all URL paths. 2018-05-30 19:53:41 +00:00
Sasha Klizhentas 540a63dde1 Update library to be less nosy.
This addresses the first part, fixes #1865
2018-05-04 11:10:46 -07:00
Sasha Klizhentas 3e144cb900 Teleport certificate authority rotation.
This commit implements #1860

During the rotation procedure, issuing TLS and SSH
certificate authorities are re-generated and all internal
components of the cluster re-register to get new
credentials.

The rotation procedure is based on a distributed
state machine algorithm - certificate authorities have
explicit rotation state and all parts of the cluster sync
local state machines by following transitions between phases.

The operator can launch CA rotation in auto or manual mode.

In manual mode the operator moves the cluster between rotation states
and watches the states of the components to sync.

In auto mode state transitions are happening automatically
on a specified schedule.
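
The state machine idea can be sketched as follows. The phase names and the `CA` and `Component` types are illustrative assumptions; the actual states and transitions are documented in lib/auth/rotate.go.

```go
package main

import (
	"fmt"
)

// Phase is a rotation state. The names below are placeholders.
type Phase string

const (
	Standby       Phase = "standby"
	UpdateClients Phase = "update_clients"
	UpdateServers Phase = "update_servers"
)

// order lists the only allowed transition out of each phase.
var order = map[Phase]Phase{
	Standby:       UpdateClients,
	UpdateClients: UpdateServers,
	UpdateServers: Standby,
}

// CA carries the explicit rotation state of a certificate authority.
type CA struct {
	Phase Phase
}

// Transition moves the CA to the next phase and rejects any other jump.
func (ca *CA) Transition(to Phase) error {
	if order[ca.Phase] != to {
		return fmt.Errorf("cannot move from %q to %q", ca.Phase, to)
	}
	ca.Phase = to
	return nil
}

// Component is a cluster member that mirrors the CA's rotation state.
type Component struct {
	Phase Phase
	Seen  []Phase // every phase the component passed through
}

// Sync follows transitions one at a time until the target is reached.
func (c *Component) Sync(target Phase) {
	for i := 0; c.Phase != target && i < len(order); i++ {
		c.Phase = order[c.Phase]
		c.Seen = append(c.Seen, c.Phase)
	}
}

func main() {
	ca := &CA{Phase: Standby}
	fmt.Println(ca.Transition(UpdateServers)) // skipping a phase is rejected
	fmt.Println(ca.Transition(UpdateClients))
	node := &Component{Phase: Standby}
	node.Sync(ca.Phase)
	fmt.Println(node.Phase, node.Seen)
}
```

In manual mode an operator would call `Transition` step by step; in auto mode a scheduler would make the same calls at configured times.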

The design documentation is embedded in the code:

lib/auth/rotate.go
2018-04-30 12:58:57 -07:00
Sasha Klizhentas bad1b0498d External events and sessions storage.
Updates #1755

Design
------

This commit adds support for pluggable events and
sessions recordings and adds several plugins.

If external session recording storage
is used, nodes or proxies (depending on configuration)
store the session recordings locally and
then upload the recordings in the background.

Non-print session events are always sent to the
remote auth server as usual.

If remote events storage is used, auth
servers download recordings from it during playback.

DynamoDB event backend
----------------------

Transient DynamoDB backend is added for events
storage. Events are stored with default TTL of 1 year.

External lambda functions should be used
to forward events from DynamoDB.

Parameter audit_table_name in storage section
turns on dynamodb backend.

The table will be auto created.

S3 sessions backend
-------------------

If audit_sessions_uri is set to s3://bucket-name,
the node or proxy (depending on recording mode)
will start uploading the recorded sessions
to the bucket.

If the bucket does not exist, teleport will
attempt to create a bucket with versioning and encryption
turned on by default.

Teleport will turn on bucket-side encryption for the tarballs
using aws:kms key.

File sessions backend
---------------------

If audit_sessions_uri is set to file:///folder,
teleport will start writing tarballs to this folder instead
of sending records to the file server.

This is helpful for plugin writers who can use fuse or NFS
mounted storage to handle the data.

Working dynamic configuration.
2018-03-15 12:42:43 -07:00
Roman Tkachenko 143b834e57 Changes for the upcoming teleport pro:
* Allow external audit log plugins
* Add support for auth API server plugins
* Add license file path configuration parameter (not used in open-source)
* Extend audit log with user login events
2017-11-21 17:35:58 -08:00
Sasha Klizhentas db4952b788 revendor trace and logger, fixes #1450 2017-11-20 12:08:56 -08:00
Sasha Klizhentas fed7d2f116 fix audit log file leak, fixes #1433
This is a fix for file leak in audit log server caused
by design issue:

Session file descriptors in audit log were opened on demand
when the session event or byte stream chunk was reported.

AuditLog server relied on SessionEnd event to close the
file descriptors associated with the session.

However, when SessionEnd event does not arrive (e.g.
there is a timeout or disconnect), the file descriptors
were not closed. This commit adds periodic clean up
of inactive sessions.

SessionEnd is now used as an optimization measure
to close the files, but is not used as the only
trigger to close files.

Idle sessions now close their file descriptors
after a period of inactivity and reopen them
when session activity resumes.

SessionLogger was not designed to open/close files
multiple times as it was resetting offsets
every time the session files were opened. This
change fixes this condition as well.
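
The behaviour described above can be sketched with a small model, in which a boolean stands in for an open file descriptor and timestamps are plain seconds to keep the example deterministic. `AuditLog`, `sessionFile` and the 60-second idle period are assumptions for illustration, not the actual audit log code.

```go
package main

import (
	"fmt"
)

// sessionFile models one session's file: open flag, write offset, last activity.
type sessionFile struct {
	open       bool
	offset     int64
	lastActive int64 // unix seconds
}

// AuditLog tracks session files and closes the idle ones.
type AuditLog struct {
	idleSeconds int64
	sessions    map[string]*sessionFile
}

func NewAuditLog(idleSeconds int64) *AuditLog {
	return &AuditLog{idleSeconds: idleSeconds, sessions: map[string]*sessionFile{}}
}

// Write records n bytes for the session at time now. A closed file is
// reopened, and the offset continues where it left off instead of resetting.
func (a *AuditLog) Write(id string, n, now int64) int64 {
	s, ok := a.sessions[id]
	if !ok {
		s = &sessionFile{}
		a.sessions[id] = s
	}
	s.open = true
	s.offset += n
	s.lastActive = now
	return s.offset
}

// CloseInactive closes files of sessions idle for at least idleSeconds
// and returns how many were closed. It does not rely on a SessionEnd event.
func (a *AuditLog) CloseInactive(now int64) int {
	closed := 0
	for _, s := range a.sessions {
		if s.open && now-s.lastActive >= a.idleSeconds {
			s.open = false
			closed++
		}
	}
	return closed
}

// OpenFiles counts the currently open session files.
func (a *AuditLog) OpenFiles() int {
	n := 0
	for _, s := range a.sessions {
		if s.open {
			n++
		}
	}
	return n
}

func main() {
	log := NewAuditLog(60)
	log.Write("s1", 100, 0)
	log.Write("s2", 10, 50)
	fmt.Println("closed:", log.CloseInactive(70))
	fmt.Println("offset after resume:", log.Write("s1", 5, 80))
}
```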
2017-11-15 18:39:27 -08:00
Sasha Klizhentas cd2d2726de Update SDK 2017-11-10 12:22:47 -08:00
Russell Jones 9a667fe527 Re-vendor github.com/sirupsen/logrus from github.com/gravitational/logrus. 2017-11-02 14:59:05 -07:00
Sasha Klizhentas 9543bf2208 Merge branch 'master' into sasha/curiosity 2017-10-12 16:57:41 -07:00
Russell Jones 1f5ec5b89c Re-vendor github.com/gravitational/go-oidc. 2017-10-11 22:55:39 +00:00
Sasha Klizhentas f12024031a more work on logging and stats 2017-10-09 18:58:24 -07:00
Sasha Klizhentas 8839b85539 update trace 2017-10-09 12:15:56 -07:00
Sasha Klizhentas 6e4d6b0cb2 more work, discovery works 2017-10-07 18:11:03 -07:00
Sasha Klizhentas 53f4a0128e introduce curiosity protocol and fix logs 2017-10-06 15:38:15 -07:00
Ev Kontsevoy 0cc39838ae Removed 'goterm' dependency
goterm had no license, I quickly replaced it with our own little table
formatter.

also rewrote some tsh commands, that were using home-made formatting, to
the new table, so the output is now much nicer.
2017-09-06 19:06:48 -07:00
Ev Kontsevoy 082b391d57 Re-added my re-vendored kingpin 2017-09-05 22:50:36 -07:00
Sasha Klizhentas 48ef293118 update kingpin 2017-08-29 17:34:59 -07:00
Sasha Klizhentas ee80f947e0 vendor dependencies 2017-08-25 11:12:40 -07:00
Sasha Klizhentas ddfacb923b remove extra Sirupsen 2017-08-23 11:35:37 -07:00
Sasha Klizhentas 8b81a0c384 Migrate to golang/dep for dependency management
Update following packages:

* Replace Sirupsen/log with sirupsen/log everywhere
* Update etcd client to 3.2.4
* Update docker/term to moby/term
* Update kr/pty to v1.0.0 release
* Update K8s client to 2.0
2017-08-22 15:30:30 -07:00
Ev Kontsevoy 0ce13c8b1b Fixed shell globbing for scp 2017-06-08 22:18:09 -07:00
Ev Kontsevoy 369cab4698 Re-vendored osext dependency 2017-06-02 16:01:55 -07:00
Russell Jones 5215b07612 Revendor gosaml2 and goxmldsig. 2017-06-02 13:39:48 -07:00
Sasha Klizhentas 9fa1ea56dc update deps 2017-05-27 15:36:14 -07:00
Sasha Klizhentas 91b4a663b9 instrument with monitoring tools, fixes #935
* Add prometheus endpoint to expose system stats
* Add healthz endpoint
* Add gops endpoint for real time troubleshooting
* Deprecate httprof endpoint
2017-05-13 18:32:10 -07:00
Russell Jones 836517251d Revendor gosaml2 and goxmldsig. 2017-05-12 14:10:19 -07:00
Sasha Klizhentas f8641681f6 SAML 2.0 initial implementation 2017-05-12 14:10:18 -07:00
Russell Jones 6686592e30 Use shellescape library which uses single quotes so environment
variables are not expanded.
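
Why single quotes matter can be shown with a small hand-written quoting function. This is a sketch of the technique, not the shellescape library's implementation; `quote` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// quote wraps s in single quotes so a POSIX shell treats it as literal text.
// Inside single quotes nothing is expanded, so "$HOME" stays as typed.
// An embedded single quote is written as '"'"' (close, quoted quote, reopen).
func quote(s string) string {
	if s == "" {
		return "''"
	}
	return "'" + strings.ReplaceAll(s, "'", `'"'"'`) + "'"
}

func main() {
	fmt.Println(quote("$HOME/file name.txt"))
	fmt.Println(quote("it's"))
}
```

With double quotes the shell would still expand `$HOME`; single quotes pass the argument through unchanged.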
2017-05-03 16:52:39 -07:00
Sasha Klizhentas 684c6207fd add hdr histogram 2017-04-30 16:28:07 -07:00
Russell Jones f5c90a02e6 Removed shell parsing for scp code as well. 2017-04-19 12:02:17 -07:00
Sasha Klizhentas 3c2570fa35 Sasha High Availability. 2017-04-07 16:54:15 -07:00
Russell Jones c7956899d5 Merge claims from UserInfo endpoint into claims from ID token. Also,
fallback to Base64 decoding if Base64-URL decoding fails.
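
The decoding fallback can be sketched like this. It assumes the URL-safe attempt uses the unpadded alphabet (as in JWT segments), which the commit message does not specify; `decodeSegment` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSegment tries unpadded Base64-URL first and falls back to
// standard Base64 when that fails.
func decodeSegment(s string) (string, error) {
	if b, err := base64.RawURLEncoding.DecodeString(s); err == nil {
		return string(b), nil
	}
	b, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, in := range []string{"aGVsbG8", "aGVsbG8=", "+/8="} {
		out, err := decodeSegment(in)
		fmt.Printf("%q -> %q (err: %v)\n", in, out, err)
	}
}
```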
2017-03-30 17:40:00 -07:00
Russell Jones f7934b5be4 Set default PTY size if an invalid size is requested and
correctly split command.
2017-03-21 16:50:07 -07:00
Ev Kontsevoy 4a07dd3e22 Improved CLI login procedure
This commit adds several improvements to how CLI SSH login works

- Validated keys are added to the SSH agent [1]
- tsh no longer verifies host keys twice
- error messages for "access denied" look clean now

[1] This is huge. This means that tsh login can "feed" the keys to the
    built-in SSH agents of the OS and OpenSSH can fetch them from there.

QUESTION: why do we even need `tsh agent` option then? ssh-agent is
installed on every Linux/OSX machine.
2017-01-24 19:54:41 -08:00