Commit graph

48 commits

Author SHA1 Message Date
Sasha Klizhentas 743ea57f87 Refactor discovery protocol.
This commit refactors the discovery protocol
to make it less dependent on the database and
scale better on large numbers of tunnels.

The reverse tunnel server now always sends
back the list of all proxies registered in the
cluster in the form of discovery requests.

Before this commit, reverse tunnel server was comparing
existing TunnelConnection with the Proxies
and sending back the list of proxies that were not
discovered.

This required nodes to register tunnel connections
in the database and servers to poll the connections.

On 10K-node clusters this does not scale. Instead,
the change assumes that there are not many
proxies, so it's OK to send the information about
them back to all connected agents.

Agent pools can make up their own mind about what to
do with the information - they can ignore
the request as long as they observe all agents
connected to the requested proxies.

At the same time, to avoid using too much traffic,
the reverse tunnel server only sends discovery requests
after the first agent heartbeat and when the
proxy list changes. To make this possible, the reverse
tunnel server sets up a watch on the proxies.
2019-05-14 11:26:45 -07:00
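
The send rule described in commit 743ea57f87 (send the full proxy list after the first agent heartbeat, then again only when the list changes) can be sketched in Go roughly as follows. This is a minimal illustration, not Teleport's implementation: `discoveryState`, `sortedCopy`, `sameNames` and `next` are hypothetical names, and the sketch assumes proxies are identified by plain string names.

```go
package main

import "sort"

// discoveryState remembers what one connected agent has already been sent.
// Hypothetical type for illustration only; not part of Teleport's API.
type discoveryState struct {
	sentOnce bool     // set once the first heartbeat has triggered a send
	lastSent []string // sorted proxy names from the last discovery request
}

// sortedCopy returns a sorted copy so that comparisons ignore ordering.
func sortedCopy(names []string) []string {
	out := make([]string, len(names))
	copy(out, names)
	sort.Strings(out)
	return out
}

// sameNames reports whether two sorted name lists are identical.
func sameNames(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// next is meant to be called on every agent heartbeat and on every proxy
// watch event. It returns the full proxy list and true when a discovery
// request should be sent: on the first call, or when the list differs
// from the one sent last time.
func (s *discoveryState) next(proxies []string) ([]string, bool) {
	current := sortedCopy(proxies)
	if s.sentOnce && sameNames(current, s.lastSent) {
		return nil, false
	}
	s.sentOnce = true
	s.lastSent = current
	return current, true
}
```

Under these assumptions, a server would keep one `discoveryState` per connected agent and call `next` with the current proxy list; repeated heartbeats with an unchanged list produce no traffic.
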
Sasha Klizhentas 7467e47718 Cache auth servers and new find endpoint
Whenever many IoT-style nodes connect
back to the web proxy server, they all
call the /find endpoint to discover the configuration.

This new endpoint is designed to be fast and not
hit the database.

In addition, every proxy reverse tunnel
connection handler was fetching auth servers, so
this commit adds caching for the auth servers
on the proxy side.
2019-04-30 17:43:01 -07:00
Sasha Klizhentas 4917d33851 Skip schema validation for reverse tunnels
This commit skips slow JSON schema validation
for reverse tunnels in some hot spots to
improve scalability.
2019-04-29 13:01:19 -07:00
Sasha Klizhentas 8356ae6a74 Use in-memory cache for the auth server API.
This commit expands the usage of the caching layer
for auth server API:

* Introduces in-memory cache that is used to serve all
Auth server API requests. This is done to achieve scalability
on 10K+ node clusters, where each node fetches certificate authorities,
roles, users and join tokens. It is not possible to scale
the DynamoDB backend or other backends to 10K reads per second
on a single shard or partition. The solution is to introduce
an in-memory cache of the backend state that is always used
for reads.

* In-memory cache has been expanded to support all resources
required by the auth server.

* Experimental `tctl top` command has been introduced to display
common single node metrics.

Replace SQLite Memory Backend with BTree

The SQLite in-memory backend was suffering from
high tail latencies under load (up to 8 seconds
at the 99.9th percentile on load configurations).

This commit replaces the SQLite memory caching
backend with an in-memory BTree backend that
brought tail latencies down to 2 seconds (99.9th percentile)
and improved overall performance.
2019-04-12 14:23:09 -07:00
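
To illustrate the idea behind commit 8356ae6a74 of serving reads from an ordered in-memory structure, here is a small Go sketch. It is not the commit's code: a sorted slice with binary search stands in for the BTree to keep the example dependency-free, and `memStore`, `put` and `getRange` are hypothetical names.

```go
package main

import (
	"sort"
	"sync"
)

// item is one key/value pair held in memory.
type item struct {
	key   string
	value []byte
}

// memStore keeps items sorted by key. A sorted slice stands in for the
// BTree mentioned in the commit; it is an assumption made for brevity.
type memStore struct {
	mu    sync.RWMutex
	items []item
}

// put inserts or overwrites a key while keeping the slice sorted.
func (m *memStore) put(key string, value []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	i := sort.Search(len(m.items), func(i int) bool { return m.items[i].key >= key })
	if i < len(m.items) && m.items[i].key == key {
		m.items[i].value = value
		return
	}
	m.items = append(m.items, item{})
	copy(m.items[i+1:], m.items[i:])
	m.items[i] = item{key: key, value: value}
}

// getRange returns the keys in the half-open interval [start, end).
// Readers only take the read lock, so concurrent reads do not block each other.
func (m *memStore) getRange(start, end string) []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	lo := sort.Search(len(m.items), func(i int) bool { return m.items[i].key >= start })
	hi := sort.Search(len(m.items), func(i int) bool { return m.items[i].key >= end })
	if hi < lo {
		return nil
	}
	keys := make([]string, 0, hi-lo)
	for _, it := range m.items[lo:hi] {
		keys = append(keys, it.key)
	}
	return keys
}
```

With keys such as `/nodes/a` and `/roles/admin`, calling `getRange("/nodes/", "/nodes0")` selects every key under the `/nodes/` prefix, because `0` is the byte that follows `/` in ASCII order.
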
Sasha Klizhentas f40df845db Events and GRPC API
This commit introduces several key changes to
Teleport backend and API infrastructure
in order to achieve scalability improvements
on 10K+ node deployments.

Events and plain keyspace
--------------------------

New backend interface supports events,
pagination and range queries
and moves away from buckets to
a plain keyspace, which aligns better
with DynamoDB and Etcd, both of which
feature similar interfaces.

All backend implementations are
exposing Events API, allowing
multiple subscribers to consume the same
event stream and avoid polling database.
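
As a rough Go illustration of several subscribers consuming the same event stream instead of polling, consider the sketch below. The names `event`, `fanout`, `subscribe` and `emit` are hypothetical, and dropping events for a subscriber whose buffer is full is an assumption of this sketch, not something stated in the commit.

```go
package main

import "sync"

// event describes one change in the keyspace. Hypothetical type.
type event struct {
	op  string // for example "put" or "delete"
	key string
}

// fanout delivers every emitted event to all current subscribers.
type fanout struct {
	mu   sync.Mutex
	subs []chan event
}

// subscribe registers a new subscriber with the given buffer size.
func (f *fanout) subscribe(buffer int) <-chan event {
	f.mu.Lock()
	defer f.mu.Unlock()
	ch := make(chan event, buffer)
	f.subs = append(f.subs, ch)
	return ch
}

// emit sends the event to every subscriber without blocking and returns
// how many subscribers received it. A subscriber whose buffer is full
// misses the event (an assumption of this sketch).
func (f *fanout) emit(e event) int {
	f.mu.Lock()
	defer f.mu.Unlock()
	delivered := 0
	for _, ch := range f.subs {
		select {
		case ch <- e:
			delivered++
		default:
		}
	}
	return delivered
}
```

A real backend would need a policy for slow subscribers, such as closing them so that they re-fetch state; that choice is left open here.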

Replacing BoltDB, Dir with SQLite
-------------------------------

The BoltDB backend does not support
having two processes access the database at the
same time. This prevented Teleport instances
using the BoltDB backend from being live reloaded.

SQLite supports reads/writes by multiple
processes and makes Dir backend obsolete
as SQLite is more efficient on larger collections,
supports transactions and can detect data
corruption.

Teleport automatically migrates data from
Bolt and Dir backends into SQLite.

GRPC API and protobuf resources
-------------------------------

GRPC API has been introduced for
the auth server. The auth server now serves both GRPC
and JSON-HTTP API on the same TLS socket and uses
the same client certificate authentication.

All future API methods should use GRPC and HTTP-JSON
API is considered obsolete.

In addition to that some resources like
Server and CertificateAuthority are now
generated from protobuf service specifications in
a way that is fully backward compatible with
original JSON spec and schema, so the same resource
can be encoded and decoded from JSON, YAML
and protobuf.

All models should be refactored
into new proto specification over time.

Streaming presence service
--------------------------

In order to cut bandwidth, nodes
send full updates only when changes
to labels or spec have occurred; otherwise
new lightweight GRPC keep-alive updates are sent
to the presence service, reducing
bandwidth usage on multi-node deployments.

In addition to that nodes are no longer polling
auth server for certificate authority rotation
updates, instead they subscribe to event updates
to detect updates as soon as they happen.

This is a new API, so errors are inevitable;
that's why polling is still done, but
at a much slower rate.
2018-12-10 17:20:24 -08:00
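
The streaming presence rule from commit f40df845db (full update only when labels or spec change, keep-alive otherwise) could look roughly like this in Go. All names here (`nodeInfo`, `announcer`, `updateKind`) are hypothetical, and the spec is reduced to a single string for brevity.

```go
package main

// nodeInfo is the part of a node's state that requires a full update
// when it changes. Hypothetical type for illustration only.
type nodeInfo struct {
	labels map[string]string
	spec   string
}

// updateKind says which message the node should send next.
type updateKind string

const (
	fullUpdate updateKind = "full"
	keepAlive  updateKind = "keepalive"
)

// announcer remembers the last state that was sent in full.
type announcer struct {
	sent bool
	last nodeInfo
}

// sameLabels reports whether two label sets hold identical pairs.
func sameLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// next decides between a full update and a keep-alive for the current state.
func (a *announcer) next(cur nodeInfo) updateKind {
	if a.sent && cur.spec == a.last.spec && sameLabels(cur.labels, a.last.labels) {
		return keepAlive
	}
	// Copy the labels so that later mutation by the caller is detected.
	labels := make(map[string]string, len(cur.labels))
	for k, v := range cur.labels {
		labels[k] = v
	}
	a.sent = true
	a.last = nodeInfo{labels: labels, spec: cur.spec}
	return fullUpdate
}
```

A node would call `next` on each announce interval and send the heavier message only when it returns `fullUpdate`.
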
Sasha Klizhentas e84bf10889 Batch get for tunnel connection, remote cluster
Use batch fetch for tunnel connections
and remote cluster objects to speed up
teleport in scenarios with many trusted clusters.
2018-09-28 11:00:36 -07:00
Russell Jones ce1c7476b9 Updated dir backend to a flat keyspace. Added UpsertItems endpoint to
all backends to support bulk insertion. Added UpsertNodes endpoint,
which is used by the state cache to speed up GetNodes.
2018-07-13 20:12:34 +00:00
Sasha Klizhentas ef20e45208 Enforce trusted cluster resource name, fixes #1543
This commit makes sure that the trusted cluster resource
name is the same as the name of the cluster it connects to.

If the user supplies a trusted cluster resource name
that is different from the cluster name, a warning
is issued and the trusted cluster is renamed.

Upgrade procedure renames existing trusted clusters
in place.

If the user supplies a trusted cluster without role
mappings, or with role mappings referring to
roles that do not exist, an
error is returned.
2018-01-11 14:13:30 -08:00
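
The checks described in commit ef20e45208 can be sketched in Go as below. This is an illustrative reading of the commit message, not the actual Teleport code: `trustedCluster`, its fields, and `checkTrustedCluster` are hypothetical, and role mappings are modeled as a map from a remote role to a list of local roles.

```go
package main

import (
	"errors"
	"fmt"
)

// trustedCluster is a simplified stand-in for the trusted cluster resource.
// Hypothetical type; roleMap maps a remote role to local role names.
type trustedCluster struct {
	name    string
	roleMap map[string][]string
}

// checkTrustedCluster rejects resources with missing or dangling role
// mappings, then forces the resource name to match the cluster name.
// It returns true when the resource was renamed, so that the caller
// can issue a warning.
func checkTrustedCluster(tc *trustedCluster, clusterName string, localRoles map[string]bool) (bool, error) {
	if len(tc.roleMap) == 0 {
		return false, errors.New("trusted cluster has no role mappings")
	}
	for remote, locals := range tc.roleMap {
		for _, role := range locals {
			if !localRoles[role] {
				return false, fmt.Errorf("role mapping for %q refers to non-existent role %q", remote, role)
			}
		}
	}
	if tc.name != clusterName {
		tc.name = clusterName
		return true, nil
	}
	return false, nil
}
```

Returning the `renamed` flag rather than logging inside the function keeps the sketch free of any particular logging library.
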
Sasha Klizhentas e114fbd46c Add support for remote_cluster, implements #1526
This commit adds remote cluster resource that specifies
connection and trust of the remote trusted cluster to the local
cluster. Deleting remote cluster resource deletes trust
established between clusters on the local cluster side
and terminates all reverse tunnel connections.

Migrations make sure that remote cluster resources exist
after upgrade of the auth server.
2017-12-28 17:48:30 -08:00
Sasha Klizhentas e461b4e6bd fix tests 2017-10-12 16:51:18 -07:00
Sasha Klizhentas e12ec7422c refactoring 2017-10-05 17:29:31 -07:00
Russell Jones ff63e664de Refactored Trusted Cluster creation/update. 2017-09-12 18:44:49 -07:00
Russell Jones 4719c4bdfa Allow enable or disable of a TrustedCluster without performing the
exchange again.
2017-08-18 20:14:42 +00:00
Sasha Klizhentas 3c2570fa35 Sasha High Availability. 2017-04-07 16:54:15 -07:00
Russell Jones 2f70866e5a Added TrustedCluster resource. 2017-03-09 13:49:44 -08:00
Sasha Klizhentas 1eec7c0ebd refactor, refactor refactor 2016-12-29 12:23:58 -08:00
Sasha Klizhentas c98624c038 more migration code 2016-12-28 14:07:03 -08:00
Sasha Klizhentas 30739de741 more exciting code 2016-12-27 18:54:55 -08:00
Sasha Klizhentas 5abf6d44d5 continue fixing tests and code 2016-12-18 16:58:53 -08:00
Sasha Klizhentas cb143dab46 ssh server tests recovered 2016-12-18 13:36:02 -08:00
Sasha Klizhentas 698e615fd7 make API backwards compatible with pre-namespaces 2016-12-13 14:20:52 -08:00
Sasha Klizhentas 5ce39ffb85 introduce namespaces and roles 2016-12-12 16:18:31 -08:00
Sasha Klizhentas 44a8380cc4 more work 2016-12-10 11:34:39 -08:00
Roman Tkachenko 20e281916a Relax requirements to domain name 2016-10-10 14:24:34 -07:00
bn0ir 6cddb989cb Fix import format 2016-10-05 13:15:49 +05:00
bn0ir bd0ba96a43 Sort labels by their key alphabetically in ./tctl nodes ls 2016-10-04 17:22:51 +05:00
Ev Kontsevoy 126a9e9ff8 Minor bugs regarding reverse tunnels
- Friendly error messages when parsing configuration and establishing
  connection

- Bugs related to "first start" vs subsequent starts (reverse tunnels
  added to YAML file won't be seen upon restart)

- Nicer logging
2016-06-09 19:17:07 -07:00
klizhentas d68e693cad migrate to trace errors 2016-04-12 11:07:14 -07:00
klizhentas f398534515 moving code around and splitting interfaces 2016-04-04 17:09:00 -07:00
klizhentas 6edd6675e6 re-introduce reverse tunnels into teleport
Reverse tunnels are now first class citizens of teleport.
There's no longer static configuration for reverse tunnel agents
in the config. Instead, admins can add and remove reverse tunnels
using tctl reversetunnel (hidden) commands.

* tctl reversetunnel ls
  lists reverse tunnels

* tctl reversetunnel upsert a.example.com 10.0.0.4:2023,10.0.0.5:2033 --ttl=10m
  updates or inserts reverse tunnel for 10 minutes

* tctl reversetunnel del a.example.com
  deletes a reverse tunnel

Teleport proxies watch changes in the reverse tunnels on the backend and
spin up / spin down reverse tunnels according to these changes.
2016-03-18 17:13:22 -07:00
Ev Kontsevoy 232fde7770 PR comments 2016-03-15 10:23:15 -07:00
Ev Kontsevoy b184319181 Implemented label filtering on TSH
Works with:
- ssh
- ls
- scp
2016-03-14 18:44:28 -07:00
Ev Kontsevoy a0f9e3f8b0 Removed the list of auth servers from "Remote Site" interface 2016-03-14 12:23:56 -07:00
Ev Kontsevoy e90173fab7 Intermediate commit 2016-03-14 11:31:02 -07:00
Ev Kontsevoy 277a7c6b42 Merge remote-tracking branch 'origin/master' into ev/ssh-api 2016-03-13 19:52:34 -07:00
Ev Kontsevoy 5b97e83986 Intermediate commit 2016-03-13 19:23:30 -07:00
klizhentas c1e0604dd0 Introduce auth server and proxy heartbeats
This commit introduces heartbeats of AuthServers and Proxies and fixes several issues:

1. Server init problem

There was an issue in server init, when certificates of multiple roles were overwriting each other.
Now Teleport stores each keypair and certificate in a separate file <hostid>.role.key and <hostid>.role.cert.
This also means that it's backwards incompatible with the previous on-disk format.

2. Proxy and Auth heartbeats

Auth servers and proxies now heartbeat into cluster as well

3. Bugfixes:

* Proxy role was missing, it is now treated as a separate role with permissions
* AdvertiseIP is now a global setting that can be used by all roles
* --advertise-ip flag was ignored and was never applied
* teleport service initialization has been simplified, now each role gets its own client
* minor cleanups
2016-03-13 18:15:09 -07:00
klizhentas 519f07611b fix data races and remove sleep from tests
* fix data race with advertise ip
* remove global variable
* simplify pings logic and fix ping bug
* fix potential bug in dynamic labels
2016-03-08 18:41:05 -08:00
Ev Kontsevoy 39382dc41a tsh ls works
similarly to tctl nodes ls
closes #181
2016-03-08 16:30:32 -08:00
Ev Kontsevoy 6a8dc6c668 Nonintrusive minor refactoring of "auth tunnel"
1. Wrote comments in places where I was confused
2. Renamed variables/structs that were confusing
3. Cleaned up code for easier reading
2016-03-01 14:40:10 -08:00
Ev Kontsevoy df4a334a10 Config. file functionality is done.
Left to do:

- tests
- a bit of usability testing and code polish
2016-02-20 17:17:09 -08:00
Alex Lyulkov f35f74cb46 working on tsh share 2016-02-12 18:25:54 +03:00
Alex Lyulkov a56b5236ac Moved to go1.5 vendoring 2016-01-20 18:52:25 +03:00
Alex Lyulkov 02b13a7ead Added period for labels 2015-12-10 14:01:34 +03:00
Alex Lyulkov c8332eba27 Added node labels, fixed limiter bugs 2015-12-07 23:05:54 +03:00
Alex Lyulkov e94152b6f6 Added hostname to presence service (now auth knows the hostname of each node) 2015-11-04 21:02:58 -08:00
klizhentas 00ef621e6b Apply apache license to teleport 2015-10-31 11:56:49 -07:00
Alex Lyulkov a3db86b236 More folder arrangements 2015-10-05 20:36:55 +03:00
Renamed from services/presence.go