Commit graph

34 commits

Author SHA1 Message Date
Jakub Nyckowski 0ee91f6c37
Enable GCI linter (#17894) 2022-10-28 20:20:28 +00:00
Joel 6caba42ec1
Add support for IdP-Initiated SAML2 login (#13924) 2022-08-22 18:27:44 +00:00
Joel e9fb1e84e2
Harden SQLite permissions (#12096) 2022-05-02 14:12:45 +00:00
Edoardo Spadolini cae1d320c7
Default to synchronous FULL for sqlite (#11387) 2022-04-01 07:15:24 +00:00
Edoardo Spadolini d83886e9c3
Address problems in concurrent sqlite access (#10706)
* Use BEGIN IMMEDIATE to start transactions

This makes it so all transactions grab a write lock
rather than a read lock that can be upgraded in case of
a write; in case of multiple writers (which, in our
case, can only happen during a restart as the new
process reopens the same sqlite database) this will
prevent two transactions from attempting to upgrade
their lock, which would cause a SQLITE_BUSY error in
one of them. In regular operation this shouldn't cause
a performance hit, as we're using a single connection
to the sqlite database (guarded by locks on the Go side)
anyway.

* Escape path in sqlite connection URL

This makes it so that the sqlite backend supports paths with ? in them.

* Close process storage on TeleportProcess shutdown

This aligns the behavior of Shutdown with that of Close.

* Allow specifying the journal mode in sqlite

This will let sqlite backend users specify WAL mode in their config
file, and will allow us to specify alternate journal modes for our
on-disk caches in the future.

This also removes sqlite memory mode, as it's not used anywhere because
of its poor query performance compared to our in-memory backend, and
cleans up a bit of old cruft, and runs process storage in FULL sync
mode - it's very seldom written to and holds important data.
2022-03-15 16:54:48 +00:00
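A minimal sketch of how the connection URL described in #10706 could be built so that a `?` in the path is escaped and transactions start with BEGIN IMMEDIATE. The helper name `buildDSN` is hypothetical, the path is assumed to be absolute, and the query parameter names (`_txlock`, `_journal_mode`, `_synchronous`) are assumed to follow the mattn/go-sqlite3 driver; check them against the driver actually in use.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN returns a SQLite "file:" URI for an absolute path.
// Hypothetical helper: the query parameter names are assumed to match
// the mattn/go-sqlite3 driver and are not taken from the commit itself.
func buildDSN(path, journalMode, sync string) string {
	q := url.Values{}
	q.Set("_txlock", "immediate") // ask the driver to use BEGIN IMMEDIATE
	q.Set("_synchronous", sync)   // e.g. "FULL"
	if journalMode != "" {
		q.Set("_journal_mode", journalMode) // e.g. "WAL"
	}
	u := url.URL{
		Scheme:   "file",
		Path:     path, // url.URL percent-encodes '?' in the path
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(buildDSN("/var/lib/teleport/what?/sqlite.db", "WAL", "FULL"))
}
```

Building the URL with net/url rather than string concatenation keeps a literal `?` in the directory name from being read as the start of the query string.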
Jim Bishopp fe958969dc
Remove migration from backend API (#10835)
The Migrate method on the Backend interface was not implemented by any
backends.

Migration should be implemented in the New method of backends so they
can be sure migration happens before any background processes are
started.
2022-03-08 03:08:09 +00:00
Jim Bishopp 6c2ee0c149
Remove unused context from sqlite backend (#9658)
This removes an old context in the sqlite backend that is no longer used/referenced anywhere. Two fields in lite.Backend were removed: watchStarted and signalWatchStart (context and its cancel func).
2022-01-06 18:19:40 +00:00
Jakub Nyckowski d1baaaa399
Close all SQL statements (#9614)
Co-authored-by: Isaiah Becker-Mayer <isaiah@goteleport.com>
2022-01-06 01:16:52 +00:00
Forrest Marshall d52241d969 bump backend limit 2021-12-09 13:01:35 -08:00
Forrest Marshall 92f724cfd0 fix double-init and buffer overflows 2021-09-17 15:05:23 -07:00
Trent Clarke e91860631e
Port backend tests to testify / fix racy tests (#8170)
In order to get better visibility into the backend database tests using
the standard Go tooling, this changeset ports the backend tests away
from `Check`, and into subtests & `testify` for assertions.

This change means that individual sub-tests
 1. can be more easily identified in the json test logs, and
 2. can be more easily run individually from the command line

During this port I also discovered that some tests are using the fake
clocks incorrectly, which may be a cause of some of our flaky etcd
tests.
2021-09-10 04:14:04 -07:00
Brian Joerger 9b8b9d6d0c
rollback - Upgrade api version. (#7751) 2021-07-30 15:34:19 -07:00
Brian Joerger c040aca4c1
Upgrade api version. (#7609) 2021-07-28 13:51:21 -07:00
Forrest Marshall 50a7680d9a fix init event emission 2021-07-22 15:21:07 -07:00
Andrej Tokarčík 7c630ec960
Introduce Lock resource (#7430) 2021-07-07 18:20:53 +02:00
Brian Joerger 7bff7c41bd
Remove API aliases (#6983) 2021-06-04 13:29:31 -07:00
a-palchikov 3d459db6d3
Test flakes: use ordering tests for keep alives (#5358)
* Evaluate watcher events to decide whether keep-alives are effective
instead of relying on arbitrary TTLs (implemented as absolute time which
adds to trouble).

Fixes https://github.com/gravitational/teleport/issues/5346.

* Replace the approximate expire timestamps comparisons with the ordering tests

* Address review comments. Move ordered keep-alive tests back to backend/test/suite

* Use an alternative implementation of FakeClock.Advance for etcd to use real time.Sleep as etcd server cannot use fakeclock

* Address review comments

* Use fake clock in firestore tests

* Add missing import

Co-authored-by: Andrew Lytvynov <andrew@goteleport.com>
2021-04-14 14:47:14 -07:00
Andrew Lytvynov fc1c1dbd14 Move all utils.InitLoggerForTests calls to TestMain
This prevents data races between changing the standard logger and it
actually being used.
2021-02-23 18:04:55 -08:00
dmitri a74c90769c Fix reported data races in lib/backend unit tests.
Fixes https://github.com/gravitational/teleport/issues/5331.
2021-02-02 15:07:25 -08:00
Andrew Lytvynov 5ca68f2351
Remove 'var _ = fmt.Printf' from *_test.go files (#5438)
These declarations serve no purpose, likely leftover from old debugging.
2021-01-29 17:01:10 -08:00
Andrew Lytvynov 92ed2db38a Fixing golint warnings, batch 1
Mostly cosmetic changes:
- making receiver names consistent
- renaming `foo.FooBar` to `foo.Bar` (using package name as prefix)
- removing redundant `else` branches
- changing `a += 1` to `a++`
2020-10-13 00:22:49 +00:00
Andrew Lytvynov c68b571080 Add a Migrate method to backend.Backend
Unify migrations and expose them to the calling code at startup.
All backends except for etcd implement a nop migration.
2020-07-02 23:24:49 +00:00
Andrew Lytvynov 3c94003379 errcheck: fix findings in lib/backend, lib/client 2020-06-01 20:16:16 +00:00
Andrew Lytvynov 617afc7e6f Fix remaining gosimple findings
List of fixed items:

```
integration/helpers.go:1279:2               gosimple  S1000: should use for range instead of for { select {} }
integration/integration_test.go:144:5       gosimple  S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:173:5       gosimple  S1009: should omit nil check; len() for nil slices is defined as zero
integration/integration_test.go:296:28      gosimple  S1019: should use make(chan error) instead
integration/integration_test.go:570:41      gosimple  S1019: should use make(chan interface{}) instead
integration/integration_test.go:685:40      gosimple  S1019: should use make(chan interface{}) instead
integration/integration_test.go:759:33      gosimple  S1019: should use make(chan string) instead
lib/auth/init_test.go:62:2                  gosimple  S1021: should merge variable declaration with assignment on next line
lib/auth/tls_test.go:1658:22                gosimple  S1024: should use time.Until instead of t.Sub(time.Now())
lib/backend/dynamo/dynamodbbk.go:420:5      gosimple  S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/dynamo/dynamodbbk.go:656:12     gosimple  S1039: unnecessary use of fmt.Sprintf
lib/backend/etcdbk/etcd.go:458:5            gosimple  S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/firestore/firestorebk.go:407:5  gosimple  S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:317:5              gosimple  S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/lite/lite.go:336:6              gosimple  S1004: should use !bytes.Equal(value, expected.Value) instead
lib/backend/memory/memory.go:365:5          gosimple  S1004: should use !bytes.Equal(expected.Key, replaceWith.Key) instead
lib/backend/memory/memory.go:376:5          gosimple  S1004: should use !bytes.Equal(existingItem.Value, expected.Value) instead
lib/backend/test/suite.go:327:10            gosimple  S1024: should use time.Until instead of t.Sub(time.Now())
lib/client/api.go:1410:9                    gosimple  S1003: should use strings.ContainsRune(name, ':') instead
lib/client/api.go:2355:32                   gosimple  S1019: should use make([]ForwardedPort, len(spec)) instead
lib/client/keyagent_test.go:85:2            gosimple  S1021: should merge variable declaration with assignment on next line
lib/client/player.go:54:33                  gosimple  S1019: should use make(chan int) instead
lib/config/configuration.go:1024:52         gosimple  S1019: should use make(services.CommandLabels) instead
lib/config/configuration.go:1025:44         gosimple  S1019: should use make(map[string]string) instead
lib/config/configuration.go:930:21          gosimple  S1003: should use strings.Contains(clf.Roles, defaults.RoleNode) instead
lib/config/configuration.go:931:22          gosimple  S1003: should use strings.Contains(clf.Roles, defaults.RoleAuthService) instead
lib/config/configuration.go:932:23          gosimple  S1003: should use strings.Contains(clf.Roles, defaults.RoleProxy) instead
lib/service/supervisor.go:387:2             gosimple  S1001: should use copy() instead of a loop
lib/tlsca/parsegen.go:140:9                 gosimple  S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:140:9                    gosimple  S1034: assigning the result of this type assertion to a variable (switch generalKey := generalKey.(type)) could eliminate type assertions in switch cases
lib/utils/certs.go:167:40                   gosimple  S1010: should omit second index in slice, s[a:len(s)] is identical to s[a:]
lib/utils/certs.go:204:5                    gosimple  S1004: should use !bytes.Equal(certificateChain[0].SubjectKeyId, certificateChain[0].AuthorityKeyId) instead
lib/utils/parse/parse.go:116:45             gosimple  S1003: should use strings.Contains(variable, "}}") instead
lib/utils/parse/parse.go:116:6              gosimple  S1003: should use strings.Contains(variable, "{{") instead
lib/utils/socks/socks.go:192:10             gosimple  S1025: should use String() instead of fmt.Sprintf
lib/utils/socks/socks.go:199:10             gosimple  S1025: should use String() instead of fmt.Sprintf
lib/web/apiserver.go:1054:18                gosimple  S1024: should use time.Until instead of t.Sub(time.Now())
lib/web/apiserver.go:1954:9                 gosimple  S1039: unnecessary use of fmt.Sprintf
tool/tsh/tsh.go:1193:14                     gosimple  S1024: should use time.Until instead of t.Sub(time.Now())
```
2020-05-27 19:36:38 +00:00
Andrew Lytvynov 4b5cd7e68f gosimple: simplify or remove return statements 2020-05-15 16:32:45 +00:00
Andrew Lytvynov a48c40ad78 gosimple: replace time.Now().Sub(x) with time.Since(x) 2020-05-15 16:32:45 +00:00
Andrew Lytvynov f8661edea3 Clean up dead code across the codebase
Spring cleaning!
A very mechanical cleanup using several linters (unused, deadcode,
structcheck). Build and tests still pass so no behavior should be
affected.
2020-04-09 21:10:12 +00:00
Russell Jones de25684689 Added testing.Verbose to allow silencing of tests. 2020-02-06 11:15:44 -08:00
Sasha Klizhentas a22f7be365 Adds in-memory cache option, improves scalability for IOT mode.
This commit resolves #3227

In IOT mode, 10K nodes are connecting back to the proxies, putting
a lot of pressure on the proxy cache.

Before this commit, the Proxy's only cache options were persistent
sqlite-backed caches. The advantage of those caches is that Proxies
could continue working after reboots with Auth servers unavailable.

The disadvantage is that sqlite backend breaks down on many concurrent
reads due to performance issues.

This commit introduces the new cache configuration option, 'in-memory':

```yaml
teleport:
  cache:
    # default value sqlite,
    # the only supported values are sqlite or in-memory
    type: in-memory
```

This cache mode allows two m4.4xlarge proxies to handle 10K IOT mode connected
nodes with no issues.

The second part of the commit disables the cache reload on timer that caused
inconsistent view results for 10K displayed nodes with servers disappearing
from the view.

The third part of the commit increases the channels buffering discovery
requests 10x. The channels were overfilling in 10K nodes and nodes
were disconnected. The logic now does not treat the channel overflow
as a reason to close the connection. This is possible due to the changes
in the discovery protocol that allow target nodes to handle missing
entries, duplicate entries or conflicting values.
2020-02-06 09:16:48 -08:00
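The third part of the commit above (a full discovery channel no longer closes the connection) can be pictured as a non-blocking send. This is only a sketch: `discoveryRequest` and `trySend` are hypothetical names and the buffer size is arbitrary.

```go
package main

import "fmt"

// discoveryRequest is a hypothetical stand-in for a discovery message.
type discoveryRequest struct {
	proxies []string
}

// trySend queues req without blocking. It returns false when the
// buffer is full, in which case the caller drops the request instead
// of treating the overflow as a reason to close the connection.
func trySend(ch chan<- discoveryRequest, req discoveryRequest) bool {
	select {
	case ch <- req:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan discoveryRequest, 2)
	dropped := 0
	for i := 0; i < 3; i++ {
		req := discoveryRequest{proxies: []string{fmt.Sprintf("proxy-%d", i)}}
		if !trySend(ch, req) {
			dropped++
		}
	}
	fmt.Println("queued:", len(ch), "dropped:", dropped)
}
```

Dropping is only safe because, as the commit message notes, the discovery protocol lets target nodes handle missing, duplicate, or conflicting entries.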
Russell Jones 9c2cfa1cd8 Cleanup of dead code.
* Removed legacy backends no longer supported.
* Removed code marked for deletion.
* Updated Makefile to use $ instead of ` to match Enterprise.
2019-07-02 18:01:44 -07:00
Russell Jones 0e3e4a1e62 Do not emit events in mirror mode.
Update mirror mode (for both the memory and SQLite backends) to no
longer emit events when an element expires. This allows caches to handle
update/delete logic themselves.

This fixes an issue where services.ProxyWatcher was not getting updates
to the list of proxies.
2019-05-21 09:58:43 -07:00
Alexander Klizhentas 92e5bf5081
Fixes in DynamoDB event polling (#2661)
* Add reset for buffers to close watchers
and reset the buffer state.
* Add reconnect logic to DynamoDB
* Add tests for cache watchers, make sure
the errors of the cache internal watcher propagate to
external watchers.
2019-04-17 18:52:09 -07:00
Sasha Klizhentas 8356ae6a74 Use in-memory cache for the auth server API.
This commit expands the usage of the caching layer
for auth server API:

* Introduces in-memory cache that is used to serve all
Auth server API requests. This is done to achieve scalability
on 10K+ node clusters, where each node fetches certificate authorities,
roles, users and join tokens. It is not possible to scale
DynamoDB backend or other backends to 10K reads per second
on a single shard or partition. The solution is to introduce
an in-memory cache of the backend state that is always used
for reads.

* In-memory cache has been expanded to support all resources
required by the auth server.

* Experimental `tctl top` command has been introduced to display
common single node metrics.

Replace SQLite Memory Backend with BTree

SQLite in memory backend was suffering from
high tail latencies under load (up to 8 seconds
in 99.9%-ile on load configurations).

This commit replaces the SQLite memory caching
backend with in-memory BTree backend that
brought down tail latencies to 2 seconds (99.9%-ile)
and brought overall performance improvement.
2019-04-12 14:23:09 -07:00
Sasha Klizhentas f40df845db Events and GRPC API
This commit introduces several key changes to
Teleport backend and API infrastructure
in order to achieve scalability improvements
on 10K+ node deployments.

Events and plain keyspace
--------------------------

New backend interface supports events,
pagination and range queries
and moves away from buckets to
plain keyspace, which better aligns
with DynamoDB and Etcd featuring similar
interfaces.

All backend implementations
expose the Events API, allowing
multiple subscribers to consume the same
event stream and avoid polling the database.

Replacing BoltDB, Dir with SQLite
-------------------------------

BoltDB backend does not support
having two processes access the database at the
same time. This prevented Teleport from being
live reloaded when using the BoltDB backend.

SQLite supports reads/writes by multiple
processes and makes Dir backend obsolete
as SQLite is more efficient on larger collections,
supports transactions and can detect data
corruption.

Teleport automatically migrates data from
Bolt and Dir backends into SQLite.

GRPC API and protobuf resources
-------------------------------

GRPC API has been introduced for
the auth server. The auth server now serves both GRPC
and JSON-HTTP API on the same TLS socket and uses
the same client certificate authentication.

All future API methods should use GRPC; the HTTP-JSON
API is considered obsolete.

In addition to that some resources like
Server and CertificateAuthority are now
generated from protobuf service specifications in
a way that is fully backward compatible with
original JSON spec and schema, so the same resource
can be encoded and decoded from JSON, YAML
and protobuf.

All models should be refactored
into new proto specification over time.

Streaming presence service
--------------------------

In order to cut bandwidth, nodes
are sending full updates only when changes
to labels or spec have occurred, otherwise
new light-weight GRPC keep alive updates are sent
over to the presence service, reducing
bandwidth usage on multi-node deployments.

In addition to that nodes are no longer polling
auth server for certificate authority rotation
updates, instead they subscribe to event updates
to detect updates as soon as they happen.

This is a new API, so errors are inevitable;
that is why polling is still done, but
at a much slower rate.
2018-12-10 17:20:24 -08:00
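The "Streaming presence service" section above describes nodes sending a full update only when labels or spec change, and a light-weight keep alive otherwise. A rough sketch of that decision follows; `ServerSpec`, `UpdateKind`, and `chooseUpdate` are hypothetical names, and the comparison via reflect.DeepEqual is an assumption rather than the actual Teleport logic.

```go
package main

import (
	"fmt"
	"reflect"
)

// ServerSpec is a hypothetical stand-in for a node's announced state.
type ServerSpec struct {
	Addr   string
	Labels map[string]string
}

// UpdateKind names the two kinds of presence message in this sketch.
type UpdateKind string

const (
	FullUpdate UpdateKind = "full"
	KeepAlive  UpdateKind = "keepalive"
)

// chooseUpdate returns FullUpdate when nothing has been announced yet
// or the spec differs from the last announced one, and KeepAlive when
// the spec is unchanged. Note that reflect.DeepEqual treats a nil map
// and an empty map as different.
func chooseUpdate(last *ServerSpec, cur ServerSpec) UpdateKind {
	if last == nil || !reflect.DeepEqual(*last, cur) {
		return FullUpdate
	}
	return KeepAlive
}

func main() {
	cur := ServerSpec{Addr: "10.0.0.1:3022", Labels: map[string]string{"env": "prod"}}
	fmt.Println(chooseUpdate(nil, cur))

	last := cur
	fmt.Println(chooseUpdate(&last, cur))

	cur.Labels = map[string]string{"env": "staging"}
	fmt.Println(chooseUpdate(&last, cur))
}
```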