self-hosted/teleport

mirror of https://github.com/gravitational/teleport synced 2024-10-19 00:33:50 +00:00

Author	SHA1	Message	Date
Carson Anderson	edff37226c	Add Prometheus metrics cache events and stale events (#9826) This adds two Prometheus metrics teleport_cache_events and teleport_cache_stale_events with one label indicating the service.	2022-02-11 09:14:42 -07:00
Carson Anderson	b384de6007	Add teleport_reverse_tunnels_connected Prometheus metric (#9698) Adds teleport_reverse_tunnels_connected Prometheus metric which tracks reverse tunnels connected to the proxy server by type. * Update prometheus help Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com> * Update metrics wording Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>	2022-02-02 20:52:19 +00:00
Carson Anderson	a8a57b19f8	Add metric tracking number of Teleport agents joined to cluster (#9749) Adds the Prometheus metric teleport_registered_servers which is a gauge indicating the unique number of Teleport instances connected to the cluster by version. Co-authored-by: Zac Bergquist <zmb3@users.noreply.github.com> Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>	2022-02-02 18:47:21 +00:00
Carson Anderson	6e3c703ddb	Add teleport_build_info Prometheus metric to Teleport (#9595) Adds teleport_build_info metric to Teleport providing the gitref, version, and Go version.	2022-01-05 21:17:54 +00:00
Russell Jones	85b6727f8f	Added metrics for missing SSH tunnels. Added metrics and logging for missing SSH reverse tunnels. This is useful for debugging to find if nodes are discovering all proxies.	2021-10-15 18:04:28 -07:00
rosstimothy	fb0ab2b9b7	Watcher System Metrics (#8338) * add event watcher prometheus metrics and a new tctl top tab to visualize them	2021-09-28 12:16:03 -04:00
Eugene Yakubovich	67c0eb3b4c	Add restricted session Adds the ability to block network traffic on SSH sessions. The deny/allow lists of IPs are specified in teleport.yaml file. Supports both IPv4 and IPv6 communication. This feature currently relies on enhanced recording for cgroup management so that needs to be enabled as well. -- Design rationale: This patch uses Linux Security Module (LSM) hooks, specifically security_socket_connect and security_socket_sendmsg, to control egress traffic. The LSM provides two advantages over socket filtering program types. - It's executed early enough that the task information is available. This makes it easy to report PID, COMM, etc. - It becomes a model for extending restrictions beyond networking. The set of enforced cgroups is stored in a BPF hash map and the deny/allow lists are stored in BPF trie maps. An IP address is first checked against the allow list. If found, it's checked for an override in the deny list. The policy is default deny. However, the absence of the NetworkRestrictions API object is allow all. IPv4 addresses are additionally registered in IPv6 trie (as mapped) to account for dual stacks. However it is unclear if this is sufficient as 4-to-6 transition methods utilize a multitude of translation and tunneling methods.	2021-07-16 16:49:04 -07:00
jane quin	7c9fd8e50d	Add additional Prometheus Metrics (#6511)	2021-04-28 15:46:27 -07:00
Andrew Lytvynov	3004b65019	proxy: add proxy_ssh_sessions_total metric This is similar to server_interactive_sessions_total, but tracks all SSH sessions through a proxy.	2020-09-18 20:57:34 +00:00
Andrew Lytvynov	96375c7d3d	tctl: fix `tctl top` colors on dark terminals If we leave `TextStyle` empty on UI elements, it will use the default foreground color defined by the terminal (light for dark terminals and vice versa). Same goes for `BorderStyle`. A few other tweaks to UI and source metrics: - update table ratios to prevent hiding output rows on short (height) terminal windows - update tab selector style to use bold/underline instead of colors to mark selected tab - print `No data` in histogram tables when there are no values - don't report the local cluster in `remote_clusters` metric	2020-08-19 22:17:17 +00:00
Andrew Lytvynov	cd1344a4a5	Add prometheus metric mirroring /readyz state This allows users to get the health of their nodes from the Prometheus metrics pipeline instead of polling readyz separately. Updates #3700	2020-05-14 18:08:10 +00:00
Russell Jones	77e8b63470	Enhanced Session Recording. Added package cgroup to orchestrate cgroups. Only cgroup2 support was added because cgroup2 cgroups have unique IDs that can be correlated with BPF events. Added bpf package that contains three BPF programs: execsnoop, opensnoop, and tcpconnect. The bpf package starts and stops these programs as well as correlating their output with Teleport sessions and emitting it to the audit log. Added support for Teleport to re-exec itself before launching a shell. This allows Teleport to start a child process, capture its PID, place the PID in a cgroup, and then continue the process. Once the process is continued it can be tracked by its cgroup ID. Reduced the total number of connections to a host so Teleport does not quickly exhaust all file descriptors. Exhausting all file descriptors happens very quickly when disk events, which occur at a very high rate, are emitted to the audit log. Added tarballs for exec sessions. Updated session.start and session.end events with additional metadata. Updated the format of session tarballs to include enhanced events. Added file configuration for enhanced session recording. Added code to start up enhanced session recording and pass package to SSH nodes.	2019-12-02 15:10:39 -08:00
Alexander Klizhentas	6b5935fb71	Use RADIX trees for prefix matching. (#2666) Buffer fan out used simple prefix match in a loop, which resulted in high CPU load with many connected watchers. This commit switches to RADIX trees for prefix matching, which reduces CPU load substantially for 5K+ connected watchers.	2019-04-22 15:28:04 -07:00
Sasha Klizhentas	8356ae6a74	Use in-memory cache for the auth server API. This commit expands the usage of the caching layer for auth server API: * Introduces in-memory cache that is used to serve all Auth server API requests. This is done to achieve scalability on 10K+ node clusters, where each node fetches certificate authorities, roles, users and join tokens. It is not possible to scale DynamoDB backend or other backends to 10K reads per second on a single shard or partition. The solution is to introduce an in-memory cache of the backend state that is always used for reads. * In-memory cache has been expanded to support all resources required by the auth server. * Experimental `tctl top` command has been introduced to display common single node metrics. Replace SQLite Memory Backend with BTree. The SQLite in-memory backend was suffering from high tail latencies under load (up to 8 seconds in 99.9%-ile on load configurations). This commit replaces the SQLite memory caching backend with an in-memory BTree backend that brought down tail latencies to 2 seconds (99.9%-ile) and brought an overall performance improvement.	2019-04-12 14:23:09 -07:00

14 commits