authors | state |
---|---|
Vitor Enes (vitor@goteleport.com) | implemented (v11.3.8, v12.1.1) |
RFD 108 - Agent Census
Required Approvals
- Engineering: @zmb3 && @jimbishopp
- Product: @xin || @klizhentas
- Security: @wadells
What
This RFD details how we'll track more information about agents (aka Agent Census). A brief description of this task can be found in Cloud's RFD 53.
Goals
- Track more information about each Teleport agent (such as OS, OS version, architecture, installation methods, container runtime and others)
Non-goals
- Detail how this information will be analyzed / visualized.
Why
We want to understand how agents are installed and where they are running so that we can prioritize the work around cloud agent upgrades.
Details
Terminology
- Service: A Teleport service manages access to resources such as SSH nodes, Kubernetes clusters, internal web applications, databases, and Windows desktops.
- Agent: A `teleport` process that runs one or more Teleport services (depending on the configuration).
- PreHog: A microservice used to capture user events across several Teleport tools.
Implementation Details
This section is divided into the following subsections:
- Data tracked: which data about each agent will be tracked
- Data collection: how such data will flow from the agents to PreHog
- Data computation: how to compute such data
Data tracked
We want to start tracking the following data in PreHog:
- Teleport version
- Teleport enabled services (`node`, `kube`, `app`, `db` and `windows_desktop`)
- OS (`linux` or `darwin`, as these are the only two OSes currently supported)
- OS version (e.g. Linux distribution)
- Host architecture (e.g. `amd64`)
- `glibc` version (Linux only)
- Installation methods (Dockerfile, Helm, `install-node.sh`)
- Container runtime (e.g. Docker)
- Container orchestrator (e.g. Kubernetes)
- Cloud environment (e.g. AWS, GCP, Azure)
Data collection
Currently, when an agent first starts, the inventory control system (ICS) sends an `UpstreamInventoryHello` message to the auth server.
This message has the following fields:
message UpstreamInventoryHello {
  string Version = 1;
  string ServerID = 2;
  repeated string Services = 3 [(gogoproto.casttype) = "github.com/gravitational/teleport/api/types.SystemRole"];
  string Hostname = 4;
}
The `Version` field contains the Teleport version, while the `Services` field contains the subset of the system roles that are currently active at the agent.
While initially we considered extending this message to contain all the agent metadata we want to track, we decided to instead add a new message type, `UpstreamInventoryAgentMetadata` (see the message definition below).
Some of the agent metadata may be slow to compute (due to HTTP requests), so blocking the `UpstreamInventoryHello` until such metadata is computed could increase the agent start-up/connection time.
Instead, when the auth server handle is created at the agent (here), a new goroutine will be spawned to fetch the agent metadata in the background and send it every time a new stream with the auth server is established.
// UpstreamInventoryAgentMetadata is the message sent up the inventory control stream containing
// metadata about the instance.
message UpstreamInventoryAgentMetadata {
  // OS advertises the instance OS ("darwin" or "linux").
  string OS = 1;
  // OSVersion advertises the instance OS version (e.g. "ubuntu 22.04").
  string OSVersion = 2;
  // HostArchitecture advertises the instance host architecture (e.g. "amd64" or "arm64").
  string HostArchitecture = 3;
  // GlibcVersion advertises the instance glibc version of linux instances (e.g. "2.35").
  string GlibcVersion = 4;
  // InstallMethods advertises the install methods used for the instance (e.g. "dockerfile").
  repeated string InstallMethods = 5;
  // ContainerRuntime advertises the container runtime for the instance, if any (e.g. "docker").
  string ContainerRuntime = 6;
  // ContainerOrchestrator advertises the container orchestrator for the instance, if any
  // (e.g. "kubernetes-v1.24.8-eks-ffeb93d").
  string ContainerOrchestrator = 7;
  // CloudEnvironment advertises the cloud environment for the instance, if any (e.g. "aws").
  string CloudEnvironment = 8;
}
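The background fetch described above could be sketched as follows. This is a minimal illustration rather than the actual Teleport code: `AgentMetadata`, `metadataFetcher`, `newMetadataFetcher` and `Get` are hypothetical names, and only two of the metadata fields are included.

```go
package main

import (
	"fmt"
	"runtime"
)

// AgentMetadata mirrors a small subset of UpstreamInventoryAgentMetadata.
type AgentMetadata struct {
	OS               string
	HostArchitecture string
}

// metadataFetcher (hypothetical) computes the metadata once, in the
// background, and hands the cached result to every caller.
type metadataFetcher struct {
	done chan struct{}
	md   AgentMetadata
}

// newMetadataFetcher starts the fetch in a goroutine and returns immediately,
// so the caller (e.g. the code sending UpstreamInventoryHello) is not blocked.
func newMetadataFetcher(fetch func() AgentMetadata) *metadataFetcher {
	f := &metadataFetcher{done: make(chan struct{})}
	go func() {
		defer close(f.done)
		f.md = fetch()
	}()
	return f
}

// Get blocks until the metadata has been computed and then returns it.
func (f *metadataFetcher) Get() AgentMetadata {
	<-f.done
	return f.md
}

func main() {
	f := newMetadataFetcher(func() AgentMetadata {
		return AgentMetadata{OS: runtime.GOOS, HostArchitecture: runtime.GOARCH}
	})
	// Each new stream with the auth server would call Get and send the result.
	for stream := 1; stream <= 2; stream++ {
		fmt.Println("stream", stream, f.Get())
	}
}
```

A real implementation would likely also accept a context so that waiting for slow metadata can be cancelled when the stream closes.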
When the auth server receives an `UpstreamInventoryAgentMetadata` message, it will take the information in the message and send it to PreHog.
For this, a new PreHog `AgentMetadataEvent` message will be added (note that `UpstreamInventoryHello.Hostname` is the only field that won't be sent to PreHog, both because it can contain PII and because it doesn't seem useful):
message AgentMetadataEvent {
  string version = 1;
  string host_id = 2;
  repeated string services = 3;
  string os = 4;
  string os_version = 5;
  string host_architecture = 6;
  string glibc_version = 7;
  repeated string install_methods = 8;
  string container_runtime = 9;
  string container_orchestrator = 10;
  string cloud_environment = 11;
}
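As a rough illustration of how the auth server might combine the two upstream messages into this event, the sketch below uses local stand-in structs (`Hello`, `Metadata`, `AgentMetadataEvent`) and a hypothetical `newAgentMetadataEvent` helper instead of the generated protobuf types. Anonymization of the host ID, mentioned in the Security section, is left out.

```go
package main

import "fmt"

// Hello is a local stand-in for UpstreamInventoryHello.
type Hello struct {
	Version  string
	ServerID string
	Services []string
	Hostname string
}

// Metadata is a local stand-in for UpstreamInventoryAgentMetadata.
type Metadata struct {
	OS                    string
	OSVersion             string
	HostArchitecture      string
	GlibcVersion          string
	InstallMethods        []string
	ContainerRuntime      string
	ContainerOrchestrator string
	CloudEnvironment      string
}

// AgentMetadataEvent is a local stand-in for the PreHog message.
type AgentMetadataEvent struct {
	Version  string
	HostID   string
	Services []string
	Metadata // the eight metadata fields are copied over unchanged
}

// newAgentMetadataEvent (hypothetical) builds the event from both messages.
// Hello.Hostname is intentionally dropped because it may contain PII.
func newAgentMetadataEvent(h Hello, m Metadata) AgentMetadataEvent {
	return AgentMetadataEvent{
		Version:  h.Version,
		HostID:   h.ServerID, // anonymization is omitted in this sketch
		Services: h.Services,
		Metadata: m,
	}
}

func main() {
	h := Hello{Version: "12.1.1", ServerID: "id-1", Services: []string{"node"}, Hostname: "my-host"}
	m := Metadata{OS: "linux", OSVersion: "Ubuntu 22.04", HostArchitecture: "amd64"}
	fmt.Printf("%+v\n", newAgentMetadataEvent(h, m))
}
```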
PostHog data
Some of the fields above are `repeated`.
In PostHog, in addition to storing these field values as arrays, we will create one event property for each element in the array (which should make this information easier to visualize in PostHog).
If, for example, `AgentMetadataEvent.services` contains both `node` and `kube`, in PostHog we'll have the following three properties:
tp.agent.services = [node, kube]
tp.agent.service.node = true
tp.agent.service.kube = true
The same applies to `AgentMetadataEvent.install_methods`.
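The property expansion described above could look roughly like the sketch below. `flattenRepeated` is a hypothetical helper. The property names for `services` come from the example above, while the names that would be used for `install_methods` are not given in this RFD, so both keys are passed in as parameters.

```go
package main

import "fmt"

// flattenRepeated returns PostHog-style properties for a repeated field:
// one property holding the full list plus one boolean property per element.
func flattenRepeated(listKey, elemPrefix string, values []string) map[string]any {
	props := map[string]any{listKey: values}
	for _, v := range values {
		props[elemPrefix+"."+v] = true
	}
	return props
}

func main() {
	props := flattenRepeated("tp.agent.services", "tp.agent.service", []string{"node", "kube"})
	for k, v := range props {
		fmt.Println(k, "=", v)
	}
}
```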
Data computation
Both the Teleport version and active Teleport services are already tracked in the ICS. We detail below how the remaining data will be computed.
3. OS
`UpstreamInventoryAgentMetadata.OS` will be set to the value of the `GOOS` environment variable the agent was built with (exposed at runtime as `runtime.GOOS`).
This will give us either `darwin` or `linux`, as they are the only two supported OSes for now.
4. OS version
On `darwin`, `UpstreamInventoryAgentMetadata.OSVersion` will be set to the outcome of (something equivalent to) `$(sw_vers -productName) $(sw_vers -productVersion)` (e.g. `"macOS 13.2"`).
This is what `gopsutil` is doing (here).
On `linux`, we'll inspect `/etc/os-release` and combine the values associated with `"NAME="` and `"VERSION_ID="` (e.g. `"Ubuntu 22.04"`).
If this file does not exist (unlikely, as it seems widely supported), we can fall back to `/etc/lsb-release` and combine the values associated with `"DISTRIB_ID="` and `"DISTRIB_RELEASE="` (which is what `gopsutil` is doing (here)).
Following this approach is more reliable than using `/usr/bin/lsb_release` directly, as that binary is not always available (e.g. `docker run -ti ubuntu:22.04 lsb_release` fails).
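The Linux lookup above could be sketched as follows. The helper names (`parseKeyValues`, `osVersion`, `linuxOSVersion`) are hypothetical, and the parsing is deliberately simple: it assumes one `KEY=value` pair per line with optional surrounding quotes.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseKeyValues parses the KEY=value lines used by /etc/os-release and
// /etc/lsb-release, stripping optional quotes around the values.
func parseKeyValues(content string) map[string]string {
	kv := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		kv[key] = strings.Trim(value, `"'`)
	}
	return kv
}

// osVersion combines the values of two keys, e.g. NAME and VERSION_ID.
func osVersion(content, nameKey, versionKey string) string {
	kv := parseKeyValues(content)
	return strings.TrimSpace(kv[nameKey] + " " + kv[versionKey])
}

// linuxOSVersion tries /etc/os-release first and falls back to /etc/lsb-release.
func linuxOSVersion(readFile func(path string) (string, error)) string {
	if c, err := readFile("/etc/os-release"); err == nil {
		return osVersion(c, "NAME", "VERSION_ID")
	}
	if c, err := readFile("/etc/lsb-release"); err == nil {
		return osVersion(c, "DISTRIB_ID", "DISTRIB_RELEASE")
	}
	return ""
}

func main() {
	v := linuxOSVersion(func(path string) (string, error) {
		b, err := os.ReadFile(path)
		return string(b), err
	})
	fmt.Printf("%q\n", v)
}
```

Passing the file reader in as a function keeps the parsing testable without touching the real filesystem.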
5. Host architecture
`UpstreamInventoryAgentMetadata.HostArchitecture` will be set to the value of the `GOARCH` environment variable the agent was built with (exposed at runtime as `runtime.GOARCH`).
In the future we may use `sysctl -n sysctl.proc_translated` in order to detect if a macOS agent is running under Rosetta.
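In Go, both values are compile-time constants in the `runtime` package, so reading them is cheap. The sketch below uses a hypothetical `hostInfo` struct and `fetchHostInfo` helper. Because the constants reflect the build target, an `amd64` binary running under Rosetta would still report `amd64`, which is why a separate check would be needed for that case.

```go
package main

import (
	"fmt"
	"runtime"
)

// hostInfo holds the two fields derived from the build target.
type hostInfo struct {
	OS               string
	HostArchitecture string
}

// fetchHostInfo reads the build-time GOOS and GOARCH values.
func fetchHostInfo() hostInfo {
	return hostInfo{OS: runtime.GOOS, HostArchitecture: runtime.GOARCH}
}

func main() {
	fmt.Printf("%+v\n", fetchHostInfo())
}
```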
6. `glibc` version
If on `linux`, `UpstreamInventoryAgentMetadata.GlibcVersion` will be set to the output of `gnu_get_libc_version`.
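// The cgo preamble below must directly precede the import "C" line.
// #include <gnu/libc-version.h>
import "C"

// fetchGlibcVersion returns the glibc version reported at runtime (e.g. "2.35").
// It requires cgo and only builds on Linux systems that use glibc.
func fetchGlibcVersion() string {
	return C.GoString(C.gnu_get_libc_version())
}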
7. Installation methods
Different installation methods will be tracked by setting new `TELEPORT_INSTALL_METHOD_$NAME` environment variables to `true` (where `$NAME` is the installation method).
We have one environment variable for each installation method as some of the installation methods below may occur at the same time (e.g. `Dockerfile` and `teleport-kube-agent`, or `install-node.sh` and `APT` and `systemctl`).
- Dockerfile: `ENV TELEPORT_INSTALL_METHOD_DOCKERFILE=true` will be added to the Dockerfile.
- `teleport-kube-agent` Helm chart: `TELEPORT_INSTALL_METHOD_HELM_KUBE_AGENT` will be set to `true` in the deployment spec.
- `install-node.sh`: `export TELEPORT_INSTALL_METHOD_NODE_SCRIPT="true"` will be added to this script. It is the recommended way to install SSH nodes, apps and many databases. Even though `export` doesn't persist across restarts, we can have the agent persist such value (and maybe all of the values sent in `UpstreamInventoryAgentMetadata`) when it first starts.
- `systemctl`: Tracking whether the agent is running using `systemctl` does not require a new environment variable. For this, we'll simply check if `systemctl status teleport.service` succeeds and, if so, if it contains the string `"active (running)"`.
The installation methods that follow won't be tracked for now. Later on, we may try to track these if, once we start tracking the above installation methods, we notice that we're not yet covering most methods.
- tarball: We can add `export TELEPORT_INSTALL_METHOD_TARBALL="true"` to the `install` script. (However, if the customer does not use the `install` script and instead moves the binaries manually, we won't be able to track this installation method.)
- `.deb`/`.rpm`/`.pkg` packages, APT or YUM repository, and Teleport AMIs: It's unclear ATM how these can be tracked.
- built from source: While it's technically possible for customers to build Teleport from source, we won't try to track this installation method as it seems an unlikely use-case.
- `homebrew`: It's also possible to install Teleport on macOS using `homebrew`. The Teleport package in `homebrew` is not maintained by us, so we will also not track this installation method.
In summary, we'll have the following values in `UpstreamInventoryAgentMetadata.InstallMethods` for now:
- `dockerfile`
- `helm_kube_agent`
- `node_script`
- `systemctl`
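A minimal sketch of how the agent might derive these values is shown below. The environment variable names and the reported values come from this section, while `envInstallMethods` and `installMethods` are hypothetical names. The environment lookup and the `systemctl status` output are passed in as parameters so the logic can be exercised without a real system.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// envInstallMethods maps each environment variable to the value reported
// in UpstreamInventoryAgentMetadata.InstallMethods.
var envInstallMethods = []struct{ env, method string }{
	{"TELEPORT_INSTALL_METHOD_DOCKERFILE", "dockerfile"},
	{"TELEPORT_INSTALL_METHOD_HELM_KUBE_AGENT", "helm_kube_agent"},
	{"TELEPORT_INSTALL_METHOD_NODE_SCRIPT", "node_script"},
}

// installMethods returns every method whose variable is set to "true", plus
// "systemctl" when the given `systemctl status` output reports a running unit.
func installMethods(getenv func(string) string, systemctlStatus string) []string {
	var methods []string
	for _, m := range envInstallMethods {
		if getenv(m.env) == "true" {
			methods = append(methods, m.method)
		}
	}
	if strings.Contains(systemctlStatus, "active (running)") {
		methods = append(methods, "systemctl")
	}
	return methods
}

func main() {
	// Only use the output if the command succeeded, as described above.
	status := ""
	if out, err := exec.Command("systemctl", "status", "teleport.service").Output(); err == nil {
		status = string(out)
	}
	fmt.Println(installMethods(os.Getenv, status))
}
```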
8. Container runtime
To determine if the agent is running on Docker, we'll check if the file `/.dockerenv` exists.
(Docker itself does this).
If so, `UpstreamInventoryAgentMetadata.ContainerRuntime` will be set to `docker`.
If we're interested in tracking other container runtimes, we could follow the approach by `gopsutil` (here).
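The Docker check above amounts to a single file-existence test. The sketch below uses hypothetical helper names (`containerRuntime`, `fileExists`) and takes the existence check as a parameter so it can be replaced in tests.

```go
package main

import (
	"fmt"
	"os"
)

// containerRuntime returns "docker" if /.dockerenv exists and "" otherwise.
func containerRuntime(exists func(path string) bool) string {
	if exists("/.dockerenv") {
		return "docker"
	}
	return ""
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	fmt.Printf("%q\n", containerRuntime(fileExists))
}
```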
9. Container orchestrator
To determine if the agent is running in a Kubernetes pod, we can try to initialize a Kubernetes client similar to how Validator.getClient() does it.
If this succeeds, the agent is running on Kubernetes.
Afterwards, we'll try to detect which cloud provider the pod is running on.
For this, we'll call `client.ServerVersion()`:
- in EKS, the git version looks like `"v1.24.8-eks-ffeb93d"` (i.e. contains the substring `"-eks"`)
- in GCP (docs), the git version looks like `"1.23.14-gke.1800"` (i.e. contains the substring `"-gke"`)
- in AKS, the git version looks like `"v1.25.2"`, so it's not possible to detect this environment using this method. (This is also a problem for Helm charts, as reported in Azure/AKS#3375.)
In the end, `UpstreamInventoryAgentMetadata.ContainerOrchestrator` will be set to `kubernetes-$GIT_VERSION`.
Initially we considered setting `UpstreamInventoryAgentMetadata.ContainerOrchestrator` to `kubernetes-eks` if on EKS, `kubernetes-gcp` if on GCP and `kubernetes-unknown` otherwise.
However, this would require changing the agent code in order to track AKS (if at some point they decide to include the substring `"-aks"`) or some other container orchestrator that can also be detected using the git version.
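The trade-off above can be illustrated with a short sketch: the agent only prefixes the git version, and any provider classification happens later, on the analysis side. Both function names are hypothetical, and the git version is passed in as a plain string rather than obtained from a real Kubernetes client.

```go
package main

import (
	"fmt"
	"strings"
)

// containerOrchestrator builds the value reported by the agent.
func containerOrchestrator(gitVersion string) string {
	return "kubernetes-" + gitVersion
}

// managedKubernetes is an analysis-side classification of the reported value.
// New substrings can be added here without changing the agent.
func managedKubernetes(orchestrator string) string {
	switch {
	case strings.Contains(orchestrator, "-eks"):
		return "eks"
	case strings.Contains(orchestrator, "-gke"):
		return "gke"
	default:
		return "unknown"
	}
}

func main() {
	for _, v := range []string{"v1.24.8-eks-ffeb93d", "1.23.14-gke.1800", "v1.25.2"} {
		o := containerOrchestrator(v)
		fmt.Println(o, "=>", managedKubernetes(o))
	}
}
```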
10. Cloud environment
The only way to determine this seems to be by hitting certain HTTP endpoints specific to each cloud environment:
- AWS (docs): http://169.254.169.254/latest/
- GCP (docs): http://metadata.google.internal/computeMetadata/v1/
- Azure (docs): http://169.254.169.254/metadata/instance?api-version=2021-02-01
`UpstreamInventoryAgentMetadata.CloudEnvironment` will be set to:
- `aws` if on AWS
- `gcp` if on GCP
- `azure` if on Azure
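A possible shape for this probing is sketched below, using the three endpoints listed above. The `probe` type and the `cloudEnvironment` and `httpCheck` helpers are hypothetical. The sketch assumes a 200 response means a match, sends the `Metadata-Flavor: Google` and `Metadata: true` headers that the GCP and Azure metadata services require, and does not handle AWS instances that enforce IMDSv2 (which need a session token first).

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// probe describes one metadata endpoint and the header it requires, if any.
type probe struct {
	env         string
	url         string
	headerKey   string
	headerValue string
}

var probes = []probe{
	{env: "aws", url: "http://169.254.169.254/latest/"},
	{env: "gcp", url: "http://metadata.google.internal/computeMetadata/v1/",
		headerKey: "Metadata-Flavor", headerValue: "Google"},
	{env: "azure", url: "http://169.254.169.254/metadata/instance?api-version=2021-02-01",
		headerKey: "Metadata", headerValue: "true"},
}

// cloudEnvironment returns the first environment whose probe succeeds, or "".
func cloudEnvironment(check func(p probe) bool) string {
	for _, p := range probes {
		if check(p) {
			return p.env
		}
	}
	return ""
}

// httpCheck returns a check that treats an HTTP 200 response as a match.
func httpCheck(client *http.Client) func(p probe) bool {
	return func(p probe) bool {
		req, err := http.NewRequest(http.MethodGet, p.url, nil)
		if err != nil {
			return false
		}
		if p.headerKey != "" {
			req.Header.Set(p.headerKey, p.headerValue)
		}
		resp, err := client.Do(req)
		if err != nil {
			return false
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK
	}
}

func main() {
	// A short timeout bounds how long each probe can take off-cloud.
	client := &http.Client{Timeout: 500 * time.Millisecond}
	fmt.Printf("%q\n", cloudEnvironment(httpCheck(client)))
}
```

Even with short timeouts these probes can take a noticeable amount of time outside a cloud environment, which is one reason the metadata is fetched in the background.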
Security
Detecting the container orchestrator (section 9) and the cloud environment (section 10) requires hitting certain HTTP endpoints. This may be considered too intrusive, so we have to make a decision on whether we really want to track it and argue why it's okay to do so.
The host ID will be anonymized as it may not be just a UUID.
Data sanitization
Nothing special is done regarding sanitization. This will be tackled more holistically in a follow-up project.
UX
Data analysis and visualization are not a goal for this RFD, so no UX concerns for now.