| authors | state |
|---|---|
| Grzegorz Zdunek (grzegorz.zdunek@goteleport.com) | draft |

# RFD 97 - Teleport Connect usage metrics

## Required Approvers

- Engineering: @ravicious @zmb3
- Product: @klizhentas @xinding33
## What

Collect download and usage metrics for Teleport Connect.
## Why

Currently, the team has no information on how many users download, install, and use Teleport Connect on a daily basis. To plan the development of the product effectively, the team should also know how well new features are adopted, which ones are the most popular, and which are problematic for users.
## Details

### Collecting events

Events for Connect will be collected on the client side. The TypeScript code will have a stateless metrics service that forwards events to a gRPC handler exposed by the tsh daemon, which will ultimately submit them to a service called prehog. To avoid flooding the backend with a large number of small requests, events will be batched before being sent to prehog. The batching mechanism has already been implemented in `UsageReporter`, which will be used for collecting cluster events. The tsh daemon will try to reuse the same code as much as possible (by providing its own batching parameters and submit function). Events will be sent once every hour (this may change) and before closing the app.
An authorized endpoint provided by the cluster's Auth Server was also considered, but it does not work well for Connect for a few reasons:

- Some events may not belong to any cluster (at the time of writing this RFD there is no such event, but the solution should be future-proof).
- A batch can contain events from multiple clusters.
- A batch can be sent after the session expires.
### Anonymization

NOTE: The anonymization solution described below applies only to events that are associated with a cluster. Events that do not belong to any cluster but contain sensitive data will have to be anonymized in a different way.

Each event that contains sensitive data, like the cluster name, needs to be anonymized. It will be done in the tsh daemon the same way as in the Auth Server: using HMAC with the unique cluster ID as the key. Connect will reuse the same code.

The only issue with anonymizing events client-side is the lack of the cluster ID, which is kept in the Auth Server. To remedy this, when the app starts and retrieves cluster information, it should also retrieve the cluster ID, create an anonymizer, and store it in the cluster struct.
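A minimal sketch of keyed anonymization as described above. The real code reuses the Auth Server's anonymizer; the choice of HMAC-SHA256 and base64 output here is an assumption for illustration:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// anonymize hashes a sensitive value with HMAC keyed by the cluster ID.
// The same value always maps to the same opaque token within a cluster,
// but the original value cannot be recovered, and the same value in a
// different cluster yields a different token.
func anonymize(clusterID, value string) string {
	mac := hmac.New(sha256.New, []byte(clusterID))
	mac.Write([]byte(value))
	return base64.RawStdEncoding.EncodeToString(mac.Sum(nil))
}

func main() {
	a := anonymize("cluster-id-123", "alice")
	b := anonymize("cluster-id-123", "alice")
	c := anonymize("other-cluster-id", "alice")
	fmt.Println(a == b, a == c) // true false
}
```

Determinism is what makes anonymized values still useful for aggregation: the same user produces the same token across events, so unique-user counts remain accurate without exposing the name.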
### Storing events

Batches of anonymized events will be sent to a public endpoint in prehog (intended for use only by Connect) that translates them into PostHog's data model.

Connect events will share the same project as cluster and website events. This will allow queries that need both sources of data, such as calculating what percentage of users log in to a cluster with Connect.

Some event properties, like the OS, can be saved as user properties. These properties are then stored directly on each event. For example, once the first emitted event sets the user property `os: windows`, each subsequent event will have this property set.
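Assuming PostHog's standard capture payload, setting such a one-time user property on the first event might look roughly like this (all values illustrative, and `$set_once` is PostHog's mechanism for properties that should not be overwritten later):

```json
{
  "event": "connect.cluster.login",
  "distinct_id": "connect.4f9a2c1e-7b3d-4e58-9a10-6c2df0e81b42",
  "properties": {
    "connector_type": "local",
    "$set_once": {
      "os": "windows",
      "arch": "x64"
    }
  }
}
```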
To differentiate events coming from multiple application instances, each event needs a `distinct_id` field. It will be supplied with a UUID generated by Connect, prefixed with `connect.`. The value will be created on first start and stored in a file in the app data directory, so it will not change between restarts.

NOTE: As stated above, Connect events are tied to the application instance (or just the client machine). This means that PostHog's Person for Connect and for a cluster will be different things. They should not be merged.
### User agreement

On start, Teleport Connect will ask the user to opt in to sending anonymized metrics and usage data, with a standard message: "Are you OK sending anonymized usage data about Teleport Connect? This will help us to improve the product."

If the user refuses, Connect will not send any usage data.
### How will collecting metrics support product development?

In the initial version, it should help answer the following questions:

#### How many unique users download and use Teleport Connect today?

To answer the first part of the question, download counts from goteleport.com/download are needed. These will be collected from the access logs of the CloudFront CDN.

To calculate how many users use Teleport Connect on a daily basis, a metric like DAU (Daily Active Users) can be used. This metric can be based on a specific event, but in this case it should be calculated using any event. For example, a user who logged in only once in a given day to refresh certs for a DB proxy connection can still be considered active.

#### What features are the most popular?

Usage of each feature will be measured based on the events from the events section. They will allow generating various statistics, like the most common kinds of connections, or simply showing the usage of particular features like Access Requests.

#### What platforms are the most popular?

This will be measured in two ways:

- Based on the download count for each platform.
- Based on real usage: every event will contain the OS field. These events will then be aggregated by unique users.

#### How does usage grow or shrink over time?

PostHog allows creating Trends based on DAU, which can be used to show how usage changes over a given period of time.
### Events

#### connect.cluster.login

Successful login to a cluster.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `connector_type`: string
- `os`: string (set once as a user property)
- `arch`: string (set once as a user property), CPU architecture
- `os_version`: string (set as a user property)
- `connect_version`: string (set as a user property)
- `distinct_id`: string

#### connect.protocol.run

Connecting to a protocol.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `protocol`: one of `ssh`/`proxy_db`/`kube`
- `distinct_id`: string

#### connect.accessRequest.create

Creating an access request.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `kind`: one of `role`, `resource`
- `distinct_id`: string

#### connect.accessRequests.review

Reviewing an access request.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `distinct_id`: string

#### connect.accessRequests.assumeRole

Assuming a requested role.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `distinct_id`: string

#### connect.fileTransfer.run

Running a file transfer.

Event properties:

- `cluster_name`: string (anonymized)
- `user_name`: string (anonymized)
- `direction`: one of `upload`/`download`
- `distinct_id`: string