mirror of
https://github.com/gravitational/teleport
synced 2024-10-19 08:43:58 +00:00
RFD 129: Avoid Discovery Resource Name Collisions (#27258)
* RFD: Auto-Discovery Resource Name Templates * Update 0129-discovery-name-templating.md * Update 0129-discovery-name-templating.md Expand on UX when user references unsupported template var * Update 0129-discovery-name-templating.md * Add dynamic config examples * Show proto message updates needed * Fixup error message example for tctl * rework RFD * remove config template discussion * explain a discovery naming convention approach * remove tsh proxy app entry * add web ui and Teleport Connect UX * expand full detail on ambiguous tsh error * fixup example formatting, aws account ID length, sort order * fix formatting of table * include azure region * consistent naming scheme * helps avoid collisions in rare cases of invalid resource group chars * update tsh UX * support --query and --labels flags instead of positional labels arg * clarify how prefix resource name matching will be implemented * update examples * address backward compat * update subcommands to include apps and db logout
This commit is contained in:
parent
90f1d9a536
commit
2dacc4d35c
448
rfd/0129-discovery-name-templating.md
Normal file
448
rfd/0129-discovery-name-templating.md
Normal file
|
@ -0,0 +1,448 @@
|
|||
---
|
||||
authors: Gavin Frazar (gavin.frazar@goteleport.com)
|
||||
state: draft
|
||||
---
|
||||
|
||||
# RFD 0129 - Avoid Discovery Resource Name Collisions
|
||||
|
||||
## Required Approvers
|
||||
|
||||
- Engineering: `@r0mant && @smallinsky && @tigrato`
|
||||
- Product: `@klizhentas || @xinding33`
|
||||
- Security: `@reedloden || @jentfoo`
|
||||
|
||||
## What
|
||||
|
||||
Auto-Discovery shall name discovered resources such that other resources of
|
||||
the same kind are unlikely to have the same name.
|
||||
|
||||
In particular, discovered cloud resource names shall include uniquely
|
||||
identifying metadata in the name such as region, account ID, or sub-type name.
|
||||
|
||||
`tsh` sub-commands shall allow users to use a prefix of the resource name when
|
||||
the prefix unambiguously identifies a resource.
|
||||
|
||||
Additionally, `tsh` sub-commands shall support using label selectors to
|
||||
unambiguously select a single resource.
|
||||
|
||||
This RFD does not apply to ssh server instance discovery, since servers are
|
||||
already identified within the Teleport cluster by a UUID.
|
||||
|
||||
## Why
|
||||
|
||||
Multiple discovery agents can discover resources with identical names.
|
||||
For example, this happened when customers had databases in different AWS
|
||||
regions or accounts with the same name. When a name collision occurs, only one
|
||||
of the databases can be accessed by users.
|
||||
|
||||
Name collisions can be avoided with the addition of other resource metadata
|
||||
in the resource name.
|
||||
|
||||
Since discovered resource names will be longer and more tedious to use, we
|
||||
should support resource name prefixes and label matching in `tsh`, Teleport
|
||||
Connect, and the web UI for better UX.
|
||||
|
||||
Relevant issue:
|
||||
- https://github.com/gravitational/teleport/issues/22438
|
||||
|
||||
## Details
|
||||
|
||||
#### AWS Discovery
|
||||
|
||||
Discovered database and kube cluster names shall have a lowercase suffix
|
||||
appended to it that includes:
|
||||
|
||||
- Name of the AWS matcher type
|
||||
- `eks`, `rds`, `rdsproxy`, `redshift`, `redshift-serverless`, `elasticache`,
|
||||
`memorydb` (as of writing this RFD)
|
||||
- AWS region
|
||||
- AWS account ID
|
||||
|
||||
All of these AWS resource types require a unique name within an AWS account
|
||||
and region.
|
||||
|
||||
By including the region and account ID, resources of the same kind
|
||||
in different AWS accounts or regions will avoid name collision with each-other.
|
||||
|
||||
By including the Teleport matcher type in the name, resources of different
|
||||
sub-kinds will also avoid name collision.
|
||||
|
||||
By combining these properties, resource names will not collide.
|
||||
|
||||
The reason for including `eks` in kube cluster names, even though this is the
|
||||
only "kind" of kube cluster we discover in AWS, is to clearly distinguish the
|
||||
cluster further from clusters in other clouds, although this isn't strictly
|
||||
necessary.
|
||||
|
||||
Example:
|
||||
```yaml
|
||||
discovery_service:
|
||||
enabled: true
|
||||
aws:
|
||||
- types: ["eks", "rds", "redshift"]
|
||||
regions: ["us-west-1", "us-west-2"]
|
||||
assume_role_arn: "arn:aws:iam::111111111111:role/DiscoveryRole"
|
||||
external_id: "123abc"
|
||||
tags:
|
||||
"*": "*"
|
||||
- types: ["eks", "rds", "redshift"]
|
||||
regions: ["us-west-1", "us-west-2"]
|
||||
assume_role_arn: "arn:aws:iam::222222222222:role/DiscoveryRole"
|
||||
external_id: "456def"
|
||||
tags:
|
||||
"*": "*"
|
||||
```
|
||||
|
||||
If the discovery service is configured like the above, the discovery agent will
|
||||
discover AWS EKS clusters and AWS RDS and Redshift databases in the `us-west-1`
|
||||
and `us-west-2` AWS regions, in AWS accounts `111111111111` and `222222222222`.
|
||||
|
||||
Now suppose that an EKS cluster, RDS database, and Redshift database all named
|
||||
`foo` exist in both regions in both AWS accounts.
|
||||
If the discovery service applies the new naming convention, the discovered
|
||||
resources should be named:
|
||||
|
||||
- `foo-eks-us-west-1-111111111111`
|
||||
- `foo-eks-us-west-2-111111111111`
|
||||
- `foo-eks-us-west-1-222222222222`
|
||||
- `foo-eks-us-west-2-222222222222`
|
||||
- `foo-rds-us-west-1-111111111111`
|
||||
- `foo-rds-us-west-2-111111111111`
|
||||
- `foo-rds-us-west-1-222222222222`
|
||||
- `foo-rds-us-west-2-222222222222`
|
||||
- `foo-redshift-us-west-1-111111111111`
|
||||
- `foo-redshift-us-west-2-111111111111`
|
||||
- `foo-redshift-us-west-1-222222222222`
|
||||
- `foo-redshift-us-west-2-222222222222`
|
||||
|
||||
This naming convention does not violate our database name validation regex,
|
||||
`^[a-z]([-a-z0-9]*[a-z0-9])?$`,
|
||||
and does not violate our kube cluster name validation regex `^[a-zA-Z0-9._-]+$`.
|
||||
|
||||
#### Azure Discovery
|
||||
|
||||
Azure resources have a resource ID that uniquely identifies the resource, e.g.:
|
||||
`/subscriptions/00000000-1111-2222-3333-444444444444/resourceGroups/<group name>/providers/<provider name>/<name>`
|
||||
|
||||
We could use this ID as the database name, but it is unnecessarily verbose.
|
||||
It will also fail to match our database name validation regex:
|
||||
`[a-z]([-a-z0-9]*[a-z0-9])?`.
|
||||
|
||||
Additionally, all of the Azure databases that Teleport currently supports
|
||||
require globally unique names (within the same type of database), because Azure
|
||||
assigns a DNS name:
|
||||
|
||||
- Redis: `<name>.redis.cache.windows.net`.
|
||||
- SQL Server: `<name>.database.windows.net`.
|
||||
- Postgres: `<name.postgres.database.azure.com`.
|
||||
- MySQL: `<name>.mysql.database.azure.com`.
|
||||
|
||||
MySQL/Postgres server names must be unique among both single-server and
|
||||
flexible-server instances.
|
||||
|
||||
Therefore, we can form a uniquely identifying name among Azure resources just by
|
||||
adding the kind of matcher to the resource name.
|
||||
However, AKS kube clusters do not require globally unique names - they only need
|
||||
to be unique within the same resource group in the same subscription.
|
||||
|
||||
Additionally, resource group names may contain characters that are not valid
|
||||
in Teleport database/kube names, so we must either omit the resource group name
|
||||
in those cases or perform some kind of string transform.
|
||||
If we include the resource region, it will serve as a heuristic to avoid name
|
||||
collision when resource group names contain invalid characters.
|
||||
Including resource region will also be consistent with the other cloud naming
|
||||
schemes.
|
||||
|
||||
To make the naming convention consistent, and to "future-proof" it, the
|
||||
naming convention will be to append a suffix that includes:
|
||||
|
||||
- Name of the Azure matcher type
|
||||
- `aks`, `mysql`, `postgres`, `redis`, `sqlserver` (as of writing this RFD)
|
||||
- Azure region
|
||||
- Azure resource group name
|
||||
- resource group names may contain characters that we do not allow in database
|
||||
or kube cluster names.
|
||||
The resource group name should be checked for invalid characters and dropped
|
||||
from the name suffix if it is invalid.
|
||||
This is only a heuristic, but any approach here will be a heuristic, and
|
||||
this is the simplest string transform we can do, which avoids confusing
|
||||
users with strange resource group names they don't recognize.
|
||||
- Azure subscription ID
|
||||
- subscription IDs only contains letters, digits, and hyphens.
|
||||
|
||||
Example:
|
||||
```yaml
|
||||
discovery_service:
|
||||
enabled: true
|
||||
aws:
|
||||
- types: ["aks", "mysql", "postgres"]
|
||||
regions: ["eastus"]
|
||||
subscriptions:
|
||||
- "11111111-1111-1111-1111-111111111111"
|
||||
- "22222222-2222-2222-2222-222222222222"
|
||||
resource_groups: ["group1", "group2", "weird-)(-group-name"]
|
||||
tags:
|
||||
"*": "*"
|
||||
```
|
||||
|
||||
If the discovery service is configured like the above, the discovery agent will
|
||||
discover Azure AKS kube clusters, Azure MySQL, and Azure PostgreSQL databases.
|
||||
|
||||
Now suppose that four AKS kube clusters named `foo` exist in each combination of
|
||||
resource group and subscription ID, and a MySQL database and Postgres database
|
||||
both named `foo` exist in the the `1111..` subscription and `group1`.
|
||||
If the discovery service applies the new naming convention, the discovered
|
||||
resources should be named:
|
||||
|
||||
- `foo-eastus-aks-group1-11111111-1111-1111-1111-111111111111`
|
||||
- `foo-eastus-aks-group2-11111111-1111-1111-1111-111111111111`
|
||||
- `foo-eastus-aks-group1-22222222-2222-2222-2222-222222222222`
|
||||
- `foo-eastus-aks-group2-22222222-2222-2222-2222-222222222222`
|
||||
- `foo-eastus-mysql-group1-11111111-1111-1111-1111-111111111111`
|
||||
- `foo-eastus-postgres-group1-11111111-1111-1111-1111-111111111111`
|
||||
|
||||
If resources exist within the Azure resource group `weird-)(-group-name`,
|
||||
then we simply drop the resource group name from the resource name:
|
||||
|
||||
- `foo-eastus-aks-11111111-1111-1111-1111-111111111111`
|
||||
- `foo-eastus-aks-22222222-2222-2222-2222-222222222222`
|
||||
- `foo-eastus-mysql-11111111-1111-1111-1111-111111111111`
|
||||
- ...
|
||||
|
||||
Unfortunately, this would allow name collisions across resource groups.
|
||||
|
||||
Alternatively, we could apply a transformation to the resource group name to
|
||||
make it valid.
|
||||
For example, base64 encode it, make the string lowercase, and replace the
|
||||
`[+/=]` characters with valid characters, maybe even truncating the result:
|
||||
(another heuristic, although less likely to collide names):
|
||||
|
||||
```sh
|
||||
$ echo "weird-)(-group-name" | base64 | sed 's#[+/=]#x#g' | tr '[:upper:]' '[:lower:]' | cut -c1-8
|
||||
d2vpcmqt
|
||||
$ echo "other-weird-)(-group-name" | base64 | sed 's#[+/=]#x#g' | tr '[:upper:]' '[:lower:]' | cut -c1-8
|
||||
b3rozxit
|
||||
```
|
||||
|
||||
- `foo-eastus-aks-d2vpcmqt-11111111-1111-1111-1111-111111111111`
|
||||
- `foo-eastus-aks-b3rozxit-11111111-1111-1111-1111-111111111111`
|
||||
- ...
|
||||
|
||||
Each database name will be unique, since `foo` must be globally unique among
|
||||
all Azure MySQL databases and globally unique among all Azure Postgres databases.
|
||||
|
||||
Even if a new database type is added that doesn't have this globally unique
|
||||
name property, the resource group name and subscription ID will avoid name
|
||||
collisions, and the databases will be distinguished from databases in other
|
||||
clouds.
|
||||
If resource group name has invalid characters, the Azure region will make name
|
||||
collisions even more unlikely.
|
||||
|
||||
Likewise, the discovered AKS clusters will avoid colliding with other kube
|
||||
clusters in Azure or other clouds.
|
||||
|
||||
This naming convention does not violate our database name validation regex,
|
||||
`^[a-z]([-a-z0-9]*[a-z0-9])?$`,
|
||||
and does not violate our kube cluster name validation regex `^[a-zA-Z0-9._-]+$`.
|
||||
|
||||
#### GCP Discovery
|
||||
|
||||
GCP discovery currently supports discovering only GKE kube clusters.
|
||||
|
||||
GKE cluster names are unique within the same GCP project ID and location/zone.
|
||||
|
||||
The discovery naming convention for GKE clusters shall be to append a suffix to
|
||||
the cluster name that includes:
|
||||
|
||||
- Name of the Teleport GCP matcher type
|
||||
- `gke`
|
||||
- GCP project ID
|
||||
- These can be custom, but will only consist of characters, digits, hyphens.
|
||||
- GCP location
|
||||
|
||||
```yaml
|
||||
gcp:
|
||||
- types: ["gke"]
|
||||
locations: ["us-west1", "us-west2"]
|
||||
tags:
|
||||
"*": "*"
|
||||
project_ids: ["my-project"]
|
||||
```
|
||||
|
||||
If the discovery service is configured like the above, the discovery agent will
|
||||
discover GCP GKE kube clusters in "my-project" in the `us-west1` and `us-west2`
|
||||
locations.
|
||||
|
||||
Now suppose GKE clusters named `foo` exist in each region.
|
||||
If the discovery service applies the new naming convention, the discovered
|
||||
resources should be named:
|
||||
|
||||
- `foo-gke-us-west1-my-project`
|
||||
- `foo-gke-us-west2-my-project`
|
||||
|
||||
This naming convention avoids name collisions between GKE clusters and does not
|
||||
collide with discovered AWS/Azure clusters.
|
||||
|
||||
This naming convention does not violate our kube cluster name validation regex:
|
||||
`^[a-zA-Z0-9._-]+$`
|
||||
|
||||
### `tsh` UX
|
||||
|
||||
Users will be frustrated if they are forced to type out verbose resource names
|
||||
when using `tsh`.
|
||||
To avoid this poor UX, sub-commands should support prefix resource names, label
|
||||
matching, or using a predicate expression to select a resource.
|
||||
|
||||
The same UX should apply to all `tsh` sub-commands that take a resource name
|
||||
argument. These commands shall support
|
||||
`tsh <sub-command> [--labels keys=val1,key2=val2,...] [--query <predicate>] [name | prefix]` syntax:
|
||||
|
||||
- `tsh db login`
|
||||
- `tsh db logout`
|
||||
- `tsh db connect`
|
||||
- `tsh db env`
|
||||
- `tsh db config`
|
||||
- `tsh kube login`
|
||||
- `tsh app login`
|
||||
- `tsh app logout`
|
||||
- `tsh app config`
|
||||
- `tsh proxy db`
|
||||
- `tsh proxy kube`
|
||||
- `tsh proxy app`
|
||||
|
||||
To support prefix names, we add a new predicate expression function
|
||||
`hasPrefix`, and change the `tsh` API calls to use `hasPrefix(name, "<prefix>")`
|
||||
rather than the current predicate expression `name == "<name>"`.
|
||||
|
||||
The `--query` flag provides the full power of the predicate language, which
|
||||
includes label matching.
|
||||
The `--labels` flag provides a less powerful, but more convenient notation for
|
||||
selecting a resource by matching labels.
|
||||
|
||||
We already support both of these cli features as either a flag or positional arg
|
||||
in other `tsh` commands, e.g. `tsh db ls --query="..." key1=val1,key2=val2,...`
|
||||
|
||||
When `--query` is used along with a positional arg for the resource name or
|
||||
prefix, we will need to combine the two as a single predicate expression, e.g.
|
||||
`tsh db connect --query='labels.env == "prod"' foo-db`
|
||||
will be combined into the predicate expression `hasPrefix(name, "foo-db") && (labels.env == "prod")`
|
||||
|
||||
#### `tsh` examples
|
||||
|
||||
To illustrate the new UX for `tsh` sub-commands, here is an example using
|
||||
`tsh db connect` to select a database (the same applies for other commands):
|
||||
|
||||
```sh
|
||||
$ tsh db ls
|
||||
Name Description Allowed Users Labels Connect
|
||||
------ ------------------- ------------------- --------------------------- -------
|
||||
bar RDS instance in ... [*] account-id=123456789012,region=us-west-1,env=dev,...
|
||||
bar RDS instance in ... [*] account-id=123456789012,region=us-west-2,env=dev,...
|
||||
foo RDS instance in ... [*] account-id=123456789012,region=us-west-1,env=prod,...
|
||||
|
||||
# connect by prefix name
|
||||
$ tsh db connect --db-user=alice --db-name-postgres foo
|
||||
#...connects to "foo-rds-us-west-1-123456789012" by prefix...
|
||||
|
||||
# ambiguous prefix name is an error
|
||||
$ tsh db connect --db-user=alice --db-name-postgres bar
|
||||
error: ambiguous database name could match multiple databases:
|
||||
Name Description Protocol Type URI Allowed Users Labels Connect
|
||||
------------------------------ ------------------------- -------- ---- ----------------------------------------------------- ------------- ----------------------------------------------------------------------------------------------------------------------------------------- -------
|
||||
bar-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds bar.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*] account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-1,teleport.dev/origin=dynamic
|
||||
bar-rds-us-west-2-123456789012 RDS instance in us-west-2 postgres rds bar.abcdefghijklmnop.us-west-2.rds.amazonaws.com:5432 [*] account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-2,teleport.dev/origin=dynamic
|
||||
|
||||
Hint: try addressing the database by its full name or by matching its labels (ex: tsh db connect key1=value1,key2=value2).
|
||||
Hint: use `tsh db ls -v` or `tsh db ls --format=[yaml | json]` to list all databases with verbose details.
|
||||
|
||||
# resolve the error by connecting with an unambiguous prefix
|
||||
$ tsh db connect --db-user=alice --db-name-postgres bar-rds-us-west-2
|
||||
#...connects to "bar-rds-us-west-2-123456789012" by prefix...
|
||||
|
||||
# or connect by label(s) using --labels
|
||||
$ tsh db connect --db-user=alice --db-name-postgres --labels region=us-west-2
|
||||
#...connects to "bar-rds-us-west-2-123456789012" by matching region label...
|
||||
|
||||
# or connect by label(s) in a --query predicate
|
||||
$ tsh db connect --db-user=alice --db-name-postgres --query 'labels.region == "us-west-2"'
|
||||
#...connects to "bar-rds-us-west-2-123456789012" by matching region label...
|
||||
|
||||
# ambiguous label(s) match is also an error
|
||||
$ tsh db connect --db-user=alice --db-name-postgres --query 'labels.region == "us-west-1"'
|
||||
error: ambiguous database query matches multiple databases:
|
||||
Name Description Protocol Type URI Allowed Users Labels Connect
|
||||
------------------------------ ------------------------- -------- ---- ----------------------------------------------------- ------------- ----------------------------------------------------------------------------------------------------------------------------------------- -------
|
||||
bar-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds bar.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*] account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-1,teleport.dev/origin=dynamic
|
||||
foo-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds foo.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*] account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=prod,region=us-west-1,teleport.dev/origin=dynamic
|
||||
|
||||
Hint: try addressing the database by its full name or by matching its labels (ex: tsh db connect key1=value1,key2=value2).
|
||||
Hint: use `tsh db ls -v` or `tsh db ls --format=[yaml | json]` to list all databases with verbose details.
|
||||
|
||||
# resolve the error by using either more specific labels or adding a prefix name
|
||||
$ tsh db connect --db-user=alice --db-name-postgres --query 'labels.region == "us-west-1"' foo
|
||||
#...connects to "foo-rds-us-west-1-123456789012" by prefix and label...
|
||||
$ tsh db connect --db-user=alice --db-name-postgres --query 'labels.region == "us-west-1" && labels.env == "prod"'
|
||||
#...connects to "foo-rds-us-west-1-123456789012" by multiple labels...
|
||||
```
|
||||
|
||||
### Web UI and Teleport Connect UX
|
||||
|
||||
Both the web UI and Teleport Connect already support searching for substrings
|
||||
in resource names and labels.
|
||||
|
||||
Searching by substring is a "fuzzier" kind of search than prefix-based name
|
||||
search (like this RFD proposed prefix-based search for `tsh`) - it's more
|
||||
likely to match more than one resource.
|
||||
However, GUI UX is fundamentally different from CLI - users can search and then
|
||||
interactively select from multiple matching resources.
|
||||
So this kind of search is appropriate for the web UI and Teleport Connect, but
|
||||
not for `tsh`.
|
||||
|
||||
Both web UI and Teleport Connect also support label-based searching with
|
||||
the predicate language, e.g.:
|
||||
|
||||
```
|
||||
labels["env"] == "dev" && labels["region"] == "us-west-1"
|
||||
```
|
||||
|
||||
Therefore, no UX changes are required for these user interfaces.
|
||||
|
||||
### Security
|
||||
|
||||
No security concerns I can think of.
|
||||
|
||||
### Backward Compatibility
|
||||
|
||||
|
||||
If the Teleport Discovery service is upgraded, but `tsh` is not, then
|
||||
we may break backwards compatibility with user automation scripts, and/or
|
||||
frustrate users with long names they must type fully, since their `tsh` does
|
||||
not have the UX improvements.
|
||||
|
||||
Solution: backport `tsh` UX changes to prior versions and reserve changes to
|
||||
the Teleport Discovery naming schema for v14.
|
||||
This way users can continue to type the old names of discovered resources and
|
||||
connect by prefix match.
|
||||
|
||||
`tsh` UX changes will add a new predicate expression `hasPrefix` to the
|
||||
server-side predicate resource parser.
|
||||
If a user has a newer `tsh` version than the server, then `hasPrefix` may not
|
||||
be supported by the server and `tsh` will get an error.
|
||||
To avoid issues, we can make `tsh` fallback to listing resources without a
|
||||
predicate expression and filter the results by matching prefix name.
|
||||
|
||||
### Audit Events
|
||||
|
||||
N/A
|
||||
|
||||
### Test Plan
|
||||
|
||||
We should test that discovering multiple resources with identical names does not
|
||||
suffer name collisions.
|
||||
|
||||
Setup identically named RDS databases and kube clusters in different AWS regions
|
||||
and a discovery agent to discover them.
|
||||
|
||||
Check that the resources in each region are discovered and differentiated by
|
||||
region in their name.
|
||||
|
Loading…
Reference in a new issue