---
authors: Michael McAllister (michael.mcallister@goteleport.com)
state: canceled
---
# RFD 60 - gRPC Backend
## What
A new [backend](https://github.com/gravitational/teleport/tree/v8.3.1/lib/backend) that sends requests to a user-defined gRPC server for persistence. Initially this targets cluster state; audit events will come later (in a subsequent RFD).
## Why
As Cloud-Hosted Teleport continues to grow, so does our need for greater flexibility in not only which persistence layer we use (be it a currently supported backend like DynamoDB, or otherwise) but _how_ it is implemented.
Take, for instance, the current implementation of the [DynamoDB backend](https://github.com/gravitational/teleport/tree/v8.3.1/lib/backend/dynamo), whose configuration allows specifying the table name but does not presently allow customizing the partition key. This means it is not possible to co-locate multiple installations of Teleport within the same DynamoDB table, so we are bounded by [AWS account limits](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html#limits-tables) when deploying multiple installations.
Other backend solutions have their own challenges regarding authentication and authorization. For instance, most databases have limits on the number of users/roles that can be defined, which we would need to leverage to ensure strong data isolation between deployments. Using an authentication method that would allow one tenant to access another tenant's data is not an option.
Implementing a gRPC client within Teleport allows not only the Cloud team but also external developers to implement:
- A custom persistence layer that may not be supported by Teleport natively
- Custom caching logic, connection pooling, etc.
- Custom database schema, partition scheme, etc.
- Custom authentication and authorization (mTLS, LDAP, etc.)
## Details
### Scope
It is proposed that the Teleport core codebase hold the protocol buffer definitions and the corresponding client.
To keep this RFD brief, the Cloud team's (aspirational) implementation of the corresponding gRPC server is out of scope and not discussed in detail; it will also be wholly owned by the Cloud team and will not be part of the core codebase.
Additionally, audit events are initially out of scope but will be implemented at a later date.
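To illustrate the shape of the in-core client, below is a minimal sketch of a `backend.Backend` implementation that simply proxies calls to the generated gRPC stubs. The import paths, generated identifiers, and error handling are assumptions for illustration only, not a final design; only `Get` and `Close` are shown.

```go
package grpcbackend

import (
	"context"

	"google.golang.org/grpc"

	// Hypothetical import path for the generated stubs; the final location
	// and package name are not decided by this RFD.
	pb "github.com/gravitational/teleport/api/gen/proto/backendservice"
	"github.com/gravitational/teleport/lib/backend"
)

// Backend proxies every backend.Backend call to a user-defined
// BackendService gRPC server.
type Backend struct {
	client pb.BackendServiceClient
	conn   *grpc.ClientConn
}

// New dials the configured gRPC endpoint and returns the backend.
func New(ctx context.Context, addr string, opts ...grpc.DialOption) (*Backend, error) {
	conn, err := grpc.DialContext(ctx, addr, opts...)
	if err != nil {
		return nil, err
	}
	return &Backend{client: pb.NewBackendServiceClient(conn), conn: conn}, nil
}

// Get translates a backend Get call into the Get RPC.
func (b *Backend) Get(ctx context.Context, key []byte) (*backend.Item, error) {
	item, err := b.client.Get(ctx, &pb.GetRequest{Key: key})
	if err != nil {
		// In practice gRPC status codes would be mapped to trace errors here.
		return nil, err
	}
	return &backend.Item{Key: item.Key, Value: item.Value}, nil
}

// Close tears down the underlying connection.
func (b *Backend) Close() error {
	return b.conn.Close()
}
```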
### Protobuf Definition
The proposed protobuf definition aims to mirror the interface that is defined in [backend.go](https://github.com/gravitational/teleport/blob/cf162af679f3c136b0cc5a7c5bfcd8bba14afdaa/lib/backend/backend.go#L41-L91)
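For reference, an abridged (and possibly slightly paraphrased) view of that Go interface looks roughly like the following; consult the linked `backend.go` for the authoritative signatures.

```go
// Abridged from lib/backend/backend.go; see the link above for the
// authoritative definition.
type Backend interface {
	// Create creates item if it does not exist.
	Create(ctx context.Context, i Item) (*Lease, error)
	// Put puts value into backend (creates if it does not exist, updates otherwise).
	Put(ctx context.Context, i Item) (*Lease, error)
	// CompareAndSwap compares the expected item with the existing one and replaces it.
	CompareAndSwap(ctx context.Context, expected Item, replaceWith Item) (*Lease, error)
	// Update updates the value in the backend.
	Update(ctx context.Context, i Item) (*Lease, error)
	// Get returns a single item or a not found error.
	Get(ctx context.Context, key []byte) (*Item, error)
	// GetRange returns a range of items.
	GetRange(ctx context.Context, startKey []byte, endKey []byte, limit int) (*GetResult, error)
	// Delete deletes an item by key.
	Delete(ctx context.Context, key []byte) error
	// DeleteRange deletes a range of items.
	DeleteRange(ctx context.Context, startKey []byte, endKey []byte) error
	// KeepAlive updates the TTL on the lease.
	KeepAlive(ctx context.Context, lease Lease, expires time.Time) error
	// NewWatcher returns a new event watcher.
	NewWatcher(ctx context.Context, watch Watch) (Watcher, error)
}
```

With that interface in mind, the proposed protobuf definition is: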
```protobuf
message Item {
  // Key is a key of the key value item
  bytes key = 1;
  // Value is a value of the key value item
  bytes value = 2;
  // Expires is an optional record expiry time
  google.protobuf.Timestamp expires = 3 [
    (gogoproto.stdtime) = true,
    (gogoproto.nullable) = false
  ];
  // ID is a record ID, newer records have newer ids
  sfixed64 id = 4;
  // LeaseID is a lease ID, could be set on objects with TTL
  sfixed64 lease_id = 5;
}

message Lease {
  // Key is a key of the key value item
  bytes key = 1;
  // ID is a record ID, newer records have newer ids
  sfixed64 id = 2;
}

message Watch {
  // resume_last_id is the last item index we saw, and we want to resume from any event after that point
  sfixed64 resume_last_id = 1;
}

message CompareAndSwapRequest {
  // Expected is the existing item
  Item expected = 1;
  // replace_with is the new item to swap
  Item replace_with = 2;
}

message GetRequest {
  // Key is a key of the key value item
  bytes key = 1;
}

message GetRangeRequest {
  // start_key is the starting key to request in the range request
  bytes start_key = 1;
  // end_key is the ending key to request in the range request
  bytes end_key = 2;
  // limit is the maximum number of results to return
  int32 limit = 3;
}

message GetRangeResult {
  repeated Item items = 1;
}

message DeleteRequest {
  // Key is a key of the key value item
  bytes key = 1;
}

message DeleteRangeRequest {
  // start_key is the starting key to request in the range request
  bytes start_key = 1;
  // end_key is the ending key to request in the range request
  bytes end_key = 2;
}

message KeepAliveRequest {
  Lease lease = 1;
  google.protobuf.Timestamp expires = 2 [
    (gogoproto.stdtime) = true,
    (gogoproto.nullable) = false
  ];
}

message Event {
  // OpType specifies operation type
  enum OpType {
    UNKNOWN = 0;
    UNRELIABLE = 1;
    INVALID = 2;
    INIT = 3;
    PUT = 4;
    DELETE = 5;
    GET = 6;
  }
  OpType type = 1;
  Item item = 2;
}

service BackendService {
  // Create creates item if it does not exist
  rpc Create(Item) returns (Lease);
  // Put upserts value into backend
  rpc Put(Item) returns (Lease);
  // CompareAndSwap compares item with existing item and replaces it with the replace_with item
  rpc CompareAndSwap(CompareAndSwapRequest) returns (Lease);
  // Update updates value in the backend
  rpc Update(Item) returns (Lease);
  // Get returns a single item or not found error
  rpc Get(GetRequest) returns (Item);
  // GetRange returns query range
  rpc GetRange(GetRangeRequest) returns (GetRangeResult);
  // Delete deletes item by key, returns NotFound error if item does not exist
  rpc Delete(DeleteRequest) returns (google.protobuf.Empty);
  // DeleteRange deletes range of items with keys between start_key and end_key
  rpc DeleteRange(DeleteRangeRequest) returns (google.protobuf.Empty);
  // KeepAlive keeps object from expiring and updates the lease on the existing object;
  // expires contains the new expiry to set on the lease.
  // Some backends may ignore expires in case the lease is managed server side.
  rpc KeepAlive(KeepAliveRequest) returns (google.protobuf.Empty);
  // NewWatcher returns a new event watcher
  rpc NewWatcher(Watch) returns (stream Event);
}
```
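To make the server-side contract concrete, here is a minimal, non-production sketch of a `BackendService` implementation an external developer could write, backed by an in-memory map. The generated package path and stub names (`UnimplementedBackendServiceServer`, `RegisterBackendServiceServer`, field casing) are assumptions about the generated code, and only a few RPCs are shown.

```go
package main

import (
	"context"
	"net"
	"sync"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/types/known/emptypb"

	// Hypothetical import path for the generated server stubs.
	pb "example.com/backendservice/gen"
)

// memoryServer is a toy BackendService backed by a map.
// A real implementation would talk to DynamoDB, Postgres, etc.
type memoryServer struct {
	pb.UnimplementedBackendServiceServer

	mu    sync.Mutex
	items map[string]*pb.Item
}

// Put upserts the item and returns its lease.
func (s *memoryServer) Put(ctx context.Context, item *pb.Item) (*pb.Lease, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[string(item.Key)] = item
	return &pb.Lease{Key: item.Key, Id: item.Id}, nil
}

// Get returns the stored item or a NotFound status.
func (s *memoryServer) Get(ctx context.Context, req *pb.GetRequest) (*pb.Item, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	item, ok := s.items[string(req.Key)]
	if !ok {
		return nil, status.Error(codes.NotFound, "key not found")
	}
	return item, nil
}

// Delete removes the item, returning NotFound if it is absent.
func (s *memoryServer) Delete(ctx context.Context, req *pb.DeleteRequest) (*emptypb.Empty, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.items[string(req.Key)]; !ok {
		return nil, status.Error(codes.NotFound, "key not found")
	}
	delete(s.items, string(req.Key))
	return &emptypb.Empty{}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":1992")
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer() // mTLS credentials would be added here in practice
	pb.RegisterBackendServiceServer(srv, &memoryServer{items: map[string]*pb.Item{}})
	_ = srv.Serve(lis)
}
```

A production implementation would replace the map with the actual persistence layer, implement the remaining RPCs (including the `NewWatcher` stream), and terminate mTLS.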
### Configuration
In order for the gRPC client (the Teleport side) to talk to a server, at minimum the address of the server providing the gRPC service will be required. Additionally, the following existing configuration properties will be reused:
- buffer size for client-side buffering
- mTLS details (CA, cert, key), as currently used for etcd; a sketch of how these could be loaded follows this list
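As a sketch of how the client could turn those mTLS properties into gRPC transport credentials (the helper name and wiring below are illustrative, not part of the proposal):

```go
package grpcbackend

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialOptions loads the mTLS material referenced by the storage
// configuration and builds gRPC transport credentials from it.
func dialOptions(caFile, certFile, keyFile string) ([]grpc.DialOption, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("failed to parse CA certificate")
	}
	creds := credentials.NewTLS(&tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool,
	})
	return []grpc.DialOption{grpc.WithTransportCredentials(creds)}, nil
}
```

The resulting dial options would then be passed to the client constructor along with the configured server address.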
Represented in YAML, the storage stanza of the configuration could look like:
```yaml
storage:
  type: grpc
  server: endpoint.example.com:1992
  tls_ca_file: /secrets/grpc/ca.crt
  tls_cert_file: /secrets/grpc/tls.crt
  tls_key_file: /secrets/tls.key
  audit_events_uri: ['dynamodb://example-ddb-table.events'] # Future iteration will support grpc://endpoint.example.com:1992
  audit_sessions_uri: s3://example-bucket/sessions
  continuous_backups: true
  auto_scaling: true
  read_min_capacity: 20
  read_max_capacity: 100
  read_target_value: 50.0
  write_min_capacity: 10
  write_max_capacity: 100
  write_target_value: 70.0
```