Refresh the HA guide (#24479)

* Refresh the HA guide

Closes #22742

Update the HA guide to reflect v12 changes in the `teleport-cluster`
Helm chart, e.g., running the Auth Service and Proxy Service in separate
compute pools.

* Respond to zmb3 feedback

* Respond to hugoShaka feedback

Remove mentions of running the Kubernetes Service on the Auth Service
hosts

* Respond to alexfornuto feedback

* Update docs/pages/deploy-a-cluster/high-availability.mdx

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>

* Update docs/pages/deploy-a-cluster/high-availability.mdx

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>

---------

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
Paul Gottschling 2023-05-04 17:28:27 -04:00 committed by GitHub
parent 35b837de87
commit cca1320193
2 changed files with 179 additions and 107 deletions

Binary image file not shown. Before: 134 KiB | After: 165 KiB.

docs/pages/deploy-a-cluster/high-availability.mdx

@@ -1,6 +1,7 @@
---
title: "Deploying a High Availability Teleport Cluster"
description: "Deploying a High Availability Teleport Cluster"
tocDepth: 3
---
When deploying Teleport in production, you should design your deployment to
@@ -16,14 +17,17 @@ deployment.
## Overview
A high-availability Teleport cluster revolves around a group of redundant
`teleport` processes, each of which runs the Auth Service and Proxy Service,
plus the infrastructure required to support them.
A high-availability Teleport cluster revolves around two pools of redundant
`teleport` processes, one running the Auth Service and one running the Proxy
Service, plus the infrastructure required to support each pool.
This includes:
- **A Layer 4 load balancer** to direct traffic from users and services to an
available `teleport` process.
Infrastructure components include:
- A **public Layer 4 load balancer** to direct traffic from users and services
to an available Proxy Service instance.
- A **private Layer 4 load balancer** to direct traffic from the Proxy Service
to the Auth Service's gRPC API, which is how Teleport manages the Auth
Service's backend state and provides credentials to users and services in your
cluster.
- A **cluster state backend**. This is a key-value store for cluster state and
audit events that all Auth Service instances can access. This requires
permissions for Auth Service instances to manage records within the key-value
@@ -44,21 +48,32 @@ This includes:
![Diagram of a high-availability Teleport
architecture](../../img/deploy-a-cluster/teleport-ha-architecture.png)
## Layer 4 load balancer
## Layer 4 load balancers
The load balancer forwards traffic from users and services to an available
Teleport instance. This must not terminate TLS, and must transparently forward
the TCP traffic it receives. In other words, this must be a Layer 4 load
balancer, not a Layer 7 (e.g., HTTP) load balancer.
High-availability Teleport clusters require two load balancers:
- **Proxy Service load balancer:** A load balancer to receive traffic from
outside the network where your Teleport cluster is running and forward it to
an available Proxy Service instance. This load balancer handles TCP traffic
from users and services in a variety of application-layer protocols.
- **Auth Service load balancer:** A load balancer to forward traffic from a
Proxy Service instance to an available Auth Service instance. This handles TLS
traffic to the Auth Service's gRPC endpoint.
We recommend configuring your load balancer to route traffic across multiple
Both load balancers must transparently forward the TCP traffic they receive,
without terminating TLS. In other words, these must be Layer 4 load balancers,
not Layer 7 (e.g., HTTP).
We recommend configuring your load balancers to route traffic across multiple
zones (if using a cloud provider) or data centers (if using an on-premise
solution) to ensure availability.
### TLS Routing
### Configuring the Proxy Service load balancer
Your load balancer configuration depends on whether you will enable [TLS
Routing](../management/operations/tls-routing.mdx) in your Teleport cluster.
#### TLS Routing
The way you configure the Proxy Service load balancer depends on whether you
will enable [TLS Routing](../management/operations/tls-routing.mdx) in your
Teleport cluster.
With TLS Routing, the Teleport Proxy Service uses application-layer protocol
negotiation (ALPN) to handle all communication with users and services via the
@@ -76,29 +91,30 @@ The approach we describe in this guide uses only a Layer 4 load balancer to
minimize the infrastructure you will deploy, but users that require a separate
load balancer for HTTPS traffic should disable TLS Routing.
### Configuring the load balancer
#### Open ports
Configure the load balancer to forward traffic from the following ports on the
load balancer to the corresponding port on an available Teleport instance. The
configuration depends on whether you will enable TLS Routing:
Configure the Proxy Service load balancer to forward traffic from the following
ports on the load balancer to the corresponding port on an available Proxy
Service instance. The configuration depends on whether you will enable TLS
Routing:
<Tabs>
<TabItem label="TLS Routing">
| Port | Description |
| - | - |
| `443` | ALPN port for TLS Routing. |
| Load Balancer Port | Proxy Service Port | Description |
| - | - | - |
| `443` | `3080` | ALPN port for TLS Routing. |
</TabItem>
<TabItem label="Separate Ports">
These ports are required:
| Port | Description |
| - | - |
| `3023` | SSH port for clients to connect to. |
| `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
| Load Balancer Port | Proxy Service Port | Description |
| - | - | - |
| <nobr>`3023`</nobr> | <nobr>`3023`</nobr> | SSH port for clients to connect to. |
| `3024` | `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | `3080` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
You can leave these ports closed if you are not using their corresponding
services:
@@ -112,6 +128,13 @@ services:
</TabItem>
</Tabs>
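If you run your Proxy Service pool on Kubernetes, one way to provide this Layer
4 forwarding is a `Service` of type `LoadBalancer`. The following is a minimal
sketch that assumes TLS Routing, a cloud provider that provisions a TCP (Layer
4) load balancer for `LoadBalancer` Services, and hypothetical names and
labels:

```yaml
# Sketch of a public Layer 4 load balancer for the Proxy Service pool,
# assuming TLS Routing and a Kubernetes deployment. Names and labels are
# hypothetical; adjust the selector to match your Proxy Service pods.
apiVersion: v1
kind: Service
metadata:
  name: teleport-proxy
  namespace: teleport
spec:
  type: LoadBalancer
  ports:
    # Forward TCP traffic without terminating TLS.
    - name: tls-routing
      port: 443
      targetPort: 3080
      protocol: TCP
  selector:
    app: teleport-proxy
```

If you disable TLS Routing, add an entry under `ports` for each of the separate
ports in the table above instead.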
### Configuring the Auth Service load balancer
The Auth Service load balancer must forward traffic to the Auth Service's gRPC
port. In this guide, we are assuming that you have configured the Auth Service
load balancer to forward traffic from port `3025` to port `3025` on an available
Auth Service instance.
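If your Auth Service pool also runs on Kubernetes, a plain `ClusterIP` Service
can act as the private load balancer, since it distributes TCP connections
across the Auth Service pods. This is a sketch with hypothetical names and
labels:

```yaml
# Sketch of a private load balancer for the Auth Service pool, assuming a
# Kubernetes deployment. Proxy Service instances dial this Service on port 3025.
apiVersion: v1
kind: Service
metadata:
  name: teleport-auth
  namespace: teleport
spec:
  type: ClusterIP
  ports:
    - name: grpc
      port: 3025
      targetPort: 3025
      protocol: TCP
  selector:
    app: teleport-auth
```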
## Cluster state backend
The Teleport Auth Service stores cluster state (such as dynamic configuration
@@ -256,23 +279,30 @@ records.
## Teleport instances
Run the Teleport Auth Service and Proxy Service as a scalable group of compute
resources, for example, a Kubernetes `Deployment` or AWS Auto Scaling group.
This requires running the `teleport` binary on each Kubernetes pod or virtual
machine or in your group.
Run the Teleport Auth Service and Proxy Service as two scalable groups of
compute resources, for example, using Kubernetes Deployments or AWS Auto
Scaling groups. This requires running the `teleport` binary on each Kubernetes
pod or virtual machine in your group.
<Notice type="tip">
If you plan to run Teleport on Kubernetes, the `teleport-cluster` Helm chart
deploys the Auth Service and Proxy Service pools for you. To see how to use this
Helm chart, read our [Helm Deployments](helm-deployments.mdx) documentation.
</Notice>
You should deploy your Teleport instances across multiple zones (if using a
cloud provider) or data centers (if using an on-premise solution) to ensure
availability.
In the [Configuration](#configuration) section, we will show you how to
configure each binary for high availability.
### Proxy Service pool
### Open ports
#### Open ports
Ensure that, on each Teleport instance, the following ports allow traffic from
the load balancer. The Proxy Service uses these ports to communicate with
Teleport users and services.
Ensure that, on each Proxy Service instance, the following ports allow traffic
from the Proxy Service load balancer. The Proxy Service uses these ports to
communicate with Teleport users and services.
As with your load balancer configuration, the ports you should open on your
Teleport instances depend on whether you will enable TLS Routing:
@@ -282,7 +312,7 @@ Teleport instances depend on whether you will enable TLS Routing:
| Port | Description |
| - | - |
| `443` | ALPN port for TLS Routing. |
| `3080` | ALPN port for TLS Routing. |
</TabItem>
<TabItem label="Separate Ports">
@@ -293,7 +323,7 @@ These ports are required:
| - | - |
| `3023` | SSH port for clients to connect to. |
| `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
| `3080` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
You can leave these ports closed if you are not using their corresponding
services:
@@ -309,51 +339,19 @@ services:
*This is the same table of ports you used to configure the load balancer.*
### License file
#### Configuration
If you are deploying Teleport Enterprise, you need to download a license file
and make it available to your Teleport Auth Service instances.
Create a configuration file and provide it to each of your Proxy Service
instances at `/etc/teleport.yaml`. We will explain the required configuration
fields for a high-availability Teleport deployment below. These are the minimum
requirements, and when planning your high-availability deployment, you will want
to follow a more specific [deployment guide](introduction.mdx) for your
environment.
To obtain your license file, visit the [Teleport customer
dashboard](https://dashboard.gravitational.com/web/login) and log in. Click
"DOWNLOAD LICENSE KEY". You will see your current Teleport Enterprise account
permissions and the option to download your license file:
#### `proxy_service` and `auth_service`
![License File modal](../../img/enterprise/license.png)
The license file must be available to each Teleport Auth Service instance at
`/var/lib/teleport/license.pem`.
### Configuration
Create a configuration file and provide it to each of your Teleport instances at
`/etc/teleport.yaml`. We will explain the required configuration fields for a
high-availability Teleport deployment below. These are the minimum requirements,
and when planning your high-availability deployment, you will want to follow a
more specific [deployment guide](introduction.mdx) for your environment.
#### `storage`
The first configuration section to write is the `storage` section, which
configures the cluster state backend and session recording backend for the
Teleport Auth Service:
```yaml
version: v3
teleport:
storage:
# ...
```
Consult our [Backends Reference](../reference/backends.mdx) for the configuration
fields you should set in the `storage` section.
#### `auth_service` and `proxy_service`
The `auth_service` and `proxy_service` sections configure the Auth Service and
Proxy Service, which we will run together on each Teleport instance. The
configuration will depend on whether you are enabling TLS Routing in your
cluster:
The `proxy_service` section configures the Proxy Service. The configuration will
depend on whether you are enabling TLS Routing in your cluster:
<Tabs>
<TabItem label="TLS Routing">
@@ -363,14 +361,8 @@ Teleport configuration:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/license/license.pem"
enabled: false
proxy_service:
enabled: true
public_addr: "mycluster.example.com:443"
@@ -390,15 +382,9 @@ Teleport configuration:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
proxy_listener_mode: separate
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/license/license.pem"
enabled: false
proxy_service:
enabled: true
listen_addr: 0.0.0.0:3023
@@ -416,21 +402,17 @@ reverse tunnel port (`tunnel_listen_addr`) for the Proxy Service.
</TabItem>
</Tabs>
The `auth_service` and `proxy_service` configurations above have the following
required settings for a high-availability Teleport deployment:
In the `proxy_service` section, we have enabled the Teleport Proxy Service
(`enabled`) and instructed it to find its TLS credentials in the
`/etc/teleport-tls` directory (`https_keypairs`).
- In the `auth_service` section, we have enabled the Teleport Auth Service
(`enabled`) and instructed it to find an Enterprise license file at
`/var/lib/license/license.pem` (`license_file`). Remove the `license_file`
field if you are deploying the open source edition of Teleport.
- In the `proxy_service` section, we have enabled the Teleport Proxy Service
(`enabled`) and instructed it to find its TLS credentials in the
`/etc/teleport-tls` directory (`https_keypairs`).
We have set `auth_service.enabled` to `false` to disable the Auth Service, which
is enabled by default, on each Proxy Service instance.
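Each Proxy Service instance must also be able to reach the Auth Service. One
way to do this with config version `v3` is to point `teleport.auth_server` at
the Auth Service load balancer. The address below is a hypothetical example,
and depending on your environment you may also need to supply a join token
(for example, via `teleport.join_params`):

```yaml
version: v3
teleport:
  # Hypothetical address of the private Auth Service load balancer, which
  # forwards traffic to port 3025 on an available Auth Service instance.
  auth_server: "teleport-auth.example.internal:3025"
proxy_service:
  enabled: true
  # ...
```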
#### `ssh_service`
You can disable the SSH Service on each Teleport instance by adding the
following to each instance's configuration file:
The SSH Service is enabled by default. You can disable the SSH Service on each
Teleport instance by adding the following to each instance's configuration file:
```yaml
version: v3
@@ -451,6 +433,96 @@ should not have direct access to the underlying node.
If you are deploying Teleport on a cluster of virtual machines, remove this line
to run the SSH Service and enable secure access to the host.
### Auth Service pool
#### Open ports
Ensure that, on each Auth Service instance, the following ports are open:
| Port | Description |
| - | - |
| <nobr>`3025`</nobr> | gRPC port to open to Proxy Service instances. |
#### License file
If you are deploying Teleport Enterprise, you need to download a license file
and make it available to your Teleport Auth Service instances.
(!docs/pages/includes/enterprise/obtainlicense.mdx!)
The license file must be available to each Teleport Auth Service instance at
`/var/lib/teleport/license.pem`.
#### Configuration
Create a configuration file and provide it to each of your Auth Service
instances at `/etc/teleport.yaml`. We will explain the required configuration
fields for a high-availability Teleport deployment below. These are the minimum
requirements, and when planning your high-availability deployment, you will want
to follow a more specific [deployment guide](introduction.mdx) for your
environment.
#### `storage`
The first configuration section to write is the `storage` section, which
configures the cluster state backend and session recording backend for the Auth
Service:
```yaml
version: v3
teleport:
storage:
# ...
```
Consult our [Backends Reference](../reference/backends.mdx) for the configuration
fields you should set in the `storage` section.
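For example, a `storage` section for a deployment that keeps cluster state and
audit events in DynamoDB and session recordings in Amazon S3 might look like
the sketch below. The table, bucket, and region names are hypothetical, and the
fields you need depend on the backend you choose:

```yaml
version: v3
teleport:
  storage:
    # Hypothetical DynamoDB/S3 backend; consult the Backends Reference for the
    # fields that apply to your chosen backend.
    type: dynamodb
    region: us-east-1
    table_name: teleport-cluster-state
    audit_events_uri: "dynamodb://teleport-audit-events"
    audit_sessions_uri: "s3://example-teleport-sessions/records"
```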
#### `auth_service` and `proxy_service`
The `auth_service` section configures the Auth Service:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/teleport/license.pem"
proxy_service:
enabled: false
```
In the `auth_service` section, we have enabled the Teleport Auth Service
(`enabled`) and instructed it to find an Enterprise license file at
`/var/lib/teleport/license.pem` (`license_file`). Remove the `license_file` field
if you are deploying the open source edition of Teleport.
Since we are running Proxy Service instances in a dedicated pool, we have
disabled the Proxy Service on our Auth Service instances by setting
`proxy_service.enabled` to `false`.
#### `ssh_service`
As with the Proxy Service pool, you can disable the SSH Service on each Teleport
instance by adding the following to each instance's configuration file:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
# ...
proxy_service:
# ...
ssh_service:
enabled: false
```
## Next steps
### Refine your plan