Refresh the HA guide (#24479)

* Refresh the HA guide

Closes #22742

Update the HA guide to reflect v12 changes in the `teleport-cluster`
Helm chart, e.g., running the Auth Service and Proxy Service in separate
compute pools.

* Respond to zmb3 feedback

* Respond to hugoShaka feedback

Remove mentions of running the Kubernetes Service on the Auth Service
hosts

* Respond to alexfornuto feedback

* Update docs/pages/deploy-a-cluster/high-availability.mdx

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>

* Update docs/pages/deploy-a-cluster/high-availability.mdx

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>

---------

Co-authored-by: STeve (Xin) Huang <xin.huang@goteleport.com>
Paul Gottschling 2023-05-04 17:28:27 -04:00 committed by GitHub
parent 35b837de87
commit cca1320193
2 changed files with 179 additions and 107 deletions

Binary image file not shown. Before: 134 KiB | After: 165 KiB.

docs/pages/deploy-a-cluster/high-availability.mdx

@@ -1,6 +1,7 @@
---
title: "Deploying a High Availability Teleport Cluster"
description: "Deploying a High Availability Teleport Cluster"
tocDepth: 3
---
When deploying Teleport in production, you should design your deployment to
@@ -16,14 +17,17 @@ deployment.
## Overview
A high-availability Teleport cluster revolves around a group of redundant
`teleport` processes, each of which runs the Auth Service and Proxy Service,
plus the infrastructure required to support them.
A high-availability Teleport cluster revolves around two pools of redundant
`teleport` processes, one running the Auth Service and one running the Proxy
Service, plus the infrastructure required to support each pool.
This includes:
- **A Layer 4 load balancer** to direct traffic from users and services to an
available `teleport` process.
Infrastructure components include:
- A **public Layer 4 load balancer** to direct traffic from users and services
to an available Proxy Service instance.
- A **private Layer 4 load balancer** to direct traffic from the Proxy Service
to the Auth Service's gRPC API, which is how Teleport manages the Auth
Service's backend state and provides credentials to users and services in your
cluster.
- A **cluster state backend**. This is a key-value store for cluster state and
audit events that all Auth Service instances can access. This requires
permissions for Auth Service instances to manage records within the key-value
@@ -44,21 +48,32 @@ This includes:
![Diagram of a high-availability Teleport
architecture](../../img/deploy-a-cluster/teleport-ha-architecture.png)
## Layer 4 load balancer
## Layer 4 load balancers
The load balancer forwards traffic from users and services to an available
Teleport instance. This must not terminate TLS, and must transparently forward
the TCP traffic it receives. In other words, this must be a Layer 4 load
balancer, not a Layer 7 (e.g., HTTP) load balancer.
High-availability Teleport clusters require two load balancers:
- **Proxy Service load balancer:** A load balancer to receive traffic from
outside the network where your Teleport cluster is running and forward it to
an available Proxy Service instance. This load balancer handles TCP traffic
from users and services in a variety of application-layer protocols.
- **Auth Service load balancer:** A load balancer to forward traffic from a
Proxy Service instance to an available Auth Service instance. This handles TLS
traffic to the Auth Service's gRPC endpoint.
We recommend configuring your load balancer to route traffic across multiple
Both load balancers must transparently forward the TCP traffic they receive,
without terminating TLS. In other words, these must be Layer 4 load balancers,
not Layer 7 (e.g., HTTP).
We recommend configuring your load balancers to route traffic across multiple
zones (if using a cloud provider) or data centers (if using an on-premise
solution) to ensure availability.
### TLS Routing
### Configuring the Proxy Service load balancer
Your load balancer configuration depends on whether you will enable [TLS
Routing](../management/operations/tls-routing.mdx) in your Teleport cluster.
#### TLS Routing
The way you configure the Proxy Service load balancer depends on whether you
will enable [TLS Routing](../management/operations/tls-routing.mdx) in your
Teleport cluster.
With TLS Routing, the Teleport Proxy Service uses application-layer protocol
negotiation (ALPN) to handle all communication with users and services via the
@@ -76,29 +91,30 @@ The approach we describe in this guide uses only a Layer 4 load balancer to
minimize the infrastructure you will deploy, but users that require a separate
load balancer for HTTPS traffic should disable TLS Routing.
### Configuring the load balancer
#### Open ports
Configure the load balancer to forward traffic from the following ports on the
load balancer to the corresponding port on an available Teleport instance. The
configuration depends on whether you will enable TLS Routing:
Configure the Proxy Service load balancer to forward traffic from the following
ports on the load balancer to the corresponding port on an available Proxy
Service instance. The configuration depends on whether you will enable TLS
Routing:
<Tabs>
<TabItem label="TLS Routing">
| Port | Description |
| - | - |
| `443` | ALPN port for TLS Routing. |
| Load Balancer Port | Proxy Service Port | Description |
| - | - | - |
| `443` | `3080` | ALPN port for TLS Routing. |
</TabItem>
<TabItem label="Separate Ports">
These ports are required:
| Port | Description |
| - | - |
| `3023` | SSH port for clients to connect to. |
| `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
| Load Balancer Port | Proxy Service Port | Description |
| - | - | - |
| <nobr>`3023`</nobr> | <nobr>`3023`</nobr> | SSH port for clients to connect to. |
| `3024` | `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | `3080` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
You can leave these ports closed if you are not using their corresponding
services:
@@ -112,6 +128,13 @@ services:
</TabItem>
</Tabs>
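If you run your Proxy Service pool on Kubernetes, one way to provide this Layer
4 forwarding is a `Service` of type `LoadBalancer`. The following is a minimal
sketch that assumes TLS Routing, a cloud provider that provisions a TCP (Layer
4) load balancer for `LoadBalancer` Services, and hypothetical names and
labels:

```yaml
# Sketch of a public Layer 4 load balancer for the Proxy Service pool,
# assuming TLS Routing and a Kubernetes deployment. Names and labels are
# hypothetical; adjust the selector to match your Proxy Service pods.
apiVersion: v1
kind: Service
metadata:
  name: teleport-proxy
  namespace: teleport
spec:
  type: LoadBalancer
  ports:
    # Forward TCP traffic without terminating TLS.
    - name: tls-routing
      port: 443
      targetPort: 3080
      protocol: TCP
  selector:
    app: teleport-proxy
```

If you disable TLS Routing, add an entry under `ports` for each of the separate
ports in the table above instead.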
### Configuring the Auth Service load balancer
The Auth Service load balancer must forward traffic to the Auth Service's gRPC
port. In this guide, we are assuming that you have configured the Auth Service
load balancer to forward traffic from port `3025` to port `3025` on an available
Auth Service instance.
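If your Auth Service pool also runs on Kubernetes, a plain `ClusterIP` Service
can act as the private load balancer, since it distributes TCP connections
across the Auth Service pods. This is a sketch with hypothetical names and
labels:

```yaml
# Sketch of a private load balancer for the Auth Service pool, assuming a
# Kubernetes deployment. Proxy Service instances dial this Service on port 3025.
apiVersion: v1
kind: Service
metadata:
  name: teleport-auth
  namespace: teleport
spec:
  type: ClusterIP
  ports:
    - name: grpc
      port: 3025
      targetPort: 3025
      protocol: TCP
  selector:
    app: teleport-auth
```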
## Cluster state backend
The Teleport Auth Service stores cluster state (such as dynamic configuration
@@ -256,23 +279,30 @@ records.
## Teleport instances
Run the Teleport Auth Service and Proxy Service as a scalable group of compute
resources, for example, a Kubernetes `Deployment` or AWS Auto Scaling group.
This requires running the `teleport` binary on each Kubernetes pod or virtual
machine or in your group.
Run the Teleport Auth Service and Proxy Service as two scalable groups of
compute resources, for example, using Kubernetes Deployments or AWS Auto
Scaling groups. This requires running the `teleport` binary on each Kubernetes
pod or virtual machine in your group.
<Notice type="tip">
If you plan to run Teleport on Kubernetes, the `teleport-cluster` Helm chart
deploys the Auth Service and Proxy Service pools for you. To see how to use this
Helm chart, read our [Helm Deployments](helm-deployments.mdx) documentation.
</Notice>
You should deploy your Teleport instances across multiple zones (if using a
cloud provider) or data centers (if using an on-premise solution) to ensure
availability.
In the [Configuration](#configuration) section, we will show you how to
configure each binary for high availability.
### Proxy Service pool
### Open ports
#### Open ports
Ensure that, on each Teleport instance, the following ports allow traffic from
the load balancer. The Proxy Service uses these ports to communicate with
Teleport users and services.
Ensure that, on each Proxy Service instance, the following ports allow traffic
from the Proxy Service load balancer. The Proxy Service uses these ports to
communicate with Teleport users and services.
As with your load balancer configuration, the ports you should open on your
Teleport instances depend on whether you will enable TLS Routing:
@@ -282,7 +312,7 @@ Teleport instances depend on whether you will enable TLS Routing:
| Port | Description |
| - | - |
| `443` | ALPN port for TLS Routing. |
| `3080` | ALPN port for TLS Routing. |
</TabItem>
<TabItem label="Separate Ports">
@@ -293,7 +323,7 @@ These ports are required:
| - | - |
| `3023` | SSH port for clients to connect to. |
| `3024` | SSH port used to create reverse SSH tunnels from behind-firewall environments. |
| `443` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
| `3080` | HTTPS connections to authenticate `tsh` users into the cluster. The same connection is used to serve a Web UI. |
You can leave these ports closed if you are not using their corresponding
services:
@@ -309,51 +339,19 @@ services:
*This is the same table of ports you used to configure the load balancer.*
### License file
#### Configuration
If you are deploying Teleport Enterprise, you need to download a license file
and make it available to your Teleport Auth Service instances.
Create a configuration file and provide it to each of your Proxy Service
instances at `/etc/teleport.yaml`. We will explain the required configuration
fields for a high-availability Teleport deployment below. These are the minimum
requirements, and when planning your high-availability deployment, you will want
to follow a more specific [deployment guide](introduction.mdx) for your
environment.
To obtain your license file, visit the [Teleport customer
dashboard](https://dashboard.gravitational.com/web/login) and log in. Click
"DOWNLOAD LICENSE KEY". You will see your current Teleport Enterprise account
permissions and the option to download your license file:
#### `proxy_service` and `auth_service`
![License File modal](../../img/enterprise/license.png)
The license file must be available to each Teleport Auth Service instance at
`/var/lib/teleport/license.pem`.
### Configuration
Create a configuration file and provide it to each of your Teleport instances at
`/etc/teleport.yaml`. We will explain the required configuration fields for a
high-availability Teleport deployment below. These are the minimum requirements,
and when planning your high-availability deployment, you will want to follow a
more specific [deployment guide](introduction.mdx) for your environment.
#### `storage`
The first configuration section to write is the `storage` section, which
configures the cluster state backend and session recording backend for the
Teleport Auth Service:
```yaml
version: v3
teleport:
storage:
# ...
```
Consult our [Backends Reference](../reference/backends.mdx) for the configuration
fields you should set in the `storage` section.
#### `auth_service` and `proxy_service`
The `auth_service` and `proxy_service` sections configure the Auth Service and
Proxy Service, which we will run together on each Teleport instance. The
configuration will depend on whether you are enabling TLS Routing in your
cluster:
The `proxy_service` section configures the Proxy Service. The configuration will
depend on whether you are enabling TLS Routing in your cluster:
<Tabs>
<TabItem label="TLS Routing">
@@ -363,14 +361,8 @@ Teleport configuration:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/license/license.pem"
enabled: false
proxy_service:
enabled: true
public_addr: "mycluster.example.com:443"
@@ -390,15 +382,9 @@ Teleport configuration:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
proxy_listener_mode: separate
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/license/license.pem"
enabled: false
proxy_service:
enabled: true
listen_addr: 0.0.0.0:3023
@@ -416,21 +402,17 @@ reverse tunnel port (`tunnel_listen_addr`) for the Proxy Service.
</TabItem>
</Tabs>
The `auth_service` and `proxy_service` configurations above have the following
required settings for a high-availability Teleport deployment:
In the `proxy_service` section, we have enabled the Teleport Proxy Service
(`enabled`) and instructed it to find its TLS credentials in the
`/etc/teleport-tls` directory (`https_keypairs`).
- In the `auth_service` section, we have enabled the Teleport Auth Service
(`enabled`) and instructed it to find an Enterprise license file at
`/var/lib/license/license.pem` (`license_file`). Remove the `license_file`
field if you are deploying the open source edition of Teleport.
- In the `proxy_service` section, we have enabled the Teleport Proxy Service
(`enabled`) and instructed it to find its TLS credentials in the
`/etc/teleport-tls` directory (`https_keypairs`).
We have set `auth_service.enabled` to `false` to disable the Auth Service, which
is enabled by default, on each Proxy Service instance.
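Each Proxy Service instance must also be able to reach the Auth Service. One
way to do this with config version `v3` is to point `teleport.auth_server` at
the Auth Service load balancer. The address below is a hypothetical example,
and depending on your environment you may also need to supply a join token
(for example, via `teleport.join_params`):

```yaml
version: v3
teleport:
  # Hypothetical address of the private Auth Service load balancer, which
  # forwards traffic to port 3025 on an available Auth Service instance.
  auth_server: "teleport-auth.example.internal:3025"
proxy_service:
  enabled: true
  # ...
```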
#### `ssh_service`
You can disable the SSH Service on each Teleport instance by adding the
following to each instance's configuration file:
The SSH Service is enabled by default. You can disable the SSH Service on each
Teleport instance by adding the following to each instance's configuration file:
```yaml
version: v3
@@ -451,6 +433,96 @@ should not have direct access to the underlying node.
If you are deploying Teleport on a cluster of virtual machines, remove this line
to run the SSH Service and enable secure access to the host.
### Auth Service pool
#### Open ports
Ensure that, on each Auth Service instance, the following ports are open:
| Port | Description |
| - | - |
| <nobr>`3025`</nobr> | gRPC port to open to Proxy Service instances. |
#### License file
If you are deploying Teleport Enterprise, you need to download a license file
and make it available to your Teleport Auth Service instances.
(!docs/pages/includes/enterprise/obtainlicense.mdx!)
The license file must be available to each Teleport Auth Service instance at
`/var/lib/teleport/license.pem`.
#### Configuration
Create a configuration file and provide it to each of your Auth Service
instances at `/etc/teleport.yaml`. We will explain the required configuration
fields for a high-availability Teleport deployment below. These are the minimum
requirements, and when planning your high-availability deployment, you will want
to follow a more specific [deployment guide](introduction.mdx) for your
environment.
#### `storage`
The first configuration section to write is the `storage` section, which
configures the cluster state backend and session recording backend for the Auth
Service:
```yaml
version: v3
teleport:
storage:
# ...
```
Consult our [Backends Reference](../reference/backends.mdx) for the configuration
fields you should set in the `storage` section.
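For example, a `storage` section for a deployment that keeps cluster state and
audit events in DynamoDB and session recordings in Amazon S3 might look like
the sketch below. The table, bucket, and region names are hypothetical, and the
fields you need depend on the backend you choose:

```yaml
version: v3
teleport:
  storage:
    # Hypothetical DynamoDB/S3 backend; consult the Backends Reference for the
    # fields that apply to your chosen backend.
    type: dynamodb
    region: us-east-1
    table_name: teleport-cluster-state
    audit_events_uri: "dynamodb://teleport-audit-events"
    audit_sessions_uri: "s3://example-teleport-sessions/records"
```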
#### `auth_service` and `proxy_service`
The `auth_service` section configures the Auth Service:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
enabled: true
cluster_name: "mycluster.example.com"
# Remove this if not using Teleport Enterprise
license_file: "/var/lib/teleport/license.pem"
proxy_service:
enabled: false
```
In the `auth_service` section, we have enabled the Teleport Auth Service
(`enabled`) and instructed it to find an Enterprise license file at
`/var/lib/teleport/license.pem` (`license_file`). Remove the `license_file` field
if you are deploying the open source edition of Teleport.
Since we are running Proxy Service instances in a dedicated pool, we have
disabled the Proxy Service on our Auth Service instances by setting
`proxy_service.enabled` to `false`.
#### `ssh_service`
As with the Proxy Service pool, you can disable the SSH Service on each Teleport
instance by adding the following to each instance's configuration file:
```yaml
version: v3
teleport:
storage:
# ...
auth_service:
# ...
proxy_service:
# ...
ssh_service:
enabled: false
```
## Next steps
### Refine your plan