How to deploy easegress as a cluster (#369)

* fix small typos * draft doc for cluster deployment * fix typos * fix cookbook example * add new node and fix example * improve doc * more about reader nodes * modify cookbook chapter to use new configuration syntax * update cookbook links and update multi-node cluster doc * improve cookbook multi-node-cluster chapter * add diagrams and improve doc * fix doc according to comments
easegress-io · Nov 30, 2021 · d266e79
1 parent 23cb833

Showing 8 changed files with 237 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -115,6 +115,7 @@ The following examples show how to use Easegress for different scenarios.
 - [WebAssembly](./doc/cookbook/wasm.md) - Using AssemblyScript to extend the Easegress
 - [WebSocket](./doc/cookbook/websocket.md) - WebSocket proxy for Easegress
 - [Workflow](./doc/cookbook/workflow.md) - An Example to make a workflow for a number of APIs.
+- [Cluster deployment](./doc/cookbook/multi_node_cluster.md) - How to deploy multiple Easegress cluster nodes.
 
 For full list, see [Cookbook](./doc/cookbook/README.md).
 

diff --git a/README.zh-CN.md b/README.zh-CN.md
@@ -115,6 +115,7 @@
 - [WebAssembly](./doc/cookbook/wasm.md) - 使用 AssemblyScript 来扩展 Easegress。
 - [WebSocket](./doc/cookbook/websocket.md) - Easegress 的 WebSocket 代理。
 - [工作流](./doc/cookbook/workflow.md) - 将若干 API 进行组合，定制为工作流。
+- [Easegress 集群化部署](./doc/cookbook/multi_node_cluster.md) - Easegress 如何进行集群化多点部署。
 
 完整的列表请参见 [Cookbook](./doc/cookbook/README.md)。
 

diff --git a/doc/cookbook/README.md b/doc/cookbook/README.md
@@ -17,3 +17,4 @@ The following examples show how to use Easegress for different scenarios.
 - [WebAssembly](./wasm.md) - Using AssemblyScript to extend the Easegress
 - [WebSocket](./websocket.md) - WebSocket proxy for Easegress
 - [Workflow](./workflow.md) - An Example to make a workflow for a number of APIs.
+- [Cluster deployment](./multi_node_cluster.md) - How to deploy multiple Easegress cluster nodes.
diff --git a/doc/cookbook/easegress-cluster-connections.png b/doc/cookbook/easegress-cluster-connections.png
diff --git a/doc/cookbook/easegress-cluster-nodes.png b/doc/cookbook/easegress-cluster-nodes.png
diff --git a/doc/cookbook/k8s_ingress_controller.md b/doc/cookbook/k8s_ingress_controller.md
@@ -82,11 +82,11 @@ spec:
     app: products
     department: sales
   ports:
-  - name: port_v1
+  - name: port-v1
     protocol: TCP
     port: 60001
     targetPort: 50001
-  - name: port_v2
+  - name: port-v2
     protocol: TCP
     port: 60002
     targetPort: 50002

diff --git a/doc/cookbook/multi_node_cluster.md b/doc/cookbook/multi_node_cluster.md
@@ -0,0 +1,231 @@
+
+# Easegress cluster
+
+- [Easegress cluster](#easegress-cluster)
+  - [Background](#background)
+  - [Prerequisite](#prerequisite)
+  - [Deploy an Easegress cluster step by step](#deploy-an-easegress-cluster-step-by-step)
+    - [Add new member](#add-new-member)
+  - [YAML Configuration (optional)](#yaml-configuration-optional)
+  - [Configuration tips (optional)](#configuration-tips-optional)
+  - [References](#references)
+
+## Background
+
+When should you deploy Easegress as a cluster?
+- Your traffic is larger than one machine can handle
+- You want high availability and minimal service downtime
+- You want to keep latency low during traffic peaks
+
+It is easy to form an Easegress cluster by starting multiple Easegress instances. This tutorial shows how to create a stand-alone Easegress cluster this way.
+
+## Prerequisite
+
+The following prerequisites are required to deploy an Easegress cluster:
+- the latest `easegress-server` and `egctl` binaries (run `make` in the root of the repository)
+- having successfully created an Easegress pipeline (for example, the Hello World example in the repository's README.md or any other chapter in doc/cookbook)
+- a few machines in the same network (or otherwise reachable from each other), or Docker or another container technology. If you only have one machine, you can use localhost as the host and change the ports in the example so that they do not clash.
+
+## Deploy an Easegress cluster step by step
+The goal of this tutorial is to run Easegress on the following infrastructure:
+
+<p align="center">
+  <img src="./easegress-cluster-nodes.png" width=400>
+</p>
+
+- 4 connected machines
+- each machine runs one Easegress instance
+- 3 Easegress instances have the cluster role *primary* and one has the role *secondary*
+
+The difference between *primary* and *secondary* cluster roles is that *primary* persists the cluster state to disk, while *secondary* does not. The number of *secondary* Easegress instances can scale up and down, but the number of *primary* instances should be fixed.
+
+Let's start by creating three Easegress instances with the *primary* role. Store the nodes' private IPs in the following environment variables:
+
+```bash
+export HOST1=<host1-IP>
+export HOST2=<host2-IP>
+export HOST3=<host3-IP>
+export CLUSTER=machine-1=https://$HOST1:2380,machine-2=https://$HOST2:2380,machine-3=https://$HOST3:2380
+```
+The `CLUSTER` environment variable now contains the name and peer URL of each *primary* member of the cluster. It is the same on all members.
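+
+If you have more hosts, or want to avoid typos in `CLUSTER`, you could build the value from name/IP pairs instead of typing it. The following is a minimal bash sketch. The function `build_initial_cluster` is hypothetical and not part of Easegress. It assumes the `https` scheme and peer port `2380` used throughout this tutorial.
+
+```bash
+# Hypothetical helper (not part of Easegress): print the comma-separated
+# name=peer-URL list passed to --initial-cluster.
+# Arguments: alternating member names and host IPs.
+build_initial_cluster() {
+  local result="" name host
+  while [ "$#" -ge 2 ]; do
+    name=$1
+    host=$2
+    shift 2
+    # Add a comma only between entries, not before the first one.
+    result+="${result:+,}${name}=https://${host}:2380"
+  done
+  printf '%s\n' "$result"
+}
+```
+
+For example, `export CLUSTER=$(build_initial_cluster machine-1 "$HOST1" machine-2 "$HOST2" machine-3 "$HOST3")` produces the same value as the `export CLUSTER=...` line above.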
+
+Set these environment variables on each machine. Then start the first instance on the first machine:
+```bash
+easegress-server \
+  --cluster-name "multi-node-cluster" \
+  --cluster-role "primary" \
+  --name "machine-1" \
+  --api-addr $HOST1:2381 \
+  --initial-advertise-peer-urls https://$HOST1:2380 \
+  --listen-peer-urls https://$HOST1:2380 \
+  --listen-client-urls https://$HOST1:2379 \
+  --advertise-client-urls https://$HOST1:2379 \
+  --initial-cluster $CLUSTER
+```
+
+Here we define basic information, such as the name of the instance and the name of the cluster. The arguments `initial-advertise-peer-urls`, `listen-peer-urls`, `listen-client-urls` and `advertise-client-urls` are used for communication with other peers (the other *primary* cluster members). You can read more about them at the end of this tutorial. For now, it is enough to notice that the host for *machine-1* is `$HOST1`, the IP address of this machine. The `api-addr` argument sets the address that `egctl` connects to in the steps below.
+
+Then start the second instance on machine 2:
+```bash
+easegress-server \
+  --cluster-name "multi-node-cluster" \
+  --cluster-role "primary" \
+  --name "machine-2" \
+  --initial-advertise-peer-urls https://$HOST2:2380 \
+  --listen-peer-urls https://$HOST2:2380 \
+  --listen-client-urls https://$HOST2:2379 \
+  --advertise-client-urls https://$HOST2:2379 \
+  --initial-cluster $CLUSTER
+```
+and finally the third instance on machine 3:
+```bash
+easegress-server \
+  --cluster-name "multi-node-cluster" \
+  --cluster-role "primary" \
+  --name "machine-3" \
+  --initial-advertise-peer-urls https://$HOST3:2380 \
+  --listen-peer-urls https://$HOST3:2380 \
+  --listen-client-urls https://$HOST3:2379 \
+  --advertise-client-urls https://$HOST3:2379 \
+  --initial-cluster $CLUSTER
+```
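+
+The three commands differ only in the member name and the host. To avoid copying them by hand, you could wrap them in a small bash function. This is a sketch and not part of Easegress. The names `start_primary`, `DRY_RUN` and `API_ADDR` are hypothetical, and the function expects `CLUSTER` to be set as shown earlier.
+
+```bash
+# Hypothetical helper (not part of Easegress): start a primary member with
+# the same flags as the commands above. Expects CLUSTER to be set.
+# Arguments: member name, IP address of this machine.
+start_primary() {
+  local name=$1 host=$2 cmd
+  cmd=(
+    easegress-server
+    --cluster-name multi-node-cluster
+    --cluster-role primary
+    --name "$name"
+    --initial-advertise-peer-urls "https://$host:2380"
+    --listen-peer-urls "https://$host:2380"
+    --listen-client-urls "https://$host:2379"
+    --advertise-client-urls "https://$host:2379"
+    --initial-cluster "$CLUSTER"
+  )
+  # Optionally expose the API that egctl talks to (machine-1 uses $HOST1:2381).
+  if [ -n "${API_ADDR:-}" ]; then
+    cmd+=(--api-addr "$API_ADDR")
+  fi
+  # With DRY_RUN=1, print the command instead of running it.
+  if [ "${DRY_RUN:-0}" = 1 ]; then
+    printf '%s\n' "${cmd[*]}"
+  else
+    "${cmd[@]}"
+  fi
+}
+```
+
+For example, run `API_ADDR=$HOST1:2381 start_primary machine-1 "$HOST1"` on machine 1. Prefix the call with `DRY_RUN=1` to print the command without running it.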
+
+Now you can list the cluster members:
+```bash
+egctl --server $HOST1:2381 member list | grep " name"
+```
+which should print:
+```
+    name: machine-1
+    name: machine-2
+    name: machine-3
+```
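+
+Members may take a few seconds to join. If you script the setup, a polling loop like the following bash sketch can wait until the expected number of members is listed. `count_members` and `wait_for_members` are hypothetical helpers. They rely only on the `egctl ... member list` command and the `" name"` pattern used above, and they assume the output format shown.
+
+```bash
+# Hypothetical helpers (not part of Easegress).
+# count_members: count member names in `egctl member list` output read
+# from stdin, using the same " name" pattern as the grep above.
+count_members() {
+  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
+  grep -c " name" || true
+}
+
+# wait_for_members SERVER WANT [TRIES]: poll every 2 seconds until at least
+# WANT members are listed, giving up after TRIES attempts (default 30).
+wait_for_members() {
+  local server=$1 want=$2 tries=${3:-30} have
+  while [ "$tries" -gt 0 ]; do
+    have=$(egctl --server "$server" member list 2>/dev/null | count_members)
+    if [ "$have" -ge "$want" ]; then
+      echo "cluster has $have members"
+      return 0
+    fi
+    tries=$((tries - 1))
+    sleep 2
+  done
+  echo "timed out waiting for $want members" >&2
+  return 1
+}
+```
+
+For example, `wait_for_members "$HOST1:2381" 3` waits for the three *primary* members.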
+
+### Add new member
+
+Let's add one more node with a *secondary* cluster role this time.
+
+```bash
+# on machine 4
+easegress-server \
+  --cluster-name "multi-node-cluster" \
+  --cluster-role "secondary" \
+  --name "machine-4" \
+  --primary-listen-peer-urls https://$HOST1:2380 \
+  --state-flag "existing"
+```
+Here `primary-listen-peer-urls` tells the new member where to find a *primary* cluster member. The `state-flag` argument with the value `existing` means that the cluster has already been created.
+
+The fourth instance now appears in the member list as well:
+```bash
+egctl --server $HOST1:2381 member list | grep " name"
+# prints:
+#     name: machine-1
+#     name: machine-2
+#     name: machine-3
+#     name: machine-4
+```
+Congratulations, your Easegress cluster is now running! You can start applying resources to it, such as a [pipeline](./pipeline.md) or a [workflow](./workflow.md).
+
+You can also keep reading to learn more about configuring Easegress cluster instances with YAML, and to see some configuration tips.
+
+## YAML Configuration (optional)
+
+The examples above use the command-line flags of *easegress-server*, but it is often more convenient to define server parameters in a YAML configuration file. For example, store the following YAML on each host machine and change the host addresses accordingly.
+
+```yaml
+# create one yaml file for each host
+name: machine-1 # machine-2, machine-3
+cluster-name: cluster-test
+cluster-role: primary
+api-addr: localhost:2381
+data-dir: ./data
+wal-dir: ""
+cpu-profile-file:
+memory-profile-file:
+log-dir: ./log
+debug: false
+cluster:
+  listen-peer-urls: # change CURRENT-HOST to current host
+   - https://<CURRENT-HOST>:2380
+  listen-client-urls:
+   - https://<CURRENT-HOST>:2379
+  advertise-client-urls:
+   - https://<CURRENT-HOST>:2379
+  initial-advertise-peer-urls:
+   - https://<CURRENT-HOST>:2380
+  initial-cluster: # initial-cluster is the same for every host
+    machine-1: https://<HOST-1>:2380
+    machine-2: https://<HOST-2>:2380
+    machine-3: https://<HOST-3>:2380
+```
+Then start Easegress on each machine with the `config-file` command-line argument:
+`easegress-server --config-file config.yaml`.
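+
+Writing the three primary files by hand is error-prone. The bash sketch below renders one file per host from the `HOST1`..`HOST3` variables used earlier. The function `write_primary_configs` and the output file names `machine-N.yaml` are hypothetical. The sketch writes `initial-cluster` as a name-to-URL mapping, the same shape as the `--initial-cluster name=url,...` flag. Treat this shape as an assumption and check it against your Easegress version.
+
+```bash
+# Hypothetical generator (not part of Easegress): write machine-1.yaml ..
+# machine-3.yaml from HOST1..HOST3. Copy each file to its host afterwards.
+write_primary_configs() {
+  local hosts=("$HOST1" "$HOST2" "$HOST3") i host
+  for i in 1 2 3; do
+    # Array subscripts are arithmetic, so i-1 maps 1..3 to indexes 0..2.
+    host=${hosts[i-1]}
+    cat > "machine-$i.yaml" <<EOF
+name: machine-$i
+cluster-name: cluster-test
+cluster-role: primary
+api-addr: localhost:2381
+data-dir: ./data
+wal-dir: ""
+log-dir: ./log
+debug: false
+cluster:
+  listen-peer-urls:
+    - https://$host:2380
+  listen-client-urls:
+    - https://$host:2379
+  advertise-client-urls:
+    - https://$host:2379
+  initial-advertise-peer-urls:
+    - https://$host:2380
+  initial-cluster:
+    machine-1: https://$HOST1:2380
+    machine-2: https://$HOST2:2380
+    machine-3: https://$HOST3:2380
+EOF
+  done
+}
+```
+
+After running `write_primary_configs`, copy `machine-N.yaml` to host N and start it there with `easegress-server --config-file machine-N.yaml`.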
+
+The configuration file for adding a new *secondary* node looks like the following:
+
+```yaml
+name: machine-4
+cluster-name: cluster-test
+cluster-role: secondary
+data-dir: ./data
+wal-dir: ""
+cpu-profile-file:
+memory-profile-file:
+log-dir: ./log
+debug: false
+cluster:
+  primary-listen-peer-urls:
+    - https://<HOST-1>:2380
+```
+
+## Configuration tips (optional)
+
+*What is a good size for a cluster?*
+
+It is good practice to choose an odd number (1, 3, 5, 7, 9) of *primary* nodes. The cluster needs a majority (more than half) of its *primary* nodes to reach consensus. A cluster of N *primary* nodes therefore tolerates the failure of at most (N-1)/2 of them, rounded down, so a fourth *primary* node adds no fault tolerance over three. An odd number also helps with network partitions. With an even number of *primary* nodes, a partition can split the cluster into two groups of equal size, and then neither group has the majority required for consensus. With an odd number of *primary* nodes, a split into two groups always leaves one group with a majority, and that group can keep the cluster running.
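+
+The majority rule can be checked with a little arithmetic. The bash sketch below assumes the usual Raft majority quorum that etcd relies on, which is floor(N/2) + 1 votes out of N *primary* members. The function names are illustrative only.
+
+```bash
+# Majority-quorum arithmetic (as in Raft/etcd) for N primary members.
+# quorum N: votes needed to commit a change, floor(N/2) + 1.
+quorum() { echo $(( $1 / 2 + 1 )); }
+
+# tolerated_failures N: primaries that can fail while a quorum remains.
+tolerated_failures() { echo $(( $1 - $(quorum "$1") )); }
+
+for n in 1 2 3 4 5 6 7; do
+  echo "primaries=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
+done
+```
+
+The output shows, for example, that 3 and 4 primaries both tolerate only one failure, while 5 primaries tolerate two.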
+
+There are no constraints on the number of *secondary* nodes. Secondary nodes do not take part in the consensus vote of the cluster, so their failure does not affect cluster health. However, adding more *secondary* nodes still increases the communication between nodes.
+
+*Can the number of primary members scale up?*
+
+It is not recommended to add nodes with the `primary` cluster role to a running cluster. All `primary` nodes should be started when the cluster is first created. To scale the cluster up or down, add or remove `secondary` members instead.
+
+*What are `advertise-client-urls`, `listen-client-urls`, `listen-peer-urls` and `initial-advertise-peer-urls`?*
+
+These four arguments are required for *primary* cluster members to communicate with other peers (members).
+
+| argument   |  description  |
+|-----|-----|
+| advertise-client-urls | client URLs that the member advertises to the rest of the cluster |
+| listen-client-urls | URLs on which the member listens for client traffic |
+| listen-peer-urls | URLs on which the member listens for peer (other *primary* members) traffic |
+| initial-advertise-peer-urls | peer URLs that the member advertises to the rest of the cluster (the other *primary* members) |
+
+These arguments are used for [etcd](https://etcd.io) server and client configuration. You can read more about them in the etcd documentation.
+
+*Why does a primary member need more arguments than a secondary member?*
+
+Here's a drawing that illustrates the difference.
+
+<p align="center">
+  <img src="./easegress-cluster-connections.png" width=300>
+</p>
+
+*Primary* members need to synchronize with their peers (the other *primary* members), whereas a *secondary* member reads and updates the state through a single *primary* member.
+
+*How are the etcd server and an Easegress primary member related?*
+
+Easegress uses the [etcd](https://etcd.io) distributed key-value store to synchronize the cluster state. The primary and secondary cluster roles relate to `etcd` as follows:
+
+| Easegress cluster role   | primary   | secondary   |
+|-----|-----|-----|
+| etcd term | server | client |
+
+A *primary* member runs an etcd server for cluster communication, while a *secondary* member only uses an etcd client.
+
+
+## References
+
+1. https://en.wikipedia.org/wiki/High-availability_cluster
+2. https://en.wikipedia.org/wiki/Raft_(algorithm)
+3. https://etcd.io/docs/v3.5/faq/
diff --git a/doc/cookbook/resilience.md b/doc/cookbook/resilience.md
@@ -14,7 +14,7 @@
     - [TimeLimiter](#timelimiter-1)
     - [Concepts](#concepts)
 
-The as a Cloud Native traffic orchestrator, Easegress supports build-in resilience features. It is the ability of your system to react to failure and still remain functional. It's not about avoiding failure, but accepting failure and constructing your cloud-native services to respond to it. You want to return to a fully functioning state quickly as possible.[1]
+As a Cloud Native traffic orchestrator, Easegress supports built-in resilience features. Resilience is the ability of your system to react to failure and still remain functional. It's not about avoiding failure, but about accepting failure and constructing your cloud-native services to respond to it. You want to return to a fully functioning state as quickly as possible.[1]
 
 ## Basic: Load Balance