Crunchy Watch is an application that watches a PostgreSQL master and looks for a failure, at which point it will perform a failover scenario.
Failover scenarios are extensible. Sample failover scenarios are provided including:
-
trigger a failover on a random replica
-
trigger a failover on a replica using metadata labels
-
trigger a failover on a replica that is further ahead than others
Crunchy Watch is packaged into a Docker container which can execute in a pure Docker 1.12, Kubernetes 1.7, and Openshift 3.6 environments.
You can also run Crunchy Watch outside of a container as a binary.
Crunchy provides a commercially supported version of this container built upon RHEL 7 and the Crunchy supported PostgreSQL. Contact Crunchy for more details at link:https://www.crunchydata.com.
Crunchy Watch is designed to operate on multiple platforms. Therefore, it is necessary to specify the platform at startup.
$> crunchy-watch <platform>
Supported Platforms:
Name | Value |
---|---|
Kubernetes |
kube |
Example:
$> crunchy-watch kube
All options can be configured via a command-line flag or an environment variable.
Flag values will take precedence over values defined by an environment variable.
There are general options that apply across all platforms. As well, each platform provides their own specific options. The details for each are provided below.
Option | Environment Variable | Default | Description |
---|---|---|---|
--primary |
CRUNCHY_WATCH_PRIMARY |
host of the primary PostgreSQL instance |
|
--primary-port |
CRUNCHY_WATCH_PRIMARY_PORT |
5432 |
port of the primary PostreSQL instance |
--replica |
CRUNCHY_WATCH_REPLICA |
host of the replica PostgreSQL instance |
|
--replica-port |
CRUNCHY_WATCH_REPLICA_PORT |
5432 |
port of the replica PostgreSQL instance |
--username |
CRUNCHY_WATCH_USERNAME |
postgres |
login user to connect to PostgreSQL |
--password |
CRUNCHY_WATCH_PASSWORD |
login user’s password to connect to PostgreSQL |
|
--database |
CRUNCHY_WATCH_DATABASE |
postgres |
database to connect |
--timeout |
CRUNCHY_WATCH_TIMEOUT |
10s |
connection timeout - valid time units are "ns", "us", "ms", "s", "m", "h" |
--max-failures |
CRUNCHY_WATCH_MAX_FAILURES |
0 |
maximum number of failures before performing a failover |
--healthcheck-interval |
CRUNCHY_WATCH_HEALTHCHECK_INTERVAL |
10s |
interval between healthchecks - valid time units are "ns", "us", "ms", "s", "m", "h" |
--failover-wait |
CRUNCHY_WATCH_FAILOVER_WAIT |
50s |
time to wait for failover to process - valid time units are "ns", "us", "ms", "s", "m", "h" |
--pre-hook |
CRUNCHY_WATCH_PRE_HOOK |
failover hook to execute before processing failover |
|
--post-hook |
CRUNCHY_WATCH_POST_HOOK |
failover hook to execute after processing failover |
|
--debug |
CRUNCHY_DEBUG |
when set to true, causes debug level messages to be output |
Building crunchy-watch
, supporting plugin modules and docker image are
accomplished using make
and the provide Makefile.
These steps assume your normal userid is someuser and you are installing on a clean minimal Centos7 install.
sudo yum -y install docker sudo groupadd docker sudo systemctl enable docker sudo systemctl start docker sudo usermod -a -G docker someuser newgrp docker docker ps
export GOPATH=$HOME/cdev export PATH=$PATH:$GOPATH/bin export CCP_IMAGE_PREFIX=crunchydata export CCP_BASEOS=centos7 export CCP_PGVERSION=10 export CCP_PG_FULLVERSION=10.5 export CCP_VERSION=2.1.0 export CCP_IMAGE_TAG=$CCP_BASEOS-$CCP_PG_FULLVERSION-$CCP_VERSION export WATCH_CLI=kubectl export WATCH_NAMESPACE=demo export WATCH_ROOT=$GOPATH/src/github.com/crunchydata/crunchy-watch
In the case of Openshift:
export WATCH_CLI=oc
Then, build the project structure as follows:
mkdir -p $GOPATH/src $GOPATH/bin $GOPATH/pkg mkdir -p $GOPATH/src/github.com/crunchydata/ cd $GOPATH/src/github.com/crunchydata git clone https://github.com/CrunchyData/crunchy-watch.git cd crunchy-watch git checkout master
Configure storage for the Kube and Openshift examples by setting the following environment variables:
For NFS:
export CCP_STORAGE_CAPACITY=400M export CCP_NFS_IP=192.168.122.212 export CCP_STORAGE_MODE=ReadWriteMany export CCP_SECURITY_CONTEXT='"supplementalGroups": [65534]' export CCP_STORAGE_PATH=/nfsfileshare
For HostPath:
export CCP_STORAGE_CAPACITY=400M export CCP_STORAGE_MODE=ReadWriteMany export CCP_STORAGE_PATH=/data
Create the demo namespace:
$ kubectl create -f $WATCH_ROOT/conf/demo-namespace.json namespace "demo" created $ kubectl get namespace demo NAME STATUS AGE demo Active 7s
Then set the namespace as the current location to avoid using the wrong namespace:
$ kubectl config set-context $(kubectl config current-context) --namespace=demo
Note
|
To build the RHEL based image, you will need the Crunchy repo keys to be copied to the $GOPATH/src/github.com/crunchydata/crunchy-watch directory. This is because the RHEL image is based on the Crunchy RPM packages. |
cp CRUNCHY-GPG-KEY.public $GOPATH/src/github.com/crunchydata/crunchy-watch cp crunchypg*.repo $GOPATH/src/github.com/crunchydata/crunchy-watch
make docker-image
Target | Description |
---|---|
all |
(default) calls |
build |
builds |
modules |
builds all plugin modules |
kube-module |
builds kubernetes plugin module |
clean |
cleans all build related artifacts, including dependencies. |
resolve |
resolves all build related dependencies |
docker-image |
build docker image - Note: requires |
setup |
downloads required tools and docker image related dependencies |
Crunchy Watch is designed with extension of its function and supported platforms in mind.
Crunchy Watch makes use of the golang plugin package. Therefore it is possible to build support for new platforms separate from each other.
To integrate with the plugin system the following interface must be met:
type FailoverHandler interface { Failover() error SetFlags(*flag.FlagSet) }
Failover()
is called to process the failover logic for the platform that the
plugin supports.
SetFlags(*flag.FlagSet)
is called immediately after the plugin is loaded.
This allows for plugin to define options/flags that are unique to its
operation.
As well, it must be built with the -buildmode=plugin
option. See an example
of this in the project Makefile
Crunchy Watch provides both a pre
and post
failover hook. These hooks will
be executed in a shell environment created by the crunchy-watch
process.
Therefore they can be any executable or script that can be called by the user
running the crunchy-watch
process.
To configure the execution of these hooks, a fully qualified path to the
executable or script must be provided by either the --pre-hook
or
--post-hook
flags. Or by defining the CRUNCHY_WATCH_PRE_HOOK
or
CRUNCHY_WATCH_POST_HOOK
environment variables.
Example:
$> crunchy-watch kube --pre-hook=/tmp/watch-pre-hook
Or,
$> CRUNCHY_WATCH_PRE_HOOK=/tmp/watch-pre-hook crunchy-watch kube
There are 2 primary examples for using crunchy-watch provided. Both examples work for both Kubernetes and Openshift environments. Setting the WATCH_CLI environment variable to oc for Openshift or kubectl for Kubernetes is required to run the examples.
The first example has crunchy-watch watching 2 pods, a primary and a replica pod. Failover is performed on the primary pod.
To run the pod example, first start up the sample pods:
cd examples/sample-pods ./run.sh
To run crunchy-watch for watching this set of pods, run:
cd examples/kube ./run.sh
To trigger a failover of the primary Pod to the replica Pod enter the following:
$WATCH_CLI delete pod pr-primary $WATCH_CLI logs watch --follow
To verify watch logs for the folowing:
ERRO[2018-09-06T13:38:50Z] Could not reach 'pr-primary' (Attempt: 1) INFO[2018-09-06T13:38:50Z] Executing pre-hook: /hooks/watch-pre-hook INFO[2018-09-06T13:38:50Z] Processing Failover: Strategy - latest INFO[2018-09-06T13:38:50Z] Deleting existing primary... INFO[2018-09-06T13:38:50Z] Deleted old primary INFO[2018-09-06T13:38:50Z] Choosing failover replica... INFO[2018-09-06T13:38:50Z] Chose failover target (pr-replica) INFO[2018-09-06T13:38:50Z] Promoting failover replica... DEBU[2018-09-06T13:38:50Z] executing cmd: [/opt/cpm/bin/promote.sh] on pod pr-replica in namespace demo container: postgres INFO[2018-09-06T13:38:50Z] Relabeling failover replica... DEBU[2018-09-06T13:38:50Z] label: name DEBU[2018-09-06T13:38:50Z] label: replicatype INFO[2018-09-06T13:38:50Z] Executing post-hook: /hooks/watch-post-hook INFO[2018-09-06T13:39:00Z] Health Checking: 'pr-primary'
To clean up the example:
cd $WATCH_ROOT/examples/sample-pods ./cleanup.sh cd $WATCH_ROOT/examples/kube ./cleanup.sh
The 2nd example of crunchy-watch demonstrates failover of a Deployment. The sample Deployments used in the example are started as follows:
cd $WATCH_ROOT/examples/sample-deployments ./run.sh
Run the crunchy-watch Deployment example as follows:
cd $WATCH_ROOT/examples/kube-deployments ./run.sh
To trigger a failover of the primary Deployment to the replica Deployment enter the following:
$WATCH_CLI delete deploy watchprimary $WATCH_CLI logs watch --follow
To verify watch the logs for:
INFO[2018-09-06T15:13:12Z] Health Checking: 'watchprimary' ERRO[2018-09-06T15:13:22Z] dial tcp 10.99.3.81:5432: i/o timeout ERRO[2018-09-06T15:13:22Z] Could not reach 'watchprimary' (Attempt: 1) INFO[2018-09-06T15:13:22Z] Executing pre-hook: /hooks/watch-pre-hook INFO[2018-09-06T15:13:22Z] Processing Failover: Strategy - latest INFO[2018-09-06T15:13:22Z] Deleting existing primary... INFO[2018-09-06T15:13:22Z] deleting deployment WARN[2018-09-06T15:13:22Z] deployments.extensions "watchprimary" not found INFO[2018-09-06T15:13:22Z] Deleted old primary INFO[2018-09-06T15:13:22Z] Choosing failover replica... INFO[2018-09-06T15:13:22Z] Chose failover target (watchreplica-56c48c7f4b-68fcb) INFO[2018-09-06T15:13:22Z] Promoting failover replica... DEBU[2018-09-06T15:13:22Z] executing cmd: [/opt/cpm/bin/promote.sh] on pod watchreplica-56c48c7f4b-68fcb in namespace demo container: postgres INFO[2018-09-06T15:13:22Z] Relabeling failover replica... . . . INFO[2018-09-06T15:14:28Z] Health Checking: 'watchprimary' INFO[2018-09-06T15:14:28Z] Successfully reached 'watchprimary'
To clean up the example:
cd $WATCH_ROOT/examples/sample-deployments ./cleanup.sh cd $WATCH_ROOT/examples/kube-deployments ./cleanup.sh