Orchestrate CockroachDB Across Multiple Kubernetes Clusters
This page shows you how to orchestrate a secure CockroachDB deployment across three Kubernetes clusters, each in a different geographic region, using the StatefulSet feature to manage the containers within each cluster and linking them together via DNS.
To deploy in a single Kubernetes cluster instead, see Kubernetes Single-Cluster Deployment. Also, for details about potential performance bottlenecks to be aware of when running CockroachDB in Kubernetes and guidance on how to optimize your deployment for better performance, see CockroachDB Performance on Kubernetes.
Before you begin
Before getting started, it's helpful to review some Kubernetes-specific terminology and current limitations.
Kubernetes terminology
Feature | Description |
---|---|
instance | A physical or virtual machine. In this tutorial, you'll run instances as part of three independent Kubernetes clusters, each in a different region. |
pod | A pod is a group of one or more Docker containers. In this tutorial, each pod will run on a separate instance and include one Docker container running a single CockroachDB node. You'll start with 3 pods in each region and grow to 4. |
StatefulSet | A StatefulSet is a group of pods treated as stateful units, where each pod has distinguishable network identity and always binds back to the same persistent storage on restart. StatefulSets are considered stable as of Kubernetes version 1.9 after reaching beta in version 1.5. |
persistent volume | A persistent volume is a piece of networked storage (Persistent Disk on GCE, Elastic Block Store on AWS) mounted into a pod. The lifetime of a persistent volume is decoupled from the lifetime of the pod that's using it, ensuring that each CockroachDB node binds back to the same storage on restart.This tutorial assumes that dynamic volume provisioning is available. When that is not the case, persistent volume claims need to be created manually. |
RBAC | RBAC, or Role-Based Access Control, is the system Kubernetes uses to manage permissions within the cluster. In order to take an action (e.g., get or create ) on an API resource (e.g., a pod ), the client must have a Role that allows it to do so. |
namespace | A namespace provides a scope for resources and names within a Kubernetes cluster. Names of resources need to be unique within a namespace, but not across namespaces. Most Kubernetes client commands will use the default namespace by default, but can operate on resources in other namespaces as well if told to do so. |
kubectl | kubectl is the command-line interface for running commands against Kubernetes clusters. |
kubectl context | A kubectl "context" specifies a Kubernetes cluster to connect to and the authentication for doing so. You can set a context as the default using the kubectl config use-context <context-name> command such that all future kubectl commands will talk to that cluster, or you can specify the --context=<context-name> flag on almost any kubectl command to tell it which cluster you want to run the command against. We will make heavy use of the --context flag in these instructions in order to run commands against the different regions' Kubernetes clusters. |
UX differences from running in a single cluster
These instructions create a StatefulSet that runs CockroachDB in each of the Kubernetes clusters you provide to the configuration scripts. These StatefulSets can be scaled independently of each other by running kubectl commands against the appropriate cluster. These steps will also point each Kubernetes cluster's DNS server at the other clusters' DNS servers so that DNS lookups for certain zone-scoped suffixes (e.g., "*.us-west1-a.svc.cluster.local") can be deferred to the appropriate cluster's DNS server. However, in order to make this work, we create the StatefulSets in namespaces named after the zone in which the cluster is running. This means that in order to run a command against one of the pods, you have to run, e.g., kubectl logs cockroachdb-0 --namespace=us-west1-a instead of just kubectl logs cockroachdb-0. Alternatively, you can configure your kubectl context to default to using that namespace for commands run against that cluster.
Because the CockroachDB pods run in a non-default namespace, client applications in the default namespace would ordinarily need to talk to a zone-scoped service name such as "cockroachdb-public.us-west1-a" rather than the plain "cockroachdb-public" they would use in a single-cluster setting. However, the setup script used by these instructions creates an additional ExternalName service in the default namespace such that clients in the default namespace can simply talk to the "cockroachdb-public" address.
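To make the naming scheme concrete, here is an illustrative sketch of how the zone-scoped service and pod DNS names used in this setup are composed. The service and StatefulSet names match the examples in this guide, but the helper functions themselves are hypothetical conveniences, not part of the setup script:

```python
# Illustrative only: compose the DNS names used in this multi-cluster setup.

def public_service_name(zone=None):
    """Name clients use to reach CockroachDB.

    From the default namespace (via the ExternalName service) the plain
    name works; without that service, clients outside the zone's
    namespace would need the zone-scoped form.
    """
    return "cockroachdb-public" if zone is None else f"cockroachdb-public.{zone}"

def pod_fqdn(ordinal, zone):
    """Fully qualified DNS name of a StatefulSet pod in a given zone."""
    return f"cockroachdb-{ordinal}.cockroachdb.{zone}.svc.cluster.local"

print(public_service_name())              # cockroachdb-public
print(public_service_name("us-west1-a"))  # cockroachdb-public.us-west1-a
print(pod_fqdn(0, "us-west1-a"))
```

Names matching the `*.us-west1-a.svc.cluster.local` suffix are what the cross-cluster DNS forwarding described above resolves.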
Finally, if you haven't often worked with multiple Kubernetes clusters before, you may find yourself forgetting which cluster you want to run a given command against, and thus getting confusing results. Remember that you will either have to run kubectl config use-context <context-name> frequently to switch contexts between commands or you will have to append --context=<context-name> to most commands you run to ensure they are run on the correct cluster.
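If you script against these clusters, one way to avoid forgetting the flag is a tiny wrapper that always attaches the context (and namespace) to the argument list. This is a hypothetical convenience sketch, not part of the provided scripts:

```python
# Hypothetical helper: build kubectl argument lists that always carry
# the right --context (and optionally --namespace) for this setup.
import subprocess

def kubectl_cmd(args, context, namespace=None):
    """Return a kubectl command list pinned to one cluster/namespace."""
    cmd = ["kubectl", *args, f"--context={context}"]
    if namespace is not None:
        cmd.append(f"--namespace={namespace}")
    return cmd

# e.g., logs for pod cockroachdb-0 in the us-west1-a cluster:
cmd = kubectl_cmd(["logs", "cockroachdb-0"],
                  context="gke_cockroach-shared_us-west1-a_cockroachdb2",
                  namespace="us-west1-a")
# subprocess.run(cmd, check=True)  # uncomment to actually execute
print(" ".join(cmd))
```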
Limitations
Kubernetes version
Kubernetes 1.8 or higher is required.
Exposing DNS servers
In the approach documented here, the DNS servers from each Kubernetes cluster are hooked together by exposing them via a load balanced IP address that is visible to the public Internet. This is because Google Cloud Platform's Internal Load Balancers do not currently support clients in one region using a load balancer in another region.
None of the services in your Kubernetes cluster will be accessible publicly, but their names could leak out to a motivated attacker. If this is unacceptable, please let us know and we can demonstrate other options. Your voice could also help convince Google to allow clients from one region to use an Internal Load Balancer in another, eliminating the problem.
Step 1. Start Kubernetes clusters
Our multi-region deployment approach relies on pod IP addresses being routable across three distinct Kubernetes clusters and regions. The hosted Google Kubernetes Engine (GKE) service satisfies this requirement, so that is the environment featured here. If you want to run on another cloud or on-premises, use this basic network test to see if it will work.
- Complete the Before You Begin steps described in the Google Kubernetes Engine Quickstart documentation. This includes installing gcloud, which is used to create and delete Kubernetes Engine clusters, and kubectl, which is the command-line tool used to manage Kubernetes from your workstation.
Tip:
The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.
- From your local workstation, start the first Kubernetes cluster, specifying the zone it should run in:
$ gcloud container clusters create cockroachdb1 --zone=<gce-zone>
Creating cluster cockroachdb1...done.
This creates GKE instances in the specified zone and joins them into a single Kubernetes cluster named cockroachdb1. The process can take a few minutes, so do not move on to the next step until you see a Creating cluster cockroachdb1...done message and details about your cluster.
- Start the second Kubernetes cluster, specifying the zone it should run in:
$ gcloud container clusters create cockroachdb2 --zone=<gce-zone>
Creating cluster cockroachdb2...done.
- Start the third Kubernetes cluster, specifying the zone it should run in:
$ gcloud container clusters create cockroachdb3 --zone=<gce-zone>
Creating cluster cockroachdb3...done.
- Get the kubectl "contexts" for your clusters:
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* gke_cockroach-shared_us-east1-b_cockroachdb1 gke_cockroach-shared_us-east1-b_cockroachdb1 gke_cockroach-shared_us-east1-b_cockroachdb1
gke_cockroach-shared_us-west1-a_cockroachdb2 gke_cockroach-shared_us-west1-a_cockroachdb2 gke_cockroach-shared_us-west1-a_cockroachdb2
gke_cockroach-shared_us-central1-a_cockroachdb3 gke_cockroach-shared_us-central1-a_cockroachdb3 gke_cockroach-shared_us-central1-a_cockroachdb3
Note:
All of the kubectl commands in this tutorial use the --context flag to tell kubectl which Kubernetes cluster to talk to. Each Kubernetes cluster operates independently; you have to tell each of them what to do separately, and when you want to get the status of something in a particular cluster, you have to make it clear to kubectl which cluster you're interested in.
The context with * in the CURRENT column indicates the cluster that kubectl will talk to by default if you do not specify the --context flag.
- Get the email address associated with your Google Cloud account:
$ gcloud info | grep Account
Account: [your.google.cloud.email@example.org]
Warning:
This command returns your email address in all lowercase. However, in the next step, you must enter the address using the accurate capitalization. For example, if your address is YourName@example.com, you must use YourName@example.com and not yourname@example.com.
- For each Kubernetes cluster, create the RBAC roles CockroachDB needs for running on GKE, using the email address and relevant "context" name from the previous steps:
$ kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org> --context=<context-name-of-kubernetes-cluster1>
$ kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org> --context=<context-name-of-kubernetes-cluster2>
$ kubectl create clusterrolebinding $USER-cluster-admin-binding --clusterrole=cluster-admin --user=<your.google.cloud.email@example.org> --context=<context-name-of-kubernetes-cluster3>
Step 2. Start CockroachDB
- Create a directory and download the required script and configuration files into it:
$ mkdir multiregion
$ cd multiregion
$ curl -OOOOOOOOO \
https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/multiregion/{README.md,client-secure.yaml,cluster-init-secure.yaml,cockroachdb-statefulset-secure.yaml,dns-lb.yaml,example-app-secure.yaml,external-name-svc.yaml,setup.py,teardown.py}
- At the top of the setup.py script, fill in the contexts map with the zones of your clusters and their "context" names, for example:
contexts = {
'us-east1-b': 'gke_cockroach-shared_us-east1-b_cockroachdb1',
'us-west1-a': 'gke_cockroach-shared_us-west1-a_cockroachdb2',
'us-central1-a': 'gke_cockroach-shared_us-central1-a_cockroachdb3',
}
You retrieved the kubectl "contexts" in an earlier step. To get them again, run:
$ kubectl config get-contexts
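If you have many contexts, copying the names by hand is error-prone. The following is an illustrative sketch of pulling the context names out of the `kubectl config get-contexts` output programmatically; it assumes the default column layout shown earlier and is not part of setup.py:

```python
# Illustrative: extract context names from `kubectl config get-contexts`
# output so they can be pasted into setup.py's contexts map.

def parse_context_names(output):
    names = []
    for line in output.splitlines()[1:]:   # skip the header row
        line = line.lstrip("* ").strip()   # drop the CURRENT marker
        if line:
            names.append(line.split()[0])  # NAME is the first column
    return names

sample = """CURRENT   NAME      CLUSTER   AUTHINFO  NAMESPACE
*         ctx-east  ctx-east  ctx-east
          ctx-west  ctx-west  ctx-west
"""
print(parse_context_names(sample))  # ['ctx-east', 'ctx-west']
```

In practice you could feed it `subprocess.check_output(["kubectl", "config", "get-contexts"], text=True)`.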
- In the setup.py script, fill in the regions map with the zones and corresponding regions of your clusters, for example:
regions = {
'us-east1-b': 'us-east1',
'us-west1-a': 'us-west1',
'us-central1-a': 'us-central1',
}
Setting regions is optional, but recommended, because it improves CockroachDB's ability to diversify data placement if you use more than one zone in the same region. If you aren't specifying regions, just leave the map empty.
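For GCE zone names, the region is simply the zone name minus its trailing zone letter, so the map above can be derived mechanically. A small illustrative helper (not part of setup.py):

```python
# Illustrative: derive a GCE region from a zone name by dropping the
# trailing zone letter, e.g. "us-east1-b" -> "us-east1".

def zone_to_region(zone):
    return zone.rsplit("-", 1)[0]

regions = {z: zone_to_region(z)
           for z in ("us-east1-b", "us-west1-a", "us-central1-a")}
print(regions)
# {'us-east1-b': 'us-east1', 'us-west1-a': 'us-west1', 'us-central1-a': 'us-central1'}
```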
- If you haven't already, install CockroachDB locally and add it to your PATH. The cockroach binary will be used to generate certificates.
If the cockroach binary is not on your PATH, set the cockroach_path variable in the setup.py script to the path to the binary.
- Optionally, to optimize your deployment for better performance, review CockroachDB Performance on Kubernetes and make the desired modifications to the cockroachdb-statefulset-secure.yaml file.
- Run the setup.py script:
$ python setup.py
As the script creates various resources and creates and initializes the CockroachDB cluster, you'll see a lot of output, eventually ending with job "cluster-init-secure" created.
- Confirm that the CockroachDB pods in each cluster show 1/1 in the READY column, indicating that they've successfully joined the cluster:
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster1>
NAMESPACE NAME READY STATUS RESTARTS AGE
us-east1-b cockroachdb-0 1/1 Running 0 14m
us-east1-b cockroachdb-1 1/1 Running 0 14m
us-east1-b cockroachdb-2 1/1 Running 0 14m
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster2>
NAMESPACE NAME READY STATUS RESTARTS AGE
us-central1-a cockroachdb-0 1/1 Running 0 14m
us-central1-a cockroachdb-1 1/1 Running 0 14m
us-central1-a cockroachdb-2 1/1 Running 0 14m
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster3>
NAMESPACE NAME READY STATUS RESTARTS AGE
us-west1-a cockroachdb-0 1/1 Running 0 14m
us-west1-a cockroachdb-1 1/1 Running 0 14m
us-west1-a cockroachdb-2 1/1 Running 0 14m
If you notice that only one Kubernetes cluster's pods are marked as READY, you likely also need to configure a network firewall rule that allows the pods in the different clusters to talk to each other. You can run the following command to create a firewall rule allowing traffic on port 26257 (the port used by CockroachDB for inter-node traffic) within your private GCE network. It will not allow any traffic in from outside your private network:
$ gcloud compute firewall-rules create allow-cockroach-internal --allow=tcp:26257 --source-ranges=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
Creating firewall...done.
NAME NETWORK DIRECTION PRIORITY ALLOW DENY
allow-cockroach-internal default INGRESS 1000 tcp:26257
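The three source ranges in that rule are the standard RFC 1918 private address blocks. When debugging why inter-cluster traffic is blocked, it can help to check whether a pod or node IP actually falls inside them; here is an illustrative check using Python's standard ipaddress module:

```python
# Illustrative: does a given IP fall inside the private ranges the
# firewall rule above allows?
import ipaddress

PRIVATE_RANGES = [ipaddress.ip_network(n) for n in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def allowed_by_rule(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_RANGES)

print(allowed_by_rule("10.128.0.5"))  # True  (typical GCE internal IP)
print(allowed_by_rule("35.1.2.3"))    # False (public address)
```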
Tip:
In each Kubernetes cluster, the StatefulSet configuration sets all CockroachDB nodes to write to stderr, so if you ever need access to a pod's logs to troubleshoot, use kubectl logs <podname> --namespace=<cluster-namespace> --context=<cluster-context> rather than checking the log on the persistent volume.
Step 3. Use the built-in SQL client
- Use the client-secure.yaml file to launch a pod and keep it running indefinitely, specifying the context of the Kubernetes cluster to run it in:
$ kubectl create -f client-secure.yaml --context=<cluster-context>
pod "cockroachdb-client-secure" created
The pod uses the root client certificate created earlier by the setup.py script. Note that this will work from any of the three Kubernetes clusters as long as you use the correct namespace and context combination.
- Get a shell into the pod and start the CockroachDB built-in SQL client, again specifying the namespace and context of the Kubernetes cluster where the pod is running:
$ kubectl exec -it cockroachdb-client-secure --context=<cluster-context> -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
#
# Server version: CockroachDB CCL v2.0.5 (x86_64-unknown-linux-gnu, built 2018/08/13 17:59:42, go1.10) (same version as client)
# Cluster ID: 99346e82-9817-4f62-b79b-fdd5d57f8bda
#
# Enter \? for a brief introduction.
#
warning: no current database set. Use SET database = <dbname> to change, CREATE DATABASE to make a new database.
root@cockroachdb-public:26257/>
- Run some basic CockroachDB SQL statements:
> CREATE DATABASE bank;
> CREATE TABLE bank.accounts (id INT PRIMARY KEY, balance DECIMAL);
> INSERT INTO bank.accounts VALUES (1, 1000.50);
> SELECT * FROM bank.accounts;
+----+---------+
| id | balance |
+----+---------+
| 1 | 1000.5 |
+----+---------+
(1 row)
> CREATE USER roach WITH PASSWORD 'Q7gc8rEdS';
You will need this username and password to access the Admin UI in Step 4.
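Client applications in the default namespace can reach the cluster through the plain "cockroachdb-public" host thanks to the ExternalName service. As a sketch, a PostgreSQL-compatible connection URL for the user created above might be built like this; the certificate path is an assumption for illustration and should match wherever your client's certs actually live:

```python
# Illustrative: compose a PostgreSQL-compatible connection URL for
# CockroachDB. The certs_dir default is an assumed path, not one
# created by the setup script for application clients.
from urllib.parse import quote

def connection_url(user, password, host="cockroachdb-public", port=26257,
                   database="bank", certs_dir="/cockroach-certs"):
    return (f"postgresql://{quote(user)}:{quote(password)}@{host}:{port}/"
            f"{database}?sslmode=verify-full"
            f"&sslrootcert={certs_dir}/ca.crt")

print(connection_url("roach", "Q7gc8rEdS"))
```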
- Exit the SQL shell and pod:
> \q
The pod will continue running indefinitely, so any time you need to reopen the built-in SQL client or run any other cockroach client commands (e.g., cockroach node), repeat step 2 using the appropriate command.
If you'd prefer to delete the pod and recreate it when needed, run:
$ kubectl delete pod cockroachdb-client-secure --context=<cluster-context>
Step 4. Access the Web UI
To access the cluster's Web UI:
- Port-forward from your local machine to a pod in one of your Kubernetes clusters:
$ kubectl port-forward cockroachdb-0 8080 --namespace=<cluster-namespace> --context=<cluster-context>
Forwarding from 127.0.0.1:8080 -> 8080
Note:
The port-forward command must be run on the same machine as the web browser in which you want to view the Web UI. If you have been running these commands from a cloud instance or other non-local shell, you will not be able to view the UI without configuring kubectl locally and running the above port-forward command on your local machine.
- Go to https://localhost:8080 and log in with the username and password created in the Use the built-in SQL client step.
- In the UI, check the Node List to verify that all nodes are running, and then click the Databases tab on the left to verify that bank is listed.
Step 5. Simulate datacenter failure
One of the major benefits of running a multi-region cluster is that an entire datacenter or region can go down without affecting the availability of the CockroachDB cluster as a whole.
To see this in action:
- Scale down one of the StatefulSets to zero pods, specifying the namespace and context of the Kubernetes cluster where it's running:
$ kubectl scale statefulset cockroachdb --replicas=0 --namespace=<cluster-namespace> --context=<cluster-context>
statefulset "cockroachdb" scaled
In the Admin UI, the Cluster Overview will soon show the three nodes from that region as Suspect. If you wait for 5 minutes or more, they will be listed as Dead. Note that even though there are three dead nodes, the other nodes are all healthy, and any clients using the database in the other regions will continue to work just fine.
When you're done verifying that the cluster still fully functions with one of the regions down, you can bring the region back up by running:
$ kubectl scale statefulset cockroachdb --replicas=3 --namespace=<cluster-namespace> --context=<cluster-context>
statefulset "cockroachdb" scaled
Step 6. Maintain the cluster
Scale the cluster
Each of your Kubernetes clusters contains 3 nodes that pods can run on. To ensure that you do not have two pods on the same node (as recommended in our production best practices), you need to add a new worker node and then edit your StatefulSet configuration to add another pod.
- Use the kubectl scale command to add a pod to the StatefulSet in the Kubernetes cluster where you want to add a CockroachDB node:
$ kubectl scale statefulset cockroachdb --replicas=4 --namespace=<cluster-namespace> --context=<cluster-context>
statefulset "cockroachdb" scaled
- Verify that a fourth pod was added successfully:
$ kubectl get pods --namespace=<cluster-namespace> --context=<cluster-context>
NAME READY STATUS RESTARTS AGE
cockroachdb-0 1/1 Running 0 1h
cockroachdb-1 1/1 Running 0 1h
cockroachdb-2 1/1 Running 0 7m
cockroachdb-3 1/1 Running 0 44s
cockroachdb-client-secure 1/1 Running 0 26m
Upgrade the cluster
As new versions of CockroachDB are released, it's strongly recommended to upgrade to newer versions in order to pick up bug fixes, performance improvements, and new features. The general CockroachDB upgrade documentation provides best practices for how to prepare for and execute upgrades of CockroachDB clusters, but the mechanism of actually stopping and restarting processes in Kubernetes is somewhat special.
Kubernetes knows how to carry out a safe rolling upgrade process of the CockroachDB nodes. When you tell it to change the Docker image used in the CockroachDB StatefulSet, Kubernetes will go one-by-one, stopping a node, restarting it with the new image, and waiting for it to be ready to receive client requests before moving on to the next one. For more information, see the Kubernetes documentation.
- Decide how the upgrade will be finalized.
Note:
This step is relevant only when upgrading from v2.0.x to v2.1. For upgrades within the v2.1.x series, skip this step.
By default, after all nodes are running the new version, the upgrade process will be auto-finalized. This will enable certain performance improvements and bug fixes introduced in v2.1. After finalization, however, it will no longer be possible to perform a downgrade to v2.0. In the event of a catastrophic failure or corruption, the only option will be to start a new cluster using the old binary and then restore from one of the backups created prior to performing the upgrade.
We recommend disabling auto-finalization so you can monitor the stability and performance of the upgraded cluster before finalizing the upgrade:
- Get a shell into the pod with the cockroach binary created earlier and start the CockroachDB built-in SQL client:
$ kubectl exec -it cockroachdb-client-secure --context=<cluster-context> -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
- Set the cluster.preserve_downgrade_option cluster setting:
> SET CLUSTER SETTING cluster.preserve_downgrade_option = '2.0';
- For each Kubernetes cluster, kick off the upgrade process by changing the desired Docker image. To do so, pick the version that you want to upgrade to, then run the following command, replacing "VERSION" with your desired new version and specifying the relevant namespace and "context" name for the Kubernetes cluster:
$ kubectl patch statefulset cockroachdb --namespace=<namespace-of-kubernetes-cluster1> --context=<context-name-of-kubernetes-cluster1> --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
$ kubectl patch statefulset cockroachdb --namespace=<namespace-of-kubernetes-cluster2> --context=<context-name-of-kubernetes-cluster2> --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
$ kubectl patch statefulset cockroachdb --namespace=<namespace-of-kubernetes-cluster3> --context=<context-name-of-kubernetes-cluster3> --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"cockroachdb/cockroach:VERSION"}]'
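All three patch commands send the same JSON patch, differing only in the context and namespace. If you script the upgrade, building the payload programmatically avoids shell-quoting mistakes; a small illustrative sketch:

```python
# Illustrative: build the JSON patch body used by the kubectl patch
# commands above, replacing the CockroachDB container image.
import json

def image_patch(version):
    return json.dumps([{
        "op": "replace",
        "path": "/spec/template/spec/containers/0/image",
        "value": f"cockroachdb/cockroach:{version}",
    }])

print(image_patch("VERSION"))
```

The resulting string can be passed to `kubectl patch ... --type='json' -p=<payload>` for each cluster in turn.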
- If you then check the status of the pods in each Kubernetes cluster, you should see one of them being restarted:
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster1>
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster2>
$ kubectl get pods --selector app=cockroachdb --all-namespaces --context=<context-name-of-kubernetes-cluster3>
This will continue until all of the pods have restarted and are running the new image.
- Finish the upgrade.
Note:
This step is relevant only when upgrading from v2.0.x to v2.1. For upgrades within the v2.1.x series, skip this step.
If you disabled auto-finalization in step 1 above, monitor the stability and performance of your cluster for as long as you require to feel comfortable with the upgrade (generally at least a day). If during this time you decide to roll back the upgrade, repeat the rolling restart procedure with the old binary.
Once you are satisfied with the new version, re-enable auto-finalization:
- Get a shell into the pod with the cockroach binary created earlier and start the CockroachDB built-in SQL client:
$ kubectl exec -it cockroachdb-client-secure --context=<cluster-context> -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
- Re-enable auto-finalization:
> RESET CLUSTER SETTING cluster.preserve_downgrade_option;
Stop the cluster
- To delete all of the resources created in your clusters, copy the contexts map from setup.py into teardown.py, and then run teardown.py:
$ python teardown.py
namespace "us-east1-b" deleted
service "kube-dns-lb" deleted
configmap "kube-dns" deleted
pod "kube-dns-5dcfcbf5fb-l4xwt" deleted
pod "kube-dns-5dcfcbf5fb-tddp2" deleted
namespace "us-west1-a" deleted
service "kube-dns-lb" deleted
configmap "kube-dns" deleted
pod "kube-dns-5dcfcbf5fb-8csc9" deleted
pod "kube-dns-5dcfcbf5fb-zlzn7" deleted
namespace "us-central1-a" deleted
service "kube-dns-lb" deleted
configmap "kube-dns" deleted
pod "kube-dns-5dcfcbf5fb-6ngmw" deleted
pod "kube-dns-5dcfcbf5fb-lcfxd" deleted
- Stop each Kubernetes cluster:
$ gcloud container clusters delete cockroachdb1 --zone=<gce-zone>
Deleting cluster cockroachdb1...done.
$ gcloud container clusters delete cockroachdb2 --zone=<gce-zone>
Deleting cluster cockroachdb2...done.
$ gcloud container clusters delete cockroachdb3 --zone=<gce-zone>
Deleting cluster cockroachdb3...done.