Example: Deploying Cassandra with a StatefulSet

This tutorial shows you how to run Apache Cassandra on Kubernetes. Cassandra, a database, needs persistent storage to provide data durability (application state). In this example, a custom Cassandra seed provider lets the database discover new Cassandra instances as they join the Cassandra cluster.

StatefulSets make it easier to deploy stateful applications into your Kubernetes cluster. For more information on the features used in this tutorial, see StatefulSet.

Note:

Cassandra and Kubernetes both use the term node to mean a member of a cluster. In this tutorial, the Pods that belong to the StatefulSet are Cassandra nodes and are members of the Cassandra cluster (called a ring). When those Pods run in your Kubernetes cluster, the Kubernetes control plane schedules those Pods onto Kubernetes Nodes.

When a Cassandra node starts, it uses a seed list to bootstrap discovery of other nodes in the ring. This tutorial deploys a custom Cassandra seed provider that lets the database discover new Cassandra Pods as they appear inside your Kubernetes cluster.
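The seed used in this tutorial (the CASSANDRA_SEEDS value in the StatefulSet manifest later on) is the stable DNS name of the first Pod, cassandra-0. The following minimal sketch shows how such a name is put together from the StatefulSet name, the Pod ordinal, the headless Service and the namespace. It assumes the default cluster domain cluster.local, and the function name seed_fqdn is hypothetical.

```shell
# Hypothetical helper: build the stable DNS name of one StatefulSet Pod.
# Pattern: <statefulset>-<ordinal>.<headless-service>.<namespace>.svc.<cluster-domain>
# The cluster domain defaults to "cluster.local" unless a fifth argument is given.
seed_fqdn() {
  sts=$1 ordinal=$2 svc=$3 ns=$4 domain=${5:-cluster.local}
  printf '%s-%s.%s.%s.svc.%s\n' "$sts" "$ordinal" "$svc" "$ns" "$domain"
}

# Prints cassandra-0.cassandra.default.svc.cluster.local
seed_fqdn cassandra 0 cassandra default
```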

Objectives

  • Create and validate a Cassandra headless Service.
  • Use a StatefulSet to create a Cassandra ring.
  • Validate the StatefulSet.
  • Modify the StatefulSet.
  • Delete the StatefulSet and its Pods.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or you can use one of the Kubernetes playgrounds.
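To check the two-node recommendation, you can count the nodes that are not control plane hosts. This is a sketch that assumes your control plane nodes carry the standard node-role.kubernetes.io/control-plane label. The function name is hypothetical. On a single-node Minikube cluster, the only node is a control plane node, so the count is 0.

```shell
# Count nodes that do NOT have the control plane role label.
# Assumes control plane nodes are labelled node-role.kubernetes.io/control-plane.
count_worker_nodes() {
  kubectl get nodes -l '!node-role.kubernetes.io/control-plane' -o name | grep -c .
}

# Usage (needs a configured kubectl):
#   count_worker_nodes   # the tutorial recommends at least 2
```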

To complete this tutorial, you should already have a basic familiarity with Pods, Services, and StatefulSets.

Additional Minikube setup instructions

Caution:

Minikube defaults to 2048 MB of memory and 2 CPUs. Running Minikube with the default resource configuration results in insufficient resource errors during this tutorial. To avoid these errors, start Minikube with the following settings:

    minikube start --memory 5120 --cpus=4
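The StatefulSet manifest later in this tutorial explains why the default size is too small. It requests 500m CPU and 1Gi of memory for each Cassandra Pod. The following small arithmetic sketch totals those requests (the function name is hypothetical). Three Pods alone request 3072 MiB of memory, which is more than the 2048 MB default even before counting the system Pods that Minikube itself runs. Scaling to four Pods later requests 2000m CPU, which equals both default CPUs.

```shell
# Total the resource requests of the Cassandra StatefulSet for a given replica count.
# Per-Pod values are taken from this tutorial's manifest: 500m CPU, 1Gi (1024 MiB) memory.
cassandra_requests() {
  replicas=$1
  cpu_millis=$((replicas * 500))
  mem_mib=$((replicas * 1024))
  echo "${cpu_millis}m CPU, ${mem_mib}Mi memory"
}

cassandra_requests 3   # prints "1500m CPU, 3072Mi memory"
cassandra_requests 4   # prints "2000m CPU, 4096Mi memory"
```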

Creating a headless Service for Cassandra

In Kubernetes, a Service describes a set of Pods that perform the same task.

The following Service is used for DNS lookups between Cassandra Pods and clients within your cluster:

application/cassandra/cassandra-service.yaml

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: cassandra
      name: cassandra
    spec:
      clusterIP: None
      ports:
      - port: 9042
      selector:
        app: cassandra

Create a Service to track all Cassandra StatefulSet members from the cassandra-service.yaml file:

    kubectl apply -f https://k8s.io/examples/application/cassandra/cassandra-service.yaml

Validating (optional)

Get the Cassandra Service.

    kubectl get svc cassandra

The response should be similar to:

    NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    cassandra   ClusterIP   None         <none>        9042/TCP   45s

If you don’t see a Service named cassandra, that means creation failed. Read Debug Services for help troubleshooting common issues.
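A headless Service shows None in the CLUSTER-IP column. If you prefer a scripted check over reading the table, the following sketch compares the Service's spec.clusterIP field with the literal value None. The function name is hypothetical.

```shell
# Succeed only if the named Service is headless (spec.clusterIP is the literal "None").
is_headless() {
  [ "$(kubectl get svc "$1" -o jsonpath='{.spec.clusterIP}')" = "None" ]
}

# Usage:
#   is_headless cassandra && echo "cassandra is headless"
```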

Using a StatefulSet to create a Cassandra ring

The StatefulSet manifest, included below, creates a Cassandra ring that consists of three Pods.

Note: This example uses the default provisioner for Minikube (k8s.io/minikube-hostpath, set in the StorageClass at the end of the manifest). Please update the following manifest for the cloud you are working with.

application/cassandra/cassandra-statefulset.yaml

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cassandra
      labels:
        app: cassandra
    spec:
      serviceName: cassandra
      replicas: 3
      selector:
        matchLabels:
          app: cassandra
      template:
        metadata:
          labels:
            app: cassandra
        spec:
          terminationGracePeriodSeconds: 1800
          containers:
          - name: cassandra
            image: gcr.io/google-samples/cassandra:v13
            imagePullPolicy: Always
            ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            resources:
              limits:
                cpu: "500m"
                memory: 1Gi
              requests:
                cpu: "500m"
                memory: 1Gi
            securityContext:
              capabilities:
                add:
                  - IPC_LOCK
            lifecycle:
              preStop:
                exec:
                  command:
                  - /bin/sh
                  - -c
                  - nodetool drain
            env:
              - name: MAX_HEAP_SIZE
                value: 512M
              - name: HEAP_NEWSIZE
                value: 100M
              - name: CASSANDRA_SEEDS
                value: "cassandra-0.cassandra.default.svc.cluster.local"
              - name: CASSANDRA_CLUSTER_NAME
                value: "K8Demo"
              - name: CASSANDRA_DC
                value: "DC1-K8Demo"
              - name: CASSANDRA_RACK
                value: "Rack1-K8Demo"
              - name: POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
            readinessProbe:
              exec:
                command:
                - /bin/bash
                - -c
                - /ready-probe.sh
              initialDelaySeconds: 15
              timeoutSeconds: 5
            # These volume mounts are persistent. They are like inline claims,
            # but not exactly because the names need to match exactly one of
            # the stateful pod volumes.
            volumeMounts:
            - name: cassandra-data
              mountPath: /cassandra_data
      # These are converted to volume claims by the controller
      # and mounted at the paths mentioned above.
      # do not use these in production until ssd GCEPersistentDisk or other ssd pd
      volumeClaimTemplates:
      - metadata:
          name: cassandra-data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: fast
          resources:
            requests:
              storage: 1Gi
    ---
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: fast
    provisioner: k8s.io/minikube-hostpath
    parameters:
      type: pd-ssd

Create the Cassandra StatefulSet from the cassandra-statefulset.yaml file:

    # Use this if you are able to apply cassandra-statefulset.yaml unmodified
    kubectl apply -f https://k8s.io/examples/application/cassandra/cassandra-statefulset.yaml

If you need to modify cassandra-statefulset.yaml to suit your cluster, download https://k8s.io/examples/application/cassandra/cassandra-statefulset.yaml, edit it, and then apply the modified manifest from the folder where you saved it:

    # Use this if you needed to modify cassandra-statefulset.yaml locally
    kubectl apply -f cassandra-statefulset.yaml
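A typical local change is pointing the StorageClass at your cloud's provisioner. The following is a hypothetical sketch that rewrites the top-level provisioner line in your downloaded copy with sed. Run kubectl get storageclass to see what your cluster offers. The parameters block (type: pd-ssd) must also match what the new provisioner accepts. The example value pd.csi.storage.gke.io assumes a GKE cluster with the Compute Engine persistent disk CSI driver.

```shell
# Hypothetical helper: replace the StorageClass provisioner in a local manifest copy.
# Only the line that starts with "provisioner:" is changed; everything else is kept.
set_provisioner() {
  file=$1 provisioner=$2
  sed "s|^provisioner: .*|provisioner: ${provisioner}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage (example value, adjust for your cluster):
#   set_provisioner cassandra-statefulset.yaml pd.csi.storage.gke.io
```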

Validating the Cassandra StatefulSet

  1. Get the Cassandra StatefulSet:

        kubectl get statefulset cassandra

    The response should be similar to:

        NAME        DESIRED   CURRENT   AGE
        cassandra   3         0         13s

    The StatefulSet resource deploys Pods sequentially.

  2. Get the Pods to see the ordered creation status:

        kubectl get pods -l="app=cassandra"

    The response should be similar to:

        NAME          READY   STATUS              RESTARTS   AGE
        cassandra-0   1/1     Running             0          1m
        cassandra-1   0/1     ContainerCreating   0          8s

    It can take several minutes for all three Pods to deploy. Once they are deployed, the same command returns output similar to:

        NAME          READY   STATUS    RESTARTS   AGE
        cassandra-0   1/1     Running   0          10m
        cassandra-1   1/1     Running   0          9m
        cassandra-2   1/1     Running   0          8m
  3. Run the Cassandra nodetool inside the first Pod to display the status of the ring.

        kubectl exec -it cassandra-0 -- nodetool status

    The response should look something like:

        Datacenter: DC1-K8Demo
        ======================
        Status=Up/Down
        |/ State=Normal/Leaving/Joining/Moving
        --  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
        UN  172.17.0.5  83.57 KiB   32      74.0%             e2dd09e6-d9d3-477e-96c5-45094c08db0f  Rack1-K8Demo
        UN  172.17.0.4  101.04 KiB  32      58.8%             f89d6835-3a42-4419-92b3-0e62cae1479c  Rack1-K8Demo
        UN  172.17.0.6  84.74 KiB   32      67.1%             a6a1e8c2-3dc5-4417-b1a0-26507af2aaad  Rack1-K8Demo
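In the nodetool status output, the two-letter prefix UN means the node is Up and in the Normal state. To script the check, you can count those lines and compare the result with the number of replicas. The following is a minimal sketch (the function name is hypothetical). The usage line drops -it because the output is piped rather than read in a terminal.

```shell
# Count nodes reported as Up/Normal ("UN") in "nodetool status" output read from stdin.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage: expect 3 for this tutorial's StatefulSet.
#   kubectl exec cassandra-0 -- nodetool status | count_up_normal
```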

Modifying the Cassandra StatefulSet

Use kubectl edit to modify the size of a Cassandra StatefulSet.

  1. Run the following command:

        kubectl edit statefulset cassandra

    This command opens an editor in your terminal. The line you need to change is the replicas field. The following sample is an excerpt of the StatefulSet file:

        # Please edit the object below. Lines beginning with a '#' will be ignored,
        # and an empty file will abort the edit. If an error occurs while saving this file will be
        # reopened with the relevant failures.
        #
        apiVersion: apps/v1
        kind: StatefulSet
        metadata:
          creationTimestamp: 2016-08-13T18:40:58Z
          generation: 1
          labels:
            app: cassandra
          name: cassandra
          namespace: default
          resourceVersion: "323"
          uid: 7a219483-6185-11e6-a910-42010a8a0fc0
        spec:
          replicas: 3
  2. Change the number of replicas to 4, and then save the manifest.

    The StatefulSet now scales to run with 4 Pods.

  3. Get the Cassandra StatefulSet to verify your change:

        kubectl get statefulset cassandra

    The response should be similar to:

        NAME        DESIRED   CURRENT   AGE
        cassandra   4         4         36m
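If you would rather not open an editor, kubectl scale sets the replica count directly. kubectl rollout status then waits until the StatefulSet reports all replicas as ready, which requires the default RollingUpdate update strategy. The wrapper function name in this sketch is hypothetical.

```shell
# Non-interactive alternative to "kubectl edit": set the replica count,
# then block until all replicas of the StatefulSet are ready.
scale_cassandra() {
  kubectl scale statefulset cassandra --replicas="$1" &&
    kubectl rollout status statefulset/cassandra
}

# Usage:
#   scale_cassandra 4
```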

Cleaning up

Deleting or scaling a StatefulSet down does not delete the volumes associated with the StatefulSet. This setting is for your safety because your data is more valuable than automatically purging all related StatefulSet resources.

Warning: Depending on the storage class and reclaim policy, deleting the PersistentVolumeClaims may cause the associated volumes to also be deleted. Never assume you’ll be able to access data if its volume claims are deleted.
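Before running the cleanup commands, you can check what deleting the claims would do. The following sketch prints each PersistentVolume bound to a Cassandra PersistentVolumeClaim together with its reclaim policy. It relies on the same app=cassandra label that the cleanup command uses. The function name is hypothetical.

```shell
# Print "<volume> <reclaim policy>" for every PersistentVolume bound to a Cassandra claim.
# "Delete": the volume is removed when its claim is deleted.
# "Retain": the volume and its data are kept after the claim is deleted.
show_cassandra_reclaim_policies() {
  for pv in $(kubectl get pvc -l app=cassandra -o jsonpath='{.items[*].spec.volumeName}'); do
    policy=$(kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')
    printf '%s %s\n' "$pv" "$policy"
  done
}

# Usage (run before deleting anything):
#   show_cassandra_reclaim_policies
```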

  1. Run the following commands (chained together into a single command) to delete everything in the Cassandra StatefulSet:

        grace=$(kubectl get pod cassandra-0 -o=jsonpath='{.spec.terminationGracePeriodSeconds}') \
          && kubectl delete statefulset -l app=cassandra \
          && echo "Sleeping ${grace} seconds" 1>&2 \
          && sleep $grace \
          && kubectl delete persistentvolumeclaim -l app=cassandra
  2. Run the following command to delete the Service you set up for Cassandra:

        kubectl delete service -l app=cassandra

Cassandra container environment variables

The Pods in this tutorial use the gcr.io/google-samples/cassandra:v13 image from Google’s container registry. The Docker image above is based on debian-base and includes OpenJDK 8.

This image includes a standard Cassandra installation from the Apache Debian repo. By using environment variables, you can change values that are inserted into cassandra.yaml.

Environment variable      Default value
CASSANDRA_CLUSTER_NAME    'Test Cluster'
CASSANDRA_NUM_TOKENS      32
CASSANDRA_RPC_ADDRESS     0.0.0.0
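The image's own startup script is not shown in this tutorial. As a hedged illustration of the default-value behavior in the table, this sketch shows how a shell entrypoint could resolve those variables. The function name and output format are hypothetical. Only the cassandra.yaml keys cluster_name, num_tokens and rpc_address are real settings. In this tutorial, the StatefulSet sets CASSANDRA_CLUSTER_NAME to K8Demo, so the default 'Test Cluster' would not apply.

```shell
# Hypothetical sketch, NOT the image's actual entrypoint: resolve each setting
# from the environment and fall back to the default listed in the table.
render_cassandra_settings() {
  printf "cluster_name: '%s'\n" "${CASSANDRA_CLUSTER_NAME:-Test Cluster}"
  printf 'num_tokens: %s\n' "${CASSANDRA_NUM_TOKENS:-32}"
  printf 'rpc_address: %s\n' "${CASSANDRA_RPC_ADDRESS:-0.0.0.0}"
}

# Usage: override one value, keep the other defaults.
#   CASSANDRA_CLUSTER_NAME=K8Demo render_cassandra_settings
```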

What’s next