OpenEBS for Cassandra

OpenEBS and Cassandra

This tutorial provides detailed instructions to run a Kudo operator based Cassandra StatefulsSets with OpenEBS storage and perform some simple database operations to verify the successful deployment and it’s performance benchmark.

Introduction

Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle a large amounts of data across nodes, providing high availability with no single point of failure. It uses asynchronous masterless replication allowing low latency operations for all clients.

OpenEBS is the most popular Open Source Container Attached Solution available for Kubernetes and is favored by many organizations for its simplicity and ease of management and it’s highly flexible deployment options to meet the storage needs of any given stateful application.

Depending on the performance and high availability requirements of Cassandra, you can select to run Cassandra with the following deployment options:

For optimal performance, deploy Cassandra with OpenEBS Local PV. If you would like to use storage layer capabilities like high availability, snapshots, incremental backups and restore and so forth, you can select OpenEBS cStor.

Whether you use OpenEBS Local PV or cStor, you can set up the Kubernetes cluster with all its nodes in a single availability zone/data center or spread across multiple zones/ data centers.

Deployment model

OpenEBS and Cassandra

Configuration workflow

  1. Install OpenEBS
  2. Select OpenEBS storage engine
  3. Configure OpenEBS Local PV StorageClass
  4. Install Kudo operator
  5. Install Kudo based Cassandra
  6. Verify Cassandra is up and running
  7. Testing Cassandra performance on OpenEBS

Install OpenEBS

If OpenEBS is not installed in your K8s cluster, this can be done from here. If OpenEBS is already installed, go to the next step.

Select OpenEBS storage engine

A storage engine is the data plane component of the IO path of a Persistent Volume. In CAS architecture, users can choose different data planes for different application workloads based on a configuration policy. OpenEBS provides different types of storage engines and chooses the right engine that suits your type of application requirements and storage available on your Kubernetes nodes. More information can be read from here.

Configure OpenEBS Local PV StorageClass

In this tutorial, OpenEBS Local PV device has been used as the storage engine for deploying Kudo Cassandra. There are 2 ways to use OpenEBS Local PV.

  • openebs-hostpath - Using this option, it will create Kubernetes Persistent Volumes that will store the data into OS host path directory at: /var/openebs/<cassandra-pv>/. Select this option, if you don’t have any additional block devices attached to Kubernetes nodes. You would like to customize the directory where data will be saved, create a new OpenEBS Local PV storage class using these instructions.

  • openebs-device - Using this option, it will create Kubernetes Local PVs using the block devices attached to the node. Select this option when you want to dedicate a complete block device on a node to a Cassandra node. You can customize which devices will be discovered and managed by OpenEBS using the instructions here.

Install Kudo operator to install Cassandra

  • Make the environment to install Kudo operator using the following steps.

    1. $ export GOROOT=/usr/local/go
    2. $ export GOPATH=$HOME/gopath
    3. $ export PATH=$GOPATH/bin:$GOROOT/bin:$PATH
  • Choose the Kudo version. The latest version can be found here. In the following command, selected Kudo version is v0.14.0.

    1. VERSION=0.14.0
    2. OS=$(uname | tr '[:upper:]' '[:lower:]')
    3. ARCH=$(uname -m)
    4. wget -O kubectl-kudo https://github.com/kudobuilder/kudo/releases/download/v${VERSION}/kubectl-kudo_${VERSION}_${OS}_${ARCH}
  • Change the permission

    1. $ chmod +x kubectl-kudo
    2. $ sudo mv kubectl-kudo /usr/local/bin/kubectl-kudo
  • Install Cert-manager

    Before installing the KUDO operator, the cert-manager must be already installed in your cluster. If not, install the cert-manager. The instruction can be found from here. Since our K8s version is v1.16.0, we have installed cert-manager using the following command.

    1. $ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.1/cert-manager.yaml
  • Install Kudo operator using a specified version. In the following command, the selected version is v0.14.0.

    1. $ kubectl-kudo init --version 0.14.0

    Verify Kudo controller pods status

    1. $ kubectl get pod -n kudo-system
    2. NAME READY STATUS RESTARTS AGE
    3. kudo-controller-manager-0 1/1 Running 0 2m40s

Install Kudo operator based Cassandra

Install Kudo based Cassandra using OpenEBS storage engine. In this example, the storage class used is openebs-device. Before deploying Cassandra, ensure that there are enough block devices that can be used to consume Cassandra application, by running kubectl get bd -n openebs.

  1. $ export instance_name=cassandra-openebs
  2. $ export namespace_name=cassandra
  3. $ kubectl create ns cassandra
  4. $ kubectl kudo install cassandra --namespace=$namespace_name --instance $instance_name -p NODE_STORAGE_CLASS=openebs-device

Verify Cassandra is up and running

  • Get the Cassandra Pods, StatefulSet, Service and PVC details. It should show that StatefulSet is deployed with 3 Cassandra pods in running state and a headless service is configured.

    1. $kubectl get pod,service,sts,pvc -n cassandra
    2. NAME READY STATUS RESTARTS AGE
    3. cassandra-openebs-node-0 2/2 Running 0 4m
    4. cassandra-openebs-node-1 2/2 Running 0 3m2s
    5. cassandra-openebs-node-2 2/2 Running 0 3m24s
    6. NAME READY AGE
    7. statefulset.apps/cassandra 3/3 6m35s
    8. NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
    9. service/cassandra-openebs-svc ClusterIP None <none> 7000/TCP,7001/TCP,7199/TCP,9042/TCP,9160/TCP 6m35s
    10. NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
    11. var-lib-cassandra-cassandra-openebs-node-0 Bound pvc-213f2cfb-231f-4f14-be93-69c3d1c6d5d7 20Gi RWO openebs-device 20m
    12. var-lib-cassandra-cassandra-openebs-node-1 Bound pvc-059bf24b-3546-43f3-aa01-3a6bea640ffd 20Gi RWO openebs-device 19m
    13. var-lib-cassandra-cassandra-openebs-node-2 Bound pvc-82367756-7a19-4f7f-9e35-65e7696f3b86 20Gi RWO openebs-device 18m
  • Login to one of the Cassandra pod to verify the Cassandra cluster health status using the following command.

    1. $ kubectl exec -it cassandra-openebs-node-0 bash -n cassandra
    2. cassandra@cassandra-openebs-node-0:/$ nodetool status
    3. Datacenter: datacenter1
    4. =======================
    5. Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack
    6. UN 192.168.30.24 94.21 KiB 256 63.0% 73c54856-f045-48db-b0db-e6a751d005f8 rack1
    7. UN 192.168.93.31 75.12 KiB 256 65.3% d48c61b7-551b-4805-b8cc-b915d039f298 rack1
    8. UN 192.168.56.80 75 KiB 256 71.7% 91fc4107-e447-4605-8cbf-3916f9fd8abf rack1
  • Create a Test Keyspace with Tables. Login to one of the Cassandra pod and run the following commands from a cassandra pod.

    1. cassandra@cassandra-openebs-node-0:/$ cqlsh <svc-name>.<namespace>.svc.cluster.local

    Example command:

    1. cassandra@cassandra-openebs-node-0:/$ cqlsh cassandra-openebs-svc.cassandra.svc.cluster.local
    2. Connected to cassandra-openebs at cassandra-openebs-svc.cassandra.svc.cluster.local:9042.
    3. [cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]
    4. Use HELP for help.
    5. cqlsh>
  • Creating a Keyspace. Now, let’s create a Keyspace and add a table with some entries into it.

    1. cqlsh> create keyspace dev
    2. ... with replication = {'class':'SimpleStrategy','replication_factor':1};
  • Creating Data Objects

    1. cqlsh> use dev;
    2. cqlsh:dev> create table emp (empid int primary key,
    3. ... emp_first varchar, emp_last varchar, emp_dept varchar);
  • Inserting and Querying Data

    1. $ cqlsh:dev> insert into emp (empid, emp_first, emp_last, emp_dept)
    2. ... values (1,'fred','smith','eng');
    3. $ cqlsh:dev> select * from emp;
    4. empid | emp_dept | emp_first | emp_last
    5. -------+----------+-----------+----------
    6. 1 | eng | fred | smith
    7. (1 rows)
  • Updating a data

    1. $ cqlsh:dev> update emp set emp_dept = 'fin' where empid = 1;
    2. $ cqlsh:dev> select * from emp;
    3. empid | emp_dept | emp_first | emp_last
    4. -------+----------+-----------+----------
    5. 1 | fin | fred | smith
    6. (1 rows)
    7. cqlsh:dev> exit

Testing Cassandra Performance on OpenEBS

  • Login to one of the cassandra pod and run the following sample loadgen command to write and read some entry to and from the database.

    1. $ kubectl exec -it cassandra-openebs-node-0 bash -n cassandra
  • Get the database health status

    1. $ nodetool status
    2. Datacenter: datacenter1
    3. =======================
    4. Status=Up/Down
    5. |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack
    6. UN 192.168.52.94 135.39 MiB 256 32.6% 68206664-b1e7-4e73-9677-14119536e42d rack1
    7. UN 192.168.7.79 189.98 MiB 256 36.3% 5f6176f5-c47f-4d12-bd16-c9427baf68a0 rack1
    8. UN 192.168.70.87 127.46 MiB 256 31.2% da31ba66-42dd-4c85-a212-a0cb828bbefb rack1
  • Go to the directory where the binary is located.

    1. cassandra@cassandra-openebs-node-0:/$ cd /opt/cassandra/tools/bin
  • Run Write load

    1. cassandra@cassandra-openebs-node-0:/opt/cassandra/tools/bin$ ./cassandra-stress write n=1000000 -rate threads=50 -node 192.168.52.94
  • Run Read Load

    1. cassandra@cassandra-openebs-node-0:/opt/cassandra/tools/bin$ ./cassandra-stress read n=200000 -rate threads=50 -node 192.168.52.94

See Also:

OpenEBS architecture

OpenEBS use cases

Local PV concepts

Understanding NDM