Cluster Deployment

It’s highly recommended to deploy the GreptimeDB cluster in Kubernetes. Before you start, make sure the following prerequisites are in place:

  • Kubernetes (>= 1.18)

    For testing purposes, you can use Kind or Minikube to create a Kubernetes cluster (see the example after this list).

  • Helm v3

  • kubectl
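
If you don’t already have a cluster at hand, the following is a minimal sketch for spinning up a local test cluster with Kind (assuming Kind is installed; the cluster name is arbitrary):

  # Create a single-node Kubernetes cluster for local testing
  kind create cluster --name greptimedb-playground

  # Kind registers the context as kind-<name>; verify the cluster is reachable
  kubectl cluster-info --context kind-greptimedb-playground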

Step 1: Deploy the GreptimeDB Operator

Add the chart repository with the following commands:

  helm repo add greptime https://greptimeteam.github.io/helm-charts/
  helm repo update

Create the greptimedb-admin namespace and deploy the GreptimeDB operator in the namespace:

  kubectl create ns greptimedb-admin
  helm upgrade --install greptimedb-operator greptime/greptimedb-operator -n greptimedb-admin
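
Before moving on, you can confirm the operator is up. This is a generic readiness check rather than part of the chart; the deployment name below assumes it follows the Helm release name used above:

  # Wait for the operator deployment to become available
  kubectl -n greptimedb-admin rollout status deployment/greptimedb-operator

  # List the operator pods
  kubectl get pods -n greptimedb-admin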

Step 2: Deploy the etcd Cluster

The GreptimeDB cluster needs an etcd cluster as the backend storage for the metasrv. We recommend using the Bitnami etcd chart to deploy it:

  1. kubectl create ns metasrv-store
  2. helm upgrade --install etcd oci://registry-1.docker.io/bitnamicharts/etcd \
  3. --set replicaCount=3 \
  4. --set auth.rbac.create=false \
  5. --set auth.rbac.token.enabled=false \
  6. -n metasrv-store
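
Before running the health check below, you can wait for the etcd pods to finish rolling out. The StatefulSet name assumes the Helm release name etcd used above:

  # Wait until all three etcd replicas are up
  kubectl rollout status statefulset/etcd -n metasrv-store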

When the etcd cluster is ready, you can use the following command to check the cluster health:

  kubectl -n metasrv-store \
    exec etcd-0 -- etcdctl \
    --endpoints etcd-0.etcd-headless.metasrv-store:2379,etcd-1.etcd-headless.metasrv-store:2379,etcd-2.etcd-headless.metasrv-store:2379 \
    endpoint status

Step 3: Deploy the Kafka Cluster

We recommend using the strimzi-kafka-operator to deploy a Kafka cluster running in KRaft mode.

Create the kafka namespace and install the strimzi-kafka-operator:

  kubectl create namespace kafka
  kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
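
A simple way to block until the operator is ready (strimzi-cluster-operator is the default deployment name created by the Strimzi install manifests; adjust it if yours differs):

  # Wait until the Strimzi operator deployment reports Available
  kubectl wait deployment/strimzi-cluster-operator \
    --for=condition=Available --timeout=300s -n kafka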

When the operator is ready, use the following spec to create the Kafka cluster:

  apiVersion: kafka.strimzi.io/v1beta2
  kind: KafkaNodePool
  metadata:
    name: dual-role
    labels:
      strimzi.io/cluster: kafka-wal
  spec:
    replicas: 3
    roles:
      - controller
      - broker
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 20Gi
          deleteClaim: false
  ---
  apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    name: kafka-wal
    annotations:
      strimzi.io/node-pools: enabled
      strimzi.io/kraft: enabled
  spec:
    kafka:
      version: 3.7.0
      metadataVersion: 3.7-IV4
      listeners:
        - name: plain
          port: 9092
          type: internal
          tls: false
        - name: tls
          port: 9093
          type: internal
          tls: true
      config:
        offsets.topic.replication.factor: 3
        transaction.state.log.replication.factor: 3
        transaction.state.log.min.isr: 2
        default.replication.factor: 3
        min.insync.replicas: 2
    entityOperator:
      topicOperator: {}
      userOperator: {}

Save the spec as kafka-wal.yaml and apply it to the Kubernetes cluster:

  kubectl apply -f kafka-wal.yaml -n kafka
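
If you prefer to block until the cluster is reported as ready rather than polling, a sketch with kubectl wait (the Strimzi Kafka resource exposes a Ready condition):

  # Wait for the Kafka custom resource to report Ready
  kubectl wait kafka/kafka-wal --for=condition=Ready --timeout=300s -n kafka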

After the Kafka cluster is ready, check the status:

  kubectl get kafka -n kafka

The expected output will be:

  NAME        DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS   READY   METADATA STATE   WARNINGS
  kafka-wal                                                  True    KRaft

Step 4: Deploy the GreptimeDB Cluster with Remote WAL Settings

Create a GreptimeDB cluster with remote WAL settings:

  cat <<EOF | kubectl apply -f -
  apiVersion: greptime.io/v1alpha1
  kind: GreptimeDBCluster
  metadata:
    name: my-cluster
    namespace: default
  spec:
    base:
      main:
        image: greptime/greptimedb:latest
    frontend:
      replicas: 1
    meta:
      replicas: 1
      etcdEndpoints:
        - "etcd.metasrv-store:2379"
    datanode:
      replicas: 3
    remoteWal:
      kafka:
        brokerEndpoints:
          - "kafka-wal-kafka-bootstrap.kafka:9092"
  EOF
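
While the cluster starts up you can watch the pods being created. This is a generic observation command; the exact pod names depend on the cluster name above:

  # Watch the frontend, meta, and datanode pods come up (Ctrl+C to stop watching)
  kubectl get pods -n default --watch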

When the GreptimeDB cluster is ready, you can check the cluster status:

  kubectl get gtc my-cluster -n default

The expected output will be:

  NAME         FRONTEND   DATANODE   META   PHASE     VERSION   AGE
  my-cluster   1          3          1      Running   latest    5m30s

Step 5: Write and Query Data

You can refer to the Overview for more examples. In this tutorial, we connect to the cluster using the MySQL protocol. Use kubectl to forward traffic on port 4002:

  kubectl port-forward svc/my-cluster-frontend 4002:4002 -n default

Open another terminal and connect to the cluster with mysql:

  mysql -h 127.0.0.1 -P 4002

Create a distributed table:

  CREATE TABLE dist_table(
      ts TIMESTAMP DEFAULT current_timestamp(),
      n INT,
      row_id INT,
      PRIMARY KEY(n),
      TIME INDEX (ts)
  )
  PARTITION ON COLUMNS (n) (
      n < 5,
      n >= 5 AND n < 9,
      n >= 9
  )
  engine=mito;
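
To double-check that the table and its partition rules were created as intended, you can inspect the table definition; SHOW CREATE TABLE works over the MySQL protocol:

  SHOW CREATE TABLE dist_table;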

Write the data:

  INSERT INTO dist_table(n, row_id) VALUES (1, 1);
  INSERT INTO dist_table(n, row_id) VALUES (2, 2);
  INSERT INTO dist_table(n, row_id) VALUES (3, 3);
  INSERT INTO dist_table(n, row_id) VALUES (4, 4);
  INSERT INTO dist_table(n, row_id) VALUES (5, 5);
  INSERT INTO dist_table(n, row_id) VALUES (6, 6);
  INSERT INTO dist_table(n, row_id) VALUES (7, 7);
  INSERT INTO dist_table(n, row_id) VALUES (8, 8);
  INSERT INTO dist_table(n, row_id) VALUES (9, 9);
  INSERT INTO dist_table(n, row_id) VALUES (10, 10);
  INSERT INTO dist_table(n, row_id) VALUES (11, 11);
  INSERT INTO dist_table(n, row_id) VALUES (12, 12);

And query the data:

  SELECT * FROM dist_table;