Quick Start

What’s Remote WAL

WAL (Write-Ahead Logging) is a crucial component in GreptimeDB that persistently records every data modification, so that data cached in memory is not lost if the process crashes or restarts. We implement WAL as a module in the Datanode service using raft-engine, a persistent embedded storage engine. When GreptimeDB is deployed in the public cloud, WAL data can be stored persistently on cloud block storage (AWS EBS, GCP Persistent Disk, etc.) to achieve zero RPO (Recovery Point Objective). However, such a deployment has a long RTO (Recovery Time Objective) because the WAL is tightly coupled with the Datanode that owns it. In addition, raft-engine does not support multiple log subscriptions, which makes it difficult to implement region hot standby and region migration.
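To make the write-ahead principle concrete, here is a minimal Python sketch. It is not GreptimeDB code and does not reflect how raft-engine lays out entries. Every change is appended to a log file and flushed with fsync before it is applied to an in-memory table, so the table can be rebuilt by replaying the log after a restart. The class name TinyWalStore and the JSON-lines record format are illustrative assumptions.

```python
import json
import os
import tempfile


class TinyWalStore:
    """Toy key-value store that logs every write before applying it in memory.

    Illustrative only: not how GreptimeDB or raft-engine store WAL entries.
    """

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.memtable: dict[str, float] = {}
        self._replay()  # rebuild in-memory state from the log on startup

    def put(self, key: str, value: float) -> None:
        record = json.dumps({"key": key, "value": value})
        # 1. Append the change to the log and force it to stable storage first...
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(record + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. ...and only then apply it to the volatile in-memory table.
        self.memtable[key] = value

    def _replay(self) -> None:
        # Re-apply every logged change in order; later entries overwrite earlier ones.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                entry = json.loads(line)
                self.memtable[entry["key"]] = entry["value"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    store = TinyWalStore(path)
    store.put("host1/cpu_util", 11.8)
    store.put("host1/cpu_util", 80.1)
    del store  # simulate a restart: the in-memory table is gone
    recovered = TinyWalStore(path)
    print(recovered.memtable)  # {'host1/cpu_util': 80.1}
```

The WAL in this sketch lives on the same machine as the store, which mirrors the coupling described above: whoever needs to recover the data must be able to read that local file.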

To resolve these problems, we designed and implemented a remote WAL, which moves the WAL out of the Datanode and into a remote service. We chose Apache Kafka for that service: it is widely adopted in stream processing, offers excellent distributed fault tolerance, and provides a topic-based subscription mechanism. Starting with release v0.5.0, Apache Kafka is available as an optional storage engine for WAL.
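A toy model of a topic helps picture why a subscription mechanism matters. A topic is an append-only log in which every reader keeps its own offset, so several readers can consume the same entries independently. For example, those readers could be a hot-standby region and a region being migrated to another Datanode. This is a simplified in-memory sketch with hypothetical Topic and Subscriber classes. It is not the Kafka client API, and it ignores partitions, committed offsets, and retention.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Append-only log shared by all readers (a toy stand-in for a Kafka topic)."""
    entries: list[str] = field(default_factory=list)

    def append(self, entry: str) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1  # offset of the new entry


@dataclass
class Subscriber:
    """Each reader tracks its own offset, so reading never affects other readers."""
    topic: Topic
    offset: int = 0

    def poll(self) -> list[str]:
        new = self.topic.entries[self.offset:]
        self.offset = len(self.topic.entries)
        return new


topic = Topic()
replica_a = Subscriber(topic)  # e.g. a hot-standby region tailing the log
replica_b = Subscriber(topic)  # e.g. a migrated region replaying from the start

topic.append("insert row 1")
topic.append("insert row 2")
print(replica_a.poll())  # ['insert row 1', 'insert row 2']
topic.append("insert row 3")
print(replica_b.poll())  # ['insert row 1', 'insert row 2', 'insert row 3']
print(replica_a.poll())  # ['insert row 3']
```

A single-reader log, by contrast, has only one position to track. That is the limitation attributed to raft-engine above.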

Run Standalone GreptimeDB with Remote WAL

You can easily try out remote WAL with Docker by following the steps below. In this quick start, we create a single-broker Kafka cluster in KRaft mode and use it as the remote WAL for a standalone GreptimeDB instance.

Step 1: Create a User-Defined Docker Bridge Network

A user-defined bridge network connects multiple containers and lets them reach each other by container name:

  docker network create greptimedb-remote-wal

Step 2: Start the Kafka Service

Start a single-node Kafka service in KRaft mode:

  docker run \
    --name kafka --rm \
    --network greptimedb-remote-wal \
    -p 9092:9092 \
    -e KAFKA_CFG_NODE_ID="1" \
    -e KAFKA_CFG_PROCESS_ROLES="broker,controller" \
    -e KAFKA_CFG_CONTROLLER_QUORUM_VOTERS="1@kafka:9093" \
    -e KAFKA_CFG_ADVERTISED_LISTENERS="PLAINTEXT://kafka:9092" \
    -e KAFKA_CFG_CONTROLLER_LISTENER_NAMES="CONTROLLER" \
    -e KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP="CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT" \
    -e KAFKA_CFG_LISTENERS="PLAINTEXT://:9092,CONTROLLER://:9093" \
    -e ALLOW_PLAINTEXT_LISTENER="yes" \
    -e KAFKA_BROKER_ID="1" \
    -e KAFKA_CFG_LOG_DIRS="/bitnami/kafka/data" \
    -v "$(pwd)/kafka-data:/bitnami/kafka/data" \
    bitnami/kafka:3.6.0

NOTE

To avoid accidentally exiting the Docker container, you may want to run it in detached mode: add the -d flag to the docker run command.

The Kafka data will be stored in $(pwd)/kafka-data, that is, in a kafka-data directory under the directory where you ran the command.
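Kafka can take a few seconds to start, so before moving on you may want to confirm that its port is reachable. The following is a minimal sketch that assumes Python 3 on the host. The hypothetical wait_for_port helper only checks that the port published by -p 9092:9092 accepts TCP connections. It does not speak the Kafka protocol, so it cannot tell whether the broker is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # nothing listening yet; retry shortly
    return False
```

For example, wait_for_port("127.0.0.1", 9092) should return True once the Kafka container is listening, or False after 30 seconds.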

Step 3: Start GreptimeDB with the Remote WAL Configuration

Start a standalone GreptimeDB instance with the Kafka WAL provider:

  docker run \
    --network greptimedb-remote-wal \
    -p 4000-4003:4000-4003 \
    -v "$(pwd)/greptimedb:/tmp/greptimedb" \
    --name greptimedb --rm \
    -e GREPTIMEDB_STANDALONE__WAL__PROVIDER="kafka" \
    -e GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS="kafka:9092" \
    greptime/greptimedb standalone start \
    --http-addr 0.0.0.0:4000 \
    --rpc-addr 0.0.0.0:4001 \
    --mysql-addr 0.0.0.0:4002 \
    --postgres-addr 0.0.0.0:4003

NOTE

To avoid accidentally exiting the Docker container, you may want to run it in detached mode: add the -d flag to the docker run command.

The following environment variables configure the WAL:

  • GREPTIMEDB_STANDALONE__WAL__PROVIDER: set to kafka to use Kafka as the remote WAL.
  • GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS: the advertised listeners of all brokers in the Kafka cluster. This example uses the Kafka container name, kafka, which the bridge network resolves to the container's IPv4 address.
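Both variable names appear to follow one pattern: the GREPTIMEDB_STANDALONE prefix, then a configuration section (wal) and a key, joined by double underscores. The sketch below assumes that pattern, inferred from these two variables rather than taken from a GreptimeDB specification. It also assumes that list values, such as several broker endpoints, are comma-separated. to_env_vars is a hypothetical helper that turns a nested section into the -e flags used in Step 3.

```python
def to_env_vars(prefix: str, options: dict, path: tuple = ()) -> dict[str, str]:
    """Flatten nested options into PREFIX__SECTION__KEY environment variables.

    Assumed convention (inferred, not authoritative): levels joined by "__",
    names upper-cased, list values joined with commas.
    """
    env: dict[str, str] = {}
    for key, value in options.items():
        new_path = path + (key,)
        if isinstance(value, dict):
            # Recurse into nested sections, e.g. {"wal": {...}}.
            env.update(to_env_vars(prefix, value, new_path))
        else:
            name = "__".join([prefix, *new_path]).upper()
            if isinstance(value, list):
                env[name] = ",".join(str(v) for v in value)
            else:
                env[name] = str(value)
    return env


# The same settings as Step 3, expressed as a nested "wal" section.
standalone_options = {"wal": {"provider": "kafka", "broker_endpoints": ["kafka:9092"]}}
for name, value in to_env_vars("GREPTIMEDB_STANDALONE", standalone_options).items():
    print(f'-e {name}="{value}"')
# -e GREPTIMEDB_STANDALONE__WAL__PROVIDER="kafka"
# -e GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS="kafka:9092"
```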

Step 4: Write and Query Data

GreptimeDB can be accessed in several ways. Here we use the MySQL client.
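Step 3 also publishes an HTTP address on port 4000, which offers another way to connect. The following Python sketch uses only the standard library to post a SQL statement to GreptimeDB's /v1/sql HTTP endpoint. The helper names build_sql_request and run_sql are hypothetical. The sketch assumes the default public database and returns the JSON response as-is, without assuming its exact layout.

```python
import json
import urllib.parse
import urllib.request


def build_sql_request(sql: str, host: str = "127.0.0.1", port: int = 4000,
                      db: str = "public") -> urllib.request.Request:
    """Build a form-encoded POST request for the /v1/sql HTTP endpoint."""
    url = f"http://{host}:{port}/v1/sql?" + urllib.parse.urlencode({"db": db})
    body = urllib.parse.urlencode({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # supplying a body makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_sql(sql: str, **kwargs) -> dict:
    """Send the statement and decode the JSON reply (needs a running GreptimeDB)."""
    with urllib.request.urlopen(build_sql_request(sql, **kwargs), timeout=10) as resp:
        return json.load(resp)
```

After the system_metrics table in the steps that follow has been created and filled, run_sql("SELECT * FROM system_metrics") should return the same rows as the MySQL query.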

  1. Connect to GreptimeDB

    mysql -h 127.0.0.1 -P 4002
  2. Write test data

    • Create the system_metrics table

      CREATE TABLE IF NOT EXISTS system_metrics (
        host STRING,
        idc STRING,
        cpu_util DOUBLE,
        memory_util DOUBLE,
        disk_util DOUBLE,
        ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY(host, idc),
        TIME INDEX(ts)
      );
    • Insert the test data

      INSERT INTO system_metrics
      VALUES
        ("host1", "idc_a", 11.8, 10.3, 10.3, 1667446797450),
        ("host1", "idc_a", 80.1, 70.3, 90.0, 1667446797550),
        ("host1", "idc_b", 50.0, 66.7, 40.6, 1667446797650),
        ("host1", "idc_b", 51.0, 66.5, 39.6, 1667446797750),
        ("host1", "idc_b", 52.0, 66.9, 70.6, 1667446797850),
        ("host1", "idc_b", 53.0, 63.0, 50.6, 1667446797950),
        ("host1", "idc_b", 78.0, 66.7, 20.6, 1667446798050),
        ("host1", "idc_b", 68.0, 63.9, 50.6, 1667446798150),
        ("host1", "idc_b", 90.0, 39.9, 60.6, 1667446798250);
  3. Query the data

    SELECT * FROM system_metrics;
  4. List the Kafka topics:

    # List the Kafka topics.
    docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

    By default, all WAL topic names start with greptimedb_wal_topic, for example:

    $ docker exec kafka /opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
    greptimedb_wal_topic_0
    greptimedb_wal_topic_1
    greptimedb_wal_topic_10
    ...
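kafka-topics.sh appears to print names in lexicographic order, which is why greptimedb_wal_topic_10 follows greptimedb_wal_topic_1 in the listing above. If you capture that output, a small sketch like the following keeps only names with the default greptimedb_wal_topic_ prefix and sorts them by their numeric suffix. The wal_topics helper is hypothetical.

```python
import re

# Matches the default WAL topic names shown above, capturing the numeric suffix.
TOPIC_PATTERN = re.compile(r"^greptimedb_wal_topic_(\d+)$")


def wal_topics(list_output: str) -> list[str]:
    """Return WAL topic names from `kafka-topics.sh --list` output, in numeric order."""
    matches = (TOPIC_PATTERN.match(line.strip()) for line in list_output.splitlines())
    found = [(int(m.group(1)), m.group(0)) for m in matches if m]
    return [name for _, name in sorted(found)]


sample = (
    "greptimedb_wal_topic_0\n"
    "greptimedb_wal_topic_1\n"
    "greptimedb_wal_topic_10\n"
    "greptimedb_wal_topic_2\n"
    "__consumer_offsets\n"
)
print(wal_topics(sample))
# ['greptimedb_wal_topic_0', 'greptimedb_wal_topic_1', 'greptimedb_wal_topic_2', 'greptimedb_wal_topic_10']
```

Topics that do not match the prefix, such as Kafka's internal __consumer_offsets, are ignored.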

Step 5: Cleanup

  • Stop GreptimeDB and Kafka (both containers were started with --rm, so they are also removed when stopped)

    docker stop greptimedb
    docker stop kafka
  • Remove the user-defined bridge network

    docker network rm greptimedb-remote-wal
  • Remove the data

    The data is stored in the working directory where you ran the docker run commands:

    rm -r <working-dir>/greptimedb
    rm -r <working-dir>/kafka-data