Pulsar to ClickHouse Example

This example shows how to use Apache InLong to create a Pulsar -> ClickHouse data synchronization task.

Deployment

Install InLong

Before we begin, we need to install InLong. Two deployment methods are available: Docker deployment and bare metal deployment.

Add Connectors

Download the connectors that match your Flink version, decompress the package, and place sort-connector-jdbc-[version]-SNAPSHOT.jar in the /inlong-sort/connectors/ directory.
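The copy step can be scripted. The following is a minimal sketch that assumes you have already downloaded and decompressed the connector package. The function name `install_jdbc_connector` and both directory arguments are hypothetical: pass the directory you decompressed into and the `inlong-sort/connectors` directory of your own installation.

```shell
#!/bin/bash
# Sketch: copy the JDBC sort connector jar into the InLong Sort connectors directory.
# Hypothetical arguments: $1 = directory where the connector package was decompressed,
#                         $2 = connectors directory of your InLong installation.
install_jdbc_connector() {
  local src_dir="$1" dest_dir="$2" jar
  mkdir -p "$dest_dir"
  for jar in "$src_dir"/sort-connector-jdbc-*-SNAPSHOT.jar; do
    # If the glob matches nothing, bash leaves the pattern unexpanded.
    if [ ! -e "$jar" ]; then
      echo "no sort-connector-jdbc jar found in $src_dir" >&2
      return 1
    fi
    cp "$jar" "$dest_dir"/
  done
}
```

For example: `install_jdbc_connector ~/Downloads/connectors /path/to/inlong-sort/connectors`.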

Install ClickHouse

  docker run -d --rm --net=host --name clickhouse \
    -e CLICKHOUSE_USER=admin \
    -e CLICKHOUSE_PASSWORD=inlong \
    -e CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1 \
    clickhouse/clickhouse-server:22.8
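The container is started in the background, so it may take a moment before it accepts queries. A minimal readiness check, assuming you run it from the host and reach the server through `docker exec` on the container named `clickhouse` with the admin/inlong credentials above, is to poll `SELECT 1` until it answers. The function name `wait_for_clickhouse` and the `CLICKHOUSE_CLIENT` variable are hypothetical.

```shell
#!/bin/bash
# Sketch: poll the ClickHouse container until it answers a trivial query.
# Assumption: container name "clickhouse" and user admin / password inlong.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-docker exec clickhouse clickhouse-client}"

wait_for_clickhouse() {
  local attempts="${1:-30}" interval="${2:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # `SELECT 1` prints 1 once the server accepts connections.
    if [ "$($CLICKHOUSE_CLIENT --user admin --password inlong --query 'SELECT 1' 2>/dev/null)" = "1" ]; then
      echo "ClickHouse is ready"
      return 0
    fi
    sleep "$interval"
  done
  echo "ClickHouse did not become ready after ${attempts} attempts" >&2
  return 1
}
```

Calling `wait_for_clickhouse` with no arguments retries 30 times, 2 seconds apart.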

Cluster Initialization

Once all containers have started successfully, open the InLong dashboard at http://localhost and log in with the following default account:

  1. User: admin
  2. Password: inlong

Register ClickHouse DataNodes

Click [DataNodes] -> [Create] on the page to register ClickHouse DataNodes.

Figure: Create ClickHouse DataNode

Create Task

Create Data Stream Group

Click [Synchronization] → [Create] on the page and enter the Group ID and Stream ID:

Figure: Create Group_Stream

Create Data Source

In the data source section, click [New] → [Pulsar] and configure the source name, Pulsar tenant, namespace, topic and other settings.

Figure: Create Source

Note

  • Create the Pulsar tenant, namespace and topic in advance, for example with pulsar-admin.
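A minimal sketch of that preparation with pulsar-admin, assuming it runs inside the pulsar container from the Pulsar home directory and that the cluster is named `standalone` (the default name for `pulsar standalone`). The function name `create_pulsar_resources` and the `PULSAR_ADMIN`/`CLUSTER` variables are hypothetical. In a standalone cluster the `public` tenant and `public/default` namespace already exist, so for this example only the topic step is strictly needed.

```shell
#!/bin/bash
# Sketch: create a Pulsar tenant, namespace and persistent topic.
# Assumptions: run from the Pulsar home directory; cluster name "standalone".
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"
CLUSTER="${CLUSTER:-standalone}"

create_pulsar_resources() {
  local tenant="$1" namespace="$2" topic="$3"
  # Re-creating an existing tenant or namespace returns an "already exists"
  # error; the function does not stop on it and moves on to the next step.
  $PULSAR_ADMIN tenants create "$tenant" --allowed-clusters "$CLUSTER"
  $PULSAR_ADMIN namespaces create "$tenant/$namespace"
  $PULSAR_ADMIN topics create "persistent://$tenant/$namespace/$topic"
}
```

For this walkthrough: `create_pulsar_resources public default test`.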

Create Data Sink

In the data target section, click [New] → [ClickHouse] and configure the sink name, DB name, table name, and the ClickHouse data node registered earlier.

Figure: Create Sink
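If you want the target database and table to exist before submitting, one way is to create them by hand with clickhouse-client. This is a sketch under assumptions: the columns `id Int32, name String` are chosen to match the `id|name` test messages used later, `MergeTree` is one common engine choice, and the database and table names you pass should be whatever you entered in the sink form. The function name `create_clickhouse_table` and the `CLICKHOUSE_CLIENT` variable are hypothetical.

```shell
#!/bin/bash
# Sketch: create the sink database and table in the "clickhouse" container.
# Assumptions: user admin / password inlong; schema matches the "id|name" test data.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-docker exec clickhouse clickhouse-client}"

create_clickhouse_table() {
  local db="$1" table="$2"
  $CLICKHOUSE_CLIENT --user admin --password inlong \
    --query "CREATE DATABASE IF NOT EXISTS ${db}"
  $CLICKHOUSE_CLIENT --user admin --password inlong \
    --query "CREATE TABLE IF NOT EXISTS ${db}.${table} (id Int32, name String) ENGINE = MergeTree ORDER BY id"
}
```

For example, with hypothetical names: `create_clickhouse_table test_db test_table`.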

Configure Fields

Configure the field mapping in [Source Field] and [Target Field], then click [Submit].

Figure: Create Fields

Approve Data Stream

Click [Approval] -> [MyApproval] -> [Approval] -> [Ok].

Figure: Approve

Go back to the [Synchronization] page and wait until the status shows [success].

Test Data

Send Data

Enter the Pulsar container:

  docker exec -it pulsar /bin/bash

Then produce 1000 messages in total:

  #!/bin/bash
  # Pulsar info
  TENANT="public"
  NAMESPACE="default"
  TOPIC="test"
  # Produce messages in a loop
  for ((i=1; i<=1000; i++))
  do
    # Generate data
    id=$i
    name="name_$i"
    # Build one message in "id|name" format
    message="$id|$name"
    # Produce the message to Pulsar
    bin/pulsar-client produce "persistent://$TENANT/$NAMESPACE/$TOPIC" --messages "$message"
  done
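The loop above starts a new pulsar-client process for every message, which can be slow. A possible alternative is to send all messages in one call, assuming your pulsar-client version splits the `--messages` value on commas by default (check `bin/pulsar-client produce --help`); this only works because the `id|name` payloads contain no commas. The function names `build_messages` and `produce_batch` and the `PULSAR_CLIENT` variable are hypothetical.

```shell
#!/bin/bash
# Sketch: produce N "id|name" messages with a single pulsar-client invocation.
# Assumption: --messages is split on commas (pulsar-client's default separator).
PULSAR_CLIENT="${PULSAR_CLIENT:-bin/pulsar-client}"

# Build "1|name_1,2|name_2,...,N|name_N".
build_messages() {
  local count="$1" out="" i
  for ((i = 1; i <= count; i++)); do
    out+="${i}|name_${i}"
    if (( i < count )); then out+=","; fi
  done
  printf '%s' "$out"
}

produce_batch() {
  local count="$1" topic="${2:-persistent://public/default/test}"
  $PULSAR_CLIENT produce "$topic" --messages "$(build_messages "$count")"
}
```

Inside the Pulsar container, `produce_batch 1000` sends the same data as the loop.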

Verify Data

Then enter the ClickHouse container and view the data written to the target table:

Figure: Source data
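Instead of browsing the rows, you can check that the row count matches the number of messages sent. A minimal sketch, assuming the container name and credentials from the install step; the database and table names are whatever you configured in the sink, and `count_rows`, `verify_row_count` and `CLICKHOUSE_CLIENT` are hypothetical names.

```shell
#!/bin/bash
# Sketch: compare the row count of the sink table with the expected count.
# Assumptions: container "clickhouse", user admin / password inlong.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-docker exec clickhouse clickhouse-client}"

count_rows() {
  local db="$1" table="$2"
  $CLICKHOUSE_CLIENT --user admin --password inlong \
    --query "SELECT count() FROM ${db}.${table}"
}

verify_row_count() {
  local db="$1" table="$2" expected="${3:-1000}" actual
  actual="$(count_rows "$db" "$table")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: ${db}.${table} has ${actual} rows"
  else
    echo "MISMATCH: expected ${expected}, got ${actual}" >&2
    return 1
  fi
}
```

Rows may keep arriving for a short while after producing, so rerun the check if the count is still below 1000.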

FAQ

If ClickHouse fails to write data, check the error in the Flink web UI, then verify the permissions of the ClickHouse user and the engine of the target table.
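Both checks can be run from the command line. This sketch reads the table engine from `system.tables` and lists the current user's grants with `SHOW GRANTS`, assuming the container name and admin/inlong credentials from the install step; `ch_query`, `diagnose_sink_table` and `CLICKHOUSE_CLIENT` are hypothetical names.

```shell
#!/bin/bash
# Sketch: print the sink table's engine and the current user's grants.
# Assumptions: container "clickhouse", user admin / password inlong.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-docker exec clickhouse clickhouse-client}"

ch_query() {
  $CLICKHOUSE_CLIENT --user admin --password inlong --query "$1"
}

diagnose_sink_table() {
  local db="$1" table="$2"
  echo "== engine of ${db}.${table} =="
  ch_query "SELECT engine FROM system.tables WHERE database = '${db}' AND name = '${table}'"
  echo "== grants of the current user =="
  ch_query "SHOW GRANTS"
}
```

An empty engine result means the table does not exist under that name.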