ClickHouse

Overview

The ClickHouse Load Node supports writing data into a ClickHouse database. This document describes how to set up the ClickHouse Load Node and use it to write data into ClickHouse.

Supported Version

| Load Node | Driver | Group Id | Artifact Id | JAR |
|---|---|---|---|---|
| ClickHouse | ClickHouse | ru.yandex.clickhouse | clickhouse-jdbc | Download |

Dependencies

To set up the ClickHouse Load Node, use the dependency information below. It applies both to projects built with a build automation tool (such as Maven or SBT) and to the SQL Client with the Sort Connectors JAR bundles.

Maven dependency

  <dependency>
      <groupId>org.apache.inlong</groupId>
      <artifactId>sort-connector-jdbc</artifactId>
      <version>1.13.0-SNAPSHOT</version>
  </dependency>

How to create a ClickHouse Load Node

Usage for SQL API

  -- MySQL Extract Node
  CREATE TABLE `mysql_extract_table` (
    `id` BIGINT,
    `name` STRING,
    `age` INT,
    PRIMARY KEY (`id`) NOT ENFORCED
  ) WITH (
    'connector' = 'mysql-cdc-inlong',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'inlong',
    'password' = 'inlong',
    'database-name' = 'read',
    'table-name' = 'user'
  );

  -- ClickHouse Load Node
  CREATE TABLE `clickhouse_load_table` (
    `id` BIGINT,
    `name` STRING,
    `age` INT,
    PRIMARY KEY (`id`) NOT ENFORCED
  ) WITH (
    'connector' = 'jdbc-inlong',
    'dialect-impl' = 'org.apache.inlong.sort.jdbc.dialect.ClickHouseDialect',
    'url' = 'jdbc:clickhouse://localhost:8123/demo',
    'username' = 'inlong',
    'password' = 'inlong',
    'table-name' = 'demo.user'
  );

  -- Write data into ClickHouse
  INSERT INTO `clickhouse_load_table`
  SELECT `id`, `name`, `age` FROM `mysql_extract_table`;
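The example above writes into the ClickHouse table demo.user. This sketch assumes that the table is created in ClickHouse before the Flink job starts. A minimal matching DDL could look like the following. The MergeTree engine and the ORDER BY key are illustrative choices, not requirements of the Load Node. The column types follow the data type mapping (BIGINT to Int64, STRING to String, INT to Int32).

```sql
-- Run in ClickHouse (for example via clickhouse-client), not in Flink SQL.
-- Assumption: engine and sort key are illustrative; adapt them to your workload.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE IF NOT EXISTS demo.`user`
(
    id   Int64,   -- Flink BIGINT
    name String,  -- Flink STRING
    age  Int32    -- Flink INT
)
ENGINE = MergeTree()
ORDER BY id;
```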

Usage for InLong Dashboard

When creating a data flow, select ClickHouse as the data stream direction, and click “Add” to configure it.

[Figure: ClickHouse Configuration]

Usage for InLong Manager Client

TODO: Support for the InLong Manager Client will be added in the future.

ClickHouse Load Node Options

| Option | Required | Default | Type | Description |
|---|---|---|---|---|
| connector | required | (none) | String | Specifies which connector to use. For the ClickHouse Load Node this must be 'jdbc-inlong'. |
| url | required | (none) | String | The JDBC database URL. |
| dialect-impl | required | (none) | String | The JDBC dialect implementation class. For ClickHouse, set it to org.apache.inlong.sort.jdbc.dialect.ClickHouseDialect. |
| table-name | required | (none) | String | The name of the JDBC table to write to, in the form database.tableName. |
| driver | optional | (none) | String | The class name of the JDBC driver used to connect to this URL. If not set, it is derived automatically from the URL. |
| username | optional | (none) | String | The JDBC user name. 'username' and 'password' must both be specified if either of them is specified. |
| password | optional | (none) | String | The JDBC password. |
| connection.max-retry-timeout | optional | 60s | Duration | Maximum timeout between retries. The timeout must have second granularity and must not be smaller than 1 second. |
| sink.buffer-flush.max-rows | optional | 100 | Integer | The maximum number of buffered records before a flush. Set to '0' to disable it. |
| sink.buffer-flush.interval | optional | 1s | Duration | The flush interval. When it elapses, an asynchronous thread flushes the buffered data. Set to '0' to disable it. Note that 'sink.buffer-flush.max-rows' can be set to '0' while the flush interval is set, which allows fully asynchronous processing of buffered actions. |
| sink.max-retries | optional | 3 | Integer | The maximum number of retries if writing records to the database fails. |
| sink.parallelism | optional | (none) | Integer | The parallelism of the JDBC sink operator. By default, the framework uses the same parallelism as the upstream chained operator. |
| sink.ignore.changelog | optional | false | Boolean | Ignore all RowKinds and ingest every record as INSERT. |
| inlong.metric.labels | optional | (none) | String | InLong metric labels, in the format groupId={groupId}&streamId={streamId}&nodeId={nodeId}. |
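To show how several of these options combine, here is a sketch of a variant of the Load Node DDL from the SQL API example. It uses only the options listed in the table. The buffer size, interval, parallelism, and the groupId/streamId/nodeId values are hypothetical placeholders, not recommendations. With these settings, buffered rows are flushed when 500 rows have accumulated or when the 5-second interval elapses, whichever happens first.

```sql
-- Variant of the ClickHouse Load Node with tuned sink options.
-- All option values below are hypothetical placeholders.
CREATE TABLE `clickhouse_load_table_tuned` (
  `id` BIGINT,
  `name` STRING,
  `age` INT,
  PRIMARY KEY (`id`) NOT ENFORCED
) WITH (
  'connector' = 'jdbc-inlong',
  'dialect-impl' = 'org.apache.inlong.sort.jdbc.dialect.ClickHouseDialect',
  'url' = 'jdbc:clickhouse://localhost:8123/demo',
  'username' = 'inlong',
  'password' = 'inlong',
  'table-name' = 'demo.user',
  -- flush after 500 buffered rows ...
  'sink.buffer-flush.max-rows' = '500',
  -- ... or every 5 seconds, whichever comes first
  'sink.buffer-flush.interval' = '5s',
  'sink.max-retries' = '3',
  'sink.parallelism' = '2',
  -- hypothetical metric label values
  'inlong.metric.labels' = 'groupId=demo_group&streamId=demo_stream&nodeId=clickhouse_node'
);
```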

Data Type Mapping

| ClickHouse type | Flink SQL type |
|---|---|
| String | CHAR |
| String, IP, UUID | VARCHAR |
| String, EnumL | STRING |
| UInt8 | BOOLEAN |
| FixedString | BYTES |
| Decimal, Int128, Int256, UInt64, UInt128, UInt256 | DECIMAL |
| Int8 | TINYINT |
| Int16, UInt8 | SMALLINT |
| Int32, UInt16, Interval | INTEGER |
| Int64, UInt32 | BIGINT |
| Float32 | FLOAT |
| Date | DATE |
| DateTime | TIME |
| DateTime | TIMESTAMP |
| DateTime | TIMESTAMP_LTZ |
| Int32 | INTERVAL_YEAR_MONTH |
| Int64 | INTERVAL_DAY_TIME |
| Array | ARRAY |
| Map | MAP |
| Not supported | ROW |
| Not supported | MULTISET |
| Not supported | RAW |
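To see the mapping applied to a few more types, the sketch below pairs a ClickHouse DDL with the corresponding Load Node DDL for a hypothetical demo.orders table. The table and column names are assumptions for illustration. ClickHouse DateTime stores whole seconds, so the fractional part of a Flink TIMESTAMP(3) value is not preserved. Decimal is declared with the same precision and scale on both sides.

```sql
-- 1) ClickHouse side: run in ClickHouse (for example via clickhouse-client).
--    demo.orders is a hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS demo.orders
(
    order_id   Int64,           -- Flink BIGINT
    customer   String,          -- Flink STRING
    amount     Decimal(10, 2),  -- Flink DECIMAL(10, 2)
    paid       UInt8,           -- Flink BOOLEAN
    created_at DateTime         -- Flink TIMESTAMP (second precision in ClickHouse)
)
ENGINE = MergeTree()
ORDER BY order_id;

-- 2) Flink side: the matching ClickHouse Load Node.
CREATE TABLE `clickhouse_orders` (
  `order_id` BIGINT,
  `customer` STRING,
  `amount` DECIMAL(10, 2),
  `paid` BOOLEAN,
  `created_at` TIMESTAMP(3),
  PRIMARY KEY (`order_id`) NOT ENFORCED
) WITH (
  'connector' = 'jdbc-inlong',
  'dialect-impl' = 'org.apache.inlong.sort.jdbc.dialect.ClickHouseDialect',
  'url' = 'jdbc:clickhouse://localhost:8123/demo',
  'username' = 'inlong',
  'password' = 'inlong',
  'table-name' = 'demo.orders'
);
```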