Doris

Overview

The Doris Load node supports writing data into the Doris database. This document describes how to set up a Doris Load node and use Flink SQL to load data into Doris.

Supported Version

| Load Node | Doris version |
| --------- | ------------- |
| Doris     | 0.13+         |

Dependencies

In order to set up the Doris Load node, add the dependency below to your project using a build automation tool such as Maven or SBT.

Maven dependency

    <dependency>
        <groupId>org.apache.inlong</groupId>
        <artifactId>sort-connector-doris</artifactId>
        <version>1.3.0-SNAPSHOT</version>
    </dependency>

Prepare

Create a MySQL Extract table

First, create a table cdc_mysql_source in the MySQL database. The commands are as follows:

    [root@fe001 ~]# mysql -u root -h localhost -P 3306 -p123456
    mysql> use test;
    Database changed
    mysql> CREATE TABLE `cdc_mysql_source` (
               `id` int(11) NOT NULL AUTO_INCREMENT,
               `name` varchar(64) DEFAULT NULL,
               `dr` tinyint(3) DEFAULT 0,
               PRIMARY KEY (`id`)
           );
    Query OK, 0 rows affected (0.02 sec)
    mysql> insert into cdc_mysql_source values(1, 'zhangsan', 0),(2, 'lisi', 0),(3, 'wangwu', 0);
    Query OK, 3 rows affected (0.01 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    mysql> select * from cdc_mysql_source;
    +----+----------+----+
    | id | name     | dr |
    +----+----------+----+
    |  1 | zhangsan |  0 |
    |  2 | lisi     |  0 |
    |  3 | wangwu   |  0 |
    +----+----------+----+
    3 rows in set (0.07 sec)
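
The MySQL CDC extract node reads this table's binlog, so binary logging with row-based format must be enabled on the MySQL server. A minimal check from the same session (assuming the account is allowed to read server variables):

    mysql> SHOW VARIABLES LIKE 'log_bin';        -- expected value: ON
    mysql> SHOW VARIABLES LIKE 'binlog_format';  -- expected value: ROW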

Create a Doris Load table

Create a table cdc_doris_sink in the Doris database. The commands are as follows:

    [root@fe001 ~]# mysql -u root -h localhost -P 9030 -p000000
    mysql> use test;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    Database changed
    mysql> CREATE TABLE `cdc_doris_sink` (
               `id` int(11) NOT NULL COMMENT "user id",
               `name` varchar(50) NOT NULL COMMENT "user name",
               `dr` tinyint(4) NULL COMMENT "delete tag"
           ) ENGINE=OLAP
           UNIQUE KEY(`id`)
           COMMENT "OLAP"
           DISTRIBUTED BY HASH(`id`) BUCKETS 1
           PROPERTIES (
               "replication_allocation" = "tag.location.default: 1"
           );
    Query OK, 0 rows affected (0.06 sec)
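
The load example below sets 'sink.enable-delete' = 'true', which requires the batch delete feature on the Doris table. Batch delete is enabled by default for Unique tables since Doris 0.15; on older versions it can be turned on explicitly. A minimal sketch, run in the same Doris session, only needed if batch delete is not already enabled:

    -- enable batch delete on the Unique table created above (not required on Doris 0.15+)
    mysql> ALTER TABLE test.cdc_doris_sink ENABLE FEATURE "BATCH_DELETE";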

How to create a Doris Load Node

Usage for SQL API

    [root@tasknode001 flink-1.13.5]# ./bin/sql-client.sh -l ./opt/connectors/mysql-cdc-inlong/ -l ./opt/connectors/doris/
    Flink SQL> SET 'execution.checkpointing.interval' = '3s';
    [INFO] Session property has been set.
    Flink SQL> SET 'table.dynamic-table-options.enabled' = 'true';
    [INFO] Session property has been set.
    Flink SQL> CREATE TABLE cdc_mysql_source (
    >   id int
    >   ,name VARCHAR
    >   ,dr TINYINT
    >   ,PRIMARY KEY (id) NOT ENFORCED
    > ) WITH (
    >   'connector' = 'mysql-cdc-inlong',
    >   'hostname' = 'localhost',
    >   'port' = '3306',
    >   'username' = 'root',
    >   'password' = '123456',
    >   'database-name' = 'test',
    >   'table-name' = 'cdc_mysql_source'
    > );
    [INFO] Execute statement succeed.
    Flink SQL> CREATE TABLE cdc_doris_sink (
    >   id INT,
    >   name STRING,
    >   dr TINYINT
    > ) WITH (
    >   'connector' = 'doris',
    >   'fenodes' = 'localhost:8030',
    >   'table.identifier' = 'test.cdc_doris_sink',
    >   'username' = 'root',
    >   'password' = '000000',
    >   'sink.properties.format' = 'json',
    >   'sink.properties.strip_outer_array' = 'true',
    >   'sink.enable-delete' = 'true'
    > );
    [INFO] Execute statement succeed.
    -- Delete event synchronization is supported ('sink.enable-delete' = 'true'); it requires the batch delete feature to be enabled on the Doris table
    Flink SQL> insert into cdc_doris_sink select * from cdc_mysql_source /*+ OPTIONS('server-id'='5402') */;
    [INFO] Submitting SQL update statement to the cluster...
    [INFO] SQL update statement has been successfully submitted to the cluster:
    Job ID: 5f89691571d7b3f3ca446589e3d0c3d3
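
After the job is submitted, the initial snapshot and subsequent binlog changes are continuously written to Doris. A quick sanity check is to query the sink table from the Doris MySQL client session opened earlier; once the first flush has completed it should return the three rows inserted into cdc_mysql_source above:

    mysql> select * from test.cdc_doris_sink;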

Usage for InLong Dashboard

TODO: It will be supported in the future.

Usage for InLong Manager Client

TODO: It will be supported in the future.

Doris Load Node Options

| Option | Required | Default | Type | Description |
| ------ | -------- | ------- | ---- | ----------- |
| connector | required | (none) | string | Specify which connector to use; valid value: doris |
| fenodes | required | (none) | string | Doris FE http address; multiple addresses are supported, separated by commas |
| table.identifier | required | (none) | string | Doris table identifier, e.g. db1.tbl1 |
| username | required | (none) | string | Doris username |
| password | required | (none) | string | Doris password |
| doris.request.retries | optional | 3 | int | Number of retries when sending requests to Doris |
| doris.request.connect.timeout.ms | optional | 30000 | int | Connection timeout for sending requests to Doris |
| doris.request.read.timeout.ms | optional | 30000 | int | Read timeout for sending requests to Doris |
| doris.request.query.timeout.s | optional | 3600 | int | Query timeout against Doris; the default is 1 hour, and -1 means no timeout limit |
| doris.request.tablet.size | optional | Integer.MAX_VALUE | int | The number of Doris Tablets corresponding to one partition. The smaller this value, the more partitions are generated, which increases parallelism on the Flink side but also puts more pressure on Doris. |
| doris.batch.size | optional | 1024 | int | The maximum number of rows to read from BE at one time. Increasing this value reduces the number of connections established between Flink and Doris, and thus the extra overhead caused by network latency. |
| doris.exec.mem.limit | optional | 2147483648 | long | Memory limit for a single query, in bytes. The default is 2GB. |
| doris.deserialize.arrow.async | optional | false | boolean | Whether to convert the Arrow format asynchronously to the RowBatch required for flink-doris-connector iteration |
| doris.deserialize.queue.size | optional | 64 | int | Size of the internal processing queue for asynchronous Arrow conversion; takes effect when doris.deserialize.arrow.async is true |
| doris.read.field | optional | (none) | string | List of column names to read from the Doris table, separated by commas |
| doris.filter.query | optional | (none) | string | Filter expression for the query, passed through to Doris; Doris uses this expression to complete source-side data filtering |
| sink.batch.size | optional | 10000 | int | Maximum number of rows in a single write to BE |
| sink.max-retries | optional | 1 | int | Number of retries after a write to BE fails |
| sink.batch.interval | optional | 10s | string | The flush interval, after which the asynchronous thread writes the cached data to BE. The default is 10 seconds; supported time units are ms, s, min, h, and d. Set to 0 to turn off periodic writing. |
| sink.properties.* | optional | (none) | string | Stream Load parameters, e.g. 'sink.properties.column_separator' = ','. Set 'sink.properties.escape_delimiters' = 'true' to use a control character as a separator, so that for example '\x01' is translated to binary 0x01. To import in JSON format, enable both 'sink.properties.format' = 'json' and 'sink.properties.strip_outer_array' = 'true'. |
| sink.enable-delete | optional | true | boolean | Whether to enable deletion. This option requires the Doris table to have the batch delete feature enabled (enabled by default in version 0.15+), and only supports the Unique model. |
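
As an illustration of how these options combine, the sketch below defines a sink table that tunes batching and passes a Stream Load parameter. The table name doris_sink_tuned and the chosen values are hypothetical; the connection settings simply reuse the placeholders from the example above:

    CREATE TABLE doris_sink_tuned (
        id INT,
        name STRING,
        dr TINYINT
    ) WITH (
        'connector' = 'doris',
        'fenodes' = 'localhost:8030',
        'table.identifier' = 'test.cdc_doris_sink',
        'username' = 'root',
        'password' = '000000',
        -- flush after at most 5000 buffered rows or every 5 seconds,
        -- and retry a failed write to BE up to 3 times
        'sink.batch.size' = '5000',
        'sink.batch.interval' = '5s',
        'sink.max-retries' = '3',
        -- Stream Load parameters are passed through via the sink.properties.* prefix
        'sink.properties.column_separator' = ','
    );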

Data Type Mapping

| Doris Type | Flink Type |
| ---------- | ---------- |
| NULL_TYPE  | NULL       |
| BOOLEAN    | BOOLEAN    |
| TINYINT    | TINYINT    |
| SMALLINT   | SMALLINT   |
| INT        | INT        |
| BIGINT     | BIGINT     |
| FLOAT      | FLOAT      |
| DOUBLE     | DOUBLE     |
| DATE       | STRING     |
| DATETIME   | STRING     |
| DECIMAL    | DECIMAL    |
| CHAR       | STRING     |
| LARGEINT   | STRING     |
| VARCHAR    | STRING     |
| DECIMALV2  | DECIMAL    |
| TIME       | DOUBLE     |
| HLL        | Unsupported datatype |
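
For example, when a Doris table has DATE or DATETIME columns, the corresponding Flink columns are declared as STRING. A minimal sketch, using a hypothetical table test.events with a DATETIME column created_at:

    CREATE TABLE doris_events (
        id INT,
        created_at STRING  -- Doris DATETIME is mapped to the Flink STRING type
    ) WITH (
        'connector' = 'doris',
        'fenodes' = 'localhost:8030',
        'table.identifier' = 'test.events',
        'username' = 'root',
        'password' = '000000'
    );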

See flink-doris-connector for more details.