Flink engine usage documentation

This article mainly introduces the installation, use and configuration of the flink engine plugin in Linkis.

If you want to use the Flink engine on your server, you need to ensure that the following environment variables are set correctly and that the user who started the engine has these environment variables.

It is strongly recommended that you check these environment variables for the executing user before executing flink tasks. The specific way is

  1. sudo su -${username}
  2. echo ${JAVA_HOME}
  3. echo ${FLINK_HOME}
Environment variable nameEnvironment variable contentRemarks
JAVA_HOMEJDK installation pathRequired
HADOOP_HOMEHadoop installation pathRequired
HADOOP_CONF_DIRHadoop configuration pathLinkis starts the Flink on yarn mode adopted by the Flink engine, so yarn support is required.
FLINK_HOMEFlink installation pathRequired
FLINK_CONF_DIRFlink configuration pathRequired, such as ${FLINK_HOME}/conf
FLINK_LIB_DIRFlink package pathRequired, ${FLINK_HOME}/lib

Method 1: Download the engine plug-in package directly

Linkis Engine Plugin Download

Method 2: Compile the engine plug-in separately (requires a maven environment)

  1. # compile
  2. cd ${linkis_code_dir}/linkis-engineconn-plugins/flink/
  3. mvn clean install
  4. # The compiled engine plug-in package is located in the following directory
  5. ${linkis_code_dir}/linkis-engineconn-plugins/flink/target/out/

EngineConnPlugin engine plugin installation

Upload the engine plug-in package in 2.1 to the engine directory of the server

  1. ${LINKIS_HOME}/lib/linkis-engineconn-plugins

The directory structure after uploading is as follows

  1. linkis-engineconn-plugins/
  2. ├── big
  3. ├── dist
  4. └── 1.12.2
  5. ├── conf
  6. └── lib
  7. └── plugin
  8. └── 1.12.2

Refresh the engine by restarting the linkis-cg-linkismanager service

  1. cd ${LINKIS_HOME}/sbin
  2. sh linkis-daemon.sh restart cg-linkismanager

You can check whether the last_update_time of this table in the linkis_engine_conn_plugin_bml_resources in the database is the time when the refresh is triggered.

  1. #Login to the linkis database
  2. select * from linkis_cg_engine_conn_plugin_bml_resources;

The Flink engine of Linkis is started by flink on yarn, so the queue used by the user needs to be specified, as shown in the figure below.

yarn

  1. sh ./bin/linkis-cli -engineType flink-1.12.2 \
  2. -codeType sql -code "show tables" \
  3. -submitUser hadoop -proxyUser hadoop

More Linkis-Cli command parameter reference: Linkis-Cli usage

FlinkSQL can support a variety of data sources, such as binlog, kafka, hive, etc. If you want to use these data sources in Flink code, you need to put these connector plugin jar packages into In the lib of the flink engine, and restart the EnginePlugin service of Linkis. If you want to use binlog as a data source in your FlinkSQL, then you need to place flink-connector-mysql-cdc-1.1.1.jar in the lib of the flink engine.

In order to facilitate sampling and debugging, we have added the fql script type in Scriptis, which is specially used to execute FlinkSQL. But you need to ensure that your DSS has been upgraded to DSS1.0.0. After upgrading to DSS1.0.0, you can directly enter Scriptis to create a new fql script for editing and execution.

Writing example of FlinkSQL, taking binlog as an example

  1. CREATE TABLE mysql_binlog (
  2. id INT NOT NULL,
  3. name STRING,
  4. age INT
  5. ) WITH (
  6. 'connector' = 'mysql-cdc',
  7. 'hostname' = 'ip',
  8. 'port' = 'port',
  9. 'username' = 'username',
  10. 'password' = 'password',
  11. 'database-name' = 'dbname',
  12. 'table-name' = 'tablename',
  13. 'debezium.snapshot.locking.mode' = 'none' -- it is recommended to add, otherwise the lock table will be required
  14. );
  15. select * from mysql_binlog where id > 10;

When debugging using the select syntax in Scriptis, the Flink engine will have an automatic cancel mechanism, that is, when the specified time or the number of lines sampled reaches the specified number, the Flink engine will Actively cancel the task and persist the obtained result set, and then the front end will call the interface to open the result set to display the result set on the front end.

OnceEngineConn is used to officially start Flink streaming applications, specifically by calling LinkisManager createEngineConn interface through LinkisManagerClient, and sending the code to the created Flink engine, and then The Flink engine starts to execute, and this method can be called by other systems, such as Streamis. The usage of Client is also very simple, first create a maven project, or introduce the following dependencies in your project.

  1. <dependency>
  2. <groupId>org.apache.linkis</groupId>
  3. <artifactId>linkis-computation-client</artifactId>
  4. <version>${linkis.version}</version>
  5. </dependency>

Then create a scala test file, click execute, and the parsing from a binlog data is completed and inserted into a table in another mysql database. But it should be noted that you must create a resources directory in the maven project, place a linkis.properties file, and specify the gateway address and api version of linkis, such as

  1. wds.linkis.server.version=v1
  2. wds.linkis.gateway.url=http://ip:9001/
  1. object OnceJobTest {
  2. def main(args: Array[String]): Unit = {
  3. val sql = """CREATE TABLE mysql_binlog (
  4. | id INT NOT NULL,
  5. | name STRING,
  6. | age INT
  7. |) WITH (
  8. | 'connector' = 'mysql-cdc',
  9. | 'hostname' = 'ip',
  10. | 'port' = 'port',
  11. | 'username' = '${username}',
  12. | 'password' = '${password}',
  13. | 'database-name' = '${database}',
  14. | 'table-name' = '${tablename}',
  15. | 'debezium.snapshot.locking.mode' = 'none'
  16. |);
  17. |CREATE TABLE sink_table (
  18. | id INT NOT NULL,
  19. | name STRING,
  20. | age INT,
  21. | primary key(id) not enforced
  22. |) WITH (
  23. | 'connector' = 'jdbc',
  24. | 'url' = 'jdbc:mysql://${ip}:port/${database}',
  25. | 'table-name' = '${tablename}',
  26. | 'driver' = 'com.mysql.jdbc.Driver',
  27. | 'username' = '${username}',
  28. | 'password' = '${password}'
  29. |);
  30. |INSERT INTO sink_table SELECT id, name, age FROM mysql_binlog;
  31. |""".stripMargin
  32. val onceJob = SimpleOnceJob.builder().setCreateService("Flink-Test").addLabel(LabelKeyUtils.ENGINE_TYPE_LABEL_KEY, "flink-1.12.2")
  33. .addLabel(LabelKeyUtils.USER_CREATOR_LABEL_KEY, "hadoop-Streamis").addLabel(LabelKeyUtils.ENGINE_CONN_MODE_LABEL_KEY, "once")
  34. .addStartupParam(Configuration.IS_TEST_MODE.key, true)
  35. // .addStartupParam("label." + LabelKeyConstant.CODE_TYPE_KEY, "sql")
  36. .setMaxSubmitTime(300000)
  37. .addExecuteUser("hadoop").addJobContent("runType", "sql").addJobContent("code", sql).addSource("jobName", "OnceJobTest")
  38. .build()
  39. onceJob.submit()
  40. println(onceJob.getId)
  41. onceJob.waitForCompleted()
  42. System.exit(0)
  43. }
  44. }