Flink

已配置前置组件

  • SFTP
  • YARN
  • HDFS

配置Flink的前提是YARN、HDFS组件正常配置并保存

tip

部署模式分为 perjobsession 两种模式

参数说明

perjob、session公共参数

参数项默认值说明是否必填
clusterModeperjob、session任务执行模式:perjob, session
flinkJarPath/data/insight_plugin/flink110_libflink lib path(taier本地目录)
remoteFlinkJarPath/data/insight_plugin/flink110_libflink lib 远程路径
flinkPluginRoot/data/insight_pluginflinkx plugins父级本地目录(taier本地目录)
remotePluginRootDir/data/insight_pluginflinkx plugins父级远程目录
pluginLoadModeshipfile插件加载类型
monitorAcceptedAppfalse是否监控yarn accepted状态任务
yarnAccepterTaskNumber3允许yarn accepter任务数量,达到这个值后不允许任务提交
prometheusHostprometheus地址,获取数据同步指标使用
prometheusPort9090prometheus,获取数据同步指标使用
classloader.dtstack-cachetrue是否缓存classloader

session特定参数

参数项默认值说明是否必填
checkSubmitJobGraphInterval60session check间隔(60 * 10s)
flinkSessionSlotCount10flink session允许的最大slot数
sessionRetryNum5session重试次数,达到后会放缓重试的频率
sessionStartAutotrue是否允许Taier启动拉起flink session
flinkSessionNameflink_sessionflink session任务名
jobmanager.heap.mb2048jobmanager内存大小
taskmanager.heap.mb1024taskmanager内存大小
参数项默认值说明是否必填
env.java.opts-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:MaxMetaspaceSize=300m -Dfile.encoding=UTF-8jvm参数
classloader.resolve-orderperjob默认为child-first,session默认为parent-first类加载模式
high-availabilityZOOKEEPERflink ha类型
high-availability.zookeeper.quorumzookeeper地址,当ha选择是zookeeper时必填
high-availability.zookeeper.path.root/flink110ha节点路径
high-availability.storageDirhdfs://ns1/dtInsight/flink110/haha元数据存储路径
jobmanager.archive.fs.dirhdfs://ns1/dtInsight/flink110/completed-jobs 任务结束后任务信息存储路径 是
metrics.reporter.promgateway.classorg.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter用来推送指标类
metrics.reporter.promgateway.hostpromgateway地址
metrics.reporter.promgateway.port9091promgateway端口
metrics.reporter.promgateway.deleteOnShutdowntrue任务结束后是否删除指标 是
metrics.reporter.promgateway.jobName110job 指标任务名
metrics.reporter.promgateway.randomJobNameSuffixtrue是否在任务名上添加随机值
state.backendRocksDB状态后端
state.backend.incrementaltrue是否开启增量
state.checkpoints.dirhdfs://ns1/dtInsight/flink110/checkpointscheckpoint路径地址
state.checkpoints.num-retained11checkpoint保存个数
state.savepoints.dirhdfs://ns1/dtInsight/flink110/savepointssavepoint路径
yarn.application-attempts3重试次数
yarn.application-attempt-failures-validity-interval3600000重试窗口时间大小
akka.ask.timeout60 s
akka.tcp.timeout60 s

更多 Flink 参数项详见官方文档

tip

Flink在自定义参数中添加Flink官方参数来调整任务提交参数信息

文件结构

tip

FlinkJarPath为Flink jar 需要配置taier部署机器上的centos路径

如 flinkJarPath 配置为/opt/dtstack/flink110_lib
/opt/dtstack/flink110_lib 目录包含文件为:

  1. ├── flink-dist_2.11-1.10.0.jar
  2. ├── flink-metrics-prometheus-1.10.0.jar
  3. ├── flink-shaded-hadoop-2-uber-2.7.5-10.0.jar
  4. ├── flink-streaming-java_2.11-1.10.0.jar
  5. ├── flink-table_2.11-1.10.0.jar
  6. ├── flink-table-blink_2.11-1.10.0.jar
  7. └── log4j-1.2.17.jar
tip

FlinkPluginRoot配置的是chunjun的插件包目录 需要配置taier部署机器上的centos路径

如 flinkPluginRoot 配置为 /data/insight_plugin1.12/chunjun-dist
/data/insight_plugin1.12/chunjun-dist 目录包含文件为:

  1. /data/insight_plugin1.12/chunjun-dist
  2. ├── chunjun-core-master.jar
  3. ├── connector
  4. ├── binlog
  5. └── chunjun-connector-binlog-master.jar
  6. ├── cassandra
  7. └── chunjun-connector-cassandra-master.jar
  8. ├── clickhouse
  9. └── chunjun-connector-clickhouse-master.jar
  10. ├── db2
  11. └── chunjun-connector-db2-master.jar
  12. ├── dm
  13. └── chunjun-connector-dm-master.jar
  14. ├── doris
  15. └── chunjun-connector-doris-master.jar
  16. ├── elasticsearch7
  17. └── chunjun-connector-elasticsearch7-master.jar
  18. ├── emqx
  19. └── chunjun-connector-emqx-master.jar
  20. ├── file
  21. └── chunjun-connector-file-master.jar
  22. ├── filesystem
  23. └── chunjun-connector-filesystem-master.jar
  24. ├── ftp
  25. └── chunjun-connector-ftp-master.jar
  26. ├── gbase
  27. └── chunjun-connector-gbase-master.jar
  28. ├── greenplum
  29. └── chunjun-connector-greenplum-master.jar
  30. ├── hbase14
  31. └── chunjun-connector-hbase-1.4-master.jar
  32. ├── hdfs
  33. └── chunjun-connector-hdfs-master.jar
  34. ├── hive
  35. └── chunjun-connector-hive-master.jar
  36. ├── http
  37. └── chunjun-connector-http-master.jar
  38. ├── influxdb
  39. └── chunjun-connector-influxdb-master.jar
  40. ├── kingbase
  41. └── chunjun-connector-kingbase-master.jar
  42. ├── kudu
  43. └── chunjun-connector-kudu-master.jar
  44. ├── mongodb
  45. └── chunjun-connector-mongodb-master.jar
  46. ├── mysql
  47. └── chunjun-connector-mysql-master.jar
  48. ├── mysqld
  49. └── chunjun-connector-mysqld-master.jar
  50. ├── oceanbase
  51. └── chunjun-connector-oceanbase-master.jar
  52. ├── oracle
  53. └── chunjun-connector-oracle-master.jar
  54. ├── oraclelogminer
  55. └── chunjun-connector-oraclelogminer-master.jar
  56. ├── pgwal
  57. └── chunjun-connector-pgwal-master.jar
  58. ├── postgresql
  59. └── chunjun-connector-postgresql-master.jar
  60. ├── redis
  61. └── chunjun-connector-redis-master.jar
  62. ├── saphana
  63. └── chunjun-connector-saphana-master.jar
  64. ├── socket
  65. └── chunjun-connector-socket-master.jar
  66. ├── solr
  67. └── chunjun-connector-solr-master.jar
  68. ├── sqlserver
  69. └── chunjun-connector-sqlserver-master.jar
  70. ├── sqlservercdc
  71. └── chunjun-connector-sqlservercdc-master.jar
  72. ├── starrocks
  73. └── chunjun-connector-starrocks-master.jar
  74. └── stream
  75. └── chunjun-connector-stream-master.jar
  76. ├── ddl
  77. └── mysql
  78. └── chunjun-ddl-mysql-master.jar
  79. ├── dirty-data-collector
  80. ├── log
  81. └── chunjun-dirty-log-master.jar
  82. └── mysql
  83. └── chunjun-dirty-mysql-master.jar
  84. ├── formats
  85. └── pbformat
  86. └── flinkx-protobuf-master.jar
  87. ├── metrics
  88. ├── mysql
  89. └── chunjun-metrics-mysql-master.jar
  90. └── prometheus
  91. └── chunjun-metrics-prometheus-master.jar
  92. └── restore-plugins
  93. └── mysql
  94. └── chunjun-restore-mysql-master.jar

Flink 配置

caution

配置好数据同步任务之后运行,如果一直提示等待运行,可以去monitor.log查看相应日志,确认是否有部分参数配置错误