Build and Deployment

Currently, inlong-sort is based on Flink. Before you run an inlong-sort application, you need to set up a Flink environment.

How to set up the Flink environment

Currently, inlong-sort relies on flink-1.9.3. Choose flink-1.9.3-bin-scala_2.11.tgz when downloading the package.

Once your Flink environment is set up, you can visit the Flink web UI, whose address is stored in /${your_flink_path}/conf/masters.
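As a rough sketch of the lookup described above, the helper below prints the web UI address from conf/masters. The function name flink_web_ui is hypothetical, and the sketch assumes each entry in conf/masters has the form host:port (as in a default standalone setup) and that the archive unpacks into a flink-1.9.3 directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Flink or inlong-sort.
# Assumption: conf/masters holds one host:port entry per line.

# Print the web UI URL(s) found in <flink_home>/conf/masters, one per line.
flink_web_ui() {
  local flink_home="$1"
  # Skip blank lines and comment lines, then prefix each entry with http://
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "${flink_home}/conf/masters" \
    | sed 's|^|http://|'
}

# Typical use on a real installation (shown as comments, not executed here):
#   tar -xzf flink-1.9.3-bin-scala_2.11.tgz
#   ./flink-1.9.3/bin/start-cluster.sh
#   flink_web_ui ./flink-1.9.3
```

With a default standalone configuration this would be expected to print something like http://localhost:8081, but the actual value depends on your conf/masters.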

Prepare installation files

All installation files are in the inlong-sort directory.

Starting an inlong-sort application

Now you can submit a job to Flink with the compiled jar.

How to submit a job to Flink

Example:

  • ./bin/flink run -c org.apache.inlong.sort.flink.Entrance inlong-sort-core-1.0-SNAPSHOT.jar --cluster-id my_application --zookeeper.quorum 127.0.0.1:2181 --zookeeper.path.root /inlong-sort --source.type tubemq --sink.type hive

Notice:

  • -c org.apache.inlong.sort.flink.Entrance is the main class name

  • inlong-sort-core-1.0-SNAPSHOT.jar is the compiled jar
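To make the parts of the command easier to see, here is a small dry-run sketch that assembles the flink run line from the main class, the jar, and the application arguments, and prints it for review instead of executing it. The main class and jar name are taken from the example above; the function name build_submit_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the submit command rather than running it.
MAIN_CLASS="org.apache.inlong.sort.flink.Entrance"   # main class from the example
SORT_JAR="inlong-sort-core-1.0-SNAPSHOT.jar"         # compiled jar from the example

build_submit_cmd() {
  # Fixed part: flink CLI, main class, jar.
  printf './bin/flink run -c %s %s' "$MAIN_CLASS" "$SORT_JAR"
  # Everything passed to this function is forwarded as application arguments.
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf '\n'
}

build_submit_cmd \
  --cluster-id my_application \
  --zookeeper.quorum 127.0.0.1:2181 \
  --zookeeper.path.root /inlong-sort \
  --source.type tubemq \
  --sink.type hive
```

Once the printed line looks right, it can be run from the Flink installation directory.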

Necessary configurations

  • --cluster-id: identifies a specific inlong-sort application
  • --zookeeper.quorum: zk quorum
  • --zookeeper.path.root: zk root path
  • --source.type: source of the application; currently “tubemq” and “pulsar” are supported
  • --sink.type: sink of the application; currently “clickhouse” and “hive” are supported
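The required options listed above can be checked before a job is submitted. The following is a minimal sketch of such a pre-flight check. The function check_sort_args is hypothetical and not part of inlong-sort, and it assumes every option is given as a "--key value" pair.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; inlong-sort itself does not ship this script.
# Assumption: arguments always come in "--key value" pairs.
check_sort_args() {
  local cluster_id="" quorum="" root="" source_type="" sink_type=""
  while [ "$#" -gt 1 ]; do
    case "$1" in
      --cluster-id)          cluster_id="$2" ;;
      --zookeeper.quorum)    quorum="$2" ;;
      --zookeeper.path.root) root="$2" ;;
      --source.type)         source_type="$2" ;;
      --sink.type)           sink_type="$2" ;;
    esac
    shift 2   # unknown keys are skipped together with their value
  done

  if [ -z "$cluster_id" ] || [ -z "$quorum" ] || [ -z "$root" ]; then
    echo "missing --cluster-id, --zookeeper.quorum or --zookeeper.path.root" >&2
    return 1
  fi
  case "$source_type" in
    tubemq|pulsar) ;;
    *) echo "unsupported --source.type: '$source_type'" >&2; return 1 ;;
  esac
  case "$sink_type" in
    clickhouse|hive) ;;
    *) echo "unsupported --sink.type: '$sink_type'" >&2; return 1 ;;
  esac
  return 0
}
```

The function returns 0 when all five options are present and the source and sink types are among the supported values, and 1 otherwise.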

The configurations above are required. The full list of configurations is defined in

~/Inlong/inlong-sort/common/src/main/java/org/apache/inlong/sort/configuration/Constants.java

Example

--cluster-id my_application --zookeeper.quorum 192.127.0.1:2181 --zookeeper.path.root /zk_root --source.type tubemq --sink.type hive

All configurations

name | necessary | default value | description
cluster-id | Y | NA | identifies a specific inlong-sort application
zookeeper.quorum | Y | NA | zk quorum
zookeeper.path.root | Y | “/inlong-sort” | zk root path
source.type | Y | NA | source of the application; currently “tubemq” and “pulsar” are supported
sink.type | Y | NA | sink of the application; currently “clickhouse” and “hive” are supported
source.parallelism | N | 1 | parallelism of source
deserialization.parallelism | N | 1 | parallelism of deserialization
sink.parallelism | N | 1 | parallelism of sink
tubemq.master.address | N | NA | tube master address, used if it is absent from DataFlowInfo on zk
tubemq.session.key | N | “inlong-sort” | session key used when subscribing to tubemq
tubemq.bootstrap.from.max | N | false | whether to consume from max when subscribing to tubemq
tubemq.message.not.found.wait.period | N | 350ms | waiting period when the tube broker returns "message not found"
tubemq.subscribe.retry.timeout | N | 300000 | timeout for subscribing to tube, in milliseconds
zookeeper.client.session-timeout | N | 60000 | session timeout for the ZooKeeper session, in ms
zookeeper.client.connection-timeout | N | 15000 | connection timeout for ZooKeeper, in ms
zookeeper.client.retry-wait | N | 5000 | pause between consecutive retries, in ms
zookeeper.client.max-retry-attempts | N | 3 | number of connection retries before the client gives up
zookeeper.client.acl | N | “open” | defines the ACL (open/creator) to be configured on the ZK node. The value can be set to “creator” if the ZooKeeper server configuration has the “authProvider” property mapped to use SASLAuthenticationProvider and the cluster is configured to run in secure mode (Kerberos)
zookeeper.sasl.disable | N | false | whether to disable zk SASL
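As an illustration of how optional settings might be combined with the required ones, the sketch below turns key=value pairs into "--key value" tokens. It assumes that optional configurations are passed on the command line in the same "--key value" form as the required ones; the function name to_flink_args and the chosen values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts key=value pairs into "--key value" tokens.
# Assumption: optional settings use the same "--key value" form as required ones.
to_flink_args() {
  local pair out=""
  for pair in "$@"; do
    # ${pair%%=*} is the text before the first "=", ${pair#*=} the text after it.
    out="${out:+$out }--${pair%%=*} ${pair#*=}"
  done
  printf '%s\n' "$out"
}

# Required settings followed by a few optional ones (example values only).
to_flink_args \
  cluster-id=my_application \
  zookeeper.quorum=127.0.0.1:2181 \
  zookeeper.path.root=/inlong-sort \
  source.type=tubemq \
  sink.type=hive \
  source.parallelism=2 \
  sink.parallelism=2 \
  zookeeper.client.session-timeout=60000
```

The printed tokens can be appended after the jar name in the flink run command.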