集群执行

Flink programs can run distributed on clusters of many machines. There are two ways to send a program to a cluster for execution:

Command Line Interface

The command line interface lets you submit packaged programs (JARs) to a cluster (or single machine setup).

Please refer to the Command Line Interface documentation for details.

Remote Environment

The remote environment lets you execute Flink Java programs on a cluster directly. The remote environment points to the cluster on which you want to execute the program.

Maven Dependency

If you are developing your program as a Maven project, you have to add the flink-clients module using this dependency:

  1. <dependency>
  2. <groupId>org.apache.flink</groupId>
  3. <artifactId>flink-clients_2.11</artifactId>
  4. <version>1.11.0</version>
  5. </dependency>

Example

The following illustrates the use of the RemoteEnvironment:

  1. public static void main(String[] args) throws Exception {
  2. ExecutionEnvironment env = ExecutionEnvironment
  3. .createRemoteEnvironment("flink-jobmanager", 8081, "/home/user/udfs.jar");
  4. DataSet<String> data = env.readTextFile("hdfs://path/to/file");
  5. data
  6. .filter(new FilterFunction<String>() {
  7. public boolean filter(String value) {
  8. return value.startsWith("http://");
  9. }
  10. })
  11. .writeAsText("hdfs://path/to/result");
  12. env.execute();
  13. }

Note that the program contains custom user code and hence requires a JAR file with the classes of the code attached. The constructor of the remote environment takes the path(s) to the JAR file(s).