How to use Maven to configure your project

This guide will show you how to configure a Flink job project with Maven, an open-source build automation tool developed by the Apache Software Foundation that enables you to build, publish, and deploy projects. You can use it to manage the entire lifecycle of your software project.

Requirements

  • Maven 3.8.6 (recommended) or higher
  • Java 8 (deprecated) or Java 11

Importing the project into your IDE

Once the project folder and files have been created, we recommend that you import this project into your IDE for developing and testing.

IntelliJ IDEA supports Maven projects out-of-the-box. Eclipse offers the m2e plugin to import Maven projects.

Note: The default JVM heap size for Java may be too small for Flink, in which case you have to increase it manually. In Eclipse, choose Run Configurations -> Arguments and enter -Xmx800m in the VM Arguments box. In IntelliJ IDEA, the recommended way to change JVM options is through the Help | Edit Custom VM Options menu. See this article for details.

Note on IntelliJ: To make the applications run within IntelliJ IDEA, it is necessary to tick the Include dependencies with "Provided" scope box in the run configuration. If this option is not available (possibly due to using an older IntelliJ IDEA version), then a workaround is to create a test that calls the application’s main() method.

Building the project

To build and package your project, navigate to your project directory and run the mvn clean package command. The resulting JAR file, which contains your application plus any connectors and libraries you added as dependencies, is written to target/<artifact-id>-<version>.jar.

Note: If you used a different class than DataStreamJob as the application’s main class / entry point, we recommend you change the mainClass setting in the pom.xml file accordingly so that Flink can run the application from the JAR file without additionally specifying the main class.
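
As a rough illustration of the setting mentioned above, the main class is typically declared through the Maven Shade Plugin's ManifestResourceTransformer. The fragment below is only a sketch: it assumes a hypothetical entry point named org.example.MyJob and a pom.xml that already configures the shade plugin, and it belongs inside that plugin's <configuration> element.

```xml
<transformers>
    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
        <!-- Hypothetical class name; replace it with your job's entry point -->
        <mainClass>org.example.MyJob</mainClass>
    </transformer>
</transformers>
```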

Adding dependencies to the project

Open the pom.xml file in your project directory and add the dependency between the <dependencies> tags.

For example, you can add the Kafka connector as a dependency like this:

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka</artifactId>
            <version>1.18.1</version>
        </dependency>
    </dependencies>

Then execute mvn install on the command line.

Projects created from the Java Project Template, the Scala Project Template, or Gradle are configured to automatically include the application dependencies in the application JAR when you run mvn clean package. For projects that are not set up from those templates, we recommend adding the Maven Shade Plugin to build the application JAR with all required dependencies.

Important: All Flink core API dependencies should have their scope set to provided. They are needed to compile against, but they should not be packaged into the project’s resulting application JAR file. If the scope is not set to provided, the best case is that the resulting JAR becomes excessively large because it also contains all Flink core dependencies. The worst case is that the Flink core dependencies added to the application’s JAR file clash with some of your own dependency versions (which is normally avoided through inverted classloading).
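
As a sketch of what this looks like in pom.xml, the entry below declares flink-streaming-java, one of the core API artifacts, with the provided scope. The ${flink.version} property is an assumption; define it in your <properties> section or substitute the version you actually use.

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <!-- Assumed property; define it under <properties> or use a literal version -->
    <version>${flink.version}</version>
    <!-- Available at compile time, but not packaged into the application JAR -->
    <scope>provided</scope>
</dependency>
```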

Application dependencies such as connectors and libraries, on the other hand, must be set to the compile scope (Maven's default) so that they are packaged into the application JAR.

Packaging the application

Depending on your use case, you may need to package your Flink application in different ways before it gets deployed to a Flink environment.

If you want to create a JAR for a Flink job that uses only Flink dependencies and no third-party dependencies (e.g. the filesystem connector with the JSON format), you do not need to create an uber/fat JAR or shade any dependencies.

If you want to create a JAR for a Flink job that uses external dependencies not built into the Flink distribution, you can either add them to the classpath of the distribution or shade them into your uber/fat application JAR.

You can submit the generated uber/fat JAR to a local or remote cluster with:

    bin/flink run -c org.example.MyJob myFatJar.jar

To learn more about how to deploy Flink jobs, check out the deployment guide.

Template for creating an uber/fat JAR with dependencies

To build an application JAR that contains all dependencies required for declared connectors and libraries, you can use the following shade plugin definition:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <excludes>
                                    <exclude>com.google.code.findbugs:jsr305</exclude>
                                </excludes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <!-- Do not copy the signatures in the META-INF folder.
                                    Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <!-- Replace this with the main class of your job -->
                                    <mainClass>my.programs.main.clazz</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

By default, the Maven Shade Plugin includes all dependencies in the “runtime” and “compile” scopes.