Project Configuration

The guides in this section show you how to configure your project via popular build tools (Maven, Gradle), add the necessary dependencies (e.g. connectors and formats, testing), and cover some advanced configuration topics.

Every Flink application depends on a set of Flink libraries. At a minimum, the application depends on the Flink APIs and, in addition, on certain connector libraries (e.g. Kafka, Cassandra) and on the 3rd party dependencies you need to develop custom functions that process your data.

Getting started

To get started working on your Flink application, use the following commands, scripts, and templates to create a Flink project.

Maven

You can create a project based on an Archetype with the Maven command below or use the provided quickstart bash script.

All Flink Scala APIs are deprecated and will be removed in a future Flink version. You can still build your application in Scala, but you should move to the Java version of the DataStream and/or Table API.

See FLIP-265 Deprecate and remove Scala API support

Maven command

    $ mvn archetype:generate \
      -DarchetypeGroupId=org.apache.flink \
      -DarchetypeArtifactId=flink-quickstart-java \
      -DarchetypeVersion=1.19.0

This allows you to name your newly created project. It will interactively ask you for the groupId, artifactId, and package name.
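If you prefer to skip the interactive prompts, the same values can be passed on the command line using Maven's batch mode. The groupId, artifactId, and package below are placeholders; replace them with your own:

    $ mvn archetype:generate -B \
      -DarchetypeGroupId=org.apache.flink \
      -DarchetypeArtifactId=flink-quickstart-java \
      -DarchetypeVersion=1.19.0 \
      -DgroupId=org.example \
      -DartifactId=my-flink-job \
      -Dpackage=org.example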

Quickstart script

    $ curl https://flink.apache.org/q/quickstart.sh | bash -s 1.19.0

Gradle

You can create an empty project with the following Gradle build script, in which case you have to create the src/main/java and src/main/resources directories manually and start writing some class(es) there, or you can use the provided quickstart bash script to get a completely functional startup project.
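For the manual route, assuming an empty project directory that already contains the build.gradle and settings.gradle files shown below, the source layout could be created like this (the org/quickstart package path is just a placeholder matching the mainClassName used in the build script):

    $ mkdir -p src/main/java/org/quickstart src/main/resources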

Gradle build script

To execute these build configuration scripts, run the gradle command in the directory with these scripts.

build.gradle

    plugins {
        id 'java'
        id 'application'
        // shadow plugin to produce fat JARs
        id 'com.github.johnrengelman.shadow' version '7.1.2'
    }

    // artifact properties
    group = 'org.quickstart'
    version = '0.1-SNAPSHOT'
    mainClassName = 'org.quickstart.DataStreamJob'
    description = """Flink Quickstart Job"""

    ext {
        javaVersion = '1.8'
        flinkVersion = '1.19.0'
        scalaBinaryVersion = '_2.12'
        slf4jVersion = '1.7.36'
        log4jVersion = '2.17.1'
    }

    sourceCompatibility = javaVersion
    targetCompatibility = javaVersion
    tasks.withType(JavaCompile) {
        options.encoding = 'UTF-8'
    }

    applicationDefaultJvmArgs = ["-Dlog4j.configurationFile=log4j2.properties"]

    // declare where to find the dependencies of your project
    repositories {
        mavenCentral()
        maven {
            url "https://repository.apache.org/content/repositories/snapshots"
            mavenContent {
                snapshotsOnly()
            }
        }
    }

    // NOTE: We cannot use "compileOnly" or "shadow" configurations since then we could not run code
    // in the IDE or with "gradle run". We also cannot exclude transitive dependencies from the
    // shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159).
    // -> Explicitly define the libraries we want to be included in the "flinkShadowJar" configuration!
    configurations {
        flinkShadowJar // dependencies which go into the shadowJar

        // always exclude these (also from transitive dependencies) since they are provided by Flink
        flinkShadowJar.exclude group: 'org.apache.flink', module: 'force-shading'
        flinkShadowJar.exclude group: 'com.google.code.findbugs', module: 'jsr305'
        flinkShadowJar.exclude group: 'org.slf4j'
        flinkShadowJar.exclude group: 'org.apache.logging.log4j'
    }

    // declare the dependencies for your production and test code
    dependencies {
        // --------------------------------------------------------------
        // Compile-time dependencies that should NOT be part of the
        // shadow (uber) jar and are provided in the lib folder of Flink
        // --------------------------------------------------------------
        implementation "org.apache.flink:flink-streaming-java:${flinkVersion}"
        implementation "org.apache.flink:flink-clients:${flinkVersion}"

        // --------------------------------------------------------------
        // Dependencies that should be part of the shadow jar, e.g.
        // connectors. These must be in the flinkShadowJar configuration!
        // --------------------------------------------------------------
        //flinkShadowJar "org.apache.flink:flink-connector-kafka:${flinkVersion}"

        runtimeOnly "org.apache.logging.log4j:log4j-slf4j-impl:${log4jVersion}"
        runtimeOnly "org.apache.logging.log4j:log4j-api:${log4jVersion}"
        runtimeOnly "org.apache.logging.log4j:log4j-core:${log4jVersion}"

        // Add test dependencies here.
        // testCompile "junit:junit:4.12"
    }

    // make compileOnly dependencies available for tests:
    sourceSets {
        main.compileClasspath += configurations.flinkShadowJar
        main.runtimeClasspath += configurations.flinkShadowJar

        test.compileClasspath += configurations.flinkShadowJar
        test.runtimeClasspath += configurations.flinkShadowJar

        javadoc.classpath += configurations.flinkShadowJar
    }

    run.classpath = sourceSets.main.runtimeClasspath

    jar {
        manifest {
            attributes 'Built-By': System.getProperty('user.name'),
                    'Build-Jdk': System.getProperty('java.version')
        }
    }

    shadowJar {
        configurations = [project.configurations.flinkShadowJar]
    }

settings.gradle

    rootProject.name = 'quickstart'
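With these two files in place, one way to build and run the job locally is the application plugin's run task, which uses the mainClassName configured above (a minimal sketch; task names come from the java and application plugins declared in the build script):

    $ gradle clean build   # compile the project and run the tests
    $ gradle run           # run org.quickstart.DataStreamJob locally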

Quickstart script

    bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- 1.19.0 _2.12

Which dependencies do you need?

To start working on a Flink job, you usually need the following dependencies:

  • the Flink APIs, in order to develop your job,
  • connectors and formats, in order to integrate your job with external systems, and
  • testing utilities, in order to test your job.

In addition to these, you might want to add the 3rd party dependencies that you need to develop custom functions.

Flink offers two major APIs: the DataStream API and the Table API & SQL. They can be used separately or mixed, depending on your use case:

  APIs you want to use                   Dependency you need to add
  DataStream                             flink-streaming-java
  DataStream with Scala                  flink-streaming-scala_2.12
  Table API                              flink-table-api-java
  Table API with Scala                   flink-table-api-scala_2.12
  Table API + DataStream                 flink-table-api-java-bridge
  Table API + DataStream with Scala      flink-table-api-scala-bridge_2.12

Just include them in your build tool script/descriptor, and you can start developing your job!
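For example, a job that combines the DataStream and Table APIs could declare the corresponding artifacts in the Gradle build script above (a sketch assuming Flink 1.19.0, as in the quickstart; use the equivalent <dependency> entries in a Maven pom.xml):

    dependencies {
        // DataStream API
        implementation "org.apache.flink:flink-streaming-java:${flinkVersion}"
        // Table API + DataStream interoperability
        implementation "org.apache.flink:flink-table-api-java-bridge:${flinkVersion}"
    }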

Running and packaging

If you want to run your job by simply executing the main class, you will need flink-clients in your classpath. For Table API programs, you will also need flink-table-runtime and flink-table-planner-loader.
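In the Gradle quickstart these could be declared roughly as follows (a sketch reusing the flinkVersion property from the build script above; the table modules are only needed for Table API programs):

    dependencies {
        // needed to execute the main class locally (e.g. from the IDE or with `gradle run`)
        implementation "org.apache.flink:flink-clients:${flinkVersion}"
        // only needed for Table API / SQL programs
        runtimeOnly "org.apache.flink:flink-table-runtime:${flinkVersion}"
        runtimeOnly "org.apache.flink:flink-table-planner-loader:${flinkVersion}"
    }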

As a rule of thumb, we suggest packaging the application code and all its required dependencies into one fat/uber JAR. This includes packaging connectors, formats, and third-party dependencies of your job. This rule does not apply to the Java APIs, the DataStream Scala APIs, and the aforementioned runtime modules, which are already provided by Flink itself and should not be included in a job uber JAR. This job JAR can be submitted to an already running Flink cluster, or easily added to a Flink application container image without modifying the distribution.
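With the shadow plugin configured as in the Gradle quickstart above, packaging and submission could look roughly like this (the JAR name is illustrative and depends on your project name and version):

    # build the fat/uber JAR containing the job and its flinkShadowJar dependencies
    $ gradle clean shadowJar
    # submit the resulting JAR to an already running Flink cluster
    $ flink run build/libs/quickstart-0.1-SNAPSHOT-all.jar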

What’s next?

  • To start developing your job, check out DataStream API and Table API & SQL.
  • For more details on how to package your job with your particular build tool, check out the Maven and Gradle guides.
  • For more advanced topics about project configuration, check out the section on advanced topics.