Application Profiling & Debugging

Each standalone JobManager, TaskManager, HistoryServer, and ZooKeeper daemon redirects stdout and stderr to a file with a .out filename suffix and writes internal logging to a file with a .log suffix. Java options configured by the user in env.java.opts, env.java.opts.jobmanager, env.java.opts.taskmanager and env.java.opts.historyserver can likewise define log files by using the script variable FLINK_LOG_PREFIX and enclosing the options in double quotes for late evaluation. Log files using FLINK_LOG_PREFIX are rotated along with the default .out and .log files.

Profiling with Java Flight Recorder

Java Flight Recorder is a profiling and event collection framework built into the Oracle JDK. Java Mission Control is an advanced set of tools that enables efficient and detailed analysis of the extensive data collected by Java Flight Recorder. Example configuration:

  env.java.opts: "-XX:+UnlockCommercialFeatures -XX:+UnlockDiagnosticVMOptions -XX:+FlightRecorder -XX:+DebugNonSafepoints -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=${FLINK_LOG_PREFIX}.jfr"

Profiling with JITWatch

JITWatch is a log analyser and visualizer for the Java HotSpot JIT compiler, used to inspect inlining decisions, hot methods, bytecode, and assembly. Example configuration:

  env.java.opts: "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"

Analyzing Out of Memory Problems

If your Flink application fails with an OutOfMemoryError, it is a good idea to enable heap dumps on out-of-memory errors.

  env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${FLINK_LOG_PREFIX}.hprof"

The heap dump will allow you to analyze potential memory leaks in your user code. If the memory leak appears to be caused by Flink itself, please reach out to the dev mailing list.

Analyzing Memory & Garbage Collection Behaviour

Memory usage and garbage collection can have a profound impact on your application. The effects can range from slight performance degradation to a complete cluster failure if the GC pauses are too long. If you want to better understand the memory and GC behaviour of your application, then you can enable memory logging on the TaskManagers.

  taskmanager.debug.memory.log: true
  taskmanager.debug.memory.log-interval: 10000 # interval in milliseconds (10 s)
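To get a feel for the kind of figures such periodic memory logging exposes, the following is a minimal sketch that samples heap, non-heap, and garbage collector statistics through the JVM's standard java.lang.management beans. The class MemorySnapshot and its output format are hypothetical and only illustrate the idea; this is not Flink's own memory logger, and its log lines will look different.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not part of Flink): summarizes JVM memory and GC counters. */
public class MemorySnapshot {

    /** Builds a one-line summary from the standard JVM management beans. */
    public static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        StringBuilder sb = new StringBuilder();
        // Shift by 20 bits to convert bytes to mebibytes.
        sb.append("heap used=").append(heap.getUsed() >> 20).append("MB")
          .append(" committed=").append(heap.getCommitted() >> 20).append("MB")
          .append(" non-heap used=").append(nonHeap.getUsed() >> 20).append("MB");

        // One entry per collector; the collector names depend on the GC in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(" | ").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime());
        }
        return sb.toString();
    }

    /** Usage: java MemorySnapshot [intervalMs] [samples]; defaults to one sample. */
    public static void main(String[] args) throws InterruptedException {
        long intervalMs = args.length > 0 ? Long.parseLong(args[0]) : 10_000L;
        int samples = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        for (int i = 0; i < samples; i++) {
            if (i > 0) {
                Thread.sleep(intervalMs);
            }
            System.out.println(describe());
        }
    }
}
```

Running it with arguments such as 10000 6 would print six samples ten seconds apart, mirroring the 10 s interval configured above. A steadily growing "heap used" value combined with rising collection counts is a typical hint that a heap dump is worth taking.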

If you are interested in more detailed GC statistics, then you can activate the JVM’s GC logging via:

  env.java.opts: "-Xloggc:${FLINK_LOG_PREFIX}.gc.log -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+PrintPromotionFailure -XX:+PrintGCCause"
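Once a GC log exists, a quick first check is how long application threads were paused in total. The sketch below sums the pause durations reported through -XX:+PrintGCApplicationStoppedTime. It assumes the JDK 8 line format "Total time for which application threads were stopped: N seconds"; the class StoppedTimeSummary is hypothetical, and other JVM versions or logging setups may produce different lines that it will simply skip.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Flink or the JDK): sums stopped time in a GC log. */
public class StoppedTimeSummary {

    // Assumed JDK 8 output of -XX:+PrintGCApplicationStoppedTime.
    // Both '.' and ',' are accepted as the decimal separator.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9]+[.,][0-9]+) seconds");

    /** Returns the sum of all stopped-time entries, in seconds; other lines are ignored. */
    public static double totalStoppedSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1).replace(',', '.'));
            }
        }
        return total;
    }

    /** Usage: java StoppedTimeSummary path/to/gc.log */
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StoppedTimeSummary <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf("application threads stopped for %.3f s in total%n",
            totalStoppedSeconds(lines));
    }
}
```

Comparing the total against the wall-clock time covered by the log gives a rough pause ratio. For a detailed breakdown by collection type, a dedicated GC log analysis tool is the better choice.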