State & Fault Tolerance
Stateful functions and operators store data across the processing of individual elements/events, making state a critical building block for any type of more elaborate operation.
For example:
- When an application searches for certain event patterns, the state will store the sequence of events encountered so far.
- When aggregating events per minute/hour/day, the state holds the pending aggregates.
- When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.
- When historic data needs to be managed, the state allows efficient access to events that occurred in the past.
Flink needs to be aware of the state in order to make state fault tolerant using checkpoints and to allow savepoints of streaming applications.
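To make the aggregation and checkpoint ideas above concrete, the following plain-Java sketch keeps pending per-minute counts as state, copies them as a checkpoint, and restores a fresh instance from that copy. It is an illustration only: `MinuteCounter` and its methods are hypothetical names, not Flink APIs, and Flink's actual checkpointing mechanism is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration only; none of these names are Flink APIs. */
final class MinuteCounter {
    // The "state": pending event counts per minute, kept across calls to onEvent.
    private Map<Long, Long> pendingCounts = new HashMap<>();

    /** Processes one event and updates the pending aggregate for its minute. */
    void onEvent(long timestampMillis) {
        long minute = timestampMillis / 60_000L;
        pendingCounts.merge(minute, 1L, Long::sum);
    }

    /** Takes a checkpoint: a copy of the state that later updates cannot change. */
    Map<Long, Long> snapshot() {
        return new HashMap<>(pendingCounts);
    }

    /** Recovers from a failure by replacing the state with a checkpointed copy. */
    void restore(Map<Long, Long> checkpoint) {
        pendingCounts = new HashMap<>(checkpoint);
    }

    /** Returns the pending count for a minute, or 0 if no event was seen. */
    long countForMinute(long minute) {
        return pendingCounts.getOrDefault(minute, 0L);
    }

    public static void main(String[] args) {
        MinuteCounter counter = new MinuteCounter();
        counter.onEvent(1_000L);   // falls into minute 0
        counter.onEvent(59_999L);  // falls into minute 0
        Map<Long, Long> checkpoint = counter.snapshot();
        counter.onEvent(61_000L);  // falls into minute 1, after the checkpoint

        // Simulate a failure: a fresh instance starts from the checkpoint.
        MinuteCounter recovered = new MinuteCounter();
        recovered.restore(checkpoint);
        System.out.println(recovered.countForMinute(0)); // prints 2
        System.out.println(recovered.countForMinute(1)); // prints 0
    }
}
```

The event that arrived after the checkpoint is not part of the restored state; restoring returns the state to exactly what it was when the checkpoint was taken.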
Knowledge about the state also allows for rescaling Flink applications, meaning that Flink takes care of redistributing state across parallel instances.
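One way to picture this redistribution is sketched below in plain Java: keys are hashed into a fixed number of groups, and whole groups are assigned to parallel instances in contiguous ranges, so changing the parallelism only changes which instance owns each group. This is a simplified illustration under assumed names (`RescaleSketch`, `keyGroupFor`, `instanceFor`) and an assumed hash function; it should not be read as Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration only; names and hash function are assumptions. */
final class RescaleSketch {
    /** Maps a key to a fixed group; a key's group does not depend on the parallelism. */
    static int keyGroupFor(String key, int numKeyGroups) {
        return Math.floorMod(key.hashCode(), numKeyGroups);
    }

    /** Maps a key group to one of `parallelism` instances in contiguous ranges. */
    static int instanceFor(int keyGroup, int numKeyGroups, int parallelism) {
        return keyGroup * parallelism / numKeyGroups;
    }

    /** Lists which key groups each instance owns at the given parallelism. */
    static Map<Integer, List<Integer>> assignment(int numKeyGroups, int parallelism) {
        Map<Integer, List<Integer>> owned = new TreeMap<>();
        for (int g = 0; g < numKeyGroups; g++) {
            int instance = instanceFor(g, numKeyGroups, parallelism);
            owned.computeIfAbsent(instance, i -> new ArrayList<>()).add(g);
        }
        return owned;
    }

    public static void main(String[] args) {
        System.out.println(assignment(8, 2)); // {0=[0, 1, 2, 3], 1=[4, 5, 6, 7]}
        System.out.println(assignment(8, 4)); // {0=[0, 1], 1=[2, 3], 2=[4, 5], 3=[6, 7]}

        // The state for one key moves together with its group when rescaling.
        int group = keyGroupFor("user-42", 8);
        System.out.println(instanceFor(group, 8, 2) + " -> " + instanceFor(group, 8, 4));
    }
}
```

Because state is moved in units of whole groups, rescaling from 2 to 4 instances in this sketch splits each instance's range in half rather than reshuffling individual keys.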
The queryable state feature of Flink allows you to access state from outside of Flink during runtime.
When working with state, it might also be useful to read about Flink’s state backends. Flink provides different state backends that specify how and where state is stored. State can be located on Java’s heap or off-heap. Depending on your state backend, Flink can also manage the state for the application, meaning Flink deals with the memory management (possibly spilling to disk if necessary) to allow applications to hold very large state. State backends can be configured without changing your application logic.
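The idea that the storage location can change while the application logic stays the same can be sketched in plain Java as follows. The `CounterStore` interface and both implementations are hypothetical and exist only for this illustration; they are not Flink's state backend interfaces, and the off-heap variant is a deliberately tiny fixed-capacity store.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration only; none of these types are Flink APIs. */
final class StateBackendSketch {
    /** "Application logic": counts a word using whichever store it is given. */
    static long countWord(CounterStore store, String word) {
        long updated = store.get(word) + 1;
        store.put(word, updated);
        return updated;
    }

    public static void main(String[] args) {
        CounterStore[] stores = {new HeapStore(), new OffHeapStore(16)};
        for (CounterStore store : stores) {
            countWord(store, "flink");
            System.out.println(countWord(store, "flink")); // prints 2 for both stores
        }
    }
}

/** Hypothetical storage abstraction that hides where values live. */
interface CounterStore {
    long get(String key);
    void put(String key, long value);
}

/** Keeps values as objects in a hash map on the JVM heap. */
final class HeapStore implements CounterStore {
    private final Map<String, Long> values = new HashMap<>();

    public long get(String key) { return values.getOrDefault(key, 0L); }

    public void put(String key, long value) { values.put(key, value); }
}

/** Keeps values in a direct (off-heap) buffer; only the key index is on the heap. */
final class OffHeapStore implements CounterStore {
    private final Map<String, Integer> offsets = new HashMap<>();
    private final ByteBuffer buffer;

    OffHeapStore(int maxKeys) {
        buffer = ByteBuffer.allocateDirect(maxKeys * Long.BYTES);
    }

    public long get(String key) {
        Integer offset = offsets.get(key);
        return offset == null ? 0L : buffer.getLong(offset);
    }

    public void put(String key, long value) {
        Integer offset = offsets.get(key);
        if (offset == null) {
            // Assign the next free 8-byte slot to a key seen for the first time.
            offset = offsets.size() * Long.BYTES;
            if (offset >= buffer.capacity()) {
                throw new IllegalStateException("store is full");
            }
            offsets.put(key, offset);
        }
        buffer.putLong(offset, value);
    }
}
```

`countWord` is identical for both stores; only the object passed in decides whether the counts live on the heap or off-heap.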
Where to go next?
- Working with State: Shows how to use state in a Flink application and explains the different kinds of state.
- The Broadcast State Pattern: Explains how to connect a broadcast stream with a non-broadcast stream and use state to exchange information between them.
- Checkpointing: Describes how to enable and configure checkpointing for fault tolerance.
- Queryable State: Explains how to access state from outside of Flink during runtime.
- State Schema Evolution: Shows how the schema of state types can be evolved.
- Custom Serialization for Managed State: Discusses how to implement custom serializers, especially for schema evolution.