Time Attributes

Flink is able to process streaming data based on different notions of time.

  • Processing time refers to the system time of the machine (also known as “wall-clock time”) that is executing the respective operation.
  • Event time refers to the processing of streaming data based on timestamps which are attached to each row. The timestamps can encode when an event happened.
  • Ingestion time is the time that events enter Flink; internally, it is treated similarly to event time.
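To make the three notions concrete, the following plain-Java sketch (no Flink dependency) models where each timestamp comes from. `TimeNotions`, `Event`, `ingest`, and `processingTime` are hypothetical names used only for illustration. Clocks are passed in as `LongSupplier`s so that the values are predictable.

```java
import java.util.function.LongSupplier;

// Hypothetical model for illustration only; none of these names are Flink API.
final class TimeNotions {

    /** A record that carries its own event timestamp (milliseconds since the epoch). */
    static final class Event {
        final String payload;
        final long eventTime;      // written into the record by whoever produced the event
        long ingestionTime = -1L;  // stamped once, when the record enters the system

        Event(String payload, long eventTime) {
            this.payload = payload;
            this.eventTime = eventTime;
        }
    }

    /** Ingestion time: read the source's clock once and store the value on the record. */
    static Event ingest(Event event, LongSupplier sourceClock) {
        event.ingestionTime = sourceClock.getAsLong();
        return event;
    }

    /** Processing time: the current reading of the clock on the machine running the operator. */
    static long processingTime(LongSupplier operatorClock) {
        return operatorClock.getAsLong();
    }

    public static void main(String[] args) {
        Event click = ingest(new Event("click", 1_000L), () -> 5_000L);
        long procTime = processingTime(() -> 7_000L);
        System.out.println(click.eventTime + " " + click.ingestionTime + " " + procTime);
    }
}
```

Running `main` prints `1000 5000 7000`: the event time travels with the record, the ingestion time is fixed once at the source, and the processing time depends on when the operator happens to run.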

For more information about time handling in Flink, see the introduction to Event Time and Watermarks.

This page explains how time attributes can be defined for time-based operations in Flink’s Table API & SQL.

Introduction to Time Attributes

Time-based operations such as windows in both the Table API and SQL require information about the notion of time and its origin. Therefore, tables can offer logical time attributes for indicating time and accessing corresponding timestamps in table programs.
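As an illustration of why a window needs a time attribute, here is a minimal sketch of how a tumbling window such as `Tumble over 10.minutes` can map a row to a window using the row's timestamp. It assumes windows aligned to epoch 0 with no offset; `TumblingWindows` and its methods are hypothetical and not part of Flink's API.

```java
// Hypothetical helper, not Flink API: maps a timestamp to the tumbling window containing it.
// Assumes epoch-aligned windows without an offset.
final class TumblingWindows {

    /** Inclusive start of the window of length sizeMillis that contains timestamp. */
    static long windowStart(long timestamp, long sizeMillis) {
        // floorMod keeps the result correct for negative timestamps as well
        return timestamp - Math.floorMod(timestamp, sizeMillis);
    }

    /** Exclusive end of the same window. */
    static long windowEnd(long timestamp, long sizeMillis) {
        return windowStart(timestamp, sizeMillis) + sizeMillis;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        long ts = 754_000L; // 12 min 34 s after the epoch
        System.out.println(windowStart(ts, tenMinutes) + " - " + windowEnd(ts, tenMinutes));
        // prints "600000 - 1200000"
    }
}
```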

Time attributes can be part of any table schema. They are defined when creating a table from a DataStream, or they are pre-defined when using a TableSource. Once a time attribute has been defined, it can be referenced as a field and used in time-based operations.

As long as a time attribute is not modified and is simply forwarded from one part of the query to another, it remains a valid time attribute. Time attributes behave like regular timestamps and can be accessed for calculations. If a time attribute is used in a calculation, it is materialized and becomes a regular timestamp. Regular timestamps do not cooperate with Flink’s time and watermarking system and thus cannot be used for time-based operations anymore.

Table programs require that the corresponding time characteristic has been specified for the streaming environment:

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime); // default
// alternatively:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
// env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
```

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // default
// alternatively:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
// env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
```

```python
from pyflink.datastream import StreamExecutionEnvironment, TimeCharacteristic

env = StreamExecutionEnvironment.get_execution_environment()
env.set_stream_time_characteristic(TimeCharacteristic.ProcessingTime)  # default
# alternatively:
# env.set_stream_time_characteristic(TimeCharacteristic.IngestionTime)
# env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
```

Processing time

Processing time allows a table program to produce results based on the time of the local machine. It is the simplest notion of time, but it is not deterministic. It requires neither timestamp extraction nor watermark generation.
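The lack of determinism shows up in a small simulation: the same three records are counted in 10-second processing-time windows during two runs, and the second run processes them slightly later. This is a plain-Java sketch; `ProcessingTimeCounts` is hypothetical, and the arrival times stand in for the wall-clock readings that would be taken while processing.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation, not Flink API: counts records per 10-second tumbling window,
// keyed by the time at which each record happened to be processed.
final class ProcessingTimeCounts {

    static final long WINDOW_MILLIS = 10_000L;

    /** arrivalTimes[i] is the wall-clock time at which record i was processed. */
    static Map<Long, Integer> countPerWindow(long[] arrivalTimes) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : arrivalTimes) {
            long windowStart = t - Math.floorMod(t, WINDOW_MILLIS);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same three records, two runs; the second run is slower, e.g. because the machine is busier.
        long[] run1 = {1_000L, 5_000L, 9_000L};
        long[] run2 = {2_000L, 8_000L, 12_000L};
        System.out.println(countPerWindow(run1)); // {0=3}
        System.out.println(countPerWindow(run2)); // {0=2, 10000=1}
    }
}
```

Although the input data is identical, the two runs report different per-window counts.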

There are two ways to define a processing time attribute.

During DataStream-to-Table Conversion

The processing time attribute is defined with the .proctime property during schema definition. The time attribute may only extend the physical schema by an additional logical field; therefore, it can only be defined at the end of the schema definition.

```java
DataStream<Tuple2<String, String>> stream = ...;

// declare an additional logical field as a processing time attribute
Table table = tEnv.fromDataStream(stream, "Username, Data, UserActionTime.proctime");

WindowedTable windowedTable = table.window(
    Tumble.over("10.minutes").on("UserActionTime").as("userActionWindow"));
```

```scala
val stream: DataStream[(String, String)] = ...

// declare an additional logical field as a processing time attribute
val table = tEnv.fromDataStream(stream, 'Username, 'Data, 'UserActionTime.proctime)

val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```
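The placement rule for `.proctime` can be expressed as a simple check over the comma-separated field expressions of the schema string. The sketch below is hypothetical: Flink performs its own validation with its own error messages, and `ProctimeSchemaCheck` only encodes the rule stated above that a processing time field must come last.

```java
// Hypothetical check, not Flink API: encodes the rule that a ".proctime" field may only
// extend the physical schema, i.e. it must be the last field expression.
final class ProctimeSchemaCheck {

    /** Throws IllegalArgumentException if a ".proctime" field is not the last field. */
    static void validate(String schema) {
        String[] fields = schema.split(",");
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.endsWith(".proctime") && i != fields.length - 1) {
                throw new IllegalArgumentException(
                    "processing time attribute must be the last field, found at position " + i + ": " + field);
            }
        }
    }

    public static void main(String[] args) {
        // accepted: two physical fields followed by the logical processing time field
        validate("Username, Data, UserActionTime.proctime");
        try {
            // rejected: the processing time field precedes the physical fields
            validate("UserActionTime.proctime, Username, Data");
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```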

Using a TableSource

The processing time attribute is defined by a TableSource that implements the DefinedProctimeAttribute interface. The logical time attribute is appended to the physical schema defined by the return type of the TableSource.

```java
// define a table source with a processing time attribute
public class UserActionSource implements StreamTableSource<Row>, DefinedProctimeAttribute {

    @Override
    public TypeInformation<Row> getReturnType() {
        String[] names = new String[] {"Username", "Data"};
        TypeInformation[] types = new TypeInformation[] {Types.STRING(), Types.STRING()};
        return Types.ROW(names, types);
    }

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {
        // create stream
        DataStream<Row> stream = ...;
        return stream;
    }

    @Override
    public String getProctimeAttribute() {
        // field with this name will be appended as a third field
        return "UserActionTime";
    }
}

// register table source
tEnv.registerTableSource("UserActions", new UserActionSource());

WindowedTable windowedTable = tEnv
    .scan("UserActions")
    .window(Tumble.over("10.minutes").on("UserActionTime").as("userActionWindow"));
```

```scala
// define a table source with a processing time attribute
class UserActionSource extends StreamTableSource[Row] with DefinedProctimeAttribute {

  override def getReturnType = {
    val names = Array[String]("Username", "Data")
    val types = Array[TypeInformation[_]](Types.STRING, Types.STRING)
    Types.ROW(names, types)
  }

  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
    // create stream
    val stream = ...
    stream
  }

  override def getProctimeAttribute = {
    // field with this name will be appended as a third field
    "UserActionTime"
  }
}

// register table source
tEnv.registerTableSource("UserActions", new UserActionSource)

val windowedTable = tEnv
  .scan("UserActions")
  .window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```

Event time

Event time allows a table program to produce results based on the time that is contained in every record. This allows for consistent results even in case of out-of-order events or late events. It also ensures replayable results of the table program when reading records from persistent storage.
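The following plain-Java sketch shows the core of this argument: counts per 10-second window keyed by each record's own timestamp come out the same regardless of arrival order. `EventTimeCounts` is hypothetical, and the sketch assumes every record is counted, that is, none is dropped as late. Deciding when a record is late is the role of watermarks, which are discussed below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical simulation, not Flink API: per-window counts keyed by event timestamps.
final class EventTimeCounts {

    static final long WINDOW_MILLIS = 10_000L;

    /** Counts records per tumbling window, using each record's own event timestamp. */
    static Map<Long, Integer> countPerWindow(List<Long> eventTimes) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimes) {
            counts.merge(t - Math.floorMod(t, WINDOW_MILLIS), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> inOrder = Arrays.asList(1_000L, 5_000L, 9_000L, 12_000L);
        List<Long> shuffled = new ArrayList<>(inOrder);
        Collections.shuffle(shuffled, new Random(7)); // simulate out-of-order arrival
        System.out.println(countPerWindow(inOrder)); // {0=3, 10000=1}
        System.out.println(countPerWindow(inOrder).equals(countPerWindow(shuffled))); // true
    }
}
```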

Additionally, event time allows for a unified syntax for table programs in both batch and streaming environments. A time attribute in a streaming environment can be a regular field of a record in a batch environment.

In order to handle out-of-order events and to distinguish between on-time and late events in streaming, Flink needs to extract timestamps from events and track the progress of event time (via so-called watermarks).
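A common way to track event-time progress is a watermark that trails the highest timestamp seen so far by a fixed bound. The sketch below is hypothetical and simplified, not Flink's implementation: `OutOfOrdernessWatermarkSketch` treats an event as late when its timestamp is not greater than the current watermark, which matches reading a watermark as "no more events with a timestamp up to this value are expected".

```java
// Hypothetical, simplified sketch; not Flink's implementation or API.
final class OutOfOrdernessWatermarkSketch {

    private final long maxOutOfOrdernessMillis;
    private long maxTimestamp = Long.MIN_VALUE;

    OutOfOrdernessWatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    /** Event time progress: no more events with a timestamp <= this value are expected. */
    long currentWatermark() {
        // before the first event there is no progress yet; the guard also avoids underflow
        return maxTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : maxTimestamp - maxOutOfOrdernessMillis;
    }

    /** Classifies the event against the current watermark, then advances the maximum timestamp. */
    boolean onEvent(long timestamp) {
        boolean late = timestamp <= currentWatermark();
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        return late;
    }

    public static void main(String[] args) {
        OutOfOrdernessWatermarkSketch wm = new OutOfOrdernessWatermarkSketch(2_000L);
        for (long ts : new long[] {1_000L, 4_000L, 3_000L, 1_500L, 6_000L}) {
            boolean late = wm.onEvent(ts);
            System.out.println(ts + (late ? " late" : " on time") + ", watermark=" + wm.currentWatermark());
        }
    }
}
```

With a bound of 2000 ms and arrivals 1000, 4000, 3000, 1500, 6000, the event at 3000 arrives out of order but is still on time (the watermark is 2000), while the event at 1500 is late.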

An event time attribute can be defined either during DataStream-to-Table conversion or by using a TableSource.

During DataStream-to-Table Conversion

The event time attribute is defined with the .rowtime property during schema definition. Timestamps and watermarks must have been assigned in the DataStream that is converted.

There are two ways of defining the time attribute when converting a DataStream into a Table. Depending on whether the specified .rowtime field name exists in the schema of the DataStream, the timestamp field is either

  • appended as a new field to the schema (if the name does not exist), or
  • used to replace an existing field (if it does).

In either case, the event time attribute field holds the value of the DataStream's event time timestamp.

```java
// Option 1 and Option 2 are alternatives; use one of them.

// Option 1:
// extract timestamp and assign watermarks based on knowledge of the stream
DataStream<Tuple2<String, String>> stream = inputStream.assignTimestampsAndWatermarks(...);

// declare an additional logical field as an event time attribute
Table table = tEnv.fromDataStream(stream, "Username, Data, UserActionTime.rowtime");

// Option 2:
// extract timestamp from first field, and assign watermarks based on knowledge of the stream
DataStream<Tuple3<Long, String, String>> stream = inputStream.assignTimestampsAndWatermarks(...);

// the first field has been used for timestamp extraction, and is no longer necessary
// replace first field with a logical event time attribute
Table table = tEnv.fromDataStream(stream, "UserActionTime.rowtime, Username, Data");

// Usage:
WindowedTable windowedTable = table.window(
    Tumble.over("10.minutes").on("UserActionTime").as("userActionWindow"));
```

```scala
// Option 1 and Option 2 are alternatives; use one of them.

// Option 1:
// extract timestamp and assign watermarks based on knowledge of the stream
val stream: DataStream[(String, String)] = inputStream.assignTimestampsAndWatermarks(...)

// declare an additional logical field as an event time attribute
val table = tEnv.fromDataStream(stream, 'Username, 'Data, 'UserActionTime.rowtime)

// Option 2:
// extract timestamp from first field, and assign watermarks based on knowledge of the stream
val stream: DataStream[(Long, String, String)] = inputStream.assignTimestampsAndWatermarks(...)

// the first field has been used for timestamp extraction, and is no longer necessary
// replace first field with a logical event time attribute
val table = tEnv.fromDataStream(stream, 'UserActionTime.rowtime, 'Username, 'Data)

// Usage:
val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```
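For tuple types such as those in the two options above, the field names in the schema are, as we understand it, matched to the tuple fields by position. Under that assumption, "exists in the schema" amounts to "falls within the tuple's arity", as in this hypothetical helper (`RowtimeResolution` is not part of Flink):

```java
// Hypothetical helper, not Flink API. ASSUMPTION: for tuple-typed streams, the names in the
// schema are mapped to tuple fields by position, as in Option 1 and Option 2 above.
final class RowtimeResolution {

    enum Outcome { REPLACES_EXISTING_FIELD, APPENDED_AS_NEW_FIELD }

    /**
     * @param tupleArity number of physical fields in the stream type (2 for Tuple2, 3 for Tuple3)
     * @param rowtimePos zero-based position of the ".rowtime" field in the schema
     */
    static Outcome resolve(int tupleArity, int rowtimePos) {
        if (rowtimePos < 0 || rowtimePos > tupleArity) {
            throw new IllegalArgumentException("no field at position " + rowtimePos + " to replace or append");
        }
        // inside the tuple: overwrite that physical field; directly after it: extend the schema
        return rowtimePos < tupleArity ? Outcome.REPLACES_EXISTING_FIELD : Outcome.APPENDED_AS_NEW_FIELD;
    }

    public static void main(String[] args) {
        // Option 1: Tuple2 with "Username, Data, UserActionTime.rowtime"
        System.out.println(resolve(2, 2)); // APPENDED_AS_NEW_FIELD
        // Option 2: Tuple3 with "UserActionTime.rowtime, Username, Data"
        System.out.println(resolve(3, 0)); // REPLACES_EXISTING_FIELD
    }
}
```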

Using a TableSource

The event time attribute is defined by a TableSource that implements the DefinedRowtimeAttributes interface. The getRowtimeAttributeDescriptors() method returns a list of RowtimeAttributeDescriptor objects, each describing the final name of a time attribute, a timestamp extractor to derive the values of the attribute, and the watermark strategy associated with the attribute.

Please make sure that the DataStream returned by the getDataStream() method is aligned with the defined time attribute. The timestamps of the DataStream (the ones which are assigned by a TimestampAssigner) are only considered if a StreamRecordTimestamp timestamp extractor is defined. Watermarks of a DataStream are only preserved if a PreserveWatermarks watermark strategy is defined. Otherwise, only the values of the TableSource’s rowtime attribute are relevant.

```java
// define a table source with a rowtime attribute
public class UserActionSource implements StreamTableSource<Row>, DefinedRowtimeAttributes {

    @Override
    public TypeInformation<Row> getReturnType() {
        String[] names = new String[] {"Username", "Data", "UserActionTime"};
        TypeInformation[] types =
            new TypeInformation[] {Types.STRING(), Types.STRING(), Types.LONG()};
        return Types.ROW(names, types);
    }

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {
        // create stream
        // ...
        // assign watermarks based on the "UserActionTime" attribute
        DataStream<Row> stream = inputStream.assignTimestampsAndWatermarks(...);
        return stream;
    }

    @Override
    public List<RowtimeAttributeDescriptor> getRowtimeAttributeDescriptors() {
        // Mark the "UserActionTime" attribute as an event-time attribute.
        // We create one attribute descriptor of "UserActionTime".
        RowtimeAttributeDescriptor rowtimeAttrDescr = new RowtimeAttributeDescriptor(
            "UserActionTime",
            new ExistingField("UserActionTime"),
            new AscendingTimestamps());
        List<RowtimeAttributeDescriptor> listRowtimeAttrDescr = Collections.singletonList(rowtimeAttrDescr);
        return listRowtimeAttrDescr;
    }
}

// register the table source
tEnv.registerTableSource("UserActions", new UserActionSource());

WindowedTable windowedTable = tEnv
    .scan("UserActions")
    .window(Tumble.over("10.minutes").on("UserActionTime").as("userActionWindow"));
```

```scala
// define a table source with a rowtime attribute
class UserActionSource extends StreamTableSource[Row] with DefinedRowtimeAttributes {

  override def getReturnType = {
    val names = Array[String]("Username", "Data", "UserActionTime")
    val types = Array[TypeInformation[_]](Types.STRING, Types.STRING, Types.LONG)
    Types.ROW(names, types)
  }

  override def getDataStream(execEnv: StreamExecutionEnvironment): DataStream[Row] = {
    // create stream
    // ...
    // assign watermarks based on the "UserActionTime" attribute
    val stream = inputStream.assignTimestampsAndWatermarks(...)
    stream
  }

  override def getRowtimeAttributeDescriptors: java.util.List[RowtimeAttributeDescriptor] = {
    // Mark the "UserActionTime" attribute as an event-time attribute.
    // We create one attribute descriptor of "UserActionTime".
    val rowtimeAttrDescr = new RowtimeAttributeDescriptor(
      "UserActionTime",
      new ExistingField("UserActionTime"),
      new AscendingTimestamps)
    val listRowtimeAttrDescr = java.util.Collections.singletonList(rowtimeAttrDescr)
    listRowtimeAttrDescr
  }
}

// register the table source
tEnv.registerTableSource("UserActions", new UserActionSource)

val windowedTable = tEnv
  .scan("UserActions")
  .window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```
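As a rough model of the AscendingTimestamps strategy used above, the sketch below emits a watermark equal to the largest rowtime seen so far minus 1 ms, so a row that repeats the current maximum is not late. `AscendingWatermarkSketch` is a hypothetical plain-Java class written for illustration, not Flink's source. It covers only this one strategy, not the StreamRecordTimestamp or PreserveWatermarks options.

```java
// Hypothetical sketch modeled on the intent of an "ascending timestamps" watermark strategy;
// not Flink's implementation or API.
final class AscendingWatermarkSketch {

    // start at MIN_VALUE + 1 so that maxTimestamp - 1 cannot underflow
    private long maxTimestamp = Long.MIN_VALUE + 1;

    /** Records a row's rowtime value; only increases advance the watermark. */
    void onRow(long rowtime) {
        if (rowtime > maxTimestamp) {
            maxTimestamp = rowtime;
        }
    }

    /** Largest rowtime seen so far minus 1 ms. */
    long currentWatermark() {
        return maxTimestamp - 1;
    }

    /** A row is late if its rowtime is not greater than the current watermark. */
    boolean isLate(long rowtime) {
        return rowtime <= currentWatermark();
    }

    public static void main(String[] args) {
        AscendingWatermarkSketch wm = new AscendingWatermarkSketch();
        for (long t : new long[] {1_000L, 2_000L, 2_000L, 5_000L}) {
            System.out.println(t + (wm.isLate(t) ? " late" : " on time"));
            wm.onRow(t);
        }
    }
}
```

Feeding the rows 1000, 2000, 2000, 5000 in that order reports all four as on time, including the repeated 2000. Any row with a rowtime below the running maximum would be reported as late, which is why this strategy only suits rowtime values that never decrease.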