New Record State Extraction

Change event structure

Debezium generates data change events that have a complex structure. Each event consists of three parts:

  • Metadata, which includes but is not limited to:

    • The operation that made the change

    • Source information such as the names of the database and table where the change was made

    • Time stamp for when the change was made

    • Optional transaction information

  • Row data before the change

  • Row data after the change

For example, the structure of an UPDATE change event looks like this:

  {
    "op": "u",
    "source": {
      ...
    },
    "ts_ms" : "...",
    "before" : {
      "field1" : "oldvalue1",
      "field2" : "oldvalue2"
    },
    "after" : {
      "field1" : "newvalue1",
      "field2" : "newvalue2"
    }
  }

More details about change event structure are provided in the documentation for each connector.

This complex format provides the most information about changes happening in the system. However, other connectors or other parts of the Kafka ecosystem usually expect the data in a simple format like this:

  {
    "field1" : "newvalue1",
    "field2" : "newvalue2"
  }

To provide the needed Kafka record format for consumers, configure the ExtractNewRecordState SMT.

Behavior

The ExtractNewRecordState SMT extracts the after field from a Debezium change event in a Kafka record and replaces the original change event with only that field, creating a simple Kafka record.
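Conceptually, the unwrapping behavior can be sketched as follows. This is an illustrative Python sketch, not Debezium's actual Java implementation, and the database and table names in the example event are made up:

```python
# Minimal sketch of the unwrapping behavior (illustrative Python,
# not Debezium's actual Java implementation).
def extract_new_record_state(change_event):
    """Replace a Debezium change event with only its 'after' row data."""
    return change_event.get("after")

# Example UPDATE event, reduced to the fields that matter here
# (the "inventory"/"customers" names are hypothetical).
update_event = {
    "op": "u",
    "source": {"db": "inventory", "table": "customers"},
    "before": {"field1": "oldvalue1", "field2": "oldvalue2"},
    "after": {"field1": "newvalue1", "field2": "newvalue2"},
}
```

Applying the sketch to the example event yields only the after row data, matching the simple format shown above.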

You can configure the ExtractNewRecordState SMT for a Debezium connector or for a sink connector that consumes messages emitted by a Debezium connector. The advantage of configuring ExtractNewRecordState for a sink connector is that records stored in Apache Kafka contain whole Debezium change events. The decision to apply the SMT to a source or sink connector depends on your particular use case.

You can configure the transformation to do any of the following:

  • Add metadata from the change event to the simplified Kafka record. The default behavior is that the SMT does not add metadata.

  • Keep Kafka records that contain change events for DELETE operations in the stream. The default behavior is that the SMT drops Kafka records for DELETE operation change events because most consumers cannot yet handle them.

A database DELETE operation causes Debezium to generate two Kafka records:

  • A record that contains "op": "d", the before row data, and some other fields.

  • A tombstone record that has the same key as the deleted row and a value of null. This record is a marker for Apache Kafka. It indicates that log compaction can remove all records that have this key.

Instead of dropping the record that contains the before row data, you can configure the ExtractNewRecordState SMT to do one of the following:

  • Keep the record in the stream and edit it to have a value of null.

  • Keep the record in the stream and edit it to have a value field that contains the key/value pairs that were in the before field with an added "__deleted": "true" entry.

Similarly, instead of dropping the tombstone record, you can configure the ExtractNewRecordState SMT to keep the tombstone record in the stream.
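The three ways of handling the DELETE change event record can be sketched as follows. This is an illustrative Python sketch of the choices described above, not Debezium's Java implementation; the DROP sentinel is an invented helper for the example:

```python
# Illustrative sketch of the DELETE-handling choices (not Debezium's Java code).
DROP = object()  # invented sentinel meaning "remove the record from the stream"

def handle_delete(event, mode="drop"):
    """Return the simplified record value for a DELETE change event."""
    if mode == "drop":
        return DROP            # default: record is dropped from the stream
    if mode == "none":
        return None            # record kept, value is null
    if mode == "rewrite":      # record kept: before-row data plus a deletion marker
        value = dict(event["before"])
        value["__deleted"] = "true"
        return value
    raise ValueError(f"unknown delete.handling.mode: {mode}")
```

For example, a DELETE event whose before field is {"pk": 2, "cola": None} would, in rewrite mode, produce a value containing those pairs plus "__deleted": "true".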

Configuration

Configure the Debezium ExtractNewRecordState SMT in a Kafka Connect source or sink connector by adding the SMT configuration details to your connector’s configuration. To obtain the default behavior, in a .properties file, you would specify something like the following:

  transforms=unwrap,...
  transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState

As for any Kafka Connect connector configuration, you can set transforms to a comma-separated list of SMT aliases in the order in which you want Kafka Connect to apply the SMTs.

The following .properties example sets several ExtractNewRecordState options:

  transforms=unwrap,...
  transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
  transforms.unwrap.drop.tombstones=false
  transforms.unwrap.delete.handling.mode=rewrite
  transforms.unwrap.add.fields=table,lsn

drop.tombstones=false

Keeps tombstone records for DELETE operations in the event stream.

delete.handling.mode=rewrite

For DELETE operations, edits the Kafka record by flattening the value field that was in the change event. The value field directly contains the key/value pairs that were in the before field. The SMT adds __deleted and sets it to true, for example:

  "value": {
    "pk": 2,
    "cola": null,
    "__deleted": "true"
  }

add.fields=table,lsn

Adds change event metadata for the table and lsn fields to the simplified Kafka record.
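The combined effect of the three options in this example can be sketched as follows. This is an illustrative Python sketch under the configuration shown above (drop.tombstones=false, delete.handling.mode=rewrite, add.fields=table,lsn), not Debezium's actual implementation:

```python
# Illustrative sketch of the example configuration's combined effect
# (drop.tombstones=false, delete.handling.mode=rewrite, add.fields=table,lsn).
# Not Debezium's actual Java implementation.
def simplify(event):
    """Flatten a change event the way the configuration above would."""
    if event is None:                  # tombstone: kept as-is (drop.tombstones=false)
        return None
    if event["op"] == "d":             # rewrite: keep before-row data, mark deleted
        value = dict(event["before"])
        value["__deleted"] = "true"
    else:
        value = dict(event["after"])
    value["__table"] = event["source"]["table"]  # metadata from add.fields=table,lsn
    value["__lsn"] = event["source"]["lsn"]
    return value
```

A DELETE event would thus keep its before-row data with a __deleted marker, and every simplified record gains __table and __lsn metadata fields.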

Adding metadata

The ExtractNewRecordState SMT can add original, change event metadata to the simplified Kafka record. For example, you might want the simplified record’s header or value to contain any of the following:

  • The type of operation that made the change

  • The name of the database or table that was changed

  • Connector-specific fields such as the Postgres LSN field

For more information about what is available, see the documentation for each connector.

To add metadata to the simplified Kafka record’s header, specify the add.headers option. To add metadata to the simplified Kafka record’s value, specify the add.fields option. Each of these options takes a comma-separated list of change event field names. Do not specify spaces. When there are duplicate field names, to add metadata for one of those fields, specify the struct as well as the field. For example:

  transforms=unwrap,...
  transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
  transforms.unwrap.add.fields=op,table,lsn,source.ts_ms
  transforms.unwrap.add.headers=db
  transforms.unwrap.delete.handling.mode=rewrite

With that configuration, a simplified Kafka record would contain something like the following:

  {
    ...
    "__op" : "c",
    "__table": "MY_TABLE",
    "__lsn": "123456789",
    "__source_ts_ms" : "123456789",
    ...
  }

Also, simplified Kafka records would have a __db header.

In the simplified Kafka record, the SMT prefixes the metadata field names with a double underscore. When you specify a struct, the SMT also inserts an underscore between the struct name and the field name.

To add metadata to a simplified Kafka record that is for a DELETE operation, you must also configure delete.handling.mode=rewrite.
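The naming rule described above can be sketched as a small helper. This is an illustrative function, not part of Debezium's API:

```python
# Sketch of the naming rule for added metadata fields (illustrative helper,
# not part of Debezium's API).
def metadata_field_name(spec):
    """Map an add.fields/add.headers entry to its name in the simplified record."""
    # Double-underscore prefix; a struct qualifier like "source.ts_ms"
    # becomes "__source_ts_ms" (underscore between struct and field name).
    return "__" + spec.replace(".", "_")
```

For example, op becomes __op and source.ts_ms becomes __source_ts_ms, matching the simplified record shown above.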

Determine original operation [DEPRECATED]

The operation.header option is deprecated and scheduled for removal. Use add.headers instead. If both add.headers and operation.header are specified, operation.header is ignored.

When a Kafka record is flattened, the final result does not show whether the change was an insert, an update, or a first read (deletions can be detected via tombstones or rewrites; see Configuration options).

To solve this problem, Debezium offers an option to propagate the original operation in a header added to the Kafka record. To enable this feature, set operation.header to true.

  transforms=unwrap,...
  transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
  transforms.unwrap.operation.header=true

The possible values are the ones from the op field of the original change event.

Adding source metadata fields [DEPRECATED]

The add.source.fields option is deprecated and scheduled for removal. Use add.fields instead. If both add.fields and add.source.fields are specified, add.source.fields is ignored.

The SMT can optionally add metadata fields from the original change event’s source structure to the final flattened record (prefixed with "__"). This functionality can be used to add things like the table from the change event, or connector-specific fields like the Postgres LSN field. For more information on what’s available in the source structure see the documentation for each connector.

For example, the configuration

  transforms=unwrap,...
  transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
  transforms.unwrap.add.source.fields=table,lsn

will add

  { "__table": "MY_TABLE", "__lsn": "123456789", ... }

to the final flattened record.

For DELETE events, this option is only supported when the delete.handling.mode option is set to rewrite.

Configuration options

The following options can be specified for the ExtractNewRecordState SMT.

drop.tombstones

Default: true

Debezium generates a tombstone record for each DELETE operation. The default behavior is that ExtractNewRecordState removes tombstone records from the stream. To keep tombstone records in the stream, specify drop.tombstones=false.

delete.handling.mode

Default: drop

Debezium generates a change event record for each DELETE operation. The default behavior is that ExtractNewRecordState removes these records from the stream. To keep Kafka records for DELETE operations in the stream, set delete.handling.mode to none or rewrite.

Specify none to keep the change event record in the stream with a value of null.

Specify rewrite to keep the change event record in the stream and edit the record to have a value field that contains the key/value pairs that were in the before field, with an added "__deleted": "true" entry. This is another way to indicate that the record has been deleted.

When you specify rewrite, the updated simplified records for DELETE operations might be all you need to track deleted records. You can consider accepting the default behavior of dropping the tombstone records that the Debezium connector creates.

route.by.field

To use row data to determine the topic to route the record to, set this option to an after field attribute. The SMT routes the record to the topic whose name matches the value of the specified after field attribute. For a DELETE operation, set this option to a before field attribute.

For example, configuring route.by.field=destination routes records to the topic whose name is the value of after.destination. The default behavior is that a Debezium connector sends each change event record to a topic whose name is formed from the name of the database and the name of the table in which the change was made.

If you are configuring the ExtractNewRecordState SMT on a sink connector, setting this option might be useful when the destination topic name dictates the name of the database table that will be updated with the simplified change event record. If the topic name is not correct for your use case, you can configure route.by.field to re-route the event.

add.fields

Set this option to a comma-separated list, with no spaces, of metadata fields to add to the simplified Kafka record’s value. When there are duplicate field names, to add metadata for one of those fields, specify the struct as well as the field, for example source.ts_ms.

When the SMT adds metadata fields to the simplified record’s value, it prefixes each metadata field name with a double underscore. For a struct specification, the SMT also inserts an underscore between the struct name and the field name.

If you specify a field that is not in the change event record, the SMT still adds the field to the record’s value.

add.headers

Set this option to a comma-separated list, with no spaces, of metadata fields to add to the header of the simplified Kafka record. When there are duplicate field names, to add metadata for one of those fields, specify the struct as well as the field, for example source.ts_ms.

When the SMT adds metadata fields to the simplified record’s header, it prefixes each metadata field name with a double underscore. For a struct specification, the SMT also inserts an underscore between the struct name and the field name.

If you specify a field that is not in the change event record, the SMT does not add the field to the header.

operation.header [DEPRECATED]

Default: false

This option is deprecated and scheduled for removal. Use add.headers instead. If both add.headers and operation.header are specified, operation.header is ignored.

When set to true, the SMT adds the event operation (as obtained from the op field of the original record) as a Kafka record header.

add.source.fields [DEPRECATED]

This option is deprecated and scheduled for removal. Use add.fields instead. If both add.fields and add.source.fields are specified, add.source.fields is ignored.

Set this option to a comma-separated list of fields from the change event’s source structure to add as metadata (prefixed with "__") to the flattened record.
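The route.by.field topic-selection behavior described above can be sketched as follows. This is an illustrative Python sketch under assumed event shapes, not Debezium's Java implementation:

```python
# Illustrative sketch of route.by.field topic selection (not Debezium's Java code).
def route_topic(event, route_by_field, default_topic):
    """Pick the destination topic from the after row data (or before, for deletes)."""
    # Fall back to the before-row data when there is no after state (DELETE),
    # and to the connector's default database.table topic when the field is absent.
    row = event.get("after") or event.get("before") or {}
    return row.get(route_by_field, default_topic)
```

For example, with route.by.field=destination, a record whose after.destination is "orders_eu" would be routed to the "orders_eu" topic instead of the default database.table topic (the topic names here are made up for illustration).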