Debezium Features
Debezium is a set of source connectors for Apache Kafka Connect. Each connector ingests changes from a different database by using that database’s features for change data capture (CDC). Unlike other approaches, such as polling or dual writes, log-based CDC as implemented by Debezium:
Ensures that all data changes are captured.
Produces change events with very low delay, without the extra CPU load that frequent polling incurs. For example, for MySQL or PostgreSQL, the delay is in the millisecond range.
Requires no changes to your data model, such as a “Last Updated” column.
Can capture deletes.
Can capture old record state and additional metadata such as transaction ID and causing query, depending on the database’s capabilities and configuration.
For more details, see the blog post “Five Advantages of Log-Based Change Data Capture”.
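To make the points about deletes and old record state concrete, the following minimal Python sketch summarizes a single change event. It assumes the event value has already been deserialized into a dictionary that follows the Debezium envelope layout (`before`, `after`, `source`, `op`). The sample event, the `summarize_change` helper, and the `txId` source field are illustrative assumptions; the metadata that is actually available depends on the connector and on the database configuration.

```python
from typing import Any, Dict, Optional

# Operation codes carried in the "op" field of the change event envelope.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def summarize_change(envelope: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Summarize one change event (hypothetical helper, not part of Debezium)."""
    source = envelope.get("source") or {}
    return {
        "operation": OP_NAMES.get(envelope.get("op"), "unknown"),
        # Old row state; may be None if the database does not provide it.
        "old_state": envelope.get("before"),
        # New row state; None for deletes.
        "new_state": envelope.get("after"),
        # Assumed metadata field; its name and presence vary by connector.
        "transaction_id": source.get("txId"),
    }


# Hand-written sample of a delete event, for illustration only.
delete_event = {
    "before": {"id": 1004, "email": "anne@example.com"},
    "after": None,
    "source": {"db": "inventory", "table": "customers", "txId": 565},
    "op": "d",
}

print(summarize_change(delete_event))
```

A polling query against the table would simply no longer see this row, whereas the log-based event still carries the deleted row's last known state.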
Debezium connectors capture data changes with a range of related capabilities and options:
Snapshots: optionally, a connector can take an initial snapshot of a database’s current state when it is started and not all logs still exist. Typically, this is the case when the database has been running for some time and has discarded transaction logs that are no longer needed for transaction recovery or replication. There are different modes for performing snapshots. See the documentation for the connector that you are using.
Filters: you can configure the set of captured schemas, tables and columns with include/exclude list filters.
Masking: the values from specific columns can be masked, for example, when they contain sensitive data.
Monitoring: most connectors can be monitored by using JMX.
Ready-to-use message transformations for:
Extraction of new record state for relational connectors and for the MongoDB connector
Routing of events from a transactional outbox table
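As a rough illustration of how the snapshot, filter, masking, and transformation options above fit together, the following Python sketch builds a connector registration payload in the name-plus-config shape that the Kafka Connect REST API accepts. It assumes a PostgreSQL connector on a recent Debezium version; the host, credentials, table names, and column names are made up, and the exact property names and supported values differ between connectors and versions, so check them against the documentation for your connector.

```python
import json
from typing import Any, Dict


def build_connector_config() -> Dict[str, Any]:
    """Return an example registration payload (all values are placeholders)."""
    return {
        "name": "inventory-connector",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": "localhost",
            "database.port": "5432",
            "database.user": "debezium",
            "database.password": "change-me",
            "database.dbname": "inventory",
            "topic.prefix": "inventory",
            # Snapshots: take an initial snapshot when no offsets exist yet.
            "snapshot.mode": "initial",
            # Filters: capture only the listed schemas and tables,
            # and leave out one column entirely.
            "schema.include.list": "public",
            "table.include.list": "public.customers,public.orders",
            "column.exclude.list": "public.customers.notes",
            # Masking: replace the column's value with 12 asterisks.
            "column.mask.with.12.chars": "public.customers.email",
            # Transformation: emit only the new record state.
            "transforms": "unwrap",
            "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        },
    }


# Serialize the payload as JSON, ready to be sent to a Kafka Connect worker.
print(json.dumps(build_connector_config(), indent=2))
```

Building the payload in code rather than by hand makes it easier to keep include lists and masked columns in one reviewed place, but a static JSON file works just as well.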
See the connector documentation for a list of all supported databases and detailed information about the features and configuration options of each connector.