Debezium Features
Debezium is a set of source connectors for Apache Kafka Connect, ingesting changes from different databases using change data capture (CDC). Unlike other approaches such as polling or dual writes, log-based CDC as implemented by Debezium:
makes sure that all data changes are captured
produces change events with a very low delay (e.g. ms range for MySQL or Postgres) while avoiding increased CPU usage of frequent polling
requires no changes to your data model (such as “Last Updated” column)
can capture deletes
can capture old record state and further metadata such as transaction id and causing query (depending on the database’s capabilities and configuration)
To learn more about the advantages of log-based CDC, refer to this blog post.
The actual change data capture feature of Debezium is amended with a range of related capabilities and options:
Snapshots: optionally, an initial snapshot of a database’s current state can be taken if a connector gets started up and not all logs still exist (typically the case when the database has been running for some time and has discarded any transaction logs not needed any longer for transaction recovery or replication); different modes exist for snapshotting, refer to the docs of the specific connector you’re using to learn more
Filters: the set of captured schemas, tables and columns can be configured via whitelist/blacklist filters
Masking: the values from specific columns can be masked, e.g. for sensitive data
Monitoring: most connectors can be monitored using JMX
Different ready-to-use message transformations: e.g. for message routing, extraction of new record state (relational connectors, MongoDB) and routing of events from a transactional outbox table
Refer to the connector documentation for a list of all supported databases and detailed information about the features and configuration options of each connector.