Post-commit Callback
Apache Hudi can post a callback notification after a write commit. This is useful if you need an event notification stream to trigger actions in other services after a Hudi write commit. You can push a write commit callback notification to an HTTP endpoint, a Kafka topic, or a Pulsar topic.
HTTP Endpoints
You can push a commit notification to an HTTP URL and can specify custom values by extending a callback class defined below.
Config | Description | Required | Default |
---|---|---|---|
TURN_CALLBACK_ON | Turn commit callback on/off | optional | false (callbacks off) |
CALLBACK_HTTP_URL | URL of the callback host that callback messages are sent to | required | N/A |
CALLBACK_HTTP_TIMEOUT_IN_SECONDS | Callback timeout in seconds | optional | 3 |
CALLBACK_CLASS_NAME | Fully qualified name of the callback class; must be a subclass of HoodieWriteCommitCallback | optional | org.apache.hudi.callback.impl.HoodieWriteCommitHttpCallback |
CALLBACK_HTTP_API_KEY_VALUE | HTTP callback API key | optional | hudi_write_commit_http_callback |
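As a rough sketch of how these settings might be passed to a writer, the snippet below collects them into an options dictionary in Python. The property keys (`hoodie.write.commit.callback.on` and so on) are assumed to be the string keys behind the constants in the table, and the URL and API key are placeholders. Check the keys against the configuration reference for your Hudi version before relying on them.

```python
# Sketch: HTTP commit-callback options as a plain dict of strings.
# ASSUMPTION: the keys below are the property names behind the constants
# in the table above; the URL and API key are placeholder values.

def http_callback_options(url, api_key=None, timeout_seconds=3):
    """Build writer options that turn on the HTTP commit callback."""
    if not url:
        # CALLBACK_HTTP_URL is the only required setting in the table.
        raise ValueError("CALLBACK_HTTP_URL is required")
    options = {
        "hoodie.write.commit.callback.on": "true",
        "hoodie.write.commit.callback.http.url": url,
        "hoodie.write.commit.callback.http.timeout.seconds": str(timeout_seconds),
    }
    if api_key is not None:
        # Leave the key out to fall back to the default from the table.
        options["hoodie.write.commit.callback.http.api.key"] = api_key
    return options


http_opts = http_callback_options(
    "http://localhost:8080/hudi/commits",  # hypothetical endpoint
    api_key="my-secret-key",               # hypothetical key
)
```

With PySpark, a dictionary like this would typically be merged into the other Hudi write options, for example `df.write.format("hudi").options(**http_opts)`.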
Kafka Endpoints
You can push a commit notification to a Kafka topic so it can be consumed by other real-time systems.
Config | Description | Required | Default |
---|---|---|---|
TOPIC | Kafka topic name to publish timeline activity into. | required | N/A |
PARTITION | It may be desirable to serialize all changes into a single Kafka partition for providing strict ordering. By default, Kafka messages are keyed by table name, which guarantees ordering at the table level, but not globally (or when new partitions are added) | required | N/A |
RETRIES | Number of times to retry a failed produce request | optional | 3 |
ACKS | Kafka acks level; all by default to ensure strong durability | optional | all |
BOOTSTRAP_SERVERS | Bootstrap servers of the Kafka cluster used to publish commit metadata | required | N/A |
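The Kafka settings can be assembled the same way. This is a minimal sketch, not a definitive configuration. The `hoodie.write.commit.callback.kafka.*` keys and the Kafka callback class name are assumptions, because the table lists only constant names. The broker address and topic are placeholders.

```python
# Sketch: Kafka commit-callback options.
# ASSUMPTION: the property keys and the callback class name below are not
# given in the table above; verify them for your Hudi version.

KAFKA_PREFIX = "hoodie.write.commit.callback.kafka."
# Assumed class name; the default callback class is the HTTP one.
KAFKA_CALLBACK_CLASS = "org.apache.hudi.utilities.callback.kafka.HoodieWriteCommitKafkaCallback"


def kafka_callback_options(bootstrap_servers, topic, partition, retries=3, acks="all"):
    """Build writer options that publish commit notifications to Kafka."""
    required = {
        "bootstrap.servers": bootstrap_servers,
        "topic": topic,
        "partition": partition,
    }
    missing = [name for name, value in required.items() if value in (None, "")]
    if missing:
        raise ValueError("missing required Kafka settings: " + ", ".join(missing))

    options = {
        "hoodie.write.commit.callback.on": "true",
        "hoodie.write.commit.callback.class": KAFKA_CALLBACK_CLASS,
        KAFKA_PREFIX + "retries": str(retries),
        KAFKA_PREFIX + "acks": str(acks),
    }
    for name, value in required.items():
        options[KAFKA_PREFIX + name] = str(value)
    return options


# Hypothetical broker and topic; partition 0 keeps all changes in one partition.
kafka_opts = kafka_callback_options("localhost:9092", "hudi-commits", partition=0)
```

Pinning every message to a single partition, as in the example call, trades throughput for the strict ordering described in the PARTITION row.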
Pulsar Endpoints
You can push a commit notification to a Pulsar topic so it can be consumed by other real-time systems.
Config | Description | Required | Default |
---|---|---|---|
hoodie.write.commit.callback.pulsar.broker.service.url | Service URL of the Pulsar cluster used to publish commit metadata. | required | N/A |
hoodie.write.commit.callback.pulsar.topic | Pulsar topic name to publish timeline activity into | required | N/A |
hoodie.write.commit.callback.pulsar.producer.route-mode | Message routing logic for producers on partitioned topics. | optional | RoundRobinPartition |
hoodie.write.commit.callback.pulsar.producer.pending-queue-size | The maximum size of a queue holding pending messages. | optional | 1000 |
hoodie.write.commit.callback.pulsar.producer.pending-total-size | The maximum number of pending messages across partitions. | optional | 50000 |
hoodie.write.commit.callback.pulsar.producer.block-if-queue-full | When the queue is full, the send call blocks instead of throwing an exception. | optional | true |
hoodie.write.commit.callback.pulsar.producer.send-timeout | Timeout for each send to Pulsar. | optional | 30s |
hoodie.write.commit.callback.pulsar.operation-timeout | How long to wait for an operation to complete. | optional | 30s |
hoodie.write.commit.callback.pulsar.connection-timeout | How long to wait for a connection to a broker to be established. | optional | 10s |
hoodie.write.commit.callback.pulsar.request-timeout | How long to wait for a request to complete. | optional | 60s |
hoodie.write.commit.callback.pulsar.keepalive-interval | Keep-alive interval for each client-broker connection. | optional | 30s |
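Since the Pulsar table lists full property keys, the configuration can be sketched by overlaying user overrides on the documented defaults. The helper name, the service URL, the topic, and the Pulsar callback class name are assumptions for illustration. The keys and default values are copied from the table above.

```python
# Sketch: Pulsar commit-callback options built from the documented defaults.
# ASSUMPTION: the callback class name, service URL, and topic are illustrative.

PULSAR_PREFIX = "hoodie.write.commit.callback.pulsar."
# Assumed class name; the default callback class is the HTTP one.
PULSAR_CALLBACK_CLASS = "org.apache.hudi.utilities.callback.pulsar.HoodieWriteCommitPulsarCallback"

# Settings that have a default in the table above.
PULSAR_DEFAULTS = {
    "producer.route-mode": "RoundRobinPartition",
    "producer.pending-queue-size": "1000",
    "producer.pending-total-size": "50000",
    "producer.block-if-queue-full": "true",
    "producer.send-timeout": "30s",
    "operation-timeout": "30s",
    "connection-timeout": "10s",
    "request-timeout": "60s",
    "keepalive-interval": "30s",
}


def pulsar_callback_options(service_url, topic, overrides=None):
    """Build writer options that publish commit notifications to Pulsar."""
    if not service_url or not topic:
        raise ValueError("broker.service.url and topic are required")
    overrides = overrides or {}
    unknown = sorted(set(overrides) - set(PULSAR_DEFAULTS))
    if unknown:
        # Catch misspelled keys early instead of silently ignoring them.
        raise KeyError("unknown Pulsar settings: " + ", ".join(unknown))

    settings = {**PULSAR_DEFAULTS, **overrides}
    settings["broker.service.url"] = service_url
    settings["topic"] = topic

    options = {PULSAR_PREFIX + name: str(value) for name, value in settings.items()}
    options["hoodie.write.commit.callback.on"] = "true"
    options["hoodie.write.commit.callback.class"] = PULSAR_CALLBACK_CLASS
    return options


pulsar_opts = pulsar_callback_options(
    "pulsar://localhost:6650",                 # hypothetical broker URL
    "hudi-commits",                            # hypothetical topic
    overrides={"producer.send-timeout": "10s"},
)
```

Writing the defaults out explicitly is optional, since Hudi applies them on its own. It is done here only so the effective configuration is visible in one place.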
Bring your own implementation
You can extend the HoodieWriteCommitCallback class to implement your own way to asynchronously handle the callback of a successful write. Use this public API: