Apache Kafka
Stateful Functions offers an Apache Kafka I/O Module for reading from and writing to Kafka topics. It is based on Apache Flink’s universal Kafka connector and provides exactly-once processing semantics. The Kafka I/O Module is configurable in Yaml or Java.
Dependency
To use the Kafka I/O Module in Java, please include the following dependency in your pom.
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>statefun-kafka-io</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
Kafka Ingress Spec
A KafkaIngressSpec
declares an ingress spec for consuming from Kafka cluster.
It accepts the following arguments:
- The ingress identifier associated with this ingress
- The topic name / list of topic names
- The address of the bootstrap servers
- The consumer group id to use
- A
KafkaIngressDeserializer
for deserializing data from Kafka (Java only) - The position to start consuming from
version: "1.0"
module:
meta:
type: remote
spec:
ingresses:
- ingress:
meta:
type: statefun.kafka.io/routable-protobuf-ingress
id: example/user-ingress
spec:
address: kafka-broker:9092
consumerGroupId: routable-kafka-e2e
startupPosition:
type: earliest
topics:
- topic: messages-1
typeUrl: org.apache.flink.statefun.docs.models.User
targets:
- example-namespace/my-function-1
- example-namespace/my-function-2
package org.apache.flink.statefun.docs.io.kafka;
import org.apache.flink.statefun.docs.models.User;
import org.apache.flink.statefun.sdk.io.IngressIdentifier;
import org.apache.flink.statefun.sdk.io.IngressSpec;
import org.apache.flink.statefun.sdk.kafka.KafkaIngressBuilder;
import org.apache.flink.statefun.sdk.kafka.KafkaIngressStartupPosition;
public class IngressSpecs {
public static final IngressIdentifier<User> ID =
new IngressIdentifier<>(User.class, "example", "input-ingress");
public static final IngressSpec<User> kafkaIngress =
KafkaIngressBuilder.forIdentifier(ID)
.withKafkaAddress("localhost:9092")
.withConsumerGroupId("greetings")
.withTopic("my-topic")
.withDeserializer(UserDeserializer.class)
.withStartupPosition(KafkaIngressStartupPosition.fromLatest())
.build();
}
The ingress also accepts properties to directly configure the Kafka client, using KafkaIngressBuilder#withProperties(Properties)
. Please refer to the Kafka consumer configuration documentation for the full list of available properties. Note that configuration passed using named methods, such as KafkaIngressBuilder#withConsumerGroupId(String)
, will have higher precedence and overwrite their respective settings in the provided properties.
Startup Position
The ingress allows configuring the startup position to be one of the following:
From Group Offset (default)
Starts from offsets that were committed to Kafka for the specified consumer group.
startupPosition:
type: group-offsets
KafkaIngressStartupPosition#fromGroupOffsets();
Earlist
Starts from the earliest offset.
startupPosition:
type: earliest
KafkaIngressStartupPosition#fromEarliest();
Latest
Starts from the latest offset.
startupPosition:
type: latest
KafkaIngressStartupPosition#fromLatest();
Specific Offsets
Starts from specific offsets, defined as a map of partitions to their target starting offset.
startupPosition:
type: specific-offsets
offsets:
- user-topic/0: 91
- user-topic/1: 11
- user-topic/2: 8
Map<TopicPartition, Long> offsets = new HashMap<>();
offsets.add(new TopicPartition("user-topic", 0), 91);
offsets.add(new TopicPartition("user-topic", 11), 11);
offsets.add(new TopicPartition("user-topic", 8), 8);
KafkaIngressStartupPosition#fromSpecificOffsets(offsets);
Date
Starts from offsets that have an ingestion time larger than or equal to a specified date.
startupPosition:
type: date
date: 2020-02-01 04:15:00.00 Z
KafkaIngressStartupPosition#fromDate(ZonedDateTime.now());
On startup, if the specified startup offset for a partition is out-of-range or does not exist (which may be the case if the ingress is configured to start from group offsets, specific offsets, or from a date), then the ingress will fallback to using the position configured using KafkaIngressBuilder#withAutoOffsetResetPosition(KafkaIngressAutoResetPosition)
. By default, this is set to be the latest position.
Kafka Deserializer
When using the Java api, the Kafka ingress needs to know how to turn the binary data in Kafka into Java objects. The KafkaIngressDeserializer
allows users to specify such a schema. The T deserialize(ConsumerRecord<byte[], byte[]> record)
method gets called for each Kafka message, passing the key, value, and metadata from Kafka.
package org.apache.flink.statefun.docs.io.kafka;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import org.apache.flink.statefun.docs.models.User;
import org.apache.flink.statefun.sdk.kafka.KafkaIngressDeserializer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class UserDeserializer implements KafkaIngressDeserializer<User> {
private static Logger LOG = LoggerFactory.getLogger(UserDeserializer.class);
private final ObjectMapper mapper = new ObjectMapper();
@Override
public User deserialize(ConsumerRecord<byte[], byte[]> input) {
try {
return mapper.readValue(input.value(), User.class);
} catch (IOException e) {
LOG.debug("Failed to deserialize record", e);
return null;
}
}
}
Kafka Egress Spec
A KafkaEgressBuilder
declares an egress spec for writing data out to a Kafka cluster.
It accepts the following arguments:
- The egress identifier associated with this egress
- The address of the bootstrap servers
- A
KafkaEgressSerializer
for serializing data into Kafka (Java only) - The fault tolerance semantic
- Properties for the Kafka producer
version: "1.0"
module:
meta:
type: remote
spec:
egresses:
- egress:
meta:
type: statefun.kafka.io/generic-egress
id: example/output-messages
spec:
address: kafka-broker:9092
deliverySemantic:
type: exactly-once
transactionTimeoutMillis: 100000
properties:
- foo.config: bar
package org.apache.flink.statefun.docs.io.kafka;
import org.apache.flink.statefun.docs.models.User;
import org.apache.flink.statefun.sdk.io.EgressIdentifier;
import org.apache.flink.statefun.sdk.io.EgressSpec;
import org.apache.flink.statefun.sdk.kafka.KafkaEgressBuilder;
public class EgressSpecs {
public static final EgressIdentifier<User> ID =
new EgressIdentifier<>("example", "output-egress", User.class);
public static final EgressSpec<User> kafkaEgress =
KafkaEgressBuilder.forIdentifier(ID)
.withKafkaAddress("localhost:9092")
.withSerializer(UserSerializer.class)
.build();
}
Please refer to the Kafka producer configuration documentation for the full list of available properties.
Kafka Egress and Fault Tolerance
With fault tolerance enabled, the Kafka egress can provide exactly-once delivery guarantees. You can choose three different modes of operation.
None
Nothing is guaranteed, produced records can be lost or duplicated.
deliverySemantic:
type: none
KafkaEgressBuilder#withNoProducerSemantics();
At Least Once
Stateful Functions will guarantee that no records will be lost but they can be duplicated.
deliverySemantic:
type: at-least-once
KafkaEgressBuilder#withAtLeastOnceProducerSemantics();
Exactly Once
Stateful Functions uses Kafka transactions to provide exactly-once semantics.
deliverySemantic:
type: exactly-once
transactionTimeoutMillis: 900000 # 15 min
KafkaEgressBuilder#withExactlyOnceProducerSemantics(Duration.minutes(15));
Kafka Serializer
When using the Java api, the Kafka egress needs to know how to turn Java objects into binary data. The KafkaEgressSerializer
allows users to specify such a schema. The ProducerRecord<byte[], byte[]> serialize(T out)
method gets called for each message, allowing users to set a key, value, and other metadata.
package org.apache.flink.statefun.docs.io.kafka;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.statefun.docs.models.User;
import org.apache.flink.statefun.sdk.kafka.KafkaEgressSerializer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class UserSerializer implements KafkaEgressSerializer<User> {
private static final Logger LOG = LoggerFactory.getLogger(UserSerializer.class);
private static final String TOPIC = "user-topic";
private final ObjectMapper mapper = new ObjectMapper();
@Override
public ProducerRecord<byte[], byte[]> serialize(User user) {
try {
byte[] key = user.getUserId().getBytes();
byte[] value = mapper.writeValueAsBytes(user);
return new ProducerRecord<>(TOPIC, key, value);
} catch (JsonProcessingException e) {
LOG.info("Failed to serializer user", e);
return null;
}
}
}