Pulsar Java client

You can use the Pulsar Java client to create Java producers, consumers, readers, and TableViews of messages, and to perform administrative tasks. The current Java client version is 2.10.0.

All the methods in the producer, consumer, reader, and TableView interfaces of the Java client are thread-safe.

Javadoc for the Pulsar client is divided by package as follows.

| Package | Description | Maven Artifact |
| --- | --- | --- |
| org.apache.pulsar.client.api | The producer and consumer API | org.apache.pulsar:pulsar-client:2.10.0 |
| org.apache.pulsar.client.admin | The Java admin API | org.apache.pulsar:pulsar-client-admin:2.10.0 |
| org.apache.pulsar.client.all | Includes both pulsar-client and pulsar-client-admin. Both pulsar-client and pulsar-client-admin are shaded packages, and they shade dependencies independently. Consequently, applications that use both have redundant shaded classes, and it is troublesome if you introduce new dependencies but forget to update the shading rules. In this case, you can use pulsar-client-all, which shades dependencies only once and reduces the size of dependencies. | org.apache.pulsar:pulsar-client-all:2.10.0 |

This document focuses only on the client API for producing and consuming messages on Pulsar topics. For how to use the Java admin client, see Pulsar admin interface.

Installation

The latest version of the Pulsar Java client library is available via Maven Central. To use the latest version, add the pulsar-client library to your build configuration.

Maven

If you use Maven, add the following information to the pom.xml file.

    <!-- in your <properties> block -->
    <pulsar.version>2.10.0</pulsar.version>

    <!-- in your <dependencies> block -->
    <dependency>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-client</artifactId>
        <version>${pulsar.version}</version>
    </dependency>

Gradle

If you use Gradle, add the following information to the build.gradle file.

    def pulsarVersion = '2.10.0'

    dependencies {
        // 'implementation' replaces the 'compile' configuration, which was removed in Gradle 7
        implementation group: 'org.apache.pulsar', name: 'pulsar-client', version: pulsarVersion
    }

Connection URLs

To connect to Pulsar using client libraries, you need to specify a Pulsar protocol URL.

You can assign Pulsar protocol URLs to specific clusters and use the pulsar scheme. The default port is 6650. The following is an example of localhost.

    pulsar://localhost:6650

If you have multiple brokers, the URL is as follows.

    pulsar://localhost:6650,localhost:6651,localhost:6652

A URL for a production Pulsar cluster is as follows.

    pulsar://pulsar.us-west.example.com:6650

If you use TLS authentication, the URL is as follows.

    pulsar+ssl://pulsar.us-west.example.com:6651
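
The URL forms above can be taken apart with a few lines of plain Java. The following is a minimal sketch, not the client's own parser: the class and method names are hypothetical, it handles only the pulsar and pulsar+ssl schemes shown above, it does not handle IPv6 literals, and the fallback port of 6651 for pulsar+ssl is an assumption based on the TLS example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that illustrates the structure of a Pulsar protocol URL. */
final class ServiceUrlSketch {

    /** Returns true when the URL uses the TLS scheme (pulsar+ssl). */
    static boolean usesTls(String serviceUrl) {
        return serviceUrl.startsWith("pulsar+ssl://");
    }

    /** Splits a service URL into host:port entries, filling in a port when it is missing. */
    static List<String> brokers(String serviceUrl) {
        int sep = serviceUrl.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("Missing scheme: " + serviceUrl);
        }
        String scheme = serviceUrl.substring(0, sep);
        int defaultPort;
        if (scheme.equals("pulsar")) {
            defaultPort = 6650; // default port stated in the text
        } else if (scheme.equals("pulsar+ssl")) {
            defaultPort = 6651; // assumption: port used in the TLS example
        } else {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        List<String> result = new ArrayList<>();
        for (String entry : serviceUrl.substring(sep + 3).split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            result.add(hostPort.contains(":") ? hostPort : hostPort + ":" + defaultPort);
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("No broker address in: " + serviceUrl);
        }
        return result;
    }
}
```

For example, ServiceUrlSketch.brokers("pulsar://localhost:6650,localhost:6651,localhost:6652") returns the three host:port entries in the order they were written.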

Client

You can instantiate a PulsarClient object using just a URL for the target Pulsar cluster like this:

    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();

If you have multiple brokers, you can instantiate a PulsarClient like this:

    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650,localhost:6651,localhost:6652")
            .build();

Default broker URLs for standalone clusters

If you run a cluster in standalone mode, the broker is available at the pulsar://localhost:6650 URL by default.

When you create a client, you can use the loadConf configuration. The following parameters are available in loadConf.

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| serviceUrl | String | Service URL provider for Pulsar service | None |
| authPluginClassName | String | Name of the authentication plugin | None |
| authParams | String | Parameters for the authentication plugin. Example: key1:val1,key2:val2 | None |
| operationTimeoutMs | long | Operation timeout | 30000 |
| statsIntervalSeconds | long | Interval between each stats information. Stats is activated with a positive statsInterval. Set statsIntervalSeconds to at least 1 second. | 60 |
| numIoThreads | int | The number of threads used for handling connections to brokers | 1 |
| numListenerThreads | int | The number of threads used for handling message listeners. The listener thread pool is shared across all the consumers and readers using the “listener” model to get messages. For a given consumer, the listener is always invoked from the same thread to ensure ordering. If you want multiple threads to process a single topic, you need to create a shared subscription and multiple consumers for this subscription. This does not ensure ordering. | 1 |
| useTcpNoDelay | boolean | Whether to use the TCP no-delay flag on the connection to disable the Nagle algorithm | true |
| useTls | boolean | Whether to use TLS encryption on the connection | false |
| tlsTrustCertsFilePath | String | Path to the trusted TLS certificate file | None |
| tlsAllowInsecureConnection | boolean | Whether the Pulsar client accepts untrusted TLS certificates from the broker | false |
| tlsHostnameVerificationEnable | boolean | Whether to enable TLS hostname verification | false |
| concurrentLookupRequest | int | The number of concurrent lookup requests allowed on each broker connection to prevent overload on the broker | 5000 |
| maxLookupRequest | int | The maximum number of lookup requests allowed on each broker connection to prevent overload on the broker | 50000 |
| maxNumberOfRejectedRequestPerConnection | int | The maximum number of requests a broker can reject in a certain time frame (30 seconds), after which the current connection is closed and the client creates a new connection to a different broker | 50 |
| keepAliveIntervalSeconds | int | Keep-alive interval, in seconds, for each client-broker connection | 30 |
| connectionTimeoutMs | int | Duration of waiting for a connection to a broker to be established. If the duration passes without a response from a broker, the connection attempt is dropped. | 10000 |
| requestTimeoutMs | int | Maximum duration for completing a request | 60000 |
| defaultBackoffIntervalNanos | int | Default duration for a backoff interval | TimeUnit.MILLISECONDS.toNanos(100) |
| maxBackoffIntervalNanos | long | Maximum duration for a backoff interval | TimeUnit.SECONDS.toNanos(30) |
| socks5ProxyAddress | SocketAddress | SOCKS5 proxy address | None |
| socks5ProxyUsername | String | SOCKS5 proxy username | None |
| socks5ProxyPassword | String | SOCKS5 proxy password | None |

Check out the Javadoc for the PulsarClient class for a full list of configurable parameters.
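
As an illustration of the loadConf approach, the sketch below builds a configuration map whose keys are parameter names listed above. The class name and the chosen values are assumptions made for this example, and the map would still need to be handed to the client builder to take effect.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that assembles a loadConf-style map for a client. */
final class ClientConfSketch {

    /** Returns a map keyed by parameter names from the client configuration list. */
    static Map<String, Object> clientConf() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("serviceUrl", "pulsar://localhost:6650");
        conf.put("numIoThreads", 2);              // illustrative value, the listed default is 1
        conf.put("keepAliveIntervalSeconds", 30);
        conf.put("connectionTimeoutMs", 10000);
        conf.put("useTcpNoDelay", true);
        return conf;
    }
}
```

With the pulsar-client library on the classpath, such a map can be passed as PulsarClient.builder().loadConf(ClientConfSketch.clientConf()).build(). Keeping the settings in a map makes it easy to load them from a properties or JSON file instead of hard-coding builder calls.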

In addition to client-level configuration, you can also apply producer and consumer specific configuration as described in sections below.

Client memory allocator configuration

You can set the client memory allocator configurations through Java properties.

| Property | Type | Description | Default | Available values |
| --- | --- | --- | --- | --- |
| pulsar.allocator.pooled | String | If set to true, the client uses a direct memory pool. If set to false, the client uses heap memory without a pool. | true | true, false |
| pulsar.allocator.exit_on_oom | String | Whether to exit the JVM when OOM happens | false | true, false |
| pulsar.allocator.leak_detection | String | Leak detection level of the memory allocator | Disabled | Disabled, Simple, Advanced, Paranoid |
| pulsar.allocator.out_of_memory_policy | String | When an OOM occurs, the client throws an exception or falls back to heap | FallbackToHeap | ThrowException, FallbackToHeap |

Example:

    -Dpulsar.allocator.pooled=true
    -Dpulsar.allocator.exit_on_oom=false
    -Dpulsar.allocator.leak_detection=Disabled
    -Dpulsar.allocator.out_of_memory_policy=ThrowException

    Cluster-level failover

    This chapter describes the concept, benefits, use cases, constraints, usage, and working principles of cluster-level failover.

    What is cluster-level failover

    This chapter helps you better understand the concept of cluster-level failover.

    Concept of cluster-level failover

    Automatic cluster-level failover supports Pulsar clients switching from a primary cluster to one or several backup clusters automatically and seamlessly when a failover event is detected, based on the detection policy configured by users.

    [Figure: Automatic cluster-level failover]

    Controlled cluster-level failover supports Pulsar clients switching from a primary cluster to one or several backup clusters. The switchover is manually set by administrators.

    [Figure: Controlled cluster-level failover]

    Once the primary cluster functions again, Pulsar clients can switch back to the primary cluster. Most of the time users won’t even notice a thing. Users can keep using applications and services without interruptions or timeouts.

    Why use cluster-level failover?

    The cluster-level failover provides fault tolerance, continuous availability, and high availability together. It brings a number of benefits, including but not limited to:

    • Reduced cost: services can be switched and recovered automatically with no data loss.

    • Simplified management: businesses can operate on an “always-on” basis since no immediate user intervention is required.

    • Improved stability and robustness: it ensures continuous performance and minimizes service downtime.

    When to use cluster-level failover?

    The cluster-level failover protects your environment in a number of ways, including but not limited to:

    • Disaster recovery: cluster-level failover can automatically and seamlessly transfer the production workload on a primary cluster to one or several backup clusters, which ensures minimum data loss and reduced recovery time.

    • Planned migration: if you want to migrate production workloads from an old cluster to a new cluster, you can improve the migration efficiency with cluster-level failover. For example, you can test whether the data migration goes smoothly in case of a failover event, and identify possible issues and risks before the migration.

    When is cluster-level failover triggered?

    Automatic cluster-level failover is triggered when Pulsar clients cannot connect to the primary cluster for a prolonged period of time. This can be caused by any number of reasons including, but not limited to:

    • Network failure: internet connection is lost.

    • Power failure: shutdown time of a primary cluster exceeds time limits.

    • Service error: errors occur on a primary cluster (for example, the primary cluster does not function because of time limits).

    • Crashed storage space: the primary cluster does not have enough storage space, but the corresponding storage space on the backup server functions normally.

    Controlled cluster-level failover is triggered when administrators set the switchover manually.

    Why does cluster-level failover fail?

    Cluster-level failover does not succeed if the backup cluster is unreachable by active Pulsar clients. This can happen for many reasons, including but not limited to:

    • Power failure: the backup cluster is shut down or does not function normally.

    • Crashed storage space: primary and backup clusters do not have enough storage space.

    • No available cluster: the failover is initiated, but no cluster can assume the role of an available cluster due to errors, and the primary cluster is not able to provide service normally.

    • Failed manual switchover: you manually initiate a switchover, but services cannot be switched to the backup cluster server. In this case, the system attempts to switch services back to the primary cluster.

    • Authentication or authorization failure: authentication or authorization fails between 1) primary and backup clusters, or 2) two backup clusters.

    What are the limitations of cluster-level failover?

    Currently, cluster-level failover can perform probes to prevent data loss, but it cannot check the status of backup clusters. If backup clusters are not healthy, you cannot produce or consume data.

    What are the relationships between cluster-level failover and geo-replication?

    Cluster-level failover is an extension of geo-replication that improves stability and robustness. Cluster-level failover depends on geo-replication, and the two differ as follows.

    | Influence | Cluster-level failover | Geo-replication |
    | --- | --- | --- |
    | Do administrators have heavy workloads? | No or maybe. For the automatic cluster-level failover, the cluster switchover is triggered automatically based on the policies set by users. For the controlled cluster-level failover, the switchover is triggered manually by administrators. | Yes. If a cluster fails, immediate administration intervention is required. |
    | Result in data loss? | No. For both automatic and controlled cluster-level failover, if the failed primary cluster doesn’t replicate messages immediately to the backup cluster, the Pulsar client can’t consume the non-replicated messages. After the primary cluster is restored and the Pulsar client switches back, the non-replicated data can still be consumed by the Pulsar client. Consequently, the data is not lost. For the automatic cluster-level failover, services can be switched and recovered automatically with no data loss. For the controlled cluster-level failover, services can be switched and recovered manually and data loss may happen. | Yes. Pulsar clients and DNS systems have caches. When administrators switch the DNS from a primary cluster to a backup cluster, it takes some time for the caches to time out, which delays client recovery. Until then, clients fail to produce or consume messages. |
    | Result in Pulsar client failure? | No or maybe. For automatic cluster-level failover, services can be switched and recovered automatically and the Pulsar client does not fail. For controlled cluster-level failover, services can be switched and recovered manually, but the Pulsar client fails before administrators can take action. | Same as above. |

    How to use cluster-level failover

    This section guides you through every step on how to configure cluster-level failover.

    Tip

    • You should configure cluster-level failover only when the cluster contains sufficient resources to handle all possible consequences. Workload intensity on the backup cluster may increase significantly.

    • Connect clusters to an uninterruptible power supply (UPS) unit to reduce the risk of unexpected power loss.

    Requirements

    • Pulsar client 2.10 or later versions.

    • For backup clusters:

      • The number of BookKeeper nodes should be equal to or greater than the ensemble quorum.

      • The number of ZooKeeper nodes should be equal to or greater than 3.

    • Turn on geo-replication between the primary cluster and any dependent cluster (primary to backup or backup to backup) to prevent data loss.

    • Set replicateSubscriptionState to true when creating consumers.

    This is an example of how to construct a Java Pulsar client to use automatic cluster-level failover. The switchover is triggered automatically.

        private PulsarClient getAutoFailoverClient() throws PulsarClientException {
            ServiceUrlProvider failover = AutoClusterFailover.builder()
                    .primary("pulsar://localhost:6650")
                    // Arrays.asList accepts several URLs; Collections.singletonList accepts only one
                    .secondary(Arrays.asList("pulsar://other1:6650", "pulsar://other2:6650"))
                    .failoverDelay(30, TimeUnit.SECONDS)
                    .switchBackDelay(60, TimeUnit.SECONDS)
                    .checkInterval(1000, TimeUnit.MILLISECONDS)
                    .secondaryTlsTrustCertsFilePath("/path/to/ca.cert.pem")
                    .secondaryAuthentication("org.apache.pulsar.client.impl.auth.AuthenticationTls",
                            "tlsCertFile:/path/to/my-role.cert.pem,tlsKeyFile:/path/to/my-role.key-pk8.pem")
                    .build();

            // Register the provider with the client; build() fails without a service URL
            // or a service URL provider, and it initializes the provider it is given.
            PulsarClient pulsarClient = PulsarClient.builder()
                    .serviceUrlProvider(failover)
                    .build();
            return pulsarClient;
        }

    Configure the following parameters:

    | Parameter | Default value | Required? | Description |
    | --- | --- | --- | --- |
    | primary | N/A | Yes | Service URL of the primary cluster. |
    | secondary | N/A | Yes | Service URL(s) of one or several backup clusters. You can specify several backup clusters using a comma-separated list. Note that the backup cluster is chosen in the sequence shown in the list. If all backup clusters are available, the Pulsar client chooses the first backup cluster. |
    | failoverDelay | N/A | Yes | The delay before the Pulsar client switches from the primary cluster to the backup cluster. Automatic failover is controlled by a probe task: 1) The probe task first checks the health status of the primary cluster. 2) If the probe task finds the continuous failure time of the primary cluster exceeds failoverDelay, it switches the Pulsar client to the backup cluster. |
    | switchBackDelay | N/A | Yes | The delay before the Pulsar client switches from the backup cluster to the primary cluster. Automatic switchback is controlled by a probe task: 1) After the Pulsar client switches from the primary cluster to the backup cluster, the probe task continues to check the status of the primary cluster. 2) If the primary cluster functions well and continuously remains active longer than switchBackDelay, the Pulsar client switches back to the primary cluster. |
    | checkInterval | 30s | No | Frequency of performing a probe task (in seconds). |
    | secondaryTlsTrustCertsFilePath | N/A | No | Path to the trusted TLS certificate file of the backup cluster. |
    | secondaryAuthentication | N/A | No | Authentication of the backup cluster. |

    This is an example of how to construct a Java Pulsar client to use controlled cluster-level failover. The switchover is triggered by administrators manually.

    Note: you can have one or several backup clusters but can only specify one.

        public PulsarClient getControlledFailoverClient() throws IOException {
            Map<String, String> header = new HashMap<>();
            header.put("service_user_id", "my-user");
            header.put("service_password", "tiger");
            header.put("clusterA", "tokenA");
            header.put("clusterB", "tokenB");

            ServiceUrlProvider provider = ControlledClusterFailover.builder()
                    .defaultServiceUrl("pulsar://localhost:6650")
                    .checkInterval(1, TimeUnit.MINUTES)
                    .urlProvider("http://localhost:8080/test")
                    .urlProviderHeader(header)
                    .build();

            // Register the provider with the client; build() fails without a service URL
            // or a service URL provider, and it initializes the provider it is given.
            PulsarClient pulsarClient = PulsarClient.builder()
                    .serviceUrlProvider(provider)
                    .build();
            return pulsarClient;
        }

    | Parameter | Default value | Required? | Description |
    | --- | --- | --- | --- |
    | defaultServiceUrl | N/A | Yes | Pulsar service URL. |
    | checkInterval | 30s | No | Frequency of performing a probe task (in seconds). |
    | urlProvider | N/A | Yes | URL provider service. |
    | urlProviderHeader | N/A | No | urlProviderHeader is a map containing tokens and credentials. If you enable authentication or authorization between Pulsar clients and primary and backup clusters, you need to provide urlProviderHeader. |

    Here is an example of how urlProviderHeader works.

    [Figure: How urlProviderHeader works]

    Assume that you want to connect Pulsar client 1 to cluster A.

    1. Pulsar client 1 sends the token t1 to the URL provider service.

    2. The URL provider service returns the credential c1 and the cluster A URL to the Pulsar client.

      The URL provider service manages all tokens and credentials. It returns different credentials based on different tokens and different target cluster URLs to different Pulsar clients.

      Note: the credential must be in a JSON file and contain parameters as shown.

          {
            "serviceUrl": "pulsar+ssl://target:6651",
            "tlsTrustCertsFilePath": "/security/ca.cert.pem",
            "authPluginClassName": "org.apache.pulsar.client.impl.auth.AuthenticationTls",
            "authParamsString": "{\"tlsCertFile\": \"/security/client.cert.pem\", \"tlsKeyFile\": \"/security/client-pk8.pem\"}"
          }

    3. Pulsar client 1 connects to cluster A using credential c1.

    How does cluster-level failover work?

    This chapter explains the working process of cluster-level failover. For more implementation details, see PIP-121.

    Automatic cluster-level failover

    In an automatic failover cluster, the primary cluster and backup cluster are aware of each other’s availability. The automatic failover cluster performs the following actions without administrator intervention:

    1. The Pulsar client runs a probe task at intervals defined in checkInterval.

    2. If the probe task finds the failure time of the primary cluster exceeds the time set in the failoverDelay parameter, it searches backup clusters for an available healthy cluster.

      2a) If there are healthy backup clusters, the Pulsar client switches to a backup cluster in the order defined in secondary.

      2b) If there is no healthy backup cluster, the Pulsar client does not perform the switchover, and the probe task continues to look for an available backup cluster.

    3. The probe task checks whether the primary cluster functions well or not.

      3a) If the primary cluster comes back and the continuous healthy time exceeds the time set in switchBackDelay, the Pulsar client switches back to the primary cluster.

      3b) If the primary cluster does not come back, the Pulsar client does not perform the switchover.
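
The automatic workflow above can be modeled as a small state machine. The following is a rough sketch under stated assumptions, not the Pulsar client implementation: the class name, the probe method, and the health-check predicate are hypothetical, time is passed in explicitly so the logic is easy to follow, and the sketch does not monitor the health of the backup cluster once the client has switched to it.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of the automatic failover probe task. */
final class AutoFailoverProbeSketch {
    private final String primary;
    private final List<String> secondaries;
    private final long failoverDelayMs;
    private final long switchBackDelayMs;

    private String current;
    private long primaryDownSince = -1; // -1: primary is not in a failure streak
    private long primaryUpSince = -1;   // -1: primary is not in a recovery streak

    AutoFailoverProbeSketch(String primary, List<String> secondaries,
                            long failoverDelayMs, long switchBackDelayMs) {
        this.primary = primary;
        this.secondaries = secondaries;
        this.failoverDelayMs = failoverDelayMs;
        this.switchBackDelayMs = switchBackDelayMs;
        this.current = primary;
    }

    /** Runs one probe at time nowMs and returns the service URL the client should use. */
    String probe(long nowMs, Predicate<String> isHealthy) {
        boolean primaryHealthy = isHealthy.test(primary);
        if (current.equals(primary)) {
            if (primaryHealthy) {
                primaryDownSince = -1;
            } else {
                if (primaryDownSince < 0) {
                    primaryDownSince = nowMs;
                }
                if (nowMs - primaryDownSince > failoverDelayMs) {
                    // Step 2a: pick the first healthy backup in the configured order.
                    for (String candidate : secondaries) {
                        if (isHealthy.test(candidate)) {
                            current = candidate;
                            primaryDownSince = -1;
                            primaryUpSince = -1;
                            break;
                        }
                    }
                    // Step 2b: with no healthy backup, stay put and retry on the next probe.
                }
            }
        } else if (primaryHealthy) {
            if (primaryUpSince < 0) {
                primaryUpSince = nowMs;
            }
            // Step 3a: switch back once the primary has stayed healthy long enough.
            if (nowMs - primaryUpSince > switchBackDelayMs) {
                current = primary;
                primaryUpSince = -1;
            }
        } else {
            // Step 3b: the primary is still down, so the recovery streak is reset.
            primaryUpSince = -1;
        }
        return current;
    }
}
```

In a real deployment the probe would run on a timer at checkInterval. Passing the timestamp and health check as arguments is only a convenience for reasoning about the delays.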

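    [Figure: Workflow of automatic failover cluster]

    Controlled cluster-level failover

    In a controlled failover cluster, the switchover proceeds as follows: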

    1. The Pulsar client runs a probe task at intervals defined in checkInterval.

    2. The probe task fetches the service URL configuration from the URL provider service, which is configured by urlProvider.

      2a) If the service URL configuration is changed, the probe task switches to the target cluster without checking the health status of the target cluster.

      2b) If the service URL configuration is not changed, the Pulsar client does not perform the switchover.

    3. If the Pulsar client switches to the target cluster, the probe task continues to fetch service URL configuration from the URL provider service at intervals defined in checkInterval.

      3a) If the service URL configuration is changed, the probe task switches to the target cluster without checking the health status of the target cluster.

      3b) If the service URL configuration is not changed, it does not perform the switchover.
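
The controlled workflow reduces to comparing the fetched service URL with the one in use. Here is a minimal sketch of that comparison, assuming a hypothetical class and a Supplier that stands in for the HTTP call to the URL provider service. It is not the ControlledClusterFailover implementation.

```java
import java.util.function.Supplier;

/** Hypothetical model of the controlled failover probe task. */
final class ControlledFailoverProbeSketch {
    private final Supplier<String> urlProvider; // stands in for the URL provider service
    private String currentUrl;

    ControlledFailoverProbeSketch(String defaultServiceUrl, Supplier<String> urlProvider) {
        this.currentUrl = defaultServiceUrl;
        this.urlProvider = urlProvider;
    }

    /** Runs one probe and returns true when the client switched to a new target cluster. */
    boolean probe() {
        String fetched = urlProvider.get();
        if (fetched == null || fetched.isEmpty() || fetched.equals(currentUrl)) {
            return false; // configuration unchanged (or unavailable): no switchover
        }
        // The target cluster is adopted without checking its health status.
        currentUrl = fetched;
        return true;
    }

    String currentUrl() {
        return currentUrl;
    }
}
```

Because the probe never checks the target's health, the administrator who changes the URL in the provider service is responsible for making sure the target cluster can take the load.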

    [Figure: Workflow of controlled failover cluster]

    Producer

    In Pulsar, producers write messages to topics. Once you’ve instantiated a PulsarClient object (as in the section above), you can create a Producer for a specific Pulsar topic.

        Producer<byte[]> producer = client.newProducer()
                .topic("my-topic")
                .create();

        // You can then send messages to the broker and topic you specified:
        producer.send("My message".getBytes());

    By default, producers produce messages that consist of byte arrays. You can produce different types by specifying a message schema.

        Producer<String> stringProducer = client.newProducer(Schema.STRING)
                .topic("my-topic")
                .create();
        stringProducer.send("My message");

    Make sure that you close your producers, consumers, and clients when you do not need them.

        producer.close();
        consumer.close();
        client.close();

    Close operations can also be asynchronous:

        producer.closeAsync()
                .thenRun(() -> System.out.println("Producer closed"))
                .exceptionally((ex) -> {
                    System.err.println("Failed to close producer: " + ex);
                    return null;
                });

    Configure producer

    If you instantiate a Producer object by specifying only a topic name, as in the example above, the producer uses the default configuration.

    If you create a producer, you can use the loadConf configuration. The following parameters are available in loadConf.

    | Name | Type | Description | Default |
    | --- | --- | --- | --- |
    | topicName | String | Topic name | null |
    | producerName | String | Producer name | null |
    | sendTimeoutMs | long | Message send timeout in ms. If a message is not acknowledged by a server before the sendTimeout expires, an error occurs. | 30000 |
    | blockIfQueueFull | boolean | If set to true, when the outgoing message queue is full, the send and sendAsync methods of the producer block rather than failing and throwing errors. If set to false, when the outgoing message queue is full, the send and sendAsync methods of the producer fail and ProducerQueueIsFullError exceptions occur. The maxPendingMessages parameter determines the size of the outgoing message queue. | false |
    | maxPendingMessages | int | The maximum size of a queue holding pending messages, that is, messages waiting to receive an acknowledgment from a broker. By default, when the queue is full, all calls to the send and sendAsync methods fail unless you set blockIfQueueFull to true. | 1000 |
    | maxPendingMessagesAcrossPartitions | int | The maximum number of pending messages across partitions. Use this setting to lower the max pending messages for each partition (maxPendingMessages) if the total number exceeds the configured value. | 50000 |
    | messageRoutingMode | MessageRoutingMode | Message routing logic for producers on partitioned topics. The logic applies only when no key is set on messages. Available options: pulsar.RoundRobinDistribution (round robin), pulsar.UseSinglePartition (publish all messages to a single partition), pulsar.CustomPartition (a custom partitioning scheme). | pulsar.RoundRobinDistribution |
    | hashingScheme | HashingScheme | Hashing function determining the partition where you publish a particular message (partitioned topics only). Available options: pulsar.JavaStringHash (the equivalent of String.hashCode() in Java), pulsar.Murmur3_32Hash (applies the Murmur3 hashing function), pulsar.BoostHash (applies the hashing function from C++’s Boost library). | HashingScheme.JavaStringHash |
    | cryptoFailureAction | ProducerCryptoFailureAction | The action the producer takes when encryption fails. FAIL: if encryption fails, unencrypted messages fail to send. SEND: if encryption fails, unencrypted messages are sent. | ProducerCryptoFailureAction.FAIL |
    | batchingMaxPublishDelayMicros | long | Batching time period of sending messages. | TimeUnit.MILLISECONDS.toMicros(1) |
    | batchingMaxMessages | int | The maximum number of messages permitted in a batch. | 1000 |
    | batchingEnabled | boolean | Enable batching of messages. | true |
    | chunkingEnabled | boolean | Enable chunking of messages. | false |
    | compressionType | CompressionType | Message data compression type used by a producer. Available options: LZ4, ZLIB, ZSTD, SNAPPY. | No compression |
    | initialSubscriptionName | String | Use this configuration to automatically create an initial subscription when creating a topic. If this field is not set, the initial subscription is not created. | null |

    You can configure parameters if you do not want to use the default configuration.

    For a full list, see the Javadoc for the ProducerBuilder class. The following is an example.

        Producer<byte[]> producer = client.newProducer()
                .topic("my-topic")
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                .sendTimeout(10, TimeUnit.SECONDS)
                .blockIfQueueFull(true)
                .create();

    Message routing

    When using partitioned topics, you can specify the routing mode whenever you publish messages using a producer. For more information on specifying a routing mode using the Java client, see the Partitioned Topics cookbook.
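
To make the idea of a routing mode concrete, the sketch below picks a partition index for a message: keyed messages are hashed so that the same key always lands on the same partition, and unkeyed messages are spread round robin. The class and method names are hypothetical and the hashing mirrors Java's String.hashCode(). The real client's routing (for example, how it interacts with batching) may differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical partition chooser illustrating keyed and round-robin routing. */
final class RoutingSketch {
    private final AtomicInteger counter = new AtomicInteger();

    /** Keyed messages: a non-negative hash of the key, modulo the partition count. */
    static int byKey(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Unkeyed messages: cycle through the partitions in order. */
    int roundRobin(int numPartitions) {
        return (counter.getAndIncrement() & Integer.MAX_VALUE) % numPartitions;
    }

    /** The routing mode only matters when the message has no key. */
    int choose(String key, int numPartitions) {
        return key != null ? byKey(key, numPartitions) : roundRobin(numPartitions);
    }
}
```

Masking with Integer.MAX_VALUE keeps the value non-negative, which avoids a negative index when hashCode() is negative or the counter overflows.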

    Async send

    You can publish messages asynchronously using the Java client. With async send, the producer puts the message in a blocking queue and returns immediately. The client library then sends the message to the broker in the background. If the queue is full (the maximum size is configurable), the producer either blocks or fails immediately when calling the API, depending on the arguments passed to the producer.

    The following is an example.

        producer.sendAsync("my-async-message".getBytes()).thenAccept(msgId -> {
            System.out.println("Message with ID " + msgId + " successfully sent");
        });

    As you can see from the example above, async send operations return a MessageId wrapped in a CompletableFuture.
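
The queue behavior described for async send can be sketched with a semaphore that counts free slots. This is an illustrative model only: the class, the ackOldest method that simulates a broker acknowledgment, and the use of a Long as the message ID are all assumptions, and the real producer reports a full queue with its own exception type rather than IllegalStateException.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a producer's bounded queue of pending messages. */
final class PendingQueueSketch {
    private final Semaphore slots;
    private final boolean blockIfQueueFull;
    private final Queue<CompletableFuture<Long>> pending = new ConcurrentLinkedQueue<>();
    private final AtomicLong nextId = new AtomicLong();

    PendingQueueSketch(int maxPendingMessages, boolean blockIfQueueFull) {
        this.slots = new Semaphore(maxPendingMessages);
        this.blockIfQueueFull = blockIfQueueFull;
    }

    /** Reserves a slot and returns a future that completes once the message is acknowledged. */
    CompletableFuture<Long> sendAsync(byte[] payload) {
        if (blockIfQueueFull) {
            slots.acquireUninterruptibly(); // wait until a slot frees up
        } else if (!slots.tryAcquire()) {
            CompletableFuture<Long> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("pending queue is full"));
            return failed;
        }
        // A real producer would hand the payload to a background sender here.
        CompletableFuture<Long> future = new CompletableFuture<>();
        pending.add(future);
        return future;
    }

    /** Simulates a broker acknowledgment for the oldest pending message. */
    boolean ackOldest() {
        CompletableFuture<Long> future = pending.poll();
        if (future == null) {
            return false;
        }
        slots.release(); // free the slot before notifying the caller
        future.complete(nextId.getAndIncrement());
        return true;
    }
}
```

With a capacity of 2 and blocking disabled, a third sendAsync call returns an already-failed future until ackOldest frees a slot. With blocking enabled, the same call would wait instead.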

    Configure messages

    In addition to a value, you can set additional items on a given message:

        producer.newMessage()
                .key("my-message-key")
                .value("my-async-message".getBytes())
                .property("my-key", "my-value")
                .property("my-other-key", "my-other-value")
                .send();

    You can terminate the builder chain with sendAsync() and get a future in return.

    Enable chunking

    Message chunking enables Pulsar to process large payload messages by splitting the message into chunks at the producer side and aggregating chunked messages at the consumer side.

    The message chunking feature is OFF by default. The following is an example of how to enable message chunking when creating a producer.

        Producer<byte[]> producer = client.newProducer()
                .topic(topic)
                .enableChunking(true)
                .enableBatching(false)
                .create();

    By default, the producer chunks a large message based on the max message size (maxMessageSize) configured at the broker (for example, 5 MB). However, the client can also configure the max chunk size using the producer configuration chunkMaxMessageSize.

    Note: To enable chunking, you need to disable batching (enableBatching=false) at the same time.
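
The split-and-aggregate idea behind chunking can be shown with plain byte arrays. The sketch below is only a model of the concept, with hypothetical names: the real client also attaches chunk metadata to each chunk and handles out-of-order or missing chunks, which this example ignores.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of splitting a payload into chunks and joining them again. */
final class ChunkingSketch {

    /** Producer side: cut the payload into pieces of at most maxChunkSize bytes. */
    static List<byte[]> split(byte[] payload, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += maxChunkSize) {
            int end = Math.min(payload.length, offset + maxChunkSize);
            chunks.add(Arrays.copyOfRange(payload, offset, end));
        }
        return chunks;
    }

    /** Consumer side: concatenate the chunks in order to rebuild the original payload. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

For a 12-byte payload and a maximum chunk size of 5, split produces chunks of 5, 5, and 2 bytes, and join restores the original 12 bytes.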

    Consumer

    In Pulsar, consumers subscribe to topics and handle messages that producers publish to those topics. You can instantiate a new consumer by first instantiating a PulsarClient object and passing it a URL for a Pulsar broker (as above).

    Once you’ve instantiated a PulsarClient object, you can create a Consumer by specifying a topic and a subscription.

        Consumer consumer = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("my-subscription")
                .subscribe();

    The subscribe method automatically subscribes the consumer to the specified topic and subscription. One way to make the consumer listen on the topic is to set up a while loop. In this example loop, the consumer listens for messages, prints the contents of any received message, and then acknowledges that the message has been processed. If the processing logic fails, you can use negative acknowledgement to redeliver the message later.

        while (true) {
            // Wait for a message
            Message msg = consumer.receive();

            try {
                // Do something with the message
                System.out.println("Message received: " + new String(msg.getData()));

                // Acknowledge the message so that it can be deleted by the message broker
                consumer.acknowledge(msg);
            } catch (Exception e) {
                // Message failed to process, redeliver later
                consumer.negativeAcknowledge(msg);
            }
        }

    If you don’t want to block your main thread and would rather listen constantly for new messages, consider using a MessageListener.

        MessageListener myMessageListener = (consumer, msg) -> {
            try {
                System.out.println("Message received: " + new String(msg.getData()));
                consumer.acknowledge(msg);
            } catch (Exception e) {
                consumer.negativeAcknowledge(msg);
            }
        }; // the lambda assignment is a statement and needs a terminating semicolon

        Consumer consumer = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("my-subscription")
                .messageListener(myMessageListener)
                .subscribe();

    Configure consumer

    If you instantiate a Consumer object by specifying only a topic and subscription name as in the example above, the consumer uses the default configuration.

    When you create a consumer, you can use the loadConf configuration. The following parameters are available in loadConf.

    Name | Type | Description | Default
    topicNames | Set<String> | Topic name | Sets.newTreeSet()
    topicsPattern | Pattern | Topic pattern | None
    subscriptionName | String | Subscription name | None
    subscriptionType | SubscriptionType | Subscription type. Four subscription types are available:
  • Exclusive
  • Failover
  • Shared
  • Key_Shared
    Default: SubscriptionType.Exclusive
    receiverQueueSize | int | Size of a consumer’s receiver queue.

    For example, the number of messages accumulated by a consumer before an application calls receive.

    A value higher than the default value increases consumer throughput, though at the expense of more memory utilization.
    Default: 1000
    acknowledgementsGroupTimeMicros | long | Group consumer acknowledgments for a specified time.

    By default, a consumer uses a 100 ms grouping time to send out acknowledgments to a broker.

    Setting a group time of 0 sends out acknowledgments immediately.

    A longer ack group time is more efficient at the expense of a slight increase in message redeliveries after a failure.
    Default: TimeUnit.MILLISECONDS.toMicros(100)
    negativeAckRedeliveryDelayMicros | long | Delay to wait before redelivering messages that failed to be processed.

    When an application uses Consumer#negativeAcknowledge(Message), failed messages are redelivered after a fixed timeout.
    Default: TimeUnit.MINUTES.toMicros(1)
    maxTotalReceiverQueueSizeAcrossPartitions | int | The max total receiver queue size across partitions.

    This setting reduces the receiver queue size for individual partitions if the total receiver queue size exceeds this value.
    Default: 50000
    consumerName | String | Consumer name | null
    ackTimeoutMillis | long | Timeout of unacked messages | 0
    tickDurationMillis | long | Granularity of the ack-timeout redelivery.

    Using a higher tickDurationMillis reduces the memory overhead to track messages when setting ack-timeout to a bigger value (for example, 1 hour).
    Default: 1000
    priorityLevel | int | Priority level of a consumer. In the Shared subscription type, the broker gives consumers with a higher priority precedence while dispatching messages.

    The broker follows descending priorities. For example, 0=max-priority, 1, 2,…

    In Shared subscription type, the broker first dispatches messages to the max priority level consumers if they have permits. Otherwise, the broker considers next priority level consumers.

    Example 1
    If a subscription has consumerA with priorityLevel 0 and consumerB with priorityLevel 1, then the broker only dispatches messages to consumerA until it runs out of permits and then starts dispatching messages to consumerB.

    Example 2
    Consumer | Priority level | Permits
    C1 | 0 | 2
    C2 | 0 | 1
    C3 | 0 | 1
    C4 | 1 | 2
    C5 | 1 | 1

    The order in which a broker dispatches messages to consumers is: C1, C2, C3, C1, C4, C5, C4.
    Default: 0
    cryptoFailureAction | ConsumerCryptoFailureAction | Action that the consumer takes when it receives a message that cannot be decrypted.
  • FAIL: this is the default option to fail messages until crypto succeeds.
  • DISCARD: silently acknowledge and not deliver the message to an application.
  • CONSUME: deliver encrypted messages to applications. It is the application’s responsibility to decrypt the message.

    In this case, the decompression of the message fails.

    If messages contain batch messages, a client is not able to retrieve individual messages in the batch.

    A delivered encrypted message contains an EncryptionContext, which holds the encryption and compression information that an application can use to decrypt the consumed message payload.
    Default: ConsumerCryptoFailureAction.FAIL
    properties | SortedMap<String, String> | A name or value property of this consumer.

    properties is application-defined metadata attached to a consumer.

    When getting topic stats, this metadata is associated with the consumer stats for easier identification.
    Default: new TreeMap()
    readCompacted | boolean | If readCompacted is enabled, a consumer reads messages from a compacted topic rather than reading the full message backlog of a topic.

    A consumer only sees the latest value for each key in the compacted topic, up until the point in the topic message backlog that has been compacted. Beyond that point, messages are sent as normal.

    readCompacted can only be enabled on subscriptions to persistent topics that have a single active consumer (like Failover or Exclusive subscriptions).

    Attempting to enable it on subscriptions to non-persistent topics or on shared subscriptions leads to a subscription call throwing a PulsarClientException.
    Default: false
    subscriptionInitialPosition | SubscriptionInitialPosition | Initial position at which to set the cursor when subscribing to a topic for the first time. | SubscriptionInitialPosition.Latest
    patternAutoDiscoveryPeriod | int | Topic auto discovery period when using a pattern for topic’s consumer.

    The default and minimum value is 1 minute.
    Default: 1
    regexSubscriptionMode | RegexSubscriptionMode | When subscribing to a topic using a regular expression, you can pick a certain type of topics.

  • PersistentOnly: only subscribe to persistent topics.
  • NonPersistentOnly: only subscribe to non-persistent topics.
  • AllTopics: subscribe to both persistent and non-persistent topics.
    Default: RegexSubscriptionMode.PersistentOnly
    deadLetterPolicy | DeadLetterPolicy | Dead letter policy for consumers.

    By default, some messages may be redelivered many times, possibly without ever stopping.

    By using the dead letter mechanism, messages have a max redelivery count. When the maximum number of redeliveries is exceeded, messages are sent to the Dead Letter Topic and acknowledged automatically.

    You can enable the dead letter mechanism by setting deadLetterPolicy.

    Example

    client.newConsumer()
        .deadLetterPolicy(DeadLetterPolicy.builder().maxRedeliverCount(10).build())
        .subscribe();

    Default dead letter topic name is {TopicName}-{Subscription}-DLQ.

    To set a custom dead letter topic name:

    client.newConsumer()
        .deadLetterPolicy(DeadLetterPolicy.builder().maxRedeliverCount(10)
            .deadLetterTopic("your-topic-name").build())
        .subscribe();

    When specifying the dead letter policy while not specifying ackTimeoutMillis, you can set the ack timeout to 30000 milliseconds.
    Default: None
    autoUpdatePartitions | boolean | If autoUpdatePartitions is enabled, a consumer subscribes to newly added partitions automatically.

    Note: this is only for partitioned consumers.
    Default: true
    replicateSubscriptionState | boolean | If replicateSubscriptionState is enabled, a subscription state is replicated to geo-replicated clusters. | false
    negativeAckRedeliveryBackoff | RedeliveryBackoff | Interface for a custom redelivery policy for negatively acknowledged messages. You can specify RedeliveryBackoff for a consumer. | MultiplierRedeliveryBackoff
    ackTimeoutRedeliveryBackoff | RedeliveryBackoff | Interface for a custom redelivery policy for messages whose acknowledgement timed out. You can specify RedeliveryBackoff for a consumer. | MultiplierRedeliveryBackoff
    autoAckOldestChunkedMessageOnQueueFull | boolean | Whether to automatically acknowledge pending chunked messages when the threshold of maxPendingChunkedMessage is reached. If set to false, these messages are redelivered by the broker. | true
    maxPendingChunkedMessage | int | The maximum size of a queue holding pending chunked messages. When the threshold is reached, the consumer drops pending messages to optimize memory utilization. | 10
    expireTimeOfIncompleteChunkedMessageMillis | long | The time interval to expire incomplete chunks if a consumer fails to receive all the chunks in the specified time period. The default value is 1 minute. | 60000
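The dispatch order given in Example 2 of the priorityLevel description can be reproduced with a small stand-alone simulation. The sketch below does not use the Pulsar client; the class PriorityDispatchSketch and its SimConsumer type are hypothetical names, and the loop only assumes what the table states: consumers on the highest priority level are served in turn while they have permits, then the next level is considered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PriorityDispatchSketch {

    /** Hypothetical consumer: a name, a priority level (0 = highest), and remaining permits. */
    static final class SimConsumer {
        final String name;
        final int priorityLevel;
        int permits;

        SimConsumer(String name, int priorityLevel, int permits) {
            this.name = name;
            this.priorityLevel = priorityLevel;
            this.permits = permits;
        }
    }

    /** Returns the consumer names in the order messages are handed out, one message per permit. */
    static List<String> dispatchOrder(List<SimConsumer> consumers) {
        List<String> order = new ArrayList<>();
        int maxLevel = 0;
        for (SimConsumer c : consumers) {
            maxLevel = Math.max(maxLevel, c.priorityLevel);
        }
        for (int level = 0; level <= maxLevel; level++) {
            boolean dispatched = true;
            // Keep cycling over this level until none of its consumers has permits left.
            while (dispatched) {
                dispatched = false;
                for (SimConsumer c : consumers) {
                    if (c.priorityLevel == level && c.permits > 0) {
                        c.permits--;
                        order.add(c.name);
                        dispatched = true;
                    }
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<SimConsumer> consumers = Arrays.asList(
                new SimConsumer("C1", 0, 2),
                new SimConsumer("C2", 0, 1),
                new SimConsumer("C3", 0, 1),
                new SimConsumer("C4", 1, 2),
                new SimConsumer("C5", 1, 1));
        // prints [C1, C2, C3, C1, C4, C5, C4]
        System.out.println(dispatchOrder(consumers));
    }
}
```

In a real deployment, permits are replenished as consumers process messages, so the actual order depends on timing; the simulation only covers the static case from the example.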

    You can configure parameters if you do not want to use the default configuration. For a full list, see the Javadoc for the ConsumerBuilder class.

    The following is an example.

    Consumer consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .ackTimeout(10, TimeUnit.SECONDS)
        .subscriptionType(SubscriptionType.Exclusive)
        .subscribe();

    Async receive

    The receive method receives messages synchronously (the consumer process is blocked until a message is available). You can also use async receive, which returns a CompletableFuture object immediately. The future completes once a new message is available.

    The following is an example.

    CompletableFuture<Message> asyncMessage = consumer.receiveAsync();

    Async receive operations return a Message wrapped inside of a CompletableFuture.
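How a caller typically reacts to such a future can be sketched with the JDK alone. In the example below, a plain CompletableFuture<String> stands in for the future returned by receiveAsync, and the class name AsyncReceiveSketch and the method handle are hypothetical; the point is only that the callback is registered without blocking and runs when the future completes.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncReceiveSketch {

    /**
     * Registers a callback on a pending "message" future and returns the processed result.
     * The future is a stand-in for what consumer.receiveAsync() would return.
     */
    static String handle(CompletableFuture<String> pending) {
        // thenApply only registers the callback; it does not block the calling thread.
        CompletableFuture<String> processed =
                pending.thenApply(msg -> "Message received: " + msg);
        // join() blocks here only so that this sketch can return the value.
        return processed.join();
    }

    public static void main(String[] args) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        // Simulate a message arriving later from another thread.
        new Thread(() -> pending.complete("hello")).start();
        System.out.println(handle(pending));
    }
}
```

In application code you would usually keep chaining callbacks (for example, acknowledge the message inside the callback) instead of calling join().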

    Batch receive

    Use batchReceive to receive multiple messages for each call.

    The following is an example.

    Messages messages = consumer.batchReceive();
    for (Object message : messages) {
        // do something
    }
    consumer.acknowledge(messages);
    note

    Batch receive policy limits the number and bytes of messages in a single batch. You can specify a timeout to wait for enough messages. The batch receive is completed if any of the following conditions is met: enough messages have accumulated, enough bytes of messages have accumulated, or the wait timeout has elapsed.

    Consumer consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .batchReceivePolicy(BatchReceivePolicy.builder()
            .maxNumMessages(100)
            .maxNumBytes(1024 * 1024)
            .timeout(200, TimeUnit.MILLISECONDS)
            .build())
        .subscribe();

    The default batch receive policy is:

    BatchReceivePolicy.builder()
        .maxNumMessages(-1)
        .maxNumBytes(10 * 1024 * 1024)
        .timeout(100, TimeUnit.MILLISECONDS)
        .build();
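The three completion conditions of a batch receive can be written down as a single predicate. The following is a minimal sketch that does not use the Pulsar client: the class BatchReceiveSketch and the method isBatchComplete are hypothetical, and the sketch assumes that a limit of zero or less means the limit is disabled, which is how the -1 in the default policy is read here.

```java
final class BatchReceiveSketch {

    /**
     * Returns true when a pending batch would be handed to the application.
     * Assumption of this sketch: a limit that is zero or negative is disabled.
     */
    static boolean isBatchComplete(int numMessages, long numBytes, long elapsedMs,
                                   int maxNumMessages, long maxNumBytes, long timeoutMs) {
        boolean enoughMessages = maxNumMessages > 0 && numMessages >= maxNumMessages;
        boolean enoughBytes = maxNumBytes > 0 && numBytes >= maxNumBytes;
        boolean timedOut = timeoutMs > 0 && elapsedMs >= timeoutMs;
        // Any single condition is sufficient to complete the batch.
        return enoughMessages || enoughBytes || timedOut;
    }

    public static void main(String[] args) {
        // Policy from the example: 100 messages, 1 MB, 200 ms.
        System.out.println(isBatchComplete(100, 4096, 20, 100, 1024 * 1024, 200));
        System.out.println(isBatchComplete(5, 4096, 250, 100, 1024 * 1024, 200));
        // Default policy: no message-count limit, 10 MB, 100 ms.
        System.out.println(isBatchComplete(1000, 4096, 50, -1, 10 * 1024 * 1024, 100));
    }
}
```

With the default policy, the message count never completes a batch on its own; only the byte limit or the timeout does.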

    Configure chunking

    You can limit the maximum number of chunked messages a consumer maintains concurrently by configuring the maxPendingChunkedMessage and autoAckOldestChunkedMessageOnQueueFull parameters. When the threshold is reached, the consumer drops pending messages by silently acknowledging them or asking the broker to redeliver them later. The expireTimeOfIncompleteChunkedMessage parameter decides the time interval to expire incomplete chunks if the consumer fails to receive all chunks of a message within the specified time period.

    The following is an example of how to configure message chunking.

    Consumer<byte[]> consumer = client.newConsumer()
        .topic(topic)
        .subscriptionName("test")
        .autoAckOldestChunkedMessageOnQueueFull(true)
        .maxPendingChunkedMessage(100)
        .expireTimeOfIncompleteChunkedMessage(10, TimeUnit.MINUTES)
        .subscribe();

    Negative acknowledgment redelivery backoff

    The RedeliveryBackoff introduces a redelivery backoff mechanism. Messages are redelivered with different delays depending on their redeliveryCount.

    Consumer consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .negativeAckRedeliveryBackoff(MultiplierRedeliveryBackoff.builder()
            .minDelayMs(1000)
            .maxDelayMs(60 * 1000)
            .build())
        .subscribe();

    Acknowledgement timeout redelivery backoff

    The RedeliveryBackoff introduces a redelivery backoff mechanism. You can redeliver messages with different delays depending on the number of times a message is retried.

    Consumer consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .ackTimeout(10, TimeUnit.SECONDS)
        .ackTimeoutRedeliveryBackoff(MultiplierRedeliveryBackoff.builder()
            .minDelayMs(1000)
            .maxDelayMs(60000)
            .multiplier(2)
            .build())
        .subscribe();

    The message redelivery behavior should be as follows.

    Redelivery count | Redelivery delay
    1 | 10 + 1 seconds
    2 | 10 + 2 seconds
    3 | 10 + 4 seconds
    4 | 10 + 8 seconds
    5 | 10 + 16 seconds
    6 | 10 + 32 seconds
    7 | 10 + 60 seconds
    8 | 10 + 60 seconds
    note
    • The negativeAckRedeliveryBackoff does not work with consumer.negativeAcknowledge(MessageId messageId) because you are not able to get the redelivery count from the message ID.
    • If a consumer crashes, it triggers the redelivery of unacked messages. In this case, RedeliveryBackoff does not take effect and the messages might get redelivered earlier than the delay time from the backoff.
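The delays in the redelivery table follow a capped geometric progression, which can be checked with a few lines of plain Java. The sketch below does not call the Pulsar client: the class RedeliveryBackoffSketch and the method backoffDelayMs are hypothetical, and the exponent (redelivery count minus one) is an assumption chosen so that the output matches the table, not a statement about the library's internal counting.

```java
final class RedeliveryBackoffSketch {

    /**
     * Backoff part of the redelivery delay for the n-th redelivery (n starts at 1),
     * computed as minDelayMs * multiplier^(n - 1) and capped at maxDelayMs.
     */
    static long backoffDelayMs(int redelivery, long minDelayMs, long maxDelayMs, double multiplier) {
        double delay = minDelayMs * Math.pow(multiplier, redelivery - 1);
        return (long) Math.min(delay, (double) maxDelayMs);
    }

    public static void main(String[] args) {
        long ackTimeoutSeconds = 10; // ack timeout used in the example above
        for (int n = 1; n <= 8; n++) {
            long backoffSeconds = backoffDelayMs(n, 1000, 60000, 2.0) / 1000;
            System.out.println(n + ": " + ackTimeoutSeconds + " + " + backoffSeconds + " seconds");
        }
    }
}
```

From the seventh redelivery onward the computed value (64 seconds) exceeds maxDelayMs, so the delay stays at 60 seconds.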

    Multi-topic subscriptions

    In addition to subscribing a consumer to a single Pulsar topic, you can also subscribe to multiple topics simultaneously using multi-topic subscriptions. To use multi-topic subscriptions you can supply either a regular expression (regex) or a List of topics. If you select topics via regex, all topics must be within the same Pulsar namespace.

    The following are some examples.

    import org.apache.pulsar.client.api.Consumer;
    import org.apache.pulsar.client.api.ConsumerBuilder;
    import org.apache.pulsar.client.api.PulsarClient;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Pattern;

    ConsumerBuilder consumerBuilder = pulsarClient.newConsumer()
        .subscriptionName(subscription);

    // Subscribe to all topics in a namespace
    Pattern allTopicsInNamespace = Pattern.compile("public/default/.*");
    Consumer allTopicsConsumer = consumerBuilder
        .topicsPattern(allTopicsInNamespace)
        .subscribe();

    // Subscribe to a subset of topics in a namespace, based on regex
    Pattern someTopicsInNamespace = Pattern.compile("public/default/foo.*");
    Consumer someTopicsConsumer = consumerBuilder
        .topicsPattern(someTopicsInNamespace)
        .subscribe();

    In the above example, the consumer subscribes to the persistent topics that match the topic name pattern. If you want the consumer to subscribe to all persistent and non-persistent topics that match the topic name pattern, set subscriptionTopicsMode to RegexSubscriptionMode.AllTopics.

    Pattern pattern = Pattern.compile("public/default/.*");
    pulsarClient.newConsumer()
        .subscriptionName("my-sub")
        .topicsPattern(pattern)
        .subscriptionTopicsMode(RegexSubscriptionMode.AllTopics)
        .subscribe();
    note

    By default, the subscriptionTopicsMode of the consumer is PersistentOnly. Available options of subscriptionTopicsMode are PersistentOnly, NonPersistentOnly, and AllTopics.
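To see which names a given topic pattern selects before handing it to the consumer builder, you can test the regular expression locally. The sketch below uses only java.util.regex; the class TopicPatternSketch, the method matching, and the sample topic names are hypothetical, and it does not model how the client resolves full topic names or applies the subscription topics mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class TopicPatternSketch {

    /** Returns the names that the pattern matches in full, preserving input order. */
    static List<String> matching(Pattern pattern, List<String> topicNames) {
        List<String> result = new ArrayList<>();
        for (String name : topicNames) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical topic names in tenant/namespace/topic form.
        List<String> topics = Arrays.asList(
                "public/default/foo-1",
                "public/default/foo-2",
                "public/default/bar",
                "public/other/foo-1");
        // prints [public/default/foo-1, public/default/foo-2]
        System.out.println(matching(Pattern.compile("public/default/foo.*"), topics));
        // prints [public/default/foo-1, public/default/foo-2, public/default/bar]
        System.out.println(matching(Pattern.compile("public/default/.*"), topics));
    }
}
```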

    You can also subscribe to an explicit list of topics (across namespaces if you wish):

    List<String> topics = Arrays.asList(
        "topic-1",
        "topic-2",
        "topic-3"
    );
    Consumer multiTopicConsumer = consumerBuilder
        .topics(topics)
        .subscribe();

    // Alternatively:
    Consumer multiTopicConsumer = consumerBuilder
        .topic(
            "topic-1",
            "topic-2",
            "topic-3"
        )
        .subscribe();

    You can also subscribe to multiple topics asynchronously using the subscribeAsync method rather than the synchronous subscribe method. The following is an example.

    Pattern allTopicsInNamespace = Pattern.compile("persistent://public/default.*");
    consumerBuilder
        .topics(topics)
        .subscribeAsync()
        .thenAccept(this::receiveMessageFromConsumer);

    private void receiveMessageFromConsumer(Object consumer) {
        ((Consumer) consumer).receiveAsync().thenAccept(message -> {
            // Do something with the received message
            receiveMessageFromConsumer(consumer);
        });
    }

    Subscription types

    Pulsar has various subscription types to match different scenarios. A topic can have multiple subscriptions with different subscription types. However, a subscription can only have one subscription type at a time.

    A subscription is identified by its subscription name; a subscription name can specify only one subscription type at a time. To change the subscription type, you should first stop all consumers of this subscription.

    Different subscription types have different message distribution types. This section describes the differences between subscription types and how to use them.

    To better describe their differences, assume that you have a topic named "my-topic" and that the producer has published 10 messages.

    Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("my-topic")
        .enableBatching(false)
        .create();
    // 3 messages with "key-1", 3 messages with "key-2", 2 messages with "key-3" and 2 messages with "key-4"
    producer.newMessage().key("key-1").value("message-1-1").send();
    producer.newMessage().key("key-1").value("message-1-2").send();
    producer.newMessage().key("key-1").value("message-1-3").send();
    producer.newMessage().key("key-2").value("message-2-1").send();
    producer.newMessage().key("key-2").value("message-2-2").send();
    producer.newMessage().key("key-2").value("message-2-3").send();
    producer.newMessage().key("key-3").value("message-3-1").send();
    producer.newMessage().key("key-3").value("message-3-2").send();
    producer.newMessage().key("key-4").value("message-4-1").send();
    producer.newMessage().key("key-4").value("message-4-2").send();
    Exclusive

    Create a new consumer and subscribe with the Exclusive subscription type.

    Consumer consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Exclusive)
        .subscribe();

    Only the first consumer is allowed to attach to the subscription; other consumers receive an error. The first consumer receives all 10 messages, and the consuming order is the same as the producing order.

    note

    If the topic is a partitioned topic, the first consumer subscribes to all partitions; other consumers are not assigned any partitions and receive an error.

    Failover

    Create new consumers and subscribe with the Failover subscription type.

    Consumer consumer1 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Failover)
        .subscribe();
    Consumer consumer2 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Failover)
        .subscribe();
    // consumer1 is the active consumer, consumer2 is the standby consumer.
    // consumer1 receives 5 messages and then crashes, consumer2 takes over as an active consumer.

    Multiple consumers can attach to the same subscription, yet only the first consumer is active, and the others are standby. When the active consumer is disconnected, messages are dispatched to one of the standby consumers, and that standby consumer then becomes the active consumer.

    If the first active consumer is disconnected after receiving 5 messages, the standby consumer becomes the active consumer. consumer1 will receive:

    ("key-1", "message-1-1")
    ("key-1", "message-1-2")
    ("key-1", "message-1-3")
    ("key-2", "message-2-1")
    ("key-2", "message-2-2")

    consumer2 will receive:

    ("key-2", "message-2-3")
    ("key-3", "message-3-1")
    ("key-3", "message-3-2")
    ("key-4", "message-4-1")
    ("key-4", "message-4-2")
    note

    If a topic is a partitioned topic, each partition has only one active consumer, messages of one partition are distributed to only one consumer, and messages of multiple partitions are distributed to multiple consumers.

    Shared

    Create new consumers and subscribe with Shared subscription type.

    Consumer consumer1 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Shared)
        .subscribe();
    Consumer consumer2 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Shared)
        .subscribe();
    // Both consumer1 and consumer2 are active consumers.

    In Shared subscription type, multiple consumers can attach to the same subscription and messages are delivered in a round robin distribution across consumers.

    If a broker dispatches only one message at a time, consumer1 receives the following information.

    ("key-1", "message-1-1")
    ("key-1", "message-1-3")
    ("key-2", "message-2-2")
    ("key-3", "message-3-1")
    ("key-4", "message-4-1")

    consumer2 receives the following information.

    ("key-1", "message-1-2")
    ("key-2", "message-2-1")
    ("key-2", "message-2-3")
    ("key-3", "message-3-2")
    ("key-4", "message-4-2")

    Shared subscription is different from Exclusive and Failover subscription types. Shared subscription has better flexibility, but cannot provide order guarantee.
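The round-robin split shown for consumer1 and consumer2 can be reproduced without a broker. The following sketch is a plain-Java simulation with hypothetical names (SharedRoundRobinSketch, distribute); it assumes, as the text does, that exactly one message is dispatched at a time and that both consumers always have permits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedRoundRobinSketch {

    /** Hands out messages one at a time, cycling over the consumers in order. */
    static List<List<String>> distribute(List<String> messages, int consumerCount) {
        List<List<String>> perConsumer = new ArrayList<>();
        for (int i = 0; i < consumerCount; i++) {
            perConsumer.add(new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            perConsumer.get(i % consumerCount).add(messages.get(i));
        }
        return perConsumer;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
                "message-1-1", "message-1-2", "message-1-3",
                "message-2-1", "message-2-2", "message-2-3",
                "message-3-1", "message-3-2",
                "message-4-1", "message-4-2");
        List<List<String>> result = distribute(messages, 2);
        // prints [message-1-1, message-1-3, message-2-2, message-3-1, message-4-1]
        System.out.println(result.get(0));
        // prints [message-1-2, message-2-1, message-2-3, message-3-2, message-4-2]
        System.out.println(result.get(1));
    }
}
```

Messages that share a key end up on different consumers, which is why the Shared type cannot guarantee ordering.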

    Key_Shared

    This is a new subscription type since 2.4.0 release. Create new consumers and subscribe with Key_Shared subscription type.

    Consumer consumer1 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Key_Shared)
        .subscribe();
    Consumer consumer2 = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .subscriptionType(SubscriptionType.Key_Shared)
        .subscribe();
    // Both consumer1 and consumer2 are active consumers.

    Just like in Shared subscription, all consumers in Key_Shared subscription type can attach to the same subscription. But Key_Shared subscription type is different from the Shared subscription. In Key_Shared subscription type, messages with the same key are delivered to only one consumer in order. The following is one possible distribution of messages between consumers. By default, you do not know in advance which keys are assigned to which consumer, but a key is assigned to only one consumer at a time.

    consumer1 receives the following information.

    ("key-1", "message-1-1")
    ("key-1", "message-1-2")
    ("key-1", "message-1-3")
    ("key-3", "message-3-1")
    ("key-3", "message-3-2")

    consumer2 receives the following information.

    ("key-2", "message-2-1")
    ("key-2", "message-2-2")
    ("key-2", "message-2-3")
    ("key-4", "message-4-1")
    ("key-4", "message-4-2")
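The property that matters here, that every message with a given key goes to the same consumer and stays in order, can be illustrated with a simplified model. The sketch below is not how the broker assigns keys: the class KeySharedSketch is hypothetical, and String.hashCode() modulo the consumer count is used as a stand-in for the broker's key hashing, so the concrete key-to-consumer assignment will differ from a real cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KeySharedSketch {

    /** Picks a consumer index for a key. String.hashCode() is only a stand-in hash. */
    static int consumerFor(String key, int consumerCount) {
        return Math.floorMod(key.hashCode(), consumerCount);
    }

    /** Distributes {key, value} pairs so that each key is always handled by one consumer. */
    static Map<Integer, List<String>> distribute(List<String[]> keyedMessages, int consumerCount) {
        Map<Integer, List<String>> perConsumer = new TreeMap<>();
        for (String[] keyAndValue : keyedMessages) {
            int index = consumerFor(keyAndValue[0], consumerCount);
            perConsumer.computeIfAbsent(index, i -> new ArrayList<>()).add(keyAndValue[1]);
        }
        return perConsumer;
    }

    public static void main(String[] args) {
        List<String[]> messages = new ArrayList<>();
        messages.add(new String[] {"key-1", "message-1-1"});
        messages.add(new String[] {"key-2", "message-2-1"});
        messages.add(new String[] {"key-1", "message-1-2"});
        messages.add(new String[] {"key-2", "message-2-2"});
        // Each consumer index maps to the values it received, in arrival order.
        System.out.println(distribute(messages, 2));
    }
}
```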

    If batching is enabled at the producer side, messages with different keys are added to a batch by default. The broker will dispatch the batch to the consumer, so the default batch mechanism may break the Key_Shared subscription guaranteed message distribution semantics. The producer needs to use the KeyBasedBatcher.

    Producer producer = client.newProducer()
        .topic("my-topic")
        .batcherBuilder(BatcherBuilder.KEY_BASED)
        .create();

    Or the producer can disable batching.

    Producer producer = client.newProducer()
        .topic("my-topic")
        .enableBatching(false)
        .create();
    note

    If the message key is not specified, messages without key are dispatched to one consumer in order by default.

    Reader

    With the reader interface, Pulsar clients can “manually position” themselves within a topic and read all messages from a specified message onward. The Pulsar API for Java enables you to create Reader objects by specifying a topic and a MessageId.

    The following is an example.

    byte[] msgIdBytes = // Some message ID byte array
    MessageId id = MessageId.fromByteArray(msgIdBytes);
    Reader reader = pulsarClient.newReader()
        .topic(topic)
        .startMessageId(id)
        .create();
    while (true) {
        Message message = reader.readNext();
        // Process message
    }

    In the example above, a Reader object is instantiated for a specific topic and message (by ID); the reader iterates over each message in the topic after the message identified by msgIdBytes (how that value is obtained depends on the application).

    The code sample above shows pointing the Reader object to a specific message (by ID), but you can also use MessageId.earliest to point to the earliest available message on the topic or MessageId.latest to point to the most recent available message.

    Configure reader

    When you create a reader, you can use the loadConf configuration. The following parameters are available in loadConf.

    Name | Type | Description | Default
    topicName | String | Topic name. | None
    receiverQueueSize | int | Size of a consumer’s receiver queue.

    For example, the number of messages that can be accumulated by a consumer before an application calls receive.

    A value higher than the default value increases consumer throughput, though at the expense of more memory utilization.
    Default: 1000
    readerListener | ReaderListener<T> | A listener that is called for each message received. | None
    readerName | String | Reader name. | null
    subscriptionName | String | Subscription name | When there is a single topic, the default subscription name is "reader-" + 10-digit UUID. When there are multiple topics, the default subscription name is "multiTopicsReader-" + 10-digit UUID.
    subscriptionRolePrefix | String | Prefix of subscription role. | null
    cryptoKeyReader | CryptoKeyReader | Interface that abstracts the access to a key store. | null
    cryptoFailureAction | ConsumerCryptoFailureAction | Action that the consumer takes when it receives a message that cannot be decrypted.
  • FAIL: this is the default option to fail messages until crypto succeeds.
  • DISCARD: silently acknowledge and not deliver the message to an application.
  • CONSUME: deliver encrypted messages to applications. It is the application’s responsibility to decrypt the message.

    In this case, the message decompression fails.

    If messages contain batch messages, a client is not able to retrieve individual messages in the batch.

    A delivered encrypted message contains an EncryptionContext, which holds the encryption and compression information that an application can use to decrypt the consumed message payload.
    Default: ConsumerCryptoFailureAction.FAIL
    readCompacted | boolean | If readCompacted is enabled, a consumer reads messages from a compacted topic rather than the full message backlog of a topic.

    A consumer only sees the latest value for each key in the compacted topic, up until the point in the topic message backlog that has been compacted. Beyond that point, messages are sent as normal.

    readCompacted can only be enabled on subscriptions to persistent topics that have a single active consumer (for example, Failover or Exclusive subscriptions).

    Attempting to enable it on subscriptions to non-persistent topics or on shared subscriptions leads to a subscription call throwing a PulsarClientException.
    Default: false
    resetIncludeHead | boolean | If set to true, the first message to be returned is the one specified by messageId.

    If set to false, the first message to be returned is the one next to the message specified by messageId.
    Default: false

    Sticky key range reader

    With a sticky key range reader, the broker only dispatches messages whose message key hash falls within the specified key hash range. Multiple key hash ranges can be specified on a reader.

    The following is an example to create a sticky key range reader.

    pulsarClient.newReader()
        .topic(topic)
        .startMessageId(MessageId.earliest)
        .keyHashRange(Range.of(0, 10000), Range.of(20001, 30000))
        .create();

    Total hash range size is 65536, so the max end of the range should be less than or equal to 65535.
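The range check itself is simple enough to sketch in plain Java. In the example below, the class KeyHashRangeSketch and its methods are hypothetical, the ranges are assumed to be inclusive at both ends (which is how the limit of 65535 reads), and String.hashCode() is only a stand-in for the hash function the broker applies to message keys, so the slot computed for a particular key will not match a real cluster.

```java
final class KeyHashRangeSketch {

    static final int HASH_RANGE_SIZE = 65536;

    /** Maps a key to a slot in [0, 65535]. String.hashCode() is only a stand-in hash. */
    static int slot(String key) {
        return Math.floorMod(key.hashCode(), HASH_RANGE_SIZE);
    }

    /** A range is valid when 0 <= start <= end <= 65535. */
    static boolean isValidRange(int start, int end) {
        return start >= 0 && start <= end && end < HASH_RANGE_SIZE;
    }

    /** Returns true when the slot falls into at least one of the inclusive ranges. */
    static boolean accepts(int slot, int[][] ranges) {
        for (int[] range : ranges) {
            if (slot >= range[0] && slot <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same ranges as in the reader example above.
        int[][] ranges = {{0, 10000}, {20001, 30000}};
        int keySlot = slot("key-1");
        System.out.println("slot=" + keySlot + " accepted=" + accepts(keySlot, ranges));
        System.out.println(isValidRange(0, 65535));  // prints true
        System.out.println(isValidRange(0, 65536));  // prints false
    }
}
```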

    TableView

    The TableView interface serves an encapsulated access pattern, providing a continuously updated key-value map view of the compacted topic data. Messages without keys will be ignored.

    With TableView, Pulsar clients can fetch all the message updates from a topic and construct a map with the latest values of each key. These values can then be used to build a local cache of data. In addition, you can register consumers with the TableView by specifying a listener to perform a scan of the map and then receive notifications when new messages are received. Consequently, event handling can be triggered to serve use cases, such as event-driven applications and message monitoring.

    Note: Each TableView uses one Reader instance per partition, and reads the topic starting from the compacted view by default. It is highly recommended to enable automatic compaction by configuring the topic compaction policies for the given topic or namespace. More frequent compaction results in shorter startup times because less data is replayed to reconstruct the TableView of the topic.
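Conceptually, the view is a map that keeps the latest value per key and notifies listeners on every update. The following is a minimal in-memory sketch of that idea, not the Pulsar implementation: the class TableViewSketch and its methods onMessage, forEachAndListen, get, and size are hypothetical names chosen to mirror the behavior described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class TableViewSketch {

    private final Map<String, String> latest = new LinkedHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    /** Applies one incoming message to the view. Messages without a key are ignored. */
    void onMessage(String key, String value) {
        if (key == null) {
            return;
        }
        latest.put(key, value); // a newer value replaces the older one for the same key
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
    }

    /** Runs the action on all current entries, then on every later update. */
    void forEachAndListen(BiConsumer<String, String> action) {
        latest.forEach(action);
        listeners.add(action);
    }

    String get(String key) {
        return latest.get(key);
    }

    int size() {
        return latest.size();
    }

    public static void main(String[] args) {
        TableViewSketch view = new TableViewSketch();
        view.onMessage("sensor-a", "20.5");
        view.forEachAndListen((k, v) -> System.out.println(k + " -> " + v));
        view.onMessage(null, "ignored");
        view.onMessage("sensor-a", "21.0");
        System.out.println(view.get("sensor-a")); // prints 21.0
    }
}
```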

    The following figure illustrates the dynamic construction of a TableView updated with newer values of each key.

    Configure TableView

    The following is an example of how to configure a TableView.

    TableView<String> tv = client.newTableViewBuilder(Schema.STRING)
        .topic("my-tableview")
        .create();

    You can use the available parameters in the loadConf configuration or related API to customize your TableView.

    Name | Type | Required? | Description | Default
    topic | string | yes | The topic name of the TableView. | N/A
    autoUpdatePartitionInterval | int | no | The interval to check for newly added partitions. | 60 (seconds)

    Register listeners

    You can register listeners for both existing messages on a topic and new messages coming into the topic by using forEachAndListen, and specify to perform operations for all existing messages by using forEach.

    The following is an example of how to register listeners with TableView.

    // Register listeners for all existing and incoming messages
    tv.forEachAndListen((key, value) -> { /* operations on all existing and incoming messages */ });
    // Register action for all existing messages
    tv.forEach((key, value) -> { /* operations on all existing messages */ });

    Schema

    In Pulsar, all message data consists of byte arrays “under the hood.” Message schemas enable you to use other types of data when constructing and handling messages (from simple types like strings to more complex, application-specific types). If you construct, say, a producer without specifying a schema, then the producer can only produce messages of type byte[]. The following is an example.

    1. Producer<byte[]> producer = client.newProducer()
    2. .topic(topic)
    3. .create();

    The producer above is equivalent to a Producer<byte[]> (in fact, you should always explicitly specify the type). If you’d like to use a producer for a different type of data, you’ll need to specify a schema that informs Pulsar which data type will be transmitted over the topic.

    AvroBaseStructSchema example

    Let’s say that you have a SensorReading class that you’d like to transmit over a Pulsar topic:

    public class SensorReading {
        public float temperature;

        public SensorReading(float temperature) {
            this.temperature = temperature;
        }

        // A no-arg constructor is required
        public SensorReading() {
        }

        public float getTemperature() {
            return temperature;
        }

        public void setTemperature(float temperature) {
            this.temperature = temperature;
        }
    }

    You could then create a Producer<SensorReading> (or Consumer<SensorReading>) like this:

    Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))
        .topic("sensor-readings")
        .create();

    The following schema formats are currently available for Java:

    • No schema or the byte array schema (which can be applied using Schema.BYTES):

      1. Producer<byte[]> bytesProducer = client.newProducer(Schema.BYTES)
      2. .topic("some-raw-bytes-topic")
      3. .create();
    1. Or, equivalently:
    2. ```
    3. Producer<byte[]> bytesProducer = client.newProducer()
    4. .topic("some-raw-bytes-topic")
    5. .create();
    6. ```
    • String for normal UTF-8-encoded string data. Apply the schema using Schema.STRING:

      1. Producer<String> stringProducer = client.newProducer(Schema.STRING)
      2. .topic("some-string-topic")
      3. .create();
    • Create JSON schemas for POJOs using Schema.JSON. The following is an example.

      Producer<MyPojo> pojoProducer = client.newProducer(Schema.JSON(MyPojo.class))
              .topic("some-pojo-topic")
              .create();
    • Generate Protobuf schemas using Schema.PROTOBUF. The following example shows how to create the Protobuf schema and use it to instantiate a new producer:

      Producer<MyProtobuf> protobufProducer = client.newProducer(Schema.PROTOBUF(MyProtobuf.class))
              .topic("some-protobuf-topic")
              .create();
    • Define Avro schemas with Schema.AVRO. The following code snippet demonstrates how to create and use Avro schema.

      Producer<MyAvro> avroProducer = client.newProducer(Schema.AVRO(MyAvro.class))
              .topic("some-avro-topic")
              .create();

    ProtobufNativeSchema example

    For an example of ProtobufNativeSchema, see SchemaDefinition in Complex type.

    Authentication

    Pulsar currently supports three authentication schemes: TLS, Athenz, and OAuth2. You can use the Pulsar Java client with all of them.

    TLS Authentication

    To use TLS authentication, enable TLS with the enableTls method, point your Pulsar client to a trusted CA certificate with tlsTrustCertsFilePath, and provide the paths to the client certificate and key files in the authentication parameters.

    The following is an example.

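    import java.util.HashMap;
    import java.util.Map;
    import org.apache.pulsar.client.api.Authentication;
    import org.apache.pulsar.client.api.AuthenticationFactory;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.impl.auth.AuthenticationTls;

    // Client certificate and private key presented to the broker
    Map<String, String> authParams = new HashMap<>();
    authParams.put("tlsCertFile", "/path/to/client-cert.pem");
    authParams.put("tlsKeyFile", "/path/to/client-key.pem");

    Authentication tlsAuth = AuthenticationFactory
            .create(AuthenticationTls.class.getName(), authParams);

    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar+ssl://my-broker.com:6651")
            .enableTls(true)
            .tlsTrustCertsFilePath("/path/to/cacert.pem") // CA certificate used to verify the broker
            .authentication(tlsAuth)
            .build();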
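
    The example above pairs enableTls(true) with a pulsar+ssl:// service URL. As a small, library-free sketch, a pre-flight helper can confirm that a configured URL uses the TLS scheme before the client is built. ServiceUrlCheck and usesTls are hypothetical names, not part of the Pulsar client.

```java
import java.net.URI;

// Hypothetical pre-flight helper; not part of the Pulsar client API.
final class ServiceUrlCheck {
    private ServiceUrlCheck() {
    }

    // Returns true when the service URL uses the pulsar+ssl scheme,
    // as the URL in the TLS example does.
    static boolean usesTls(String serviceUrl) {
        String scheme = URI.create(serviceUrl).getScheme();
        return "pulsar+ssl".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(usesTls("pulsar+ssl://my-broker.com:6651"));
        System.out.println(usesTls("pulsar://my-broker.com:6650"));
    }
}
```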

    Athenz

    To use Athenz as an authentication provider, you need to use TLS and provide values for the following four parameters in a map:

    • tenantDomain
    • tenantService
    • providerDomain
    • privateKey

    You can also set an optional keyId. The following is an example.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.pulsar.client.api.Authentication;
    import org.apache.pulsar.client.api.AuthenticationFactory;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.impl.auth.AuthenticationAthenz;

    Map<String, String> authParams = new HashMap<>();
    authParams.put("tenantDomain", "shopping"); // Tenant domain name
    authParams.put("tenantService", "some_app"); // Tenant service name
    authParams.put("providerDomain", "pulsar"); // Provider domain name
    authParams.put("privateKey", "file:///path/to/private.pem"); // Tenant private key path
    authParams.put("keyId", "v1"); // Key id for the tenant private key (optional, default: "0")

    Authentication athenzAuth = AuthenticationFactory
            .create(AuthenticationAthenz.class.getName(), authParams);

    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar+ssl://my-broker.com:6651")
            .enableTls(true)
            .tlsTrustCertsFilePath("/path/to/cacert.pem")
            .authentication(athenzAuth)
            .build();
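
    Because the four required parameters are plain map entries, a typo in a key name only surfaces at runtime. The following library-free sketch checks the map before it is handed to AuthenticationFactory. AthenzParams and missing are hypothetical names introduced for illustration; they are not part of Pulsar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pulsar or its Athenz plugin.
final class AthenzParams {
    // The four required parameter names listed in the text above.
    static final List<String> REQUIRED = Arrays.asList(
            "tenantDomain", "tenantService", "providerDomain", "privateKey");

    private AthenzParams() {
    }

    // Returns the required keys that are absent or empty in authParams.
    static List<String> missing(Map<String, String> authParams) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = authParams.get(key);
            if (value == null || value.isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> authParams = new HashMap<>();
        authParams.put("tenantDomain", "shopping");
        authParams.put("tenantService", "some_app");
        System.out.println("Missing: " + missing(authParams));
    }
}
```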

    Supported pattern formats

    The privateKey parameter supports the following three pattern formats:

    • file:///path/to/file
    • file:/path/to/file
    • data:application/x-pem-file;base64,<base64-encoded value>
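
    To illustrate the difference between these formats, the sketch below resolves a value to raw key bytes: the two file: forms are read from disk, and the data: form is Base64-decoded inline. PrivateKeyValue and load are hypothetical names, and this is only one plausible reading of the formats; how the Athenz plugin itself resolves the value is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper for illustration; not part of Pulsar.
final class PrivateKeyValue {
    static final String DATA_PREFIX = "data:application/x-pem-file;base64,";

    private PrivateKeyValue() {
    }

    // Resolves a privateKey value in one of the three listed formats to bytes.
    static byte[] load(String value) {
        if (value.startsWith(DATA_PREFIX)) {
            // Inline form: everything after the prefix is Base64-encoded content.
            return Base64.getDecoder().decode(value.substring(DATA_PREFIX.length()));
        }
        if (value.startsWith("file:")) {
            // Covers both file:///path/to/file and file:/path/to/file.
            try {
                return Files.readAllBytes(Paths.get(URI.create(value)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("Unsupported privateKey format: " + value);
    }

    public static void main(String[] args) {
        // Placeholder content standing in for real PEM data.
        String encoded = Base64.getEncoder()
                .encodeToString("placeholder key material".getBytes(StandardCharsets.UTF_8));
        byte[] key = load(DATA_PREFIX + encoded);
        System.out.println(new String(key, StandardCharsets.UTF_8));
    }
}
```

    The data: form is convenient when the key comes from an environment variable or a secret store and no file is available on disk.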

    OAuth2

    You can use OAuth2 as an authentication provider for the Pulsar Java client.

    One option is to configure authentication with the AuthenticationFactoryOAuth2.clientCredentials factory method:

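    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;

    // issuerUrl and credentialsUrl are java.net.URL fields; audience is a String
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://broker.example.com:6650/")
            .authentication(
                    AuthenticationFactoryOAuth2.clientCredentials(this.issuerUrl, this.credentialsUrl, this.audience))
            .build();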

    Alternatively, you can pass the parameters as an encoded JSON string to configure authentication for the Pulsar Java client.

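    import org.apache.pulsar.client.api.Authentication;
    import org.apache.pulsar.client.api.AuthenticationFactory;
    import org.apache.pulsar.client.api.PulsarClient;
    import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationOAuth2;

    // The JSON parameters are passed as one string, so inner quotes must be escaped
    Authentication auth = AuthenticationFactory
            .create(AuthenticationOAuth2.class.getName(), "{\"type\":\"client_credentials\",\"privateKey\":\"...\",\"issuerUrl\":\"...\",\"audience\":\"...\"}");
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://broker.example.com:6650/")
            .authentication(auth)
            .build();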
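
    Writing the encoded parameter string by hand is error-prone because every inner quote has to be escaped. The following library-free sketch builds the string from a map instead. OAuth2Params, toJson, and escape are hypothetical names, the URLs and paths are placeholders, and the escaping handles only backslashes and double quotes, so this is not a general JSON encoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Pulsar.
final class OAuth2Params {
    private OAuth2Params() {
    }

    // Serializes a flat map of string parameters as a JSON object,
    // keeping the insertion order of the map.
    static String toJson(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(entry.getKey())).append("\":\"")
                    .append(escape(entry.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes backslashes and double quotes only; enough for simple values
    // such as URLs and file paths.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("type", "client_credentials");
        params.put("privateKey", "file:///path/to/credentials.json"); // placeholder
        params.put("issuerUrl", "https://issuer.example.com");        // placeholder
        params.put("audience", "https://broker.example.com");         // placeholder
        System.out.println(toJson(params));
    }
}
```

    The resulting string can then be passed as the second argument of AuthenticationFactory.create in place of the hand-written literal.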