How to debug Pulsar connectors

This guide explains how to debug connectors in localrun or cluster mode and gives a debugging checklist. To better demonstrate how to debug Pulsar connectors, take the Mongo sink connector as an example.

Deploy a Mongo sink environment

  1. Start a Mongo service.

    1. docker pull mongo:4
    2. docker run -d -p 27017:27017 --name pulsar-mongo -v $PWD/data:/data/db mongo:4
  2. Create a DB and a collection.

    1. docker exec -it pulsar-mongo /bin/bash
    2. mongo
    3. > use pulsar
    4. > db.createCollection('messages')
    5. > exit
  3. Start Pulsar standalone.

    1. docker pull apachepulsar/pulsar:2.4.0
    2. docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --link pulsar-mongo --name pulsar-mongo-standalone apachepulsar/pulsar:2.4.0 bin/pulsar standalone
  4. Configure the Mongo sink with the mongo-sink-config.yaml file.

    1. configs:
    2. mongoUri: "mongodb://pulsar-mongo:27017"
    3. database: "pulsar"
    4. collection: "messages"
    5. batchSize: 2
    6. batchTimeMs: 500
    1. docker cp mongo-sink-config.yaml pulsar-mongo-standalone:/pulsar/
  5. Download the Mongo sink nar package.

    1. docker exec -it pulsar-mongo-standalone /bin/bash
    2. curl -O http://apache.01link.hk/pulsar/pulsar-2.4.0/connectors/pulsar-io-mongo-2.4.0.nar

Debug in localrun mode

Start the Mongo sink in localrun mode using the localrun command.

Debug - 图1tip

For more information about the localrun command, see localrun.

  1. ./bin/pulsar-admin sinks localrun \
  2. --archive $PWD/connectors/pulsar-io-mongo-3.3.2.nar \
  3. --tenant public --namespace default \
  4. --inputs test-mongo \
  5. --name pulsar-mongo-sink \
  6. --sink-config-file $PWD/mongo-sink-config.yaml \
  7. --parallelism 1

Use connector log

To debug a connector in localrun mode, you can use one of the following methods to get a connector log:

  • After executing the localrun command, the log is automatically printed on the console.

  • The log is located at:

    1. logs/functions/tenant/namespace/function-name/function-name-instance-id.log

    Example

    The path of the Mongo sink connector is:

    1. logs/functions/public/default/pulsar-mongo-sink/pulsar-mongo-sink-0.log

To clearly explain the log information, the following is a breakdown into smaller blocks with added descriptions.

  • This piece of log information shows the storage path of the nar package after decompression.

    1. 08:21:54.132 [main] INFO org.apache.pulsar.common.nar.NarClassLoader - Created class loader with paths: [file:/tmp/pulsar-nar/pulsar-io-mongo-2.4.0.nar-unpacked/, file:/tmp/pulsar-nar/pulsar-io-mongo-2.4.0.nar-unpacked/META-INF/bundled-dependencies/,

    Debug - 图2tip

    If class cannot be found exception is thrown, check whether the nar file is decompressed in the folder file:/tmp/pulsar-nar/pulsar-io-mongo-2.4.0.nar-unpacked/META-INF/bundled-dependencies/ or not.

  • This piece of log information illustrates the basic information about the Mongo sink connector, such as tenant, namespace, name, parallelism, resources, and so on, which can be used to check whether the Mongo sink connector is configured correctly or not.

    1. 08:21:55.390 [main] INFO org.apache.pulsar.functions.runtime.ThreadRuntime - ThreadContainer starting function with instance config InstanceConfig(instanceId=0, functionId=853d60a1-0c48-44d5-9a5c-6917386476b2, functionVersion=c2ce1458-b69e-4175-88c0-a0a856a2be8c, functionDetails=tenant: "public"
    2. namespace: "default"
    3. name: "pulsar-mongo-sink"
    4. className: "org.apache.pulsar.functions.api.utils.IdentityFunction"
    5. autoAck: true
    6. parallelism: 1
    7. source {
    8. typeClassName: "[B"
    9. inputSpecs {
    10. key: "test-mongo"
    11. value {
    12. }
    13. }
    14. cleanupSubscription: true
    15. }
    16. sink {
    17. className: "org.apache.pulsar.io.mongodb.MongoSink"
    18. configs: "{\"mongoUri\":\"mongodb://pulsar-mongo:27017\",\"database\":\"pulsar\",\"collection\":\"messages\",\"batchSize\":2,\"batchTimeMs\":500}"
    19. typeClassName: "[B"
    20. }
    21. resources {
    22. cpu: 1.0
    23. ram: 1073741824
    24. disk: 10737418240
    25. }
    26. componentType: SINK
    27. , maxBufferedTuples=1024, functionAuthenticationSpec=null, port=38459, clusterName=local)
  • This piece of log information demonstrates the status of the connections to Mongo and configuration information.

    1. 08:21:56.231 [cluster-ClusterId{value='5d6396a3c9e77c0569ff00eb', description='null'}-pulsar-mongo:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:8}] to pulsar-mongo:27017
    2. 08:21:56.326 [cluster-ClusterId{value='5d6396a3c9e77c0569ff00eb', description='null'}-pulsar-mongo:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=pulsar-mongo:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 0]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=89058800}
  • This piece of log information explains the configuration of consumers and clients, including the topic name, subscription name, subscription type, and so on.

    1. 08:21:56.719 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Starting Pulsar consumer status recorder with config: {
    2. "topicNames" : [ "test-mongo" ],
    3. "topicsPattern" : null,
    4. "subscriptionName" : "public/default/pulsar-mongo-sink",
    5. "subscriptionType" : "Shared",
    6. "receiverQueueSize" : 1000,
    7. "acknowledgementsGroupTimeMicros" : 100000,
    8. "negativeAckRedeliveryDelayMicros" : 60000000,
    9. "maxTotalReceiverQueueSizeAcrossPartitions" : 50000,
    10. "consumerName" : null,
    11. "ackTimeoutMillis" : 0,
    12. "tickDurationMillis" : 1000,
    13. "priorityLevel" : 0,
    14. "cryptoFailureAction" : "CONSUME",
    15. "properties" : {
    16. "application" : "pulsar-sink",
    17. "id" : "public/default/pulsar-mongo-sink",
    18. "instance_id" : "0"
    19. },
    20. "readCompacted" : false,
    21. "subscriptionInitialPosition" : "Latest",
    22. "patternAutoDiscoveryPeriod" : 1,
    23. "regexSubscriptionMode" : "PersistentOnly",
    24. "deadLetterPolicy" : null,
    25. "autoUpdatePartitions" : true,
    26. "replicateSubscriptionState" : false,
    27. "resetIncludeHead" : false
    28. }
    29. 08:21:56.726 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - Pulsar client config: {
    30. "serviceUrl" : "pulsar://localhost:6650",
    31. "authPluginClassName" : null,
    32. "authParams" : null,
    33. "operationTimeoutMs" : 30000,
    34. "statsIntervalSeconds" : 60,
    35. "numIoThreads" : 1,
    36. "numListenerThreads" : 1,
    37. "connectionsPerBroker" : 1,
    38. "useTcpNoDelay" : true,
    39. "useTls" : false,
    40. "tlsTrustCertsFilePath" : null,
    41. "tlsAllowInsecureConnection" : false,
    42. "tlsHostnameVerificationEnable" : false,
    43. "concurrentLookupRequest" : 5000,
    44. "maxLookupRequest" : 50000,
    45. "maxNumberOfRejectedRequestPerConnection" : 50,
    46. "keepAliveIntervalSeconds" : 30,
    47. "connectionTimeoutMs" : 10000,
    48. "requestTimeoutMs" : 60000,
    49. "defaultBackoffIntervalNanos" : 100000000,
    50. "maxBackoffIntervalNanos" : 30000000000
    51. }

Debug in cluster mode

To debug a connector in cluster mode, you can use the following methods:

Use connector log

In cluster mode, multiple connectors can run on a worker. To find the log path of a specified connector, use the workerId to locate the connector log.

Use admin CLI

Pulsar admin CLI helps you debug Pulsar connectors with the following subcommands:

Create a Mongo sink

  1. ./bin/pulsar-admin sinks create \
  2. --archive $PWD/pulsar-io-mongo-2.4.0.nar \
  3. --tenant public \
  4. --namespace default \
  5. --inputs test-mongo \
  6. --name pulsar-mongo-sink \
  7. --sink-config-file $PWD/mongo-sink-config.yaml \
  8. --parallelism 1

get

Use the get command to get the basic information about the Mongo sink connector, such as tenant, namespace, name, parallelism, and so on.

  1. ./bin/pulsar-admin sinks get --tenant public --namespace default --name pulsar-mongo-sink

Output:

  1. {
  2. "tenant": "public",
  3. "namespace": "default",
  4. "name": "pulsar-mongo-sink",
  5. "className": "org.apache.pulsar.io.mongodb.MongoSink",
  6. "inputSpecs": {
  7. "test-mongo": {
  8. "isRegexPattern": false
  9. }
  10. },
  11. "configs": {
  12. "mongoUri": "mongodb://pulsar-mongo:27017",
  13. "database": "pulsar",
  14. "collection": "messages",
  15. "batchSize": 2.0,
  16. "batchTimeMs": 500.0
  17. },
  18. "parallelism": 1,
  19. "processingGuarantees": "ATLEAST_ONCE",
  20. "retainOrdering": false,
  21. "autoAck": true
  22. }

Debug - 图3tip

For more information about the get command, see get.

status

Use the status command to get the current status about the Mongo sink connector, such as the number of instance, the number of running instance, instanceId, workerId and so on.

  1. ./bin/pulsar-admin sinks status
  2. --tenant public \
  3. --namespace default \
  4. --name pulsar-mongo-sink

Output:

  1. {
  2. "numInstances" : 1,
  3. "numRunning" : 1,
  4. "instances" : [ {
  5. "instanceId" : 0,
  6. "status" : {
  7. "running" : true,
  8. "error" : "",
  9. "numRestarts" : 0,
  10. "numReadFromPulsar" : 0,
  11. "numSystemExceptions" : 0,
  12. "latestSystemExceptions" : [ ],
  13. "numSinkExceptions" : 0,
  14. "latestSinkExceptions" : [ ],
  15. "numWrittenToSink" : 0,
  16. "lastReceivedTime" : 0,
  17. "workerId" : "c-standalone-fw-5d202832fd18-8080"
  18. }
  19. } ]
  20. }

Debug - 图4tip

For more information about the status command, see status. If there are multiple connectors running on a worker, workerId can locate the worker on which the specified connector is running.

topics stats

Use the topics stats command to get the stats for a topic and its connected producer and consumer, such as whether the topic has received messages or not, whether there is a backlog of messages or not, the available permits and other key information. All rates are computed over a 1-minute window and are relative to the last completed 1-minute period.

  1. ./bin/pulsar-admin topics stats test-mongo

Output:

  1. {
  2. "msgRateIn" : 0.0,
  3. "msgThroughputIn" : 0.0,
  4. "msgRateOut" : 0.0,
  5. "msgThroughputOut" : 0.0,
  6. "averageMsgSize" : 0.0,
  7. "storageSize" : 1,
  8. "publishers" : [ ],
  9. "subscriptions" : {
  10. "public/default/pulsar-mongo-sink" : {
  11. "msgRateOut" : 0.0,
  12. "msgThroughputOut" : 0.0,
  13. "msgRateRedeliver" : 0.0,
  14. "msgBacklog" : 0,
  15. "blockedSubscriptionOnUnackedMsgs" : false,
  16. "msgDelayed" : 0,
  17. "unackedMessages" : 0,
  18. "type" : "Shared",
  19. "msgRateExpired" : 0.0,
  20. "consumers" : [ {
  21. "msgRateOut" : 0.0,
  22. "msgThroughputOut" : 0.0,
  23. "msgRateRedeliver" : 0.0,
  24. "consumerName" : "dffdd",
  25. "availablePermits" : 999,
  26. "unackedMessages" : 0,
  27. "blockedConsumerOnUnackedMsgs" : false,
  28. "metadata" : {
  29. "instance_id" : "0",
  30. "application" : "pulsar-sink",
  31. "id" : "public/default/pulsar-mongo-sink"
  32. },
  33. "connectedSince" : "2019-08-26T08:48:07.582Z",
  34. "clientVersion" : "2.4.0",
  35. "address" : "/172.17.0.3:57790"
  36. } ],
  37. "isReplicated" : false
  38. }
  39. },
  40. "replication" : { },
  41. "deduplicationStatus" : "Disabled"
  42. }

Debug - 图5tip

For more information about the topic stats command, see topic stats.

Checklist

This checklist indicates the major areas to check when you debug connectors. It is a reminder of what to look for to ensure a thorough review and an evaluation tool to get the status of connectors.

  • Does Pulsar start successfully?

  • Does the external service run normally?

  • Is the nar package complete?

  • Is the connector configuration file correct?

  • In localrun mode, run a connector and check the printed information (connector log) on the console.

  • In cluster mode:

    • Use the get command to get the basic information.

    • Use the status command to get the current status.

    • Use the topics stats command to get the stats for a specified topic and its connected producers and consumers.

    • Check the connector log.

  • Enter into the external system and verify the result.