Develop Pulsar Functions

This page describes how to develop Pulsar Functions with the APIs available for Java, Python, and Go.

Available APIs

In Java and Python, you have two options to write Pulsar Functions. In Go, you can use Pulsar Functions SDK for Go.

| Interface | Description | Use cases |
| --- | --- | --- |
| Language-native interface | No Pulsar-specific libraries or special dependencies required (only core libraries from Java/Python). | Functions that do not require access to the function context. |
| Pulsar Function SDK for Java/Python/Go | Pulsar-specific libraries that provide a range of functionality not provided by “native” interfaces. | Functions that require access to the function context. |
| Extended Pulsar Function SDK for Java | An extension to Pulsar-specific libraries, providing the initialization and close interfaces in Java. | Functions that require initializing and releasing external resources. |

Language-native interface

A language-native function has no external dependencies. The following example is a language-native function that adds an exclamation point to each incoming string and publishes the resulting string to a topic.

Java:

    import java.util.function.Function;

    public class JavaNativeExclamationFunction implements Function<String, String> {
        @Override
        public String apply(String input) {
            return String.format("%s!", input);
        }
    }

For complete code, see here.

Python:

    def process(input):
        return "{}!".format(input)

For complete code, see here.
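Because a language-native Python function imports nothing from Pulsar, it can be called directly as a plain Python function before it is deployed. The following is a minimal sketch of such a local check; the sample inputs are illustrative only.

```python
def process(input):
    """Language-native Pulsar Function: append an exclamation point."""
    return "{}!".format(input)


# Call the function directly, as a plain Python callable.
for message in ["hello", "pulsar"]:
    print(process(message))
```

A check like this only exercises the function body. It does not cover topic wiring or SerDe, which are applied by the Pulsar Functions runtime.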

note

You can write Pulsar Functions in Python 2 or Python 3. However, Pulsar only looks for python as the interpreter. If you run Pulsar Functions on an Ubuntu system that provides only python3, the functions might fail to start. In this case, you can create a symlink with the command below. Be aware that your system will fail if you subsequently install any other package that depends on Python 2.x. A solution is under development in Issue 5518.

    sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10

Pulsar Function SDK for Java/Python/Go

The following example uses Pulsar Functions SDK.

Java:

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    public class ExclamationFunction implements Function<String, String> {
        @Override
        public String process(String input, Context context) {
            return String.format("%s!", input);
        }
    }

For complete code, see here.

Python:

    from pulsar import Function

    class ExclamationFunction(Function):
        def __init__(self):
            pass

        def process(self, input, context):
            return input + '!'

For complete code, see here.

Go:

    package main

    import (
        "context"
        "fmt"

        "github.com/apache/pulsar/pulsar-function-go/pf"
    )

    func HandleRequest(ctx context.Context, in []byte) error {
        fmt.Println(string(in) + "!")
        return nil
    }

    func main() {
        pf.Start(HandleRequest)
    }

For complete code, see here.

Extended Pulsar Function SDK for Java

This extended Pulsar Function SDK provides two additional interfaces to initialize and release external resources.

  • By using the initialize interface, you can initialize external resources which only need one-time initialization when the function instance starts.
  • By using the close interface, you can close the referenced external resources when the function instance closes.
note

The extended Pulsar Function SDK for Java is available in Pulsar 2.10.0 and later versions. Before using it, you need to set up Pulsar Function worker 2.10.0 or later versions.

The following example uses the extended interface of Pulsar Function SDK for Java to initialize RedisClient when the function instance starts and release it when the function instance closes.

Java:

    import java.util.Map;

    import io.lettuce.core.RedisClient;
    import io.lettuce.core.api.StatefulRedisConnection;
    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    public class InitializableFunction implements Function<String, String> {
        private RedisClient redisClient;
        private StatefulRedisConnection<String, String> connection;

        private void initRedisClient(Map<String, Object> connectInfo) {
            // The user config is expected to carry the connection URI under "redisURI".
            redisClient = RedisClient.create(connectInfo.get("redisURI").toString());
            connection = redisClient.connect();
        }

        @Override
        public void initialize(Context context) {
            // Runs once when the function instance starts.
            Map<String, Object> connectInfo = context.getUserConfigMap();
            initRedisClient(connectInfo);
        }

        @Override
        public String process(String input, Context context) {
            // Use the incoming message as the Redis key.
            String value = connection.sync().get(input);
            return String.format("%s-%s", input, value);
        }

        @Override
        public void close() {
            // Runs once when the function instance closes.
            connection.close();
            redisClient.close();
        }
    }

Schema registry

Pulsar has a built-in schema registry and is bundled with popular schema types, such as Avro, JSON and Protobuf. Pulsar Functions can leverage the existing schema information from input topics and derive the input type. The schema registry applies to output topics as well.

SerDe

SerDe stands for Serialization and Deserialization. Pulsar Functions uses SerDe when publishing data to and consuming data from Pulsar topics. How SerDe works by default depends on the language you use for a particular function.


When you write Pulsar Functions in Java, the following basic Java types are built in and supported by default: String, Double, Integer, Float, Long, Short, and Byte.

To customize Java types, you need to implement the following interface.

    public interface SerDe<T> {
        T deserialize(byte[] input);
        byte[] serialize(T input);
    }

SerDe works in the following ways in Java Functions.

  • If the input and output topics have schema, Pulsar Functions use schema for SerDe.
  • If the input or output topics do not exist, Pulsar Functions adopt the following rules to determine SerDe:
    • If the schema type is specified, Pulsar Functions use the specified schema type.
    • If SerDe is specified, Pulsar Functions use the specified SerDe, and the schema type for input and output topics is Byte.
    • If neither the schema type nor SerDe is specified, Pulsar Functions use the built-in SerDe. For non-primitive schema type, the built-in SerDe serializes and deserializes objects in the JSON format.
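The rules above can be read as a decision procedure. The following sketch restates them as a small function. `resolve_serde` is a hypothetical helper written only for illustration and is not part of any Pulsar API. It also assumes that a specified schema type takes precedence over a specified SerDe, following the order of the list, which the text does not state explicitly.

```python
def resolve_serde(topic_has_schema, schema_type=None, serde_class=None):
    """Describe which SerDe mechanism applies under the rules listed above.

    Hypothetical helper for illustration; it only mirrors the documented rules.
    """
    if topic_has_schema:
        # Existing topics with a schema: the schema drives SerDe.
        return "topic schema"
    if schema_type is not None:
        return "schema type: {}".format(schema_type)
    if serde_class is not None:
        # A custom SerDe implies a Byte schema on the topic.
        return "custom SerDe: {} (topic schema: Byte)".format(serde_class)
    return "built-in SerDe (JSON for non-primitive types)"


print(resolve_serde(True))
print(resolve_serde(False, schema_type="avro"))
print(resolve_serde(False, serde_class="com.example.serde.TweetSerde"))
print(resolve_serde(False))
```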

In Python, the default SerDe is identity, meaning that the type is serialized as whatever type the producer function returns.

You can specify the SerDe when creating or running functions.

    $ bin/pulsar-admin functions create \
      --tenant public \
      --namespace default \
      --name my_function \
      --py my_function.py \
      --classname my_function.MyFunction \
      --custom-serde-inputs '{"input-topic-1":"Serde1","input-topic-2":"Serde2"}' \
      --output-serde-classname Serde3 \
      --output output-topic-1

This example has two input topics, input-topic-1 and input-topic-2, each of which is mapped to a different SerDe class (the map must be specified as a JSON string). The output topic, output-topic-1, uses the Serde3 class for SerDe. At the moment, all Pulsar Functions logic, including the processing function and SerDe classes, must be contained within a single Python file.
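Since the topic-to-SerDe map must be passed as a JSON string, it can be convenient to generate the flag value rather than type it by hand. The following sketch builds a shell-safe argument with the standard library. `custom_serde_inputs_arg` is a hypothetical helper name, and the topic and class names are the ones from the command above.

```python
import json
import shlex

serde_by_topic = {"input-topic-1": "Serde1", "input-topic-2": "Serde2"}


def custom_serde_inputs_arg(mapping):
    """Return the flag followed by its shell-quoted JSON value."""
    return "--custom-serde-inputs " + shlex.quote(json.dumps(mapping))


print(custom_serde_inputs_arg(serde_by_topic))
```

Quoting with `shlex.quote` keeps the double quotes inside the JSON intact when the string is pasted into a shell command.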

When using Pulsar Functions for Python, you have three SerDe options:

  1. You can use the IdentitySerDe, which leaves the data unchanged. The IdentitySerDe is the default: creating or running a function without explicitly specifying a SerDe means that this option is used.
  2. You can use the PickleSerDe, which uses Python pickle for SerDe.
  3. You can create a custom SerDe class by implementing the baseline SerDe class, which has just two methods: serialize for converting the object into bytes, and deserialize for converting bytes into an object of the required application-specific type.

The table below shows when you should use each SerDe.

| SerDe option | When to use |
| --- | --- |
| IdentitySerDe | When you work with simple types like strings, Booleans, integers. |
| PickleSerDe | When you work with complex, application-specific types and are comfortable with the “best effort” approach of pickle. |
| Custom SerDe | When you require explicit control over SerDe, potentially for performance or data compatibility purposes. |
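To make the trade-off concrete, the sketch below round-trips the same value through a pickle-based SerDe and a JSON-based custom SerDe. `PickleLike` and `JsonSerDe` are stand-ins that only share the `serialize`/`deserialize` method pair described above. They deliberately do not import the Pulsar client, so they illustrate the shape of a SerDe rather than the library's own classes.

```python
import json
import pickle


class PickleLike:
    """Stand-in for a pickle-based SerDe."""

    def serialize(self, obj):
        return pickle.dumps(obj)

    def deserialize(self, data):
        return pickle.loads(data)


class JsonSerDe:
    """Custom SerDe sketch: explicit, language-neutral wire format."""

    def serialize(self, obj):
        return json.dumps(obj, sort_keys=True).encode("utf-8")

    def deserialize(self, data):
        return json.loads(data.decode("utf-8"))


reading = {"sensor": "s1", "values": [1, 2, 3]}
for serde in (PickleLike(), JsonSerDe()):
    wire = serde.serialize(reading)
    # Both options put bytes on the wire and restore an equal object.
    print(type(serde).__name__, isinstance(wire, bytes), serde.deserialize(wire) == reading)
```

Pickle output is readable only by Python and should not be loaded from untrusted sources, which is one reason to prefer a custom SerDe when other consumers or data compatibility matter.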

Currently, custom SerDe is not available in Go.

Example

Imagine that you are writing Pulsar Functions that process tweet objects. The following example shows a Tweet class.

Java:

    public class Tweet {
        private String username;
        private String tweetContent;

        public Tweet(String username, String tweetContent) {
            this.username = username;
            this.tweetContent = tweetContent;
        }

        // Standard setters and getters
    }

To pass Tweet objects directly between Pulsar Functions, you need to provide a custom SerDe class. In the example below, Tweet objects are basically strings in which the username and tweet content are separated by a |.

    package com.example.serde;

    import java.nio.charset.StandardCharsets;
    import java.util.regex.Pattern;
    import org.apache.pulsar.functions.api.SerDe;

    public class TweetSerde implements SerDe<Tweet> {
        @Override
        public Tweet deserialize(byte[] input) {
            String s = new String(input, StandardCharsets.UTF_8);
            // Split on the literal "|" separator.
            String[] fields = s.split(Pattern.quote("|"));
            return new Tweet(fields[0], fields[1]);
        }

        @Override
        public byte[] serialize(Tweet input) {
            return String.format("%s|%s", input.getUsername(), input.getTweetContent())
                    .getBytes(StandardCharsets.UTF_8);
        }
    }

To apply this customized SerDe to a particular Pulsar Function, you need to:

  • Package the Tweet and TweetSerde classes into a JAR.
  • Specify a path to the JAR and SerDe class name when deploying the function.

The following is an example of a create operation.

    $ bin/pulsar-admin functions create \
      --jar /path/to/your.jar \
      --output-serde-classname com.example.serde.TweetSerde \
      # Other function attributes

Custom SerDe classes must be packaged with your function JARs

Pulsar does not store your custom SerDe classes separately from your Pulsar Functions. So you need to include your SerDe classes in your function JARs. If not, Pulsar returns an error.

Python:

    class Tweet(object):
        def __init__(self, username, tweet_content):
            self.username = username
            self.tweet_content = tweet_content

In order to use this class in Pulsar Functions, you have two options:

  1. You can specify PickleSerDe, which applies the pickle library SerDe.

  2. You can create your own SerDe class. The following is an example.

        from pulsar import SerDe

        class TweetSerDe(SerDe):
            def serialize(self, input):
                # Encode explicitly so that the result is bytes on Python 3.
                return "{0}|{1}".format(input.username, input.tweet_content).encode("utf-8")

            def deserialize(self, input_bytes):
                tweet_components = input_bytes.decode("utf-8").split('|')
                return Tweet(tweet_components[0], tweet_components[1])

For complete code, see here.
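A delimiter-based format like this one is sensitive to the delimiter appearing in the data. The sketch below is a variation on the SerDe above, not the original code: it omits the `pulsar` base class so that it runs without the client installed, and it splits only on the first `|` so that the tweet content may itself contain that character.

```python
class Tweet(object):
    def __init__(self, username, tweet_content):
        self.username = username
        self.tweet_content = tweet_content


class TweetSerDe(object):
    """Same two methods as a Pulsar SerDe; the pulsar base class is omitted here."""

    def serialize(self, input):
        return "{0}|{1}".format(input.username, input.tweet_content).encode("utf-8")

    def deserialize(self, input_bytes):
        # maxsplit=1 keeps any further '|' characters inside the tweet content.
        username, tweet_content = input_bytes.decode("utf-8").split("|", 1)
        return Tweet(username, tweet_content)


serde = TweetSerDe()
original = Tweet("alice", "pipes | are fine")
restored = serde.deserialize(serde.serialize(original))
print(restored.username, "->", restored.tweet_content)
```

A username that contains `|` would still be split incorrectly, so a format with escaping or length prefixes (or JSON) is a safer choice when the fields are not controlled.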

In both languages, however, you can write custom SerDe logic for more complex, application-specific types.

Context

The Java, Python and Go SDKs provide access to a context object that can be used by a function. This context object provides a wide variety of information and functionality to the function, including:

  • The name and ID of a Pulsar Function.
  • The message ID of each message. Each Pulsar message is automatically assigned with an ID.
  • The key, event time, properties and partition key of each message.
  • The name of the topic to which the message is sent.
  • The names of all input topics as well as the output topic associated with the function.
  • The name of the class used for SerDe.
  • The tenant and namespace associated with the function.
  • The ID of the Pulsar Functions instance running the function.
  • The version of the function.
  • The logger object used by the function, which can be used to create function log messages.
  • Access to arbitrary user configuration values supplied via the CLI.
  • An interface for recording metrics.
  • An interface for storing and retrieving state in state storage.
  • A function to publish new messages onto arbitrary topics.
  • A function to ack the message being processed (if auto-ack is disabled).
  • (Java only) Access to the Pulsar admin client.

Java:

The Context interface provides a number of methods that you can use to access the function context. The various method signatures for the Context interface are listed as follows.

    public interface Context {
        Record<?> getCurrentRecord();
        Collection<String> getInputTopics();
        String getOutputTopic();
        String getOutputSchemaType();
        String getTenant();
        String getNamespace();
        String getFunctionName();
        String getFunctionId();
        String getInstanceId();
        String getFunctionVersion();
        Logger getLogger();
        void incrCounter(String key, long amount);
        CompletableFuture<Void> incrCounterAsync(String key, long amount);
        long getCounter(String key);
        CompletableFuture<Long> getCounterAsync(String key);
        void putState(String key, ByteBuffer value);
        CompletableFuture<Void> putStateAsync(String key, ByteBuffer value);
        void deleteState(String key);
        ByteBuffer getState(String key);
        CompletableFuture<ByteBuffer> getStateAsync(String key);
        Map<String, Object> getUserConfigMap();
        Optional<Object> getUserConfigValue(String key);
        Object getUserConfigValueOrDefault(String key, Object defaultValue);
        void recordMetric(String metricName, double value);
        <O> CompletableFuture<Void> publish(String topicName, O object, String schemaOrSerdeClassName);
        <O> CompletableFuture<Void> publish(String topicName, O object);
        <O> TypedMessageBuilder<O> newOutputMessage(String topicName, Schema<O> schema) throws PulsarClientException;
        <O> ConsumerBuilder<O> newConsumerBuilder(Schema<O> schema) throws PulsarClientException;
        PulsarAdmin getPulsarAdmin();
        PulsarAdmin getPulsarAdmin(String clusterName);
    }

The following example uses several methods available via the Context object.

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;
    import org.slf4j.Logger;
    import java.util.stream.Collectors;

    public class ContextFunction implements Function<String, Void> {
        @Override
        public Void process(String input, Context context) {
            Logger LOG = context.getLogger();
            String inputTopics = context.getInputTopics().stream().collect(Collectors.joining(", "));
            String functionName = context.getFunctionName();

            String logMessage = String.format("A message with a value of \"%s\" has arrived on one of the following topics: %s\n",
                    input,
                    inputTopics);
            LOG.info(logMessage);

            String metricName = String.format("function-%s-messages-received", functionName);
            context.recordMetric(metricName, 1);
            return null;
        }
    }

Python:

    class ContextImpl(pulsar.Context):
        def get_message_id(self):
            ...
        def get_message_key(self):
            ...
        def get_message_eventtime(self):
            ...
        def get_message_properties(self):
            ...
        def get_current_message_topic_name(self):
            ...
        def get_partition_key(self):
            ...
        def get_function_name(self):
            ...
        def get_function_tenant(self):
            ...
        def get_function_namespace(self):
            ...
        def get_function_id(self):
            ...
        def get_instance_id(self):
            ...
        def get_function_version(self):
            ...
        def get_logger(self):
            ...
        def get_user_config_value(self, key):
            ...
        def get_user_config_map(self):
            ...
        def record_metric(self, metric_name, metric_value):
            ...
        def get_input_topics(self):
            ...
        def get_output_topic(self):
            ...
        def get_output_serde_class_name(self):
            ...
        def publish(self, topic_name, message, serde_class_name="serde.IdentitySerDe",
                    properties=None, compression_type=None, callback=None, message_conf=None):
            ...
        def ack(self, msgid, topic):
            ...
        def get_and_reset_metrics(self):
            ...
        def reset_metrics(self):
            ...
        def get_metrics(self):
            ...
        def incr_counter(self, key, amount):
            ...
        def get_counter(self, key):
            ...
        def del_counter(self, key):
            ...
        def put_state(self, key, value):
            ...
        def get_state(self, key):
            ...
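
A function that depends only on a few of these methods can be exercised locally against a small test double. The following is a sketch under stated assumptions: `StubContext` and `AnnotatingFunction` are hypothetical names, the stub implements just three of the methods listed above, and a deployed version of the function class would subclass `pulsar.Function`.

```python
import logging


class StubContext:
    """Hypothetical test double exposing a few of the context methods listed above."""

    def __init__(self, function_name, input_topics):
        self._function_name = function_name
        self._input_topics = input_topics

    def get_function_name(self):
        return self._function_name

    def get_input_topics(self):
        return self._input_topics

    def get_logger(self):
        return logging.getLogger(self._function_name)


class AnnotatingFunction:
    """In a deployed function this class would subclass pulsar.Function."""

    def process(self, input, context):
        topics = ", ".join(context.get_input_topics())
        context.get_logger().info("Received %r from one of: %s", input, topics)
        # Tag the output with the name of the function that handled it.
        return "{} (handled by {})".format(input, context.get_function_name())


ctx = StubContext("demo", ["topic-a", "topic-b"])
print(AnnotatingFunction().process("hello", ctx))
```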

Go:

    func (c *FunctionContext) GetInstanceID() int {
        return c.instanceConf.instanceID
    }

    func (c *FunctionContext) GetInputTopics() []string {
        return c.inputTopics
    }

    func (c *FunctionContext) GetOutputTopic() string {
        return c.instanceConf.funcDetails.GetSink().Topic
    }

    func (c *FunctionContext) GetFuncTenant() string {
        return c.instanceConf.funcDetails.Tenant
    }

    func (c *FunctionContext) GetFuncName() string {
        return c.instanceConf.funcDetails.Name
    }

    func (c *FunctionContext) GetFuncNamespace() string {
        return c.instanceConf.funcDetails.Namespace
    }

    func (c *FunctionContext) GetFuncID() string {
        return c.instanceConf.funcID
    }

    func (c *FunctionContext) GetFuncVersion() string {
        return c.instanceConf.funcVersion
    }

    func (c *FunctionContext) GetUserConfValue(key string) interface{} {
        return c.userConfigs[key]
    }

    func (c *FunctionContext) GetUserConfMap() map[string]interface{} {
        return c.userConfigs
    }

    func (c *FunctionContext) SetCurrentRecord(record pulsar.Message) {
        c.record = record
    }

    func (c *FunctionContext) GetCurrentRecord() pulsar.Message {
        return c.record
    }

    func (c *FunctionContext) NewOutputMessage(topic string) pulsar.Producer {
        return c.outputMessage(topic)
    }

The following example uses several methods available via the Context object.

    import (
        "context"
        "fmt"

        "github.com/apache/pulsar/pulsar-function-go/pf"
    )

    func contextFunc(ctx context.Context) {
        if fc, ok := pf.FromContext(ctx); ok {
            fmt.Printf("function ID is:%s, ", fc.GetFuncID())
            fmt.Printf("function version is:%s\n", fc.GetFuncVersion())
        }
    }

For complete code, see here.

User config

When you run or update Pulsar Functions created using the SDK, you can pass arbitrary key/value pairs to them from the command line with the --user-config flag. Key/value pairs must be specified as JSON. The following function creation command passes a user-configured key/value pair to a function.

    $ bin/pulsar-admin functions create \
      --name word-filter \
      --user-config '{"forbidden-word":"rosebud"}' \
      # Other function configs

Java:

The Java SDK Context object enables you to access key/value pairs provided to Pulsar Functions via the command line (as JSON). The following example passes a key/value pair.

    $ bin/pulsar-admin functions create \
      --user-config '{"word-of-the-day":"verdure"}' \
      # Other function configs

To access that value in a Java function:

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;
    import org.slf4j.Logger;
    import java.util.Optional;

    public class UserConfigFunction implements Function<String, Void> {
        @Override
        public Void process(String input, Context context) {
            Logger LOG = context.getLogger();
            // getUserConfigValue returns Optional<Object>.
            Optional<Object> wotd = context.getUserConfigValue("word-of-the-day");
            if (wotd.isPresent()) {
                LOG.info("The word of the day is {}", wotd.get());
            } else {
                LOG.warn("No word of the day provided");
            }
            return null;
        }
    }

The UserConfigFunction function will log the string "The word of the day is verdure" every time the function is invoked (which means every time a message arrives). The word-of-the-day user config will be changed only when the function is updated with a new config value via the command line.

You can also access the entire user config map or set a default value in case no value is present:

    // Get the whole config map
    Map<String, Object> allConfigs = context.getUserConfigMap();
    // Get value or resort to default
    String wotd = (String) context.getUserConfigValueOrDefault("word-of-the-day", "perspicacious");

For all key/value pairs passed to Java functions, both the key and the value are String. To set the value to be a different type, you need to deserialize from the String type.

Python:

In a Python function, you can access the configuration value as follows.

    from pulsar import Function

    class WordFilter(Function):
        def process(self, input, context):
            forbidden_word = context.get_user_config_value("forbidden-word")
            # Don't publish the message if it contains the user-supplied
            # forbidden word
            if forbidden_word in input:
                pass
            # Otherwise publish the message
            else:
                return input

The Python SDK Context object enables you to access key/value pairs provided to Pulsar Functions via the command line (as JSON). The following example passes a key/value pair.

    $ bin/pulsar-admin functions create \
      --user-config '{"word-of-the-day":"verdure"}' \
      # Other function configs

To access that value in a Python function:

    from pulsar import Function

    class UserConfigFunction(Function):
        def process(self, input, context):
            logger = context.get_logger()
            wotd = context.get_user_config_value('word-of-the-day')
            if wotd is None:
                logger.warning('No word of the day provided')
            else:
                logger.info("The word of the day is {0}".format(wotd))
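
When a function reads several config values, a small helper for defaults and type conversion keeps the `is None` check in one place. The sketch below is illustrative only: `StubContext` and `config_or_default` are hypothetical names, the `max-words` key is invented for the example, and the stub assumes that a missing key yields `None`, mirroring the check in the function above.

```python
class StubContext:
    """Hypothetical test double; only get_user_config_value is implemented."""

    def __init__(self, user_config):
        self._user_config = dict(user_config)

    def get_user_config_value(self, key):
        # Assumption: a missing key yields None.
        return self._user_config.get(key)


def config_or_default(context, key, default, convert=str):
    """Read a user config value, falling back to a default when it is absent."""
    raw = context.get_user_config_value(key)
    return default if raw is None else convert(raw)


ctx = StubContext({"word-of-the-day": "verdure", "max-words": "3"})
print(config_or_default(ctx, "word-of-the-day", "perspicacious"))
print(config_or_default(ctx, "max-words", 10, convert=int))
print(config_or_default(ctx, "missing-key", "fallback"))
```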

Go:

The Go SDK Context object enables you to access key/value pairs provided to Pulsar Functions via the command line (as JSON). The following example passes a key/value pair.

    $ bin/pulsar-admin functions create \
      --go path/to/go/binary \
      --user-config '{"word-of-the-day":"lackadaisical"}'

To access that value in a Go function:

    import (
        "context"

        "github.com/apache/pulsar/pulsar-function-go/logutil"
        "github.com/apache/pulsar/pulsar-function-go/pf"
    )

    func contextFunc(ctx context.Context) {
        fc, ok := pf.FromContext(ctx)
        if !ok {
            logutil.Fatal("Function context is not defined")
        }

        wotd := fc.GetUserConfValue("word-of-the-day")
        if wotd == nil {
            logutil.Warn("The word of the day is empty")
        } else {
            logutil.Infof("The word of the day is %s", wotd.(string))
        }
    }

Logger

Java:

Pulsar Functions that use the Java SDK have access to an SLF4J Logger object that can be used to produce logs at the chosen log level. The following example writes either a WARN- or INFO-level log based on whether the incoming string contains the word danger.

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;
    import org.slf4j.Logger;

    public class LoggingFunction implements Function<String, Void> {
        @Override
        public Void process(String input, Context context) {
            Logger LOG = context.getLogger();
            // Read the message ID from the current record, if one is attached.
            String messageId = context.getCurrentRecord().getMessage()
                    .map(message -> message.getMessageId().toString())
                    .orElse("unknown");
            if (input.contains("danger")) {
                LOG.warn("A warning was received in message {}", messageId);
            } else {
                LOG.info("Message {} received\nContent: {}", messageId, input);
            }
            return null;
        }
    }

If you want your function to produce logs, you need to specify a log topic when creating or running the function. The following is an example.

    $ bin/pulsar-admin functions create \
      --jar my-functions.jar \
      --classname my.package.LoggingFunction \
      --log-topic persistent://public/default/logging-function-logs \
      # Other function configs

All logs produced by LoggingFunction above can be accessed via the persistent://public/default/logging-function-logs topic.

Customize Function log level

Additionally, you can customize the function log level with the functions_log4j2.xml file. Create or update functions_log4j2.xml in your Pulsar conf directory (for example, /etc/pulsar/ on bare metal, or /pulsar/conf on Kubernetes) with contents such as:

    <Configuration>
      <name>pulsar-functions-instance</name>
      <monitorInterval>30</monitorInterval>
      <Properties>
        <Property>
          <name>pulsar.log.appender</name>
          <value>RollingFile</value>
        </Property>
        <Property>
          <name>pulsar.log.level</name>
          <value>debug</value>
        </Property>
        <Property>
          <name>bk.log.level</name>
          <value>debug</value>
        </Property>
      </Properties>
      <Appenders>
        <Console>
          <name>Console</name>
          <target>SYSTEM_OUT</target>
          <PatternLayout>
            <Pattern>%d{ISO8601_OFFSET_DATE_TIME_HHMM} [%t] %-5level %logger{36} - %msg%n</Pattern>
          </PatternLayout>
        </Console>
        <RollingFile>
          <name>RollingFile</name>
          <fileName>${sys:pulsar.function.log.dir}/${sys:pulsar.function.log.file}.log</fileName>
          <filePattern>${sys:pulsar.function.log.dir}/${sys:pulsar.function.log.file}-%d{MM-dd-yyyy}-%i.log.gz</filePattern>
          <immediateFlush>true</immediateFlush>
          <PatternLayout>
            <Pattern>%d{ISO8601_OFFSET_DATE_TIME_HHMM} [%t] %-5level %logger{36} - %msg%n</Pattern>
          </PatternLayout>
          <Policies>
            <TimeBasedTriggeringPolicy>
              <interval>1</interval>
              <modulate>true</modulate>
            </TimeBasedTriggeringPolicy>
            <SizeBasedTriggeringPolicy>
              <size>1 GB</size>
            </SizeBasedTriggeringPolicy>
            <CronTriggeringPolicy>
              <schedule>0 0 0 * * ?</schedule>
            </CronTriggeringPolicy>
          </Policies>
          <DefaultRolloverStrategy>
            <Delete>
              <basePath>${sys:pulsar.function.log.dir}</basePath>
              <maxDepth>2</maxDepth>
              <IfFileName>
                <glob>*/${sys:pulsar.function.log.file}*log.gz</glob>
              </IfFileName>
              <IfLastModified>
                <age>30d</age>
              </IfLastModified>
            </Delete>
          </DefaultRolloverStrategy>
        </RollingFile>
        <RollingRandomAccessFile>
          <name>BkRollingFile</name>
          <fileName>${sys:pulsar.function.log.dir}/${sys:pulsar.function.log.file}.bk</fileName>
          <filePattern>${sys:pulsar.function.log.dir}/${sys:pulsar.function.log.file}.bk-%d{MM-dd-yyyy}-%i.log.gz</filePattern>
          <immediateFlush>true</immediateFlush>
          <PatternLayout>
            <Pattern>%d{ISO8601_OFFSET_DATE_TIME_HHMM} [%t] %-5level %logger{36} - %msg%n</Pattern>
          </PatternLayout>
          <Policies>
            <TimeBasedTriggeringPolicy>
              <interval>1</interval>
              <modulate>true</modulate>
            </TimeBasedTriggeringPolicy>
            <SizeBasedTriggeringPolicy>
              <size>1 GB</size>
            </SizeBasedTriggeringPolicy>
            <CronTriggeringPolicy>
              <schedule>0 0 0 * * ?</schedule>
            </CronTriggeringPolicy>
          </Policies>
          <DefaultRolloverStrategy>
            <Delete>
              <basePath>${sys:pulsar.function.log.dir}</basePath>
              <maxDepth>2</maxDepth>
              <IfFileName>
                <glob>*/${sys:pulsar.function.log.file}.bk*log.gz</glob>
              </IfFileName>
              <IfLastModified>
                <age>30d</age>
              </IfLastModified>
            </Delete>
          </DefaultRolloverStrategy>
        </RollingRandomAccessFile>
      </Appenders>
      <Loggers>
        <Logger>
          <name>org.apache.pulsar.functions.runtime.shaded.org.apache.bookkeeper</name>
          <level>${sys:bk.log.level}</level>
          <additivity>false</additivity>
          <AppenderRef>
            <ref>BkRollingFile</ref>
          </AppenderRef>
        </Logger>
        <Root>
          <level>${sys:pulsar.log.level}</level>
          <AppenderRef>
            <ref>${sys:pulsar.log.appender}</ref>
            <level>${sys:pulsar.log.level}</level>
          </AppenderRef>
        </Root>
      </Loggers>
    </Configuration>

Properties that are set like this:

    <Property>
      <name>pulsar.log.level</name>
      <value>debug</value>
    </Property>

propagate to the places where they are referenced, such as:

    <Root>
      <level>${sys:pulsar.log.level}</level>
      <AppenderRef>
        <ref>${sys:pulsar.log.appender}</ref>
        <level>${sys:pulsar.log.level}</level>
      </AppenderRef>
    </Root>

In the above example, debug-level logging is applied to all function logs, which may be more verbose than you want. To be more selective, you can apply different log levels to different classes or modules. For example:

    <Logger>
      <name>com.example.module</name>
      <level>info</level>
      <additivity>false</additivity>
      <AppenderRef>
        <ref>${sys:pulsar.log.appender}</ref>
      </AppenderRef>
    </Logger>

You can be more specific as well, for example by applying a more verbose log level to a single class in the module:

    <Logger>
      <name>com.example.module.className</name>
      <level>debug</level>
      <additivity>false</additivity>
      <AppenderRef>
        <ref>Console</ref>
      </AppenderRef>
    </Logger>

Each <AppenderRef> entry allows you to output the log to a target specified in the definition of the Appender.

Additivity determines whether log messages are duplicated when multiple Logger entries overlap. To disable additivity, specify

    <additivity>false</additivity>

as shown in the examples above. Disabling additivity prevents duplication of log messages when one or more <Logger> entries contain classes or modules that overlap.

The appender that an <AppenderRef> refers to is defined in the <Appenders> section, such as:

    <Console>
      <name>Console</name>
      <target>SYSTEM_OUT</target>
      <PatternLayout>
        <Pattern>%d{ISO8601_OFFSET_DATE_TIME_HHMM} [%t] %-5level %logger{36} - %msg%n</Pattern>
      </PatternLayout>
    </Console>

Python:

Pulsar Functions that use the Python SDK have access to a logging object that can be used to produce logs at the chosen log level. The following example function logs either a WARNING- or INFO-level message based on whether the incoming string contains the word danger.

    from pulsar import Function

    class LoggingFunction(Function):
        def process(self, input, context):
            logger = context.get_logger()
            msg_id = context.get_message_id()
            if 'danger' in input:
                logger.warning("A warning was received in message {0}".format(msg_id))
            else:
                logger.info("Message {0} received\nContent: {1}".format(msg_id, input))

If you want your function to produce logs on a Pulsar topic, you need to specify a log topic when creating or running the function. The following is an example.

    $ bin/pulsar-admin functions create \
      --py logging_function.py \
      --classname logging_function.LoggingFunction \
      --log-topic logging-function-logs \
      # Other function configs

All logs produced by LoggingFunction above can be accessed via the logging-function-logs topic. Additionally, you can specify the function log level through the broker XML file as described in Customize Function log level.
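
The branching in a logging function can be checked locally with the standard `logging` module before any log topic is configured. The sketch below is an assumption-laden illustration: `ListHandler` and `log_message` are hypothetical names, and `log_message` reproduces only the branching of `LoggingFunction.process`, without the Pulsar context.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log lines in memory so that they can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append("{}:{}".format(record.levelname, record.getMessage()))


def log_message(logger, msg_id, input):
    """Same branching as LoggingFunction.process, without the Pulsar context."""
    if 'danger' in input:
        logger.warning("A warning was received in message %s", msg_id)
    else:
        logger.info("Message %s received\nContent: %s", msg_id, input)


logger = logging.getLogger("logging-function-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

log_message(logger, "m1", "all clear")
log_message(logger, "m2", "danger ahead")
print(handler.lines)
```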

Go:

The following Go function example logs at different levels based on the function input.

    package main

    import (
        "context"

        log "github.com/apache/pulsar/pulsar-function-go/logutil"
        "github.com/apache/pulsar/pulsar-function-go/pf"
    )

    func loggerFunc(ctx context.Context, input []byte) {
        if len(input) <= 100 {
            log.Infof("This input has a length of: %d", len(input))
        } else {
            log.Warnf("This input is getting too long! It has {%d} characters", len(input))
        }
    }

    func main() {
        pf.Start(loggerFunc)
    }

When you use logTopic-related functionality in a Go function, import github.com/apache/pulsar/pulsar-function-go/logutil. You do not need to obtain a logger from the context object.

Additionally, you can specify the function log level through the broker XML file, as described in Customize Function log level.

Pulsar admin

Pulsar Functions that use the Java SDK have access to the Pulsar admin client, which lets a function make admin API calls to the current Pulsar cluster or to external clusters (if external-pulsars is provided).

Below is an example of how to use the Pulsar admin client exposed from the Function context.

    import org.apache.pulsar.client.admin.PulsarAdmin;
    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    /**
     * In this particular example, for every input message,
     * the function resets the cursor of the current function's subscription to a
     * specified timestamp.
     */
    public class CursorManagementFunction implements Function<String, String> {
        @Override
        public String process(String input, Context context) throws Exception {
            PulsarAdmin adminClient = context.getPulsarAdmin();
            if (adminClient != null) {
                String topic = context.getCurrentRecord().getTopicName().isPresent() ?
                        context.getCurrentRecord().getTopicName().get() : null;
                String subName = context.getTenant() + "/" + context.getNamespace() + "/" + context.getFunctionName();
                if (topic != null) {
                    // 1578188166 below is an arbitrarily picked timestamp
                    adminClient.topics().resetCursor(topic, subName, 1578188166);
                    return "reset cursor successfully";
                }
            }
            return null;
        }
    }

To give your function access to the Pulsar admin client, enable the feature by setting exposeAdminClientEnabled=true in the functions_worker.yml file. To test whether the feature is enabled, run the function with pulsar-admin functions localrun and the --web-service-url flag.

    $ bin/pulsar-admin functions localrun \
      --jar my-functions.jar \
      --classname my.package.CursorManagementFunction \
      --web-service-url http://pulsar-web-service:8080 \
      # Other function configs
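The Java example above derives the subscription name by joining the function's tenant, namespace, and name with slashes. If you need the same string in an external script, the convention can be sketched as follows. The helper name function_subscription_name and the sample values are hypothetical; only the tenant/namespace/name pattern comes from the example.

```python
def function_subscription_name(tenant: str, namespace: str, function_name: str) -> str:
    """Join the three parts with '/' in the same order the Java example uses."""
    return "/".join((tenant, namespace, function_name))


if __name__ == "__main__":
    # Sample values are placeholders, not defaults taken from Pulsar.
    print(function_subscription_name("public", "default", "cursor-fn"))
```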

Metrics

Pulsar Functions let you deploy and manage processing functions that consume messages from and publish messages to Pulsar topics. To help you verify that running functions stay healthy, Pulsar Functions can publish arbitrary metrics to the metrics interface, where they can be queried.

note

If a Pulsar Function uses the language-native interface for Java or Python, that function is not able to publish metrics and stats to Pulsar.

You can monitor Pulsar Functions that have been deployed with the following methods:

  • Check the metrics provided by Pulsar.

    Pulsar Functions expose the metrics that can be collected and used for monitoring the health of Java, Python, and Go functions. You can check the metrics by following the monitoring guide.

    For the complete list of the function metrics, see here.

  • Set and check your customized metrics.

    In addition to the metrics provided by Pulsar, you can define custom metrics for Java, Python, and Go functions. Function workers automatically export user-defined metrics to Prometheus, and you can check them in Grafana.

Here are examples of how to customize metrics for Java, Python, and Go functions.

  • Java
  • Python
  • Go

You can record metrics using the Context object on a per-key basis. For example, you can record a metric under the hit-count key every time the function processes a message, and a different metric under the elevens-count key only when the input equals 11.

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    public class MetricRecorderFunction implements Function<Integer, Void> {

        @Override
        public Void process(Integer input, Context context) {
            // Records the metric 1 every time a message arrives
            context.recordMetric("hit-count", 1);

            // Records the metric only if the arriving number equals 11
            if (input == 11) {
                context.recordMetric("elevens-count", 1);
            }

            return null;
        }
    }

You can record metrics using the Context object on a per-key basis. For example, you can record a metric under the hit-count key every time the function processes a message, and a different metric under the elevens-count key only when the input equals 11. The following is an example.

    from pulsar import Function

    class MetricRecorderFunction(Function):
        def process(self, input, context):
            context.record_metric('hit-count', 1)

            if input == 11:
                context.record_metric('elevens-count', 1)

The Go SDK Context object enables you to record metrics on a per-key basis. For example, you can record a metric under the hit-count key every time the function processes a message, and a different metric under the elevens-count key only when the input is the string "eleven":

    package main

    import (
        "context"
        "errors"

        "github.com/apache/pulsar/pulsar-function-go/pf"
    )

    func metricRecorderFunction(ctx context.Context, in []byte) error {
        inputstr := string(in)
        // Retrieve the function context carried inside the Go context
        fctx, ok := pf.FromContext(ctx)
        if !ok {
            return errors.New("get Go Functions Context error")
        }
        fctx.RecordMetric("hit-count", 1)
        if inputstr == "eleven" {
            fctx.RecordMetric("elevens-count", 1)
        }
        return nil
    }

    func main() {
        pf.Start(metricRecorderFunction)
    }
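To check the per-key recording logic without a running cluster, you can call the function with a stand-in context. The following is a minimal sketch under stated assumptions: StubContext is a hypothetical test double that only remembers what was recorded, and the function class is written as a plain class (not a subclass of pulsar.Function) so that the snippet runs without the Pulsar client installed.

```python
from collections import defaultdict


class StubContext:
    """Hypothetical stand-in for the SDK context; it only remembers recorded metrics."""

    def __init__(self):
        self.metrics = defaultdict(list)

    def record_metric(self, name, value):
        self.metrics[name].append(value)


class MetricRecorderFunction:
    """Same logic as the Python example, without the pulsar base class."""

    def process(self, input, context):
        # Recorded for every message
        context.record_metric("hit-count", 1)
        # Recorded only when the input equals 11
        if input == 11:
            context.record_metric("elevens-count", 1)


if __name__ == "__main__":
    ctx = StubContext()
    fn = MetricRecorderFunction()
    for value in (3, 11, 7, 11):
        fn.process(value, ctx)
    # Sum the recorded samples per key: four hits, two elevens
    print({name: sum(values) for name, values in ctx.metrics.items()})
```

How the function worker aggregates recorded values before exporting them is not modeled here; the stub keeps every sample so a test can inspect them.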

Security

To enable security on Pulsar Functions, first enable security on the function workers. For more details, refer to Security settings.

Pulsar Functions support the following secrets providers:

  • ClearTextSecretsProvider
  • EnvironmentBasedSecretsProvider

Pulsar Functions use ClearTextSecretsProvider by default.

Pulsar Functions also provide two interfaces, SecretsProvider and SecretsProviderConfigurator, that let you implement a custom secrets provider.

  • Java
  • Python
  • Go

You can retrieve a secret from the configured secrets provider using the Context object. The following is an example:

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;
    import org.slf4j.Logger;

    public class GetSecretProviderFunction implements Function<String, Void> {

        @Override
        public Void process(String input, Context context) throws Exception {
            Logger LOG = context.getLogger();
            // Look up the secret whose name is carried in the input message
            String secretProvider = context.getSecret(input);

            // Guard against a missing (null) secret before calling isEmpty()
            if (secretProvider != null && !secretProvider.isEmpty()) {
                LOG.info("The secret provider is {}", secretProvider);
            } else {
                LOG.warn("No secret provider");
            }

            return null;
        }
    }

You can retrieve a secret from the configured secrets provider using the Context object. The following is an example:

    from pulsar import Function

    class GetSecretProviderFunction(Function):
        def process(self, input, context):
            logger = context.get_logger()
            # Look up the secret whose name is carried in the input message
            secret_provider = context.get_secret(input)
            if secret_provider is None:
                logger.warning('No secret provider')
            else:
                logger.info("The secret provider is {0}".format(secret_provider))

Currently, the feature is not available in Go.
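The idea behind an environment-based provider can be sketched with a stand-in context. This is an illustration only: EnvSecretsContext and describe_secret are hypothetical names, and the sketch assumes that a secret name maps directly to an environment variable of the same name, which you should confirm against the EnvironmentBasedSecretsProvider documentation. The helper reports whether a secret exists without writing its value to the output.

```python
import os


class EnvSecretsContext:
    """Hypothetical stand-in context that resolves secrets from environment variables."""

    def __init__(self, environ=None):
        # Allow a plain dict to be injected so the lookup can be tested in isolation.
        self._environ = os.environ if environ is None else environ

    def get_secret(self, secret_name):
        # Returns None when the variable is not set; callers must handle that case.
        return self._environ.get(secret_name)


def describe_secret(secret_name, context):
    """Report whether a secret exists without echoing its value."""
    secret = context.get_secret(secret_name)
    if secret is None:
        return "secret '{}' is not set".format(secret_name)
    return "secret '{}' is set ({} characters)".format(secret_name, len(secret))


if __name__ == "__main__":
    ctx = EnvSecretsContext({"DB_PASSWORD": "hunter2"})
    print(describe_secret("DB_PASSWORD", ctx))
    print(describe_secret("MISSING", ctx))
```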

State storage

Pulsar Functions use Apache BookKeeper as the state storage backend. Every Pulsar installation, including the local standalone installation, deploys BookKeeper bookies.

Since the Pulsar 2.1.0 release, Pulsar has integrated with the Apache BookKeeper table service to store function state. For example, a WordCount function can store the state of its counters in the BookKeeper table service through the Pulsar Functions State API.

States are key-value pairs, where the key is a string and the value is arbitrary binary data; counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual Pulsar Function and shared between instances of that function.
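The counter encoding described above can be reproduced with Python's struct module, which is useful when you read a counter back as raw bytes. This is a minimal sketch: the helper names are hypothetical, and the signed format (">q") is an assumption chosen to match a Java long.

```python
import struct

# ">q" = big-endian, signed 64-bit integer (assumed to match a Java long).
_COUNTER_FORMAT = ">q"


def encode_counter(value: int) -> bytes:
    """Pack an integer into the 8-byte big-endian layout used for counters."""
    return struct.pack(_COUNTER_FORMAT, value)


def decode_counter(raw: bytes) -> int:
    """Unpack an 8-byte big-endian value back into an integer."""
    (value,) = struct.unpack(_COUNTER_FORMAT, raw)
    return value


if __name__ == "__main__":
    raw = encode_counter(1)
    print(raw.hex())            # 0000000000000001
    print(decode_counter(raw))  # 1
```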

You can access states within Pulsar Java Functions using the putState, putStateAsync, getState, getStateAsync, incrCounter, incrCounterAsync, getCounter, getCounterAsync and deleteState calls on the context object. You can access states within Pulsar Python Functions using the put_state, get_state, incr_counter, get_counter and del_counter calls on the context object. You can also manage states using the querystate and putstate subcommands of pulsar-admin functions.

note

State storage is not available in Go.

API

  • Java
  • Python

Currently Pulsar Functions expose the following APIs for mutating and accessing State. These APIs are available in the Context object when you are using Java SDK functions.

incrCounter

    /**
     * Increment the builtin distributed counter referred by key
     * @param key The name of the key
     * @param amount The amount to be incremented
     */
    void incrCounter(String key, long amount);

The application can use incrCounter to change the counter of a given key by the given amount.

incrCounterAsync

    /**
     * Increment the builtin distributed counter referred by key
     * but don't wait for the completion of the increment operation
     *
     * @param key The name of the key
     * @param amount The amount to be incremented
     */
    CompletableFuture<Void> incrCounterAsync(String key, long amount);

The application can use incrCounterAsync to asynchronously change the counter of a given key by the given amount.

getCounter

    /**
     * Retrieve the counter value for the key.
     *
     * @param key name of the key
     * @return the amount of the counter value for this key
     */
    long getCounter(String key);

The application can use getCounter to retrieve the counter of a given key mutated by incrCounter.

getCounterAsync

    /**
     * Retrieve the counter value for the key, but don't wait
     * for the operation to be completed
     *
     * @param key name of the key
     * @return the amount of the counter value for this key
     */
    CompletableFuture<Long> getCounterAsync(String key);

The application can use getCounterAsync to asynchronously retrieve the counter of a given key mutated by incrCounterAsync.

In addition to the counter API, Pulsar also exposes a general key/value API that functions can use to store arbitrary key/value state.

putState

    /**
     * Update the state value for the key.
     *
     * @param key name of the key
     * @param value state value of the key
     */
    void putState(String key, ByteBuffer value);

putStateAsync

    /**
     * Update the state value for the key, but don't wait for the operation to be completed
     *
     * @param key name of the key
     * @param value state value of the key
     */
    CompletableFuture<Void> putStateAsync(String key, ByteBuffer value);

The application can use putStateAsync to asynchronously update the state of a given key.

getState

    /**
     * Retrieve the state value for the key.
     *
     * @param key name of the key
     * @return the state value for the key.
     */
    ByteBuffer getState(String key);

getStateAsync

    /**
     * Retrieve the state value for the key, but don't wait for the operation to be completed
     *
     * @param key name of the key
     * @return the state value for the key.
     */
    CompletableFuture<ByteBuffer> getStateAsync(String key);

The application can use getStateAsync to asynchronously retrieve the state of a given key.
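The Async variants return a future, so a function can issue several state operations and synchronize once instead of blocking on each call. The pattern can be illustrated in Python with concurrent.futures. This is only an analogy: the Python calls listed in this document have no asynchronous variants, and InMemoryCounters and count_words_async are hypothetical names backed by a local dictionary rather than the state service.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import threading


class InMemoryCounters:
    """Hypothetical in-process store standing in for the state service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def incr_counter(self, key, amount):
        with self._lock:
            self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        with self._lock:
            return self._counters.get(key, 0)


def count_words_async(words, store, executor):
    """Submit every increment without blocking, then wait once for all of them."""
    futures = [executor.submit(store.incr_counter, word, 1) for word in words]
    wait(futures)  # a single synchronization point instead of one per call
    for future in futures:
        future.result()  # re-raises any exception from a worker thread


if __name__ == "__main__":
    store = InMemoryCounters()
    with ThreadPoolExecutor(max_workers=4) as pool:
        count_words_async(["a", "b", "a", "a"], store, pool)
    print(store.get_counter("a"), store.get_counter("b"))
```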

deleteState

    /**
     * Delete the state value for the key.
     *
     * @param key name of the key
     */
    void deleteState(String key);

Counters and binary values share the same keyspace, so this deletes either type.

Currently Pulsar Functions expose the following APIs for mutating and accessing State. These APIs are available in the Context object when you are using Python SDK functions.

incr_counter

    def incr_counter(self, key, amount):
        """incr the counter of a given key in the managed state"""

The application can use incr_counter to change the counter of a given key by the given amount. If the key does not exist, a new key is created.

get_counter

    def get_counter(self, key):
        """get the counter of a given key in the managed state"""

The application can use get_counter to retrieve the counter of a given key mutated by incr_counter.

In addition to the counter API, Pulsar also exposes a general key/value API that functions can use to store arbitrary key/value state.

put_state

    def put_state(self, key, value):
        """update the value of a given key in the managed state"""

The key is a string, and the value is arbitrary binary data.

get_state

    def get_state(self, key):
        """get the value of a given key in the managed state"""

del_counter

    def del_counter(self, key):
        """delete the counter of a given key in the managed state"""

Counters and binary values share the same keyspace, so this deletes either type.
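The shared keyspace can be illustrated with a small in-memory stand-in for the Python state calls. This is a sketch under stated assumptions: InMemoryState is hypothetical, it stores counters as 8-byte big-endian values in the same dictionary as raw values, and it returns 0 for a counter that does not exist, which is a choice of this sketch rather than documented SDK behavior.

```python
import struct


class InMemoryState:
    """Hypothetical stand-in for the context's state calls, backed by one dict."""

    def __init__(self):
        self._table = {}  # single keyspace for counters and raw values

    def incr_counter(self, key, amount):
        current = self.get_counter(key)
        self._table[key] = struct.pack(">q", current + amount)

    def get_counter(self, key):
        # Raises struct.error if the key holds a raw value that is not 8 bytes long.
        raw = self._table.get(key)
        return 0 if raw is None else struct.unpack(">q", raw)[0]

    def put_state(self, key, value):
        self._table[key] = value

    def get_state(self, key):
        return self._table.get(key)

    def del_counter(self, key):
        # One keyspace, so this removes a counter or a raw value alike.
        self._table.pop(key, None)


if __name__ == "__main__":
    state = InMemoryState()
    state.incr_counter("hits", 2)
    state.incr_counter("hits", 3)
    state.put_state("blob", b"\x01\x02")
    print(state.get_counter("hits"), state.get_state("hits").hex())
    state.del_counter("blob")
    print(state.get_state("blob"))
```

Because a counter is just an 8-byte value under its key, get_state on a counter key returns the encoded bytes, and del_counter removes whichever kind of entry the key holds.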

Query State

A Pulsar Function can use the State API to store state in Pulsar's state storage and to retrieve it later. In addition, Pulsar provides CLI commands for querying that state.

    $ bin/pulsar-admin functions querystate \
      --tenant <tenant> \
      --namespace <namespace> \
      --name <function-name> \
      --state-storage-url <bookkeeper-service-url> \
      --key <state-key> \
      [--watch]

If --watch is specified, the CLI will watch the value of the provided state-key.
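If you script state queries, it can help to assemble the argument list in one place. The following sketch builds the argv for the command shown above using only the flags listed there. The helper name build_querystate_command and the sample values (including the storage URL) are hypothetical placeholders.

```python
import shlex


def build_querystate_command(tenant, namespace, name, key, state_storage_url, watch=False):
    """Return the argv list for `pulsar-admin functions querystate`."""
    command = [
        "bin/pulsar-admin", "functions", "querystate",
        "--tenant", tenant,
        "--namespace", namespace,
        "--name", name,
        "--state-storage-url", state_storage_url,
        "--key", key,
    ]
    if watch:
        command.append("--watch")
    return command


if __name__ == "__main__":
    # Placeholder values; replace them with those of your deployment.
    argv = build_querystate_command(
        "public", "default", "word-count", "hello", "bk://127.0.0.1:4181", watch=True
    )
    print(shlex.join(argv))
    # To execute it: subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when a state key contains spaces or other special characters.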

Example

  • Java
  • Python

WordCountFunction is a good example of how an application can store state in Pulsar Functions.

    import org.apache.pulsar.functions.api.Context;
    import org.apache.pulsar.functions.api.Function;

    import java.util.Arrays;

    public class WordCountFunction implements Function<String, Void> {

        @Override
        public Void process(String input, Context context) throws Exception {
            Arrays.asList(input.split("\\.")).forEach(word -> context.incrCounter(word, 1));
            return null;
        }
    }

The logic of this WordCount function is simple:

  1. The function first splits the received String into multiple words using the regex \\., which matches a literal period.
  2. For each word, the function increments the corresponding counter by 1 (via incrCounter(key, amount)).

    from pulsar import Function

    class WordCount(Function):
        def process(self, item, context):
            for word in item.split():
                context.incr_counter(word, 1)

The logic of this WordCount function is simple:

  1. The function first splits the received string into multiple words on whitespace.
  2. For each word, the function increments the corresponding counter by 1 (via incr_counter(key, amount)).
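The two examples split their input differently: the Java version splits on a literal period, while the Python version splits on whitespace. The sketch below runs both strategies against a stand-in context so that you can compare the resulting counters locally. CountingContext, count_words, and count_segments are hypothetical names; the stand-in keeps counters in a local Counter instead of the state service.

```python
from collections import Counter


class CountingContext:
    """Hypothetical stand-in for the context; counters live in a local Counter."""

    def __init__(self):
        self.counters = Counter()

    def incr_counter(self, key, amount):
        self.counters[key] += amount


def count_words(item, context):
    """Whitespace split, as in the Python example."""
    for word in item.split():
        context.incr_counter(word, 1)


def count_segments(item, context):
    """Split on a literal period, the delimiter used by the Java example."""
    segments = item.split(".")
    # Drop trailing empty pieces so "a.b." yields two keys rather than three.
    while segments and segments[-1] == "":
        segments.pop()
    for segment in segments:
        context.incr_counter(segment, 1)


if __name__ == "__main__":
    by_space = CountingContext()
    count_words("to be or not to be", by_space)
    print(dict(by_space.counters))

    by_period = CountingContext()
    count_segments("a.b.a.", by_period)
    print(dict(by_period.counters))
```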