This tutorial provides a hands-on look at how you can move data out of Pulsar without writing a single line of code.

It is helpful to review the concepts of Pulsar I/O while running the steps in this guide to gain a deeper understanding.

At the end of this tutorial, you will be able to connect Pulsar to Cassandra and to MySQL.

Tip

  • These instructions assume that you are running Pulsar in standalone mode. However, all the commands used in this tutorial also work in a multi-node Pulsar cluster without any changes.

  • All the commands are assumed to be run from the root directory of a Pulsar binary distribution.

Install Pulsar and built-in connector

Before connecting Pulsar to a database, you need to install Pulsar and the desired built-in connector.

For more information about how to install a standalone Pulsar and built-in connectors, see here.

Start Pulsar standalone

  1. Start Pulsar locally.

    bin/pulsar standalone

    All the components of a Pulsar service are started in order.

    You can curl the following Pulsar service endpoints to make sure that the Pulsar service is up and running correctly.

  2. Check Pulsar binary protocol port.

    telnet localhost 6650
  3. Check Pulsar Function cluster.

    curl -s http://localhost:8080/admin/v2/worker/cluster

    Example output

    [{"workerId":"c-standalone-fw-localhost-6750","workerHostname":"localhost","port":6750}]
  4. Make sure a public tenant and a default namespace exist.

    curl -s http://localhost:8080/admin/v2/namespaces/public

    Example output

    ["public/default","public/functions"]
  5. All built-in connectors should be listed as available.

    curl -s http://localhost:8080/admin/v2/functions/connectors

    Example output

    [{"name":"aerospike","description":"Aerospike database sink","sinkClass":"org.apache.pulsar.io.aerospike.AerospikeStringSink"},{"name":"cassandra","description":"Writes data into Cassandra","sinkClass":"org.apache.pulsar.io.cassandra.CassandraStringSink"},{"name":"kafka","description":"Kafka source and sink connector","sourceClass":"org.apache.pulsar.io.kafka.KafkaStringSource","sinkClass":"org.apache.pulsar.io.kafka.KafkaBytesSink"},{"name":"kinesis","description":"Kinesis sink connector","sinkClass":"org.apache.pulsar.io.kinesis.KinesisSink"},{"name":"rabbitmq","description":"RabbitMQ source connector","sourceClass":"org.apache.pulsar.io.rabbitmq.RabbitMQSource"},{"name":"twitter","description":"Ingest data from Twitter firehose","sourceClass":"org.apache.pulsar.io.twitter.TwitterFireHose"}]

    If an error occurs when starting the Pulsar service, you may see an exception in the terminal running bin/pulsar standalone, or you can check the log files in the logs directory under the Pulsar directory.
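The checks in the steps above can also be scripted. The following is a minimal Python sketch, not part of the Pulsar distribution: the function names port_is_open and sink_connector_names are illustrative, and the host and ports are the standalone defaults used in this tutorial.

```python
import json
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like the telnet check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def sink_connector_names(connectors_json: str) -> list:
    """Return the names of connectors that declare a sinkClass.

    connectors_json is the body returned by /admin/v2/functions/connectors.
    """
    connectors = json.loads(connectors_json)
    return [c["name"] for c in connectors if "sinkClass" in c]
```

For example, port_is_open("localhost", 6650) mirrors the telnet check, and passing the output of the connectors curl command to sink_connector_names should return a list that contains cassandra when the Cassandra connector is installed.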

Connect Pulsar to Cassandra

This section demonstrates how to connect Pulsar to Cassandra.

Tip

  • Make sure you have Docker installed. If you do not have it, see install Docker.

  • The Cassandra sink connector reads messages from Pulsar topics and writes the messages into Cassandra tables. For more information, see Cassandra sink connector.

Set up a Cassandra cluster

This example uses the cassandra Docker image to start a single-node Cassandra cluster in Docker.

  1. Start a Cassandra cluster.

    docker run -d --rm --name=cassandra -p 9042:9042 cassandra

    Note

    Before moving to the next steps, make sure the Cassandra cluster is running.

  2. Make sure the Docker process is running.

    docker ps
  3. Check the Cassandra logs to make sure the Cassandra process is running as expected.

    docker logs cassandra
  4. Check the status of the Cassandra cluster.

    docker exec cassandra nodetool status

    Example output

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
    UN  172.17.0.2  103.67 KiB  256     100.0%            af0e4b2f-84e0-4f0b-bb14-bd5f9070ff26  rack1
  5. Use cqlsh to connect to the Cassandra cluster.

    $ docker exec -ti cassandra cqlsh localhost
    Connected to Test Cluster at localhost:9042.
    [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
    Use HELP for help.
    cqlsh>
  6. Create a keyspace pulsar_test_keyspace.

    cqlsh> CREATE KEYSPACE pulsar_test_keyspace WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
  7. Create a table pulsar_test_table.

    cqlsh> USE pulsar_test_keyspace;
    cqlsh:pulsar_test_keyspace> CREATE TABLE pulsar_test_table (key text PRIMARY KEY, col text);
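The note in step 1 asks you to make sure the Cassandra cluster is running before you continue. If you script this setup, one way to do that is to poll the nodetool status output until a node reports UN (Up/Normal). The following Python sketch shows the idea; node_is_up_normal and wait_until are hypothetical helper names, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def node_is_up_normal(nodetool_output: str) -> bool:
    """Return True if at least one line of `nodetool status` output starts with UN."""
    return any(line.split()[:1] == ["UN"] for line in nodetool_output.splitlines())


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll check() until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a script, the check passed to wait_until could run docker exec cassandra nodetool status through subprocess.run and hand its standard output to node_is_up_normal.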

Configure a Cassandra sink

Now that a Cassandra cluster is running locally, you can configure a Cassandra sink connector.

To run a Cassandra sink connector, you need to prepare a configuration file that contains the information that the Pulsar connector runtime needs to know, for example, how the connector can find the Cassandra cluster, and which keyspace and table it uses for writing Pulsar messages.

You can create the configuration file in one of the following formats.

  • JSON

    {
      "roots": "localhost:9042",
      "keyspace": "pulsar_test_keyspace",
      "columnFamily": "pulsar_test_table",
      "keyname": "key",
      "columnName": "col"
    }
  • YAML

    configs:
      roots: "localhost:9042"
      keyspace: "pulsar_test_keyspace"
      columnFamily: "pulsar_test_table"
      keyname: "key"
      columnName: "col"

For more information, see Cassandra sink connector.
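If you generate the configuration from a script, a small check before writing the file can catch a missing setting early. The sketch below is only an illustration in Python: REQUIRED_KEYS simply mirrors the five keys in the example above (whether the connector treats all of them as mandatory is an assumption here), and the function names are hypothetical.

```python
import json

# The five keys used by the Cassandra sink configuration in this tutorial.
REQUIRED_KEYS = ("roots", "keyspace", "columnFamily", "keyname", "columnName")


def missing_config_keys(config: dict) -> list:
    """Return the tutorial's config keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def to_json_config(config: dict) -> str:
    """Serialize config as JSON after checking that no key is missing."""
    missing = missing_config_keys(config)
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    return json.dumps(config, indent=2)
```

Calling to_json_config with the values from the JSON example returns text equivalent to that example.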

Create a Cassandra sink

You can use the Connector Admin CLI to create a sink connector and perform other operations on it.

Run the following command to create a Cassandra sink connector with sink type cassandra and the config file examples/cassandra-sink.yml created previously.

Note

The sink-type parameter of the currently built-in connectors is determined by the setting of the name parameter specified in the pulsar-io.yaml file.

  bin/pulsar-admin sinks create \
    --tenant public \
    --namespace default \
    --name cassandra-test-sink \
    --sink-type cassandra \
    --sink-config-file examples/cassandra-sink.yml \
    --inputs test_cassandra

Once the command is executed, Pulsar creates the sink connector cassandra-test-sink.

This sink connector runs as a Pulsar Function and writes the messages produced in the topic test_cassandra to the Cassandra table pulsar_test_table.
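If you drive the CLI from a script, it can help to assemble the command as an argument list instead of a shell string. The following Python sketch does this using only the flags shown in the command above; build_sink_create_args is a hypothetical helper name.

```python
def build_sink_create_args(tenant, namespace, name, sink_type, config_file, inputs):
    """Build the argument list for `bin/pulsar-admin sinks create`.

    inputs is a list of topic names; it is joined with commas.
    """
    return [
        "bin/pulsar-admin", "sinks", "create",
        "--tenant", tenant,
        "--namespace", namespace,
        "--name", name,
        "--sink-type", sink_type,
        "--sink-config-file", config_file,
        "--inputs", ",".join(inputs),
    ]
```

The resulting list can be passed to subprocess.run(args, check=True) from the root directory of the Pulsar distribution, which avoids shell quoting issues.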

Inspect a Cassandra sink

You can use the Connector Admin CLI to monitor a connector and perform other operations on it.

  • Get the information of a Cassandra sink.

    bin/pulsar-admin sinks get \
      --tenant public \
      --namespace default \
      --name cassandra-test-sink

    Example output

    {
      "tenant": "public",
      "namespace": "default",
      "name": "cassandra-test-sink",
      "className": "org.apache.pulsar.io.cassandra.CassandraStringSink",
      "inputSpecs": {
        "test_cassandra": {
          "isRegexPattern": false
        }
      },
      "configs": {
        "roots": "localhost:9042",
        "keyspace": "pulsar_test_keyspace",
        "columnFamily": "pulsar_test_table",
        "keyname": "key",
        "columnName": "col"
      },
      "parallelism": 1,
      "processingGuarantees": "ATLEAST_ONCE",
      "retainOrdering": false,
      "autoAck": true,
      "archive": "builtin://cassandra"
    }
  • Check the status of a Cassandra sink.

    bin/pulsar-admin sinks status \
      --tenant public \
      --namespace default \
      --name cassandra-test-sink

    Example output

    {
      "numInstances" : 1,
      "numRunning" : 1,
      "instances" : [ {
        "instanceId" : 0,
        "status" : {
          "running" : true,
          "error" : "",
          "numRestarts" : 0,
          "numReadFromPulsar" : 0,
          "numSystemExceptions" : 0,
          "latestSystemExceptions" : [ ],
          "numSinkExceptions" : 0,
          "latestSinkExceptions" : [ ],
          "numWrittenToSink" : 0,
          "lastReceivedTime" : 0,
          "workerId" : "c-standalone-fw-localhost-8080"
        }
      } ]
    }
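The status output is plain JSON, so a script can check it without reading it by eye. The following Python sketch relies only on the field names visible in the example output above; summarize_sink_status is a hypothetical helper name.

```python
import json


def summarize_sink_status(status_json: str) -> dict:
    """Summarize the JSON printed by `pulsar-admin sinks status`."""
    status = json.loads(status_json)
    instances = status.get("instances", [])
    return {
        # True only if every instance is counted as running and reports running.
        "all_running": status["numRunning"] == status["numInstances"]
        and all(i["status"]["running"] for i in instances),
        # Totals across all instances.
        "read": sum(i["status"]["numReadFromPulsar"] for i in instances),
        "written": sum(i["status"]["numWrittenToSink"] for i in instances),
        "sink_exceptions": sum(i["status"]["numSinkExceptions"] for i in instances),
    }
```

For a freshly created sink that has not received any messages, the summary reports all_running as true and zero for both counters.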

Verify a Cassandra sink

  1. Produce some messages to the input topic of the Cassandra sink test_cassandra.

    for i in {0..9}; do bin/pulsar-client produce -m "key-$i" -n 1 test_cassandra; done
  2. Inspect the status of the Cassandra sink cassandra-test-sink.

    bin/pulsar-admin sinks status \
      --tenant public \
      --namespace default \
      --name cassandra-test-sink

    You can see that 10 messages have been processed by the Cassandra sink cassandra-test-sink.

    Example output

    {
      "numInstances" : 1,
      "numRunning" : 1,
      "instances" : [ {
        "instanceId" : 0,
        "status" : {
          "running" : true,
          "error" : "",
          "numRestarts" : 0,
          "numReadFromPulsar" : 10,
          "numSystemExceptions" : 0,
          "latestSystemExceptions" : [ ],
          "numSinkExceptions" : 0,
          "latestSinkExceptions" : [ ],
          "numWrittenToSink" : 10,
          "lastReceivedTime" : 1551685489136,
          "workerId" : "c-standalone-fw-localhost-8080"
        }
      } ]
    }
  3. Use cqlsh to connect to the Cassandra cluster.

    docker exec -ti cassandra cqlsh localhost
  4. Check the data of the Cassandra table pulsar_test_table.

    cqlsh> use pulsar_test_keyspace;
    cqlsh:pulsar_test_keyspace> select * from pulsar_test_table;

     key   | col
    -------+-------
     key-5 | key-5
     key-0 | key-0
     key-9 | key-9
     key-2 | key-2
     key-1 | key-1
     key-3 | key-3
     key-6 | key-6
     key-7 | key-7
     key-4 | key-4
     key-8 | key-8
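The rows in the query result are not listed in the order in which the messages were produced, because Cassandra returns partitions in token order rather than insertion order. When you verify the result in a script, compare the keys as a set instead of as a sequence. A minimal Python sketch, where missing_keys is a hypothetical helper and the rows are assumed to have been fetched already as (key, col) pairs:

```python
def missing_keys(rows, count=10):
    """Return expected keys key-0 .. key-(count-1) that are absent from rows.

    rows is an iterable of (key, col) tuples, in any order.
    """
    found = {key for key, _col in rows}
    return [f"key-{i}" for i in range(count) if f"key-{i}" not in found]
```

An empty list means that every message produced by the loop in step 1 has a matching row in pulsar_test_table.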

Delete a Cassandra sink

You can use the Connector Admin CLI to delete a connector and perform other operations on it.

  bin/pulsar-admin sinks delete \
    --tenant public \
    --namespace default \
    --name cassandra-test-sink

Connect Pulsar to MySQL

This section demonstrates how to connect Pulsar to MySQL.

Tip

  • Make sure you have Docker installed. If you do not have it, see install Docker.

  • The JDBC sink connector pulls messages from Pulsar topics and persists the messages to MySQL or SQLite. For more information, see JDBC sink connector.

Set up a MySQL cluster

This example uses the MySQL 5.7 Docker image to start a single-node MySQL cluster in Docker.

  1. Pull the MySQL 5.7 image from Docker.

    $ docker pull mysql:5.7
  2. Start MySQL.

    $ docker run -d -it --rm \
      --name pulsar-mysql \
      -p 3306:3306 \
      -e MYSQL_ROOT_PASSWORD=jdbc \
      -e MYSQL_USER=mysqluser \
      -e MYSQL_PASSWORD=mysqlpw \
      mysql:5.7

    Tip

    Flag | Description | This example
    --- | --- | ---
    -d | Start a container in detached mode. | N/A
    -it | Keep STDIN open even if not attached and allocate a terminal. | N/A
    --rm | Remove the container automatically when it exits. | N/A
    --name | Assign a name to the container. | This example specifies pulsar-mysql for the container.
    -p | Publish the port of the container to the host. | This example publishes the port 3306 of the container to the host.
    -e | Set environment variables. | This example sets the following variables: the password for the root user is jdbc; the name for the normal user is mysqluser; the password for the normal user is mysqlpw.

    Tip

    For more information about Docker commands, see Docker CLI.

  3. Check if MySQL has been started successfully.

    $ docker logs -f pulsar-mysql

    MySQL has been started successfully if the following message appears.

    2019-05-11T10:40:58.709964Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
    2019-05-11T10:40:58.710155Z 0 [Warning] CA certificate ca.pem is self signed.
    2019-05-11T10:40:58.711921Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
    2019-05-11T10:40:58.711985Z 0 [Note] IPv6 is available.
    2019-05-11T10:40:58.712695Z 0 [Note] - '::' resolves to '::';
    2019-05-11T10:40:58.712742Z 0 [Note] Server socket created on IP: '::'.
    2019-05-11T10:40:58.714334Z 0 [Warning] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
    2019-05-11T10:40:58.723802Z 0 [Note] Event Scheduler: Loaded 0 events
    2019-05-11T10:40:58.724200Z 0 [Note] mysqld: ready for connections.
    Version: '5.7.26' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server (GPL)
  4. Access MySQL.

    $ docker exec -it pulsar-mysql /bin/bash
    mysql -h localhost -uroot -pjdbc
  5. Create a MySQL database and a table, both named pulsar_mysql_jdbc_sink. Run the following statements at the mysql prompt.

    create database pulsar_mysql_jdbc_sink;
    use pulsar_mysql_jdbc_sink;
    create table if not exists pulsar_mysql_jdbc_sink
    (
      id INT AUTO_INCREMENT,
      name VARCHAR(255) NOT NULL,
      primary key (id)
    )
    engine=innodb;

Configure a JDBC sink

Now that MySQL is running locally, you can configure a JDBC sink connector.

  1. Add a configuration file.

    To run a JDBC sink connector, you need to prepare a YAML configuration file that contains the information that the Pulsar connector runtime needs to know, for example, how the connector can find the MySQL cluster, and which JDBC URL and table it uses for writing messages.

    Create a pulsar-mysql-jdbc-sink.yaml file, copy the following contents to this file, and place the file in the pulsar/connectors folder.

    configs:
      userName: "root"
      password: "jdbc"
      jdbcUrl: "jdbc:mysql://127.0.0.1:3306/pulsar_mysql_jdbc_sink"
      tableName: "pulsar_mysql_jdbc_sink"
  2. Create a schema.

    Create an avro-schema file, copy the following contents to this file, and place the file in the pulsar/connectors folder.

    {
      "type": "AVRO",
      "schema": "{\"type\":\"record\",\"name\":\"Test\",\"fields\":[{\"name\":\"id\",\"type\":[\"null\",\"int\"]},{\"name\":\"name\",\"type\":[\"null\",\"string\"]}]}",
      "properties": {}
    }

    Tip

    For more information about Avro, see Apache Avro.

  3. Upload a schema to a topic.

    This example uploads the avro-schema schema to the pulsar-mysql-jdbc-sink-topic topic.

    $ bin/pulsar-admin schemas upload pulsar-mysql-jdbc-sink-topic -f ./connectors/avro-schema
  4. Check if the schema has been uploaded successfully.

    $ bin/pulsar-admin schemas get pulsar-mysql-jdbc-sink-topic

    The schema has been uploaded successfully if the following message appears.

    {"name":"pulsar-mysql-jdbc-sink-topic","schema":"{\"type\":\"record\",\"name\":\"Test\",\"fields\":[{\"name\":\"id\",\"type\":[\"null\",\"int\"]},{\"name\":\"name\",\"type\":[\"null\",\"string\"]}]}","type":"AVRO","properties":{}}
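The escaped quotes in the avro-schema file are there because the Avro record definition is itself JSON, stored as a string inside the outer JSON document. Writing that by hand is error-prone, so you may prefer to generate it. The following Python sketch reproduces the schema used above; build_schema_payload and FIELDS are hypothetical names chosen for illustration.

```python
import json


def build_schema_payload(record_name: str, fields: list) -> dict:
    """Build the dict to store in the schema file.

    The Avro record definition is JSON-encoded into a string and placed
    under "schema", which is why the file contains escaped quotes.
    """
    avro_record = {"type": "record", "name": record_name, "fields": fields}
    return {
        "type": "AVRO",
        "schema": json.dumps(avro_record, separators=(",", ":")),
        "properties": {},
    }


# Fields matching the MySQL table columns id and name, both nullable.
FIELDS = [
    {"name": "id", "type": ["null", "int"]},
    {"name": "name", "type": ["null", "string"]},
]
```

Serializing build_schema_payload("Test", FIELDS) with json.dumps yields a document equivalent to the avro-schema file shown earlier, with the inner quotes escaped automatically.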

Create a JDBC sink

You can use the Connector Admin CLI to create a sink connector and perform other operations on it.

This example creates a sink connector and specifies the desired information.

  $ bin/pulsar-admin sinks create \
    --archive ./connectors/pulsar-io-jdbc-2.6.1.nar \
    --inputs pulsar-mysql-jdbc-sink-topic \
    --name pulsar-mysql-jdbc-sink \
    --sink-config-file ./connectors/pulsar-mysql-jdbc-sink.yaml \
    --parallelism 1

Once the command is executed, Pulsar creates a sink connector pulsar-mysql-jdbc-sink.

This sink connector runs as a Pulsar Function and writes the messages produced in the topic pulsar-mysql-jdbc-sink-topic to the MySQL table pulsar_mysql_jdbc_sink.

Tip

Flag | Description | This example
--- | --- | ---
--archive | The path to the archive file for the sink. | pulsar-io-jdbc-2.6.1.nar
--inputs | The input topic(s) of the sink. Multiple topics can be specified as a comma-separated list. | pulsar-mysql-jdbc-sink-topic
--name | The name of the sink. | pulsar-mysql-jdbc-sink
--sink-config-file | The path to a YAML config file specifying the configuration of the sink. | pulsar-mysql-jdbc-sink.yaml
--parallelism | The parallelism factor of the sink, that is, the number of sink instances to run. | 1

Tip

For more information about pulsar-admin sinks create options, see here.

The sink has been created successfully if the following message appears.

  "Created successfully"

Inspect a JDBC sink

You can use the Connector Admin CLI to monitor a connector and perform other operations on it.

  • List all running JDBC sink(s).

    $ bin/pulsar-admin sinks list \
      --tenant public \
      --namespace default

    Tip

    For more information about pulsar-admin sinks list options, see here.

    The result shows that only the pulsar-mysql-jdbc-sink sink is running.

    [
      "pulsar-mysql-jdbc-sink"
    ]
  • Get the information of a JDBC sink.

    $ bin/pulsar-admin sinks get \
      --tenant public \
      --namespace default \
      --name pulsar-mysql-jdbc-sink

    Tip

    For more information about pulsar-admin sinks get options, see here.

    The result shows the information of the sink connector, including tenant, namespace, topic and so on.

    {
      "tenant": "public",
      "namespace": "default",
      "name": "pulsar-mysql-jdbc-sink",
      "className": "org.apache.pulsar.io.jdbc.JdbcAutoSchemaSink",
      "inputSpecs": {
        "pulsar-mysql-jdbc-sink-topic": {
          "isRegexPattern": false
        }
      },
      "configs": {
        "password": "jdbc",
        "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/pulsar_mysql_jdbc_sink",
        "userName": "root",
        "tableName": "pulsar_mysql_jdbc_sink"
      },
      "parallelism": 1,
      "processingGuarantees": "ATLEAST_ONCE",
      "retainOrdering": false,
      "autoAck": true
    }
  • Get the status of a JDBC sink.

    $ bin/pulsar-admin sinks status \
      --tenant public \
      --namespace default \
      --name pulsar-mysql-jdbc-sink

    Tip

    For more information about pulsar-admin sinks status options, see here.

    The result shows the current status of the sink connector, including the number of instances, running status, worker ID, and so on.

    {
      "numInstances" : 1,
      "numRunning" : 1,
      "instances" : [ {
        "instanceId" : 0,
        "status" : {
          "running" : true,
          "error" : "",
          "numRestarts" : 0,
          "numReadFromPulsar" : 0,
          "numSystemExceptions" : 0,
          "latestSystemExceptions" : [ ],
          "numSinkExceptions" : 0,
          "latestSinkExceptions" : [ ],
          "numWrittenToSink" : 0,
          "lastReceivedTime" : 0,
          "workerId" : "c-standalone-fw-192.168.2.52-8080"
        }
      } ]
    }

Stop a JDBC sink

You can use the Connector Admin CLI to stop a connector and perform other operations on it.

  $ bin/pulsar-admin sinks stop \
    --tenant public \
    --namespace default \
    --name pulsar-mysql-jdbc-sink \
    --instance-id 0

Tip

For more information about pulsar-admin sinks stop options, see here.

The sink instance has been stopped successfully if the following message appears.

  "Stopped successfully"

Restart a JDBC sink

You can use the Connector Admin CLI to restart a connector and perform other operations on it.

  $ bin/pulsar-admin sinks restart \
    --tenant public \
    --namespace default \
    --name pulsar-mysql-jdbc-sink \
    --instance-id 0

Tip

For more information about pulsar-admin sinks restart options, see here.

The sink instance has been started successfully if the following message appears.

  "Started successfully"

Tip

  • Optionally, you can run a standalone sink connector using pulsar-admin sinks localrun options.

    Note that pulsar-admin sinks localrun options runs a sink connector locally, while pulsar-admin sinks start options starts a sink connector in a cluster.

  • For more information about pulsar-admin sinks localrun options, see here.

Update a JDBC sink

You can use the Connector Admin CLI to update a connector and perform other operations on it.

This example updates the parallelism of the pulsar-mysql-jdbc-sink sink connector to 2.

  $ bin/pulsar-admin sinks update \
    --name pulsar-mysql-jdbc-sink \
    --parallelism 2

Tip

For more information about pulsar-admin sinks update options, see here.

The sink connector has been updated successfully if the following message appears.

  "Updated successfully"

This example double-checks the information.

  $ bin/pulsar-admin sinks get \
    --tenant public \
    --namespace default \
    --name pulsar-mysql-jdbc-sink

The result shows that the parallelism is 2.

  {
    "tenant": "public",
    "namespace": "default",
    "name": "pulsar-mysql-jdbc-sink",
    "className": "org.apache.pulsar.io.jdbc.JdbcAutoSchemaSink",
    "inputSpecs": {
      "pulsar-mysql-jdbc-sink-topic": {
        "isRegexPattern": false
      }
    },
    "configs": {
      "password": "jdbc",
      "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/pulsar_mysql_jdbc_sink",
      "userName": "root",
      "tableName": "pulsar_mysql_jdbc_sink"
    },
    "parallelism": 2,
    "processingGuarantees": "ATLEAST_ONCE",
    "retainOrdering": false,
    "autoAck": true
  }

Delete a JDBC sink

You can use the Connector Admin CLI to delete a connector and perform other operations on it.

This example deletes the pulsar-mysql-jdbc-sink sink connector.

  $ bin/pulsar-admin sinks delete \
    --tenant public \
    --namespace default \
    --name pulsar-mysql-jdbc-sink

Tip

For more information about pulsar-admin sinks delete options, see here.

The sink connector has been deleted successfully if the following message appears.

  "Deleted successfully"

This example double-checks the status of the sink connector.

  $ bin/pulsar-admin sinks get \
    --tenant public \
    --namespace default \
    --name pulsar-mysql-jdbc-sink

The result shows that the sink connector does not exist.

  HTTP 404 Not Found
  Reason: Sink pulsar-mysql-jdbc-sink doesn't exist