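Tip
Single-cluster Pulsar installations are sufficient for all but the most demanding use cases. If you are a startup or a single team that wants to try out Pulsar, we recommend a single cluster. For instructions on deploying a single cluster, see the single-cluster deployment guide.
To use all built-in Pulsar IO connectors in your Pulsar deployment, download the apache-pulsar-io-connectors package and install it in the connectors directory under the pulsar directory on every broker node. If Pulsar Functions run in a separate function worker cluster, install it in the pulsar directory on every function-worker node instead.
To use the tiered storage feature in your Pulsar deployment, download the apache-pulsar-offloaders package and unpack it into the offloaders directory under the pulsar directory on every broker node. For more details on configuring this feature, see the Tiered storage cookbook.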
A Pulsar instance consists of multiple Pulsar clusters working in unison. Clusters can be distributed across data centers or geographical regions and can replicate amongst themselves using geo-replication. Deploying a multi-cluster Pulsar instance involves the following basic steps:
- Deploying two separate ZooKeeper quorums: a local quorum for each cluster in the instance and a configuration store quorum for instance-wide tasks
- Initializing cluster metadata for each cluster
- Deploying a BookKeeper cluster of bookies in each Pulsar cluster
- Deploying brokers in each Pulsar cluster
If you’re deploying a single Pulsar cluster, see the Clusters and Brokers guide.
Running Pulsar locally or on Kubernetes?
This guide shows you how to deploy Pulsar in production in a non-Kubernetes environment. If you’d like to run a standalone Pulsar cluster on a single machine for development purposes, see the Setting up a local cluster guide. If you’re looking to run Pulsar on Kubernetes, see the Pulsar on Kubernetes guide, which includes sections on running Pulsar on Kubernetes on Google Kubernetes Engine and on Amazon Web Services.
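System requirements
Pulsar currently runs on macOS and Linux. To use Pulsar, you first need to install Java 8.
Install Pulsar
To get started with Pulsar, first download a binary tarball release, for example with wget: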
$ wget 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-2.6.1/apache-pulsar-2.6.1-bin.tar.gz' -O apache-pulsar-2.6.1-bin.tar.gz
Once the tarball is downloaded, untar it and cd into the resulting directory:
$ tar xvfz apache-pulsar-2.6.1-bin.tar.gz
$ cd apache-pulsar-2.6.1
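What your package contains
The Pulsar binary package initially contains the following directories:
Directory | Contains |
---|---|
bin | Pulsar’s command-line tools, such as pulsar and pulsar-admin |
conf | Configuration files for Pulsar, including broker configuration, ZooKeeper configuration, and more |
examples | A Java JAR file containing example Pulsar Functions |
lib | The JAR files used by Pulsar |
licenses | License files, in .txt form, for various components of the Pulsar codebase |
These directories are created once you begin running Pulsar:
Directory | Contains |
---|---|
data | The data storage directory used by ZooKeeper and BookKeeper |
instances | Artifacts created for Pulsar Functions |
logs | Logs created by the installation |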
Deploying ZooKeeper
Each Pulsar instance relies on two separate ZooKeeper quorums.
- Local ZooKeeper operates at the cluster level and provides cluster-specific configuration management and coordination. Each Pulsar cluster needs to have a dedicated ZooKeeper cluster.
- Configuration Store operates at the instance level and provides configuration management for the entire system (and thus across clusters). The configuration store quorum can be provided by an independent cluster of machines or by the same machines used by local ZooKeeper.
Deploying local ZooKeeper
ZooKeeper manages a variety of essential coordination- and configuration-related tasks for Pulsar.
Deploying a Pulsar instance requires you to stand up one local ZooKeeper cluster per Pulsar cluster.
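To begin, add all ZooKeeper servers to the quorum configuration specified in conf/zookeeper.conf. Add a server.N line for each node in the cluster to the configuration, where N is the number of the ZooKeeper node. Here’s an example for a three-node cluster: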
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
On each host, you need to specify the ID of the node in each node’s myid file, which is in each server’s data/zookeeper folder by default (this can be changed via the dataDir parameter).
See the Multi-server setup guide in the ZooKeeper documentation for detailed info on myid and more.
On a ZooKeeper server at zk1.us-west.example.com, for example, you could set the myid value like this:
$ mkdir -p data/zookeeper
$ echo 1 > data/zookeeper/myid
On zk2.us-west.example.com the command would be echo 2 > data/zookeeper/myid and so on.
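The server.N lines and the myid values must agree on every host, so it can help to derive both from one ordered host list. The following is a minimal sketch, not part of Pulsar: zk_server_lines and zk_myid_for are hypothetical helper names, and the ports 2888 and 3888 are simply taken from the example above.

```shell
# Hypothetical helpers (not shipped with Pulsar or ZooKeeper).

# Print one "server.N=host:2888:3888" line per host, numbering from 1.
zk_server_lines() (
  n=1
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$n" "$host"
    n=$((n + 1))
  done
)

# Print the myid for TARGET: its 1-based position in the same host list.
# Usage: zk_myid_for TARGET HOST...
zk_myid_for() (
  target=$1
  shift
  n=1
  for host in "$@"; do
    if [ "$host" = "$target" ]; then
      echo "$n"
      exit 0
    fi
    n=$((n + 1))
  done
  echo "host $target is not in the quorum list" >&2
  exit 1
)

# Example: the three-node us-west quorum from above.
zk_server_lines zk1.us-west.example.com zk2.us-west.example.com zk3.us-west.example.com
```

On a given host, the output of zk_myid_for for that host's name could then be redirected into data/zookeeper/myid, using the same host list in the same order everywhere.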
Once each server has been added to the zookeeper.conf configuration and has the appropriate myid entry, you can start ZooKeeper on all hosts (in the background, using nohup) with the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start zookeeper
Deploying the configuration store
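The ZooKeeper cluster configured and started up in the section above is a local ZooKeeper cluster used to manage a single Pulsar cluster. In addition to a local cluster, however, a full Pulsar instance also requires a configuration store for handling some instance-level configuration and coordination tasks.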
If you’re deploying a single-cluster instance, then you will not need a separate cluster for the configuration store. If, however, you’re deploying a multi-cluster instance, then you should stand up a separate ZooKeeper cluster for configuration tasks.
Single-cluster Pulsar instance
If your Pulsar instance will consist of just one cluster, then you can deploy a configuration store on the same machines as the local ZooKeeper quorum but running on different TCP ports.
To deploy a ZooKeeper configuration store in a single-cluster instance, add the same ZooKeeper servers used by the local quorum to the configuration file in conf/global_zookeeper.conf using the same method as for local ZooKeeper, but make sure to use a different port (2181 is the default for ZooKeeper). Here’s an example that uses port 2184 for a three-node ZooKeeper cluster:
clientPort=2184
server.1=zk1.us-west.example.com:2185:2186
server.2=zk2.us-west.example.com:2185:2186
server.3=zk3.us-west.example.com:2185:2186
As before, create the myid files for each server on data/global-zookeeper/myid.
Multi-cluster Pulsar instance
When deploying a global Pulsar instance, with clusters distributed across different geographical regions, the configuration store serves as a highly available and strongly consistent metadata store that can tolerate failures and partitions spanning whole regions.
The key here is to make sure the ZK quorum members are spread across at least 3 regions and that other regions are running as observers.
Again, given the very low expected load on the configuration store servers, we can share the same hosts used for the local ZooKeeper quorum.
For example, let’s assume a Pulsar instance with the following clusters: us-west, us-east, us-central, eu-central, ap-south. Let’s also assume that each cluster has its own local ZK servers, named as follows:
zk[1-3].${CLUSTER}.example.com
In this scenario we want to pick the quorum participants from a few clusters and let all the others be ZK observers. For example, to form a 7-server quorum, we can pick 3 servers from us-west, 2 from us-central and 2 from us-east.
This guarantees that writes to the configuration store remain possible even if one of these regions is unreachable, because at least 4 of the 7 voting servers stay available.
The ZK configuration in all the servers will look like:
clientPort=2184
server.1=zk1.us-west.example.com:2185:2186
server.2=zk2.us-west.example.com:2185:2186
server.3=zk3.us-west.example.com:2185:2186
server.4=zk1.us-central.example.com:2185:2186
server.5=zk2.us-central.example.com:2185:2186
server.6=zk3.us-central.example.com:2185:2186:observer
server.7=zk1.us-east.example.com:2185:2186
server.8=zk2.us-east.example.com:2185:2186
server.9=zk3.us-east.example.com:2185:2186:observer
server.10=zk1.eu-central.example.com:2185:2186:observer
server.11=zk2.eu-central.example.com:2185:2186:observer
server.12=zk3.eu-central.example.com:2185:2186:observer
server.13=zk1.ap-south.example.com:2185:2186:observer
server.14=zk2.ap-south.example.com:2185:2186:observer
server.15=zk3.ap-south.example.com:2185:2186:observer
Additionally, ZK observers will need to have:
peerType=observer
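A layout like this can be sanity-checked before it is rolled out. ZooKeeper needs a majority of the voting servers to accept writes, so the question is whether a majority survives the loss of any single region. The sketch below is only an illustration: quorum_report is a hypothetical helper, and it assumes hostnames of the form zkN.REGION.example.com as used in this example.

```shell
# Hypothetical helper: read a ZooKeeper server list on stdin and report,
# for each region that has voting servers, whether a majority of voters
# remains if that whole region becomes unreachable.
# Assumes hostnames of the form zkN.REGION.example.com.
quorum_report() {
  awk -F'[=:.]' '
    # Voting members are server.N lines that do not end in ":observer".
    /^server\.[0-9]+=/ && !/:observer$/ {
      voters++
      # Fields after splitting on = : and . are: server, N, zkN, REGION, ...
      if (!($4 in count)) order[++n] = $4
      count[$4]++
    }
    END {
      majority = int(voters / 2) + 1
      printf "voters=%d majority=%d\n", voters, majority
      for (i = 1; i <= n; i++) {
        r = order[i]
        left = voters - count[r]
        printf "%s: voters=%d, if region is lost: %s\n", r, count[r], (left >= majority ? "quorum holds" : "quorum lost")
      }
    }'
}
```

Running quorum_report with conf/global_zookeeper.conf on standard input would list each voting region; any line that says the quorum is lost points at a region holding too many of the voters.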
Starting the service
Once your configuration store configuration is in place, you can start up the service using pulsar-daemon:
$ bin/pulsar-daemon start configuration-store
Cluster metadata initialization
Once you’ve set up the cluster-specific ZooKeeper and configuration store quorums for your instance, there is some metadata that needs to be written to ZooKeeper for each cluster in your instance. It only needs to be written once.
You can initialize this metadata using the initialize-cluster-metadata command of the pulsar CLI tool. Here’s an example:
$ bin/pulsar initialize-cluster-metadata \
--cluster us-west \
--zookeeper zk1.us-west.example.com:2181 \
--configuration-store zk1.us-west.example.com:2184 \
--web-service-url http://pulsar.us-west.example.com:8080/ \
--web-service-url-tls https://pulsar.us-west.example.com:8443/ \
--broker-service-url pulsar://pulsar.us-west.example.com:6650/ \
--broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651/
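As you can see from the example above, the following needs to be specified:
- The name of the cluster
- The local ZooKeeper connection string for the cluster
- The configuration store connection string for the entire instance
- The web service URL for the cluster
- A broker service URL enabling interaction with the brokers in the cluster
If you are using TLS, you also need to specify a TLS web service URL for the cluster as well as a TLS broker service URL for the brokers in the cluster.
Make sure to run initialize-cluster-metadata for each cluster in your instance.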
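Because the command has to be repeated per cluster with only the names changing, it can be generated from a template. The sketch below prints the commands rather than running them, so they can be reviewed first. init_metadata_cmd is a hypothetical helper, and the host naming (zk1.CLUSTER.example.com, pulsar.CLUSTER.example.com) and the single configuration store address are assumptions carried over from the example above.

```shell
# Hypothetical helper: print (do not run) the initialize-cluster-metadata
# command for one cluster.
# $1 = cluster name, $2 = configuration store connection string.
# Host names follow the example.com pattern used in this guide (assumption).
init_metadata_cmd() (
  cluster=$1
  config_store=$2
  # Every line except the last ends in a shell line continuation.
  printf '%s \\\n' \
    "bin/pulsar initialize-cluster-metadata" \
    "  --cluster ${cluster}" \
    "  --zookeeper zk1.${cluster}.example.com:2181" \
    "  --configuration-store ${config_store}" \
    "  --web-service-url http://pulsar.${cluster}.example.com:8080/" \
    "  --web-service-url-tls https://pulsar.${cluster}.example.com:8443/" \
    "  --broker-service-url pulsar://pulsar.${cluster}.example.com:6650/"
  printf '%s\n' \
    "  --broker-service-url-tls pulsar+ssl://pulsar.${cluster}.example.com:6651/"
)

# Print one command per cluster; the configuration store is shared by all.
for cluster in us-west us-east us-central; do
  init_metadata_cmd "$cluster" zk1.us-west.example.com:2184
  echo
done
```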
Deploying BookKeeper
BookKeeper provides persistent message storage for Pulsar.
Each Pulsar cluster needs to have its own cluster of bookies. The BookKeeper cluster shares a local ZooKeeper quorum with the Pulsar cluster.
Configuring bookies
BookKeeper bookies can be configured using the conf/bookkeeper.conf configuration file. The most important aspect of configuring each bookie is ensuring that the zkServers parameter is set to the connection string for the Pulsar cluster’s local ZooKeeper.
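A ZooKeeper connection string is a comma-separated list of host:port pairs. As a small sketch, the string can be assembled from the same host list used for the quorum configuration, which avoids typos when it is copied into several files. zk_connect_string is a hypothetical helper, and the hosts and port 2181 are taken from the earlier us-west example.

```shell
# Hypothetical helper: join hosts into "host:port,host:port,...".
# Usage: zk_connect_string PORT HOST...
zk_connect_string() (
  port=$1
  shift
  out=
  for host in "$@"; do
    # Prefix a comma only when something has already been collected.
    out="${out:+$out,}${host}:${port}"
  done
  echo "$out"
)

# Print the line as it would appear in conf/bookkeeper.conf.
echo "zkServers=$(zk_connect_string 2181 \
  zk1.us-west.example.com zk2.us-west.example.com zk3.us-west.example.com)"
```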
Starting up bookies
You can start up a bookie in two ways: in the foreground or as a background daemon.
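To start a bookie in the foreground, use the bookkeeper CLI tool (bin/bookkeeper bookie). To start a bookie as a background daemon, use the pulsar-daemon CLI tool: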
$ bin/pulsar-daemon start bookie
You can verify that the bookie is working properly using the bookiesanity command for the BookKeeper shell:
$ bin/bookkeeper shell bookiesanity
This will create a new ledger on the local bookie, write a few entries, read them back and finally delete the ledger.
Hardware considerations
Bookie hosts are responsible for storing message data on disk. In order for bookies to provide optimal performance, it’s essential that they have a suitable hardware configuration. There are two key dimensions to bookie hardware capacity:
- Disk I/O capacity (read/write)
- Storage capacity
Message entries written to bookies are always synced to disk before returning an acknowledgement to the Pulsar broker. To ensure low write latency, BookKeeper is designed to use multiple devices:
- A journal to ensure durability. For sequential writes, it’s critical to have fast fsync operations on bookie hosts. Typically, small and fast solid-state drives (SSDs) should suffice, or hard disk drives (HDDs) with a RAID controller and a battery-backed write cache. Both solutions can reach fsync latency of ~0.4 ms.
- A ledger storage device is where data is stored until all consumers have acknowledged the message. Writes happen in the background, so write I/O is not a big concern. Reads happen sequentially most of the time, and the backlog is read only when consumers drain it. To store large amounts of data, a typical configuration will involve multiple HDDs with a RAID controller.
Deploying brokers
Once you’ve set up ZooKeeper, initialized cluster metadata, and spun up BookKeeper bookies, you can deploy brokers.
Broker configuration
Brokers can be configured using the conf/broker.conf configuration file.
The most important element of broker configuration is ensuring that each broker is aware of its local ZooKeeper quorum as well as the configuration store quorum. Make sure that you set the zookeeperServers parameter to reflect the local quorum and the configurationStoreServers parameter to reflect the configuration store quorum (although you’ll need to specify only those ZooKeeper servers located in the same cluster).
You also need to specify the name of the cluster to which the broker belongs using the clusterName parameter. In addition, you need to match the broker and web service ports provided when initializing the cluster’s metadata (especially when using a different port from the default).
Here’s an example configuration:
# Local ZooKeeper servers
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
# Configuration store quorum connection string.
configurationStoreServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
clusterName=us-west
# Broker data port
brokerServicePort=6650
# Broker data port for TLS
brokerServicePortTls=6651
# Port to use to serve HTTP requests
webServicePort=8080
# Port to use to serve HTTPS requests
webServicePortTls=8443
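Since the cluster name and ports in broker.conf have to match what was passed to initialize-cluster-metadata, a quick comparison can catch mismatches before a broker is started. The following is a rough sketch: conf_get and check_broker_conf are hypothetical helpers, and the parsing assumes plain key=value lines as in the example configuration.

```shell
# Hypothetical helpers for checking a key=value properties file.

# Print the value of KEY from FILE (the last matching line wins).
# Usage: conf_get FILE KEY
conf_get() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Compare FILE with the values used when initializing cluster metadata.
# Usage: check_broker_conf FILE CLUSTER BROKER_PORT WEB_PORT
check_broker_conf() (
  status=0
  for pair in "clusterName=$2" "brokerServicePort=$3" "webServicePort=$4"; do
    key=${pair%%=*}
    want=${pair#*=}
    got=$(conf_get "$1" "$key")
    if [ "$got" != "$want" ]; then
      echo "MISMATCH $key: expected '$want', found '$got'"
      status=1
    fi
  done
  exit "$status"
)
```

For the us-west example, the call would be check_broker_conf conf/broker.conf us-west 6650 8080; it prints nothing and returns success when all three values match.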
Broker hardware
Pulsar brokers do not require any special hardware since they don’t use the local disk. Fast CPUs and 10Gbps NIC are recommended since the software can take full advantage of that.
Starting the broker service
You can start a broker in the background using nohup with the pulsar-daemon CLI tool:
$ bin/pulsar-daemon start broker
You can also start brokers in the foreground using pulsar broker:
$ bin/pulsar broker
Service discovery
Clients connecting to Pulsar brokers need to be able to communicate with an entire Pulsar instance using a single URL. Pulsar provides a built-in service discovery mechanism that you can set up using the instructions immediately below.
You can also use your own service discovery system if you’d like. If you use your own system, there is just one requirement: when a client performs an HTTP request to an endpoint for a Pulsar cluster, such as http://pulsar.us-west.example.com:8080, the client needs to be redirected to some active broker in the desired cluster, whether via DNS, an HTTP or IP redirect, or some other means.
Service discovery already provided by many scheduling systems
Many large-scale deployment systems, such as Kubernetes, have service discovery systems built in. If you’re running Pulsar on such a system, you may not need to provide your own service discovery mechanism.
Service discovery setup
The service discovery mechanism included with Pulsar maintains a list of active brokers, stored in ZooKeeper, and supports lookup using HTTP and also Pulsar’s binary protocol.
To get started setting up Pulsar’s built-in service discovery, you need to change a few parameters in the conf/discovery.conf configuration file. Set the zookeeperServers parameter to the cluster’s ZooKeeper quorum connection string and the configurationStoreServers setting to the configuration store quorum connection string.
# Zookeeper quorum connection string
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
# Global configuration store connection string
configurationStoreServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
To start the discovery service:
$ bin/pulsar-daemon start discovery
Admin client and verification
At this point your Pulsar instance should be ready to use. You can now configure client machines that can serve as administrative clients for each cluster. You can use the conf/client.conf configuration file to configure admin clients.
The most important thing is that you point the serviceUrl parameter to the correct service URL for the cluster:
serviceUrl=http://pulsar.us-west.example.com:8080/
Provisioning new tenants
Pulsar was built as a fundamentally multi-tenant system.
To allow a new tenant to use the system, you first need to create it. You can create a new tenant using the pulsar-admin CLI tool:
$ bin/pulsar-admin tenants create test-tenant \
--allowed-clusters us-west \
--admin-roles test-admin-role
This will allow users who identify with role test-admin-role to administer the configuration for the tenant test-tenant, which will only be allowed to use the cluster us-west. From now on, this tenant will be able to self-manage its resources.
Once a tenant has been created, you will need to create namespaces for topics within that tenant.
The first step is to create a namespace. A namespace is an administrative unit that can contain many topics. A common practice is to create a namespace for each different use case from a single tenant.
$ bin/pulsar-admin namespaces create test-tenant/ns1
Testing producer and consumer
Everything is now ready to send and receive messages. The quickest way to test the system is through the pulsar-perf client tool.
Let’s use a topic in the namespace we just created. Topics are automatically created the first time a producer or a consumer tries to use them.
The topic name in this case could be:
persistent://test-tenant/ns1/my-topic
Start a consumer that will create a subscription on the topic and will wait for messages:
$ bin/pulsar-perf consume persistent://test-tenant/ns1/my-topic
Start a producer that publishes messages at a fixed rate and reports stats every 10 seconds:
$ bin/pulsar-perf produce persistent://test-tenant/ns1/my-topic
To report the topic stats:
$ bin/pulsar-admin persistent stats persistent://test-tenant/ns1/my-topic
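The consumer, the producer, and the stats command must all be given exactly the same topic name. As a minimal sketch, the name can be built once from the tenant and namespace created earlier and reused. topic_name is a hypothetical helper, not a Pulsar tool, and the commands are only printed so the sketch runs without a cluster.

```shell
# Hypothetical helper: build a persistent topic name from its parts.
# Usage: topic_name TENANT NAMESPACE TOPIC
topic_name() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: topic_name TENANT NAMESPACE TOPIC" >&2
    return 1
  fi
  printf 'persistent://%s/%s/%s\n' "$1" "$2" "$3"
}

topic=$(topic_name test-tenant ns1 my-topic)

# Print the commands instead of running them.
echo "bin/pulsar-perf consume $topic"
echo "bin/pulsar-perf produce $topic"
```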