Pulsar SQL configuration and deployment
You can configure the Presto Pulsar connector and deploy a cluster with the following instructions.
Configure Presto Pulsar connector
You can configure the Presto Pulsar connector in the ${project.root}/conf/presto/catalog/pulsar.properties properties file. The configuration for the connector and the default values are as follows.
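# name of the connector to be displayed in the catalog
connector.name=pulsar
# the URL of the Pulsar broker service
pulsar.broker-service-url=http://localhost:8080
# URI of the ZooKeeper cluster
pulsar.zookeeper-uri=localhost:2181
# minimum number of entries to read at a single time
pulsar.entry-read-batch-size=100
# default number of splits to use per query
pulsar.target-num-splits=4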
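To make the relationship between the file and its defaults concrete, the following is a minimal sketch in Python that reads a pulsar.properties-style text and overlays it on the values listed above. It treats those listed values as the defaults and only understands simple key=value lines and # comments, not the full Java properties syntax. The helper names are hypothetical and not part of Pulsar.

```python
from typing import Dict

# Values shown in the sample pulsar.properties above, treated here as defaults.
DEFAULTS: Dict[str, str] = {
    "connector.name": "pulsar",
    "pulsar.broker-service-url": "http://localhost:8080",
    "pulsar.zookeeper-uri": "localhost:2181",
    "pulsar.entry-read-batch-size": "100",
    "pulsar.target-num-splits": "4",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without "="
        props[key.strip()] = value.strip()
    return props


def effective_config(text: str) -> Dict[str, str]:
    """Overlay the values found in the file on top of the defaults."""
    return {**DEFAULTS, **parse_properties(text)}


if __name__ == "__main__":
    sample = """
    # override only the batch size
    pulsar.entry-read-batch-size=500
    """
    cfg = effective_config(sample)
    print(cfg["pulsar.entry-read-batch-size"])  # overridden value
    print(cfg["pulsar.target-num-splits"])      # default value
```

Only the keys you set in the file change; every other key keeps the value from the list above.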
You can connect Presto to a Pulsar cluster with multiple hosts. To configure multiple hosts for brokers, add multiple URLs to pulsar.broker-service-url. To configure multiple hosts for ZooKeeper, add multiple URIs to pulsar.zookeeper-uri. The following is an example.
pulsar.broker-service-url=http://localhost:8080,localhost:8081,localhost:8082
pulsar.zookeeper-uri=localhost1,localhost2:2181
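Both properties take a single comma-separated string. The following is a small sketch, using hypothetical helpers that are not part of Pulsar, that builds the two example values above from host lists and splits them back. It only illustrates the shape of the value; it does not reproduce how the connector itself parses or validates the hosts.

```python
from typing import List


def join_hosts(hosts: List[str]) -> str:
    """Join host entries into one comma-separated property value."""
    return ",".join(h.strip() for h in hosts)


def split_hosts(value: str) -> List[str]:
    """Split a comma-separated property value back into its host entries."""
    return [h.strip() for h in value.split(",") if h.strip()]


if __name__ == "__main__":
    brokers = join_hosts(["http://localhost:8080", "localhost:8081", "localhost:8082"])
    zookeepers = join_hosts(["localhost1", "localhost2:2181"])
    # These two lines reproduce the example configuration above.
    print(f"pulsar.broker-service-url={brokers}")
    print(f"pulsar.zookeeper-uri={zookeepers}")
```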
Query data from existing Presto clusters
If you already have a Presto cluster, you can copy the Presto Pulsar connector plugin to your existing cluster. Download the archived plugin package with the following command.
$ wget https://archive.apache.org/dist/pulsar/pulsar-2.7.0/apache-pulsar-2.7.0-bin.tar.gz
Deploy a new cluster
Since Pulsar SQL is powered by Presto, the deployment configuration for the Pulsar SQL worker is the same as for Presto.
Note
For how to set up a standalone single node environment, refer to Query data.
You can use the same CLI arguments as the Presto launcher.
$ ./bin/pulsar sql-worker --help
Usage: launcher [options] command
Commands: run, start, stop, restart, kill, status
Options:
-h, --help show this help message and exit
-v, --verbose Run verbosely
--etc-dir=DIR Defaults to INSTALL_PATH/etc
--launcher-config=FILE
Defaults to INSTALL_PATH/bin/launcher.properties
--node-config=FILE Defaults to ETC_DIR/node.properties
--jvm-config=FILE Defaults to ETC_DIR/jvm.config
--config=FILE Defaults to ETC_DIR/config.properties
--log-levels-file=FILE
Defaults to ETC_DIR/log.properties
--data-dir=DIR Defaults to INSTALL_PATH
--pid-file=FILE Defaults to DATA_DIR/var/run/launcher.pid
--launcher-log-file=FILE
Defaults to DATA_DIR/var/log/launcher.log (only in
daemon mode)
--server-log-file=FILE
Defaults to DATA_DIR/var/log/server.log (only in
daemon mode)
-D NAME=VALUE Set a Java system property
The default configuration for the cluster is located in ${project.root}/conf/presto. You can customize your deployment by modifying the default configuration.
You can set the worker to read from a different configuration directory, or set a different directory to write data to.
$ ./bin/pulsar sql-worker run --etc-dir /tmp/incubator-pulsar/conf/presto --data-dir /tmp/presto-1
You can start the worker as a daemon process.
$ ./bin/pulsar sql-worker start
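If you script your deployment, the worker command lines above can be assembled programmatically. The following is a minimal sketch; sql_worker_argv is a hypothetical helper, and it uses only the launcher commands and the --etc-dir and --data-dir options shown in the help output, placed after the command as in the example above. It builds the argument list without running anything.

```python
import shlex
from typing import List, Optional

# Commands accepted by the launcher, as listed in the --help output above.
LAUNCHER_COMMANDS = {"run", "start", "stop", "restart", "kill", "status"}


def sql_worker_argv(command: str,
                    etc_dir: Optional[str] = None,
                    data_dir: Optional[str] = None) -> List[str]:
    """Build the argv for ./bin/pulsar sql-worker (hypothetical helper)."""
    if command not in LAUNCHER_COMMANDS:
        raise ValueError(f"unknown launcher command: {command!r}")
    argv = ["./bin/pulsar", "sql-worker", command]
    if etc_dir is not None:
        argv += ["--etc-dir", etc_dir]    # read configuration from this directory
    if data_dir is not None:
        argv += ["--data-dir", data_dir]  # write data to this directory
    return argv


if __name__ == "__main__":
    # Foreground worker with a custom configuration and data directory.
    print(shlex.join(sql_worker_argv(
        "run", "/tmp/incubator-pulsar/conf/presto", "/tmp/presto-1")))
    # Daemon worker with the default directories.
    print(shlex.join(sql_worker_argv("start")))
    # To launch it from the Pulsar install directory:
    # import subprocess; subprocess.run(sql_worker_argv("start"), check=True)
```

Passing the list to subprocess.run, rather than a single shell string, avoids quoting problems when a directory path contains spaces.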
Deploy a cluster on multiple nodes
You can deploy a Pulsar SQL cluster or Presto cluster on multiple nodes. The following example shows how to deploy a cluster on three nodes.
- Copy the Pulsar binary distribution to the three nodes.
- The first node runs as the Presto coordinator. The minimal configuration required in the ${project.root}/conf/presto/config.properties file is as follows.
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=<coordinator-url>
- The other two nodes serve as worker nodes. You can use the following configuration for them.
coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=<coordinator-url>
- Modify the pulsar.broker-service-url and pulsar.zookeeper-uri settings in the ${project.root}/conf/presto/catalog/pulsar.properties file on each of the three nodes accordingly.
- Start the coordinator node.
$ ./bin/pulsar sql-worker run
- Start the worker nodes.
$ ./bin/pulsar sql-worker run
- Start the SQL CLI and check the status of your cluster.
$ ./bin/pulsar sql --server <coordinator-url>
- Check the status of your nodes.
presto> SELECT * FROM system.runtime.nodes;
node_id | http_uri | node_version | coordinator | state
---------+-------------------------+--------------+-------------+--------
1 | http://192.168.2.1:8081 | testversion | true | active
3 | http://192.168.2.2:8081 | testversion | false | active
2 | http://192.168.2.3:8081 | testversion | false | active
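When the status check is scripted, for example in a deployment test, the table printed by system.runtime.nodes can be checked automatically. The following is a minimal sketch with hypothetical helper names. It parses the pipe-separated CLI output shown above and assumes the three-node layout from this section: exactly one coordinator, with every node in the active state.

```python
from typing import Dict, List

SAMPLE = """
node_id | http_uri | node_version | coordinator | state
---------+-------------------------+--------------+-------------+--------
1 | http://192.168.2.1:8081 | testversion | true | active
3 | http://192.168.2.2:8081 | testversion | false | active
2 | http://192.168.2.3:8081 | testversion | false | active
"""


def parse_nodes(table: str) -> List[Dict[str, str]]:
    """Parse the pipe-separated output of SELECT * FROM system.runtime.nodes."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) <= set("-+"):
            continue  # skip the ----+---- separator line
        cells = [c.strip() for c in ln.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def check_cluster(rows: List[Dict[str, str]], expected_nodes: int) -> List[str]:
    """Return a list of problems; an empty list means the cluster looks healthy."""
    problems = []
    if len(rows) != expected_nodes:
        problems.append(f"expected {expected_nodes} nodes, found {len(rows)}")
    coordinators = [r for r in rows if r["coordinator"] == "true"]
    if len(coordinators) != 1:
        problems.append(f"expected 1 coordinator, found {len(coordinators)}")
    for r in rows:
        if r["state"] != "active":
            problems.append(f"node {r['node_id']} is {r['state']}")
    return problems


if __name__ == "__main__":
    print(check_cluster(parse_nodes(SAMPLE), expected_nodes=3))
```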
For more information about deploying Presto, refer to Presto deployment.
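The coordinator and worker config.properties files in the multi-node steps differ only in a few keys. The following is a sketch that generates both from one function, which can help keep the shared settings consistent across nodes. The function name and the discovery URI used in the example are hypothetical; the keys and values are the ones listed in those steps.

```python
# Keys shared by the coordinator and worker config.properties files.
COMMON = {
    "http-server.http.port": "8080",
    "query.max-memory": "50GB",
    "query.max-memory-per-node": "1GB",
}


def config_properties(coordinator: bool, discovery_uri: str) -> str:
    """Render config.properties text for a coordinator or a worker node."""
    props = {"coordinator": "true" if coordinator else "false"}
    if coordinator:
        # The coordinator also schedules work on itself.
        props["node-scheduler.include-coordinator"] = "true"
    props.update(COMMON)
    if coordinator:
        # Only the coordinator runs the embedded discovery server.
        props["discovery-server.enabled"] = "true"
    props["discovery.uri"] = discovery_uri
    return "".join(f"{key}={value}\n" for key, value in props.items())


if __name__ == "__main__":
    uri = "http://coordinator-host:8080"  # hypothetical coordinator URL
    print(config_properties(coordinator=True, discovery_uri=uri))
    print(config_properties(coordinator=False, discovery_uri=uri))
```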
Note
The broker does not advance the LAC (LastAddConfirmed), so when Pulsar SQL bypasses the broker to query data, it can only read entries up to the LAC that all the bookies have learned. You can make the broker write the LAC periodically by setting "bookkeeperExplicitLacIntervalInMills" in broker.conf.