Building and Installing TBase from Source

  • Create the tbase user

Note: this user must be created on every machine on which the TBase cluster will be installed.

    mkdir /data
    useradd -d /data/tbase tbase

  • Get the source code

git clone https://github.com/Tencent/TBase

  • Build the source

    cd ${SOURCECODE_PATH}
    rm -rf ${INSTALL_PATH}/tbase_bin_v2.0
    chmod +x configure*
    ./configure --prefix=${INSTALL_PATH}/tbase_bin_v2.0 --enable-user-switch --with-openssl --with-ossp-uuid CFLAGS=-g
    make clean
    make -sj
    make install
    chmod +x contrib/pgxc_ctl/make_signature
    cd contrib
    make -sj
    make install

In the environment used for this article, the two variables above are set as follows:

    SOURCECODE_PATH=/data/tbase/TBase-master
    INSTALL_PATH=/data/tbase/install
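The build steps above can be collected into a single script. The following is a minimal sketch, assuming the two paths from this article as defaults. The function name build_tbase and the "run" argument are hypothetical, and -j"$(nproc)" is swapped in for the bare -sj so that the number of parallel jobs is capped at the CPU count.

```shell
#!/usr/bin/env bash
# Sketch: wraps the manual build steps in one function.
SOURCECODE_PATH="${SOURCECODE_PATH:-/data/tbase/TBase-master}"
INSTALL_PATH="${INSTALL_PATH:-/data/tbase/install}"
PREFIX="${INSTALL_PATH}/tbase_bin_v2.0"

# The body runs in a subshell so that "set -e" and "cd" do not leak to the caller.
build_tbase() (
  set -e
  cd "$SOURCECODE_PATH"
  rm -rf "$PREFIX"
  chmod +x configure*
  ./configure --prefix="$PREFIX" --enable-user-switch \
      --with-openssl --with-ossp-uuid CFLAGS=-g
  make clean
  make -sj"$(nproc)"
  make install
  # Build and install the contrib modules, including pgxc_ctl.
  chmod +x contrib/pgxc_ctl/make_signature
  cd contrib
  make -sj"$(nproc)"
  make install
)

# Build only when invoked with the (hypothetical) argument "run".
if [ "${1:-}" = "run" ]; then
  build_tbase
fi
```

Because of set -e inside the subshell, the first failing step stops the build instead of continuing to make install with a broken tree.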

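  • Cluster installation

    • Cluster planning

The example below builds a cluster on two servers with 1 GTM master, 1 GTM slave, 2 CN masters (CN masters are peers, so no CN slaves are needed), 2 DN masters, and 2 DN slaves. This is the minimum configuration that provides disaster recovery.

  1. Machine 1: 10.215.147.158
  2. Machine 2: 10.240.138.159

The cluster layout is as follows: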

Node name    IP               Data directory
GTM master   10.215.147.158   /data/tbase/data/gtm
GTM slave    10.240.138.159   /data/tbase/data/gtm
CN1          10.215.147.158   /data/tbase/data/coord
CN2          10.240.138.159   /data/tbase/data/coord
DN1 master   10.215.147.158   /data/tbase/data/dn001
DN1 slave    10.240.138.159   /data/tbase/data/dn001
DN2 master   10.240.138.159   /data/tbase/data/dn002
DN2 slave    10.215.147.158   /data/tbase/data/dn002

Figure: TBase deployment diagram
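When adapting this layout to other machines, it can help to check that no two nodes on the same host share a data directory. The following sketch does this for the layout above; the function name find_dir_conflicts and the node labels in the input are hypothetical.

```shell
# Sketch: report any host/data-directory pair that is assigned to more than one node.
find_dir_conflicts() {
  # stdin: one "<node> <host> <datadir>" triple per line
  awk '{ key = $2 " " $3; count[key]++; nodes[key] = nodes[key] " " $1 }
       END { for (k in count) if (count[k] > 1) print "conflict on " k ":" nodes[k] }'
}

# The layout planned in this article; it prints nothing because there is no conflict.
find_dir_conflicts <<'EOF'
gtm_master 10.215.147.158 /data/tbase/data/gtm
gtm_slave  10.240.138.159 /data/tbase/data/gtm
cn1        10.215.147.158 /data/tbase/data/coord
cn2        10.240.138.159 /data/tbase/data/coord
dn1_master 10.215.147.158 /data/tbase/data/dn001
dn1_slave  10.240.138.159 /data/tbase/data/dn001
dn2_master 10.240.138.159 /data/tbase/data/dn002
dn2_slave  10.215.147.158 /data/tbase/data/dn002
EOF
```

A master and its slave may use the same directory path only because they sit on different machines, which is exactly what the check relies on.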

  • SSH mutual trust between machines

See the guide "Linux ssh mutual trust configuration".
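One common way to establish this trust is with ssh-keygen and ssh-copy-id. The sketch below is an assumption about how this could be done, not the procedure from the referenced guide. The function names and the RUN dry-run switch are hypothetical; by default the script only prints the commands it would run.

```shell
# Sketch: set up and verify passwordless SSH for the tbase user.
# Run it as tbase on every machine so that the trust is mutual.
HOSTS="10.215.147.158 10.240.138.159"
RUN="${RUN-echo}"   # dry run by default; invoke with RUN= (empty) to execute

setup_ssh_trust() {
  # Generate a key pair once, without a passphrase.
  [ -f "$HOME/.ssh/id_rsa" ] || $RUN ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  # Install the public key on every host, including the local one.
  for host in $HOSTS; do
    $RUN ssh-copy-id "tbase@$host"
  done
}

check_ssh_trust() {
  # BatchMode makes ssh fail instead of prompting for a password.
  for host in $HOSTS; do
    $RUN ssh -o BatchMode=yes "tbase@$host" true || echo "no passwordless SSH to $host"
  done
}

setup_ssh_trust
```

pgxc_ctl drives the remote machines over SSH, so check_ssh_trust should succeed for every host before the deploy and init steps are attempted.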

  • Environment variables

Configure these on every machine in the cluster:

    [tbase@TENCENT64 ~]$ vim ~/.bashrc
    export TBASE_HOME=/data/tbase/install/tbase_bin_v2.0
    export PATH=$TBASE_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$TBASE_HOME/lib:${LD_LIBRARY_PATH}
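Since the same three lines must be added on every machine, they can also be appended by a small script instead of an editor. The following is a sketch; the function name add_tbase_env is hypothetical, and the function is written so that running it twice does not duplicate the entries.

```shell
# Sketch: append the TBase environment variables to a shell rc file exactly once.
add_tbase_env() {
  rc_file="$1"
  # Skip if TBASE_HOME is already exported in this file.
  if grep -q 'export TBASE_HOME=' "$rc_file" 2>/dev/null; then
    return 0
  fi
  # The quoted 'EOF' keeps $TBASE_HOME and $PATH unexpanded in the file.
  cat >> "$rc_file" <<'EOF'
export TBASE_HOME=/data/tbase/install/tbase_bin_v2.0
export PATH=$TBASE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$TBASE_HOME/lib:${LD_LIBRARY_PATH}
EOF
}

# Usage (as the tbase user on each machine):
#   add_tbase_env ~/.bashrc && . ~/.bashrc
```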

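With the steps above, the base environment is ready and cluster initialization can begin. For convenience, TBase provides a dedicated configuration and operations tool, pgxc_ctl, which helps users set up and manage a cluster quickly. The first step is to write the IP addresses, ports, and directories of the nodes described above into the configuration file pgxc_ctl.conf.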

  • Create the pgxc_ctl.conf file

    [tbase@TENCENT64 ~]$ mkdir /data/tbase/pgxc_ctl
    [tbase@TENCENT64 ~]$ cd /data/tbase/pgxc_ctl
    [tbase@TENCENT64 ~/pgxc_ctl]$ vim pgxc_ctl.conf

The pgxc_ctl.conf file below follows the plan described above (IP addresses, ports, data directories, binary directory, and so on). In practice, adjust it to match your own environment.

    #!/bin/bash

    pgxcInstallDir=/data/tbase/install/tbase_bin_v2.0
    pgxcOwner=tbase
    defaultDatabase=postgres
    pgxcUser=$pgxcOwner
    tmpDir=/tmp
    localTmpDir=$tmpDir
    configBackup=n
    configBackupHost=pgxc-linker
    configBackupDir=$HOME/pgxc
    configBackupFile=pgxc_ctl.bak


    #---- GTM ----------
    gtmName=gtm
    gtmMasterServer=10.215.147.158
    gtmMasterPort=50001
    gtmMasterDir=/data/tbase/data/gtm
    gtmExtraConfig=none
    gtmMasterSpecificExtraConfig=none
    gtmSlave=y
    gtmSlaveServer=10.240.138.159
    gtmSlavePort=50001
    gtmSlaveDir=/data/tbase/data/gtm
    gtmSlaveSpecificExtraConfig=none

    #---- Coordinators -------
    coordMasterDir=/data/tbase/data/coord
    coordArchLogDir=/data/tbase/data/coord_archlog

    coordNames=(cn001 cn002 )
    coordPorts=(30004 30004 )
    poolerPorts=(31110 31110 )
    coordPgHbaEntries=(0.0.0.0/0)
    coordMasterServers=(10.215.147.158 10.240.138.159)
    coordMasterDirs=($coordMasterDir $coordMasterDir)
    coordMaxWALSender=2
    coordMaxWALSenders=($coordMaxWALSender $coordMaxWALSender )
    coordSlave=n
    coordSlaveSync=n
    coordArchLogDirs=($coordArchLogDir $coordArchLogDir)

    coordExtraConfig=coordExtraConfig
    cat > $coordExtraConfig <<EOF
    #================================================
    # Added to all the coordinator postgresql.conf
    # Original: $coordExtraConfig

    include_if_exists = '/data/tbase/global/global_tbase.conf'

    wal_level = replica
    wal_keep_segments = 256
    max_wal_senders = 4
    archive_mode = on
    archive_timeout = 1800
    archive_command = 'echo 0'
    log_truncate_on_rotation = on
    log_filename = 'postgresql-%M.log'
    log_rotation_age = 4h
    log_rotation_size = 100MB
    hot_standby = on
    wal_sender_timeout = 30min
    wal_receiver_timeout = 30min
    shared_buffers = 1024MB
    max_pool_size = 2000
    log_statement = 'ddl'
    log_destination = 'csvlog'
    logging_collector = on
    log_directory = 'pg_log'
    listen_addresses = '*'
    max_connections = 2000

    EOF

    coordSpecificExtraConfig=(none none)
    coordExtraPgHba=coordExtraPgHba
    cat > $coordExtraPgHba <<EOF

    local all all trust
    host all all 0.0.0.0/0 trust
    host replication all 0.0.0.0/0 trust
    host all all ::1/128 trust
    host replication all ::1/128 trust


    EOF


    coordSpecificExtraPgHba=(none none)
    coordAdditionalSlaves=n
    cad1_Sync=n

    #---- Datanodes ---------------------
    dn1MstrDir=/data/tbase/data/dn001
    dn2MstrDir=/data/tbase/data/dn002
    dn1SlvDir=/data/tbase/data/dn001
    dn2SlvDir=/data/tbase/data/dn002
    dn1ALDir=/data/tbase/data/datanode_archlog
    dn2ALDir=/data/tbase/data/datanode_archlog

    primaryDatanode=dn001
    datanodeNames=(dn001 dn002)
    datanodePorts=(40004 40004)
    datanodePoolerPorts=(41110 41110)
    datanodePgHbaEntries=(0.0.0.0/0)
    datanodeMasterServers=(10.215.147.158 10.240.138.159)
    datanodeMasterDirs=($dn1MstrDir $dn2MstrDir)
    dnWALSndr=4
    datanodeMaxWALSenders=($dnWALSndr $dnWALSndr)

    datanodeSlave=y
    datanodeSlaveServers=(10.240.138.159 10.215.147.158)
    datanodeSlavePorts=(50004 54004)
    datanodeSlavePoolerPorts=(51110 51110)
    datanodeSlaveSync=n
    datanodeSlaveDirs=($dn1SlvDir $dn2SlvDir)
    datanodeArchLogDirs=($dn1ALDir/dn001 $dn2ALDir/dn002)

    datanodeExtraConfig=datanodeExtraConfig
    cat > $datanodeExtraConfig <<EOF
    #================================================
    # Added to all the datanode postgresql.conf
    # Original: $datanodeExtraConfig

    include_if_exists = '/data/tbase/global/global_tbase.conf'
    listen_addresses = '*'
    wal_level = replica
    wal_keep_segments = 256
    max_wal_senders = 4
    archive_mode = on
    archive_timeout = 1800
    archive_command = 'echo 0'
    log_directory = 'pg_log'
    logging_collector = on
    log_truncate_on_rotation = on
    log_filename = 'postgresql-%M.log'
    log_rotation_age = 4h
    log_rotation_size = 100MB
    hot_standby = on
    wal_sender_timeout = 30min
    wal_receiver_timeout = 30min
    shared_buffers = 1024MB
    max_connections = 4000
    max_pool_size = 4000
    log_statement = 'ddl'
    log_destination = 'csvlog'
    wal_buffers = 1GB

    EOF

    datanodeSpecificExtraConfig=(none none)
    datanodeExtraPgHba=datanodeExtraPgHba
    cat > $datanodeExtraPgHba <<EOF

    local all all trust
    host all all 0.0.0.0/0 trust
    host replication all 0.0.0.0/0 trust
    host all all ::1/128 trust
    host replication all ::1/128 trust


    EOF


    datanodeSpecificExtraPgHba=(none none)

    datanodeAdditionalSlaves=n
    walArchive=n
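Most per-node settings in pgxc_ctl.conf are parallel bash arrays, so a name list and its matching port, server, and directory lists are expected to have the same number of entries. The sketch below checks this at the text level, without sourcing the file (sourcing would also execute its cat commands). The function names count_items and check_lengths are hypothetical, and the check only understands single-line "name=(a b c)" assignments.

```shell
# Sketch: verify that related arrays in pgxc_ctl.conf have the same length.
count_items() {
  # count_items <conf> <var>: number of elements in a one-line "var=(a b c)" assignment
  sed -n "s/^$2=(\(.*\))[[:space:]]*$/\1/p" "$1" | awk '{ print NF }'
}

check_lengths() {
  # check_lengths <conf> <var>...: compare every array against the first one
  conf="$1"; shift
  expected=$(count_items "$conf" "$1")
  for var in "$@"; do
    n=$(count_items "$conf" "$var")
    [ "$n" = "$expected" ] || echo "$var has ${n:-0} entries, expected $expected"
  done
}

# Usage, run from /data/tbase/pgxc_ctl:
#   check_lengths pgxc_ctl.conf coordNames coordPorts poolerPorts coordMasterServers coordMasterDirs
#   check_lengths pgxc_ctl.conf datanodeNames datanodePorts datanodePoolerPorts \
#       datanodeMasterServers datanodeMasterDirs datanodeSlaveServers datanodeSlavePorts datanodeSlaveDirs
```

No output means all listed arrays agree in length; each mismatch is reported on its own line.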
  • Distribute the binaries

After the configuration file has been prepared on one node, the binaries must be deployed in advance to every machine that hosts a node. This can be done with the pgxc_ctl tool by running the deploy all command.

    [tbase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC deploy all
    Deploying Postgres-XL components to all the target servers.
    Prepare tarball to deploy ...
    Deploying to the server 10.215.147.158.
    Deploying to the server 10.240.138.159.
    Deployment done.

Log in to every node and check that the binaries were distributed correctly:

    [tbase@TENCENT64 ~/install]$ ls /data/tbase/install/tbase_bin_v2.0
    bin include lib share
  • Run the init all command to initialize the cluster

    [tbase@TENCENT64 ~]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC init all
    Initialize GTM master
    ....
    ....
    Initialize datanode slave dn001
    Initialize datanode slave dn002
    mkdir: cannot create directory '/data1/tbase': Permission denied
    chmod: cannot access '/data1/tbase/data/dn001': No such file or directory
    pg_ctl: directory "/data1/tbase/data/dn001" does not exist
    pg_basebackup: could not create directory "/data1/tbase": Permission denied
  • Handling installation errors

When init fails, the terminal usually prints the error log; find the cause and fix the configuration accordingly. In the output above, for example, the datanode slave directory under /data1/tbase could not be created because of a permission error. The logs under /data/tbase/pgxc_ctl/pgxc_log can also be used to track down mistakes in the configuration file.

    [tbase@TENCENT64 ~]$ ll ~/pgxc_ctl/pgxc_log/
    total 184
    -rw-rw-r-- 1 tbase tbase 81123 Nov 13 17:22 14105_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 2861 Nov 13 17:58 15762_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 14823 Nov 14 07:59 16671_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 2721 Nov 13 16:52 18891_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 1409 Nov 13 16:20 22603_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 60043 Nov 13 16:33 28932_pgxc_ctl.log
    -rw-rw-r-- 1 tbase tbase 15671 Nov 14 07:57 6849_pgxc_ctl.log
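Because each pgxc_ctl session writes its own log file, the most recent one is usually the relevant one. The following sketch picks the newest log and lists suspicious lines; the function names and the keyword list are assumptions chosen to match messages like the ones shown earlier, not an exhaustive set.

```shell
# Sketch: locate the newest pgxc_ctl log and show likely error lines.
latest_pgxc_log() {
  # Print the most recently modified *_pgxc_ctl.log in the given directory.
  ls -t "$1"/*_pgxc_ctl.log 2>/dev/null | head -n 1
}

show_errors() {
  # Case-insensitive search for common failure keywords, with line numbers.
  grep -n -i -E 'error|denied|cannot|could not|does not exist' "$1"
}

# Usage:
#   show_errors "$(latest_pgxc_log ~/pgxc_ctl/pgxc_log)"
```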

To retry, run the pgxc_ctl tool and execute clean all to delete the files that were already initialized, fix pgxc_ctl.conf, and then run init all again.

    [tbase@TENCENT64 ~]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC clean all


    [tbase@TENCENT64 ~]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC init all
    Initialize GTM master
    EXECUTE DIRECT ON (dn002) 'ALTER NODE dn002 WITH (TYPE=''datanode'', HOST=''10.240.138.159'', PORT=40004, PREFERRED)';
    EXECUTE DIRECT
    EXECUTE DIRECT ON (dn002) 'SELECT pgxc_pool_reload()';
     pgxc_pool_reload
    ------------------
     t
    (1 row)

    Done.
  • Check the cluster status

When the output above appears, the cluster has been initialized. The status can also be checked with the monitor all command of pgxc_ctl.

    [tbase@TENCENT64 ~/pgxc_ctl]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC monitor all
    Running: gtm master
    Not running: gtm slave
    Running: coordinator master cn001
    Running: coordinator master cn002
    Running: datanode master dn001
    Running: datanode slave dn001
    Running: datanode master dn002
    Not running: datanode slave dn002

In general, unless strong synchronous replication is configured, a failure of the GTM slave or of a DN slave does not affect access to the cluster.
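For scripted health checks, the monitor output can be reduced to the nodes that are down. The following is a sketch; the function name not_running is hypothetical, and it relies only on the "Not running: " prefix visible in the output above.

```shell
# Sketch: list the nodes that "monitor all" reports as not running.
not_running() {
  # stdin: output of "monitor all"; prints one node description per line
  sed -n 's/^Not running: //p'
}

# Sample taken from the monitor output shown in this article.
not_running <<'EOF'
Running: gtm master
Not running: gtm slave
Running: coordinator master cn001
Running: datanode slave dn001
Not running: datanode slave dn002
EOF
# prints:
#   gtm slave
#   datanode slave dn002
```

An empty result means every configured node was reported as running.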

  • Access the cluster

Accessing a TBase cluster is essentially the same as accessing a standalone PostgreSQL instance, and the cluster can be reached through any CN. For example, connect to a CN and select from the pgxc_node table to see the cluster topology (with the current configuration, slaves are not shown in pgxc_node). The following example uses psql from the Linux command line:

    [tbase@TENCENT64 ~/pgxc_ctl]$ psql -h 10.215.147.158 -p 30004 -d postgres -U tbase
    psql (PostgreSQL 10.0 TBase V2)
    Type "help" for help.

    postgres=# \d
    Did not find any relations.
    postgres=# select * from pgxc_node;
     node_name | node_type | node_port |   node_host    | nodeis_primary | nodeis_preferred |  node_id   | node_cluster_name
    -----------+-----------+-----------+----------------+----------------+------------------+------------+-------------------
     gtm       | G         |     50001 | 10.215.147.158 | t              | f                |  428125959 | tbase_cluster
     cn001     | C         |     30004 | 10.215.147.158 | f              | f                | -264077367 | tbase_cluster
     cn002     | C         |     30004 | 10.240.138.159 | f              | f                | -674870440 | tbase_cluster
     dn001     | D         |     40004 | 10.215.147.158 | t              | t                | 2142761564 | tbase_cluster
     dn002     | D         |     40004 | 10.240.138.159 | f              | f                |  -17499968 | tbase_cluster
    (5 rows)
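The same query can be run non-interactively and summarized, for example to confirm that the expected number of coordinators and datanodes is registered. The sketch below assumes psql's unaligned, tuples-only output (-A -t), which separates columns with "|"; the function name count_by_type is hypothetical.

```shell
# Sketch: summarize pgxc_node rows by node type.
# Assumed input: "name|type|host|port", one node per line (psql -At output).
count_by_type() {
  awk -F'|' '{ n[$2]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage against the first CN from this article:
#   psql -h 10.215.147.158 -p 30004 -d postgres -U tbase -Atc \
#     "select node_name, node_type, node_host, node_port from pgxc_node" | count_by_type
```

For the topology shown above this would report two nodes of type C, two of type D, and one of type G.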
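  • Create the default group and the sharding before using the database

TBase uses datanode groups to make node management more flexible, and it requires a default group before the cluster can be used, so one must be created first. Usually all datanodes are added to the default group. In addition, to make data distribution more flexible, TBase adds an intermediate logical layer, called sharding, that maintains the mapping from data records to physical nodes. The sharding must also be created in advance. The commands are:

    postgres=# create default node group default_group with (dn001,dn002);
    CREATE NODE GROUP
    postgres=# create sharding group to group default_group;
    CREATE SHARDING GROUP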
  • Create databases, users, and tables, and run insert, delete, select, and update statements

At this point the cluster can be used just like a standalone database.

    postgres=# create database test;
    CREATE DATABASE
    postgres=# create user test with password 'test';
    CREATE ROLE
    postgres=# alter database test owner to test;
    ALTER DATABASE
    postgres=# \c test test
    You are now connected to database "test" as user "test".
    test=> create table foo(id bigint, str text) distribute by shard(id);
    CREATE TABLE
    test=> insert into foo values(1, 'tencent'), (2, 'shenzhen');
    COPY 2
    test=> select * from foo;
     id |   str
    ----+----------
      1 | tencent
      2 | shenzhen
    (2 rows)
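To see how the rows of a sharded table are spread over the datanodes, one option is to run a count on each datanode with EXECUTE DIRECT, the statement form that also appears in the init all output. The following sketch only generates the statements; the function name per_node_counts is hypothetical. In Postgres-XL, EXECUTE DIRECT is restricted to superusers, and it is assumed here that TBase behaves the same way, so the usage example connects as tbase.

```shell
# Sketch: emit one EXECUTE DIRECT statement per datanode to count its local rows.
per_node_counts() {
  # per_node_counts <table> <datanode>...
  table="$1"; shift
  for dn in "$@"; do
    printf "EXECUTE DIRECT ON (%s) 'select count(*) from %s';\n" "$dn" "$table"
  done
}

per_node_counts foo dn001 dn002
# prints:
#   EXECUTE DIRECT ON (dn001) 'select count(*) from foo';
#   EXECUTE DIRECT ON (dn002) 'select count(*) from foo';

# Usage:
#   per_node_counts foo dn001 dn002 | psql -h 10.215.147.158 -p 30004 -d test -U tbase
```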
  • Stop the cluster

Use the stop all command of pgxc_ctl to stop the cluster. stop all accepts -m fast or -m immediate to control how each node is shut down.

    PGXC stop all -m fast
    Stopping all the coordinator masters.
    Stopping coordinator master cn001.
    Stopping coordinator master cn002.
    Done.
    Stopping all the datanode slaves.
    Stopping datanode slave dn001.
    Stopping datanode slave dn002.
    pg_ctl: PID file "/data/tbase/data/dn002/postmaster.pid" does not exist
    Is server running?
    Stopping all the datanode masters.
    Stopping datanode master dn001.
    Stopping datanode master dn002.
    Done.
    Stop GTM slave
    waiting for server to shut down..... done
    server stopped
    Stop GTM master
    waiting for server to shut down.... done
    server stopped
    PGXC monitor all
    Not running: gtm master
    Not running: gtm slave
    Not running: coordinator master cn001
    Not running: coordinator master cn002
    Not running: datanode master dn001
    Not running: datanode slave dn001
    Not running: datanode master dn002
    Not running: datanode slave dn002
  • Start the cluster

Use the start all command of pgxc_ctl to start the cluster.

    [tbase@TENCENT64 ~]$ pgxc_ctl
    /usr/bin/bash
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Installing pgxc_ctl_bash script as /data/tbase/pgxc_ctl/pgxc_ctl_bash.
    Reading configuration using /data/tbase/pgxc_ctl/pgxc_ctl_bash --home /data/tbase/pgxc_ctl --configuration /data/tbase/pgxc_ctl/pgxc_ctl.conf
    Finished reading configuration.
    ******** PGXC_CTL START ***************

    Current directory: /data/tbase/pgxc_ctl
    PGXC start all
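The stop, start, and monitor commands can also be scripted. The sketch below assumes that pgxc_ctl accepts a command on its command line together with -c for the configuration file, as the Postgres-XL version of the tool does; this has not been verified against TBase here. The function names and the RUN dry-run switch are hypothetical, and by default the script only prints the commands it would run.

```shell
# Sketch: restart the cluster non-interactively.
PGXC_CONF="${PGXC_CONF:-/data/tbase/pgxc_ctl/pgxc_ctl.conf}"
RUN="${RUN-echo}"   # dry run by default; invoke with RUN= (empty) to execute

cluster() {
  # Pass the command through to pgxc_ctl, e.g. "cluster stop all -m fast".
  $RUN pgxc_ctl -c "$PGXC_CONF" "$@"
}

restart_cluster() {
  # Stop, start, then report the status; abort at the first failing step.
  cluster stop all -m fast && cluster start all && cluster monitor all
}

restart_cluster
```

Keeping the dry run as the default makes it possible to review the exact pgxc_ctl invocations before they touch a running cluster.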
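  • Conclusion

This document is only a brief guide that shows how to build a complete TBase cluster step by step, starting from the source code. Later articles will cover feature usage, tuning, troubleshooting, and other topics.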