PGSQL Standby 同步从库部署

同步从库 —— 与主库保持同步提交,零复制延迟的热备从库,启用额外的服务,以及法定人数同步提交。

CAP定理指出:可用性与一致性两者相互抵触,用户必须根据自己的需求进行权衡。 高可用是一方面,而另一面则是高一致,Pigsty允许您创建高一致性的集群,确保出现故障切换时数据不丢,乃至于整个集群保持实时同步一致。

正常情况下,PostgreSQL的复制延迟在几十KB/10ms的量级,对于常规业务而言可以近似忽略不计。重要的是,当主库出现故障时,尚未完成复制的数据会丢失!当您在处理非常关键与精密的业务查询时(例如和钱打交道),复制延迟可能会成为一个问题。此外,或者在主库写入后,立刻向从库查询刚才的写入(read-your-write),也会对复制延迟非常敏感。

为了解决此类问题,需要用到同步从库

同步从库

一种简单的配置同步从库的方式是使用 pg_conf = crit 模板,该模板会自动启用同步复制与校验和,适用于和钱有关的,追求一致性的场景。

  1. #----------------------------------#
  2. # pgsql cluster: pg-test (3 nodes) #
  3. #----------------------------------#
  4. # pg-test ---> 10.10.10.3 ---> 10.10.10.1{1,2,3}
  5. pg-test: # define the new 3-node cluster pg-test
  6. hosts:
  7. 10.10.10.11: { pg_seq: 1, pg_role: primary } # primary instance, leader of cluster
  8. 10.10.10.12: { pg_seq: 2, pg_role: replica } # replica instance, follower of leader
  9. 10.10.10.13: { pg_seq: 3, pg_role: replica, pg_offline_query: true } # replica with offline access
  10. vars:
  11. pg_conf: crit.yml # Crit 模板默认启用同步复制
  12. pg_cluster: pg-test # define pgsql cluster name
  13. pg_version: 14 # test postgresql 13 with pg-test cluster
  14. pg_users:
  15. - name: test
  16. password: test
  17. pgbouncer: true
  18. roles: [ dbrole_admin ]
  19. pg_databases: # create a database and user named 'test'
  20. - name: test
  21. vip_mode: l2 # enable/disable vip (require members in same LAN)
  22. vip_address: 10.10.10.3 # virtual ip address for this cluster
  23. vip_cidrmask: 8 # cidr network mask length
  24. vip_interface: eth1 # interface to add virtual ip

使用同步提交时,强烈建议集群至少有3个实例,否则唯一的从库故障将立即导致主库不可用。

启用同步

对于已有的数据库集群,您可以在集群创建完毕后,在元节点上执行 pg edit-config <cluster.name> , 编辑集群配置文件,修改参数synchronous_mode的值为true并应用即可。

  1. $ pg edit-config pg-test
  2. ---
  3. +++
  4. -synchronous_mode: false
  5. +synchronous_mode: true
  6. synchronous_mode_strict: false
  7. Apply these changes? [y/N]: y

使用 pg list pg-test 打印集群状态,如果出现了一个 Sync Standby,则说明已经成功为集群配置了同步从库。 在出现故障时,该同步从库将作为第一顺位候选继承人接管,确保没有数据损失。

  1. $ pg list pg-test
  2. + Cluster: pg-test (7105646627234977503) +---------+----+-----------+-----------------+
  3. | Member | Host | Role | State | TL | Lag in MB | Tags |
  4. +-----------+-------------+--------------+---------+----+-----------+-----------------+
  5. | pg-test-1 | 10.10.10.11 | Leader | running | 3 | | clonefrom: true |
  6. | pg-test-2 | 10.10.10.12 | Replica | running | 3 | 0 | clonefrom: true |
  7. | pg-test-3 | 10.10.10.13 | Sync Standby | running | 3 | 0 | clonefrom: true |
  8. +-----------+-------------+--------------+---------+----+-----------+-----------------+

在PG中启用同步提交,默认会有一个从库实例被选为同步从库,而其他的实例会仍然会使用异步提交模式,以降低事务延迟,提高性能。如果您需要在整个集群范围内获得更强的一致性,可以使用法定人数同步提交

法定人数同步提交

在默认情况下,同步复制会从所有候选从库 挑选一个实例,作为同步从库,任何主库事务只有当复制到从库并Flush至磁盘上时,方视作成功提交并返回。 如果我们期望更高的数据持久化保证,例如,在一个一主三从的四实例集群中,至少有两个从库成功刷盘后才确认提交,则可以使用法定人数提交。

使用法定人数提交时,需要修改 PostgreSQL 中 synchronous_standby_names 参数的值,并配套修改Patroni中 synchronous_node_count 的值。假设三个从库分别为 pg-test-2, pg-test-3, pg-test-4 ,那么应当配置:

  • synchronous_standby_names = ANY 2 (pg-test-2, pg-test-3, pg-test-4)
  • synchronous_node_count : 2
  1. pg-test:
  2. hosts:
  3. 10.10.10.10: { pg_seq: 1, pg_role: primary } # pg-test-1
  4. 10.10.10.11: { pg_seq: 2, pg_role: replica } # pg-test-2
  5. 10.10.10.12: { pg_seq: 3, pg_role: replica } # pg-test-3
  6. 10.10.10.13: { pg_seq: 4, pg_role: replica } # pg-test-4
  7. vars:
  8. pg_cluster: pg-test

执行pg edit-config pg-test,并修改配置如下:

  1. $ pg edit-config pg-test
  2. ---
  3. +++
  4. @@ -82,10 +82,12 @@
  5. work_mem: 4MB
  6. + synchronous_standby_names: 'ANY 2 (pg-test-2, pg-test-3, pg-test-4)'
  7. -synchronous_mode: false
  8. +synchronous_mode: true
  9. +synchronous_node_count: 2
  10. synchronous_mode_strict: false
  11. Apply these changes? [y/N]: y

应用后,即可看到配置生效,出现两个Sync Standby,当集群出现Failover或扩缩容时,请相应调整这些参数以免服务不可用。

  1. + Cluster: pg-test (7080814403632534854) +---------+----+-----------+-----------------+
  2. | Member | Host | Role | State | TL | Lag in MB | Tags |
  3. +-----------+-------------+--------------+---------+----+-----------+-----------------+
  4. | pg-test-1 | 10.10.10.10 | Leader | running | 1 | | clonefrom: true |
  5. | pg-test-2 | 10.10.10.11 | Sync Standby | running | 1 | 0 | clonefrom: true |
  6. | pg-test-3 | 10.10.10.12 | Sync Standby | running | 1 | 0 | clonefrom: true |
  7. | pg-test-4 | 10.10.10.13 | Replica | running | 1 | 0 | clonefrom: true |
  8. +-----------+-------------+--------------+---------+----+-----------+-----------------+

接入同步服务

对于启用同步提交的集群,您可以在参考配置文件,在集群中额外配置 standby 服务,提供与主库完全一致的无延迟读取服务。

  1. #----------------------------------#
  2. # pgsql cluster: pg-test (3 nodes) #
  3. #----------------------------------#
  4. # pg-test ---> 10.10.10.3 ---> 10.10.10.1{1,2,3}
  5. pg-test: # define the new 3-node cluster pg-test
  6. hosts:
  7. 10.10.10.11: { pg_seq: 1, pg_role: primary } # primary instance, leader of cluster
  8. 10.10.10.12: { pg_seq: 2, pg_role: replica } # replica instance, follower of leader
  9. 10.10.10.13: { pg_seq: 3, pg_role: replica, pg_offline_query: true } # replica with offline access
  10. vars:
  11. pg_cluster: pg-test # define pgsql cluster name
  12. pg_version: 14 # test postgresql 13 with pg-test cluster
  13. pg_users:
  14. - name: test
  15. password: test
  16. pgbouncer: true
  17. roles: [ dbrole_admin ]
  18. pg_databases: # create a database and user named 'test'
  19. - name: test
  20. vip_mode: l2 # enable/disable vip (require members in same LAN)
  21. vip_address: 10.10.10.3 # virtual ip address for this cluster
  22. vip_cidrmask: 8 # cidr network mask length
  23. vip_interface: eth1 # interface to add virtual ip
  24. pg_services_extra: # extra services in addition to pg_services_default, array of service definition
  25. # standby service will route {ip|name}:5435 to sync replica's pgbouncer (5435->6432 standby)
  26. - name: standby # required, service name, the actual svc name will be prefixed with `pg_cluster`, e.g: pg-meta-standby
  27. src_ip: "*" # required, service bind ip address, `*` for all ip, `vip` for cluster `vip_address`
  28. src_port: 5435 # required, service exposed port (work as kubernetes service node port mode)
  29. dst_port: pgbouncer # optional, destination port, postgres|pgbouncer|<port_number> , pgbouncer(6432) by default
  30. check_method: http # optional, health check method: http is the only available method for now
  31. check_port: patroni # optional, health check port: patroni|pg_exporter|<port_number> , patroni(8008) by default
  32. check_url: /sync # optional, health check url path, /read-only?lag=0 by default
  33. check_code: 200 # optional, health check expected http code, 200 by default
  34. selector: "[]" # required, JMESPath to filter inventory ()
  35. selector_backup: "[? pg_role == `primary`]" # primary used as backup server for standby service (will not work because /sync for )
  36. haproxy: # optional, adhoc parameters for haproxy service provider (vip_l4 is another service provider)
  37. maxconn: 3000 # optional, max allowed front-end connection
  38. balance: roundrobin # optional, haproxy load balance algorithm (roundrobin by default, other: leastconn)
  39. default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

新定义的服务,需要重新配置负载均衡器方可生效。 您可以检查HAProxy负载均衡器流量管控页面,看看哪些实例正在承载同步服务。

  1. bin/reloadha pg-test # 将新定义的standby服务加载到LB中

最后修改 2022-06-05: add pgsql/deploy document (34a3325)