背景

1、PG物理流复制的从库,当激活后,可以开启读写,使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后,主库的wal目录中可能还有没完全同步到从库的内容,因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库,使之成为新主库的只读从库。而不需要重建整个从库。

3、如果没有pg_rewind,遇到以上情况,需要完全重建从库。或者你可以使用存储层快照,回退回脑裂以前的状态。又或者可以使用文件系统快照,回退回脑裂以前的状态。

原理与修复步骤

1、使用pg_rewind功能的前提条件:必须开启full page write,必须开启wal hint或者data block checksum。

2、需要被修复的库:从激活点开始,所有的WAL必须存在pg_wal目录中。如果WAL已经被覆盖,只要有归档,拷贝到pg_wal目录即可。

3、新的主库,从激活点开始,产生的所有WAL必须存在pg_wal目录中,或者已归档,并且被修复的库可以使用restore_command访问到这部分WAL。

4、修改(source db)新主库或老主库配置,允许连接。

5、修复时,连接新主库,得到切换点。或连接老主库,同时比对当前要修复的新主库的TL与老主库进行比对,得到切换点。

6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库(或老主库)),进行回退操作(被修改或删除的BLOCK从source db获取并覆盖,新增的BLOCK,直接抹除。)回退到切换点的状态。

7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。

8、启动target db,连接source db接收WAL,或restore_command配置接收WAL,从切换点开始所有WAL,进行apply。

9、target db现在是source db的从库。

以EDB PG 11为例讲解

环境部署

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 迁移到 PG, PPAS (支持跨版本升级)》

  1. export PS1="$USER@`/bin/hostname -s`-> "
  2. export PGPORT=4000
  3. export PGDATA=/data04/ppas11/pg_root4000
  4. export LANG=en_US.utf8
  5. export PGHOME=/usr/edb/as11
  6. export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
  7. export DATE=`date +"%Y%m%d%H%M"`
  8. export PATH=$PGHOME/bin:$PATH:.
  9. export MANPATH=$PGHOME/share/man:$MANPATH
  10. export PGHOST=127.0.0.1
  11. export PGUSER=postgres
  12. export PGDATABASE=postgres
  13. alias rm='rm -i'
  14. alias ll='ls -lh'
  15. unalias vi

1、初始化数据库集群

  1. initdb -D /data04/ppas11/pg_root4000 -E UTF8 --lc-collate=C --lc-ctype=en_US.UTF8 -U postgres -k --redwood-like

2、配置recovery.done

  1. cd $PGDATA
  2. cp $PGHOME/share/recovery.conf.sample ./
  3. mv recovery.conf.sample recovery.done
  4. vi recovery.done
  5. restore_command = 'cp /data04/ppas11/wal/%f %p'
  6. recovery_target_timeline = 'latest'
  7. standby_mode = on
  8. primary_conninfo = 'host=localhost port=4000 user=postgres'

3、配置postgresql.conf

要使用rewind功能:

必须开启full_page_writes

必须开启data_checksums或wal_log_hints

  1. postgresql.conf
  2. listen_addresses = '0.0.0.0'
  3. port = 4000
  4. max_connections = 8000
  5. superuser_reserved_connections = 13
  6. unix_socket_directories = '.,/tmp'
  7. unix_socket_permissions = 0700
  8. tcp_keepalives_idle = 60
  9. tcp_keepalives_interval = 10
  10. tcp_keepalives_count = 10
  11. shared_buffers = 16GB
  12. max_prepared_transactions = 8000
  13. maintenance_work_mem = 1GB
  14. autovacuum_work_mem = 1GB
  15. dynamic_shared_memory_type = posix
  16. vacuum_cost_delay = 0
  17. bgwriter_delay = 10ms
  18. bgwriter_lru_maxpages = 1000
  19. bgwriter_lru_multiplier = 10.0
  20. effective_io_concurrency = 0
  21. max_worker_processes = 128
  22. max_parallel_maintenance_workers = 8
  23. max_parallel_workers_per_gather = 8
  24. max_parallel_workers = 24
  25. wal_level = replica
  26. synchronous_commit = off
  27. full_page_writes = on
  28. wal_compression = on
  29. wal_buffers = 32MB
  30. wal_writer_delay = 10ms
  31. checkpoint_timeout = 25min
  32. max_wal_size = 32GB
  33. min_wal_size = 8GB
  34. checkpoint_completion_target = 0.2
  35. archive_mode = on
  36. archive_command = 'cp -n %p /data04/ppas11/wal/%f'
  37. max_wal_senders = 16
  38. wal_keep_segments = 4096
  39. max_replication_slots = 16
  40. hot_standby = on
  41. max_standby_archive_delay = 300s
  42. max_standby_streaming_delay = 300s
  43. wal_receiver_status_interval = 1s
  44. wal_receiver_timeout = 10s
  45. random_page_cost = 1.1
  46. effective_cache_size = 400GB
  47. log_destination = 'csvlog'
  48. logging_collector = on
  49. log_directory = 'log'
  50. log_filename = 'edb-%a.log'
  51. log_truncate_on_rotation = on
  52. log_rotation_age = 1d
  53. log_rotation_size = 0
  54. log_min_duration_statement = 1s
  55. log_checkpoints = on
  56. log_error_verbosity = verbose
  57. log_line_prefix = '%t '
  58. log_lock_waits = on
  59. log_statement = 'ddl'
  60. log_timezone = 'PRC'
  61. autovacuum = on
  62. log_autovacuum_min_duration = 0
  63. autovacuum_max_workers = 6
  64. autovacuum_freeze_max_age = 1200000000
  65. autovacuum_multixact_freeze_max_age = 1400000000
  66. autovacuum_vacuum_cost_delay = 0
  67. statement_timeout = 0
  68. lock_timeout = 0
  69. idle_in_transaction_session_timeout = 0
  70. vacuum_freeze_table_age = 1150000000
  71. vacuum_multixact_freeze_table_age = 1150000000
  72. datestyle = 'redwood,show_time'
  73. timezone = 'PRC'
  74. lc_messages = 'en_US.utf8'
  75. lc_monetary = 'en_US.utf8'
  76. lc_numeric = 'en_US.utf8'
  77. lc_time = 'en_US.utf8'
  78. default_text_search_config = 'pg_catalog.english'
  79. shared_preload_libraries = 'auto_explain,pg_stat_statements,$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq'
  80. edb_redwood_date = on
  81. edb_redwood_greatest_least = on
  82. edb_redwood_strings = on
  83. db_dialect = 'redwood'
  84. edb_dynatune = 66
  85. edb_dynatune_profile = oltp
  86. timed_statistics = off

4、配置pg_hba.conf,允许流复制

  1. local all all trust
  2. host all all 127.0.0.1/32 trust
  3. host all all ::1/128 trust
  4. local replication all trust
  5. host replication all 127.0.0.1/32 trust
  6. host replication all ::1/128 trust
  7. host all all 0.0.0.0/0 md5

5、配置归档目录

  1. mkdir /data04/ppas11/wal
  2. chown enterprisedb:enterprisedb /data04/ppas11/wal

6、创建从库

  1. pg_basebackup -h 127.0.0.1 -p 4000 -D /data04/ppas11/pg_root4001 -F p -c fast

7、配置从库

  1. cd /data04/ppas11/pg_root4001
  2. mv recovery.done recovery.conf
  1. vi postgresql.conf
  2. port = 4001

8、启动从库

  1. pg_ctl start -D /data04/ppas11/pg_root4001

9、压测主库

  1. pgbench -i -s 1000
  2. pgbench -M prepared -v -r -P 1 -c 24 -j 24 -T 300

10、检查归档

  1. postgres=# select * from pg_stat_archiver ;
  2. archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset
  3. ----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------
  4. 240 | 0000000100000000000000F0 | 28-JAN-19 15:08:43.276965 +08:00 | 0 | | | 28-JAN-19 15:01:17.883338 +08:00
  5. (1 row)
  6. postgres=# select * from pg_stat_archiver ;
  7. archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset
  8. ----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------
  9. 248 | 0000000100000000000000F8 | 28-JAN-19 15:08:45.120134 +08:00 | 0 | | | 28-JAN-19 15:01:17.883338 +08:00
  10. (1 row)

11、检查从库延迟

  1. postgres=# select * from pg_stat_replication ;
  2. -[ RECORD 1 ]----+---------------------------------
  3. pid | 8124
  4. usesysid | 10
  5. usename | postgres
  6. application_name | walreceiver
  7. client_addr | 127.0.0.1
  8. client_hostname |
  9. client_port | 62988
  10. backend_start | 28-JAN-19 15:07:34.084542 +08:00
  11. backend_xmin |
  12. state | streaming
  13. sent_lsn | 1/88BC2000
  14. write_lsn | 1/88BC2000
  15. flush_lsn | 1/88BC2000
  16. replay_lsn | 1/88077D48
  17. write_lag | 00:00:00.001417
  18. flush_lag | 00:00:00.002221
  19. replay_lag | 00:00:00.097657
  20. sync_priority | 0
  21. sync_state | async

例子1,从库激活后产生读写,使用pg_rewind修复从库,回退到只读从库

1、激活从库

  1. pg_ctl promote -D /data04/ppas11/pg_root4001

2、写从库

  1. pgbench -M prepared -v -r -P 1 -c 4 -j 4 -T 120 -p 4001

此时从库已经和主库不在一个时间线,无法直接变成当前主库的从库

  1. enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i time
  2. Latest checkpoint's TimeLineID: 1
  3. Latest checkpoint's PrevTimeLineID: 1
  4. Time of latest checkpoint: Mon 28 Jan 2019 03:56:38 PM CST
  5. Min recovery ending loc's timeline: 2
  6. track_commit_timestamp setting: off
  7. Date/time type storage: 64-bit integers
  8. enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i time
  9. Latest checkpoint's TimeLineID: 1
  10. Latest checkpoint's PrevTimeLineID: 1
  11. Time of latest checkpoint: Mon 28 Jan 2019 05:11:38 PM CST
  12. Min recovery ending loc's timeline: 0
  13. track_commit_timestamp setting: off
  14. Date/time type storage: 64-bit integers

3、修复从库,使之继续成为当前主库的从库

4、查看切换点

  1. cd /data04/ppas11/pg_root4001
  2. ll pg_wal/*.history
  3. -rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 pg_wal/00000002.history
  4. cat pg_wal/00000002.history
  5. 1 6/48C62000 no recovery target specified

5、从库激活时间开始产生的WAL必须全部在pg_wal目录中。

  1. -rw------- 1 enterprisedb enterprisedb 42 Jan 28 17:15 00000002.history
  2. -rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:16 000000020000000600000048
  3. ............

000000020000000600000048开始,所有的wal必须存在从库pg_wal目录中。如果已经覆盖了,必须从归档目录拷贝到从库pg_wal目录中。

6、从库激活时,主库从这个时间点开始所有的WAL还在pg_wal目录,或者从库可以使用restore_command获得(recovery.conf)。

  1. recovery.conf
  2. restore_command = 'cp /data04/ppas11/wal/%f %p'

7、pg_rewind命令帮助

https://www.postgresql.org/docs/11/app-pgrewind.html

  1. pg_rewind --help
  2. pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.
  3. Usage:
  4. pg_rewind [OPTION]...
  5. Options:
  6. -D, --target-pgdata=DIRECTORY existing data directory to modify
  7. --source-pgdata=DIRECTORY source data directory to synchronize with
  8. --source-server=CONNSTR source server to synchronize with
  9. -n, --dry-run stop before modifying anything
  10. -P, --progress write progress messages
  11. --debug write a lot of debug messages
  12. -V, --version output version information, then exit
  13. -?, --help show this help, then exit
  14. Report bugs to <support@enterprisedb.com>.

8、停库(被修复的库,停库)

  1. pg_ctl stop -m fast -D /data04/ppas11/pg_root4001

9、尝试修复

  1. pg_rewind -n -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"
  2. servers diverged at WAL location 6/48C62000 on timeline 1
  3. rewinding from last common checkpoint at 5/5A8CD30 on timeline 1
  4. Done!

10、尝试正常,说明可以修复,实施修复

  1. pg_rewind -D /data04/ppas11/pg_root4001 --source-server="hostaddr=127.0.0.1 user=postgres port=4000"
  2. servers diverged at WAL location 6/48C62000 on timeline 1
  3. rewinding from last common checkpoint at 5/5A8CD30 on timeline 1
  4. Done!

11、已修复,改配置

  1. cd /data04/ppas11/pg_root4001
  2. vi postgresql.conf
  3. port = 4001
  1. mv recovery.done recovery.conf
  2. vi recovery.conf
  3. restore_command = 'cp /data04/ppas11/wal/%f %p'
  4. recovery_target_timeline = 'latest'
  5. standby_mode = on
  6. primary_conninfo = 'host=localhost port=4000 user=postgres'

12、删除归档中错误时间线上产生的文件否则会在启动修复后的从库后,走到00000002时间线上,这是不想看到的。

  1. mkdir /data04/ppas11/wal/error_tl_2
  2. mv /data04/ppas11/wal/00000002* /data04/ppas11/wal/error_tl_2

13、启动从库

  1. pg_ctl start -D /data04/ppas11/pg_root4001

14、建议对主库做一个检查点,从库收到检查点后,重启后不需要应用太多WAL,而是从新检查点开始恢复

  1. psql
  2. checkpoint;

15、压测主库

  1. pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000

16、查看归档状态

  1. postgres=# select * from pg_stat_archiver ;
  2. archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset
  3. ----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------
  4. 1756 | 0000000100000006000000DC | 28-JAN-19 17:41:57.562425 +08:00 | 0 | | | 28-JAN-19 15:01:17.883338 +08:00
  5. (1 row)

17、查看从库健康、延迟,观察修复后的情况

  1. postgres=# select * from pg_stat_replication ;
  2. -[ RECORD 1 ]----+--------------------------------
  3. pid | 13179
  4. usesysid | 10
  5. usename | postgres
  6. application_name | walreceiver
  7. client_addr | 127.0.0.1
  8. client_hostname |
  9. client_port | 63198
  10. backend_start | 28-JAN-19 17:47:29.85308 +08:00
  11. backend_xmin |
  12. state | catchup
  13. sent_lsn | 7/DDE80000
  14. write_lsn | 7/DC000000
  15. flush_lsn | 7/DC000000
  16. replay_lsn | 7/26A8DCB0
  17. write_lag | 00:00:18.373263
  18. flush_lag | 00:00:18.373263
  19. replay_lag | 00:00:18.373263
  20. sync_priority | 0
  21. sync_state | async

例子2,从库激活成为新主库后,老主库依旧有读写,使用pg_rewind修复老主库,将老主库降级为新主库的从库

1、激活从库

  1. pg_ctl promote -D /data04/ppas11/pg_root4001

2、写从库

  1. pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001

3、写主库

  1. pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4000

此时老主库已经和新的主库不在一个时间线

  1. enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4000|grep -i timeline
  2. Latest checkpoint's TimeLineID: 1
  3. Latest checkpoint's PrevTimeLineID: 1
  4. Min recovery ending loc's timeline: 0
  5. enterprisedb@pg11-test-> pg_controldata -D /data04/ppas11/pg_root4001|grep -i timeline
  6. Latest checkpoint's TimeLineID: 1
  7. Latest checkpoint's PrevTimeLineID: 1
  8. Min recovery ending loc's timeline: 2
  9. enterprisedb@pg11-test-> cd /data04/ppas11/pg_root4001/pg_wal
  10. enterprisedb@pg11-test-> cat 00000002.history
  11. 1 8/48DE2318 no recovery target specified
  12. enterprisedb@pg11-test-> ll *.partial
  13. -rw------- 1 enterprisedb enterprisedb 16M Jan 28 17:48 000000010000000800000048.partial

4、修复老主库,变成从库

4.1、从库激活时,老主库从这个时间点开始所有的WAL,必须全部在pg_wal目录中。

000000010000000800000048 开始的所有WAL必须存在pg_wal,如果已经覆盖了,必须从WAL归档拷贝到pg_wal目录

4.2、从库激活时间开始产生的所有WAL,老主库必须可以使用restore_command获得(recovery.conf)。

  1. recovery.conf
  2. restore_command = 'cp /data04/ppas11/wal/%f %p'

5、关闭老主库

  1. pg_ctl stop -m fast -D /data04/ppas11/pg_root4000

6、尝试修复老主库

  1. pg_rewind -n -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"
  2. servers diverged at WAL location 8/48DE2318 on timeline 1
  3. rewinding from last common checkpoint at 6/CCCEF770 on timeline 1
  4. Done!

7、尝试成功,可以修复,实施修复

  1. pg_rewind -D /data04/ppas11/pg_root4000 --source-server="hostaddr=127.0.0.1 user=postgres port=4001"

8、修复完成后,改配置

  1. cd /data04/ppas11/pg_root4000
  2. vi postgresql.conf
  3. port = 4000
  1. mv recovery.done recovery.conf
  2. vi recovery.conf
  3. restore_command = 'cp /data04/ppas11/wal/%f %p'
  4. recovery_target_timeline = 'latest'
  5. standby_mode = on
  6. primary_conninfo = 'host=localhost port=4001 user=postgres'

9、启动老主库

  1. pg_ctl start -D /data04/ppas11/pg_root4000

10、建议对新主库做一个检查点,从库收到检查点后,重启后不需要应用太多WAL,而是从新检查点开始恢复

  1. checkpoint;

11、压测新主库

  1. pgbench -M prepared -v -r -P 1 -c 16 -j 16 -T 200 -p 4001

12、查看归档状态

  1. psql -p 4001
  2. postgres=# select * from pg_stat_archiver ;
  3. archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time | stats_reset
  4. ----------------+--------------------------+----------------------------------+--------------+-----------------+------------------+----------------------------------
  5. 406 | 0000000200000009000000DB | 28-JAN-19 21:18:22.976118 +08:00 | 0 | | | 28-JAN-19 17:47:29.847488 +08:00
  6. (1 row)

13、查看从库健康、延迟

  1. psql -p 4001
  2. postgres=# select * from pg_stat_replication ;
  3. -[ RECORD 1 ]----+---------------------------------
  4. pid | 17675
  5. usesysid | 10
  6. usename | postgres
  7. application_name | walreceiver
  8. client_addr | 127.0.0.1
  9. client_hostname |
  10. client_port | 60530
  11. backend_start | 28-JAN-19 21:18:36.472197 +08:00
  12. backend_xmin |
  13. state | streaming
  14. sent_lsn | 9/E8361C18
  15. write_lsn | 9/E8361C18
  16. flush_lsn | 9/E8361C18
  17. replay_lsn | 9/D235B520
  18. write_lag | 00:00:00.000101
  19. flush_lag | 00:00:00.000184
  20. replay_lag | 00:00:03.028098
  21. sync_priority | 0
  22. sync_state | async

小结

1 适合场景

1、PG物理流复制的从库,当激活后,可以开启读写,使用pg_rewind可以将从库回退为只读从库的角色。而不需要重建整个从库。

2、当异步主从发生角色切换后,主库的wal目录中可能还有没完全同步到从库的内容,因此老的主库无法直接切换为新主库的从库。使用pg_rewind可以修复老的主库,使之成为新主库的只读从库。而不需要重建整个从库。

如果没有pg_rewind,遇到以上情况,需要完全重建从库,如果库占用空间很大,重建非常耗时,也非常耗费上游数据库的资源(读)。

2 前提

要使用rewind功能:

1、必须开启full_page_writes

2、必须开启data_checksums或wal_log_hints

  1. initdb -k 开启data_checksums

3 原理与修复流程

1、使用pg_rewind功能的前提条件:必须开启full page write,必须开启wal hint或者data block checksum。

2、需要被修复的库:从激活点开始,所有的WAL必须存在pg_wal目录中。如果WAL已经被覆盖,只要有归档,拷贝到pg_wal目录即可。

3、新的主库,从激活点开始,产生的所有WAL必须存在pg_wal目录中,或者已归档,并且被修复的库可以使用restore_command访问到这部分WAL。

4、修改(source db)新主库或老主库配置,允许连接。

5、修复时,连接新主库,得到切换点。或连接老主库,同时比对当前要修复的新主库的TL与老主库进行比对,得到切换点。

6、解析需要被修复的库的从切换点到现在所有的WAL。同时连接source db(新主库(或老主库)),进行回退操作(被修改或删除的BLOCK从source db获取并覆盖,新增的BLOCK,直接抹除。)回退到切换点的状态。

7、修改被修复库(target db)的recovery.conf, postgresql.conf配置。

8、启动target db,连接source db接收WAL,或restore_command配置接收WAL,从切换点开始所有WAL,进行apply。

9、target db现在是source db的从库。

参考

https://www.postgresql.org/docs/11/app-pgrewind.html

《PostgreSQL primary-standby failback tools : pg_rewind》

《PostgreSQL 9.5 new feature - pg_rewind fast sync Split Brain Primary & Standby》

《PostgreSQL 9.5 add pg_rewind for Fast align for PostgreSQL unaligned primary & standby》

《MTK使用 - PG,PPAS,oracle,mysql,ms sql,sybase 迁移到 PG, PPAS (支持跨版本升级)》

digoal’s 大量PostgreSQL文章入口