背景

PostgreSQL目前默认的存储引擎在更新记录时,会在堆内产生一条新版本,旧版本在不需要使用后VACUUM回收,回收旧版本前,需要先回收所有关联这个版本的所有索引POINT。

PG的索引的KEY为索引字段或表达式的值,VALUE为行号。

8.3以前,每个TUPLE版本(行号)都有对应的索引POINT,因此更新的放大比较大。

8.3开始,引入了HOT的概念,当更新记录时,如果能满足两个条件时,通过HEAP PAGE内部LINK来串起所有TUPLE版本,因此索引不变。

HOT必须满足如下两个条件:

  1. Necessary Condition A: UPDATE does not change any of the index keys
  2. Necessary Condition B: The new version should fit in the same old block HOT chains can not cross block boundary.
  3. 1、索引字段的值不变。(其中任意一个索引字段的值发生了变化,则所有索引都需要新增版本)
  4. 2、新的版本与旧的版本在同一个HEAP PAGE中。

10以后,可以使用二级索引来解决更新引入的索引放大问题:

《PostgreSQL 10.0 preview 性能增强 - 间接索引(secondary index)》

HOT 实例解说

1、创建测试表,写入10条测试数据

  1. postgres=# create table a(id int, c1 int, c2 int, c3 int);
  2. CREATE TABLE
  3. postgres=# insert into a select generate_series(1,10), random()*100, random()*100, random()*100;
  4. INSERT 0 10

2、创建索引

  1. postgres=# create index idx_a_1 on a (id);
  2. CREATE INDEX
  3. postgres=# create index idx_a_2 on a (c1);
  4. CREATE INDEX
  5. postgres=# create index idx_a_3 on a (c2);
  6. CREATE INDEX

3、通过pageinspect插件观察索引页内容

  1. postgres=# SELECT * FROM bt_metap('idx_a_1');
  2. magic | version | root | level | fastroot | fastlevel
  3. --------+---------+------+-------+----------+-----------
  4. 340322 | 2 | 1 | 0 | 1 | 0
  5. (1 row)
  6. postgres=# SELECT * FROM bt_metap('idx_a_2');
  7. magic | version | root | level | fastroot | fastlevel
  8. --------+---------+------+-------+----------+-----------
  9. 340322 | 2 | 1 | 0 | 1 | 0
  10. (1 row)
  11. postgres=# SELECT * FROM bt_metap('idx_a_3');
  12. magic | version | root | level | fastroot | fastlevel
  13. --------+---------+------+-------+----------+-----------
  14. 340322 | 2 | 1 | 0 | 1 | 0
  15. (1 row)
  1. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  2. itemoffset | ctid | itemlen | nulls | vars | data
  3. ------------+--------+---------+-------+------+-------------------------
  4. 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  5. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  6. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  7. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  8. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  9. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  10. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  11. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  12. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  13. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  14. (10 rows)
  15. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  16. itemoffset | ctid | itemlen | nulls | vars | data
  17. ------------+--------+---------+-------+------+-------------------------
  18. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  19. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  20. 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  21. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  22. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  23. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  24. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  25. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  26. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  27. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  28. (10 rows)
  29. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  30. itemoffset | ctid | itemlen | nulls | vars | data
  31. ------------+--------+---------+-------+------+-------------------------
  32. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  33. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  34. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  35. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  36. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  37. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  38. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  39. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  40. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  41. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  42. (10 rows)

4、HOT更新,(更新的字段上没有索引,并且新的版本记录在同一个HEAP PAGE上)

  1. postgres=# update a set c3=c3+1 where id=1;
  2. UPDATE 1
  3. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  4. itemoffset | ctid | itemlen | nulls | vars | data
  5. ------------+--------+---------+-------+------+-------------------------
  6. 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  7. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  8. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  9. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  10. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  11. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  12. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  13. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  14. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  15. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  16. (10 rows)
  17. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  18. itemoffset | ctid | itemlen | nulls | vars | data
  19. ------------+--------+---------+-------+------+-------------------------
  20. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  21. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  22. 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  23. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  24. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  25. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  26. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  27. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  28. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  29. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  30. (10 rows)
  31. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  32. itemoffset | ctid | itemlen | nulls | vars | data
  33. ------------+--------+---------+-------+------+-------------------------
  34. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  35. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  36. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  37. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  38. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  39. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  40. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  41. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  42. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  43. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  44. (10 rows)
  1. postgres=# update a set c3=c3 where id=1 returning ctid,*;
  2. ctid | id | c1 | c2 | c3
  3. --------+----+----+----+----
  4. (0,13) | 1 | 13 | 20 | 15
  5. (1 row)
  6. UPDATE 1
  7. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  8. itemoffset | ctid | itemlen | nulls | vars | data
  9. ------------+--------+---------+-------+------+-------------------------
  10. 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  11. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  12. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  13. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  14. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  15. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  16. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  17. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  18. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  19. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  20. (10 rows)
  21. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  22. itemoffset | ctid | itemlen | nulls | vars | data
  23. ------------+--------+---------+-------+------+-------------------------
  24. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  25. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  26. 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  27. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  28. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  29. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  30. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  31. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  32. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  33. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  34. (10 rows)
  35. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  36. itemoffset | ctid | itemlen | nulls | vars | data
  37. ------------+--------+---------+-------+------+-------------------------
  38. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  39. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  40. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  41. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  42. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  43. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  44. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  45. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  46. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  47. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  48. (10 rows)
  1. postgres=# update a set c3=c3 where id=1;
  2. UPDATE 1
  3. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  4. itemoffset | ctid | itemlen | nulls | vars | data
  5. ------------+--------+---------+-------+------+-------------------------
  6. 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  7. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  8. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  9. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  10. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  11. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  12. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  13. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  14. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  15. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  16. (10 rows)
  17. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  18. itemoffset | ctid | itemlen | nulls | vars | data
  19. ------------+--------+---------+-------+------+-------------------------
  20. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  21. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  22. 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  23. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  24. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  25. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  26. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  27. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  28. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  29. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  30. (10 rows)
  31. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  32. itemoffset | ctid | itemlen | nulls | vars | data
  33. ------------+--------+---------+-------+------+-------------------------
  34. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  35. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  36. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  37. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  38. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  39. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  40. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  41. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  42. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  43. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  44. (10 rows)

5、HOT更新,有索引的字段被更新,但是值不变。并且新版本在同一个PAGE里。

  1. postgres=# update a set c2=c2 where id=1 returning ctid,*;
  2. ctid | id | c1 | c2 | c3
  3. --------+----+----+----+----
  4. (0,14) | 1 | 13 | 20 | 15
  5. (1 row)
  6. UPDATE 1
  7. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  8. itemoffset | ctid | itemlen | nulls | vars | data
  9. ------------+--------+---------+-------+------+-------------------------
  10. 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  11. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  12. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  13. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  14. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  15. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  16. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  17. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  18. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  19. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  20. (10 rows)
  21. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  22. itemoffset | ctid | itemlen | nulls | vars | data
  23. ------------+--------+---------+-------+------+-------------------------
  24. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  25. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  26. 3 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  27. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  28. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  29. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  30. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  31. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  32. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  33. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  34. (10 rows)
  35. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  36. itemoffset | ctid | itemlen | nulls | vars | data
  37. ------------+--------+---------+-------+------+-------------------------
  38. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  39. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  40. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  41. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  42. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  43. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  44. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  45. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  46. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  47. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  48. (10 rows)

以上HOT更新,所有索引都没有发生变化。

6、NON-HOT更新,更新了索引字段的值,所有索引都发生了变化,至少发生了3个索引IO。

  1. postgres=# update a set c2=c2+1 where id=1 returning ctid,*;
  2. ctid | id | c1 | c2 | c3
  3. --------+----+----+----+----
  4. (0,15) | 1 | 13 | 21 | 15
  5. (1 row)
  6. UPDATE 1
  7. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  8. itemoffset | ctid | itemlen | nulls | vars | data
  9. ------------+--------+---------+-------+------+-------------------------
  10. 1 | (0,15) | 16 | f | f | 01 00 00 00 00 00 00 00
  11. 2 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
  12. 3 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  13. 4 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  14. 5 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  15. 6 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  16. 7 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  17. 8 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  18. 9 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  19. 10 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  20. 11 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  21. (11 rows)
  22. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  23. itemoffset | ctid | itemlen | nulls | vars | data
  24. ------------+--------+---------+-------+------+-------------------------
  25. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  26. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  27. 3 | (0,15) | 16 | f | f | 0d 00 00 00 00 00 00 00
  28. 4 | (0,1) | 16 | f | f | 0d 00 00 00 00 00 00 00
  29. 5 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  30. 6 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  31. 7 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  32. 8 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  33. 9 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  34. 10 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  35. 11 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  36. (11 rows)
  37. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  38. itemoffset | ctid | itemlen | nulls | vars | data
  39. ------------+--------+---------+-------+------+-------------------------
  40. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  41. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  42. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  43. 4 | (0,1) | 16 | f | f | 14 00 00 00 00 00 00 00
  44. 5 | (0,15) | 16 | f | f | 15 00 00 00 00 00 00 00
  45. 6 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  46. 7 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  47. 8 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  48. 9 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  49. 10 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  50. 11 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  51. (11 rows)

7、垃圾回收,首先回收索引垃圾版本,最后回收表的垃圾版本。

  1. postgres=# vacuum verbose a;
  2. INFO: vacuuming "public.a"
  3. INFO: scanned index "idx_a_1" to remove 1 row versions
  4. DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
  5. INFO: scanned index "idx_a_2" to remove 1 row versions
  6. DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
  7. INFO: scanned index "idx_a_3" to remove 1 row versions
  8. DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
  9. INFO: "a": removed 1 row versions in 1 pages
  10. DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
  11. INFO: index "idx_a_1" now contains 10 row versions in 2 pages
  12. DETAIL: 1 index row versions were removed.
  13. 0 index pages have been deleted, 0 are currently reusable.
  14. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
  15. INFO: index "idx_a_2" now contains 10 row versions in 2 pages
  16. DETAIL: 1 index row versions were removed.
  17. 0 index pages have been deleted, 0 are currently reusable.
  18. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
  19. INFO: index "idx_a_3" now contains 10 row versions in 2 pages
  20. DETAIL: 1 index row versions were removed.
  21. 0 index pages have been deleted, 0 are currently reusable.
  22. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
  23. INFO: "a": found 5 removable, 10 nonremovable row versions in 1 out of 1 pages
  24. DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 716311280
  25. There were 4 unused item pointers.
  26. Skipped 0 pages due to buffer pins, 0 frozen pages.
  27. 0 pages are entirely empty.
  28. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
  29. VACUUM

8、垃圾回收后,索引的垃圾版本被清除。

  1. postgres=# SELECT * FROM bt_page_items('idx_a_1',1);
  2. itemoffset | ctid | itemlen | nulls | vars | data
  3. ------------+--------+---------+-------+------+-------------------------
  4. 1 | (0,15) | 16 | f | f | 01 00 00 00 00 00 00 00
  5. 2 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
  6. 3 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
  7. 4 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
  8. 5 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  9. 6 | (0,6) | 16 | f | f | 06 00 00 00 00 00 00 00
  10. 7 | (0,7) | 16 | f | f | 07 00 00 00 00 00 00 00
  11. 8 | (0,8) | 16 | f | f | 08 00 00 00 00 00 00 00
  12. 9 | (0,9) | 16 | f | f | 09 00 00 00 00 00 00 00
  13. 10 | (0,10) | 16 | f | f | 0a 00 00 00 00 00 00 00
  14. (10 rows)
  15. postgres=# SELECT * FROM bt_page_items('idx_a_2',1);
  16. itemoffset | ctid | itemlen | nulls | vars | data
  17. ------------+--------+---------+-------+------+-------------------------
  18. 1 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
  19. 2 | (0,7) | 16 | f | f | 06 00 00 00 00 00 00 00
  20. 3 | (0,15) | 16 | f | f | 0d 00 00 00 00 00 00 00
  21. 4 | (0,9) | 16 | f | f | 0e 00 00 00 00 00 00 00
  22. 5 | (0,4) | 16 | f | f | 20 00 00 00 00 00 00 00
  23. 6 | (0,3) | 16 | f | f | 24 00 00 00 00 00 00 00
  24. 7 | (0,6) | 16 | f | f | 28 00 00 00 00 00 00 00
  25. 8 | (0,10) | 16 | f | f | 2a 00 00 00 00 00 00 00
  26. 9 | (0,2) | 16 | f | f | 3f 00 00 00 00 00 00 00
  27. 10 | (0,8) | 16 | f | f | 55 00 00 00 00 00 00 00
  28. (10 rows)
  29. postgres=# SELECT * FROM bt_page_items('idx_a_3',1);
  30. itemoffset | ctid | itemlen | nulls | vars | data
  31. ------------+--------+---------+-------+------+-------------------------
  32. 1 | (0,6) | 16 | f | f | 09 00 00 00 00 00 00 00
  33. 2 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
  34. 3 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
  35. 4 | (0,15) | 16 | f | f | 15 00 00 00 00 00 00 00
  36. 5 | (0,4) | 16 | f | f | 22 00 00 00 00 00 00 00
  37. 6 | (0,10) | 16 | f | f | 2b 00 00 00 00 00 00 00
  38. 7 | (0,3) | 16 | f | f | 30 00 00 00 00 00 00 00
  39. 8 | (0,9) | 16 | f | f | 33 00 00 00 00 00 00 00
  40. 9 | (0,7) | 16 | f | f | 46 00 00 00 00 00 00 00
  41. 10 | (0,2) | 16 | f | f | 4b 00 00 00 00 00 00 00
  42. (10 rows)

性能测试

1、创建测试表32个索引。

  1. do language plpgsql $$
  2. declare
  3. sql text;
  4. begin
  5. sql := 'create table a (id int primary key,';
  6. for i in 1..32 loop
  7. sql := sql||'c'||i||' int default random()*1000,';
  8. end loop;
  9. sql := rtrim(sql,',');
  10. sql := sql||') with (fillfactor=80)';
  11. execute sql;
  12. for i in 2..32 loop
  13. execute 'create index idx_a_c'||i||' on a (c'||i||')';
  14. end loop;
  15. end;
  16. $$;

2、写入1000万记录

  1. insert into a (id) select generate_series(1,10000000);

3、non-hot更新

  1. vi test_non_hot.sql
  2. \set id random(1,10000000)
  3. update a set c2=c2+random()*100-100 where id=:id;

4、HOT更新

  1. vi test_hot1.sql
  2. \set id random(1,10000000)
  3. update a set c1=c1+random()*100-100 where id=:id;
  4. vi test_hot2.sql
  5. \set id random(1,10000000)
  6. update a set c2=c2 where id=:id;

5、性能对比

5.1、HOT

  1. pgbench -M prepared -n -r -P 1 -f ./test_hot1.sql -c 28 -j 28 -T 120
  2. transaction type: ./test_hot1.sql
  3. scaling factor: 1
  4. query mode: prepared
  5. number of clients: 28
  6. number of threads: 28
  7. duration: 120 s
  8. number of transactions actually processed: 9139010
  9. latency average = 0.368 ms
  10. latency stddev = 0.187 ms
  11. tps = 76157.798606 (including connections establishing)
  12. tps = 76174.469712 (excluding connections establishing)
  13. script statistics:
  14. - statement latencies in milliseconds:
  15. 0.001 \set id random(1,10000000)
  16. 0.366 update a set c1=c1+random()*100-100 where id=:id;
  17. Total DISK READ : 0.00 B/s | Total DISK WRITE : 13.14 M/s
  18. Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 13.82 M/s
  19. TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
  20. 45828 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 1.06 % postgres: postgres postgres 127.0.0.1(41326) UPDATE
  21. 45810 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.99 % postgres: postgres postgres 127.0.0.1(41290) UPDATE
  22. 45821 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.98 % postgres: postgres postgres 127.0.0.1(41312) idle
  23. 45820 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41310) UPDATE
  24. 45822 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41314) UPDATE
  25. 45819 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.94 % postgres: postgres postgres 127.0.0.1(41308) UPDATE
  26. 45806 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.92 % postgres: postgres postgres 127.0.0.1(41282) idle
  27. 45824 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.90 % postgres: postgres postgres 127.0.0.1(41318) UPDATE
  28. 45827 be/4 postgres 0.00 B/s 54.58 K/s 0.00 % 0.89 % postgres: postgres postgres 127.0.0.1(41324) UPDATE
  29. 45814 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.89 % postgres: postgres postgres 127.0.0.1(41298) UPDATE
  30. 45818 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.88 % postgres: postgres postgres 127.0.0.1(41306) idle
  31. 45823 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.87 % postgres: postgres postgres 127.0.0.1(41316) UPDATE
  32. 45805 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.87 % postgres: postgres postgres 127.0.0.1(41280) UPDATE
  33. 45826 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.86 % postgres: postgres postgres 127.0.0.1(41322) UPDATE
  34. 45809 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.86 % postgres: postgres postgres 127.0.0.1(41288) UPDATE
  35. 45808 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.85 % postgres: postgres postgres 127.0.0.1(41286) UPDATE
  36. 45825 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.85 % postgres: postgres postgres 127.0.0.1(41320) UPDATE
  37. 45804 be/4 postgres 0.00 B/s 23.39 K/s 0.00 % 0.84 % postgres: postgres postgres 127.0.0.1(41278) UPDATE
  38. 49040 be/4 postgres 0.00 B/s 12.85 M/s 0.00 % 0.84 % postgres: wal writer process
  39. 45816 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.66 % postgres: postgres postgres 127.0.0.1(41302) idle
  40. 45829 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.64 % postgres: postgres postgres 127.0.0.1(41328) UPDATE
  41. 45803 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.62 % postgres: postgres postgres 127.0.0.1(41276) BIND
  42. 45795 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.59 % postgres: postgres postgres 127.0.0.1(41274) UPDATE
  43. 45807 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.58 % postgres: postgres postgres 127.0.0.1(41284) UPDATE
  44. 45812 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.56 % postgres: postgres postgres 127.0.0.1(41294) UPDATE
  45. 45811 be/4 postgres 0.00 B/s 15.59 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41292) UPDATE
  46. 45817 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41304) UPDATE
  47. 45815 be/4 postgres 0.00 B/s 7.80 K/s 0.00 % 0.55 % postgres: postgres postgres 127.0.0.1(41300) UPDATE
  48. 45813 be/4 postgres 0.00 B/s 0.00 B/s 0.00 % 0.51 % postgres: postgres postgres 127.0.0.1(41296) UPDATE

5.2、NON-HOT

  1. pgbench -M prepared -n -r -P 1 -f ./test_non_hot.sql -c 28 -j 28 -T 120
  2. transaction type: ./test_non_hot.sql
  3. scaling factor: 1
  4. query mode: prepared
  5. number of clients: 28
  6. number of threads: 28
  7. duration: 120 s
  8. number of transactions actually processed: 6472445
  9. latency average = 0.519 ms
  10. latency stddev = 0.707 ms
  11. tps = 53922.273197 (including connections establishing)
  12. tps = 53933.908671 (excluding connections establishing)
  13. script statistics:
  14. - statement latencies in milliseconds:
  15. 0.002 \set id random(1,10000000)
  16. 0.517 update a set c2=c2+random()*100-100 where id=:id;
  17. Total DISK READ : 0.00 B/s | Total DISK WRITE : 191.66 M/s
  18. Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 142.11 M/s
  19. TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
  20. 49040 be/4 postgres 0.00 B/s 136.56 M/s 0.00 % 7.17 % postgres: wal writer process
  21. 45997 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.60 % postgres: postgres postgres 127.0.0.1(41384) BIND
  22. 45983 be/4 postgres 0.00 B/s 1903.25 K/s 0.00 % 1.56 % postgres: postgres postgres 127.0.0.1(41356) UPDATE
  23. 45977 be/4 postgres 0.00 B/s 1829.75 K/s 0.00 % 1.54 % postgres: postgres postgres 127.0.0.1(41344) UPDATE
  24. 45984 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.53 % postgres: postgres postgres 127.0.0.1(41358) UPDATE
  25. 45985 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.50 % postgres: postgres postgres 127.0.0.1(41360) UPDATE
  26. 45986 be/4 postgres 0.00 B/s 1748.52 K/s 0.00 % 1.49 % postgres: postgres postgres 127.0.0.1(41362) UPDATE
  27. 45995 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.47 % postgres: postgres postgres 127.0.0.1(41380) UPDATE
  28. 45988 be/4 postgres 0.00 B/s 1910.99 K/s 0.00 % 1.46 % postgres: postgres postgres 127.0.0.1(41366) UPDATE
  29. 45979 be/4 postgres 0.00 B/s 1763.99 K/s 0.00 % 1.46 % postgres: postgres postgres 127.0.0.1(41348) UPDATE
  30. 45976 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.45 % postgres: postgres postgres 127.0.0.1(41342) UPDATE
  31. 45982 be/4 postgres 0.00 B/s 1887.78 K/s 0.00 % 1.43 % postgres: postgres postgres 127.0.0.1(41354) UPDATE
  32. 45987 be/4 postgres 0.00 B/s 2019.31 K/s 0.00 % 1.42 % postgres: postgres postgres 127.0.0.1(41364) UPDATE
  33. 45992 be/4 postgres 0.00 B/s 1616.99 K/s 0.00 % 1.40 % postgres: postgres postgres 127.0.0.1(41374) UPDATE
  34. 45994 be/4 postgres 0.00 B/s 1941.94 K/s 0.00 % 1.38 % postgres: postgres postgres 127.0.0.1(41378) UPDATE
  35. 45990 be/4 postgres 0.00 B/s 1794.94 K/s 0.00 % 1.36 % postgres: postgres postgres 127.0.0.1(41370) UPDATE
  36. 45975 be/4 postgres 0.00 B/s 1934.20 K/s 0.00 % 1.35 % postgres: postgres postgres 127.0.0.1(41340) UPDATE
  37. 45974 be/4 postgres 0.00 B/s 1910.99 K/s 0.00 % 1.35 % postgres: postgres postgres 127.0.0.1(41338) UPDATE
  38. 45980 be/4 postgres 0.00 B/s 2003.83 K/s 0.00 % 1.28 % postgres: postgres postgres 127.0.0.1(41350) UPDATE
  39. 45991 be/4 postgres 0.00 B/s 1748.52 K/s 0.00 % 1.13 % postgres: postgres postgres 127.0.0.1(41372) UPDATE
  40. 45996 be/4 postgres 0.00 B/s 2003.83 K/s 0.00 % 1.04 % postgres: postgres postgres 127.0.0.1(41382) UPDATE
  41. 45993 be/4 postgres 0.00 B/s 1918.73 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41376) UPDATE
  42. 45972 be/4 postgres 0.00 B/s 1903.25 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41334) UPDATE
  43. 45978 be/4 postgres 0.00 B/s 1740.78 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41346) UPDATE
  44. 45998 be/4 postgres 0.00 B/s 1841.36 K/s 0.00 % 1.02 % postgres: postgres postgres 127.0.0.1(41386) UPDATE
  45. 45981 be/4 postgres 0.00 B/s 1818.15 K/s 0.00 % 1.01 % postgres: postgres postgres 127.0.0.1(41352) UPDATE
  46. 45989 be/4 postgres 0.00 B/s 1895.52 K/s 0.00 % 1.01 % postgres: postgres postgres 127.0.0.1(41368) UPDATE
  47. 45973 be/4 postgres 0.00 B/s 1941.94 K/s 0.00 % 0.99 % postgres: postgres postgres 127.0.0.1(41336) idle
  48. 45961 be/4 postgres 0.00 B/s 1872.31 K/s 0.00 % 0.96 % postgres: postgres postgres 127.0.0.1(41332) UPDATE
  49. 49039 be/4 postgres 0.00 B/s 4.37 M/s 0.00 % 0.00 % postgres: writer process
  50. 49036 be/4 postgres 0.00 B/s 3.87 K/s 0.00 % 0.00 % postgres: logger process

使用HOT技术,使得TPS从 53922 提升到了 76157 。IO资源消耗从 192MB/s 降低到了 14MB/s 。

小结

HOT、二级索引、zheap存储引擎,都可以解决更新引入的索引放大问题。

本文介绍了HOT,HOT必须满足如下两个条件:

  1. Necessary Condition A: UPDATE does not change any of the index keys
  2. Necessary Condition B: The new version should fit in the same old block HOT chains can not cross block boundary.
  3. 1、索引字段的值不变。(其中任意一个索引字段的值发生了变化,则所有索引都需要新增版本)
  4. 2、新的版本与旧的版本在同一个HEAP PAGE中。

参考

《PostgreSQL 11 preview - Surjective indexes - 索引HOT增强(表达式)update评估》

HOT Inside - The Technical Architecture

《PostgreSQL 10.0 preview 性能增强 - 间接索引(secondary index)》

https://github.com/postgres/postgres/blob/master/src/backend/access/heap/README.HOT

digoal’s 大量PostgreSQL文章入口