DM 分库分表合并场景

本文介绍如何在分库分表合并场景中使用 Data Migration (DM)。使用场景中,三个上游 MySQL 实例的分库和分表数据需要同步至下游 TiDB 集群。

上游实例

假设上游库结构如下:

  • 实例 1

    | Schema | Tables | |:———|:———| | user | information, log_north, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

  • 实例 2

    | Schema | Tables | |:———|:———| | user | information, log_east, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

  • 实例 3

    | Schema | Tables | |:———|:———| | user | information, log_south, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

同步需求

  1. 合并三个实例中的 user.information 表至下游 TiDB 中的 user.information 表。
  2. 合并三个实例中的 user.log_{north|south|east} 表至下游TiDB中的 user.log_{north|south|east} 表。
  3. 合并三个实例中的 store_{01|02}.sale_{01|02} 表至下游TiDB中的 store.sale 表。
  4. 过滤掉三个实例的 user.log_{north|south|east} 表的所有删除操作。
  5. 过滤掉三个实例的 user.information 表的所有删除操作。
  6. 过滤掉三个实例的 store_{01|02}.sale_{01|02} 表的所有删除操作。
  7. 过滤掉三个实例的 user.log_bak 表。
  8. 因为 store_{01|02}.sale_{01|02} 表带有 bigint 型的自增主键,将其合并至 TiDB 时会引发冲突。您需要有方案修改相应自增主键以避免冲突。

下游实例

假设同步后下游库结构如下:

Schema Tables
user information, log_north, log_east, log_south
store sale

同步方案

  • 要满足同步需求 #1 和 #2,配置 Table routing 规则 如下:

    1. routes:
    2. ...
    3. user-route-rule:
    4. schema-pattern: "user"
    5. target-schema: "user"
  • 要满足同步需求 #3,配置 table routing 规则 如下:

    1. routes:
    2. ...
    3. store-route-rule:
    4. schema-pattern: "store_*"
    5. target-schema: "store"
    6. sale-route-rule:
    7. schema-pattern: "store_*"
    8. table-pattern: "sale_*"
    9. target-schema: "store"
    10. target-table: "sale"
  • 要满足同步需求 #4 和 #5,配置 Binlog event filter 规则 如下:

    1. filters:
    2. ...
    3. user-filter-rule:
    4. schema-pattern: "user"
    5. events: ["truncate table", "drop table", "delete", "drop database"]
    6. action: Ignore

    注意:

    同步需求 #4、#5 和 #7 的操作意味着过滤掉所有对 user 库的删除操作,所以此处配置了库级别的过滤规则。但是 user 库以后加入表的删除操作也都会被过滤。

  • 要满足同步需求 #6,配置 Binlog event filter 规则 如下:

    1. filters:
    2. ...
    3. sale-filter-rule:
    4. schema-pattern: "store_*"
    5. table-pattern: "sale_*"
    6. events: ["truncate table", "drop table", "delete"]
    7. action: Ignore
    8. store-filter-rule:
    9. schema-pattern: "store_*"
    10. events: ["drop database"]
    11. action: Ignore
  • 要满足同步需求 #7,配置 Black & white table lists 如下:

    1. black-white-list:
    2. log-bak-ignored:
    3. ignore-tables:
    4. - db-name: "user"
    5. tbl-name: "log_bak"
  • 要满足同步需求 #8,配置 column mapping 规则 如下:

    1. column-mappings:
    2. instance-1-sale:
    3. schema-pattern: "store_*"
    4. table-pattern: "sale_*"
    5. expression: "partition id"
    6. source-column: "id"
    7. target-column: "id"
    8. arguments: ["1", "store", "sale", "_"]
    9. instance-2-sale:
    10. schema-pattern: "store_*"
    11. table-pattern: "sale_*"
    12. expression: "partition id"
    13. source-column: "id"
    14. target-column: "id"
    15. arguments: ["2", "store", "sale", "_"]
    16. instance-3-sale:
    17. schema-pattern: "store_*"
    18. table-pattern: "sale_*"
    19. expression: "partition id"
    20. source-column: "id"
    21. target-column: "id"
    22. arguments: ["3", "store", "sale", "_"]

同步任务配置

同步任务的完整配置如下。详情请参阅 Data Migration 任务配置文件

  1. name: "shard_merge"
  2. task-mode: all
  3. meta-schema: "dm_meta"
  4. remove-meta: false
  5. target-database:
  6. host: "192.168.0.1"
  7. port: 4000
  8. user: "root"
  9. password: ""
  10. mysql-instances:
  11. -
  12. source-id: "instance-1"
  13. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
  14. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
  15. column-mapping-rules: ["instance-1-sale"]
  16. black-white-list: "log-bak-ignored"
  17. mydumper-config-name: "global"
  18. loader-config-name: "global"
  19. syncer-config-name: "global"
  20. -
  21. source-id: "instance-2"
  22. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
  23. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
  24. column-mapping-rules: ["instance-2-sale"]
  25. black-white-list: "log-bak-ignored"
  26. mydumper-config-name: "global"
  27. loader-config-name: "global"
  28. syncer-config-name: "global"
  29. -
  30. source-id: "instance-3"
  31. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
  32. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
  33. column-mapping-rules: ["instance-3-sale"]
  34. black-white-list: "log-bak-ignored"
  35. mydumper-config-name: "global"
  36. loader-config-name: "global"
  37. syncer-config-name: "global"
  38. # 所有实例共享的其他通用配置
  39. routes:
  40. user-route-rule:
  41. schema-pattern: "user"
  42. target-schema: "user"
  43. store-route-rule:
  44. schema-pattern: "store_*"
  45. target-schema: "store"
  46. sale-route-rule:
  47. schema-pattern: "store_*"
  48. table-pattern: "sale_*"
  49. target-schema: "store"
  50. target-table: "sale"
  51. filters:
  52. user-filter-rule:
  53. schema-pattern: "user"
  54. events: ["truncate table", "drop table", "delete", "drop database"]
  55. action: Ignore
  56. sale-filter-rule:
  57. schema-pattern: "store_*"
  58. table-pattern: "sale_*"
  59. events: ["truncate table", "drop table", "delete"]
  60. action: Ignore
  61. store-filter-rule:
  62. schema-pattern: "store_*"
  63. events: ["drop database"]
  64. action: Ignore
  65. black-white-list:
  66. log-bak-ignored:
  67. ignore-tables:
  68. - db-name: "user"
  69. tbl-name: "log_bak"
  70. column-mappings:
  71. instance-1-sale:
  72. schema-pattern: "store_*"
  73. table-pattern: "sale_*"
  74. expression: "partition id"
  75. source-column: "id"
  76. target-column: "id"
  77. arguments: ["1", "store", "sale", "_"]
  78. instance-2-sale:
  79. schema-pattern: "store_*"
  80. table-pattern: "sale_*"
  81. expression: "partition id"
  82. source-column: "id"
  83. target-column: "id"
  84. arguments: ["2", "store", "sale", "_"]
  85. instance-3-sale:
  86. schema-pattern: "store_*"
  87. table-pattern: "sale_*"
  88. expression: "partition id"
  89. source-column: "id"
  90. target-column: "id"
  91. arguments: ["3", "store", "sale", "_"]
  92. mydumpers:
  93. global:
  94. threads: 4
  95. chunk-filesize: 64
  96. skip-tz-utc: true
  97. loaders:
  98. global:
  99. pool-size: 16
  100. dir: "./dumped_data"
  101. syncers:
  102. global:
  103. worker-count: 16
  104. batch: 100
  105. max-retry: 100