Data Check in the Sharding Scenario

sync-diff-inspector supports data check in the sharding scenario. Assume that you use the TiDB Data Migration tool to replicate data from multiple MySQL instances into TiDB, you can use sync-diff-inspector to check upstream and downstream data.

For scenarios where the number of upstream sharded tables is small and the naming rules of sharded tables do not have a pattern as shown below, you can use Datasource config to configure table-0, set corresponding rules and configure the tables that have the mapping relationship between the upstream and downstream databases. This configuration method requires setting all sharded tables.

shard-table-replica-1

Below is a complete example of the sync-diff-inspector configuration.

  1. # Diff Configuration.
  2. ######################### Global config #########################
  3. # The number of goroutines created to check data. The number of connections between upstream and downstream databases are slightly greater than this value
  4. check-thread-count = 4
  5. # If enabled, SQL statements is exported to fix inconsistent tables
  6. export-fix-sql = true
  7. # Only compares the table structure instead of the data
  8. check-struct-only = false
  9. ######################### Datasource config #########################
  10. [data-sources.mysql1]
  11. host = "127.0.0.1"
  12. port = 3306
  13. user = "root"
  14. password = ""
  15. route-rules = ["rule1"]
  16. [data-sources.mysql2]
  17. host = "127.0.0.1"
  18. port = 3306
  19. user = "root"
  20. password = ""
  21. route-rules = ["rule2"]
  22. [data-sources.tidb0]
  23. host = "127.0.0.1"
  24. port = 4000
  25. user = "root"
  26. password = ""
  27. ########################### Routes ###########################
  28. [routes.rule1]
  29. schema-pattern = "test" # Matches the schema name of the data source. Supports the wildcards "*" and "?"
  30. table-pattern = "table-[1-2]" # Matches the table name of the data source. Supports the wildcards "*" and "?"
  31. target-schema = "test" # The name of the schema in the target database
  32. target-table = "table-0" # The name of the target table
  33. [routes.rule2]
  34. schema-pattern = "test" # Matches the schema name of the data source. Supports the wildcards "*" and "?"
  35. table-pattern = "table-3" # Matches the table name of the data source. Supports the wildcards "*" and "?"
  36. target-schema = "test" # The name of the schema in the target database
  37. target-table = "table-0" # The name of the target table
  38. ######################### Task config #########################
  39. [task]
  40. output-dir = "./output"
  41. source-instances = ["mysql1", "mysql2"]
  42. target-instance = "tidb0"
  43. # The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'
  44. target-check-tables = ["test.table-0"]

You can use table-rules for configuration when there are a large number of upstream sharded tables and the naming rules of all sharded tables have a pattern, as shown below:

shard-table-replica-2

Below is a complete example of the sync-diff-inspector configuration.

  1. # Diff Configuration.
  2. ######################### Global config #########################
  3. # The number of goroutines created to check data. The number of connections between upstream and downstream databases are slightly greater than this value.
  4. check-thread-count = 4
  5. # If enabled, SQL statements is exported to fix inconsistent tables.
  6. export-fix-sql = true
  7. # Only compares the table structure instead of the data.
  8. check-struct-only = false
  9. ######################### Datasource config #########################
  10. [data-sources.mysql1]
  11. host = "127.0.0.1"
  12. port = 3306
  13. user = "root"
  14. password = ""
  15. [data-sources.mysql2]
  16. host = "127.0.0.1"
  17. port = 3306
  18. user = "root"
  19. password = ""
  20. [data-sources.tidb0]
  21. host = "127.0.0.1"
  22. port = 4000
  23. user = "root"
  24. password = ""
  25. ########################### Routes ###########################
  26. [routes.rule1]
  27. schema-pattern = "test" # Matches the schema name of the data source. Supports the wildcards "*" and "?"
  28. table-pattern = "table-*" # Matches the table name of the data source. Supports the wildcards "*" and "?"
  29. target-schema = "test" # The name of the schema in the target database
  30. target-table = "table-0" # The name of the target table
  31. ######################### Task config #########################
  32. [task]
  33. output-dir = "./output"
  34. source-instances = ["mysql1", "mysql2"]
  35. target-instance = "tidb0"
  36. # The tables of downstream databases to be compared. Each table needs to contain the schema name and the table name, separated by '.'
  37. target-check-tables = ["test.table-0"]

Note

If test.table-0 exists in the upstream database, the downstream database also compares this table.