Migrate Data from TiDB to MySQL-compatible Databases
This document describes how to migrate data from TiDB clusters to MySQL-compatible databases, such as Aurora, MySQL, and MariaDB. The whole process contains four steps:
- Set up the environment.
- Migrate full data.
- Migrate incremental data.
- Migrate services to the MySQL-compatible cluster.
Step 1. Set up the environment
Deploy a TiDB cluster upstream.
Deploy a TiDB cluster by using TiUP Playground. For more information, refer to Deploy and Maintain an Online TiDB Cluster Using TiUP.
# Create a TiDB cluster
tiup playground --db 1 --pd 1 --kv 1 --tiflash 0 --ticdc 1
# View cluster status
tiup status
Deploy a MySQL instance downstream.
In a lab environment, you can use Docker to quickly deploy a MySQL instance by running the following command:
docker run --name some-mysql -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -d mysql
In a production environment, you can deploy a MySQL instance by following instructions in Installing MySQL.
Simulate service workload.
In the lab environment, you can use
go-tpc
to write data to the TiDB cluster upstream. This is to generate event changes in the TiDB cluster. Run the following command to create a database namedtpcc
in the TiDB cluster, and then use TiUP bench to write data to this database.tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 prepare
tiup bench tpcc -H 127.0.0.1 -P 4000 -D tpcc --warehouses 4 run --time 300s
For more details about
go-tpc
, refer to How to Run TPC-C Test on TiDB.
Step 2. Migrate full data
After setting up the environment, you can use Dumpling to export the full data from the upstream TiDB cluster.
Note
In production clusters, performing a backup with GC disabled might affect cluster performance. It is recommended that you complete this step in off-peak hours.
Disable Garbage Collection (GC).
To ensure that newly written data is not deleted during incremental migration, you should disable GC for the upstream cluster before exporting full data. In this way, history data is not deleted. For TiDB v4.0.0 and later versions, Dumpling might automatically extend the GC time. Nevertheless, manually disabling GC is still necessary because the GC process might begin after Dumpling exits, leading to the failure of incremental changes migration.
Run the following command to disable GC:
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;
Query OK, 0 rows affected (0.01 sec)
To verify that the change takes effect, query the value of
tidb_gc_enable
:MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+:
| @@global.tidb_gc_enable |
+-------------------------+
| 0 |
+-------------------------+
1 row in set (0.00 sec)
Back up data.
Export data in SQL format using Dumpling:
tiup dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o ./dumpling_output -r 200000 -F256MiB
After finishing exporting data, run the following command to check the metadata.
Pos
in the metadata is the TSO of the export snapshot and can be recorded as the BackupTS.cat dumpling_output/metadata
Started dump at: 2022-06-28 17:49:54
SHOW MASTER STATUS:
Log: tidb-binlog
Pos: 434217889191428107
GTID:
Finished dump at: 2022-06-28 17:49:57
Restore data.
Use MyLoader (an open-source tool) to import data to the downstream MySQL instance. For details about how to install and use MyLoader, see MyDumpler/MyLoader. Note that you need to use MyLoader v0.10 or earlier versions. Higher versions cannot process metadata files exported by Dumpling.
Run the following command to import full data exported by Dumpling to MySQL:
myloader -h 127.0.0.1 -P 3306 -d ./dumpling_output/
(Optional) Validate data.
You can use sync-diff-inspector to check data consistency between upstream and downstream at a certain time.
sync_diff_inspector -C ./config.yaml
For details about how to configure the sync-diff-inspector, see Configuration file description. In this document, the configuration is as follows:
# Diff Configuration.
######################### Datasource config #########################
[data-sources]
[data-sources.upstream]
host = "127.0.0.1" # Replace the value with the IP address of your upstream cluster
port = 4000
user = "root"
password = ""
snapshot = "434217889191428107" # Set snapshot to the actual backup time (BackupTS in the "Back up data" section in [Step 2. Migrate full data](#step-2-migrate-full-data))
[data-sources.downstream]
host = "127.0.0.1" # Replace the value with the IP address of your downstream cluster
port = 3306
user = "root"
password = ""
######################### Task config #########################
[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["*.*"]
Step 3. Migrate incremental data
Deploy TiCDC.
After finishing full data migration, deploy and configure a TiCDC cluster to replicate incremental data. In production environments, deploy TiCDC as instructed in Deploy TiCDC. In this document, a TiCDC node has been started upon the creation of the test cluster. Therefore, you can skip the step of deploying TiCDC and proceed with the next step to create a changefeed.
Create a changefeed.
In the upstream cluster, run the following command to create a changefeed from the upstream to the downstream clusters:
tiup cdc:v<CLUSTER_VERSION> cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="mysql://root:@127.0.0.1:3306" --changefeed-id="upstream-to-downstream" --start-ts="434217889191428107"
In this command, the parameters are as follows:
--server
: IP address of any node in the TiCDC cluster--sink-uri
: URI of the downstream cluster--changefeed-id
: changefeed ID, must be in the format of a regular expression,^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$
--start-ts
: start timestamp of the changefeed, must be the backup time (or BackupTS in the “Back up data” section in Step 2. Migrate full data)
For more information about the changefeed configurations, see Task configuration file.
Enable GC.
In incremental migration using TiCDC, GC only removes history data that is replicated. Therefore, after creating a changefeed, you need to run the following command to enable GC. For details, see What is the complete behavior of TiCDC garbage collection (GC) safepoint.
To enable GC, run the following command:
MySQL [test]> SET GLOBAL tidb_gc_enable=TRUE;
Query OK, 0 rows affected (0.01 sec)
To verify that the change takes effect, query the value of
tidb_gc_enable
:MySQL [test]> SELECT @@global.tidb_gc_enable;
+-------------------------+
| @@global.tidb_gc_enable |
+-------------------------+
| 1 |
+-------------------------+
1 row in set (0.00 sec)
Step 4. Migrate services
After creating a changefeed, data written to the upstream cluster is replicated to the downstream cluster with low latency. You can migrate read traffic to the downstream cluster gradually. Observe the read traffic for a period. If the downstream cluster is stable, you can migrate write traffic to the downstream cluster as well in the following steps:
Stop write services in the upstream cluster. Make sure that all upstream data are replicated to downstream before stopping the changefeed.
# Stop the changefeed from the upstream cluster to the downstream cluster
tiup cdc cli changefeed pause -c "upstream-to-downstream" --pd=http://172.16.6.122:2379
# View the changefeed status
tiup cdc cli changefeed list
[
{
"id": "upstream-to-downstream",
"summary": {
"state": "stopped", # Ensure that the status is stopped
"tso": 434218657561968641,
"checkpoint": "2022-06-28 18:38:45.685", # This time should be later than the time of stopping writing
"error": null
}
}
]
After migrating writing services to the downstream cluster, observe for a period. If the downstream cluster is stable, you can discard the upstream cluster.