Exporting data from Vitess

Since VTGate supports the MySQL protocol, in many cases it is possible to use existing client utilities when connecting to Vitess. This includes using logical dump tools such as mysqldump, in certain cases.

This guide provides instructions on the required options when using these tools against a VTGate server for the purposes of exporting data from Vitess. It is recommended to follow the Backup and Restore guide for regular backups, since this method is performed directly on the tablet servers and is more efficient and safer for databases of any significant size. The dump methods that follow are typically not suitable for production backups, because Vitess does not implement all the locking constructs across a sharded database that are necessary to do a consistent logical backup while writing to the database. As a result, you will only be guaranteed to get a 100% consistent dump using these tools if you are sure that you are not writing to the database while running the dump.

mysqldump

The default invocation of mysqldump attempts to execute statements which are not supported by Vitess, such as attempting to lock tables and dump GTID coordinates. The following options are required when using the mysqldump binary from MySQL 5.7 to export data from the commerce keyspace:

  • --lock-tables=off: VTGate currently prohibits the syntax LOCK TABLES and UNLOCK TABLES.
  • --set-gtid-purged=OFF: mysqldump attemps to dump GTID coordinates of a server, but in the case of VTGate this does not make sense since it could be routing to multiple servers.
  • --no-tablespaces: This option disables dumping InnoDB tables by tablespace. This functionality is not yet supported by Vitess.
  • ....: Additional mysqldump options like: -u <user>, -p <password>, -h <database server hostname>.

For example to export the commerce keyspace using the mysqldump binary from MySQL 5.7:

  1. $ mysqldump --set-gtid-purged=OFF --no-tablespaces .... commerce > commerce.sql

Vitess’ support for LOCK and UNLOCK statements is currently syntax-only. As a result, Vitess will simply ignore LOCK and UNLOCK statements without taking any underlying action. It is therefore unsafe to perform a locking mysqldump against a database that is actively being written to, and you should pause writes completely while performing the dump; or be willing to deal with any data inconsistencies that result.

NOTE: You will be limited by the Vitess row limits in the size of the tables that you can dump using this method. The default Vitess row limit is determined by the vttablet option -queryserver-config-max-result-size and defaults to 10000 rows. So for an unsharded database, you will not be able to dump tables with more than 10000 rows, or N x 10000 rows if the table is fully sharded across N shards. Note that you should not blindly raise your row limits just because of this, it is an important Vitess operability and reliability feature. If you have large tables to dump, look into using go-mydumper instead.

To restore dump files created by mysqldump, replay it against a Vitess server or other MySQL server using the mysql command line client.

go-mydumper

Alternatively, you can use a slight modification of the go-mydumper tool to export logical dumps of a Vitess keyspace. go-mydumper has the advantage of being multi-threaded, and so can run faster on a database that has many tables. For a database with just one or a handful of large tables, go-mydumper may not be that much faster than mysqldump.

For information on the Vitess-compatible fork of go-mydumper, see https://github.com/aquarapid/go-mydumper . Examples and instructions are available in the README.md in that repo. You will need to be able to compile golang binaries to use this tool.

go-mydumper creates multiple files for each backup. To restore a backup, you can use the mysql commandline client, but using the myloader tool as described in the go-mydumper repo above is easier and can be faster, since the loader is also multithreaded.