Table Filter

By default, the TiDB migration tools operate on all databases, but often only a subset is needed. For example, you might want to work only with schemas matching foo* and bar* and nothing else.

Since TiDB 4.0, all TiDB migration tools share a common filter syntax to define subsets. This document describes how to use the table filter feature.

Usage

CLI

Table filters can be applied to the tools using one or more -f or --filter command-line parameters. Each filter has the form db.table, where each part can be a wildcard (explained in the Wildcards section below). The following are example usages:

  • BR:

    tiup br backup full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'
    tiup br restore full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'

  • Dumpling:

    tiup dumpling -f 'foo*.*' -f 'bar*.*' -P 3306 -o /tmp/data/

  • TiDB Lightning:

    tiup tidb-lightning -f 'foo*.*' -f 'bar*.*' -d /tmp/data/ --backend tidb

TOML configuration files

In TOML configuration files, table filters are specified as arrays of strings. The following are example usages:

  • TiDB Lightning:

    [mydumper]
    filter = ['foo*.*', 'bar*.*']

  • TiCDC:

    [filter]
    rules = ['foo*.*', 'bar*.*']

    [[sink.dispatchers]]
    matcher = ['db1.*', 'db2.*', 'db3.*']
    dispatcher = 'ts'

Syntax

Plain table names

Each table filter rule consists of a “schema pattern” and a “table pattern”, separated by a dot (.). Tables whose fully-qualified name matches the rules are accepted.

    db1.tbl1
    db2.tbl2
    db3.tbl3

A plain name must consist only of valid identifier characters, which are:

  • digits (0 to 9)
  • letters (a to z, A to Z)
  • $
  • _
  • non-ASCII characters (U+0080 to U+10FFFF)

All other ASCII characters are reserved. Some punctuation characters have special meanings, as described in the following sections.
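
The character classes above can be written as a small Go predicate. This is an illustrative sketch only; `isIdentChar` and `isPlainName` are hypothetical helpers, not functions exported by any TiDB tool.

```go
package main

import "fmt"

// isIdentChar reports whether r may appear unescaped in a plain name:
// ASCII digits, ASCII letters, '$', '_', or any non-ASCII code point
// (U+0080 to U+10FFFF). Every other ASCII character is reserved.
func isIdentChar(r rune) bool {
	switch {
	case r >= '0' && r <= '9':
		return true
	case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z':
		return true
	case r == '$' || r == '_':
		return true
	case r >= 0x80 && r <= 0x10FFFF:
		return true
	}
	return false
}

// isPlainName reports whether every code point of s is an identifier character.
func isPlainName(s string) bool {
	for _, r := range s {
		if !isIdentChar(r) {
			return false
		}
	}
	return true
}

func main() {
	// tbl1, $tmp_2 and café are plain names; db.tbl and my-table are not.
	for _, name := range []string{"tbl1", "$tmp_2", "café", "db.tbl", "my-table"} {
		fmt.Printf("%-8s plain=%v\n", name, isPlainName(name))
	}
}
```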

Wildcards

Each part of the name can contain the wildcard symbols described in fnmatch(3):

  • * — matches zero or more characters
  • ? — matches one character
  • [a-z] — matches one character between “a” and “z” inclusively
  • [!a-z] — matches one character except “a” to “z”.

    db[0-9].tbl[0-9a-f][0-9a-f]
    data.*
    *.backup_*

“Character” here means a Unicode code point, such as:

  • U+00E9 (é) is 1 character.
  • U+0065 U+0301 (é) are 2 characters.
  • U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters.
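
To make the wildcard semantics concrete, the following Go sketch translates each part of a rule into an anchored regular expression and matches it against a schema or table name. It is a simplified model under stated assumptions, not the parser the TiDB tools use: `globToRegexp` and `matchRule` are hypothetical names, escapes and quoted identifiers are not handled, and a `]` placed first inside brackets is not supported. Because Go's regexp engine works on Unicode code points, `?` matches exactly one code point, which reproduces the "é" examples above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexp translates one fnmatch-style part (*, ?, [a-z], [!a-z])
// into an anchored Go regular expression. Simplifications: no escapes,
// no quoted identifiers, and no ']' as the first character in brackets.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString(`(?s)^`)
	runes := []rune(glob)
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; r {
		case '*':
			b.WriteString(`.*`) // zero or more code points
		case '?':
			b.WriteString(`.`) // exactly one code point
		case '[':
			end := i + 1
			for end < len(runes) && runes[end] != ']' {
				end++
			}
			if end == len(runes) {
				return nil, fmt.Errorf("unterminated [ in %q", glob)
			}
			class := string(runes[i+1 : end])
			if strings.HasPrefix(class, "!") {
				class = "^" + class[1:] // fnmatch [!..] is regexp [^..]
			}
			b.WriteString("[" + class + "]")
			i = end
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString(`$`)
	return regexp.Compile(b.String())
}

// matchRule splits a "schema.table" rule at its first dot and matches both
// halves. It assumes neither half contains an escaped or quoted dot.
func matchRule(rule, schema, table string) (bool, error) {
	schemaGlob, tableGlob, ok := strings.Cut(rule, ".")
	if !ok {
		return false, fmt.Errorf("rule %q has no dot", rule)
	}
	s, err := globToRegexp(schemaGlob)
	if err != nil {
		return false, err
	}
	t, err := globToRegexp(tableGlob)
	if err != nil {
		return false, err
	}
	return s.MatchString(schema) && t.MatchString(table), nil
}

func main() {
	ok, _ := matchRule("db[0-9].tbl[0-9a-f][0-9a-f]", "db3", "tbl0f")
	fmt.Println(ok) // true
	// U+00E9 is one code point, so "?" matches it; U+0065 U+0301 is two.
	one, _ := matchRule("x.?", "x", "\u00e9")
	two, _ := matchRule("x.?", "x", "e\u0301")
	fmt.Println(one, two) // true false
}
```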

File import

To import filter rules from a file, put an @ at the beginning of the rule, followed by the file name. The table filter parser treats each line of the imported file as an additional filter rule.

For example, if a file config/filter.txt has the following content:

    employees.*
    *.WorkOrder

the following two invocations are equivalent:

    tiup dumpling -f '@config/filter.txt'
    tiup dumpling -f 'employees.*' -f '*.WorkOrder'

A filter file cannot further import another file.

Comments and blank lines

Inside a filter file, leading and trailing whitespace is trimmed from every line, and blank lines (empty strings) are ignored.

A leading # marks a comment, and the line is ignored. A # that is not at the start of a line is considered a syntax error.

    # this line is a comment
    db.table # but this part is not a comment and may cause an error
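
The file-import and comment rules can be modeled by a small line-oriented parser. In this Go sketch, `parseFilterFile` is a hypothetical helper that receives the file content as a string. It also rejects a line starting with @, mirroring the rule that a filter file cannot import another file. For simplicity it ignores the possibility of an escaped or quoted #, which a real parser would have to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFilterFile turns the content of a filter file into rules: lines are
// trimmed, blank lines and lines starting with '#' are skipped, a '#'
// anywhere else is an error, and a nested '@' import is an error.
// Escaped or quoted '#' characters are not considered in this sketch.
func parseFilterFile(content string) ([]string, error) {
	var rules []string
	for n, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "", strings.HasPrefix(line, "#"):
			continue
		case strings.Contains(line, "#"):
			return nil, fmt.Errorf("line %d: '#' is only allowed at the start of a line", n+1)
		case strings.HasPrefix(line, "@"):
			return nil, fmt.Errorf("line %d: a filter file cannot import another file", n+1)
		}
		rules = append(rules, line)
	}
	return rules, nil
}

func main() {
	rules, err := parseFilterFile("employees.*\n\n  # a comment\n*.WorkOrder  \n")
	fmt.Println(rules, err) // [employees.* *.WorkOrder] <nil>
}
```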

Exclusion

An ! at the beginning of the rule means the pattern after it is used to exclude tables from being processed. This effectively turns the filter into a block list.

    *.*
    #^ note: must add the *.* to include all tables first
    !*.Password
    !employees.salaries

Escape character

To turn a special character into an identifier character, precede it with a backslash (\).

    db\.with\.dots.*

For simplicity and future compatibility, the following sequences are prohibited:

  • \ at the end of the line after trimming whitespace (use [ ] to match a literal space at the end).
  • \ followed by any ASCII alphanumeric character ([0-9a-zA-Z]). In particular, C-like escape sequences such as \0, \r, \n and \t are currently meaningless.
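
A minimal Go sketch of the escape rules, assuming the input is a single rule line: `unescape` is a hypothetical helper that trims the line, rejects the two prohibited sequences listed above, and otherwise replaces each escaped character with the character itself. It does not interpret wildcards or quotes, so its output is only the literal text the escapes denote.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isASCIIAlnum reports whether r is in [0-9a-zA-Z].
func isASCIIAlnum(r rune) bool {
	return r >= '0' && r <= '9' || r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z'
}

// unescape resolves backslash escapes in one rule line after trimming it,
// enforcing the two prohibited sequences. Wildcards and quotes are not
// interpreted in this sketch.
func unescape(line string) (string, error) {
	runes := []rune(strings.TrimSpace(line))
	var b strings.Builder
	for i := 0; i < len(runes); i++ {
		if runes[i] != '\\' {
			b.WriteRune(runes[i])
			continue
		}
		if i+1 == len(runes) {
			return "", errors.New(`"\" at the end of a line is prohibited`)
		}
		next := runes[i+1]
		if isASCIIAlnum(next) {
			return "", fmt.Errorf(`"\%c" is prohibited`, next)
		}
		b.WriteRune(next) // the escaped character is taken literally
		i++
	}
	return b.String(), nil
}

func main() {
	for _, s := range []string{`db\.with\.dots`, `tbl\`, `a\nb`} {
		out, err := unescape(s)
		fmt.Printf("%-16q -> %q, err=%v\n", s, out, err)
	}
}
```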

Quoted identifier

Besides \, special characters can also be suppressed by quoting with " or `.

    "db.with.dots"."tbl\1"
    `db.with.dots`.`tbl\2`

A quotation mark can be included within an identifier by doubling it.

    "foo""bar".`foo``bar`
    # equivalent to:
    foo\"bar.foo\`bar

Quoted identifiers cannot span multiple lines.

It is invalid to partially quote an identifier:

    "this is "invalid*.*
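
The doubling rule for quotation marks can be sketched in Go as follows. `unquote` is a hypothetical helper that assumes the rule has already been split into its schema and table parts, and it assumes, as the `tbl\1` example suggests, that a backslash inside quotes is an ordinary character. It accepts a part only when it starts and ends with the same quote character, which is also why a partially quoted part such as the one above is rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unquote parses one schema or table part wrapped in " or `, where a
// doubled quote character stands for one literal quote character.
// Text after the closing quote (partial quoting) is rejected.
func unquote(part string) (string, error) {
	if len(part) < 2 || (part[0] != '"' && part[0] != '`') {
		return "", fmt.Errorf("%s is not a quoted identifier", part)
	}
	q := part[0]
	var b strings.Builder
	// Byte-wise iteration is safe: both quote characters are ASCII and
	// never occur inside a multi-byte UTF-8 sequence.
	for i := 1; i < len(part); i++ {
		if part[i] != q {
			b.WriteByte(part[i])
			continue
		}
		if i+1 < len(part) && part[i+1] == q {
			b.WriteByte(q) // doubled quote: one literal quote
			i++
			continue
		}
		if i != len(part)-1 {
			return "", fmt.Errorf("unexpected text after closing quote in %s", part)
		}
		return b.String(), nil
	}
	return "", errors.New("missing closing quote")
}

func main() {
	for _, p := range []string{`"foo""bar"`, "`foo``bar`", `"tbl\1"`, `"this is "invalid*`} {
		s, err := unquote(p)
		fmt.Printf("%s -> %q, err=%v\n", p, s, err)
	}
}
```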

Regular expression

If very complex rules are needed, each pattern can be written as a regular expression delimited with /:

    /^db\d{2,}$/./^tbl\d{2,}$/

These regular expressions use the Go dialect. The pattern is matched if the identifier contains a substring matching the regular expression. For instance, /b/ matches db01.

Note

Every / in the regular expression must be escaped as \/, including inside […]. You cannot place an unescaped / between \Q…\E.
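
Since regular-expression parts use the Go dialect, their behavior can be checked with Go's standard `regexp` package. In this sketch, `partMatches` is a hypothetical helper that receives only the text between the slashes. It shows the substring semantics, the effect of `^` and `$` anchors, and that an escaped `\/` matches a literal slash.

```go
package main

import (
	"fmt"
	"regexp"
)

// partMatches compiles a Go regular expression (the text between the
// slashes of a /.../ pattern) and reports whether it matches anywhere in ident.
func partMatches(pattern, ident string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(ident), nil
}

func main() {
	for _, c := range []struct{ pattern, ident string }{
		{`b`, "db01"},            // substring match: true
		{`^db\d{2,}$`, "db01"},   // anchored, two digits: true
		{`^db\d{2,}$`, "mydb01"}, // anchored, extra prefix: false
		{`a\/b`, "a/b"},          // escaped slash: true
	} {
		ok, err := partMatches(c.pattern, c.ident)
		fmt.Println(c.pattern, c.ident, ok, err)
	}
}
```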

Multiple rules

Note

This section is not applicable to TiDB Cloud. Currently, TiDB Cloud only supports one table filter rule.

When a table name matches none of the rules in the filter list, the default behavior is to ignore such unmatched tables.

To build a block list, an explicit *.* must be used as the first rule; otherwise, all tables will be excluded.

    # every table will be filtered out
    tiup dumpling -f '!*.Password'

    # only the "Password" table is filtered out, the rest are included.
    tiup dumpling -f '*.*' -f '!*.Password'

In a filter list, if a table name matches multiple patterns, the last match decides the outcome. For instance:

    # rule 1
    employees.*
    # rule 2
    !*.dep*
    # rule 3
    *.departments

The filtered outcome is as follows:

    Table name              Rule 1   Rule 2   Rule 3   Outcome
    irrelevant.table                                   Default (reject)
    employees.employees     ✓                          Rule 1 (accept)
    employees.dept_emp      ✓        ✓                 Rule 2 (reject)
    employees.departments   ✓        ✓        ✓        Rule 3 (accept)
    else.departments                 ✓        ✓        Rule 3 (accept)
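
The last-match-wins evaluation can be modeled in a few lines of Go. This is a sketch, not the TiDB implementation: `accepted` and `parseRule` are hypothetical helpers, and for brevity the code uses the standard library's `path.Match`, whose glob syntax is close to, but not identical to, the filter syntax (for example, it negates character classes with `[^...]` rather than `[!...]`, and it knows nothing about quoting, regular expressions, or file imports). Every table starts out rejected, which is why a block list needs `*.*` as its first rule.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type rule struct {
	exclude    bool   // true for rules written with a leading '!'
	schemaGlob string // part before the first '.'
	tableGlob  string // part after the first '.'
}

func parseRule(s string) rule {
	var r rule
	if strings.HasPrefix(s, "!") {
		r.exclude, s = true, s[1:]
	}
	r.schemaGlob, r.tableGlob, _ = strings.Cut(s, ".")
	return r
}

// accepted applies the rules in order; the last matching rule decides.
func accepted(rules []string, schema, table string) bool {
	result := false // tables matching no rule are ignored by default
	for _, s := range rules {
		r := parseRule(s)
		// path.Match only fails on a malformed pattern; treat that as no match.
		schemaOK, _ := path.Match(r.schemaGlob, schema)
		tableOK, _ := path.Match(r.tableGlob, table)
		if schemaOK && tableOK {
			result = !r.exclude // a later match overrides an earlier one
		}
	}
	return result
}

func main() {
	rules := []string{"employees.*", "!*.dep*", "*.departments"}
	// Prints false, true, false, true, true for the five table names below.
	for _, name := range []string{"irrelevant.table", "employees.employees",
		"employees.dept_emp", "employees.departments", "else.departments"} {
		schema, table, _ := strings.Cut(name, ".")
		fmt.Println(name, accepted(rules, schema, table))
	}
}
```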

Note

In TiDB tools, the system schemas are always excluded in the default configuration. The system schemas are:

  • INFORMATION_SCHEMA
  • PERFORMANCE_SCHEMA
  • METRICS_SCHEMA
  • INSPECTION_SCHEMA
  • mysql
  • sys