Set Operations

TiDB supports three set operations using the UNION, EXCEPT, and INTERSECT operators. The smallest unit of a set is a SELECT statement.

UNION operator

In mathematics, the union of two sets A and B consists of all elements that are in A or in B. For example:

  1. SELECT 1 UNION SELECT 2;
  2. +---+
  3. | 1 |
  4. +---+
  5. | 2 |
  6. | 1 |
  7. +---+
  8. 2 rows in set (0.00 sec)

TiDB supports both UNION DISTINCT and UNION ALL operators. UNION DISTINCT removes duplicate records from the result set, while UNION ALL keeps all records including duplicates. UNION DISTINCT is used by default in TiDB.

  1. CREATE TABLE t1 (a int);
  2. CREATE TABLE t2 (a int);
  3. INSERT INTO t1 VALUES (1),(2);
  4. INSERT INTO t2 VALUES (1),(3);

Examples for UNION DISTINCT and UNION ALL queries are respectively as follows:

  1. SELECT * FROM t1 UNION DISTINCT SELECT * FROM t2;
  2. +---+
  3. | a |
  4. +---+
  5. | 1 |
  6. | 2 |
  7. | 3 |
  8. +---+
  9. 3 rows in set (0.00 sec)
  10. SELECT * FROM t1 UNION ALL SELECT * FROM t2;
  11. +---+
  12. | a |
  13. +---+
  14. | 1 |
  15. | 2 |
  16. | 1 |
  17. | 3 |
  18. +---+
  19. 4 rows in set (0.00 sec)

EXCEPT operator

If A and B are two sets, EXCEPT returns the difference set of A and B which consists of elements that are in A but not in B.

  1. SELECT * FROM t1 EXCEPT SELECT * FROM t2;
  2. +---+
  3. | a |
  4. +---+
  5. | 2 |
  6. +---+
  7. 1 rows in set (0.00 sec)

EXCEPT ALL operator is not yet supported.

INTERSECT operator

In mathematics, the intersection of two sets A and B consists of all elements that are both in A and B, and no other elements.

  1. SELECT * FROM t1 INTERSECT SELECT * FROM t2;
  2. +---+
  3. | a |
  4. +---+
  5. | 1 |
  6. +---+
  7. 1 rows in set (0.00 sec)

INTERSECT ALL operator is not yet supported. INTERSECT operator has higher precedence over EXCEPT and UNION operators.

  1. SELECT * FROM t1 UNION ALL SELECT * FROM t1 INTERSECT SELECT * FROM t2;
  2. +---+
  3. | a |
  4. +---+
  5. | 1 |
  6. | 1 |
  7. | 2 |
  8. +---+
  9. 3 rows in set (0.00 sec)

Parentheses

TiDB supports using parentheses to specify the precedence of set operations. Expressions in parentheses are processed first.

  1. (SELECT * FROM t1 UNION ALL SELECT * FROM t1) INTERSECT SELECT * FROM t2;
  2. +---+
  3. | a |
  4. +---+
  5. | 1 |
  6. +---+
  7. 1 rows in set (0.00 sec)

Use ORDER BY and LIMIT

TiDB supports using ORDER BY or LIMIT clause in set operations. These two clauses must be at the end of the entire statement.

  1. (SELECT * FROM t1 UNION ALL SELECT * FROM t1 INTERSECT SELECT * FROM t2) ORDER BY a LIMIT 2;
  2. +---+
  3. | a |
  4. +---+
  5. | 1 |
  6. | 1 |
  7. +---+
  8. 2 rows in set (0.00 sec)