Join Expressions
Join expressions, also called "joins", combine the results of two or more table expressions based on conditions on the values of particular columns (i.e., equality columns).
Join expressions define a data source in the FROM
sub-clause of simple SELECT
clauses, or as parameter to TABLE
. Joins are a particular kind of table expression.
Tip:
New in v19.1: The cost-based optimizer supports hint syntax to force the use of a specific join algorithm. For more information, see Join hints.
Synopsis
Parameters
Parameter | Description |
---|---|
joined_table | Another join expression. |
table_ref | A table expression. |
a_expr | A scalar expression to use as ON join condition. |
name | A column name to use as USING join condition |
Supported join types
CockroachDB supports the following join types:
Inner joins
Only the rows from the left and right operand that match the condition are returned.
<table expr> [ INNER ] JOIN <table expr> ON <val expr>
<table expr> [ INNER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL [ INNER ] JOIN <table expr>
<table expr> CROSS JOIN <table expr>
Left outer joins
For every left row where there is no match on the right, NULL
values are returned for the columns on the right.
<table expr> LEFT [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> LEFT [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL LEFT [ OUTER ] JOIN <table expr>
Right outer joins
For every right row where there is no match on the left, NULL
values are returned for the columns on the left.
<table expr> RIGHT [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> RIGHT [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL RIGHT [ OUTER ] JOIN <table expr>
Full outer joins
For every row on one side of the join where there is no match on the other side, NULL
values are returned for the columns on the non-matching side.
<table expr> FULL [ OUTER ] JOIN <table expr> ON <val expr>
<table expr> FULL [ OUTER ] JOIN <table expr> USING(<colname>, <colname>, ...)
<table expr> NATURAL FULL [ OUTER ] JOIN <table expr>
Supported join conditions
CockroachDB supports the following conditions to match rows in a join:
- No condition with
CROSS JOIN
: each row on the left is consideredto match every row on the right. ON
predicates: a Boolean scalar expressionis evaluated to determine whether the operand rows match.USING
: the named columns are compared pairwise from the left andright rows; left and right rows are considered to match if thecolumns are equal pairwise.NATURAL
: generates an implicitUSING
condition using all thecolumn names that are present in both the left and right tableexpressions.
Warning:NATURAL
is supported for compatibility with PostgreSQL; its use in new applications is discouraged because its results can silently change in unpredictable ways when new columns are added to one of the join operands.
Join algorithms
CockroachDB supports the following algorithms for performing a join:
Merge joins
By default, CockroachDB uses merge joins whenever possible because they are more performant than hash joins, computationally and in terms of memory. A merge join requires both tables to be indexed on the equality columns and the indexes must have the same ordering. When these conditions are not met, CockroachDB resorts to the slower hash joins. Merge joins can be used only with distributed query processing.
Merge joins are performed on the indexed columns of two tables as follows:
- CockroachDB checks for indexes on the equality columns and that they are ordered the same (i.e.,
ASC
orDESC
). - CockroachDB takes one row from each table and compares them.
- For inner joins:
- If the rows are equal, CockroachDB returns the rows.
- If there are multiple matches, the cartesian product of the matches is returned.
- If the rows are not equal, CockroachDB discards the lower-value row and repeats the process with the next row until all rows are processed.
- For outer joins:
- If the rows are equal, CockroachDB returns the rows.
- If there are multiple matches, the cartesian product of the matches is returned.
- If the rows are not equal, CockroachDB returns
NULL
for the non-matching column and repeats the process with the next row until all rows are processed.
- For inner joins:
Hash joins
If a merge join cannot be used, CockroachDB uses a hash join. Hash joins are computationally expensive and require additional memory.
Hash joins are performed on two tables as follows:
- CockroachDB reads both tables and attempts to pick the smaller table.
- CockroachDB creates an in-memory hash table on the smaller table. If the hash table is too large, it will spill over to disk storage (which could affect performance).
- CockroachDB then scans the large table, looking up each row in the hash table.
Lookup joins
The cost-based optimizer decides when it would be beneficial to use a lookup join. Lookup joins are used when there is a large imbalance in size between the two tables, as it only reads the smaller table and then looks up matches in the larger table. A lookup join requires that the right-hand (i.e., larger) table is indexed on the equality column.
Lookup joins are performed on two tables as follows:
- CockroachDB reads each row in the small table.
- CockroachDB then scans (or "looks up") the larger table for matches to the smaller table and outputs the matching rows.
You can override the use of lookup joins using join hints.
Performance best practices
Note:
CockroachDBs is currently undergoing major changes to evolve and improve the performance of queries using joins. The restrictions and workarounds listed in this section will be lifted or made unnecessary over time.
- Joins over interleaved tables are usually (but not always) processed more effectively than over non-interleaved tables.
- When no indexes can be used to satisfy a join, CockroachDB may load all the rows in memory that satisfy the condition one of the join operands before starting to return result rows. This may cause joins to fail if the join condition or other
WHERE
clauses are insufficiently selective. - Outer joins (i.e., left outer joins, right outer joins, and full outer joins) are generally processed less efficiently than inner joins. Use inner joins whenever possible. Full outer joins are the least optimized.
- Use
EXPLAIN
over queries containing joins to verify that indexes are used. - Use indexes for faster joins.
See also
- Cost-based Optimizer: Join Hints
- Scalar Expressions
- Table Expressions
- Simple
SELECT
Clause - Selection Queries
EXPLAIN
- Performance Best Practices - Overview
- SQL join operation (Wikipedia))
- CockroachDB's first implementation of SQL joins (CockroachDB Blog)
- On the Way to Better SQL Joins in CockrochDB (CockroachDB Blog)