Joins
Table of Contents
- Cross Joins
- Inner Joins
- Left Outer Joins
- Right Outer Joins
- Full Outer Joins
- Join Conditions
- Available Join Algorithms
- Limitations
Cross Joins
Referencing two tables results in a CROSS JOIN
.
The result is computed by creating every possible combination (Cartesian Product) of their rows (t1 * t2 * t3 * tn
) and then applying the given query operation on it (WHERE
clause, SELECT
list, ORDER BY
clause, …):
cr> select articles.name as article, colors.name as color, price
... from articles cross join colors
... where price > 5000.0
... order by price, color, article;
+------------------------------+---------------+----------+
| article | color | price |
+------------------------------+---------------+----------+
| Infinite Improbability Drive | Antique White | 19999.99 |
| Infinite Improbability Drive | Gold | 19999.99 |
| Infinite Improbability Drive | Midnight Blue | 19999.99 |
| Infinite Improbability Drive | Olive Drab | 19999.99 |
| Starship Titanic | Antique White | 50000.0 |
| Starship Titanic | Gold | 50000.0 |
| Starship Titanic | Midnight Blue | 50000.0 |
| Starship Titanic | Olive Drab | 50000.0 |
+------------------------------+---------------+----------+
SELECT 8 rows in set (... sec)
Cross Joins can be done explicitly using the CROSS JOIN
statement as shown in the example above, or implicitly by just specifying two or more tables in the FROM
list:
cr> select articles.name as article, colors.name as color, price
... from articles, colors
... where price > 5000.0
... order by price, color, article;
+------------------------------+---------------+----------+
| article | color | price |
+------------------------------+---------------+----------+
| Infinite Improbability Drive | Antique White | 19999.99 |
| Infinite Improbability Drive | Gold | 19999.99 |
| Infinite Improbability Drive | Midnight Blue | 19999.99 |
| Infinite Improbability Drive | Olive Drab | 19999.99 |
| Starship Titanic | Antique White | 50000.0 |
| Starship Titanic | Gold | 50000.0 |
| Starship Titanic | Midnight Blue | 50000.0 |
| Starship Titanic | Olive Drab | 50000.0 |
+------------------------------+---------------+----------+
SELECT 8 rows in set (... sec)
Inner Joins
Inner Joins require each record of one table to have matching records on the other table:
cr> select s.id, s.table_name, t.number_of_shards
... from sys.shards s, information_schema.tables t
... where s.table_name = t.table_name
... and s.table_name = 'employees'
... order by s.id;
+----+------------+------------------+
| id | table_name | number_of_shards |
+----+------------+------------------+
| 0 | employees | 4 |
| 1 | employees | 4 |
| 2 | employees | 4 |
| 3 | employees | 4 |
+----+------------+------------------+
SELECT 4 rows in set (... sec)
Left Outer Joins
Left outer join returns tuples for all matching records of the left and right relation like Inner join. Additionally it returns tuples for all other records from left that don’t match any record on the right by using null values for the columns of the right relation:
cr> select e.name || ' ' || e.surname as employee, coalesce(d.name, '') as manager_of_department
... from employees e left join departments d
... on e.id = d.manager_id
... order by e.id;
+--------------------+-----------------------+
| employee | manager_of_department |
+--------------------+-----------------------+
| John Doe | Administration |
| John Smith | IT |
| Sean Lee | |
| Rebecca Sean | |
| Tim Ducan | |
| Robert Duval | |
| Clint Johnson | |
| Sarrah Mcmillan | |
| David Limb | |
| David Bowe | |
| Smith Clark | Marketing |
| Ted Kennedy | |
| Ronald Reagan | |
| Franklin Rossevelt | |
| Sam Malone | |
| Marry Georgia | |
| Tim Doe | Human Resources |
| Tim Malone | Purchasing |
+--------------------+-----------------------+
SELECT 18 rows in set (... sec)
Right Outer Joins
Right outer join returns tuples for all matching records of the right and left relation like Inner join. Additionally it returns tuples for all other records from right that don’t match any record on the left by using null values for the columns of the left relation:
cr> select e.name || ' ' || e.surname as employee, d.name as manager_of_department
... from employees e right join departments d
... on e.id = d.manager_id
... order by d.id;
+-------------+-----------------------+
| employee | manager_of_department |
+-------------+-----------------------+
| John Doe | Administration |
| Smith Clark | Marketing |
| Tim Malone | Purchasing |
| Tim Doe | Human Resources |
| | Shipping |
| John Smith | IT |
+-------------+-----------------------+
SELECT 6 rows in set (... sec)
Full Outer Joins
Full outer join returns tuples for all matching records of the left and right relation like Inner join. Additionally it returns tuples for all other records from left that don’t match any record on the right by using null values for the columns of the right relation. Additionally it returns tuples for all other records from right that don’t match any record on the left by using null values for the columns of the left relation:
cr> select e.name || ' ' || e.surname as employee, coalesce(d.name, '') as manager_of_department
... from employees e full join departments d
... on e.id = d.manager_id
... order by e.id;
+--------------------+-----------------------+
| employee | manager_of_department |
+--------------------+-----------------------+
| John Doe | Administration |
| John Smith | IT |
| Sean Lee | |
| Rebecca Sean | |
| Tim Ducan | |
| Robert Duval | |
| Clint Johnson | |
| Sarrah Mcmillan | |
| David Limb | |
| David Bowe | |
| Smith Clark | Marketing |
| Ted Kennedy | |
| Ronald Reagan | |
| Franklin Rossevelt | |
| Sam Malone | |
| Marry Georgia | |
| Tim Doe | Human Resources |
| Tim Malone | Purchasing |
| | Shipping |
+--------------------+-----------------------+
SELECT 19 rows in set (... sec)
Join Conditions
CrateDB supports all operators and scalar functions as join conditions in the WHERE
clause.
Example with within
scalar function:
cr> select photos.name, countries.name
... from countries, photos
... where within(location, geo)
... order by countries.name, photos.name;
+--------------+---------+
| name | name |
+--------------+---------+
| Eiffel Tower | France |
| Berlin Wall | Germany |
+--------------+---------+
SELECT 2 rows in set (... sec)
Available Join Algorithms
Nested Loop Join Algorithm
The nested loop algorithm evaluates the join conditions on every record of the left table with every record of the right table in a distributed manner (for each shard of the used tables). The right table is scanned once for every row in the left table.
This is the default algorithm used for all types of joins.
Block Hash Join Algorithm
The performance of Equi-Joins#Equi-join) is substantially improved by using the Hash Join algorithm. At first one relation is scanned and loaded into a hash table using the attributes of the join conditions as hash keys. Once the hash table is build, the second relation is scanned and the join condition values of every row are hashed and matched against the hash table.
In order to built a hash table even if the first relation wouldn’t fit into the available memory, only a certain block size of a relation is loaded at once. The whole operation will be repeated with the next block of the first relation once scanning the second relation has finished.
This optimisation cannot be applied unless the join is an INNER join and the join condition obeys the following rules:
- contains at least one
EQUAL
operator- contains no
OR
operator- every argument of a
EQUAL
operator can only references fields from one relation
The Hash Join algorithm is faster but has a bigger memory footprint. As such it can explicitly be disabled on demand when memory is scarce using the session setting enable_hashjoin
:
SET enable_hashjoin=false
Limitations
- Joining more than 2 tables can result in execution plans which perform poorly as there is no query optimizer in place yet.