6.1.4. Joins
Joins combine data from two sources into a single set. This is done on a row-by-row basis and usually involves checking a join condition in order to determine which rows should be merged and appear in the resulting dataset. There are several types (INNER
, OUTER
) and classes (qualified, natural, etc.) of joins, each with its own syntax and rules.
Since joins can be chained, the datasets involved in a join may themselves be joined sets.
Syntax
SELECT
...
FROM <source>
[<joins>]
[...]
<source> ::=
{ table
| view
| selectable-stored-procedure [(<args>)]
| <derived-table>
| <common-table-expression>
} [[AS] alias]
<joins> ::= <join> [<join> ...]
<join> ::=
[<join-type>] JOIN <source> <join-condition>
| NATURAL [<join-type>] JOIN <source>
| {CROSS JOIN | ,} <source>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
<join-condition> ::= ON <condition> | USING (<column-list>)
Argument | Description |
---|---|
table | Name of a table |
view | name of a view |
selectable-stored-procedure | Name of a selectable stored procedure |
args | Selectable stored procedure input parameter[s] |
derived-table | Reference, by name, to a derived table |
common-table-expression | Reference, by name, to a common table expression (CTE) |
alias | An alias for a data source (table, view, procedure, CTE, derived table) |
condition | Join condition (criterion) |
column-list | The list of columns used for an equi-join |
Inner vs. outer joins
A join always combines data rows from two sets (usually referred to as the left set and the right set). By default, only rows that meet the join condition (i.e., that match at least one row in the other set when the join condition is applied) make it into the result set. This default type of join is called an inner join. Suppose we have the following two tables:
ID | S |
---|---|
87 | Just some text |
235 | Silence |
CODE | X |
---|---|
-23 | 56.7735 |
87 | 416.0 |
If we join these tables like this:
select *
from A
join B on A.id = B.code;
then the result set will be:
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
The first row of A
has been joined with the second row of B
because together they met the condition “A.id = B.code
”. The other rows from the source tables have no match in the opposite set and are therefore not included in the join. Remember, this is an INNER
join. We can make that fact explicit by writing:
select *
from A
inner join B on A.id = B.code;
However, since INNER
is the default, this is rarely done.
It is perfectly possible that a row in the left set matches several rows from the right set or vice versa. In that case, all those combinations are included, and we can get results like:
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
87 | Just some text | 87 | -1.0 |
-23 | Don’t know | -23 | 56.7735 |
-23 | Still don’t know | -23 | 56.7735 |
-23 | I give up | -23 | 56.7735 |
Sometimes we want (or need) all the rows of one or both of the sources to appear in the joined set, regardless of whether they match a record in the other source. This is where outer joins come in. A LEFT
outer join includes all the records from the left set, but only matching records from the right set. In a RIGHT
outer join it’s the other way around. FULL
outer joins include all the records from both sets. In all outer joins, the “holes” (the places where an included source record doesn’t have a match in the other set) are filled up with NULL
s.
In order to make an outer join, you must specify LEFT
, RIGHT
or FULL
, optionally followed by the keyword OUTER
.
Below are the results of the various outer joins when applied to our original tables A
and B
:
select *
from A
left [outer] join B on A.id = B.code;
ID | S | CODE | X |
---|---|---|---|
87 | Just some text | 87 | 416.0 |
235 | Silence | <null> | <null> |
select *
from A
right [outer] join B on A.id = B.code
ID | S | CODE | X |
---|---|---|---|
<null> | <null> | -23 | 56.7735 |
87 | Just some text | 87 | 416.0 |
select *
from A
full [outer] join B on A.id = B.code
ID | S | CODE | X |
---|---|---|---|
<null> | <null> | -23 | 56.7735 |
87 | Just some text | 87 | 416.0 |
235 | Silence | <null> | <null> |
Qualified joins
Qualified joins specify conditions for the combining of rows. This happens either explicitly in an ON
clause or implicitly in a USING
clause.
Syntax
<qualified-join> ::= [<join-type>] JOIN <source> <join-condition>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
<join-condition> ::= ON <condition> | USING (<column-list>)
Explicit-condition joins
Most qualified joins have an ON
clause, with an explicit condition that can be any valid boolean expression but usually involves some comparison between the two sources involved.
Quite often, the condition is an equality test (or a number of AND
ed equality tests) using the “=
” operator. Joins like these are called equi-joins. (The examples in the section on inner and outer joins were al equi-joins.)
Examples of joins with an explicit condition:
/* Select all Detroit customers who made a purchase
in 2013, along with the purchase details: */
select * from customers c
join sales s on s.cust_id = c.id
where c.city = 'Detroit' and s.year = 2013;
/* Same as above, but include non-buying customers: */
select * from customers c
left join sales s on s.cust_id = c.id
where c.city = 'Detroit' and s.year = 2013;
/* For each man, select the women who are taller than he.
Men for whom no such woman exists are not included. */
select m.fullname as man, f.fullname as woman
from males m
join females f on f.height > m.height;
/* Select all pupils with their class and mentor.
Pupils without a mentor are also included.
Pupils without a class are not included. */
select p.firstname, p.middlename, p.lastname,
c.name, m.name
from pupils p
join classes c on c.id = p.class
left join mentors m on m.id = p.mentor;
Named columns joins
Equi-joins often compare columns that have the same name in both tables. If this is the case, we can also use the second type of qualified join: the named columns join.
Named columns joins are not supported in Dialect 1 databases. |
Named columns joins have a USING
clause which states just the column names. So instead of this:
select * from flotsam f
join jetsam j
on f.sea = j.sea
and f.ship = j.ship;
we can also write:
select * from flotsam
join jetsam using (sea, ship)
which is considerably shorter. The result set is a little different though — at least when using “SELECT *
”:
The explicit-condition join — with the
ON
clause — will contain each of the columnsSEA
andSHIP
twice: once from tableFLOTSAM
, and once from tableJETSAM
. Obviously, they will have the same values.The named columns join — with the
USING
clause — will contain these columns only once.
If you want all the columns in the result set of the named columns join, set up your query like this:
select f.*, j.*
from flotsam f
join jetsam j using (sea, ship);
This will give you the exact same result set as the explicit-condition join.
For an OUTER
named columns join, there’s an additional twist when using “SELECT *
” or an unqualified column name from the USING
list:
If a row from one source set doesn’t have a match in the other but must still be included because of the LEFT
, RIGHT
or FULL
directive, the merged column in the joined set gets the non-NULL
value. That is fair enough, but now you can’t tell whether this value came from the left set, the right set, or both. This can be especially deceiving when the value came from the right hand set, because “*
” always shows combined columns in the left hand part — even in the case of a RIGHT
join.
Whether this is a problem or not depends on the situation. If it is, use the “a.*, b.*
” approach shown above, with a
and b
the names or aliases of the two sources. Or better yet, avoid “*
” altogether in your serious queries and qualify all column names in joined sets. This has the additional benefit that it forces you to think about which data you want to retrieve and where from.
It is your responsibility to make sure that the column names in the USING
list are of compatible types between the two sources. If the types are compatible but not equal, the engine converts them to the type with the broadest range of values before comparing the values. This will also be the data type of the merged column that shows up in the result set if “SELECT *
” or the unqualified column name is used. Qualified columns on the other hand will always retain their original data type.
Natural joins
Taking the idea of the named columns join a step further, a natural join performs an automatic equi-join on all the columns that have the same name in the left and right table. The data types of these columns must be compatible.
Natural joins are not supported in Dialect 1 databases. |
Syntax
<natural-join> ::= NATURAL [<join-type>] JOIN <source>
<join-type> ::= INNER | {LEFT | RIGHT | FULL} [OUTER]
Given these two tables:
create table TA (
a bigint,
s varchar(12),
ins_date date
);
create table TB (
a bigint,
descr varchar(12),
x float,
ins_date date
);
a natural join on TA
and TB
would involve the columns a
and ins_date
, and the following two statements would have the same effect:
select * from TA
natural join TB;
select * from TA
join TB using (a, ins_date);
Like all joins, natural joins are inner joins by default, but you can turn them into outer joins by specifying LEFT
, RIGHT
or FULL
before the JOIN
keyword.
Caution: if there are no columns with the same name in the two source relations, a CROSS JOIN
is performed. We’ll get to this type of join in a minute.
A Note on Equality
This note about equality and inequality operators applies everywhere in Firebird’s SQL language, not just in |
The “=
” operator, which is explicitly used in many conditional joins and implicitly in named column joins and natural joins, only matches values to values. According to the SQL standard, NULL
is not a value and hence two NULL
s are neither equal nor unequal to one another. If you need NULL
s to match each other in a join, use the IS NOT DISTINCT FROM
operator. This operator returns true if the operands have the same value or if they are both NULL
.
select *
from A join B
on A.id is not distinct from B.code;
Likewise, in the — extremely rare — cases where you want to join on inequality, use IS DISTINCT FROM
, not “<>
”, if you want NULL
to be considered different from any value and two NULL
s considered equal:
select *
from A join B
on A.id is distinct from B.code;
Cross joins
A cross join produces the full set product of the two data sources. This means that it successfully matches every row in the left source to every row in the right source.
Syntax
<cross-join> ::= {CROSS JOIN | ,} <source>
Please notice that the comma syntax is deprecated! It is only supported to keep legacy code working and may disappear in some future version.
Cross-joining two sets is equivalent to joining them on a tautology (a condition that is always true). The following two statements have the same effect:
select * from TA
cross join TB;
select * from TA
join TB on 1 = 1;
Cross joins are inner joins, because they only include matching records – it just so happens that every record matches! An outer cross join, if it existed, wouldn’t add anything to the result, because what outer joins add are non-matching records, and these don’t exist in cross joins.
Cross joins are seldom useful, except if you want to list all the possible combinations of two or more variables. Suppose you are selling a product that comes in different sizes, different colors and different materials. If these variables are each listed in a table of their own, this query would return all the combinations:
select m.name, s.size, c.name
from materials m
cross join sizes s
cross join colors c;
Ambiguous field names in joins
Firebird rejects unqualified field names in a query if these field names exist in more than one dataset involved in a join. This is even true for inner equi-joins where the field name figures in the ON
clause like this:
select a, b, c
from TA
join TB on TA.a = TB.a;
There is one exception to this rule: with named columns joins and natural joins, the unqualified field name of a column taking part in the matching process may be used legally and refers to the merged column of the same name. For named columns joins, these are the columns listed in the USING
clause. For natural joins, they are the columns that have the same name in both relations. But please notice again that, especially in outer joins, plain colname
isn’t always the same as left.colname
or right.colname
. Types may differ, and one of the qualified columns may be NULL
while the other isn’t. In that case, the value in the merged, unqualified column may mask the fact that one of the source values is absent.
Joins with stored procedures
If a join is performed with a stored procedure that is not correlated with other data streams via input parameters, there are no oddities. If correlation is involved, an unpleasant quirk reveals itself. The problem is that the optimizer denies itself any way to determine the interrelationships of the input parameters of the procedure from the fields in the other streams:
SELECT *
FROM MY_TAB
JOIN MY_PROC(MY_TAB.F) ON 1 = 1;
Here, the procedure will be executed before a single record has been retrieved from the table, MY_TAB
. The isc_no_cur_rec error
error (no current record for fetch operation) is raised, interrupting the execution.
The solution is to use syntax that specifies the join order explicitly:
SELECT *
FROM MY_TAB
LEFT JOIN MY_PROC(MY_TAB.F) ON 1 = 1;
This forces the table to be read before the procedure and everything works correctly.
This quirk has been recognised as a bug in the optimizer and will be fixed in the next version of Firebird. |