Syntax
Summary
A summary of PRQL syntax
Syntax | Usage | Example |
---|---|---|
| | Pipelines | from employees | select first_name |
= | Assigns & Aliases | from e = employees derive total = (sum salary) |
: | Named args & Parameters | interp low:0 1600 sat_score |
[] | Lists | select [id, amount] |
() | Precedence & Parentheses | derive celsius = (fahrenheit - 32) / 1.8 |
‘’ & “” | Strings | derive name = ‘Mary’ |
| ||
# | Comments | # A comment |
@ | Dates & Times | @2021-01-01 |
== | Expressions | filter a == b and c != d and e > f |
== | Self-equality in join | join s=salaries [==id] |
-> | Function definitions | func add a b -> a + b |
+ /- | Sort order | sort [-amount, +date] |
?? | Coalesce | amount ?? 0 |
Pipes
Pipes — the connection between transforms that make up a pipeline — can be either line breaks or a pipe character (|
).
In almost all situations, line-breaks pipe the result of a line’s transform into the transform on the following line. For example, the filter
transform operates on the result of from employees
(which is just the employees
table), and the select
transform operates on the result of the filter
transform.
PRQL
from employees
filter department == "Product"
select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
In the place of a line-break, it’s also possible to use the |
character to pipe results, such that this is equivalent:
PRQL
from employees | filter department == "Product" | select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
A line-break doesn’t create a pipeline in a couple of cases:
- within a list (e.g. the
derive
examples below), - when the following line is a new statement, which starts with a keyword of
func
,table
orfrom
.
Lists
Lists are represented with []
, and can span multiple lines. A final trailing comma is optional.
PRQL
from numbers
derive [x = 1, y = 2]
derive [
a = x,
b = y
]
derive [
c = a,
d = b,
]
SQL
SELECT
*,
1 AS x,
2 AS y,
1 AS a,
2 AS b,
1 AS c,
2 AS d
FROM
numbers
Most transforms can take either a list or a single item, so these are equivalent:
PRQL
from employees
select [first_name]
SQL
SELECT
first_name
FROM
employees
PRQL
from employees
select first_name
SQL
SELECT
first_name
FROM
employees
Expressions
PRQL is made up of expressions, like 2 + 3
or ((1 + x) * y)
. In the example below, note the use of expressions to calculate the alias circumference
and in the filter
transform.
PRQL
from foo
select [
circumference = diameter * 3.14159,
color,
]
filter circumference > 10 and color != "red"
SQL
WITH table_1 AS (
SELECT
diameter * 3.14159 AS circumference,
color
FROM
foo
)
SELECT
circumference,
color
FROM
table_1 AS table_0
WHERE
circumference > 10
AND color <> 'red'
Precedence and Parentheses
Parentheses — ()
— are used to give precedence to inner expressions.
Note
We realize some of the finer points here are not intuitive. We are considering approaches to make this more intuitive — even at the cost of requiring more syntax in some circumstances. And we’re planning to make the error messages much better, so the compiler is there to help out.
Parentheses are required around:
- Any nested function call containing a pipe, either the
|
symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline. - Any function call that isn’t a single item in a list or a pipeline, like
sum distance
inround 0 (sum distance)
1. - A minus sign in a function argument, like in
add (-1) (-3)
- Inner transforms for
group
,window
, and other transforms.
Parentheses are not required around expressions which use operators but no function call, like foo + bar
.
Here’s a full rundown of times this applier:
PRQL
from employees
# Requires parentheses, because it's contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (round 1+1 distance)
derive [
# Requires parentheses, because it contains a pipe
is_far = (distance | in 100..),
# The left value of the range requires parentheses,
# because of the minus sign
is_negative = (distance | in (-100..0)),
# ...this is equivalent
is_negative = (distance | in (-100)..0),
# Doesn't require parentheses, because it's in a list (confusing, see footnote)!
average_distance = average distance,
]
# Requires parentheses because of the minus sign
sort (-distance)
# A list is fine too
sort [-distance]
SQL
SELECT
*,
distance BETWEEN 0 AND 20 AS is_proximate,
SUM(distance) OVER () AS total_distance,
MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
distance / 40 AS travel_time,
ROUND(distance, 2) AS distance_rounded_2_dp,
distance >= 100 AS is_far,
distance BETWEEN -100 AND 0,
distance BETWEEN -100 AND 0 AS is_negative,
AVG(distance) OVER () AS average_distance
FROM
employees
ORDER BY
distance DESC
This doesn’t work, for example (though it should provide a much better error message):
PRQL
from employees
derive total_distance = sum distance
Error
Error:
╭─[:2:29]
│
2 │ derive total_distance = sum distance
· ────┬───
· ╰───── Unknown name distance
───╯
For a more formal definition, refer to this precedence table. Because function calls have the lowest precedence, nested function calls or arguments that start or end with an operator require parenthesis.
Group | Operators | Precedence | Associativity |
---|---|---|---|
identifier dot | . | 1 | |
unary | - + ! == | 2 | |
range | .. | 3 | |
mul | * / % | 4 | left-to-right |
add | + - | 5 | left-to-right |
compare | == != <= >= < > | 6 | left-to-right |
coalesce | ?? | 7 | left-to-right |
and | and | 8 | left-to-right |
or | or | 9 | left-to-right |
function call | 10 |
Inner Transforms
Parentheses are also used for transforms (such as group
and window
) that pass their result to an “inner transform”. The example below applies the aggregate
pipeline to each group of unique title
and country
values:
PRQL
from employees
group [title, country] (
aggregate [
average salary,
ct = count
]
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Comments
Comments are represented by #
.
PRQL
from employees # Comment 1
# Comment 2
aggregate [average salary]
SQL
SELECT
AVG(salary)
FROM
employees
There’s no distinct multiline comment syntax.
Quoted identifiers
To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.
PRQL
prql target:sql.mysql
from employees
select `first name`
SQL
SELECT
`first name`
FROM
employees
PRQL
prql target:sql.postgres
from employees
select `first name`
SQL
SELECT
"first name"
FROM
employees
PRQL
from `dir/*.parquet`
SQL
SELECT
*
FROM
"dir/*.parquet"
BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in the SELECT
statement:
PRQL
prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]
SQL
SELECT
`project-foo.dataset.table`.*,
`project-bar.dataset.table`.*
FROM
`project-foo.dataset.table`
JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax
Quoting schemas
This is currently not great and we are working on improving it; see
https://github.com/PRQL/prql/issues/1535 for progress.
If supplying a schema without a column — for example in a from
or join
transform, that also needs to be a quoted identifier:
PRQL
from `music.albums`
SQL
SELECT
*
FROM
music.albums
Parameters
PRQL will retain parameters like $1
in SQL output, which can then be supplied to the SQL query as a prepared query:
PRQL
from employees
filter id == $1
SQL
SELECT
*
FROM
employees
WHERE
id = $1
Numbers
Numbers can contain underscores between numbers; which can make reading large numbers easier:
PRQL
from numbers
select [
small = 1.000_000_1,
big = 5_000_000,
]
SQL
SELECT
1.0000001 AS small,
5000000 AS big
FROM
numbers
Keywords
At the moment, PRQL uses only four keywords:
prql
let
func
case
To use these names as columns or relations, use backticks: `case`
.
It may seem that transforms are also keywords, but they are normal function within std namespace:
PRQL
std.from my_table
std.select [from = my_table.a, take = my_table.b]
std.take 3
SQL
SELECT
a AS "from",
b AS take
FROM
my_table
LIMIT
3
1: or, technically, it’s on the right side of an assignment in a list…