Group
Partitions the rows into groups and applies a pipeline to each of the groups.
group {key_columns} (pipeline)
The partitioning of groups are determined by the key_column
s (first argument).
The most conventional use of group
is with aggregate
:
PRQL
from employees
group {title, country} (
aggregate {
average salary,
ct = count salary
}
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
In concept, a transform in context of a group
does the same transformation to the group as it would to the table — for example finding the employee who joined first across the whole table:
PRQL
from employees
sort join_date
take 1
SQL
SELECT
*
FROM
employees
ORDER BY
join_date
LIMIT
1
To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group
expression:
PRQL
from employees
group role (
sort join_date # taken from above
take 1
)
SQL
WITH table_0 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY role
ORDER BY
join_date
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_0
WHERE
_expr_0 <= 1