Group

Partitions the rows into groups and applies a pipeline to each of the groups.

  1. group {key_columns} (pipeline)

The partitioning of groups are determined by the key_columns (first argument).

The most conventional use of group is with aggregate:

PRQL

  1. from employees
  2. group {title, country} (
  3. aggregate {
  4. average salary,
  5. ct = count salary
  6. }
  7. )

SQL

  1. SELECT
  2. title,
  3. country,
  4. AVG(salary),
  5. COUNT(*) AS ct
  6. FROM
  7. employees
  8. GROUP BY
  9. title,
  10. country

In concept, a transform in context of a group does the same transformation to the group as it would to the table — for example finding the employee who joined first across the whole table:

PRQL

  1. from employees
  2. sort join_date
  3. take 1

SQL

  1. SELECT
  2. *
  3. FROM
  4. employees
  5. ORDER BY
  6. join_date
  7. LIMIT
  8. 1

To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group expression:

PRQL

  1. from employees
  2. group role (
  3. sort join_date # taken from above
  4. take 1
  5. )

SQL

  1. WITH table_0 AS (
  2. SELECT
  3. *,
  4. ROW_NUMBER() OVER (
  5. PARTITION BY role
  6. ORDER BY
  7. join_date
  8. ) AS _expr_0
  9. FROM
  10. employees
  11. )
  12. SELECT
  13. *
  14. FROM
  15. table_0
  16. WHERE
  17. _expr_0 <= 1