How do I: remove duplicates?
- PRQL
- SQL
- PRQL
- SQL
Remove duplicates from each group?
- PRQL
- SQL
- PRQL
- SQL
- PRQL
- SQL

How do I: remove duplicates?

PRQL doesn’t have a specific distinct keyword. Instead duplicate tuples in a relation can be removed by using group and take 1:

PRQL

from employees
select department
group employees.* (
  take 1
)

SQL

SELECT
  DISTINCT department
FROM
  employees

This also works with a wildcard:

PRQL

from employees
group employees.* (take 1)

SQL

SELECT
  DISTINCT *
FROM
  employees

Remove duplicates from each group?

To select a single row from each group group can be combined with sort and take:

PRQL

# youngest employee from each department
from employees
group department (
  sort age
  take 1
)

SQL

WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY
        age
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1

Note that we can’t always compile to DISTINCT; when the columns in the group aren’t all the available columns, we need to use a window function:

PRQL

from employees
group {first_name, last_name} (take 1)

SQL

WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY first_name, last_name) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1

When compiling to Postgres or DuckDB dialect, such queries will be compiled to DISTINCT ON, which is the most performant option.

PRQL

prql target:sql.postgres
from employees
group department (
  sort age
  take 1
)

SQL

SELECT
  DISTINCT ON (department) *
FROM
  employees
ORDER BY
  department,
  age

5. Remove duplicates?