PRQL Language Book

Pipelined Relational Query Language, pronounced “Prequel”.

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.

This book serves as a tutorial and reference guide on the language and the broader project. It currently has three sections, navigated by links on the left:

  • Tutorial — A friendly & accessible guide for learning PRQL. It has a gradual increase of difficulty and requires only basic understanding of programming languages. Knowledge of SQL is beneficial, because of many comparisons to SQL, but not required.
  • Reference — In-depth information about the PRQL language. Includes justifications for language design decisions and formal specifications for parts of the language.
  • Project — General information about the project, tooling and development.

Examples of PRQL with a comparison to the generated SQL. PRQL queries can be as simple as:

PRQL

  1. from tracks
  2. filter artist == "Bob Marley" # Each line transforms the previous result
  3. aggregate { # `aggregate` reduces each column to a value
  4. plays = sum plays,
  5. longest = max length,
  6. shortest = min length, # Trailing commas are allowed
  7. }

SQL

  1. SELECT
  2. COALESCE(SUM(plays), 0) AS plays,
  3. MAX(length) AS longest,
  4. MIN(length) AS shortest
  5. FROM
  6. tracks
  7. WHERE
  8. artist = 'Bob Marley'

…and here’s a larger example:

PRQL

  1. from employees
  2. filter start_date > @2021-01-01 # Clear date syntax
  3. derive { # `derive` adds columns / variables
  4. gross_salary = salary + (tax ?? 0), # Terse coalesce
  5. gross_cost = gross_salary + benefits, # Variables can use other variables
  6. }
  7. filter gross_cost > 0
  8. group {title, country} ( # `group` runs a pipeline over each group
  9. aggregate { # `aggregate` reduces each group to a value
  10. average gross_salary,
  11. sum_gross_cost = sum gross_cost, # `=` sets a column name
  12. }
  13. )
  14. filter sum_gross_cost > 100_000 # `filter` replaces both of SQL's `WHERE` & `HAVING`
  15. derive id = f"{title}_{country}" # F-strings like Python
  16. derive country_code = s"LEFT(country, 2)" # S-strings permit SQL as an escape hatch
  17. sort {sum_gross_cost, -country} # `-country` means descending order
  18. take 1..20 # Range expressions (also valid as `take 20`)

SQL

  1. WITH table_1 AS (
  2. SELECT
  3. title,
  4. country,
  5. salary + COALESCE(tax, 0) + benefits AS _expr_1,
  6. salary + COALESCE(tax, 0) AS _expr_2
  7. FROM
  8. employees
  9. WHERE
  10. start_date > DATE '2021-01-01'
  11. ),
  12. table_0 AS (
  13. SELECT
  14. title,
  15. country,
  16. AVG(_expr_2) AS _expr_0,
  17. COALESCE(SUM(_expr_1), 0) AS sum_gross_cost
  18. FROM
  19. table_1
  20. WHERE
  21. _expr_1 > 0
  22. GROUP BY
  23. title,
  24. country
  25. )
  26. SELECT
  27. title,
  28. country,
  29. _expr_0,
  30. sum_gross_cost,
  31. CONCAT(title, '_', country) AS id,
  32. LEFT(country, 2) AS country_code
  33. FROM
  34. table_0
  35. WHERE
  36. sum_gross_cost > 100000
  37. ORDER BY
  38. sum_gross_cost,
  39. country DESC
  40. LIMIT
  41. 20