Queries with HAVING
The SELECT
statement has two different sections which are used for conditions: WHERE
and HAVING
. The WHERE
section is used more often and contains conditions which will be applied to each row. If a query contains aggregate functions, such as MAX
or SUM
, the SELECT
statement may also contain GROUP BY
and HAVING
sections. The conditions of the HAVING
section are applied after grouping the SQL query results. Typically the conditions of the HAVING
section always contain aggregate functions, while conditions in the WHERE
section may only contain aggregate functions inside a subquery.
When you write a query which contains aggregate functions, Pony needs to determine if the resulting SQL will contain the GROUP BY
and HAVING
sections and where it should put each condition from the Python query. If a condition contains an aggregate function, Pony places the condition into the HAVING
section. Otherwise it places the condition into the WHERE
section.
Consider the following query, which returns the tuples (Group
, count_of_students):
select((s.group, count(s)) for s in Student
if s.group.dept.number == 44 and avg(s.gpa) > 4)
In this query we have two conditions. The first condition is s.group.dept.number == 44
. Since it doesn`t include an aggregate function, Pony will place this condition into the WHERE
section. The second condition avg(s.gpa) > 4
contains the aggregate function avg
and will be placed into the HAVING
section.
Another question is what columns Pony should add to the GROUP BY
section. According to the SQL standard, any non-aggregated column which placed into the SELECT
statement should be added to the GROUP BY
section too. Let’s consider the following query:
SELECT A, B, C, SUM(D), MAX(E), COUNT(F)
FROM T1
WHERE ...
GROUP BY ...
HAVING ...
According to the SQL standard, we need to include the columns A
, B
and C
into the GROUP BY
section, because these columns appear in the SELECT
list and don’t wrapped with any aggregate function. Pony does exactly this. If your aggregated Pony query returns a tuple with several expressions, any non-aggregated expression will be placed into the GROUP BY
section. Let’s consider the same Pony query again:
select((s.group, count(s)) for s in Student
if s.group.dept.number == 44 and avg(s.gpa) > 4)
This query returns the tuples (Group
, count_of_students). The first element of the tuple, the Group
instance, is not aggregated, so it will be placed into the GROUP BY
section:
SELECT "s"."group", COUNT(DISTINCT "s"."id")
FROM "Student" "s", "Group" "group-1"
WHERE "group-1"."dept" = 44
AND "s"."group" = "group-1"."number"
GROUP BY "s"."group"
HAVING AVG("s"."gpa") > 4
The s.group
expression was placed into the GROUP BY
section, and the condition avg(s.gpa) > 4
was placed into the HAVING
section of the query.
Sometimes the condition which should be placed into the HAVING
section contains some non-aggregated columns. Such columns will be added to the GROUP BY
section, because according to the SQL standard it is forbidden to use a non-aggregated column inside the HAVING
section, if it was not added to the GROUP BY
list.
Another example:
select((item.order, item.order.total_price,
sum(item.price * item.quantity))
for item in OrderItem
if item.order.total_price < sum(item.price * item.quantity))
This query has the following condition: item.order.total_price < sum(item.price * item.quantity)
, which contains an aggregate function and should be added to the HAVING
section. But the part item.order.total_price
is not aggregated. Hence, it will be added to the GROUP BY
section in order to satisfy the SQL requirements.