- Aggregation
- Count the number of facilities
- Count the number of expensive facilities
- Count the number of recommendations each member makes.
- List the total slots booked per facility
- List the total slots booked per facility in a given month
- List the total slots booked per facility per month
- Find the count of members who have made at least one booking
- List facilities with more than 1000 slots booked
- Find the total revenue of each facility
- Find facilities with a total revenue less than 1000
- Output the facility id that has the highest number of slots booked
- List the total slots booked per facility per month, part 2
- List the total hours booked per named facility
- List each member’s first booking after September 1st 2012
- Produce a list of member names, with each row containing the total member count
- Produce a numbered list of members
- Output the facility id that has the highest number of slots booked, again
- Rank members by (rounded) hours used
- Find the top three revenue generating facilities
- Classify facilities by value
Aggregation
Aggregation is one of those capabilities that really make you appreciate the power of relational database systems. It allows you to move beyond merely persisting your data, into the realm of asking truly interesting questions that can be used to inform decision making. This category covers aggregation at length, making use of standard grouping as well as more recent window functions.
Count the number of facilities
For our first foray into aggregates, we’re going to stick to something simple. We want to know how many facilities exist - simply produce a total count.
SELECT COUNT(facid) FROM facilities;
query = Facility.select(fn.COUNT(Facility.facid))
count = query.scalar()
# OR:
count = Facility.select().count()
Count the number of expensive facilities
Produce a count of the number of facilities that have a cost to guests of 10 or more.
SELECT COUNT(facid) FROM facilities WHERE guestcost >= 10
query = Facility.select(fn.COUNT(Facility.facid)).where(Facility.guestcost >= 10)
count = query.scalar()
# OR:
# count = Facility.select().where(Facility.guestcost >= 10).count()
Count the number of recommendations each member makes.
Produce a count of the number of recommendations each member has made. Order by member ID.
SELECT recommendedby, COUNT(memid) FROM members
WHERE recommendedby IS NOT NULL
GROUP BY recommendedby
ORDER BY recommendedby
query = (Member
.select(Member.recommendedby, fn.COUNT(Member.memid))
.where(Member.recommendedby.is_null(False))
.group_by(Member.recommendedby)
.order_by(Member.recommendedby))
List the total slots booked per facility
Produce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id.
SELECT facid, SUM(slots) FROM bookings GROUP BY facid ORDER BY facid;
query = (Booking
.select(Booking.facid, fn.SUM(Booking.slots))
.group_by(Booking.facid)
.order_by(Booking.facid))
List the total slots booked per facility in a given month
Produce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots.
SELECT facid, SUM(slots)
FROM bookings
WHERE (date_trunc('month', starttime) = '2012-09-01'::dates)
GROUP BY facid
ORDER BY SUM(slots)
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.where(fn.date_trunc('month', Booking.starttime) == datetime.date(2012, 9, 1))
.group_by(Booking.facility)
.order_by(fn.SUM(Booking.slots)))
List the total slots booked per facility per month
Produce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month.
SELECT facid, date_part('month', starttime), SUM(slots)
FROM bookings
WHERE date_part('year', starttime) = 2012
GROUP BY facid, date_part('month', starttime)
ORDER BY facid, date_part('month', starttime)
month = fn.date_part('month', Booking.starttime)
query = (Booking
.select(Booking.facility, month, fn.SUM(Booking.slots))
.where(fn.date_part('year', Booking.starttime) == 2012)
.group_by(Booking.facility, month)
.order_by(Booking.facility, month))
Find the count of members who have made at least one booking
Find the total number of members who have made at least one booking.
SELECT COUNT(DISTINCT memid) FROM bookings
-- OR --
SELECT COUNT(1) FROM (SELECT DISTINCT memid FROM bookings) AS _
query = Booking.select(fn.COUNT(Booking.member.distinct()))
# OR:
query = Booking.select(Booking.member).distinct()
count = query.count() # count() wraps in SELECT COUNT(1) FROM (...)
List facilities with more than 1000 slots booked
Produce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and hours, sorted by facility id.
SELECT facid, SUM(slots) FROM bookings
GROUP BY facid
HAVING SUM(slots) > 1000
ORDER BY facid;
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.group_by(Booking.facility)
.having(fn.SUM(Booking.slots) > 1000)
.order_by(Booking.facility))
Find the total revenue of each facility
Produce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there’s a different cost for guests and members!
SELECT f.name, SUM(b.slots * (
CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END)) AS revenue
FROM bookings AS b
INNER JOIN facilities AS f ON b.facid = f.facid
GROUP BY f.name
ORDER BY revenue;
revenue = fn.SUM(Booking.slots * Case(None, (
(Booking.member == 0, Facility.guestcost),
), Facility.membercost))
query = (Facility
.select(Facility.name, revenue.alias('revenue'))
.join(Booking)
.group_by(Facility.name)
.order_by(SQL('revenue')))
Find facilities with a total revenue less than 1000
Produce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there’s a different cost for guests and members!
SELECT f.name, SUM(b.slots * (
CASE WHEN b.memid = 0 THEN f.guestcost ELSE f.membercost END)) AS revenue
FROM bookings AS b
INNER JOIN facilities AS f ON b.facid = f.facid
GROUP BY f.name
HAVING SUM(b.slots * ...) < 1000
ORDER BY revenue;
# Same definition as previous example.
revenue = fn.SUM(Booking.slots * Case(None, (
(Booking.member == 0, Facility.guestcost),
), Facility.membercost))
query = (Facility
.select(Facility.name, revenue.alias('revenue'))
.join(Booking)
.group_by(Facility.name)
.having(revenue < 1000)
.order_by(SQL('revenue')))
Output the facility id that has the highest number of slots booked
Output the facility id that has the highest number of slots booked.
SELECT facid, SUM(slots) FROM bookings
GROUP BY facid
ORDER BY SUM(slots) DESC
LIMIT 1
query = (Booking
.select(Booking.facility, fn.SUM(Booking.slots))
.group_by(Booking.facility)
.order_by(fn.SUM(Booking.slots).desc())
.limit(1))
# Retrieve multiple scalar values by calling scalar() with as_tuple=True.
facid, nslots = query.scalar(as_tuple=True)
List the total slots booked per facility per month, part 2
Produce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns.
Postgres ONLY.
SELECT facid, date_part('month', starttime), SUM(slots)
FROM booking
WHERE date_part('year', starttime) = 2012
GROUP BY ROLLUP(facid, date_part('month', starttime))
ORDER BY facid, date_part('month', starttime)
month = fn.date_part('month', Booking.starttime)
query = (Booking
.select(Booking.facility,
month.alias('month'),
fn.SUM(Booking.slots))
.where(fn.date_part('year', Booking.starttime) == 2012)
.group_by(fn.ROLLUP(Booking.facility, month))
.order_by(Booking.facility, month))
List the total hours booked per named facility
Produce a list of the total number of hours booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id.
SELECT f.facid, f.name, SUM(b.slots) * .5
FROM facilities AS f
INNER JOIN bookings AS b ON (f.facid = b.facid)
GROUP BY f.facid, f.name
ORDER BY f.facid
query = (Facility
.select(Facility.facid, Facility.name, fn.SUM(Booking.slots) * .5)
.join(Booking)
.group_by(Facility.facid, Facility.name)
.order_by(Facility.facid))
List each member’s first booking after September 1st 2012
Produce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID.
SELECT m.surname, m.firstname, m.memid, min(b.starttime) as starttime
FROM members AS m
INNER JOIN bookings AS b ON b.memid = m.memid
WHERE starttime >= '2012-09-01'
GROUP BY m.surname, m.firstname, m.memid
ORDER BY m.memid;
query = (Member
.select(Member.surname, Member.firstname, Member.memid,
fn.MIN(Booking.starttime).alias('starttime'))
.join(Booking)
.where(Booking.starttime >= datetime.date(2012, 9, 1))
.group_by(Member.surname, Member.firstname, Member.memid)
.order_by(Member.memid))
Produce a list of member names, with each row containing the total member count
Produce a list of member names, with each row containing the total member count. Order by join date.
Postgres ONLY (as written).
SELECT COUNT(*) OVER(), firstname, surname
FROM members ORDER BY joindate
query = (Member
.select(fn.COUNT(Member.memid).over(), Member.firstname,
Member.surname)
.order_by(Member.joindate))
Produce a numbered list of members
Produce a monotonically increasing numbered list of members, ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential.
Postgres ONLY (as written).
SELECT row_number() OVER (ORDER BY joindate), firstname, surname
FROM members ORDER BY joindate;
query = (Member
.select(fn.row_number().over(order_by=[Member.joindate]),
Member.firstname, Member.surname)
.order_by(Member.joindate))
Output the facility id that has the highest number of slots booked, again
Output the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tieing results get output.
Postgres ONLY (as written).
SELECT facid, total FROM (
SELECT facid, SUM(slots) AS total,
rank() OVER (order by SUM(slots) DESC) AS rank
FROM bookings
GROUP BY facid
) AS ranked WHERE rank = 1
rank = fn.rank().over(order_by=[fn.SUM(Booking.slots).desc()])
subq = (Booking
.select(Booking.facility, fn.SUM(Booking.slots).alias('total'),
rank.alias('rank'))
.group_by(Booking.facility))
# Here we use a plain Select() to create our query.
query = (Select(columns=[subq.c.facid, subq.c.total])
.from_(subq)
.where(subq.c.rank == 1)
.bind(db)) # We must bind() it to the database.
# To iterate over the query results:
for facid, total in query.tuples():
print(facid, total)
Rank members by (rounded) hours used
Produce a list of members, along with the number of hours they’ve booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name.
Postgres ONLY (as written).
SELECT firstname, surname,
((SUM(bks.slots)+10)/20)*10 as hours,
rank() over (order by ((sum(bks.slots)+10)/20)*10 desc) as rank
FROM members AS mems
INNER JOIN bookings AS bks ON mems.memid = bks.memid
GROUP BY mems.memid
ORDER BY rank, surname, firstname;
hours = ((fn.SUM(Booking.slots) + 10) / 20) * 10
query = (Member
.select(Member.firstname, Member.surname, hours.alias('hours'),
fn.rank().over(order_by=[hours.desc()]).alias('rank'))
.join(Booking)
.group_by(Member.memid)
.order_by(SQL('rank'), Member.surname, Member.firstname))
Find the top three revenue generating facilities
Produce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name.
Postgres ONLY (as written).
SELECT name, rank FROM (
SELECT f.name, RANK() OVER (ORDER BY SUM(
CASE WHEN memid = 0 THEN slots * f.guestcost
ELSE slots * f.membercost END) DESC) AS rank
FROM bookings
INNER JOIN facilities AS f ON bookings.facid = f.facid
GROUP BY f.name) AS subq
WHERE rank <= 3
ORDER BY rank;
total_cost = fn.SUM(Case(None, (
(Booking.member == 0, Booking.slots * Facility.guestcost),
), (Booking.slots * Facility.membercost)))
subq = (Facility
.select(Facility.name,
fn.RANK().over(order_by=[total_cost.desc()]).alias('rank'))
.join(Booking)
.group_by(Facility.name))
query = (Select(columns=[subq.c.name, subq.c.rank])
.from_(subq)
.where(subq.c.rank <= 3)
.order_by(subq.c.rank)
.bind(db)) # Here again we used plain Select, and call bind().
Classify facilities by value
Classify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name.
Postgres ONLY (as written).
SELECT name,
CASE class WHEN 1 THEN 'high' WHEN 2 THEN 'average' ELSE 'low' END
FROM (
SELECT f.name, ntile(3) OVER (ORDER BY SUM(
CASE WHEN memid = 0 THEN slots * f.guestcost ELSE slots * f.membercost
END) DESC) AS class
FROM bookings INNER JOIN facilities AS f ON bookings.facid = f.facid
GROUP BY f.name
) AS subq
ORDER BY class, name;
cost = fn.SUM(Case(None, (
(Booking.member == 0, Booking.slots * Facility.guestcost),
), (Booking.slots * Facility.membercost)))
subq = (Facility
.select(Facility.name,
fn.NTILE(3).over(order_by=[cost.desc()]).alias('klass'))
.join(Booking)
.group_by(Facility.name))
klass_case = Case(subq.c.klass, [(1, 'high'), (2, 'average')], 'low')
query = (Select(columns=[subq.c.name, klass_case])
.from_(subq)
.order_by(subq.c.klass, subq.c.name)
.bind(db))