Querying the Database
Once you’ve loaded your database with data, you’ll need a way to query it. For the MP3 application you’ll need a slightly more sophisticated query function than you wrote in Chapter 3. This time around you want not only to be able to select rows matching particular criteria but also to limit the results to particular columns, to limit the results to unique rows, and perhaps to sort the rows by particular columns. In keeping with the spirit of relational database theory, the result of a query will be a new table
object containing the desired rows and columns.
The query function you’ll write, select
, is loosely modeled on the SELECT
statement from Structured Query Language (SQL). It’ll take five keyword parameters: :from
, :columns
, :where
, :distinct
, and :order-by
. The :from
argument is the table
object you want to query. The :columns
argument specifies which columns should be included in the result. The value should be a list of column names, a single column name, or a **T**
, the default, meaning return all columns. The :where
argument, if provided, should be a function that accepts a row and returns true if it should be included in the results. In a moment, you’ll write two functions, matching
and in
, that return functions appropriate for use as :where
arguments. The :order-by
argument, if supplied, should be a list of column names; the results will be sorted by the named columns. As with the :columns
argument, you can specify a single column using just the name, which is equivalent to a one-item list containing the same name. Finally, the :distinct
argument is a boolean that says whether to eliminate duplicate rows from the results. The default value for :distinct
is **NIL**
.
Here are some examples of using select
:
;; Select all rows where the :artist column is "Green Day"
(select :from *mp3s* :where (matching *mp3s* :artist "Green Day"))
;; Select a sorted list of artists with songs in the genre "Rock"
(select
:columns :artist
:from *mp3s*
:where (matching *mp3s* :genre "Rock")
:distinct t
:order-by :artist)
The implementation of select
with its immediate helper functions looks like this:
(defun select (&key (columns t) from where distinct order-by)
(let ((rows (rows from))
(schema (schema from)))
(when where
(setf rows (restrict-rows rows where)))
(unless (eql columns 't)
(setf schema (extract-schema (mklist columns) schema))
(setf rows (project-columns rows schema)))
(when distinct
(setf rows (distinct-rows rows schema)))
(when order-by
(setf rows (sorted-rows rows schema (mklist order-by))))
(make-instance 'table :rows rows :schema schema)))
(defun mklist (thing)
(if (listp thing) thing (list thing)))
(defun extract-schema (column-names schema)
(loop for c in column-names collect (find-column c schema)))
(defun find-column (column-name schema)
(or (find column-name schema :key #'name)
(error "No column: ~a in schema: ~a" column-name schema)))
(defun restrict-rows (rows where)
(remove-if-not where rows))
(defun project-columns (rows schema)
(map 'vector (extractor schema) rows))
(defun distinct-rows (rows schema)
(remove-duplicates rows :test (row-equality-tester schema)))
(defun sorted-rows (rows schema order-by)
(sort (copy-seq rows) (row-comparator order-by schema)))
Of course, the really interesting part of select
is how you implement the functions extractor
, row-equality-tester
, and row-comparator
.
As you can tell by how they’re used, each of these functions must return a function. For instance, project-columns
uses the value returned by extractor
as the function argument to **MAP**
. Since the purpose of project-columns
is to return a set of rows with only certain column values, you can infer that extractor
returns a function that takes a row as an argument and returns a new row containing only the columns specified in the schema it’s passed. Here’s how you can implement it:
(defun extractor (schema)
(let ((names (mapcar #'name schema)))
#'(lambda (row)
(loop for c in names collect c collect (getf row c)))))
Note how you can do the work of extracting the names from the schema outside the body of the closure: since the closure will be called many times, you want it to do as little work as possible each time it’s called.
The functions row-equality-tester
and row-comparator
are implemented in a similar way. To decide whether two rows are equivalent, you need to apply the appropriate equality predicate for each column to the appropriate column values. Recall from Chapter 22 that the **LOOP**
clause always
will return **NIL**
as soon as a pair of values fails their test or will cause the **LOOP**
to return **T**
.
(defun row-equality-tester (schema)
(let ((names (mapcar #'name schema))
(tests (mapcar #'equality-predicate schema)))
#'(lambda (a b)
(loop for name in names and test in tests
always (funcall test (getf a name) (getf b name))))))
Ordering two rows is a bit more complex. In Lisp, comparator functions return true if their first argument should be sorted ahead of the second and **NIL**
otherwise. Thus, a **NIL**
can mean that the second argument should be sorted ahead of the first or that they’re equivalent. You want your row comparators to behave the same way: return **T**
if the first row should be sorted ahead of the second and **NIL**
otherwise.
Thus, to compare two rows, you should compare the values from the columns you’re sorting by, in order, using the appropriate comparator for each column. First call the comparator with the value from the first row as the first argument. If the comparator returns true, that means the first row should definitely be sorted ahead of the second row, so you can immediately return **T**
.
But if the column comparator returns **NIL**
, then you need to determine whether that’s because the second value should sort ahead of the first value or because they’re equivalent. So you should call the comparator again with the arguments reversed. If the comparator returns true this time, it means the second column value sorts ahead of the first and thus the second row ahead of the first row, so you can return **NIL**
immediately. Otherwise, the column values are equivalent, and you need to move onto the next column. If you get through all the columns without one row’s value ever winning the comparison, then the rows are equivalent, and you return **NIL**
. A function that implements this algorithm looks like this:
(defun row-comparator (column-names schema)
(let ((comparators (mapcar #'comparator (extract-schema column-names schema))))
#'(lambda (a b)
(loop
for name in column-names
for comparator in comparators
for a-value = (getf a name)
for b-value = (getf b name)
when (funcall comparator a-value b-value) return t
when (funcall comparator b-value a-value) return nil
finally (return nil)))))