The Compiler
The basic architecture of the compiler consists of three layers. First you’ll implement a class html-compiler
that has one slot that holds an adjustable vector that’s used to accumulate ops representing the calls made to the generic functions in the backend interface during the execution of process
.
You’ll then implement methods on the generic functions in the backend interface that will store the sequence of actions in the vector. Each op is represented by a list consisting of a keyword naming the operation and the arguments passed to the function that generated the op. The function sexp->ops
implements the first phase of the compiler, compiling a list of FOO forms by calling process
on each form with an instance of html-compiler
.
This vector of ops stored by the compiler is then passed to a function that optimizes it, merging consecutive raw-string
ops into a single op that emits the combined string in one go. The optimization function can also, optionally, strip out ops that are needed only for pretty printing, which is mostly important because it allows you to merge more raw-string
ops.
Finally, the optimized ops vector is passed to a third function, generate-code
, that returns a list of Common Lisp expressions that will actually output the HTML. When *pretty*
is true, generate-code
generates code that uses the methods specialized on html-pretty-printer
to output pretty HTML. When *pretty*
is **NIL**
, it generates code that writes directly to the stream *html-output*
.
The macro html
actually generates a body that contains two expansions, one generated with *pretty*
bound to **T**
and one with *pretty*
bound to **NIL**
. Which expansion is used is determined by the runtime value of *pretty*
. Thus, every function that contains a call to html
will contain code to generate both pretty and compact output.
The other significant difference between the compiler and the interpreter is that the compiler can embed Lisp forms in the code it generates. To take advantage of that, you need to modify the process
function so it calls the embed-code
and embed-value
functions when asked to process an expression that’s not a FOO form. Since all self-evaluating objects are valid FOO forms, the only forms that won’t be passed to process-sexp-html
are lists that don’t match the syntax for FOO cons forms and non-keyword symbols, the only atoms that aren’t self-evaluating. You can assume that any non-FOO cons is code to be run inline and all symbols are variables whose value you should embed.
(defun process (processor form)
(cond
((sexp-html-p form) (process-sexp-html processor form))
((consp form) (embed-code processor form))
(t (embed-value processor form))))
Now let’s look at the compiler code. First you should define two functions that slightly abstract the vector you’ll use to save ops in the first two phases of compilation.
(defun make-op-buffer () (make-array 10 :adjustable t :fill-pointer 0))
(defun push-op (op ops-buffer) (vector-push-extend op ops-buffer))
Next you can define the html-compiler
class and the methods specialized on it to implement the backend interface.
(defclass html-compiler ()
((ops :accessor ops :initform (make-op-buffer))))
(defmethod raw-string ((compiler html-compiler) string &optional newlines-p)
(push-op `(:raw-string ,string ,newlines-p) (ops compiler)))
(defmethod newline ((compiler html-compiler))
(push-op '(:newline) (ops compiler)))
(defmethod freshline ((compiler html-compiler))
(push-op '(:freshline) (ops compiler)))
(defmethod indent ((compiler html-compiler))
(push-op `(:indent) (ops compiler)))
(defmethod unindent ((compiler html-compiler))
(push-op `(:unindent) (ops compiler)))
(defmethod toggle-indenting ((compiler html-compiler))
(push-op `(:toggle-indenting) (ops compiler)))
(defmethod embed-value ((compiler html-compiler) value)
(push-op `(:embed-value ,value ,*escapes*) (ops compiler)))
(defmethod embed-code ((compiler html-compiler) code)
(push-op `(:embed-code ,code) (ops compiler)))
With those methods defined, you can implement the first phase of the compiler, sexp->ops
.
(defun sexp->ops (body)
(loop with compiler = (make-instance 'html-compiler)
for form in body do (process compiler form)
finally (return (ops compiler))))
During this phase you don’t need to worry about the value of *pretty*
: just record all the functions called by process
. Here’s what sexp->ops
makes of a simple FOO form:
HTML> (sexp->ops '((:p "Foo")))
#((:FRESHLINE) (:RAW-STRING "<p" NIL) (:RAW-STRING ">" NIL)
(:RAW-STRING "Foo" T) (:RAW-STRING "</p>" NIL) (:FRESHLINE))
The next phase, optimize-static-output
, takes a vector of ops and returns a new vector containing the optimized version. The algorithm is simple—for each :raw-string
op, it writes the string to a temporary string buffer. Thus, consecutive :raw-string
ops will build up a single string containing the concatenation of the strings that need to be emitted. Whenever you encounter an op other than a :raw-string
op, you convert the built-up string into a sequence of alternating :raw-string
and :newline
ops with the helper function compile-buffer
and then add the next op. This function is also where you strip out the pretty printing ops if *pretty*
is **NIL**
.
(defun optimize-static-output (ops)
(let ((new-ops (make-op-buffer)))
(with-output-to-string (buf)
(flet ((add-op (op)
(compile-buffer buf new-ops)
(push-op op new-ops)))
(loop for op across ops do
(ecase (first op)
(:raw-string (write-sequence (second op) buf))
((:newline :embed-value :embed-code) (add-op op))
((:indent :unindent :freshline :toggle-indenting)
(when *pretty* (add-op op)))))
(compile-buffer buf new-ops)))
new-ops))
(defun compile-buffer (buf ops)
(loop with str = (get-output-stream-string buf)
for start = 0 then (1+ pos)
for pos = (position #\Newline str :start start)
when (< start (length str))
do (push-op `(:raw-string ,(subseq str start pos) nil) ops)
when pos do (push-op '(:newline) ops)
while pos))
The last step is to translate the ops into the corresponding Common Lisp code. This phase also pays attention to the value of *pretty*
. When *pretty*
is true, it generates code that invokes the backend generic functions on *html-pretty-printer*
, which will be bound to an instance of html-pretty-printer
. When *pretty*
is **NIL**
, it generates code that writes directly to *html-output*
, the stream to which the pretty printer would send its output.
The actual function, generate-code
, is trivial.
(defun generate-code (ops)
(loop for op across ops collect (apply #'op->code op)))
All the work is done by methods on the generic function op->code
specializing the op
argument with an **EQL**
specializer on the name of the op.
(defgeneric op->code (op &rest operands))
(defmethod op->code ((op (eql :raw-string)) &rest operands)
(destructuring-bind (string check-for-newlines) operands
(if *pretty*
`(raw-string *html-pretty-printer* ,string ,check-for-newlines)
`(write-sequence ,string *html-output*))))
(defmethod op->code ((op (eql :newline)) &rest operands)
(if *pretty*
`(newline *html-pretty-printer*)
`(write-char #\Newline *html-output*)))
(defmethod op->code ((op (eql :freshline)) &rest operands)
(if *pretty*
`(freshline *html-pretty-printer*)
(error "Bad op when not pretty-printing: ~a" op)))
(defmethod op->code ((op (eql :indent)) &rest operands)
(if *pretty*
`(indent *html-pretty-printer*)
(error "Bad op when not pretty-printing: ~a" op)))
(defmethod op->code ((op (eql :unindent)) &rest operands)
(if *pretty*
`(unindent *html-pretty-printer*)
(error "Bad op when not pretty-printing: ~a" op)))
(defmethod op->code ((op (eql :toggle-indenting)) &rest operands)
(if *pretty*
`(toggle-indenting *html-pretty-printer*)
(error "Bad op when not pretty-printing: ~a" op)))
The two most interesting op->code
methods are the ones that generate code for the :embed-value
and :embed-code
ops. In the :embed-value
method, you can generate slightly different code depending on the value of the escapes
operand since if escapes
is **NIL**
, you don’t need to generate a call to escape
. And when both *pretty*
and escapes
are **NIL**
, you can generate code that uses **PRINC**
to emit the value directly to the stream.
(defmethod op->code ((op (eql :embed-value)) &rest operands)
(destructuring-bind (value escapes) operands
(if *pretty*
(if escapes
`(raw-string *html-pretty-printer* (escape (princ-to-string ,value) ,escapes) t)
`(raw-string *html-pretty-printer* (princ-to-string ,value) t))
(if escapes
`(write-sequence (escape (princ-to-string ,value) ,escapes) *html-output*)
`(princ ,value *html-output*)))))
Thus, something like this:
HTML> (let ((x 10)) (html (:p x)))
<p>10</p>
NIL
works because html
translates (:p x)
into something like this:
(progn
(write-sequence "<p>" *html-output*)
(write-sequence (escape (princ-to-string x) "<>&") *html-output*)
(write-sequence "</p>" *html-output*))
When that code replaces the call to html
in the context of the **LET**
, you get the following:
(let ((x 10))
(progn
(write-sequence "<p>" *html-output*)
(write-sequence (escape (princ-to-string x) "<>&") *html-output*)
(write-sequence "</p>" *html-output*)))
and the reference to x
in the generated code turns into a reference to the lexical variable from the **LET**
surrounding the html
form.
The :embed-code
method, on the other hand, is interesting because it’s so trivial. Because process
passed the form to embed-code
, which stashed it in the :embed-code
op, all you have to do is pull it out and return it.
(defmethod op->code ((op (eql :embed-code)) &rest operands)
(first operands))
This allows code like this to work:
HTML> (html (:ul (dolist (x '(foo bar baz)) (html (:li x)))))
<ul>
<li>FOO</li>
<li>BAR</li>
<li>BAZ</li>
</ul>
NIL
The outer call to html
expands into code that does something like this:
(progn
(write-sequence "<ul>" *html-output*)
(dolist (x '(foo bar baz)) (html (:li x)))
(write-sequence "</ul>" *html-output*))))
Then if you expand the call to html
in the body of the **DOLIST**
, you’ll get something like this:
(progn
(write-sequence "<ul>" *html-output*)
(dolist (x '(foo bar baz))
(progn
(write-sequence "<li>" *html-output*)
(write-sequence (escape (princ-to-string x) "<>&") *html-output*)
(write-sequence "</li>" *html-output*)))
(write-sequence "</ul>" *html-output*))
This code will, in fact, generate the output you saw.