The Compiler

The basic architecture of the compiler consists of three layers. First you’ll implement a class html-compiler that has one slot that holds an adjustable vector that’s used to accumulate ops representing the calls made to the generic functions in the backend interface during the execution of process.

You’ll then implement methods on the generic functions in the backend interface that will store the sequence of actions in the vector. Each op is represented by a list consisting of a keyword naming the operation and the arguments passed to the function that generated the op. The function sexp->ops implements the first phase of the compiler, compiling a list of FOO forms by calling process on each form with an instance of html-compiler.

This vector of ops stored by the compiler is then passed to a function that optimizes it, merging consecutive raw-string ops into a single op that emits the combined string in one go. The optimization function can also, optionally, strip out ops that are needed only for pretty printing, which is mostly important because it allows you to merge more raw-string ops.

Finally, the optimized ops vector is passed to a third function, generate-code, that returns a list of Common Lisp expressions that will actually output the HTML. When *pretty* is true, generate-code generates code that uses the methods specialized on html-pretty-printer to output pretty HTML. When *pretty* is **NIL**, it generates code that writes directly to the stream *html-output*.

The macro html actually generates a body that contains two expansions, one generated with *pretty* bound to **T** and one with *pretty* bound to **NIL**. Which expansion is used is determined by the runtime value of *pretty*. Thus, every function that contains a call to html will contain code to generate both pretty and compact output.

The other significant difference between the compiler and the interpreter is that the compiler can embed Lisp forms in the code it generates. To take advantage of that, you need to modify the process function so it calls the embed-code and embed-value functions when asked to process an expression that’s not a FOO form. Since all self-evaluating objects are valid FOO forms, the only forms that won’t be passed to process-sexp-html are lists that don’t match the syntax for FOO cons forms and non-keyword symbols, the only atoms that aren’t self-evaluating. You can assume that any non-FOO cons is code to be run inline and all symbols are variables whose value you should embed.

  1. (defun process (processor form)
  2. (cond
  3. ((sexp-html-p form) (process-sexp-html processor form))
  4. ((consp form) (embed-code processor form))
  5. (t (embed-value processor form))))

Now let’s look at the compiler code. First you should define two functions that slightly abstract the vector you’ll use to save ops in the first two phases of compilation.

  1. (defun make-op-buffer () (make-array 10 :adjustable t :fill-pointer 0))
  2. (defun push-op (op ops-buffer) (vector-push-extend op ops-buffer))

Next you can define the html-compiler class and the methods specialized on it to implement the backend interface.

  1. (defclass html-compiler ()
  2. ((ops :accessor ops :initform (make-op-buffer))))
  3. (defmethod raw-string ((compiler html-compiler) string &optional newlines-p)
  4. (push-op `(:raw-string ,string ,newlines-p) (ops compiler)))
  5. (defmethod newline ((compiler html-compiler))
  6. (push-op '(:newline) (ops compiler)))
  7. (defmethod freshline ((compiler html-compiler))
  8. (push-op '(:freshline) (ops compiler)))
  9. (defmethod indent ((compiler html-compiler))
  10. (push-op `(:indent) (ops compiler)))
  11. (defmethod unindent ((compiler html-compiler))
  12. (push-op `(:unindent) (ops compiler)))
  13. (defmethod toggle-indenting ((compiler html-compiler))
  14. (push-op `(:toggle-indenting) (ops compiler)))
  15. (defmethod embed-value ((compiler html-compiler) value)
  16. (push-op `(:embed-value ,value ,*escapes*) (ops compiler)))
  17. (defmethod embed-code ((compiler html-compiler) code)
  18. (push-op `(:embed-code ,code) (ops compiler)))

With those methods defined, you can implement the first phase of the compiler, sexp->ops.

  1. (defun sexp->ops (body)
  2. (loop with compiler = (make-instance 'html-compiler)
  3. for form in body do (process compiler form)
  4. finally (return (ops compiler))))

During this phase you don’t need to worry about the value of *pretty*: just record all the functions called by process. Here’s what sexp->ops makes of a simple FOO form:

  1. HTML> (sexp->ops '((:p "Foo")))
  2. #((:FRESHLINE) (:RAW-STRING "<p" NIL) (:RAW-STRING ">" NIL)
  3. (:RAW-STRING "Foo" T) (:RAW-STRING "</p>" NIL) (:FRESHLINE))

The next phase, optimize-static-output, takes a vector of ops and returns a new vector containing the optimized version. The algorithm is simple—for each :raw-string op, it writes the string to a temporary string buffer. Thus, consecutive :raw-string ops will build up a single string containing the concatenation of the strings that need to be emitted. Whenever you encounter an op other than a :raw-string op, you convert the built-up string into a sequence of alternating :raw-string and :newline ops with the helper function compile-buffer and then add the next op. This function is also where you strip out the pretty printing ops if *pretty* is **NIL**.

  1. (defun optimize-static-output (ops)
  2. (let ((new-ops (make-op-buffer)))
  3. (with-output-to-string (buf)
  4. (flet ((add-op (op)
  5. (compile-buffer buf new-ops)
  6. (push-op op new-ops)))
  7. (loop for op across ops do
  8. (ecase (first op)
  9. (:raw-string (write-sequence (second op) buf))
  10. ((:newline :embed-value :embed-code) (add-op op))
  11. ((:indent :unindent :freshline :toggle-indenting)
  12. (when *pretty* (add-op op)))))
  13. (compile-buffer buf new-ops)))
  14. new-ops))
  15. (defun compile-buffer (buf ops)
  16. (loop with str = (get-output-stream-string buf)
  17. for start = 0 then (1+ pos)
  18. for pos = (position #\Newline str :start start)
  19. when (< start (length str))
  20. do (push-op `(:raw-string ,(subseq str start pos) nil) ops)
  21. when pos do (push-op '(:newline) ops)
  22. while pos))

The last step is to translate the ops into the corresponding Common Lisp code. This phase also pays attention to the value of *pretty*. When *pretty* is true, it generates code that invokes the backend generic functions on *html-pretty-printer*, which will be bound to an instance of html-pretty-printer. When *pretty* is **NIL**, it generates code that writes directly to *html-output*, the stream to which the pretty printer would send its output.

The actual function, generate-code, is trivial.

  1. (defun generate-code (ops)
  2. (loop for op across ops collect (apply #'op->code op)))

All the work is done by methods on the generic function op->code specializing the op argument with an **EQL** specializer on the name of the op.

  1. (defgeneric op->code (op &rest operands))
  2. (defmethod op->code ((op (eql :raw-string)) &rest operands)
  3. (destructuring-bind (string check-for-newlines) operands
  4. (if *pretty*
  5. `(raw-string *html-pretty-printer* ,string ,check-for-newlines)
  6. `(write-sequence ,string *html-output*))))
  7. (defmethod op->code ((op (eql :newline)) &rest operands)
  8. (if *pretty*
  9. `(newline *html-pretty-printer*)
  10. `(write-char #\Newline *html-output*)))
  11. (defmethod op->code ((op (eql :freshline)) &rest operands)
  12. (if *pretty*
  13. `(freshline *html-pretty-printer*)
  14. (error "Bad op when not pretty-printing: ~a" op)))
  15. (defmethod op->code ((op (eql :indent)) &rest operands)
  16. (if *pretty*
  17. `(indent *html-pretty-printer*)
  18. (error "Bad op when not pretty-printing: ~a" op)))
  19. (defmethod op->code ((op (eql :unindent)) &rest operands)
  20. (if *pretty*
  21. `(unindent *html-pretty-printer*)
  22. (error "Bad op when not pretty-printing: ~a" op)))
  23. (defmethod op->code ((op (eql :toggle-indenting)) &rest operands)
  24. (if *pretty*
  25. `(toggle-indenting *html-pretty-printer*)
  26. (error "Bad op when not pretty-printing: ~a" op)))

The two most interesting op->code methods are the ones that generate code for the :embed-value and :embed-code ops. In the :embed-value method, you can generate slightly different code depending on the value of the escapes operand since if escapes is **NIL**, you don’t need to generate a call to escape. And when both *pretty* and escapes are **NIL**, you can generate code that uses **PRINC** to emit the value directly to the stream.

  1. (defmethod op->code ((op (eql :embed-value)) &rest operands)
  2. (destructuring-bind (value escapes) operands
  3. (if *pretty*
  4. (if escapes
  5. `(raw-string *html-pretty-printer* (escape (princ-to-string ,value) ,escapes) t)
  6. `(raw-string *html-pretty-printer* (princ-to-string ,value) t))
  7. (if escapes
  8. `(write-sequence (escape (princ-to-string ,value) ,escapes) *html-output*)
  9. `(princ ,value *html-output*)))))

Thus, something like this:

  1. HTML> (let ((x 10)) (html (:p x)))
  2. <p>10</p>
  3. NIL

works because html translates (:p x) into something like this:

  1. (progn
  2. (write-sequence "<p>" *html-output*)
  3. (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
  4. (write-sequence "</p>" *html-output*))

When that code replaces the call to html in the context of the **LET**, you get the following:

  1. (let ((x 10))
  2. (progn
  3. (write-sequence "<p>" *html-output*)
  4. (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
  5. (write-sequence "</p>" *html-output*)))

and the reference to x in the generated code turns into a reference to the lexical variable from the **LET** surrounding the html form.

The :embed-code method, on the other hand, is interesting because it’s so trivial. Because process passed the form to embed-code, which stashed it in the :embed-code op, all you have to do is pull it out and return it.

  1. (defmethod op->code ((op (eql :embed-code)) &rest operands)
  2. (first operands))

This allows code like this to work:

  1. HTML> (html (:ul (dolist (x '(foo bar baz)) (html (:li x)))))
  2. <ul>
  3. <li>FOO</li>
  4. <li>BAR</li>
  5. <li>BAZ</li>
  6. </ul>
  7. NIL

The outer call to html expands into code that does something like this:

  1. (progn
  2. (write-sequence "<ul>" *html-output*)
  3. (dolist (x '(foo bar baz)) (html (:li x)))
  4. (write-sequence "</ul>" *html-output*))))

Then if you expand the call to html in the body of the **DOLIST**, you’ll get something like this:

  1. (progn
  2. (write-sequence "<ul>" *html-output*)
  3. (dolist (x '(foo bar baz))
  4. (progn
  5. (write-sequence "<li>" *html-output*)
  6. (write-sequence (escape (princ-to-string x) "<>&") *html-output*)
  7. (write-sequence "</li>" *html-output*)))
  8. (write-sequence "</ul>" *html-output*))

This code will, in fact, generate the output you saw.