Character Escaping
The first bit of the foundation you’ll need to lay is the code that knows how to escape characters with a special meaning in HTML. There are three such characters, and they must not appear literally in the text of an element or in an attribute value; they are <, >, and &. In element text or attribute values, these characters must be replaced with the character reference entities &lt;, &gt;, and &amp;. Similarly, in attribute values, the quotation marks used to delimit the value must be escaped, ' with &apos; and " with &quot;. Additionally, any character can be represented by a numeric character reference entity consisting of an ampersand, followed by a sharp sign, followed by the numeric code as a base 10 integer, and followed by a semicolon (for example, &#65; for the letter A). These numeric escapes are sometimes used to embed non-ASCII characters in HTML.
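The numeric form described above can be sketched in a couple of lines. The helper name numeric-entity is hypothetical and not part of FOO; the sketch also assumes an implementation whose CHAR-CODE values agree with ASCII/Unicode code points, which is true of the common implementations but isn't guaranteed by the language standard.

```lisp
;; Hypothetical helper, not part of FOO: build a decimal numeric
;; character reference (ampersand, sharp sign, code, semicolon).
(defun numeric-entity (char)
  ;; ~d always prints the integer in base 10.
  (format nil "&#~d;" (char-code char)))

;; Assuming ASCII/Unicode-compatible character codes:
(numeric-entity #\A)              ; "&#65;"
(numeric-entity (code-char 233))  ; "&#233;", e with an acute accent
```

Using CODE-CHAR for the non-ASCII character keeps the source file itself pure ASCII, so the example doesn't depend on the file's encoding.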
The Package
Since FOO is a low-level library, the package you develop it in doesn’t rely on much external code: just the usual dependency on names from the COMMON-LISP package and, almost as usual, on the names of the macro-writing macros from COM.GIGAMONKEYS.MACRO-UTILITIES. On the other hand, the package needs to export all the names needed by code that uses FOO. Here’s the **DEFPACKAGE** from the source that you can download from the book’s Web site:
(defpackage :com.gigamonkeys.html
  (:use :common-lisp :com.gigamonkeys.macro-utilities)
  (:export :with-html-output
           :in-html-style
           :define-html-macro
           :html
           :emit-html
           :&attributes))
The following function accepts a single character and returns a string containing a character reference entity for that character:
(defun escape-char (char)
  (case char
    (#\& "&amp;")
    (#\< "&lt;")
    (#\> "&gt;")
    (#\' "&apos;")
    (#\" "&quot;")
    (t (format nil "&#~d;" (char-code char)))))
You can use this function as the basis for a function, escape, that takes a string and a sequence of characters and returns a copy of the first argument with all occurrences of the characters in the second argument replaced with the corresponding character entity returned by escape-char.
(defun escape (in to-escape)
  (flet ((needs-escape-p (char) (find char to-escape)))
    (with-output-to-string (out)
      (loop for start = 0 then (1+ pos)
            for pos = (position-if #'needs-escape-p in :start start)
            do (write-sequence in out :start start :end pos)
            when pos do (write-sequence (escape-char (char in pos)) out)
            while pos))))
You can also define two parameters: *element-escapes*, which contains the characters you need to escape in normal element data, and *attribute-escapes*, which contains the set of characters to be escaped in attribute values.
(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")
Here are some examples:
HTML> (escape "foo & bar" *element-escapes*)
"foo &amp; bar"
HTML> (escape "foo & 'bar'" *element-escapes*)
"foo &amp; 'bar'"
HTML> (escape "foo & 'bar'" *attribute-escapes*)
"foo &amp; &apos;bar&apos;"
Finally, you’ll need a variable, *escapes*, that will be bound to the set of characters that need to be escaped. It’s initially set to the value of *element-escapes*, but when generating attributes, it will, as you’ll see, be rebound to the value of *attribute-escapes*.
(defvar *escapes* *element-escapes*)
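To make the rebinding concrete, here is a minimal sketch of how a dynamic binding of *escapes* changes what gets escaped. The definitions from this section are repeated so the snippet loads on its own; the helper escape-current is hypothetical, introduced only for illustration, and isn't necessarily how FOO itself consults *escapes*.

```lisp
;; Definitions repeated from this section so the snippet is self-contained.
(defun escape-char (char)
  (case char
    (#\& "&amp;")
    (#\< "&lt;")
    (#\> "&gt;")
    (#\' "&apos;")
    (#\" "&quot;")
    (t (format nil "&#~d;" (char-code char)))))

(defun escape (in to-escape)
  (flet ((needs-escape-p (char) (find char to-escape)))
    (with-output-to-string (out)
      (loop for start = 0 then (1+ pos)
            for pos = (position-if #'needs-escape-p in :start start)
            do (write-sequence in out :start start :end pos)
            when pos do (write-sequence (escape-char (char in pos)) out)
            while pos))))

(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")
(defvar *escapes* *element-escapes*)

;; Hypothetical helper: escape with whatever *escapes* is currently bound to.
(defun escape-current (string)
  (escape string *escapes*))

;; Outside any rebinding, only <, >, and & are escaped.
(escape-current "a < 'b'")       ; "a &lt; 'b'"

;; Inside the LET, the quotation marks are escaped as well.
(let ((*escapes* *attribute-escapes*))
  (escape-current "a < 'b'"))    ; "a &lt; &apos;b&apos;"

;; Once the LET exits, the original binding is back in effect.
(escape-current "a < 'b'")       ; "a &lt; 'b'"
```

Because *escapes* is defined with DEFVAR, it's a special variable, so the LET binding is visible to every function called within its dynamic extent and is undone automatically on exit.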