4. Syntax and Semantics - Breaking Open the Black Box - 《Practical Common Lisp》

Breaking Open the Black Box

Breaking Open the Black Box

Before we look at the specifics of Lisp’s syntax and semantics, it’s worth taking a moment to look at how they’re defined and how this differs from many other languages.

In most programming languages, the language processor—whether an interpreter or a compiler—operates as a black box: you shove a sequence of characters representing the text of a program into the black box, and it—depending on whether it’s an interpreter or a compiler—either executes the behaviors indicated or produces a compiled version of the program that will execute the behaviors when it’s run.

Inside the black box, of course, language processors are usually divided into subsystems that are each responsible for one part of the task of translating a program text into behavior or object code. A typical division is to split the processor into three phases, each of which feeds into the next: a lexical analyzer breaks up the stream of characters into tokens and feeds them to a parser that builds a tree representing the expressions in the program, according to the language’s grammar. This tree—called an abstract syntax tree--is then fed to an evaluator that either interprets it directly or compiles it into some other language such as machine code. Because the language processor is a black box, the data structures used by the processor, such as the tokens and abstract syntax trees, are of interest only to the language implementer.

In Common Lisp things are sliced up a bit differently, with consequences for both the implementer and for how the language is defined. Instead of a single black box that goes from text to program behavior in one step, Common Lisp defines two black boxes, one that translates text into Lisp objects and another that implements the semantics of the language in terms of those objects. The first box is called the reader, and the second is called the evaluator.2

Each black box defines one level of syntax. The reader defines how strings of characters can be translated into Lisp objects called s-expressions.3 Since the s-expression syntax includes syntax for lists of arbitrary objects, including other lists, s-expressions can represent arbitrary tree expressions, much like the abstract syntax tree generated by the parsers for non-Lisp languages.

The evaluator then defines a syntax of Lisp forms that can be built out of s-expressions. Not all s-expressions are legal Lisp forms any more than all sequences of characters are legal s-expressions. For instance, both (foo 1 2) and ("foo" 1 2) are s-expressions, but only the former can be a Lisp form since a list that starts with a string has no meaning as a Lisp form.

This split of the black box has a couple of consequences. One is that you can use s-expressions, as you saw in Chapter 3, as an externalizable data format for data other than source code, using **READ** to read it and **PRINT** to print it.4 The other consequence is that since the semantics of the language are defined in terms of trees of objects rather than strings of characters, it’s easier to generate code within the language than it would be if you had to generate code as text. Generating code completely from scratch is only marginally easier—building up lists vs. building up strings is about the same amount of work. The real win, however, is that you can generate code by manipulating existing data. This is the basis for Lisp’s macros, which I’ll discuss in much more detail in future chapters. For now I’ll focus on the two levels of syntax defined by Common Lisp: the syntax of s-expressions understood by the reader and the syntax of Lisp forms understood by the evaluator.