S-expressions

The basic elements of s-expressions are lists and atoms. Lists are delimited by parentheses and can contain any number of whitespace-separated elements. Atoms are everything else.5 The elements of lists are themselves s-expressions (in other words, atoms or nested lists). Comments—which aren’t, technically speaking, s-expressions—start with a semicolon, extend to the end of a line, and are treated essentially like whitespace.

And that’s pretty much it. Since lists are syntactically so trivial, the only remaining syntactic rules you need to know are those governing the form of different kinds of atoms. In this section I’ll describe the rules for the most commonly used kinds of atoms: numbers, strings, and names. After that, I’ll cover how s-expressions composed of these elements can be evaluated as Lisp forms.

Numbers are fairly straightforward: any sequence of digits—possibly prefaced with a sign (+ or -), containing a decimal point (.) or a solidus (/), or ending with an exponent marker—is read as a number. For example:

  123      ; the integer one hundred twenty-three
  3/7      ; the ratio three-sevenths
  1.0      ; the floating-point number one in default precision
  1.0e0    ; another way to write the same floating-point number
  1.0d0    ; the floating-point number one in "double" precision
  1.0e-4   ; the floating-point equivalent to one-ten-thousandth
  +42      ; the integer forty-two
  -42      ; the integer negative forty-two
  -1/4     ; the ratio negative one-quarter
  -2/8     ; another way to write negative one-quarter
  246/2    ; another way to write the integer one hundred twenty-three

These different forms represent different kinds of numbers: integers, ratios, and floating point. Lisp also supports complex numbers, which have their own notation and which I’ll discuss in Chapter 10.

As some of these examples suggest, you can notate the same number in many ways. But regardless of how you write them, all rationals—integers and ratios—are represented internally in “simplified” form. In other words, the objects that represent -2/8 or 246/2 aren’t distinct from the objects that represent -1/4 and 123. Similarly, 1.0 and 1.0e0 are just different ways of writing the same number. On the other hand, 1.0, 1.0d0, and 1 can all denote different objects because the different floating-point representations and integers are different types. We’ll save the details about the characteristics of different kinds of numbers for Chapter 10.
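If you want to check these claims at the REPL, the following sketch compares some of the notations with EQL, which is true only when two numbers have the same type and the same value. It assumes the reader's default settings, in particular that unmarked floating-point literals such as 1.0 are read as single floats.

```lisp
;; Rationals are reduced to their simplest form when they are read.
(eql -2/8 -1/4)    ; => T
(eql 246/2 123)    ; => T
(integerp 246/2)   ; => T, since the ratio reduces to an integer

;; Two spellings of the same default-precision float.
(eql 1.0 1.0e0)    ; => T

;; Different types give different objects, even for equal values.
(eql 1.0 1.0d0)    ; => NIL (single versus double float by default)
(eql 1 1.0)        ; => NIL (integer versus float)

;; The numeric comparison = looks only at mathematical value.
(= 1 1.0 1.0d0)    ; => T
```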

String literals, as you saw in the previous chapter, are enclosed in double quotes. Within a string a backslash (\) escapes the next character, causing it to be included in the string regardless of what it is. The only two characters that must be escaped within a string are double quotes and the backslash itself. All other characters can be included in a string literal without escaping, regardless of their meaning outside a string. Some example string literals are as follows:

  "foo"     ; the string containing the characters f, o, and o.
  "fo\o"    ; the same string
  "fo\\o"   ; the string containing the characters f, o, \, and o.
  "fo\"o"   ; the string containing the characters f, o, ", and o.
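One way to convince yourself of what each literal contains is to ask Lisp about its length and individual characters. This is only a quick sketch using the standard functions STRING=, LENGTH, and CHAR.

```lisp
;; An unnecessary backslash is simply dropped.
(string= "foo" "fo\o")   ; => T
(length "fo\o")          ; => 3

;; An escaped backslash or double quote counts as one character.
(length "fo\\o")         ; => 4
(char "fo\\o" 2)         ; => #\\
(char "fo\"o" 2)         ; => #\"
```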

Names used in Lisp programs, such as FORMAT, hello-world, and *db*, are represented by objects called symbols. The reader knows nothing about how a given name is going to be used, whether as the name of a variable, a function, or something else. It just reads a sequence of characters and builds an object to represent the name.6 Almost any character can appear in a name. Whitespace characters can’t, though, because the elements of lists are separated by whitespace. Digits can appear in names as long as the name as a whole can’t be interpreted as a number. Similarly, names can contain periods, but the reader can’t read a name that consists only of periods. Ten characters that serve other syntactic purposes can’t appear in names: open and close parentheses, double and single quotes, backtick, comma, colon, semicolon, backslash, and vertical bar. Even those characters can appear if you’re willing to escape them, either by preceding each one with a backslash or by surrounding the part of the name that contains them with vertical bars.
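To see how the reader classifies a piece of text, you can hand it a string with READ-FROM-STRING and test the resulting object. The sketch below assumes the reader's default settings (base 10 and the standard syntax); the sample names are made up for illustration. Note that inside a Lisp string literal each backslash has to be doubled, so "hello\\ world" is the text hello\ world.

```lisp
;; Digits are fine in a name unless the whole token reads as a number.
(integerp (read-from-string "123"))      ; => T
(symbolp  (read-from-string "123abc"))   ; => T
(symbolp  (read-from-string "1+"))       ; => T

;; A backslash escapes a single character, here a space.
(symbol-name (read-from-string "hello\\ world"))   ; => "HELLO WORLD"

;; Vertical bars escape everything between them, including parentheses.
(symbol-name (read-from-string "|a(b)|"))          ; => "a(b)"
```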

Two important characteristics of the way the reader translates names to symbol objects have to do with how it treats the case of letters in names and how it ensures that the same name is always read as the same symbol. While reading names, the reader converts all unescaped characters in a name to their uppercase equivalents. Thus, the reader will read foo, Foo, and FOO as the same symbol: FOO. However, \f\o\o and |foo| will both be read as foo, which is a different object than the symbol FOO. This is why when you define a function at the REPL and it prints the name of the function, it’s been converted to uppercase. Standard style, these days, is to write code in all lowercase and let the reader change names to uppercase.7
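The case rules can be checked directly. In this sketch the leading single quote simply stops Lisp from trying to evaluate the symbol, and the results assume the standard reader behavior described above, where unescaped characters are converted to uppercase.

```lisp
;; Unescaped names differing only in case are the same symbol.
(eq 'foo 'FOO)         ; => T
(symbol-name 'Foo)     ; => "FOO"

;; Escaped characters keep their case.
(symbol-name '|foo|)   ; => "foo"
(eq '\f\o\o '|foo|)    ; => T
(eq '|foo| 'foo)       ; => NIL
```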

To ensure that the same textual name is always read as the same symbol, the reader interns symbols—after it has read the name and converted it to all uppercase, the reader looks in a table called a package for an existing symbol with the same name. If it can’t find one, it creates a new symbol and adds it to the table. Otherwise, it returns the symbol already in the table. Thus, anywhere the same name appears in any s-expression, the same object will be used to represent it.8
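Here is a small sketch of interning in action. The variable names and the name some-demo-name are made up for the example. FIND-SYMBOL looks a name up in the current package without creating anything, while MAKE-SYMBOL creates a fresh symbol that is never added to a package, so it is a different object even though it has the same name.

```lisp
;; Read the same name twice, spelled with different case.
(defparameter *read-once*  (read-from-string "some-demo-name"))
(defparameter *read-again* (read-from-string "SOME-DEMO-NAME"))

;; Both reads return the one symbol stored in the package.
(eq *read-once* *read-again*)                     ; => T
(eq *read-once* (find-symbol "SOME-DEMO-NAME"))   ; => T

;; A symbol that was never interned is a distinct object.
(eq *read-once* (make-symbol "SOME-DEMO-NAME"))   ; => NIL
```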

Because names can contain many more characters in Lisp than they can in Algol-derived languages, certain naming conventions are distinct to Lisp, such as the use of hyphenated names like hello-world. Another important convention is that global variables are given names that start and end with *. Similarly, constants are given names starting and ending in +. And some programmers will name particularly low-level functions with names that start with % or even %%. The names defined in the language standard use only the alphabetic characters (A-Z) plus *, +, -, /, 1, 2, <, =, >, and &.

The syntax for lists, numbers, strings, and symbols can describe a good percentage of Lisp programs. Other rules describe notations for literal vectors, individual characters, and arrays, which I’ll cover when I talk about the associated data types in Chapters 10 and 11. For now the key thing to understand is how you can combine numbers, strings, and symbols with parentheses-delimited lists to build s-expressions representing arbitrary trees of objects. Some simple examples look like this:

  x               ; the symbol X
  ()              ; the empty list
  (1 2 3)         ; a list of three numbers
  ("foo" "bar")   ; a list of two strings
  (x y z)         ; a list of three symbols
  (x 1 "foo")     ; a list of a symbol, a number, and a string
  (+ (* 2 3) 4)   ; a list of a symbol, a list, and a number.
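To look at the tree of objects the reader builds, you can read one of these s-expressions from a string and take it apart. This is a sketch only: *sample* is a name invented for the example, and the comments show the primary value returned (READ-FROM-STRING also returns a second value, the position where it stopped reading).

```lisp
;; Read the text without evaluating it.
(defparameter *sample* (read-from-string "(+ (* 2 3) 4)"))

(length *sample*)            ; => 3
(symbolp (first *sample*))   ; => T, the symbol +
(listp (second *sample*))    ; => T, a nested list
(second *sample*)            ; => (* 2 3)
(third *sample*)             ; => 4

;; The empty list reads as the object NIL.
(read-from-string "()")      ; => NIL
```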

An only slightly more complex example is the following four-item list that contains two symbols, the empty list, and another list, itself containing two symbols and a string:

  (defun hello-world ()
    (format t "hello, world"))
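You can confirm that description by reading the text as data and inspecting each of the four items. In this sketch *hello-form* is a made-up variable name, and nothing is evaluated, so no function is actually defined.

```lisp
;; Read the definition as a plain list of objects.
(defparameter *hello-form*
  (read-from-string "(defun hello-world () (format t \"hello, world\"))"))

(length *hello-form*)   ; => 4
(first *hello-form*)    ; => DEFUN
(second *hello-form*)   ; => HELLO-WORLD
(third *hello-form*)    ; => NIL, the empty list
(fourth *hello-form*)   ; => (FORMAT T "hello, world")
```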