Make It Work, Make It Right, Make It Fast
As has been said many times, and variously attributed to Donald Knuth, C.A.R. Hoare, and Edsger Dijkstra, premature optimization is the root of all evil.4 Common Lisp is an excellent language to program in if you want to heed this wisdom yet still need high performance. This may come as a surprise if you’ve heard the conventional wisdom that Lisp is slow. In Lisp’s earliest days, when computers were programmed with punch cards, Lisp’s high-level features may have doomed it to be slower than the competition, namely, assembly and FORTRAN. But that was a long time ago. In the meantime, Lisp has been used for everything from creating complex AI systems to writing operating systems, and a lot of work has gone into figuring out how to compile Lisp into efficient code. In this section I’ll talk about some of the reasons why Common Lisp is an excellent language for writing high-performance code and some of the techniques for doing so.
The first reason that Lisp is an excellent language for writing high-performance code is, ironically enough, the dynamic nature of Lisp programming—the very thing that originally made it hard to bring Lisp’s performance up to the levels achieved by FORTRAN compilers. The reason Common Lisp’s dynamic features make it easier to write high-performance code is that the first step to writing efficient code is to find the right algorithms and data structures.
Common Lisp’s dynamic features keep code flexible, which makes it easier to try different approaches. Given a finite amount of time to write a program, you’re much more likely to end up with a high-performance version if you don’t spend a lot of time getting into and out of dead ends. In Common Lisp, you can try an idea, see it’s going nowhere, and move on without having spent a ton of time convincing the compiler your code is worthy of being run and then waiting for it to finish compiling. You can write a straightforward but inefficient version of a function—a code sketch—to determine whether your basic approach is sound and then replace that function with a more complex but more efficient implementation if you determine that it is. And if the overall approach turns out to be flawed, then you haven’t wasted a bunch of time tuning a function that’s no longer needed, which means you have more time to find a better approach.
The next reason Common Lisp is a good language for developing high-performance software is that most Common Lisp implementations come with mature compilers that generate quite efficient machine code. I’ll talk in a moment about how to help these compilers generate code that will be competitive with code generated by C compilers, but these implementations already are quite a bit faster than those of languages whose implementations are less mature and use simpler compilers or interpreters. Also, since the Lisp compiler is available at runtime, the Lisp programmer has some possibilities that would be hard to emulate in other languages—your programs can generate Lisp code at runtime that’s then compiled into machine code and run. If the generated code is going to run enough times, this can be a big win. Or, even without using the compiler at runtime, closures give you another way to meld machine code with runtime data. For instance, the CL-PPCRE regular expression library, running in CMUCL, is faster than Perl’s regular expression engine on some benchmarks, even though Perl’s engine is written in highly tuned C. This is presumably because in Perl a regular expression is translated into what are essentially bytecodes that are then interpreted by the regex engine, while CL-PPCRE translates a regular expression into a tree of compiled closures that invoke each other via the normal function-calling machinery.5
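To make the closure-chaining idea concrete, here’s a minimal sketch of the technique—not CL-PPCRE’s actual code—that compiles a pattern, in this case just a list of literal characters, into a chain of closures that call one another through the normal function-calling machinery:

```lisp
;; A hedged sketch of the closure-chaining technique, not CL-PPCRE's
;; actual implementation. Each character in the pattern becomes a
;; closure that checks one position and then calls the next closure.
(defun compile-literal-pattern (chars)
  (let ((matcher (lambda (string pos)
                   (declare (ignore string))
                   pos)))           ; end of chain: report the match position
    (dolist (char (reverse chars) matcher)
      (let ((next matcher)
            (char char))            ; fresh binding so each closure captures its own char
        (setf matcher
              (lambda (string pos)
                (when (and (< pos (length string))
                           (char= (char string pos) char))
                  (funcall next string (1+ pos)))))))))

;; Usage: returns the position after the match, or NIL on failure.
(funcall (compile-literal-pattern '(#\f #\o #\o)) "foobar" 0) ; => 3
```

Once built, the matcher is just compiled code calling compiled code; there’s no interpretive dispatch per pattern element, which is the heart of the speed advantage described above.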
However, even with the right algorithm and a high-quality compiler, you may not get the raw speed you need. Then it’s time to think about profiling and tuning. The key, in Lisp as in any language, is to profile first to find the spots where your program is actually spending its time and then worry about speeding up those parts.6
You have a number of different ways to approach profiling. The language standard provides a few rudimentary tools for measuring how long certain forms take to execute. In particular, the **TIME** macro can be wrapped around any form and will return whatever values the form returns after printing a message to ***TRACE-OUTPUT*** about how long it took to run and how much memory it used. The exact form of the message is implementation defined.

You can use **TIME** for a bit of quick-and-dirty profiling to narrow your search for bottlenecks. For instance, suppose you have a function that’s taking a long time to run and that calls two other functions—something like this:
(defun foo ()
  (bar)
  (baz))
If you want to see whether bar or baz is taking more time, you can change the definition of foo to this:
(defun foo ()
  (time (bar))
  (time (baz)))
Now you can call foo, and Lisp will print two reports, one for bar and one for baz. The form of the output is implementation dependent; here’s what it looks like in Allegro Common Lisp:
CL-USER> (foo)
; cpu time (non-gc) 60 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 60 msec user, 0 msec system
; real time 105 msec
; space allocation:
; 24,172 cons cells, 1,696 other bytes, 0 static bytes
; cpu time (non-gc) 540 msec user, 10 msec system
; cpu time (gc) 170 msec user, 0 msec system
; cpu time (total) 710 msec user, 10 msec system
; real time 1,046 msec
; space allocation:
; 270,172 cons cells, 1,696 other bytes, 0 static bytes
Of course, that’d be a bit easier to read if the output included a label. If you use this technique a lot, it might be worth defining your own macro like this:
(defmacro labeled-time (form)
  `(progn
     (format *trace-output* "~2&~a" ',form)
     (time ,form)))
If you replace **TIME** with labeled-time in foo, you’ll get this output:
CL-USER> (foo)
(BAR)
; cpu time (non-gc) 60 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 60 msec user, 0 msec system
; real time 131 msec
; space allocation:
; 24,172 cons cells, 1,696 other bytes, 0 static bytes
(BAZ)
; cpu time (non-gc) 490 msec user, 0 msec system
; cpu time (gc) 190 msec user, 10 msec system
; cpu time (total) 680 msec user, 10 msec system
; real time 1,088 msec
; space allocation:
; 270,172 cons cells, 1,696 other bytes, 0 static bytes
From this output, it’s clear that most of the time in foo is spent in baz.
Of course, the output from **TIME** gets a bit unwieldy if the form you want to profile is called repeatedly. You can build your own measurement tools using the functions **GET-INTERNAL-REAL-TIME** and **GET-INTERNAL-RUN-TIME**, which return a number that increases by the value of the constant **INTERNAL-TIME-UNITS-PER-SECOND** each second. **GET-INTERNAL-REAL-TIME** measures wall time, the actual amount of time elapsed, while **GET-INTERNAL-RUN-TIME** measures some implementation-defined value such as the amount of time Lisp was actually executing or the time Lisp was executing user code and not internal bookkeeping such as the garbage collector. Here’s a trivial but useful profiling tool built with a few macros and **GET-INTERNAL-RUN-TIME**:
(defparameter *timing-data* ())

(defmacro with-timing (label &body body)
  (with-gensyms (start)
    `(let ((,start (get-internal-run-time)))
       (unwind-protect (progn ,@body)
         (push (list ',label ,start (get-internal-run-time)) *timing-data*)))))

(defun clear-timing-data ()
  (setf *timing-data* ()))

(defun show-timing-data ()
  (loop for (label time count time-per %-of-total) in (compile-timing-data) do
       (format t "~3d% ~a: ~d ticks over ~d calls for ~d per.~%"
               %-of-total label time count time-per)))

(defun compile-timing-data ()
  (loop with timing-table = (make-hash-table)
        with count-table = (make-hash-table)
        for (label start end) in *timing-data*
        for time = (- end start)
        summing time into total
        do
        (incf (gethash label timing-table 0) time)
        (incf (gethash label count-table 0))
        finally
        (return
          (sort
           (loop for label being the hash-keys in timing-table collect
                (let ((time (gethash label timing-table))
                      (count (gethash label count-table)))
                  (list label time count (round (/ time count)) (round (* 100 (/ time total))))))
           #'> :key #'fifth))))
This profiler lets you wrap a with-timing around any form; each time the form is executed, the time it starts and the time it ends are recorded and associated with a label you provide. The function show-timing-data dumps out a table showing how much time was spent in different labeled sections of code like this:
CL-USER> (show-timing-data)
84% BAR: 650 ticks over 2 calls for 325 per.
16% FOO: 120 ticks over 5 calls for 24 per.
NIL
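For instance, output along the lines of the table above could be produced by usage like the following; the bodies and timings here are purely illustrative:

```lisp
;; Illustrative usage of the with-timing profiler defined above.
;; The labels FOO and BAR and the loop bodies are made up for this sketch.
(defun foo ()
  (with-timing foo
    (loop repeat 1000 sum 1)))

(defun bar ()
  (with-timing bar
    (loop repeat 1000000 sum 1)))

(clear-timing-data)
(dotimes (i 5) (foo))
(dotimes (i 2) (bar))
(show-timing-data)
```

Because with-timing pushes an entry each time its body runs, calling foo five times and bar twice yields per-label totals, call counts, and percentages like those shown.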
You could obviously make this profiling code more sophisticated in many ways. Alternatively, your Lisp implementation most likely provides its own profiling tools, which, since they have access to the internals of the implementation, can get at information not necessarily available to user-level code.
Once you’ve found the bottleneck in your code, you can start tuning. The first thing you should try, of course, is to find a more efficient basic algorithm—that’s where the big gains are to be had. But assuming you’re already using an appropriate algorithm, then it’s down to code bumming—locally optimizing the code so it does absolutely no more work than necessary.
The main tools for code bumming in Common Lisp are its optional declarations. The basic idea behind declarations in Common Lisp is that they’re used to give the compiler information it can use in a variety of ways to generate better code.
For a simple example, consider this Common Lisp function:
(defun add (x y) (+ x y))
I mentioned in Chapter 10 that if you compare the performance of this Lisp function to the seemingly equivalent C function:
int add (int x, int y) { return x + y; }
you’ll likely find the Common Lisp version to be quite a bit slower, even if your Common Lisp implementation features a high-quality native compiler.
That’s because the Common Lisp version is doing a lot more—the Common Lisp compiler doesn’t even know that the values of x and y are numbers and so has to generate code to check at runtime. And once it determines they are numbers, it has to determine what types of numbers—integers, rationals, floating point, or complex—and dispatch to the appropriate addition routine for the actual types. And even if x and y are integers—the case you care about—the addition routine has to account for the possibility that the result may be too large to represent as a fixnum, a number that can be represented in a single machine word, and thus it may have to allocate a bignum object.
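You can see this generic behavior at the REPL. With no declarations, Common Lisp arithmetic silently promotes a result that won’t fit in a fixnum; the exact value of MOST-POSITIVE-FIXNUM is implementation dependent, but the promotion itself is guaranteed by the standard:

```lisp
;; Fixnum arithmetic transparently overflows into bignums rather than
;; wrapping around. The largest fixnum varies by implementation, but
;; adding 1 to it always produces a non-fixnum integer.
(typep most-positive-fixnum 'fixnum)        ; => T
(typep (+ most-positive-fixnum 1) 'fixnum)  ; => NIL -- the result is a bignum
```

This is exactly the safety net the undeclared add pays for on every call.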
In C, on the other hand, because the types of all variables are declared, the compiler knows exactly what kind of values x and y will hold. And because C’s arithmetic simply overflows when the result of an addition is too large to represent in whatever type is being returned, there’s no checking for overflow and no allocation of a bignum object to represent the result when the mathematical sum is too large to fit in a machine word.
Thus, while the behavior of the Common Lisp code is much more likely to be mathematically correct, the C version can probably be compiled down to one or two machine instructions. But if you’re willing to give the Common Lisp compiler the same information the C compiler has about the types of arguments and return values and to accept certain C-like compromises in terms of generality and error checking, the Common Lisp function can also be compiled down to an instruction or two.
That’s what declarations are for. The main use of declarations is to tell the compiler about the types of variables and other expressions. For instance, you could tell the compiler that the arguments to add are both fixnums by writing the function like this:
(defun add (x y)
  (declare (fixnum x y))
  (+ x y))
The **DECLARE** expression isn’t a Lisp form; rather, it’s part of the syntax of **DEFUN** and must appear before any other code in the function body.7 This declaration says that the arguments passed for the parameters x and y will always be fixnums. In other words, it’s a promise to the compiler, and the compiler is allowed to generate code on the assumption that whatever you tell it is true.
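Incidentally, using a type name such as fixnum at the head of a declaration is standard shorthand for an explicit type declaration; these two fragments (shown out of context, so not runnable on their own) mean the same thing:

```lisp
;; Equivalent declarations: a symbol naming a type may abbreviate
;; a full TYPE declaration.
(declare (fixnum x y))
(declare (type fixnum x y))
```

The longer form is needed for compound type specifiers such as (unsigned-byte 8).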
To declare the type of the value returned, you can wrap the form (+ x y) in the **THE** special operator. This operator takes a type specifier, such as **FIXNUM**, and a form and tells the compiler the form will evaluate to the given type. Thus, to give the Common Lisp compiler all the information about add that the C compiler gets, you can write it like this:
(defun add (x y)
  (declare (fixnum x y))
  (the fixnum (+ x y)))
However, even this version needs one more declaration to give the Common Lisp compiler the same license as the C compiler to generate fast but dangerous code. The **OPTIMIZE** declaration is used to tell the compiler how to balance five qualities: the speed of the code generated; the amount of runtime error checking; the memory usage of the code, both in terms of code size and runtime memory usage; the amount of debugging information kept with the code; and the speed of the compilation process. An **OPTIMIZE** declaration consists of one or more lists, each containing one of the symbols **SPEED**, **SAFETY**, **SPACE**, **DEBUG**, and **COMPILATION-SPEED** and a number from zero to three, inclusive. The number specifies the relative weighting the compiler should give to the corresponding quality, with 3 being the most important and 0 meaning not important at all. Thus, to make Common Lisp compile add more or less like a C compiler would, you can write it like this:
(defun add (x y)
  (declare (optimize (speed 3) (safety 0)))
  (declare (fixnum x y))
  (the fixnum (+ x y)))
Of course, now the Lisp version suffers from many of the same liabilities as the C version—if the arguments passed aren’t fixnums or if the addition overflows, the result will be mathematically incorrect or worse. Also, if someone calls add with the wrong number of arguments, it may not be pretty. Thus, you should use these kinds of declarations only after your program is working correctly, and you should add them only where profiling shows they’ll make a difference. If you’re getting reasonable performance without them, leave them out. But when profiling shows you a real hot spot in your code and you need to tune it up, go ahead. Because you can use declarations this way, it’s rarely necessary to rewrite code in C just for performance reasons; FFIs are used to access existing C code, but declarations are used when C-like performance is needed. Of course, how close you can get the performance of a given piece of Common Lisp code to that of C or C++ depends mostly on how much like C you’re willing to make it.
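One common way to get the speed without exposing every caller to the danger—a general pattern, not something the standard or this chapter mandates—is to keep a safe entry point that validates its arguments and then calls the optimized worker. The names safe-add and fast-add below are made up for this sketch:

```lisp
;; A hedged sketch of a common pattern: a fast, unsafe worker function
;; behind a safe, checking entry point. FAST-ADD and SAFE-ADD are
;; hypothetical names for this example.
(defun fast-add (x y)
  (declare (optimize (speed 3) (safety 0)))
  (declare (fixnum x y))
  (the fixnum (+ x y)))

(defun safe-add (x y)
  ;; CHECK-TYPE signals a correctable error unless the argument is of
  ;; the given type, so FAST-ADD's assumptions hold when it's called.
  (check-type x fixnum)
  (check-type y fixnum)
  (fast-add x y))
```

Hot inner loops that have already validated their data can call fast-add directly, while everything else goes through safe-add.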
Another code-tuning tool built into Lisp is the function **DISASSEMBLE**. The exact behavior of this function is implementation dependent because it depends on how the implementation compiles code—whether to machine code, bytecodes, or some other form. But the basic idea is that it shows you the code generated by the compiler when it compiled a specific function.
Thus, you can use **DISASSEMBLE** to see whether your declarations are having any effect on the code generated. And if your Lisp implementation uses a native compiler and you know your platform’s assembly language, you can get a pretty good sense of what’s actually going on when you call one of your functions. For instance, you could use **DISASSEMBLE** to get a sense of the difference between the first version of add, with no declarations, and the final version. First, define and compile the original version.
(defun add (x y) (+ x y))
Then, at the REPL, call **DISASSEMBLE** with the name of the function. In Allegro, it shows the following assembly-language-like dump of the code generated by the compiler:
CL-USER> (disassemble 'add)
;; disassembly of #<Function ADD>
;; formals: X Y
;; code start: #x737496f4:
0: 55 pushl ebp
1: 8b ec movl ebp,esp
3: 56 pushl esi
4: 83 ec 24 subl esp,$36
7: 83 f9 02 cmpl ecx,$2
10: 74 02 jz 14
12: cd 61 int $97 ; SYS::TRAP-ARGERR
14: 80 7f cb 00 cmpb [edi-53],$0 ; SYS::C_INTERRUPT-PENDING
18: 74 02 jz 22
20: cd 64 int $100 ; SYS::TRAP-SIGNAL-HIT
22: 8b d8 movl ebx,eax
24: 0b da orl ebx,edx
26: f6 c3 03 testb bl,$3
29: 75 0e jnz 45
31: 8b d8 movl ebx,eax
33: 03 da addl ebx,edx
35: 70 08 jo 45
37: 8b c3 movl eax,ebx
39: f8 clc
40: c9 leave
41: 8b 75 fc movl esi,[ebp-4]
44: c3 ret
45: 8b 5f 8f movl ebx,[edi-113] ; EXCL::+_2OP
48: ff 57 27 call *[edi+39] ; SYS::TRAMP-TWO
51: eb f3 jmp 40
53: 90 nop
; No value
Clearly, there’s a bunch of stuff going on here. If you’re familiar with x86 assembly language, you can probably tell what it’s doing. Now compile this version of add with all the declarations.
(defun add (x y)
  (declare (optimize (speed 3) (safety 0)))
  (declare (fixnum x y))
  (the fixnum (+ x y)))
Now disassemble add again, and see whether the declarations had any effect.
CL-USER> (disassemble 'add)
;; disassembly of #<Function ADD>
;; formals: X Y
;; code start: #x7374dc34:
0: 03 c2 addl eax,edx
2: f8 clc
3: 8b 75 fc movl esi,[ebp-4]
6: c3 ret
7: 90 nop
; No value
Looks like they did.