Binary Format Basics
The starting point for reading and writing binary files is to open the file for reading or writing individual bytes. As I discussed in Chapter 14, both **OPEN** and **WITH-OPEN-FILE** accept a keyword argument, :element-type, that controls the basic unit of transfer for the stream. When you’re dealing with binary files, you’ll specify (unsigned-byte 8). An input stream opened with such an :element-type will return an integer between 0 and 255 each time it’s passed to **READ-BYTE**. Conversely, you can write bytes to an (unsigned-byte 8) output stream by passing numbers between 0 and 255 to **WRITE-BYTE**.
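As a minimal sketch of the byte-level I/O just described, the following writes a list of octets to a file and reads them back. The function names write-bytes-to-file and read-bytes-from-file and the file name in the usage line are hypothetical, chosen only for illustration.

```lisp
;; Hypothetical helper: write each integer in BYTES (0-255) to the file at PATH.
(defun write-bytes-to-file (path bytes)
  (with-open-file (out path
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :supersede)
    (dolist (b bytes)
      (write-byte b out))))

;; Hypothetical helper: return the contents of the file at PATH as a list of
;; integers between 0 and 255.
(defun read-bytes-from-file (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    ;; Passing NIL as eof-error-p makes READ-BYTE return NIL at end of file
    ;; instead of signaling an error.
    (loop for b = (read-byte in nil nil)
          while b
          collect b)))
```

For example, after (write-bytes-to-file "bytes-demo.bin" '(0 127 255)), calling (read-bytes-from-file "bytes-demo.bin") should return the same three numbers.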
Above the level of individual bytes, most binary formats use a smallish number of primitive data types—numbers encoded in various ways, textual strings, bit fields, and so on—which are then composed into more complex structures. So your first task is to define a framework for writing code to read and write the primitive data types used by a given binary format.
To take a simple example, suppose you’re dealing with a binary format that uses an unsigned 16-bit integer as a primitive data type. To read such an integer, you need to read the two bytes and then combine them into a single number by multiplying one byte by 256, a.k.a. 2^8, and adding it to the other byte. For instance, assuming the binary format specifies that such 16-bit quantities are stored in big-endian form, with the most significant byte first, you can read such a number with this function:
(defun read-u2 (in)
  (+ (* (read-byte in) 256) (read-byte in)))
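To make the arithmetic concrete, here is a small sketch that performs the same combination on two octets passed as arguments rather than read from a stream. The helper combine-u2 is hypothetical and exists only for this illustration.

```lisp
;; Hypothetical helper: combine a most significant octet HIGH and a least
;; significant octet LOW the same way read-u2 does.
(defun combine-u2 (high low)
  (+ (* high 256) low))

;; With the octets #xab (171) and #xcd (205), in that order:
;; 171 * 256 = 43776, and 43776 + 205 = 43981, which is #xabcd.
(combine-u2 #xab #xcd)
```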
However, Common Lisp provides a more convenient way to perform this kind of bit twiddling. The function **LDB**, whose name stands for load byte, can be used to extract and set (with **SETF**) any number of contiguous bits from an integer. The number of bits and their position within the integer are specified with a byte specifier created with the **BYTE** function. **BYTE** takes two arguments: the number of bits to extract (or set) and the position of the rightmost bit, where the least significant bit is at position zero. **LDB** takes a byte specifier and the integer from which to extract the bits and returns the non-negative integer represented by the extracted bits. Thus, you can extract the least significant octet of an integer like this:
(ldb (byte 8 0) #xabcd) ==> 205 ; 205 is #xcd
To get the next octet, you’d use a byte specifier of (byte 8 8) like this:
(ldb (byte 8 8) #xabcd) ==> 171 ; 171 is #xab
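The same pattern extends to any octet, since the position argument to **BYTE** is just a bit offset. The sketch below wraps that in a hypothetical helper, octet-at, which isn't part of the code developed in this chapter.

```lisp
;; Hypothetical helper: return octet number INDEX of INTEGER, where index 0
;; is the least significant octet.
(defun octet-at (integer index)
  (ldb (byte 8 (* 8 index)) integer))

(octet-at #xabcd 0)       ; ==> 205, i.e. #xcd
(octet-at #xabcd 1)       ; ==> 171, i.e. #xab
(octet-at #xabcd 2)       ; ==> 0, since no bits are set that high

;; A byte specifier doesn't have to be eight bits wide or octet-aligned:
(ldb (byte 4 4) #xabcd)   ; ==> 12, i.e. #xc
```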
You can use **LDB** with **SETF** to set the specified bits of an integer stored in a **SETF**able place.
CL-USER> (defvar *num* 0)
*NUM*
CL-USER> (setf (ldb (byte 8 0) *num*) 128)
128
CL-USER> *num*
128
CL-USER> (setf (ldb (byte 8 8) *num*) 255)
255
CL-USER> *num*
65408
Thus, you can also write read-u2 like this:
(defun read-u2 (in)
  (let ((u2 0))
    (setf (ldb (byte 8 8) u2) (read-byte in))
    (setf (ldb (byte 8 0) u2) (read-byte in))
    u2))
To write a number out as a 16-bit integer, you need to extract the individual 8-bit bytes and write them one at a time. To extract the individual bytes, you just need to use **LDB** with the same byte specifiers.
(defun write-u2 (out value)
  (write-byte (ldb (byte 8 8) value) out)
  (write-byte (ldb (byte 8 0) value) out))
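One way to check that the reader and writer agree is to round-trip a value through a file. The sketch below repeats the two definitions so it can be loaded on its own; the function u2-round-trip and its default file name are hypothetical, added only for this check.

```lisp
;; Repeated from the text so this block is self-contained.
(defun read-u2 (in)
  (let ((u2 0))
    (setf (ldb (byte 8 8) u2) (read-byte in))
    (setf (ldb (byte 8 0) u2) (read-byte in))
    u2))

(defun write-u2 (out value)
  (write-byte (ldb (byte 8 8) value) out)
  (write-byte (ldb (byte 8 0) value) out))

;; Hypothetical helper: write VALUE as two bytes to PATH, then read it back.
(defun u2-round-trip (value &optional (path "u2-demo.bin"))
  (with-open-file (out path
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :supersede)
    (write-u2 out value))
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (read-u2 in)))
```

Any integer from 0 to 65535 should come back unchanged. Because write-u2 looks only at the low 16 bits, a larger value is silently truncated: 65536, for instance, is written as two zero bytes.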
Of course, you can also encode integers in many other ways—with different numbers of bytes, with different endianness, and in signed and unsigned format.
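As a rough sketch of how a few of those variations might look, the following generalizes the **LDB**-based reader to any number of bytes and either byte order, and adds a conversion to a signed value. The names read-unsigned and to-signed are hypothetical, and the signed interpretation assumes two's-complement encoding, which a particular format may or may not use.

```lisp
;; Hypothetical generalization of read-u2: read N-BYTES octets from IN and
;; assemble them into a non-negative integer. With :big-endian true (the
;; default) the first octet read is the most significant; otherwise it is
;; the least significant.
(defun read-unsigned (in n-bytes &key (big-endian t))
  (let ((value 0))
    (dotimes (i n-bytes value)
      (let ((octet-index (if big-endian (- n-bytes i 1) i)))
        (setf (ldb (byte 8 (* 8 octet-index)) value) (read-byte in))))))

;; Hypothetical helper: reinterpret the N-BITS-wide unsigned VALUE as a
;; signed integer, ASSUMING two's-complement encoding. If the top bit is
;; set, the value represents VALUE - 2^N-BITS.
(defun to-signed (value n-bits)
  (if (logbitp (1- n-bits) value)
      (- value (ash 1 n-bits))
      value))
```

With these definitions, (read-unsigned in 2) behaves like read-u2, (read-unsigned in 4) reads a big-endian 32-bit quantity, and (to-signed #xffff 16) evaluates to -1.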