1. Introduction: Why Lisp?
If you think the greatest pleasure in programming comes from getting a lot done with code that simply and clearly expresses your intention, then programming in Common Lisp is likely to be about the most fun you can have with a computer. You'll get more done, faster, using it than you would using pretty much any other language.
That's a bold claim. Can I justify it? Not in just a few pages in this chapter—you're going to have to learn some Lisp and see for yourself—thus the rest of this book. For now, let me start with some anecdotal evidence, the story of my own road to Lisp. Then, in the next section, I'll explain the payoff I think you'll get from learning Common Lisp.
I'm one of what must be a fairly small number of second-generation Lisp hackers. My father got his start in computers writing an operating system in assembly for the machine he used to gather data for his doctoral dissertation in physics. After running computer systems at various physics labs, by the 1980s he had left physics altogether and was working at a large pharmaceutical company. That company had a project under way to develop software to model production processes in its chemical plants—if you increase the size of this vessel, how does it affect annual production? The original team, writing in FORTRAN, had burned through half the money and almost all the time allotted to the project with nothing to show for their efforts. This being the 1980s and the middle of the artificial intelligence (AI) boom, Lisp was in the air. So my dad—at that point not a Lisper—went to Carnegie Mellon University (CMU) to talk to some of the folks working on what was to become Common Lisp about whether Lisp might be a good language for this project.
The CMU folks showed him some demos of stuff they were working on, and he was convinced. He in turn convinced his bosses to let his team take over the failing project and do it in Lisp. A year later, and using only what was left of the original budget, his team delivered a working application with features that the original team had given up any hope of delivering. My dad credits his team's success to their decision to use Lisp.
Now, that's just one anecdote. And maybe my dad is wrong about why they succeeded. Or maybe Lisp was better only in comparison to other languages of the day. These days we have lots of fancy new languages, many of which have incorporated features from Lisp. Am I really saying Lisp can offer you the same benefits today as it offered my dad in the 1980s? Read on.
Despite my father's best efforts, I didn't learn any Lisp in high school. After a college career that didn't involve much programming in any language, I was seduced by the Web and back into computers. I worked first in Perl, learning enough to be dangerous while building an online discussion forum for Mother Jones magazine's Web site and then moving to a Web shop, Organic Online, where I worked on big—for the time—Web sites such as the one Nike put up during the 1996 Olympics. Later I moved on to Java as an early developer at WebLogic, now part of BEA. After WebLogic, I joined another startup where I was the lead programmer on a team building a transactional messaging system in Java. Along the way, my general interest in programming languages led me to explore such mainstream languages as C, C++, and Python, as well as less well-known ones such as Smalltalk, Eiffel, and Beta.
So I knew two languages inside and out and was familiar with another half dozen. Eventually, however, I realized my interest in programming languages was really rooted in the idea planted by my father's tales of Lisp—that different languages really are different, and that, despite the formal Turing equivalence of all programming languages, you really can get more done more quickly in some languages than others and have more fun doing it. Yet, ironically, I had never spent that much time with Lisp itself. So, I started doing some Lisp hacking in my free time. And whenever I did, it was exhilarating how quickly I was able to go from idea to working code.
For example, one vacation, having a week or so to hack Lisp, I decided to try writing a version of a program—a system for breeding genetic algorithms to play the game of Go—that I had written early in my career as a Java programmer. Even handicapped by my then rudimentary knowledge of Common Lisp and having to look up even basic functions, I felt more productive than I would have been rewriting the same program in Java, even with the several extra years of Java experience I'd acquired since writing the first version.
A similar experiment led to the library I'll discuss in Chapter 24. Early in my time at WebLogic I had written a library, in Java, for taking apart Java class files. It worked, but the code was a bit of a mess and hard to modify or extend. I had tried several times, over the years, to rewrite that library, thinking that with my ever-improving Java chops I'd find some way to do it that didn't bog down in piles of duplicated code. I never found a way. But when I tried to do it in Common Lisp, it took me only two days, and I ended up not only with a Java class file parser but with a general-purpose library for taking apart any kind of binary file. You'll see how that library works in Chapter 24 and use it in Chapter 25 to write a parser for the ID3 tags embedded in MP3 files.
Why Lisp?
It's hard, in only a few pages of an introductory chapter, to explain why users of a language like it, and it's even harder to make the case for why you should invest your time in learning a certain language. Personal history only gets us so far. Perhaps I like Lisp because of some quirk in the way my brain is wired. It could even be genetic, since my dad has it too. So before you dive into learning Lisp, it's reasonable to want to know what the payoff is going to be.
For some languages, the payoff is relatively obvious. For instance, if you want to write low-level code on Unix, you should learn C. Or if you want to write certain kinds of cross-platform applications, you should learn Java. And any of a number of companies still use a lot of C++, so if you want to get a job at one of them, you should learn C++.
For most languages, however, the payoff isn't so easily categorized; it has to do with subjective criteria such as how it feels to use the language. Perl advocates like to say that Perl "makes easy things easy and hard things possible" and revel in the fact that, as the Perl motto has it, "There's more than one way to do it."[1] Python's fans, on the other hand, think Python is clean and simple and think Python code is easier to understand because, as their motto says, "There's only one way to do it."
So, why Common Lisp? There's no immediately obvious payoff for adopting Common Lisp the way there is for C, Java, and C++ (unless, of course, you happen to own a Lisp Machine). The benefits of using Lisp have much more to do with the experience of using it. I'll spend the rest of this book showing you the specific features of Common Lisp and how to use them so you can see for yourself what it's like. For now I'll try to give you a sense of Lisp's philosophy.
The nearest thing Common Lisp has to a motto is the koan-like description, "the programmable programming language." While cryptic, that description gets at the root of the biggest advantage Common Lisp still has over other languages. More than any other language, Common Lisp follows the philosophy that what's good for the language's designer is good for the language's users. Thus, when you're programming in Common Lisp, you almost never find yourself wishing the language supported some feature that would make your program easier to write, because, as you'll see throughout this book, you can just add the feature yourself.
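To make "you can just add the feature yourself" a bit more concrete, here's a small, purely illustrative sketch. The WITH-TIMING macro below is my invention, not part of the language; it shows how a recurring idiom (timing a chunk of code) can be folded into something that reads like built-in syntax.

```lisp
;; Hypothetical example: fold a timing idiom into the language itself.
;; GET-INTERNAL-REAL-TIME, PROG1, and GENSYM are standard Common Lisp;
;; WITH-TIMING is made up for illustration.
(defmacro with-timing (label &body body)
  (let ((start (gensym "START")))
    `(let ((,start (get-internal-real-time)))
       (prog1 (progn ,@body)
         (format t "~a took ~d internal time units~%"
                 ,label (- (get-internal-real-time) ,start))))))

;; Once defined, it reads like a native control construct:
(with-timing "summing"
  (loop for i from 1 to 1000000 sum i))
```

The point isn't this particular macro but that nothing about it required cooperating with a language committee: a user wrote a new piece of syntax in ordinary Lisp.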
Consequently, a Common Lisp program tends to provide a much clearer mapping between your ideas about how the program works and the code you actually write. Your ideas aren't obscured by boilerplate code and endlessly repeated idioms. This makes your code easier to maintain because you don't have to wade through reams of code every time you need to make a change. Even systemic changes to a program's behavior can often be achieved with relatively small changes to the actual code. This also means you'll develop code more quickly; there's less code to write, and you don't waste time thrashing around trying to find a clean way to express yourself within the limitations of the language.[2]
Common Lisp is also an excellent language for exploratory programming—if you don't know exactly how your program is going to work when you first sit down to write it, Common Lisp provides several features to help you develop your code incrementally and interactively.
For starters, the interactive read-eval-print loop, which I'll introduce in the next chapter, lets you continually interact with your program as you develop it. Write a new function. Test it. Change it. Try a different approach. You never have to stop for a lengthy compilation cycle.[3]
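As a preview of that cycle, here is a hypothetical REPL session: define a function, try it, redefine it, and try it again, all in the same running Lisp. (The function and its behavior are invented for illustration; the REPL itself is introduced properly in the next chapter.)

```lisp
CL-USER> (defun greet (name) (format nil "Hi, ~a" name))
GREET
CL-USER> (greet "world")
"Hi, world"
CL-USER> (defun greet (name) (format nil "Hello, ~a!" name))
GREET
CL-USER> (greet "world")
"Hello, world!"
```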
Other features that support a flowing, interactive programming style are Lisp's dynamic typing and the Common Lisp condition system. Because of the former, you spend less time convincing the compiler you should be allowed to run your code and more time actually running it and working on it,[4] and the latter lets you develop even your error handling code interactively.
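As a tiny foretaste of the condition system (Chapter 19 covers it properly), here is a sketch using the standard HANDLER-CASE form. The function name SAFE-PARSE is made up for illustration; PARSE-INTEGER and the PARSE-ERROR condition type are standard.

```lisp
;; PARSE-INTEGER signals a PARSE-ERROR on malformed input; HANDLER-CASE
;; lets the caller decide what to do about it instead of crashing.
(defun safe-parse (string)
  (handler-case (parse-integer string)
    (parse-error () :not-a-number)))

(safe-parse "42")    ; => 42
(safe-parse "oops")  ; => :NOT-A-NUMBER
```

The full system goes well beyond this try/catch-style usage, letting you resume or restart a computation from the point where the error was signaled.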
Another consequence of being "a programmable programming language" is that Common Lisp, in addition to incorporating small changes that make particular programs easier to write, can easily adopt big new ideas about how programming languages should work. For instance, the original implementation of the Common Lisp Object System (CLOS), Common Lisp's powerful object system, was as a library written in portable Common Lisp. This allowed Lisp programmers to gain actual experience with the facilities it provided before it was officially incorporated into the language.
Whatever new paradigm comes down the pike next, it's extremely likely that Common Lisp will be able to absorb it without requiring any changes to the core language. For example, a Lisper has recently written a library, AspectL, that adds support for aspect-oriented programming (AOP) to Common Lisp.[5] If AOP turns out to be the next big thing, Common Lisp will be able to support it without any changes to the base language and without extra preprocessors and extra compilers.[6]
Where It Began
Common Lisp is the modern descendant of the Lisp language first conceived by John McCarthy in 1956. Lisp circa 1956 was designed for "symbolic data processing"[7] and derived its name from one of the things it was quite good at: LISt Processing. We've come a long way since then: Common Lisp sports as fine an array of modern data types as you can ask for: a condition system that, as you'll see in Chapter 19, provides a whole level of flexibility missing from the exception systems of languages such as Java, Python, and C++; powerful facilities for doing object-oriented programming; and several language facilities that just don't exist in other programming languages. How is this possible? What on Earth would provoke the evolution of such a well-equipped language?
Well, McCarthy was (and still is) an artificial intelligence (AI) researcher, and many of the features he built into his initial version of the language made it an excellent language for AI programming. During the AI boom of the 1980s, Lisp remained a favorite tool for programmers writing software to solve hard problems such as automated theorem proving, planning and scheduling, and computer vision. These were problems that required a lot of hard-to-write software; to make a dent in them, AI programmers needed a powerful language, and they grew Lisp into the language they needed. And the Cold War helped—as the Pentagon poured money into the Defense Advanced Research Projects Agency (DARPA), a lot of it went to folks working on problems such as large-scale battlefield simulations, automated planning, and natural language interfaces. These folks also used Lisp and continued pushing it to do what they needed.
The same forces that drove Lisp's feature evolution also pushed the envelope along other dimensions—big AI problems eat up a lot of computing resources however you code them, and if you run Moore's law in reverse for 20 years, you can imagine how scarce computing resources were on circa-80s hardware. The Lisp guys had to find all kinds of ways to squeeze performance out of their implementations. Modern Common Lisp implementations are the heirs to those early efforts and often include quite sophisticated, native machine code-generating compilers. While today, thanks to Moore's law, it's possible to get usable performance from a purely interpreted language, that's no longer an issue for Common Lisp. As I'll show in Chapter 32, with proper (optional) declarations, a good Lisp compiler can generate machine code quite similar to what might be generated by a C compiler.
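As a sketch of what such optional declarations look like (the function names here are invented for illustration, and the exact effect of the declarations varies by implementation):

```lisp
;; Without declarations: fully generic arithmetic.
(defun add-slow (a b)
  (+ a b))

;; With type and optimization declarations, a good native-code
;; compiler can open-code the fixnum addition.
(defun add-fast (a b)
  (declare (optimize (speed 3) (safety 0))
           (type fixnum a b))
  (the fixnum (+ a b)))

;; Most implementations let you inspect the generated machine code:
;; (disassemble #'add-fast)
```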
The 1980s were also the era of the Lisp Machines, with several companies, most famously Symbolics, producing computers that ran Lisp natively from the chips up. Thus, Lisp became a systems programming language, used for writing the operating system, editors, compilers, and pretty much everything else that ran on the Lisp Machines.
In fact, by the early 1980s, with various AI labs and the Lisp machine vendors all providing their own Lisp implementations, there was such a proliferation of Lisp systems and dialects that the folks at DARPA began to express concern about the Lisp community splintering. To address this concern, a grassroots group of Lisp hackers got together in 1981 and began the process of standardizing a new language called Common Lisp that combined the best features from the existing Lisp dialects. Their work was documented in the book Common Lisp the Language by Guy Steele (Digital Press, 1984)—CLtL to the Lisp-cognoscenti.
By 1986 the first Common Lisp implementations were available, and the writing was on the wall for the dialects it was intended to replace. In 1994, the American National Standards Institute (ANSI) released a standard for Common Lisp that built on and extended the language specified in CLtL, adding some major new features such as the CLOS and the condition system. And even that wasn't the last word: like CLtL before it, the ANSI standard intentionally leaves room for implementers to experiment with the best way to do things; a full Lisp implementation provides a rich runtime environment with access to GUI widgets, multiple threads of control, TCP/IP sockets, and more. These days Common Lisp is evolving much like other open-source languages—the folks who use it write the libraries they need and often make them available to others. In the last few years, in particular, there has been a spurt of activity in open-source Lisp libraries.
So, on one hand, Lisp is one of computer science's "classical" languages, based on ideas that have stood the test of time.[8] On the other, it's a thoroughly modern, general-purpose language whose design reflects a deeply pragmatic approach to solving real problems as efficiently and robustly as possible. The only downside of Lisp's "classical" heritage is that lots of folks are still walking around with ideas about Lisp based on some particular flavor of Lisp they were exposed to at some particular time in the nearly half a century since McCarthy invented Lisp. If someone tells you Lisp is only interpreted, that it's slow, or that you have to use recursion for everything, ask them what dialect of Lisp they're talking about and whether people were wearing bell-bottoms when they learned it.[9]
But I Learned Lisp Once, and It Wasn't Like What You're Describing
If you've used Lisp in the past, you may have ideas about what "Lisp" is that have little to do with Common Lisp. While Common Lisp supplanted most of the dialects it's descended from, it isn't the only remaining Lisp dialect, and depending on where and when you were exposed to Lisp, you may very well have learned one of these other dialects.

Other than Common Lisp, the one general-purpose Lisp dialect that still has an active user community is Scheme. Common Lisp borrowed a few important features from Scheme but never intended to replace it. Originally designed at M.I.T., where it was quickly put to use as a teaching language for undergraduate computer science courses, Scheme has always been aimed at a different language niche than Common Lisp. In particular, Scheme's designers have focused on keeping the core language as small and as simple as possible. This has obvious benefits for a teaching language and also for programming language researchers who like to be able to formally prove things about languages. It also has the benefit of making it relatively easy to understand the whole language as specified in the standard. But it does so at the cost of omitting many useful features that are standardized in Common Lisp. Individual Scheme implementations may provide these features in implementation-specific ways, but their omission from the standard makes it harder to write portable Scheme code than portable Common Lisp code.

Scheme also emphasizes a functional programming style and the use of recursion much more than Common Lisp does. If you studied Lisp in college and came away with the impression that it was only an academic language with no real-world application, chances are you learned Scheme. This isn't to say that's a particularly fair characterization of Scheme, but it's even less applicable to Common Lisp, which was expressly designed to be a real-world engineering language rather than a theoretically "pure" language.
If you've learned Scheme, you should also be aware that a number of subtle differences between Scheme and Common Lisp may trip you up. These differences are also the basis for several perennial religious wars between the hotheads in the Common Lisp and Scheme communities. I'll try to point out some of the more important differences as we go along.

Two other Lisp dialects still in widespread use are Elisp, the extension language for the Emacs editor, and Autolisp, the extension language for Autodesk's AutoCAD computer-aided design tool. Although it's possible more lines of Elisp and Autolisp have been written than of any other dialect of Lisp, neither can be used outside their host application, and both are quite old-fashioned Lisps compared to either Scheme or Common Lisp. If you've used one of these dialects, prepare to hop in the Lisp time machine and jump forward several decades.
Who This Book Is For
This book is for you if you're curious about Common Lisp, regardless of whether you're already convinced you want to use it or if you just want to know what all the fuss is about.
If you've learned some Lisp already but have had trouble making the leap from academic exercises to real programs, this book should get you on your way. On the other hand, you don't have to be already convinced that you want to use Lisp to get something out of this book.
If you're a hard-nosed pragmatist who wants to know what advantages Common Lisp has over languages such as Perl, Python, Java, C, or C#, this book should give you some ideas. Or maybe you don't even care about using Lisp—maybe you're already sure Lisp isn't really any better than other languages you know but are annoyed by some Lisper telling you that's because you just don't "get it." If so, this book will give you a straight-to-the-point introduction to Common Lisp. If, after reading this book, you still think Common Lisp is no better than your current favorite languages, you'll be in an excellent position to explain exactly why.
I cover not only the syntax and semantics of the language but also how you can use it to write software that does useful stuff. In the first part of the book, I'll cover the language itself, mixing in a few "practical" chapters, where I'll show you how to write real code. Then, after I've covered most of the language, including several parts that other books leave for you to figure out on your own, the remainder of the book consists of nine more practical chapters where I'll help you write several medium-sized programs that actually do things you might find useful: filter spam, parse binary files, catalog MP3s, stream MP3s over a network, and provide a Web interface for the MP3 catalog and server.
After you finish this book, you'll be familiar with all the most important features of the language and how they fit together, you'll have used Common Lisp to write several nontrivial programs, and you'll be well prepared to continue exploring the language on your own. While everyone's road to Lisp is different, I hope this book will help smooth the way for you. So, let's begin.
2. Lather, Rinse, Repeat: A Tour of the REPL
In this chapter you'll set up your programming environment and write your first Common Lisp programs. We'll use the easy-to-install Lisp in a Box developed by Matthew Danish and Mikel Evins, which packages a Common Lisp implementation with Emacs, a powerful Lisp-aware text editor, and SLIME,[10] a Common Lisp development environment built on top of Emacs.
This combo provides a state-of-the-art Common Lisp development environment that supports the incremental, interactive development style that characterizes Lisp programming. The SLIME environment has the added advantage of providing a fairly uniform user interface regardless of the operating system and Common Lisp implementation you choose. I'll use the Lisp in a Box environment in order to have a specific development environment to talk about; folks who want to explore other development environments such as the graphical integrated development environments (IDEs) provided by some of the commercial Lisp vendors or environments based on other editors shouldn't have too much trouble translating the basics.[11]
Choosing a Lisp Implementation
The first thing you have to do is to choose a Lisp implementation. This may seem like a strange thing to have to do for folks used to languages such as Perl, Python, Visual Basic (VB), C#, and Java. The difference between Common Lisp and these languages is that Common Lisp is defined by its standard—there is neither a single implementation controlled by a benevolent dictator, as with Perl and Python, nor a canonical implementation controlled by a single company, as with VB, C#, and Java. Anyone who wants to read the standard and implement the language is free to do so. Furthermore, changes to the standard have to be made in accordance with a process controlled by the standards body American National Standards Institute (ANSI). That process is designed to keep any one entity, such as a single vendor, from being able to arbitrarily change the standard.[12] Thus, the Common Lisp standard is a contract between any Common Lisp vendor and Common Lisp programmers. The contract tells you that if you write a program that uses the features of the language the way they're described in the standard, you can count on your program behaving the same in any conforming implementation.
On the other hand, the standard may not cover everything you may want to do in your programs—some things were intentionally left unspecified in order to allow continuing experimentation by implementers in areas where there wasn't consensus about the best way for the language to support certain features. So every implementation offers some features above and beyond what's specified in the standard. Depending on what kind of programming you're going to be doing, it may make sense to just pick one implementation that has the extra features you need and use that. On the other hand, if you're delivering Lisp source to be used by others, such as libraries, you'll want—as far as possible—to write portable Common Lisp. For writing code that should be mostly portable but that needs facilities not defined by the standard, Common Lisp provides a flexible way to write code "conditionalized" on the features available in a particular implementation. You'll see an example of this kind of code in Chapter 15 when we develop a simple library that smooths over some differences between how different Lisp implementations deal with filenames.
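Here's a hedged sketch of what such conditionalized code looks like, using the standard #+ and #- reader macros, which include or skip the following form depending on which keywords appear in the implementation's *FEATURES* list. The implementation-specific calls shown are examples; the exact symbols vary by implementation and version.

```lisp
;; Illustrative only: pick an implementation-specific way to read an
;; environment variable. HOME-DIRECTORY is a made-up function name.
(defun home-directory ()
  #+sbcl    (sb-ext:posix-getenv "HOME")
  #+allegro (sys:getenv "HOME")
  #-(or sbcl allegro)
  (error "home-directory not implemented for this Lisp"))
```

When SBCL reads this definition, only the SB-EXT:POSIX-GETENV form survives; the other forms are skipped at read time, so they don't even need to be compilable on that implementation.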
For the moment, however, the most important characteristic of an implementation is whether it runs on your favorite operating system. The folks at Franz, makers of Allegro Common Lisp, are making available a trial version of their product for use with this book that runs on Linux, Windows, and OS X. Folks looking for an open-source implementation have several options. SBCL[13] is a high-quality open-source implementation that compiles to native code and runs on a wide variety of Unixes, including Linux and OS X. SBCL is derived from CMUCL,[14] which is a Common Lisp developed at Carnegie Mellon University, and, like CMUCL, is largely in the public domain, except for a few sections licensed under Berkeley Software Distribution (BSD) style licenses. CMUCL itself is another fine choice, though SBCL tends to be easier to install and now supports 21-bit Unicode.[15] For OS X users, OpenMCL is an excellent choice—it compiles to machine code, supports threads, and has quite good integration with OS X's Carbon and Cocoa toolkits. Other open-source and commercial implementations are available. See Chapter 32 for resources from which you can get more information.
All the Lisp code in this book should work in any conforming Common Lisp implementation unless otherwise noted, and SLIME will smooth out some of the differences between implementations by providing us with a common interface for interacting with Lisp. The output shown in this book is from Allegro running on GNU/Linux; in some cases, other Lisps may generate slightly different error messages or debugger output.
Getting Up and Running with Lisp in a Box
Since the Lisp in a Box packaging is designed to get new Lispers up and running in a first-rate Lisp development environment with minimum hassle, all you need to do to get it running is to grab the appropriate package for your operating system and the preferred Lisp from the Lisp in a Box Web site listed in Chapter 32 and then follow the installation instructions.
Since Lisp in a Box uses Emacs as its editor, you'll need to know at least a bit about how to use it. Perhaps the best way to get started with Emacs is to work through its built-in tutorial. To start the tutorial, select the first item of the Help menu, Emacs tutorial. Or press the Ctrl key, type h, release the Ctrl key, and then press t. Most Emacs commands are accessible via such key combinations; because key combinations are so common, Emacs users have a notation for describing key combinations that avoids having to constantly write out combinations such as "Press the Ctrl key, type h, release the Ctrl key, and then press t." Keys to be pressed together—a so-called key chord—are written together and separated by a hyphen. Keys, or key chords, to be pressed in sequence are separated by spaces. In a key chord, C represents the Ctrl key and M represents the Meta key (also known as Alt). Thus, we could write the key combination we just described that starts the tutorial like so: C-h t.
The tutorial describes other useful commands and the key combinations that invoke them. Emacs also comes with extensive online documentation using its own built-in hypertext documentation browser, Info. To read the manual, type C-h i. The Info system comes with its own tutorial, accessible simply by pressing h while reading the manual. Finally, Emacs provides quite a few ways to get help, all bound to key combos starting with C-h. Typing C-h ? brings up a complete list. Two of the most useful, besides the tutorial, are C-h k, which lets you type any key combo and tells you what command it invokes, and C-h w, which lets you enter the name of a command and tells you what key combination invokes it.
The other crucial bit of Emacs terminology, for folks who refuse to work through the tutorial, is the notion of a buffer. While working in Emacs, each file you edit will be represented by a different buffer, only one of which is "current" at any given time. The current buffer receives all input—whatever you type and any commands you invoke. Buffers are also used to represent interactions with programs such as Common Lisp. Thus, one common action you'll take is to "switch buffers," which means to make a different buffer the current buffer so you can edit a particular file or interact with a particular program. The command switch-to-buffer, bound to the key combination C-x b, prompts for the name of a buffer in the area at the bottom of the Emacs frame. When entering a buffer name, hitting Tab will complete the name based on the characters typed so far or will show a list of possible completions. The prompt also suggests a default buffer, which you can accept just by hitting Return. You can also switch buffers by selecting a buffer from the Buffers menu.
In certain contexts, other key combinations may be available for switching to certain buffers. For instance, when editing Lisp source files, the key combo C-c C-z switches to the buffer where you interact with Lisp.
Free Your Mind: Interactive Programming
When you start Lisp in a Box, you should see a buffer containing a prompt that looks like this:
CL-USER>
This is the Lisp prompt. Like a Unix or DOS shell prompt, the Lisp prompt is a place where you can type expressions that will cause things to happen. However, instead of reading and interpreting a line of shell commands, Lisp reads Lisp expressions, evaluates them according to the rules of Lisp, and prints the result. Then it does it again with the next expression you type. That endless cycle of reading, evaluating, and printing is why it's called the read-eval-print loop, or REPL for short. It's also referred to as the top-level, the top-level listener, or the Lisp listener.
From within the environment provided by the REPL, you can define and redefine program elements such as variables, functions, classes, and methods; evaluate any Lisp expression; load files containing Lisp source code or compiled code; compile whole files or individual functions; enter the debugger; step through code; and inspect the state of individual Lisp objects.
All those facilities are built into the language, accessible via functions defined in the language standard. If you had to, you could build a pretty reasonable programming environment out of just the REPL and any text editor that knows how to properly indent Lisp code. But for the true Lisp programming experience, you need an environment, such as SLIME, that lets you interact with Lisp both via the REPL and while editing source files. For instance, you don't want to have to cut and paste a function definition from a source file to the REPL or have to load a whole file just because you changed one function; your Lisp environment should let you evaluate or compile both individual expressions and whole files directly from your editor.
Experimenting in the REPL
To try the REPL, you need a Lisp expression that can be read, evaluated, and printed. One of the simplest kinds of Lisp expressions is a number. At the Lisp prompt, you can type 10 followed by Return and should see something like this:
CL-USER> 10
10
The first 10 is the one you typed. The Lisp reader, the R in REPL, reads the text "10" and creates a Lisp object representing the number 10. This object is a self-evaluating object, which means that when given to the evaluator, the E in REPL, it evaluates to itself. This value is then given to the printer, which prints the 10 on the line by itself. While that may seem like a lot of work just to get back to where you started, things get a bit more interesting when you give Lisp something meatier to chew on. For instance, you can type (+ 2 3) at the Lisp prompt.
CL-USER> (+ 2 3)
5
Anything in parentheses is a list, in this case a list of three elements, the symbol +
, and the numbers 2 and 3. Lisp, in general, evaluates lists by treating the first element as the name of a function and the rest of the elements as expressions to be evaluated to yield the arguments to the function. In this case, the symbol +
names a function that performs addition. 2 and 3 evaluate to themselves and are then passed to the addition function, which returns 5. The value 5 is passed to the printer, which prints it. Lisp can evaluate a list expression in other ways, but we needn't get into them right away. First we have to write . . .
"Hello, World," Lisp Style
No programming book is complete without a "hello, world"[16] program. As it turns out, it's trivially easy to get the REPL to print "hello, world."
CL-USER> "hello, world"
"hello, world"
This works because strings, like numbers, have a literal syntax that's understood by the Lisp reader and are self-evaluating objects: Lisp reads the double-quoted string and instantiates a string object in memory that, when evaluated, evaluates to itself and is then printed in the same literal syntax. The quotation marks aren't part of the string object in memory—they're just the syntax that tells the reader to read a string. The printer puts them back on when it prints the string because it tries to print objects in the same syntax the reader understands.
However, this may not really qualify as a "hello, world" program. It's more like the "hello, world" value.
You can take a step toward a real program by writing some code that as a side effect prints the string "hello, world" to standard output. Common Lisp provides a couple of ways to emit output, but the most flexible is the FORMAT
function. FORMAT
takes a variable number of arguments, but the only two required arguments are the place to send the output and a string. You'll see in the next chapter how the string can contain embedded directives that allow you to interpolate subsequent arguments into the string, à la printf
or Python's string-%
. As long as the string doesn't contain a ~
, it will be emitted as-is. If you pass t
as its first argument, it sends its output to standard output. So a FORMAT
expression that will print "hello, world" looks like this:[17]
CL-USER> (format t "hello, world")
hello, world
NIL
One thing to note about the result of the FORMAT
expression is the NIL
on the line after the "hello, world" output. That NIL
is the result of evaluating the FORMAT
expression, printed by the REPL. (NIL
is Lisp's version of false and/or null. More on that in Chapter 4.) Unlike the other expressions we've seen so far, a FORMAT
expression is more interesting for its side effect—printing to standard output in this case—than for its return value. But every expression in Lisp evaluates to some result.[18]
However, it's still arguable whether you've yet written a true "program." But you're getting there. And you're seeing the bottom-up style of programming supported by the REPL: you can experiment with different approaches and build a solution from parts you've already tested. Now that you have a simple expression that does what you want, you just need to package it in a function. Functions are one of the basic program building blocks in Lisp and can be defined with a DEFUN
expression such as this:
CL-USER> (defun hello-world () (format t "hello, world"))
HELLO-WORLD
The hello-world
after the DEFUN
is the name of the function. In Chapter 4 we'll look at exactly what characters can be used in a name, but for now suffice it to say that lots of characters, such as -
, that are illegal in names in other languages are legal in Common Lisp. It's standard Lisp style—not to mention more in line with normal English typography—to form compound names with hyphens, such as hello-world
, rather than with underscores, as in hello_world
, or with inner caps such as helloWorld
. The ()
s after the name delimit the parameter list, which is empty in this case because the function takes no arguments. The rest is the body of the function.
At one level, this expression, like all the others you've seen, is just another expression to be read, evaluated, and printed by the REPL. The return value in this case is the name of the function you just defined.[19] But like the FORMAT
expression, this expression is more interesting for the side effects it has than for its return value. Unlike the FORMAT
expression, however, the side effects are invisible: when this expression is evaluated, a new function that takes no arguments and with the body (format t "hello, world")
is created and given the name HELLO-WORLD
.
Once you've defined the function, you can call it like this:
CL-USER> (hello-world)
hello, world
NIL
You can see that the output is just the same as when you evaluated the FORMAT
expression directly, including the NIL
value printed by the REPL. Functions in Common Lisp automatically return the value of the last expression evaluated.
Saving Your Work
You could argue that this is a complete "hello, world" program of sorts. However, it still has a problem. If you exit Lisp and restart, the function definition will be gone. Having written such a fine function, you'll want to save your work.
Easy enough. You just need to create a file in which to save the definition. In Emacs you can create a new file by typing C-x C-f
and then, when Emacs prompts you, entering the name of the file you want to create. It doesn't matter particularly where you put the file. It's customary to name Common Lisp source files with a .lisp
extension, though some folks use .cl
instead.
Once you've created the file, you can type the definition you previously entered at the REPL. Some things to note are that after you type the opening parenthesis and the word DEFUN
, at the bottom of the Emacs window, SLIME will tell you the arguments expected. The exact form will depend somewhat on what Common Lisp implementation you're using, but it'll probably look something like this:
(defun name varlist &rest body)
The message will disappear as you start to type each new element but will reappear each time you enter a space. When you're entering the definition in the file, you might choose to break the definition across two lines after the parameter list. If you hit Return and then Tab, SLIME will automatically indent the second line appropriately, like this:[20]
(defun hello-world ()
(format t "hello, world"))
SLIME will also help match up the parentheses—as you type a closing parenthesis, it will flash the corresponding opening parenthesis. Or you can just type C-c C-q
to invoke the command slime-close-parens-at-point
, which will insert as many closing parentheses as necessary to match all the currently open parentheses.
Now you can get this definition into your Lisp environment in several ways. The easiest is to type C-c C-c
with the cursor anywhere in or immediately after the DEFUN
form, which runs the command slime-compile-defun
, which in turn sends the definition to Lisp to be evaluated and compiled. To make sure this is working, you can make some change to hello-world
, recompile it, and then go back to the REPL, using C-c C-z
or C-x b
, and call it again. For instance, you could make it a bit more grammatical.
(defun hello-world ()
(format t "Hello, world!"))
Next, recompile with C-c C-c
and then type C-c C-z
to switch to the REPL to try the new version.
CL-USER> (hello-world)
Hello, world!
NIL
You'll also probably want to save the file you've been working on; in the hello.lisp
buffer, type C-x C-s
to invoke the Emacs command save-buffer
.
Now to try reloading this function from the source file, you'll need to quit Lisp and restart. To quit you can use a SLIME shortcut: at the REPL, type a comma. At the bottom of the Emacs window, you will be prompted for a command. Type quit
(or sayoonara
), and then hit Enter. This will quit Lisp and close all the buffers created by SLIME such as the REPL buffer.[21] Now restart SLIME by typing M-x slime
.
Just for grins, you can try to invoke hello-world
.
CL-USER> (hello-world)
At that point SLIME will pop up a new buffer that starts with something that looks like this:
attempt to call `HELLO-WORLD' which is an undefined function.
[Condition of type UNDEFINED-FUNCTION]
Restarts:
0: [TRY-AGAIN] Try calling HELLO-WORLD again.
1: [RETURN-VALUE] Return a value instead of calling HELLO-WORLD.
2: [USE-VALUE] Try calling a function other than HELLO-WORLD.
3: [STORE-VALUE] Setf the symbol-function of HELLO-WORLD and call it again.
4: [ABORT] Abort handling SLIME request.
5: [ABORT] Abort entirely from this process.
Backtrace:
0: (SWANK::DEBUG-IN-EMACS #<UNDEFINED-FUNCTION @ #x716b082a>)
1: ((FLET SWANK:SWANK-DEBUGGER-HOOK SWANK::DEBUG-IT))
2: (SWANK:SWANK-DEBUGGER-HOOK #<UNDEFINED-FUNCTION @ #x716b082a> #<Function SWANK-DEBUGGER-HOOK>)
3: (ERROR #<UNDEFINED-FUNCTION @ #x716b082a>)
4: (EVAL (HELLO-WORLD))
5: (SWANK::EVAL-REGION "(hello-world)
" T)
Blammo! What happened? Well, you tried to invoke a function that doesn't exist. But despite the burst of output, Lisp is actually handling this situation gracefully. Unlike Java or Python, Common Lisp doesn't just bail—throwing an exception and unwinding the stack. And it definitely doesn't dump core just because you tried to invoke a missing function. Instead Lisp drops you into the debugger.
While you're in the debugger you still have full access to Lisp, so you can evaluate expressions to examine the state of your program and maybe even fix things. For now don't worry about that; just type q
to exit the debugger and get back to the REPL. The debugger buffer will go away, and the REPL will show this:
CL-USER> (hello-world)
; Evaluation aborted
CL-USER>
There's obviously more that can be done from within the debugger than just abort—we'll see, for instance, in Chapter 19 how the debugger integrates with the error handling system. For now, however, the important thing to know is that you can always get out of it, and back to the REPL, by typing q
.
Back at the REPL you can try again. Things blew up because Lisp didn't know the definition of hello-world
. So you need to let Lisp know about the definition you saved in the file hello.lisp
. You have several ways you could do this. You could switch back to the buffer containing the file (type C-x b
and then enter hello.lisp
when prompted) and recompile the definition as you did before with C-c C-c
. Or you can load the whole file, which would be a more convenient approach if the file contained a bunch of definitions, using the LOAD
function at the REPL like this:
CL-USER> (load "hello.lisp")
; Loading /home/peter/my-lisp-programs/hello.lisp
T
The T
means everything loaded correctly.[22] Loading a file with LOAD
is essentially equivalent to typing each of the expressions in the file at the REPL in the order they appear in the file, so after the call to LOAD
, hello-world
should be defined:
CL-USER> (hello-world)
Hello, world!
NIL
Another way to load a file's worth of definitions is to compile the file first with COMPILE-FILE
and then LOAD
the resulting compiled file, called a FASL file, which is short for fast-load file. COMPILE-FILE
returns the name of the FASL file, so you can compile and load from the REPL like this:
CL-USER> (load (compile-file "hello.lisp"))
;;; Compiling file hello.lisp
;;; Writing fasl file hello.fasl
;;; Fasl write complete
; Fast loading /home/peter/my-lisp-programs/hello.fasl
T
SLIME also provides support for loading and compiling files without using the REPL. When you're in a source code buffer, you can use C-c C-l
to load the file with slime-load-file
. Emacs will prompt for the name of a file to load with the name of the current file already filled in; you can just hit Enter. Or you can type C-c C-k
to compile and load the file represented by the current buffer. In some Common Lisp implementations, compiling code this way will make it quite a bit faster; in others, it won't, typically because they always compile everything.
This should be enough to give you a flavor of how Lisp programming works. Of course I haven't covered all the tricks and techniques yet, but you've seen the essential elements—interacting with the REPL, trying things out, loading and testing new code, tweaking and debugging. Serious Lisp hackers often keep a Lisp image running for days on end, adding, redefining, and testing bits of their program incrementally.
Also, even when the Lisp app is deployed, there's often still a way to get to a REPL. You'll see in Chapter 26 how you can use the REPL and SLIME to interact with the Lisp that's running a Web server at the same time as it's serving up Web pages. It's even possible to use SLIME to connect to a Lisp running on a different machine, allowing you—for instance—to debug a remote server just like a local one.
An even more impressive instance of remote debugging occurred on NASA's 1998 Deep Space 1 mission. A half year after the spacecraft launched, a bit of Lisp code was going to control the spacecraft for two days while conducting a sequence of experiments. Unfortunately, a subtle race condition in the code had escaped detection during ground testing and was already in space. When the bug manifested in the wild—100 million miles away from Earth—the team was able to diagnose and fix the running code, allowing the experiments to complete.[23] One of the programmers described it as follows:
Debugging a program running on a $100M piece of hardware that is 100 million miles away is an interesting experience. Having a read-eval-print loop running on the spacecraft proved invaluable in finding and fixing the problem.
You're not quite ready to send any Lisp code into deep space, but in the next chapter you'll take a crack at writing a program a bit more interesting than "hello, world."
3. Practical: A Simple Database
Obviously, before you can start building real software in Lisp, you'll have to learn the language. But let's face it—you may be thinking, "'Practical Common Lisp,' isn't that an oxymoron? Why should you be expected to bother learning all the details of a language unless it's actually good for something you care about?" So I'll start by giving you a small example of what you can do with Common Lisp. In this chapter you'll write a simple database for keeping track of CDs. You'll use similar techniques in Chapter 27 when you build a database of MP3s for the streaming MP3 server. In fact, you could think of this as part of the MP3 software project—after all, in order to have a bunch of MP3s to listen to, it might be helpful to be able to keep track of which CDs you have and which ones you need to rip.
In this chapter, I'll cover just enough Lisp as we go along for you to understand how the code works. But I'll gloss over quite a few details. For now you needn't sweat the small stuff—the next several chapters will cover all the Common Lisp constructs used here, and more, in a much more systematic way.
One terminology note: I'll discuss a handful of Lisp operators in this chapter. In Chapter 4, you'll learn that Common Lisp provides three distinct kinds of operators: functions, macros, and special operators. For the purposes of this chapter, you don't really need to know the difference. I will, however, refer to different operators as functions or macros or special operators as appropriate, rather than trying to hide the details behind the word operator. For now you can treat function, macro, and special operator as all more or less equivalent.[24]
Also, keep in mind that I won't bust out all the most sophisticated Common Lisp techniques for your very first post-"hello, world" program. The point of this chapter isn't that this is how you would write a database in Lisp; rather, the point is for you to get an idea of what programming in Lisp is like and to see how even a relatively simple Lisp program can be quite featureful.
CDs and Records
To keep track of CDs that need to be ripped to MP3s and which CDs should be ripped first, each record in the database will contain the title and artist of the CD, a rating of how much the user likes it, and a flag saying whether it has been ripped. So, to start with, you'll need a way to represent a single database record (in other words, one CD). Common Lisp gives you lots of choices of data structures from a simple four-item list to a user-defined class, using the Common Lisp Object System (CLOS).
For now you can stay at the simple end of the spectrum and use a list. You can make a list with the LIST
function, which, appropriately enough, returns a list of its arguments.
CL-USER> (list 1 2 3)
(1 2 3)
You could use a four-item list, mapping a given position in the list to a given field in the record. However, another flavor of list—called a property list, or plist for short—is even more convenient. A plist is a list where every other element, starting with the first, is a symbol that describes what the next element in the list is. I won't get into all the details of exactly what a symbol is right now; basically it's a name. For the symbols that name the fields in the CD database, you can use a particular kind of symbol, called a keyword symbol. A keyword is any name that starts with a colon (:
), for instance, :foo
. Here's an example of a plist using the keyword symbols :a
, :b
, and :c
as property names:
CL-USER> (list :a 1 :b 2 :c 3)
(:A 1 :B 2 :C 3)
Note that you can create a property list with the same LIST
function as you use to create other lists; it's the contents that make it a plist.
The thing that makes plists a convenient way to represent the records in a database is the function GETF
, which takes a plist and a symbol and returns the value in the plist following the symbol, making a plist a sort of poor man's hash table. Lisp has real hash tables too, but plists are sufficient for your needs here and can more easily be saved to a file, which will come in handy later.
CL-USER> (getf (list :a 1 :b 2 :c 3) :a)
1
CL-USER> (getf (list :a 1 :b 2 :c 3) :c)
3
Given all that, you can easily enough write a function make-cd
that will take the four fields as arguments and return a plist representing that CD.
(defun make-cd (title artist rating ripped)
(list :title title :artist artist :rating rating :ripped ripped))
The word DEFUN
tells us that this form is defining a new function. The name of the function is make-cd
. After the name comes the parameter list. This function has four parameters: title
, artist
, rating
, and ripped
. Everything after the parameter list is the body of the function. In this case the body is just one form, a call to LIST
. When make-cd
is called, the arguments passed to the call will be bound to the variables in the parameter list. For instance, to make a record for the CD Roses by Kathy Mattea, you might call make-cd
like this:
CL-USER> (make-cd "Roses" "Kathy Mattea" 7 t)
(:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T)
Filing CDs
A single record, however, does not a database make. You need some larger construct to hold the records. Again, for simplicity's sake, a list seems like a good choice. Also for simplicity you can use a global variable, *db*
, which you can define with the DEFVAR
macro. The asterisks (*) in the name are a Lisp naming convention for global variables.[25]
(defvar *db* nil)
You can use the PUSH
macro to add items to *db*
. But it's probably a good idea to abstract things a tiny bit, so you should define a function add-record
that adds a record to the database.
(defun add-record (cd) (push cd *db*))
Now you can use add-record
and make-cd
together to add CDs to the database.
CL-USER> (add-record (make-cd "Roses" "Kathy Mattea" 7 t))
((:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
CL-USER> (add-record (make-cd "Fly" "Dixie Chicks" 8 t))
((:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)
(:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
CL-USER> (add-record (make-cd "Home" "Dixie Chicks" 9 t))
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)
(:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
The stuff printed by the REPL after each call to add-record
is the return value, which is the value returned by the last expression in the function body, the PUSH
. And PUSH
returns the new value of the variable it's modifying. So what you're actually seeing is the value of the database after the record has been added.
Looking at the Database Contents
You can also see the current value of *db*
whenever you want by typing *db*
at the REPL.
CL-USER> *db*
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)
(:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
However, that's not a very satisfying way of looking at the output. You can write a dump-db
function that dumps out the database in a more human-readable format, like this:
TITLE: Home
ARTIST: Dixie Chicks
RATING: 9
RIPPED: T
TITLE: Fly
ARTIST: Dixie Chicks
RATING: 8
RIPPED: T
TITLE: Roses
ARTIST: Kathy Mattea
RATING: 7
RIPPED: T
The function looks like this:
(defun dump-db ()
(dolist (cd *db*)
(format t "~{~a:~10t~a~%~}~%" cd)))
This function works by looping over all the elements of *db*
with the DOLIST
macro, binding each element to the variable cd
in turn. For each value of cd
, you use the FORMAT
function to print it.
Admittedly, the FORMAT
call is a little cryptic. However, FORMAT
isn't particularly more complicated than C or Perl's printf
function or Python's string-%
operator. In Chapter 18 I'll discuss FORMAT
in greater detail. For now we can take this call bit by bit. As you saw in Chapter 2, FORMAT
takes at least two arguments, the first being the stream where it sends its output; t
is shorthand for the stream *standard-output*
.
The second argument to FORMAT
is a format string that can contain both literal text and directives telling FORMAT
things such as how to interpolate the rest of its arguments. Format directives start with ~
(much the way printf
's directives start with %
). FORMAT
understands dozens of directives, each with their own set of options.[26] However, for now I'll just focus on the ones you need to write dump-db
.
The ~a
directive is the aesthetic directive; it means to consume one argument and output it in a human-readable form. This will render keywords without the leading : and strings without quotation marks. For instance:
CL-USER> (format t "~a" "Dixie Chicks")
Dixie Chicks
NIL
or:
CL-USER> (format t "~a" :title)
TITLE
NIL
The ~t
directive is for tabulating. The ~10t
tells FORMAT
to emit enough spaces to move to the tenth column before processing the next ~a
. A ~t
doesn't consume any arguments.
CL-USER> (format t "~a:~10t~a" :artist "Dixie Chicks")
ARTIST: Dixie Chicks
NIL
Now things get slightly more complicated. When FORMAT sees ~{, the next argument to be consumed must be a list. FORMAT loops over that list, processing the directives between the ~{ and ~}, consuming as many elements of the list as needed each time through the loop. In dump-db, the FORMAT loop will consume one keyword and one value from the list each time through the loop. The ~% directive doesn't consume any arguments but tells FORMAT to emit a newline. Then after the ~} ends the loop, the last ~% tells FORMAT to emit one more newline to put a blank line between each CD.
Technically, you could have also used FORMAT
to loop over the database itself, turning the dump-db
function into a one-liner.
(defun dump-db ()
(format t "~{~{~a:~10t~a~%~}~%~}" *db*))
That's either very cool or very scary depending on your point of view.
Improving the User Interaction
While your add-record
function works fine for adding records, it's a bit Lispy for the casual user. And if they want to add a bunch of records, it's not very convenient. So you may want to write a function to prompt the user for information about a set of CDs. Right away you know you'll need some way to prompt the user for a piece of information and read it. So let's write that.
(defun prompt-read (prompt)
(format *query-io* "~a: " prompt)
(force-output *query-io*)
(read-line *query-io*))
You use your old friend FORMAT
to emit a prompt. Note that there's no ~%
in the format string, so the cursor will stay on the same line. The call to FORCE-OUTPUT
is necessary in some implementations to ensure that Lisp doesn't wait for a newline before it prints the prompt.
Then you can read a single line of text with the aptly named READ-LINE
function. The variable *query-io*
is a global variable (which you can tell because of the *
naming convention for global variables) that contains the input stream connected to the terminal. The return value of prompt-read
will be the value of the last form, the call to READ-LINE
, which returns the string it read (without the trailing newline).
You can combine your existing make-cd
function with prompt-read
to build a function that makes a new CD record from data it gets by prompting for each value in turn.
(defun prompt-for-cd ()
(make-cd
(prompt-read "Title")
(prompt-read "Artist")
(prompt-read "Rating")
(prompt-read "Ripped [y/n]")))
That's almost right. Except prompt-read
returns a string, which, while fine for the Title and Artist fields, isn't so great for the Rating and Ripped fields, which should be a number and a boolean. Depending on how sophisticated a user interface you want, you can go to arbitrary lengths to validate the data the user enters. For now let's lean toward the quick and dirty: you can wrap the prompt-read
for the rating in a call to Lisp's PARSE-INTEGER
function, like this:
(parse-integer (prompt-read "Rating"))
Unfortunately, the default behavior of PARSE-INTEGER
is to signal an error if it can't parse an integer out of the string or if there's any non-numeric junk in the string. However, it takes an optional keyword argument :junk-allowed
, which tells it to relax a bit.
(parse-integer (prompt-read "Rating") :junk-allowed t)
But there's still one problem: if it can't find an integer amidst all the junk, PARSE-INTEGER
will return NIL
rather than a number. In keeping with the quick-and-dirty approach, you may just want to call that 0 and continue. Lisp's OR
macro is just the thing you need here. It's similar to the "short-circuiting" ||
in Perl, Python, Java, and C; it takes a series of expressions, evaluates them one at a time, and returns the first non-nil value (or NIL
if they're all NIL
). So you can use the following:
(or (parse-integer (prompt-read "Rating") :junk-allowed t) 0)
to get a default value of 0.
Fixing the code to prompt for Ripped is quite a bit simpler. You can just use the Common Lisp function Y-OR-N-P
.
(y-or-n-p "Ripped [y/n]: ")
In fact, this will be the most robust part of prompt-for-cd
, as Y-OR-N-P
will reprompt the user if they enter something that doesn't start with y, Y, n, or N.
Putting those pieces together you get a reasonably robust prompt-for-cd
function.
(defun prompt-for-cd ()
(make-cd
(prompt-read "Title")
(prompt-read "Artist")
(or (parse-integer (prompt-read "Rating") :junk-allowed t) 0)
(y-or-n-p "Ripped [y/n]: ")))
Finally, you can finish the "add a bunch of CDs" interface by wrapping prompt-for-cd
in a function that loops until the user is done. You can use the simple form of the LOOP
macro, which repeatedly executes a body of expressions until it's exited by a call to RETURN
. For example:
(defun add-cds ()
(loop (add-record (prompt-for-cd))
(if (not (y-or-n-p "Another? [y/n]: ")) (return))))
Now you can use add-cds
to add some more CDs to the database.
CL-USER> (add-cds)
Title: Rockin' the Suburbs
Artist: Ben Folds
Rating: 6
Ripped [y/n]: y
Another? [y/n]: y
Title: Give Us a Break
Artist: Limpopo
Rating: 10
Ripped [y/n]: y
Another? [y/n]: y
Title: Lyle Lovett
Artist: Lyle Lovett
Rating: 9
Ripped [y/n]: y
Another? [y/n]: n
NIL
Saving and Loading the Database
Having a convenient way to add records to the database is nice. But it's not so nice that the user is going to be very happy if they have to reenter all the records every time they quit and restart Lisp. Luckily, with the data structures you're using to represent the data, it's trivially easy to save the data to a file and reload it later. Here's a save-db
function that takes a filename as an argument and saves the current state of the database:
(defun save-db (filename)
(with-open-file (out filename
:direction :output
:if-exists :supersede)
(with-standard-io-syntax
(print *db* out))))
The WITH-OPEN-FILE
macro opens a file, binds the stream to a variable, executes a set of expressions, and then closes the file. It also makes sure the file is closed even if something goes wrong while evaluating the body. The list directly after WITH-OPEN-FILE
isn't a function call but rather part of the syntax defined by WITH-OPEN-FILE
. It contains the name of the variable that will hold the file stream to which you'll write within the body of WITH-OPEN-FILE
, a value that must be a file name, and then some options that control how the file is opened. Here you specify that you're opening the file for writing with :direction :output
and that you want to overwrite an existing file of the same name if it exists with :if-exists :supersede
.
Once you have the file open, all you have to do is print the contents of the database with (print *db* out)
. Unlike FORMAT
, PRINT
prints Lisp objects in a form that can be read back in by the Lisp reader. The macro WITH-STANDARD-IO-SYNTAX
ensures that certain variables that affect the behavior of PRINT
are set to their standard values. You'll use the same macro when you read the data back in to make sure the Lisp reader and printer are operating compatibly.
The argument to save-db
should be a string containing the name of the file where the user wants to save the database. The exact form of the string will depend on what operating system they're using. For instance, on a Unix box they should be able to call save-db
like this:
CL-USER> (save-db "~/my-cds.db")
((:TITLE "Lyle Lovett" :ARTIST "Lyle Lovett" :RATING 9 :RIPPED T)
(:TITLE "Give Us a Break" :ARTIST "Limpopo" :RATING 10 :RIPPED T)
(:TITLE "Rockin' the Suburbs" :ARTIST "Ben Folds" :RATING 6 :RIPPED
T)
(:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T)
(:TITLE "Roses" :ARTIST "Kathy Mattea" :RATING 7 :RIPPED T))
On Windows, the filename might be something like "c:/my-cds.db
" or "c:\\my-cds.db
".[27]
You can open this file in any text editor to see what it looks like. You should see something a lot like what the REPL prints if you type *db*
.
The function to load the database back in is similar.
(defun load-db (filename)
(with-open-file (in filename)
(with-standard-io-syntax
(setf *db* (read in)))))
This time you don't need to specify :direction
in the options to WITH-OPEN-FILE
, since you want the default of :input
. And instead of printing, you use the function READ
to read from the stream in
. This is the same reader used by the REPL and can read any Lisp expression you could type at the REPL prompt. However, in this case, you're just reading and saving the expression, not evaluating it. Again, the WITH-STANDARD-IO-SYNTAX
macro ensures that READ
is using the same basic syntax that save-db
did when it PRINT
ed the data.
The SETF
macro is Common Lisp's main assignment operator. It sets its first argument to the result of evaluating its second argument. So in load-db
the *db*
variable will contain the object read from the file, namely, the list of lists written by save-db
. You do need to be careful about one thing—load-db
clobbers whatever was in *db*
before the call. So if you've added records with add-record
or add-cds
that haven't been saved with save-db
, you'll lose them.
Querying the Database
Now that you have a way to save and reload the database to go along with a convenient user interface for adding new records, you soon may have enough records that you won't want to be dumping out the whole database just to look at what's in it. What you need is a way to query the database. You might like, for instance, to be able to write something like this:
(select :artist "Dixie Chicks")
and get a list of all the records where the artist is the Dixie Chicks. Again, it turns out that the choice of saving the records in a list will pay off.
The function REMOVE-IF-NOT
takes a predicate and a list and returns a list containing only the elements of the original list that match the predicate. In other words, it has removed all the elements that don't match the predicate. However, REMOVE-IF-NOT
doesn't really remove anything—it creates a new list, leaving the original list untouched. It's like running grep over a file. The predicate argument can be any function that accepts a single argument and returns a boolean value—NIL
for false and anything else for true.
For instance, if you wanted to extract all the even elements from a list of numbers, you could use REMOVE-IF-NOT
as follows:
CL-USER> (remove-if-not #'evenp '(1 2 3 4 5 6 7 8 9 10))
(2 4 6 8 10)
In this case, the predicate is the function EVENP
, which returns true if its argument is an even number. The funny notation #'
is shorthand for "Get me the function with the following name." Without the #'
, Lisp would treat evenp
as the name of a variable and look up the value of the variable, not the function.
You can also pass REMOVE-IF-NOT
an anonymous function. For instance, if EVENP
didn't exist, you could write the previous expression as the following:
CL-USER> (remove-if-not #'(lambda (x) (= 0 (mod x 2))) '(1 2 3 4 5 6 7 8 9 10))
(2 4 6 8 10)
In this case, the predicate is this anonymous function:
(lambda (x) (= 0 (mod x 2)))
which checks that its argument is equal to 0 modulo 2 (in other words, is even). If you wanted to extract only the odd numbers using an anonymous function, you'd write this:
CL-USER> (remove-if-not #'(lambda (x) (= 1 (mod x 2))) '(1 2 3 4 5 6 7 8 9 10))
(1 3 5 7 9)
Note that lambda
isn't the name of the function—it's the indicator you're defining an anonymous function.[28] Other than the lack of a name, however, a LAMBDA
expression looks a lot like a DEFUN
: the word lambda
is followed by a parameter list, which is followed by the body of the function.
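For instance, you can invoke a LAMBDA expression on the spot with FUNCALL, just as you would a named function:

```lisp
;; An anonymous doubling function, applied immediately
(funcall #'(lambda (x) (* x 2)) 3) ; ==> 6
```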
To select all the Dixie Chicks' albums in the database using REMOVE-IF-NOT
, you need a function that returns true when the artist field of a record is "Dixie Chicks"
. Remember that we chose the plist representation for the database records because the function GETF
can extract named fields from a plist. So assuming cd
is the name of a variable holding a single database record, you can use the expression (getf cd :artist)
to extract the name of the artist. The function EQUAL
, when given string arguments, compares them character by character. So (equal (getf cd :artist) "Dixie Chicks")
will test whether the artist field of a given CD is equal to "Dixie Chicks"
. All you need to do is wrap that expression in a LAMBDA
form to make an anonymous function and pass it to REMOVE-IF-NOT
.
CL-USER> (remove-if-not
#'(lambda (cd) (equal (getf cd :artist) "Dixie Chicks")) *db*)
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T))
Now suppose you want to wrap that whole expression in a function that takes the name of the artist as an argument. You can write that like this:
(defun select-by-artist (artist)
(remove-if-not
#'(lambda (cd) (equal (getf cd :artist) artist))
*db*))
Note how the anonymous function, which contains code that won't run until it's invoked in REMOVE-IF-NOT
, can nonetheless refer to the variable artist
. In this case the anonymous function doesn't just save you from having to write a regular function—it lets you write a function that derives part of its meaning—the value of artist
—from the context in which it's embedded.
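You can see the same capture behavior in miniature. The name make-adder below is invented for illustration; each function it returns remembers the n it was created with:

```lisp
;; MAKE-ADDER returns a function; the lambda closes over N.
(defun make-adder (n)
  #'(lambda (x) (+ x n)))

(funcall (make-adder 5) 10)   ; ==> 15
(funcall (make-adder 100) 10) ; ==> 110
```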
So that's select-by-artist
. However, selecting by artist is only one of the kinds of queries you might like to support. You could write several more functions, such as select-by-title
, select-by-rating
, select-by-title-and-artist
, and so on. But they'd all be about the same except for the contents of the anonymous function. You can instead make a more general select
function that takes a function as an argument.
(defun select (selector-fn)
(remove-if-not selector-fn *db*))
So what happened to the #'
? Well, in this case you don't want REMOVE-IF-NOT
to use the function named selector-fn
. You want it to use the anonymous function that was passed as an argument to select
in the variable selector-fn
. The #' comes back, however, in the call to select.
CL-USER> (select #'(lambda (cd) (equal (getf cd :artist) "Dixie Chicks")))
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T))
But that's really quite gross-looking. Luckily, you can wrap up the creation of the anonymous function.
(defun artist-selector (artist)
#'(lambda (cd) (equal (getf cd :artist) artist)))
This is a function that returns a function and one that references a variable that—it seems—won't exist after artist-selector
returns.[29] It may seem odd now, but it actually works just the way you'd want—if you call artist-selector
with an argument of "Dixie Chicks"
, you get an anonymous function that matches CDs whose :artist
field is "Dixie Chicks"
, and if you call it with "Lyle Lovett"
, you get a different function that will match against an :artist
field of "Lyle Lovett"
. So now you can rewrite the call to select
like this:
CL-USER> (select (artist-selector "Dixie Chicks"))
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 9 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 8 :RIPPED T))
Now you just need some more functions to generate selectors. But just as you don't want to have to write select-by-title
, select-by-rating
, and so on, because they would all be quite similar, you're not going to want to write a bunch of nearly identical selector-function generators, one for each field. Why not write one general-purpose selector-function generator, a function that, depending on what arguments you pass it, will generate a selector function for different fields or maybe even a combination of fields? You can write such a function, but first you need a crash course in a feature called keyword parameters.
In the functions you've written so far, you've specified a simple list of parameters, which are bound to the corresponding arguments in the call to the function. For instance, the following function:
(defun foo (a b c) (list a b c))
has three parameters, a
, b
, and c
, and must be called with three arguments. But sometimes you may want to write a function that can be called with varying numbers of arguments. Keyword parameters are one way to achieve this. A version of foo
that uses keyword parameters might look like this:
(defun foo (&key a b c) (list a b c))
The only difference is the &key
at the beginning of the argument list. However, the calls to this new foo
will look quite different. These are all legal calls with the result to the right of the ==>:
(foo :a 1 :b 2 :c 3) ==> (1 2 3)
(foo :c 3 :b 2 :a 1) ==> (1 2 3)
(foo :a 1 :c 3) ==> (1 NIL 3)
(foo) ==> (NIL NIL NIL)
As these examples show, the variables a
, b
, and c
are bound to the values that follow the corresponding keyword. And if a particular keyword isn't present in the call, the corresponding variable is set to NIL
. I'm glossing over a bunch of details of how keyword parameters are specified and how they relate to other kinds of parameters, but you need to know one more detail.
Normally if a function is called with no argument for a particular keyword parameter, the parameter will have the value NIL
. However, sometimes you'll want to be able to distinguish between a NIL
that was explicitly passed as the argument to a keyword parameter and the default value NIL
. To allow this, when you specify a keyword parameter you can replace the simple name with a list consisting of the name of the parameter, a default value, and another parameter name, called a supplied-p parameter. The supplied-p parameter will be set to true or false depending on whether an argument was actually passed for that keyword parameter in a particular call to the function. Here's a version of foo
that uses this feature:
(defun foo (&key a (b 20) (c 30 c-p)) (list a b c c-p))
Now the same calls from earlier yield these results:
(foo :a 1 :b 2 :c 3) ==> (1 2 3 T)
(foo :c 3 :b 2 :a 1) ==> (1 2 3 T)
(foo :a 1 :c 3) ==> (1 20 3 T)
(foo) ==> (NIL 20 30 NIL)
The general selector-function generator, which you can call where
for reasons that will soon become apparent if you're familiar with SQL databases, is a function that takes four keyword parameters corresponding to the fields in our CD records and generates a selector function that selects any CDs that match all the values given to where
. For instance, it will let you say things like this:
(select (where :artist "Dixie Chicks"))
or this:
(select (where :rating 10 :ripped nil))
The function looks like this:
(defun where (&key title artist rating (ripped nil ripped-p))
#'(lambda (cd)
(and
(if title (equal (getf cd :title) title) t)
(if artist (equal (getf cd :artist) artist) t)
(if rating (equal (getf cd :rating) rating) t)
(if ripped-p (equal (getf cd :ripped) ripped) t))))
This function returns an anonymous function that returns the logical AND
of one clause per field in our CD records. Each clause checks if the appropriate argument was passed in and then either compares it to the value in the corresponding field in the CD record or returns t
, Lisp's version of truth, if the parameter wasn't passed in. Thus, the selector function will return t
only for CDs that match all the arguments passed to where
.[30] Note that you need to use a three-item list to specify the keyword parameter ripped
because you need to know whether the caller actually passed :ripped nil
, meaning, "Select CDs whose ripped field is nil," or whether they left out :ripped
altogether, meaning "I don't care what the value of the ripped field is."
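Since where returns an ordinary function, you can also try the selector it builds on a single record with FUNCALL. Here's a self-contained sketch (the records are made up, and the where definition is repeated from above so the snippet stands alone):

```lisp
(defun where (&key title artist rating (ripped nil ripped-p))
  #'(lambda (cd)
      (and
       (if title    (equal (getf cd :title)  title)  t)
       (if artist   (equal (getf cd :artist) artist) t)
       (if rating   (equal (getf cd :rating) rating) t)
       (if ripped-p (equal (getf cd :ripped) ripped) t))))

;; Applying the generated selector to single made-up records
(funcall (where :rating 9) '(:title "Home" :rating 9 :ripped t)) ; ==> T
(funcall (where :rating 9) '(:title "Fly"  :rating 8 :ripped t)) ; ==> NIL
```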
Updating Existing Records—Another Use for WHERE
Now that you've got nice generalized select
and where
functions, you're in a good position to write the next feature that every database needs—a way to update particular records. In SQL the update
command is used to update a set of records matching a particular where
clause. That seems like a good model, especially since you've already got a where
-clause generator. In fact, the update
function is mostly just the application of a few ideas you've already seen: using a passed-in selector function to choose the records to update and using keyword arguments to specify the values to change. The main new bit is the use of a function MAPCAR
that maps over a list, *db*
in this case, and returns a new list containing the results of calling a function on each item in the original list.
(defun update (selector-fn &key title artist rating (ripped nil ripped-p))
(setf *db*
(mapcar
#'(lambda (row)
(when (funcall selector-fn row)
(if title (setf (getf row :title) title))
(if artist (setf (getf row :artist) artist))
(if rating (setf (getf row :rating) rating))
(if ripped-p (setf (getf row :ripped) ripped)))
row) *db*)))
One other new bit here is the use of SETF
on a complex form such as (getf row :title)
. I'll discuss SETF
in greater detail in Chapter 6, but for now you just need to know that it's a general assignment operator that can be used to assign lots of "places" other than just variables. (It's a coincidence that SETF
and GETF
have such similar names—they don't have any special relationship.) For now it's enough to know that after (setf (getf row :title) title)
, the plist referenced by row will have the value of the variable title
following the property name :title
. With this update
function if you decide that you really dig the Dixie Chicks and that all their albums should go to 11, you can evaluate the following form:
CL-USER> (update (where :artist "Dixie Chicks") :rating 11)
NIL
And it is so.
CL-USER> (select (where :artist "Dixie Chicks"))
((:TITLE "Home" :ARTIST "Dixie Chicks" :RATING 11 :RIPPED T)
(:TITLE "Fly" :ARTIST "Dixie Chicks" :RATING 11 :RIPPED T))
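To see the SETF-on-a-place behavior in isolation, here's a minimal sketch with an invented variable name and made-up data:

```lisp
;; SETF with a GETF place updates the plist stored in *row*.
(defvar *row* (list :title "Home" :rating 9))
(setf (getf *row* :rating) 11)
(getf *row* :rating) ; ==> 11
```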
You can even more easily add a function to delete rows from the database.
(defun delete-rows (selector-fn)
(setf *db* (remove-if selector-fn *db*)))
The function REMOVE-IF
is the complement of REMOVE-IF-NOT
; it returns a list with all the elements that do match the predicate removed. Like REMOVE-IF-NOT
, it doesn't actually affect the list it's passed but by saving the result back into *db*
, delete-rows
[31] actually changes the contents of the database.[32]
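For instance, using the same predicate as the earlier REMOVE-IF-NOT example:

```lisp
;; REMOVE-IF keeps only the elements that do NOT satisfy the predicate.
(remove-if #'evenp '(1 2 3 4 5 6 7 8 9 10)) ; ==> (1 3 5 7 9)
```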
Removing Duplication and Winning Big
So far all the database code supporting insert, select, update, and delete, not to mention a command-line user interface for adding new records and dumping out the contents, is just a little more than 50 lines. Total.[33]
Yet there's still some annoying code duplication. And it turns out you can remove the duplication and make the code more flexible at the same time. The duplication I'm thinking of is in the where function. The body of the where
function is a bunch of clauses like this, one per field:
(if title (equal (getf cd :title) title) t)
Right now it's not so bad, but like all code duplication it has the same cost: if you want to change how it works, you have to change multiple copies. And if you change the fields in a CD, you'll have to add or remove clauses to where
. And update
suffers from the same kind of duplication. It's doubly annoying since the whole point of the where
function is to dynamically generate a bit of code that checks the values you care about; why should it have to do work at runtime checking whether title
was even passed in?
Imagine that you were trying to optimize this code and discovered that it was spending too much time checking whether title
and the rest of the keyword parameters to where
were even set.[34] If you really wanted to remove all those runtime checks, you could go through the program and find all the places you call where
and look at exactly what arguments you're passing. Then you could replace each call to where
with an anonymous function that does only the computation necessary. For instance, if you found this snippet of code:
(select (where :title "Give Us a Break" :ripped t))
you could change it to this:
(select
#'(lambda (cd)
(and (equal (getf cd :title) "Give Us a Break")
(equal (getf cd :ripped) t))))
Note that the anonymous function is different from the one that where
would have returned; you're not trying to save the call to where
but rather to provide a more efficient selector function. This anonymous function has clauses only for the fields that you actually care about at this call site, so it doesn't do any extra work the way a function returned by where
might.
You can probably imagine going through all your source code and fixing up all the calls to where
in this way. But you can probably also imagine that it would be a huge pain. If there were enough of them, and it was important enough, it might even be worthwhile to write some kind of preprocessor that converts where
calls to the code you'd write by hand.
The Lisp feature that makes this trivially easy is its macro system. I can't emphasize enough that the Common Lisp macro shares essentially nothing but the name with the text-based macros found in C and C++. Where the C preprocessor operates by textual substitution and understands almost nothing of the structure of C and C++, a Lisp macro is essentially a code generator that gets run for you automatically by the compiler.[35] When a Lisp expression contains a call to a macro, instead of evaluating the arguments and passing them to the function, the Lisp compiler passes the arguments, unevaluated, to the macro code, which returns a new Lisp expression that is then evaluated in place of the original macro call.
I'll start with a simple, and silly, example and then show how you can replace the where
function with a where
macro. Before I can write this example macro, I need to quickly introduce one new function: REVERSE
takes a list as an argument and returns a new list that is its reverse. So (reverse '(1 2 3))
evaluates to (3 2 1)
. Now let's create a macro.
(defmacro backwards (expr) (reverse expr))
The main syntactic difference between a function and a macro is that you define a macro with DEFMACRO
instead of DEFUN
. After that a macro definition consists of a name, just like a function, a parameter list, and a body of expressions, both also like a function. However, a macro has a totally different effect. You can use this macro as follows:
CL-USER> (backwards ("hello, world" t format))
hello, world
NIL
How did that work? When the REPL started to evaluate the backwards
expression, it recognized that backwards
is the name of a macro. So it left the expression ("hello, world" t format)
unevaluated, which is good because it isn't a legal Lisp form. It then passed that list to the backwards
code. The code in backwards
passed the list to REVERSE
, which returned the list (format t "hello, world")
. backwards
then passed that value back out to the REPL, which then evaluated it in place of the original expression.
The backwards
macro thus defines a new language that's a lot like Lisp—just backward—that you can drop into anytime simply by wrapping a backward Lisp expression in a call to the backwards
macro. And, in a compiled Lisp program, that new language is just as efficient as normal Lisp because all the macro code—the code that generates the new expression—runs at compile time. In other words, the compiler will generate exactly the same code whether you write (backwards ("hello, world" t format))
or (format t "hello, world")
.
So how does that help with the code duplication in where
? Well, you can write a macro that generates exactly the code you need for each particular call to where
. Again, the best approach is to build our code bottom up. In the hand-optimized selector function, you had an expression of the following form for each actual field referred to in the original call to where
:
(equal (getf cd field) value)
So let's write a function that, given the name of a field and a value, returns such an expression. Since an expression is just a list, you might think you could write something like this:
(defun make-comparison-expr (field value) ; wrong
(list equal (list getf cd field) value))
However, there's one trick here: as you know, when Lisp sees a simple name such as field
or value
other than as the first element of a list, it assumes it's the name of a variable and looks up its value. That's fine for field
and value
; it's exactly what you want. But it will treat equal
, getf
, and cd
the same way, which isn't what you want. However, you also know how to stop Lisp from evaluating a form: stick a single forward quote ('
) in front of it. So if you write make-comparison-expr
like this, it will do what you want:
(defun make-comparison-expr (field value)
(list 'equal (list 'getf 'cd field) value))
You can test it out in the REPL.
CL-USER> (make-comparison-expr :rating 10)
(EQUAL (GETF CD :RATING) 10)
CL-USER> (make-comparison-expr :title "Give Us a Break")
(EQUAL (GETF CD :TITLE) "Give Us a Break")
It turns out that there's an even better way to do it. What you'd really like is a way to write an expression that's mostly not evaluated and then have some way to pick out a few expressions that you do want evaluated. And, of course, there's just such a mechanism. A back quote (`
) before an expression stops evaluation just like a forward quote.
CL-USER> `(1 2 3)
(1 2 3)
CL-USER> '(1 2 3)
(1 2 3)
However, in a back-quoted expression, any subexpression that's preceded by a comma is evaluated. Notice the effect of the comma in the second expression:
`(1 2 (+ 1 2)) ==> (1 2 (+ 1 2))
`(1 2 ,(+ 1 2)) ==> (1 2 3)
Using a back quote, you can write make-comparison-expr
like this:
(defun make-comparison-expr (field value)
`(equal (getf cd ,field) ,value))
Now if you look back to the hand-optimized selector function, you can see that the body of the function consisted of one comparison expression per field/value pair, all wrapped in an AND
expression. Assume for the moment that you'll arrange for the arguments to the where
macro to be passed as a single list. You'll need a function that can take the elements of such a list pairwise and collect the results of calling make-comparison-expr
on each pair. To implement that function, you can dip into the bag of advanced Lisp tricks and pull out the mighty and powerful LOOP
macro.
(defun make-comparisons-list (fields)
(loop while fields
collecting (make-comparison-expr (pop fields) (pop fields))))
A full discussion of LOOP
will have to wait until Chapter 22; for now just note that this LOOP
expression does exactly what you need: it loops while there are elements left in the fields
list, popping off two at a time, passing them to make-comparison-expr
, and collecting the results to be returned at the end of the loop. The POP
macro performs the inverse operation of the PUSH
macro you used to add records to *db*
.
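A small sketch of PUSH and POP working as a pair (the variable name is invented):

```lisp
;; PUSH adds to the front of the list in *stack*; POP removes and
;; returns the front element.
(defvar *stack* ())
(push 1 *stack*)  ; *stack* is now (1)
(push 2 *stack*)  ; *stack* is now (2 1)
(pop *stack*)     ; ==> 2, leaving *stack* as (1)
```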
Now you just need to wrap up the list returned by make-comparisons-list
in an AND
and an anonymous function, which you can do in the where
macro itself. Using a back quote to make a template that you fill in by interpolating the value of make-comparisons-list
, it's trivial.
(defmacro where (&rest clauses)
`#'(lambda (cd) (and ,@(make-comparisons-list clauses))))
This macro uses a variant of ,
(namely, the ,@
) before the call to make-comparisons-list
. The ,@
"splices" the value of the following expression—which must evaluate to a list—into the enclosing list. You can see the difference between ,
and ,@
in the following two expressions:
`(and ,(list 1 2 3)) ==> (AND (1 2 3))
`(and ,@(list 1 2 3)) ==> (AND 1 2 3)
You can also use ,@
to splice into the middle of a list.
`(and ,@(list 1 2 3) 4) ==> (AND 1 2 3 4)
The other important feature of the where
macro is the use of &rest
in the argument list. Like &key
, &rest
modifies the way arguments are parsed. With a &rest
in its parameter list, a function or macro can take an arbitrary number of arguments, which are collected into a single list that becomes the value of the variable whose name follows the &rest
. So if you call where
like this:
(where :title "Give Us a Break" :ripped t)
the variable clauses
will contain the list.
(:title "Give Us a Break" :ripped t)
This list is passed to make-comparisons-list
, which returns a list of comparison expressions. You can see exactly what code a call to where
will generate using the function MACROEXPAND-1
. If you pass MACROEXPAND-1 a form representing a macro call, it will call the macro code with appropriate arguments and return the expansion. So you can check out the previous where
call like this:
CL-USER> (macroexpand-1 '(where :title "Give Us a Break" :ripped t))
#'(LAMBDA (CD)
(AND (EQUAL (GETF CD :TITLE) "Give Us a Break")
(EQUAL (GETF CD :RIPPED) T)))
T
Looks good. Let's try it for real.
CL-USER> (select (where :title "Give Us a Break" :ripped t))
((:TITLE "Give Us a Break" :ARTIST "Limpopo" :RATING 10 :RIPPED T))
It works. And the where
macro with its two helper functions is actually one line shorter than the old where
function. And it's more general in that it's no longer tied to the specific fields in our CD records.
Wrapping Up
Now, an interesting thing has happened. You removed duplication and made the code more efficient and more general at the same time. That's often the way it goes with a well-chosen macro. This makes sense because a macro is just another mechanism for creating abstractions—abstraction at the syntactic level, and abstractions are by definition more concise ways of expressing underlying generalities. Now the only code in the mini-database that's specific to CDs and the fields in them is in the make-cd
, prompt-for-cd
, and add-cd
functions. In fact, our new where
macro would work with any plist-based database.
However, this is still far from being a complete database. You can probably think of plenty of features to add, such as supporting multiple tables or more elaborate queries. In Chapter 27 we'll build an MP3 database that incorporates some of those features.
The point of this chapter was to give you a quick introduction to just a handful of Lisp's features and show how they're used to write code that's a bit more interesting than "hello, world." In the next chapter we'll begin a more systematic overview of Lisp.
4. Syntax and Semantics
After that whirlwind tour, we'll settle down for a few chapters to take a more systematic look at the features you've used so far. I'll start with an overview of the basic elements of Lisp's syntax and semantics, which means, of course, that I must first address that burning question. . .
What's with All the Parentheses?
Lisp's syntax is quite a bit different from the syntax of languages descended from Algol. The two most immediately obvious characteristics are the extensive use of parentheses and prefix notation. For whatever reason, a lot of folks are put off by this syntax. Lisp's detractors tend to describe the syntax as "weird" and "annoying." Lisp, they say, must stand for Lots of Irritating Superfluous Parentheses. Lisp folks, on the other hand, tend to consider Lisp's syntax one of its great virtues. How is it that what's so off-putting to one group is a source of delight to another?
I can't really make the complete case for Lisp's syntax until I've explained Lisp's macros a bit more thoroughly, but I can start with an historical tidbit that suggests it may be worth keeping an open mind: when John McCarthy first invented Lisp, he intended to implement a more Algol-like syntax, which he called M-expressions. However, he never got around to it. He explained why not in his article "History of Lisp."[36]
The project of defining M-expressions precisely and compiling them or at least translating them into S-expressions was neither finalized nor explicitly abandoned. It just receded into the indefinite future, and a new generation of programmers appeared who preferred [S-expressions] to any FORTRAN-like or ALGOL-like notation that could be devised.
In other words, the people who have actually used Lisp over the past 45 years have liked the syntax and have found that it makes the language more powerful. In the next few chapters, you'll begin to see why.
Breaking Open the Black Box
Before we look at the specifics of Lisp's syntax and semantics, it's worth taking a moment to look at how they're defined and how this differs from many other languages.
In most programming languages, the language processor—whether an interpreter or a compiler—operates as a black box: you shove a sequence of characters representing the text of a program into the black box, and it—depending on whether it's an interpreter or a compiler—either executes the behaviors indicated or produces a compiled version of the program that will execute the behaviors when it's run.
Inside the black box, of course, language processors are usually divided into subsystems that are each responsible for one part of the task of translating a program text into behavior or object code. A typical division is to split the processor into three phases, each of which feeds into the next: a lexical analyzer breaks up the stream of characters into tokens and feeds them to a parser that builds a tree representing the expressions in the program, according to the language's grammar. This tree—called an abstract syntax tree—is then fed to an evaluator that either interprets it directly or compiles it into some other language such as machine code. Because the language processor is a black box, the data structures used by the processor, such as the tokens and abstract syntax trees, are of interest only to the language implementer.
In Common Lisp things are sliced up a bit differently, with consequences for both the implementer and for how the language is defined. Instead of a single black box that goes from text to program behavior in one step, Common Lisp defines two black boxes, one that translates text into Lisp objects and another that implements the semantics of the language in terms of those objects. The first box is called the reader, and the second is called the evaluator.[37]
Each black box defines one level of syntax. The reader defines how strings of characters can be translated into Lisp objects called s-expressions.[38] Since the s-expression syntax includes syntax for lists of arbitrary objects, including other lists, s-expressions can represent arbitrary tree expressions, much like the abstract syntax tree generated by the parsers for non-Lisp languages.
The evaluator then defines a syntax of Lisp forms that can be built out of s-expressions. Not all s-expressions are legal Lisp forms any more than all sequences of characters are legal s-expressions. For instance, both (foo 1 2)
and ("foo" 1 2)
are s-expressions, but only the former can be a Lisp form since a list that starts with a string has no meaning as a Lisp form.
This split of the black box has a couple of consequences. One is that you can use s-expressions, as you saw in Chapter 3, as an externalizable data format for data other than source code, using READ
to read it and PRINT
to print it.[39] The other consequence is that since the semantics of the language are defined in terms of trees of objects rather than strings of characters, it's easier to generate code within the language than it would be if you had to generate code as text. Generating code completely from scratch is only marginally easier—building up lists vs. building up strings is about the same amount of work. The real win, however, is that you can generate code by manipulating existing data. This is the basis for Lisp's macros, which I'll discuss in much more detail in future chapters. For now I'll focus on the two levels of syntax defined by Common Lisp: the syntax of s-expressions understood by the reader and the syntax of Lisp forms understood by the evaluator.
S-expressions
The basic elements of s-expressions are lists and atoms. Lists are delimited by parentheses and can contain any number of whitespace-separated elements. Atoms are everything else.[40] The elements of lists are themselves s-expressions (in other words, atoms or nested lists). Comments—which aren't, technically speaking, s-expressions—start with a semicolon, extend to the end of a line, and are treated essentially like whitespace.
And that's pretty much it. Since lists are syntactically so trivial, the only remaining syntactic rules you need to know are those governing the form of different kinds of atoms. In this section I'll describe the rules for the most commonly used kinds of atoms: numbers, strings, and names. After that, I'll cover how s-expressions composed of these elements can be evaluated as Lisp forms.
Numbers are fairly straightforward: any sequence of digits—possibly prefaced with a sign (+
or -
), containing a decimal point (.
) or a solidus (/
), or ending with an exponent marker—is read as a number. For example:
123 ; the integer one hundred twenty-three
3/7 ; the ratio three-sevenths
1.0 ; the floating-point number one in default precision
1.0e0 ; another way to write the same floating-point number
1.0d0 ; the floating-point number one in "double" precision
1.0e-4 ; the floating-point equivalent to one-ten-thousandth
+42 ; the integer forty-two
-42 ; the integer negative forty-two
-1/4 ; the ratio negative one-quarter
-2/8 ; another way to write negative one-quarter
246/2 ; another way to write the integer one hundred twenty-three
These different forms represent different kinds of numbers: integers, ratios, and floating point. Lisp also supports complex numbers, which have their own notation and which I'll discuss in Chapter 10.
As some of these examples suggest, you can notate the same number in many ways. But regardless of how you write them, all rationals—integers and ratios—are represented internally in "simplified" form. In other words, the objects that represent -2/8 or 246/2 aren't distinct from the objects that represent -1/4 and 123. Similarly, 1.0 and 1.0e0 are just different ways of writing the same number. On the other hand, 1.0, 1.0d0, and 1 can all denote different objects because the different floating-point representations and integers are different types. We'll save the details about the characteristics of different kinds of numbers for Chapter 10.
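These representation rules are easy to observe at the REPL, since the printer prints the object the reader actually built (a quick sketch):

```lisp
-2/8   ; ==> -1/4 -- the reader simplifies ratios as it reads them
246/2  ; ==> 123  -- this ratio notation actually denotes an integer
1.0e0  ; ==> 1.0  -- the same default-precision float, printed canonically
```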
String literals, as you saw in the previous chapter, are enclosed in double quotes. Within a string a backslash (\) escapes the next character, causing it to be included in the string regardless of what it is. The only two characters that must be escaped within a string are double quotes and the backslash itself. All other characters can be included in a string literal without escaping, regardless of their meaning outside a string. Some example string literals are as follows:
"foo" ; the string containing the characters f, o, and o.
"fo\o" ; the same string
"fo\\o" ; the string containing the characters f, o, \, and o.
"fo\"o" ; the string containing the characters f, o, ", and o.
Names used in Lisp programs, such as FORMAT, hello-world, and *db*, are represented by objects called symbols. The reader knows nothing about how a given name is going to be used—whether it's the name of a variable, a function, or something else. It just reads a sequence of characters and builds an object to represent the name.[41] Almost any character can appear in a name. Whitespace characters can't, though, because the elements of lists are separated by whitespace. Digits can appear in names as long as the name as a whole can't be interpreted as a number. Similarly, names can contain periods, but the reader can't read a name that consists only of periods. Ten characters that serve other syntactic purposes can't appear in names: open and close parentheses, double and single quotes, backtick, comma, colon, semicolon, backslash, and vertical bar. Even those characters can appear, though, if you're willing to escape them by preceding the character to be escaped with a backslash or by surrounding the part of the name containing characters that need escaping with vertical bars.
Two important characteristics of the way the reader translates names to symbol objects have to do with how it treats the case of letters in names and how it ensures that the same name is always read as the same symbol. While reading names, the reader converts all unescaped characters in a name to their uppercase equivalents. Thus, the reader will read foo, Foo, and FOO as the same symbol: FOO. However, \f\o\o and |foo| will both be read as foo, which is a different object than the symbol FOO. This is why, when you define a function at the REPL and it prints the name of the function, the name has been converted to uppercase. Standard style, these days, is to write code in all lowercase and let the reader change names to uppercase.[42]
To ensure that the same textual name is always read as the same symbol, the reader interns symbols—after it has read the name and converted it to all uppercase, the reader looks in a table called a package for an existing symbol with the same name. If it can't find one, it creates a new symbol and adds it to the table. Otherwise, it returns the symbol already in the table. Thus, anywhere the same name appears in any s-expression, the same object will be used to represent it.[43]
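The effect of interning is easy to observe at the REPL (a small sketch using EQ, the object-identity test discussed later in this chapter):

```lisp
(eq 'foo 'FOO)   ; ==> T   -- both names convert to FOO and intern to one symbol
(eq 'foo '|foo|) ; ==> NIL -- the vertical bars suppress case conversion,
                 ;           so |foo| is a different symbol than FOO
```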
Because names can contain many more characters in Lisp than they can in Algol-derived languages, certain naming conventions are distinct to Lisp, such as the use of hyphenated names like hello-world. Another important convention is that global variables are given names that start and end with *. Similarly, constants are given names starting and ending in +. And some programmers will name particularly low-level functions with names that start with % or even %%. The names defined in the language standard use only the alphabetic characters (A-Z) plus *, +, -, /, 1, 2, <, =, >, and &.
The syntax for lists, numbers, strings, and symbols can describe a good percentage of Lisp programs. Other rules describe notations for literal vectors, individual characters, and arrays, which I'll cover when I talk about the associated data types in Chapters 10 and 11. For now the key thing to understand is how you can combine numbers, strings, and symbols with parentheses-delimited lists to build s-expressions representing arbitrary trees of objects. Some simple examples look like this:
x ; the symbol X
() ; the empty list
(1 2 3) ; a list of three numbers
("foo" "bar") ; a list of two strings
(x y z) ; a list of three symbols
(x 1 "foo") ; a list of a symbol, a number, and a string
(+ (* 2 3) 4) ; a list of a symbol, a list, and a number.
An only slightly more complex example is the following four-item list that contains two symbols, the empty list, and another list, itself containing two symbols and a string:
(defun hello-world ()
  (format t "hello, world"))
S-expressions As Lisp Forms
After the reader has translated a bunch of text into s-expressions, the s-expressions can then be evaluated as Lisp code. Or some of them can—not every s-expression that the reader can read can necessarily be evaluated as Lisp code. Common Lisp's evaluation rule defines a second level of syntax that determines which s-expressions can be treated as Lisp forms.[44] The syntactic rules at this level are quite simple. Any atom—any nonlist or the empty list—is a legal Lisp form, as is any list that has a symbol as its first element.[45]
Of course, the interesting thing about Lisp forms isn't their syntax but how they're evaluated. For purposes of discussion, you can think of the evaluator as a function that takes as an argument a syntactically well-formed Lisp form and returns a value, which we can call the value of the form. Of course, when the evaluator is a compiler, this is a bit of a simplification—in that case, the evaluator is given an expression and generates code that will compute the appropriate value when it's run. But this simplification lets me describe the semantics of Common Lisp in terms of how the different kinds of Lisp forms are evaluated by this notional function.
The simplest Lisp forms, atoms, can be divided into two categories: symbols and everything else. A symbol, evaluated as a form, is considered the name of a variable and evaluates to the current value of the variable.[46] I'll discuss in Chapter 6 how variables get their values in the first place. You should also note that certain "variables" are that old oxymoron of programming: "constant variables." For instance, the symbol PI names a constant variable whose value is the best possible floating-point approximation to the mathematical constant pi.
All other atoms—numbers and strings are the kinds you've seen so far—are self-evaluating objects. This means when such an expression is passed to the notional evaluation function, it's simply returned. You saw examples of self-evaluating objects in Chapter 2 when you typed 10 and "hello, world" at the REPL.
It's also possible for symbols to be self-evaluating in the sense that the variables they name can be assigned the value of the symbol itself. Two important constants that are defined this way are T and NIL, the canonical true and false values. I'll discuss their role as booleans in the section "Truth, Falsehood, and Equality."
Another class of self-evaluating symbols are the keyword symbols—symbols whose names start with :. When the reader interns such a name, it automatically defines a constant variable with the name and with the symbol as the value.
Things get more interesting when we consider how lists are evaluated. All legal list forms start with a symbol, but three kinds of list forms are evaluated in three quite different ways. To determine what kind of form a given list is, the evaluator must determine whether the symbol that starts the list is the name of a function, a macro, or a special operator. If the symbol hasn't been defined yet—as may be the case if you're compiling code that contains references to functions that will be defined later—it's assumed to be a function name.[47] I'll refer to the three kinds of forms as function call forms, macro forms, and special forms.
Function Calls
The evaluation rule for function call forms is simple: evaluate the remaining elements of the list as Lisp forms and pass the resulting values to the named function. This rule obviously places some additional syntactic constraints on a function call form: all the elements of the list after the first must themselves be well-formed Lisp forms. In other words, the basic syntax of a function call form is as follows, where each of the arguments is itself a Lisp form:
(function-name argument*)
Thus, the following expression is evaluated by first evaluating 1, then evaluating 2, and then passing the resulting values to the + function, which returns 3:
(+ 1 2)
A more complex expression such as the following is evaluated in similar fashion except that evaluating the arguments (+ 1 2) and (- 3 4) entails first evaluating their arguments and applying the appropriate functions to them:
(* (+ 1 2) (- 3 4))
Eventually, the values 3 and -1 are passed to the * function, which returns -3.
As these examples show, functions are used for many of the things that require special syntax in other languages. This helps keep Lisp's syntax regular.
Special Operators
That said, not all operations can be defined as functions. Because all the arguments to a function are evaluated before the function is called, there's no way to write a function that behaves like the IF operator you used in Chapter 3. To see why, consider this form:
(if x (format t "yes") (format t "no"))
If IF were a function, the evaluator would evaluate the argument expressions from left to right. The symbol x would be evaluated as a variable, yielding some value; then (format t "yes") would be evaluated as a function call, yielding NIL after printing "yes" to standard output. Then (format t "no") would be evaluated, printing "no" and also yielding NIL. Only after all three expressions were evaluated would the resulting values be passed to IF, too late for it to control which of the two FORMAT expressions gets evaluated.
To solve this problem, Common Lisp defines a couple dozen so-called special operators, IF being one, that do things that functions can't do. There are 25 in all, but only a small handful are used directly in day-to-day programming.[48]
When the first element of a list is a symbol naming a special operator, the rest of the expressions are evaluated according to the rule for that operator.
The rule for IF is pretty easy: evaluate the first expression. If it evaluates to non-NIL, then evaluate the next expression and return its value. Otherwise, return the value of evaluating the third expression or NIL if the third expression is omitted. In other words, the basic form of an IF expression is as follows:
(if test-form then-form [ else-form ])
The test-form will always be evaluated and then, depending on its value, either the then-form or the else-form.
An even simpler special operator is QUOTE, which takes a single expression as its "argument" and simply returns it, unevaluated. For instance, the following evaluates to the list (+ 1 2), not the value 3:
(quote (+ 1 2))
There's nothing special about this list; you can manipulate it just like any list you could create with the LIST function.[49]
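For instance, you can pick it apart with FIRST and REST, the standard list accessors (a quick sketch):

```lisp
(first '(+ 1 2)) ; ==> +     -- the symbol +, not a function call
(rest '(+ 1 2))  ; ==> (1 2) -- the remaining elements, as a list
```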
QUOTE is used commonly enough that a special syntax for it is built into the reader. Instead of writing the following:
(quote (+ 1 2))
you can write this:
'(+ 1 2)
This syntax is a small extension of the s-expression syntax understood by the reader. From the point of view of the evaluator, both those expressions will look the same: a list whose first element is the symbol QUOTE and whose second element is the list (+ 1 2).[50]
In general, the special operators implement features of the language that require some special processing by the evaluator. For instance, several special operators manipulate the environment in which other forms will be evaluated. One of these, which I'll discuss in detail in Chapter 6, is LET, which is used to create new variable bindings. The following form evaluates to 10 because the second x is evaluated in an environment where it's the name of a variable established by the LET with the value 10:
(let ((x 10)) x)
Macros
While special operators extend the syntax of Common Lisp beyond what can be expressed with just function calls, the set of special operators is fixed by the language standard. Macros, on the other hand, give users of the language a way to extend its syntax. As you saw in Chapter 3, a macro is a function that takes s-expressions as arguments and returns a Lisp form that's then evaluated in place of the macro form. The evaluation of a macro form proceeds in two phases: First, the elements of the macro form are passed, unevaluated, to the macro function. Second, the form returned by the macro function—called its expansion—is evaluated according to the normal evaluation rules.
It's important to keep the two phases of evaluating a macro form clear in your mind. It's easy to lose track when you're typing expressions at the REPL because the two phases happen one after another and the value of the second phase is immediately returned. But when Lisp code is compiled, the two phases happen at completely different times, so it's important to keep clear what's happening when. For instance, when you compile a whole file of source code with the function COMPILE-FILE, all the macro forms in the file are recursively expanded until the code consists of nothing but function call forms and special forms. This macroless code is then compiled into a FASL file that the LOAD function knows how to load. The compiled code, however, isn't executed until the file is loaded. Because macros generate their expansion at compile time, they can do relatively large amounts of work generating their expansion without having to pay for it when the file is loaded or the functions defined in the file are called.
Since the evaluator doesn't evaluate the elements of the macro form before passing them to the macro function, they don't need to be well-formed Lisp forms. Each macro assigns a meaning to the s-expressions in the macro form by virtue of how it uses them to generate its expansion. In other words, each macro defines its own local syntax. For instance, the backwards macro from Chapter 3 defines a syntax in which an expression is a legal backwards form if it's a list that's the reverse of a legal Lisp form.
I'll talk quite a bit more about macros throughout this book. For now the important thing for you to realize is that macros—while syntactically similar to function calls—serve quite a different purpose, providing a hook into the compiler.[51]
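The two phases are easy to see with the backwards macro and MACROEXPAND-1, a standard function that performs just the first phase and shows you the expansion:

```lisp
(defmacro backwards (expr)
  (reverse expr))

;; Phase 1: the macro function receives the unevaluated list
;; ("hello" t format) and returns its reverse, (format t "hello").
(macroexpand-1 '(backwards ("hello" t format))) ; ==> (FORMAT T "hello")

;; Phase 2: that expansion is then evaluated by the normal rules,
;; printing "hello" to standard output.
(backwards ("hello" t format))
```

Note that ("hello" t format) is not itself a legal Lisp form—only the expansion is, which is exactly the point about macros defining their own local syntax.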
Truth, Falsehood, and Equality
Two last bits of basic knowledge you need to get under your belt are Common Lisp's notion of truth and falsehood and what it means for two Lisp objects to be "equal." Truth and falsehood are—in this realm—straightforward: the symbol NIL is the only false value, and everything else is true. The symbol T is the canonical true value and can be used when you need to return a non-NIL value and don't have anything else handy. The only tricky thing about NIL is that it's the only object that's both an atom and a list: in addition to falsehood, it's also used to represent the empty list.[52] This equivalence between NIL and the empty list is built into the reader: if the reader sees (), it reads it as the symbol NIL. They're completely interchangeable. And because NIL, as I mentioned previously, is the name of a constant variable with the symbol NIL as its value, the expressions nil, (), 'nil, and '() all evaluate to the same thing—the unquoted forms are evaluated as a reference to the constant variable whose value is the symbol NIL, but in the quoted forms the QUOTE special operator evaluates to the symbol directly. For the same reason, both t and 't will evaluate to the same thing: the symbol T.
Using phrases such as "the same thing" of course begs the question of what it means for two values to be "the same." As you'll see in future chapters, Common Lisp provides a number of type-specific equality predicates: = is used to compare numbers, CHAR= to compare characters, and so on. In this section I'll discuss the four "generic" equality predicates—functions that can be passed any two Lisp objects and will return true if they're equivalent and false otherwise. They are, in order of discrimination, EQ, EQL, EQUAL, and EQUALP.
EQ tests for "object identity"—two objects are EQ if they're identical. Unfortunately, the object identity of numbers and characters depends on how those data types are implemented in a particular Lisp. Thus, EQ may consider two numbers or two characters with the same value to be equivalent, or it may not. Implementations have enough leeway that the expression (eq 3 3) can legally evaluate to either true or false. More to the point, (eq x x) can evaluate to either true or false if the value of x happens to be a number or character.
Thus, you should never use EQ to compare values that may be numbers or characters. It may seem to work in a predictable way for certain values in a particular implementation, but you have no guarantee that it will work the same way if you switch implementations. And switching implementations may mean simply upgrading your implementation to a new version—if your Lisp implementer changes how they represent numbers or characters, the behavior of EQ could very well change as well.
Thus, Common Lisp defines EQL to behave like EQ except that it also is guaranteed to consider two objects of the same class representing the same numeric or character value to be equivalent. Thus, (eql 1 1) is guaranteed to be true. And (eql 1 1.0) is guaranteed to be false since the integer value 1 and the floating-point value 1.0 are instances of different classes.
There are two schools of thought about when to use EQ and when to use EQL: The "use EQ when possible" camp argues you should use EQ when you know you aren't going to be comparing numbers or characters because (a) it's a way to indicate that you aren't going to be comparing numbers or characters and (b) it will be marginally more efficient since EQ doesn't have to check whether its arguments are numbers or characters.
The "always use EQL" camp says you should never use EQ because (a) the potential gain in clarity is lost because every time someone reading your code—including you—sees an EQ, they have to stop and check whether it's being used correctly (in other words, that it's never going to be called upon to compare numbers or characters) and (b) the efficiency difference between EQ and EQL is in the noise compared to real performance bottlenecks.
The code in this book is written in the "always use EQL" style.[53]
The other two equality predicates, EQUAL and EQUALP, are general in the sense that they can operate on all types of objects, but they're much less fundamental than EQ or EQL. They each define a slightly less discriminating notion of equivalence than EQL, allowing different objects to be considered equivalent. There's nothing special about the particular notions of equivalence these functions implement except that they've been found to be handy by Lisp programmers in the past. If these predicates don't suit your needs, you can always define your own predicate function that compares different types of objects in the way you need.
EQUAL loosens the discrimination of EQL to consider lists equivalent if they have the same structure and contents, recursively, according to EQUAL. EQUAL also considers strings equivalent if they contain the same characters. It also defines a looser definition of equivalence than EQL for bit vectors and pathnames, two data types I'll discuss in future chapters. For all other types, it falls back on EQL.
EQUALP is similar to EQUAL except it's even less discriminating. It considers two strings equivalent if they contain the same characters, ignoring differences in case. It also considers two characters equivalent if they differ only in case. Numbers are equivalent under EQUALP if they represent the same mathematical value. Thus, (equalp 1 1.0) is true. Lists with EQUALP elements are EQUALP; likewise, arrays with EQUALP elements are EQUALP. As with EQUAL, there are a few other data types that I haven't covered yet for which EQUALP can consider two objects equivalent that neither EQL nor EQUAL will. For all other data types, EQUALP falls back on EQL.
Formatting Lisp Code
While code formatting is, strictly speaking, neither a syntactic nor a semantic matter, proper formatting is important to reading and writing code fluently and idiomatically. The key to formatting Lisp code is to indent it properly. The indentation should reflect the structure of the code so that you don't need to count parentheses to see what goes with what. In general, each new level of nesting gets indented a bit more, and, if line breaks are necessary, items at the same level of nesting are lined up. Thus, a function call that needs to be broken up across multiple lines might be written like this:
(some-function arg-with-a-long-name
               another-arg-with-an-even-longer-name)
Macro and special forms that implement control constructs are typically indented a little differently: the "body" elements are indented two spaces relative to the opening parenthesis of the form. Thus:
(defun print-list (list)
  (dolist (i list)
    (format t "item: ~a~%" i)))
However, you don't need to worry too much about these rules because a proper Lisp environment such as SLIME will take care of it for you. In fact, one of the advantages of Lisp's regular syntax is that it's fairly easy for software such as editors to know how to indent it. Since the indentation is supposed to reflect the structure of the code and the structure is marked by parentheses, it's easy to let the editor indent your code for you.
In SLIME, hitting Tab at the beginning of each line will cause it to be indented appropriately, or you can re-indent a whole expression by positioning the cursor on the opening parenthesis and typing C-M-q. Or you can re-indent the whole body of a function from anywhere within it by typing C-c M-q.
Indeed, experienced Lisp programmers tend to rely on their editor handling indenting automatically, not just to make their code look nice but to detect typos: once you get used to how code is supposed to be indented, a misplaced parenthesis will be instantly recognizable by the weird indentation your editor gives you. For example, suppose you were writing a function that was supposed to look like this:
(defun foo ()
  (if (test)
      (do-one-thing)
      (do-another-thing)))
Now suppose you accidentally left off the closing parenthesis after test. Because you don't bother counting parentheses, you quite likely would have added an extra parenthesis at the end of the DEFUN form, giving you this code:
(defun foo ()
  (if (test
      (do-one-thing)
      (do-another-thing))))
However, if you had been indenting by hitting Tab at the beginning of each line, you wouldn't have code like that. Instead you'd have this:
(defun foo ()
  (if (test
       (do-one-thing)
       (do-another-thing))))
Seeing the then and else clauses indented way out under the condition rather than just indented slightly relative to the IF shows you immediately that something is awry.
Another important formatting rule is that closing parentheses are always put on the same line as the last element of the list they're closing. That is, don't write this:
(defun foo ()
  (dotimes (i 10)
    (format t "~d. hello~%" i)
  )
)
but instead write this:
(defun foo ()
  (dotimes (i 10)
    (format t "~d. hello~%" i)))
The string of )))s at the end may seem forbidding, but as long as your code is properly indented the parentheses should fade away—no need to give them undue prominence by spreading them across several lines.
Finally, comments should be prefaced with one to four semicolons depending on the scope of the comment as follows:
;;;; Four semicolons are used for a file header comment.

;;; A comment with three semicolons will usually be a paragraph
;;; comment that applies to a large section of code that follows.

(defun foo (x)
  (dotimes (i x)
    ;; Two semicolons indicate this comment applies to the code
    ;; that follows. Note that this comment is indented the same
    ;; as the code that follows.
    (some-function-call)
    (another i)   ; this comment applies to this line only
    (and-another) ; and this is for this line
    (baz)))
Now you're ready to start looking in greater detail at the major building blocks of Lisp programs: functions, variables, and macros. Up next: functions.
5. Functions
After the rules of syntax and semantics, the three most basic components of all Lisp programs are functions, variables, and macros. You used all three while building the database in Chapter 3, but I glossed over a lot of the details of how they work and how to best use them. I'll devote the next few chapters to these three topics, starting with functions, which—like their counterparts in other languages—provide the basic mechanism for abstracting, well, functionality.
The bulk of Lisp itself consists of functions. More than three quarters of the names defined in the language standard name functions. All the built-in data types are defined purely in terms of what functions operate on them. Even Lisp's powerful object system is built upon a conceptual extension to functions, generic functions, which I'll cover in Chapter 16.
And, despite the importance of macros to The Lisp Way, in the end all real functionality is provided by functions. Macros run at compile time, so the code they generate—the code that will actually make up the program after all the macros are expanded—will consist entirely of calls to functions and special operators. Not to mention, macros themselves are also functions, albeit functions that are used to generate code rather than to perform the actions of the program.[54]
Defining New Functions
Normally functions are defined using the DEFUN macro. The basic skeleton of a DEFUN looks like this:
(defun name (parameter*)
  "Optional documentation string."
  body-form*)
Any symbol can be used as a function name.[55] Usually function names contain only alphabetic characters and hyphens, but other characters are allowed and are used in certain naming conventions. For instance, functions that convert one kind of value to another sometimes use -> in the name. For example, a function to convert strings to widgets might be called string->widget. The most important naming convention is the one mentioned in Chapter 2, which is that you construct compound names with hyphens rather than underscores or inner caps. Thus, frob-widget is better Lisp style than either frob_widget or frobWidget.
A function's parameter list defines the variables that will be used to hold the arguments passed to the function when it's called.[56] If the function takes no arguments, the list is empty, written as (). Different flavors of parameters handle required, optional, multiple, and keyword arguments. I'll discuss the details in the next section.
If a string literal follows the parameter list, it's a documentation string that should describe the purpose of the function. When the function is defined, the documentation string will be associated with the name of the function and can later be obtained using the DOCUMENTATION function.[57]
Finally, the body of a DEFUN consists of any number of Lisp expressions. They will be evaluated in order when the function is called, and the value of the last expression is returned as the value of the function. Or the RETURN-FROM special operator can be used to return immediately from anywhere in a function, as I'll discuss in a moment.
In Chapter 2 we wrote a hello-world function, which looked like this:
(defun hello-world () (format t "hello, world"))
You can now analyze the parts of this function. Its name is hello-world, its parameter list is empty so it takes no arguments, it has no documentation string, and its body consists of one expression:

(format t "hello, world")
The following is a slightly more complex function:
(defun verbose-sum (x y)
  "Sum any two numbers after printing a message."
  (format t "Summing ~d and ~d.~%" x y)
  (+ x y))
This function is named verbose-sum, takes two arguments that will be bound to the parameters x and y, has a documentation string, and has a body consisting of two expressions. The value returned by the call to + becomes the return value of verbose-sum.
Function Parameter Lists
There's not a lot more to say about function names or documentation strings, and it will take a good portion of the rest of this book to describe all the things you can do in the body of a function, which leaves us with the parameter list.
The basic purpose of a parameter list is, of course, to declare the variables that will receive the arguments passed to the function. When a parameter list is a simple list of variable names—as in verbose-sum—the parameters are called required parameters. When a function is called, it must be supplied with one argument for every required parameter. Each parameter is bound to the corresponding argument. If a function is called with too few or too many arguments, Lisp will signal an error.
However, Common Lisp's parameter lists also give you more flexible ways of mapping the arguments in a function call to the function's parameters. In addition to required parameters, a function can have optional parameters. Or a function can have a single parameter that's bound to a list containing any extra arguments. And, finally, arguments can be mapped to parameters using keywords rather than position. Thus, Common Lisp's parameter lists provide a convenient solution to several common coding problems.
Optional Parameters
While many functions, like verbose-sum, need only required parameters, not all functions are quite so simple. Sometimes a function will have a parameter that only certain callers will care about, perhaps because there's a reasonable default value. An example is a function that creates a data structure that can grow as needed. Since the data structure can grow, it doesn't matter—from a correctness point of view—what the initial size is. But callers who have a good idea how many items they're going to put into the data structure may be able to improve performance by specifying a specific initial size. Most callers, though, would probably rather let the code that implements the data structure pick a good general-purpose value. In Common Lisp you can accommodate both kinds of callers by using an optional parameter; callers who don't care will get a reasonable default, and other callers can provide a specific value.[58]
To define a function with optional parameters, after the names of any required parameters, place the symbol &optional followed by the names of the optional parameters. A simple example looks like this:
(defun foo (a b &optional c d) (list a b c d))
When the function is called, arguments are first bound to the required parameters. After all the required parameters have been given values, if there are any arguments left, their values are assigned to the optional parameters. If the arguments run out before the optional parameters do, the remaining optional parameters are bound to the value NIL. Thus, the function defined previously gives the following results:
(foo 1 2) ==> (1 2 NIL NIL)
(foo 1 2 3) ==> (1 2 3 NIL)
(foo 1 2 3 4) ==> (1 2 3 4)
Lisp will still check that an appropriate number of arguments are passed to the function—in this case between two and four, inclusive—and will signal an error if the function is called with too few or too many.
Of course, you'll often want a different default value than NIL. You can specify the default value by replacing the parameter name with a list containing a name and an expression. The expression will be evaluated only if the caller doesn't pass enough arguments to provide a value for the optional parameter. The common case is simply to provide a value as the expression:
(defun foo (a &optional (b 10)) (list a b))
This function requires one argument that will be bound to the parameter a
. The second parameter, b
, will take either the value of the second argument, if there is one, or 10.
(foo 1 2) ==> (1 2)
(foo 1) ==> (1 10)
Sometimes, however, you may need more flexibility in choosing the default value. You may want to compute a default value based on other parameters. And you can—the default-value expression can refer to parameters that occur earlier in the parameter list. If you were writing a function that returned some sort of representation of a rectangle and you wanted to make it especially convenient to make squares, you might use an argument list like this:
(defun make-rectangle (width &optional (height width)) ...)
which would cause the height
parameter to take the same value as the width
parameter unless explicitly specified.
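The growable data structure mentioned at the start of this section can now be sketched using Common Lisp's adjustable arrays. The name make-buffer and the default size of 16 are illustrative choices, not part of any standard API:

```lisp
;; A growable vector. Callers who know roughly how many items
;; they'll add can pass an initial size to avoid reallocation;
;; everyone else gets a reasonable general-purpose default.
(defun make-buffer (&optional (size 16))
  (make-array size :adjustable t :fill-pointer 0))

(make-buffer)      ; default initial size of 16
(make-buffer 1000) ; caller expects to store many items
```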
Occasionally, it's useful to know whether the value of an optional argument was supplied by the caller or is the default value. Rather than writing code to check whether the value of the parameter is the default (which doesn't work anyway, if the caller happens to explicitly pass the default value), you can add another variable name to the parameter specifier after the default-value expression. This variable will be bound to true if the caller actually supplied an argument for this parameter and NIL
otherwise. By convention, these variables are usually named the same as the actual parameter with a "-supplied-p" on the end. For example:
(defun foo (a b &optional (c 3 c-supplied-p))
(list a b c c-supplied-p))
This gives results like this:
(foo 1 2) ==> (1 2 3 NIL)
(foo 1 2 3) ==> (1 2 3 T)
(foo 1 2 4) ==> (1 2 4 T)
Rest Parameters
Optional parameters are just the thing when you have discrete parameters for which the caller may or may not want to provide values. But some functions need to take a variable number of arguments. Several of the built-in functions you've seen already work this way. FORMAT
has two required arguments, the stream and the control string. But after that it needs a variable number of arguments depending on how many values need to be interpolated into the control string. The +
function also takes a variable number of arguments—there's no particular reason to limit it to summing just two numbers; it will sum any number of values. (It even works with zero arguments, returning 0, the identity under addition.) The following are all legal calls of those two functions:
(format t "hello, world")
(format t "hello, ~a" name)
(format t "x: ~d y: ~d" x y)
(+)
(+ 1)
(+ 1 2)
(+ 1 2 3)
Obviously, you could write functions taking a variable number of arguments by simply giving them a lot of optional parameters. But that would be incredibly painful—just writing the parameter list would be bad enough, and that doesn't get into dealing with all the parameters in the body of the function. To do it properly, you'd have to have as many optional parameters as the number of arguments that can legally be passed in a function call. This number is implementation dependent but guaranteed to be at least 50. And in current implementations it ranges from 4,096 to 536,870,911.[59] Blech. That kind of mind-bending tedium is definitely not The Lisp Way.
Instead, Lisp lets you include a catchall parameter after the symbol &rest
. If a function includes a &rest
parameter, any arguments remaining after values have been doled out to all the required and optional parameters are gathered up into a list that becomes the value of the &rest
parameter. Thus, the parameter lists for FORMAT
and +
probably look something like this:
(defun format (stream string &rest values) ...)
(defun + (&rest numbers) ...)
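You can define your own variadic functions the same way. As a sketch (average is not a built-in Common Lisp function), here's one that uses a &rest parameter to accept any number of numbers:

```lisp
;; The &rest parameter numbers is bound to a list of all the
;; arguments, so the body can treat them with list functions.
;; Unlike +, this function needs at least one argument to avoid
;; dividing by zero.
(defun average (&rest numbers)
  (/ (reduce #'+ numbers) (length numbers)))

(average 1 2 3) ; ==> 2
```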
Keyword Parameters
Optional and rest parameters give you quite a bit of flexibility, but neither is going to help you out much in the following situation: Suppose you have a function that takes four optional parameters. Now suppose that in most of the places the function is called, the caller wants to provide a value for only one of the four parameters and, further, that the callers are evenly divided as to which parameter they will use.
The callers who want to provide a value for the first parameter are fine—they just pass the one optional argument and leave off the rest. But all the other callers have to pass some value for between one and three arguments they don't care about. Isn't that exactly the problem optional parameters were designed to solve?
Of course it is. The problem is that optional parameters are still positional—if the caller wants to pass an explicit value for the fourth optional parameter, it turns the first three optional parameters into required parameters for that caller. Luckily, another parameter flavor, keyword parameters, allows the caller to specify which values go with which parameters.
To give a function keyword parameters, after any required, &optional
, and &rest
parameters you include the symbol &key
and then any number of keyword parameter specifiers, which work like optional parameter specifiers. Here's a function that has only keyword parameters:
(defun foo (&key a b c) (list a b c))
When this function is called, each keyword parameter is bound to the value immediately following a keyword of the same name. Recall from Chapter 4 that keywords are names that start with a colon and that they're automatically defined as self-evaluating constants.
If a given keyword doesn't appear in the argument list, then the corresponding parameter is assigned its default value, just like an optional parameter. Because the keyword arguments are labeled, they can be passed in any order as long as they follow any required arguments. For instance, foo
can be invoked as follows:
(foo) ==> (NIL NIL NIL)
(foo :a 1) ==> (1 NIL NIL)
(foo :b 1) ==> (NIL 1 NIL)
(foo :c 1) ==> (NIL NIL 1)
(foo :a 1 :c 3) ==> (1 NIL 3)
(foo :a 1 :b 2 :c 3) ==> (1 2 3)
(foo :a 1 :c 3 :b 2) ==> (1 2 3)
As with optional parameters, keyword parameters can provide a default value form and the name of a supplied-p variable. In both keyword and optional parameters, the default value form can refer to parameters that appear earlier in the parameter list.
(defun foo (&key (a 0) (b 0 b-supplied-p) (c (+ a b)))
(list a b c b-supplied-p))
(foo :a 1) ==> (1 0 1 NIL)
(foo :b 1) ==> (0 1 1 T)
(foo :b 1 :c 4) ==> (0 1 4 T)
(foo :a 2 :b 1 :c 4) ==> (2 1 4 T)
Also, if for some reason you want the keyword the caller uses to specify the parameter to be different from the name of the actual parameter, you can replace the parameter name with another list containing the keyword to use when calling the function and the name to be used for the parameter. The following definition of foo
:
(defun foo (&key ((:apple a)) ((:box b) 0) ((:charlie c) 0 c-supplied-p))
(list a b c c-supplied-p))
lets the caller call it like this:
(foo :apple 10 :box 20 :charlie 30) ==> (10 20 30 T)
This style is mostly useful if you want to completely decouple the public API of the function from the internal details, usually because you want to use short variable names internally but descriptive keywords in the API. It's not, however, very frequently used.
Mixing Different Parameter Types
It's possible, but rare, to use all four flavors of parameters in a single function. Whenever more than one flavor of parameter is used, they must be declared in the order I've discussed them: first the names of the required parameters, then the optional parameters, then the rest parameter, and finally the keyword parameters. Typically, however, in functions that use multiple flavors of parameters, you'll combine required parameters with one other flavor or possibly combine &optional
and &rest
parameters. The other two combinations, either &optional
or &rest
parameters combined with &key
parameters, can lead to somewhat surprising behavior.
Combining &optional
and &key
parameters yields surprising enough results that you should probably avoid it altogether. The problem is that if a caller doesn't supply values for all the optional parameters, then those parameters will eat up the keywords and values intended for the keyword parameters. For instance, this function unwisely mixes &optional
and &key
parameters:
(defun foo (x &optional y &key z) (list x y z))
If called like this, it works fine:
(foo 1 2 :z 3) ==> (1 2 3)
And this is also fine:
(foo 1) ==> (1 NIL NIL)
But this will signal an error:
(foo 1 :z 3) ==> ERROR
This is because the keyword :z
is taken as a value to fill the optional y
parameter, leaving only the argument 3 to be processed. At that point, Lisp will be expecting either a keyword/value pair or nothing and will complain. Perhaps even worse, if the function had had two &optional
parameters, this last call would have resulted in the values :z
and 3 being bound to the two &optional
parameters and the &key
parameter z
getting the default value NIL
with no indication that anything was amiss.
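This silent failure is easy to demonstrate. Here's a variant with two optional parameters, renaming the second optional to w so the keyword parameter can keep the name z:

```lisp
;; Two optionals plus a keyword parameter: a recipe for silently
;; misassigned arguments.
(defun foo (x &optional y w &key z) (list x y w z))

;; The caller intends z to be 3, but :z fills y and 3 fills w,
;; leaving the keyword parameter z at its default with no error.
(foo 1 :z 3) ; ==> (1 :Z 3 NIL)
```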
In general, if you find yourself writing a function that uses both &optional
and &key
parameters, you should probably just change it to use all &key
parameters—they're more flexible, and you can always add new keyword parameters without disturbing existing callers of the function. You can also remove keyword parameters, as long as no one is using them.[60] In general, using keyword parameters helps make code much easier to maintain and evolve—if you need to add some new behavior to a function that requires new parameters, you can add keyword parameters without having to touch, or even recompile, any existing code that calls the function.
You can safely combine &rest
and &key
parameters, but the behavior may be a bit surprising initially. Normally the presence of either &rest
or &key
in a parameter list causes all the values remaining after the required and &optional
parameters have been filled in to be processed in a particular way—either gathered into a list for a &rest
parameter or assigned to the appropriate &key
parameters based on the keywords. If both &rest
and &key
appear in a parameter list, then both things happen—all the remaining values, which include the keywords themselves, are gathered into a list that's bound to the &rest
parameter, and the appropriate values are also bound to the &key
parameters. So, given this function:
(defun foo (&rest rest &key a b c) (list rest a b c))
you get this result:
(foo :a 1 :b 2 :c 3) ==> ((:A 1 :B 2 :C 3) 1 2 3)
Function Return Values
All the functions you've written so far have used the default behavior of returning the value of the last expression evaluated as their own return value. This is the most common way to return a value from a function.
However, sometimes it's convenient to be able to return from the middle of a function such as when you want to break out of nested control constructs. In such cases you can use the RETURN-FROM
special operator to immediately return any value from the function.
You'll see in Chapter 20 that RETURN-FROM
is actually not tied to functions at all; it's used to return from a block of code defined with the BLOCK
special operator. However, DEFUN
automatically wraps the whole function body in a block with the same name as the function. So, evaluating a RETURN-FROM
with the name of the function and the value you want to return will cause the function to immediately exit with that value. RETURN-FROM
is a special operator whose first "argument" is the name of the block from which to return. This name isn't evaluated and thus isn't quoted.
The following function uses nested loops to find the first pair of numbers, each less than 10, whose product is greater than the argument, and it uses RETURN-FROM
to return the pair as soon as it finds it:
(defun foo (n)
(dotimes (i 10)
(dotimes (j 10)
(when (> (* i j) n)
(return-from foo (list i j))))))
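A quick check at the REPL confirms the behavior—3 × 7 = 21 is the first product greater than 20 encountered by the loops:

```lisp
CL-USER> (foo 20)
(3 7)
```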
Admittedly, having to specify the name of the function you're returning from is a bit of a pain—for one thing, if you change the function's name, you'll need to change the name used in the RETURN-FROM
as well.[61] But it's also the case that explicit RETURN-FROM
s are used much less frequently in Lisp than return
statements in C-derived languages, because all Lisp expressions, including control constructs such as loops and conditionals, evaluate to a value. So it's not much of a problem in practice.
Functions As Data, a.k.a. Higher-Order Functions
While the main way you use functions is to call them by name, a number of situations exist where it's useful to be able to treat functions as data. For instance, if you can pass one function as an argument to another, you can write a general-purpose sorting function while allowing the caller to provide a function that's responsible for comparing any two elements. Then the same underlying algorithm can be used with many different comparison functions. Similarly, callbacks and hooks depend on being able to store references to code in order to run it later. Since functions are already the standard way to abstract bits of code, it makes sense to allow functions to be treated as data.[62]
In Lisp, functions are just another kind of object. When you define a function with DEFUN
, you're really doing two things: creating a new function object and giving it a name. It's also possible, as you saw in Chapter 3, to use LAMBDA
expressions to create a function without giving it a name. The actual representation of a function object, whether named or anonymous, is opaque—in a native-compiling Lisp, it probably consists mostly of machine code. The only things you need to know are how to get hold of it and how to invoke it once you've got it.
The special operator FUNCTION
provides the mechanism for getting at a function object. It takes a single argument and returns the function with that name. The name isn't quoted. Thus, if you've defined a function foo
, like so:
CL-USER> (defun foo (x) (* 2 x))
FOO
you can get the function object like this:[63]
CL-USER> (function foo)
#<Interpreted Function FOO>
In fact, you've already used FUNCTION
, but it was in disguise. The syntax #'
, which you used in Chapter 3, is syntactic sugar for FUNCTION
, just the way '
is syntactic sugar for QUOTE
.[64] Thus, you can also get the function object for foo
like this:
CL-USER> #'foo
#<Interpreted Function FOO>
Once you've got the function object, there's really only one thing you can do with it—invoke it. Common Lisp provides two functions for invoking a function through a function object: FUNCALL
and APPLY
.[65] They differ only in how they obtain the arguments to pass to the function.
FUNCALL
is the one to use when you know the number of arguments you're going to pass to the function at the time you write the code. The first argument to FUNCALL
is the function object to be invoked, and the rest of the arguments are passed onto that function. Thus, the following two expressions are equivalent:
(foo 1 2 3) === (funcall #'foo 1 2 3)
However, there's little point in using FUNCALL
to call a function whose name you know when you write the code. In fact, the previous two expressions will quite likely compile to exactly the same machine instructions.
The following function demonstrates a more apt use of FUNCALL
. It accepts a function object as an argument and plots a simple ASCII-art histogram of the values returned by the argument function when it's invoked on the values from min
to max
, stepping by step
.
(defun plot (fn min max step)
(loop for i from min to max by step do
(loop repeat (funcall fn i) do (format t "*"))
(format t "~%")))
The FUNCALL
expression computes the value of the function for each value of i
. The inner LOOP
uses that computed value to determine how many times to print an asterisk to standard output.
Note that you don't use FUNCTION
or #'
to get the function value of fn
; you want it to be interpreted as a variable because it's the variable's value that will be the function object. You can call plot
with any function that takes a single numeric argument, such as the built-in function EXP
that returns the value of e raised to the power of its argument.
CL-USER> (plot #'exp 0 4 1/2)
*
*
**
****
*******
************
********************
*********************************
******************************************************
NIL
FUNCALL
, however, doesn't do you any good when the argument list is known only at runtime. For instance, to stick with the plot
function for another moment, suppose you've obtained a list containing a function object, a minimum and maximum value, and a step value. In other words, the list contains the values you want to pass as arguments to plot
. Suppose this list is in the variable plot-data
. You could invoke plot
on the values in that list like this:
(plot (first plot-data) (second plot-data) (third plot-data) (fourth plot-data))
This works fine, but it's pretty annoying to have to explicitly unpack the arguments just so you can pass them to plot
.
That's where APPLY
comes in. Like FUNCALL
, the first argument to APPLY
is a function object. But after the function object, instead of individual arguments, it expects a list. It then applies the function to the values in the list. This allows you to write the following instead:
(apply #'plot plot-data)
As a further convenience, APPLY
can also accept "loose" arguments as long as the last argument is a list. Thus, if plot-data
contained just the min, max, and step values, you could still use APPLY
like this to plot the EXP
function over that range:
(apply #'plot #'exp plot-data)
APPLY
doesn't care about whether the function being applied takes &optional
, &rest
, or &key
arguments—the argument list produced by combining any loose arguments with the final list must be a legal argument list for the function with enough arguments for all the required parameters and only appropriate keyword parameters.
Anonymous Functions
Once you start writing, or even simply using, functions that accept other functions as arguments, you're bound to discover that sometimes it's annoying to have to define and name a whole separate function that's used in only one place, especially when you never call it by name.
When it seems like overkill to define a new function with DEFUN
, you can create an "anonymous" function using a LAMBDA
expression. As discussed in Chapter 3, a LAMBDA
expression looks like this:
(lambda (parameters) body)
One way to think of LAMBDA
expressions is as a special kind of function name where the name itself directly describes what the function does. This explains why you can use a LAMBDA
expression in the place of a function name with #'
.
(funcall #'(lambda (x y) (+ x y)) 2 3) ==> 5
You can even use a LAMBDA
expression as the "name" of a function in a function call expression. If you wanted, you could write the previous FUNCALL
expression more concisely.
((lambda (x y) (+ x y)) 2 3) ==> 5
But this is almost never done; it's merely worth noting that it's legal in order to emphasize that LAMBDA
expressions can be used anywhere a normal function name can be.[66]
Anonymous functions can be useful when you need to pass a function as an argument to another function and the function you need to pass is simple enough to express inline. For instance, suppose you wanted to plot the function 2x. You could define the following function:
(defun double (x) (* 2 x))
which you could then pass to plot
.
CL-USER> (plot #'double 0 10 1)
**
****
******
********
**********
************
**************
****************
******************
********************
NIL
But it's easier, and arguably clearer, to write this:
CL-USER> (plot #'(lambda (x) (* 2 x)) 0 10 1)
**
****
******
********
**********
************
**************
****************
******************
********************
NIL
The other important use of LAMBDA expressions is in making closures, functions that capture part of the environment where they're created. You used closures a bit in Chapter 3, but the details of how closures work and what they're used for is really more about how variables work than functions, so I'll save that discussion for the next chapter.
6. Variables
The next basic building block we need to look at is variables. Common Lisp supports two kinds of variables: lexical and dynamic.[67] These two types correspond roughly to "local" and "global" variables in other languages. However, the correspondence is only approximate. On one hand, some languages' "local" variables are in fact much like Common Lisp's dynamic variables.[68] And on the other, some languages' local variables are lexically scoped without providing all the capabilities provided by Common Lisp's lexical variables. In particular, not all languages that provide lexically scoped variables support closures.
To make matters a bit more confusing, many of the forms that deal with variables can be used with both lexical and dynamic variables. So I'll start by discussing a few aspects of Lisp's variables that apply to both kinds and then cover the specific characteristics of lexical and dynamic variables. Then I'll discuss Common Lisp's general-purpose assignment operator, SETF
, which is used to assign new values to variables and just about every other place that can hold a value.
Variable Basics
As in other languages, in Common Lisp variables are named places that can hold a value. However, in Common Lisp, variables aren't typed the way they are in languages such as Java or C++. That is, you don't need to declare the type of object that each variable can hold. Instead, a variable can hold values of any type and the values carry type information that can be used to check types at runtime. Thus, Common Lisp is dynamically typed—type errors are detected dynamically. For instance, if you pass something other than a number to the +
function, Common Lisp will signal a type error. On the other hand, Common Lisp is a strongly typed language in the sense that all type errors will be detected—there's no way to treat an object as an instance of a class that it's not.[69]
All values in Common Lisp are, conceptually at least, references to objects.[70] Consequently, assigning a variable a new value changes what object the variable refers to but has no effect on the previously referenced object. However, if a variable holds a reference to a mutable object, you can use that reference to modify the object, and the modification will be visible to any code that has a reference to the same object.
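A small sketch makes this concrete, using SETF (covered later in this chapter) to modify a list in place. The name *stuff* is just for illustration:

```lisp
(defparameter *stuff* (list 1 2 3))

;; other refers to the very same list object as *stuff*;
;; modifying the object through one reference is visible
;; through the other.
(let ((other *stuff*))
  (setf (first other) 99))

*stuff* ; ==> (99 2 3)
```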
One way to introduce new variables you've already used is to define function parameters. As you saw in the previous chapter, when you define a function with DEFUN
, the parameter list defines the variables that will hold the function's arguments when it's called. For example, this function defines three variables—x
, y
, and z
—to hold its arguments.
(defun foo (x y z) (+ x y z))
Each time a function is called, Lisp creates new bindings to hold the arguments passed by the function's caller. A binding is the runtime manifestation of a variable. A single variable—the thing you can point to in the program's source code—can have many different bindings during a run of the program. A single variable can even have multiple bindings at the same time; parameters to a recursive function, for example, are rebound for each call to the function.
As with all Common Lisp variables, function parameters hold object references.[71] Thus, you can assign a new value to a function parameter within the body of the function, and it will not affect the bindings created for another call to the same function. But if the object passed to a function is mutable and you change it in the function, the changes will be visible to the caller since both the caller and the callee will be referencing the same object.
Another form that introduces new variables is the LET
special operator. The skeleton of a LET
form looks like this:
(let (variable*)
body-form*)
where each variable is a variable initialization form. Each initialization form is either a list containing a variable name and an initial value form or—as a shorthand for initializing the variable to NIL
—a plain variable name. The following LET
form, for example, binds the three variables x
, y
, and z
with initial values 10, 20, and NIL
:
(let ((x 10) (y 20) z)
...)
When the LET
form is evaluated, all the initial value forms are first evaluated. Then new bindings are created and initialized to the appropriate initial values before the body forms are executed. Within the body of the LET
, the variable names refer to the newly created bindings. After the LET
, the names refer to whatever, if anything, they referred to before the LET
.
The value of the last expression in the body is returned as the value of the LET
expression. Like function parameters, variables introduced with LET
are rebound each time the LET
is entered.[72]
The scope of function parameters and LET
variables—the area of the program where the variable name can be used to refer to the variable's binding—is delimited by the form that introduces the variable. This form—the function definition or the LET
—is called the binding form. As you'll see in a bit, the two types of variables—lexical and dynamic—use two slightly different scoping mechanisms, but in both cases the scope is delimited by the binding form.
If you nest binding forms that introduce variables with the same name, then the binding of the innermost variable shadows the outer bindings. For instance, when the following function is called, a binding is created for the parameter x
to hold the function's argument. Then the first LET
creates a new binding with the initial value 2, and the inner LET
creates yet another binding, this one with the initial value 3. The bars on the right mark the scope of each binding.
(defun foo (x)
(format t "Parameter: ~a~%" x) ; |<——— x is argument
(let ((x 2)) ; |
(format t "Outer LET: ~a~%" x) ; | |<—— x is 2
(let ((x 3)) ; | |
(format t "Inner LET: ~a~%" x)) ; | | |<— x is 3
(format t "Outer LET: ~a~%" x)) ; | |
(format t "Parameter: ~a~%" x)) ; |
Each reference to x
will refer to the binding with the smallest enclosing scope. Once control leaves the scope of one binding form, the binding from the immediately enclosing scope is unshadowed and x
refers to it instead. Thus, calling foo
results in this output:
CL-USER> (foo 1)
Parameter: 1
Outer LET: 2
Inner LET: 3
Outer LET: 2
Parameter: 1
NIL
In future chapters I'll discuss other constructs that also serve as binding forms—any construct that introduces a new variable name that's usable only within the construct is a binding form.
For instance, in Chapter 7 you'll meet the DOTIMES
loop, a basic counting loop. It introduces a variable that holds the value of a counter that's incremented each time through the loop. The following loop, for example, which prints the numbers from 0 to 9, binds the variable x
:
(dotimes (x 10) (format t "~d " x))
Another binding form is a variant of LET
, LET*
. The difference is that in a LET
, the variable names can be used only in the body of the LET
—the part of the LET
after the variables list—but in a LET*
, the initial value forms for each variable can refer to variables introduced earlier in the variables list. Thus, you can write the following:
(let* ((x 10)
(y (+ x 10)))
(list x y))
but not this:
(let ((x 10)
(y (+ x 10)))
(list x y))
However, you could achieve the same result with nested LET
s.
(let ((x 10))
(let ((y (+ x 10)))
(list x y)))
Lexical Variables and Closures
By default all binding forms in Common Lisp introduce lexically scoped variables. Lexically scoped variables can be referred to only by code that's textually within the binding form. Lexical scoping should be familiar to anyone who has programmed in Java, C, Perl, or Python since they all provide lexically scoped "local" variables. For that matter, Algol programmers should also feel right at home, as Algol first introduced lexical scoping in the 1960s.
However, Common Lisp's lexical variables are lexical variables with a twist, at least compared to the original Algol model. The twist is provided by the combination of lexical scoping with nested functions. By the rules of lexical scoping, only code textually within the binding form can refer to a lexical variable. But what happens when an anonymous function contains a reference to a lexical variable from an enclosing scope? For instance, in this expression:
(let ((count 0)) #'(lambda () (setf count (1+ count))))
the reference to count
inside the LAMBDA
form should be legal according to the rules of lexical scoping. Yet the anonymous function containing the reference will be returned as the value of the LET
form and can be invoked, via FUNCALL
, by code that's not in the scope of the LET
. So what happens? As it turns out, when count
is a lexical variable, it just works. The binding of count
created when the flow of control entered the LET
form will stick around for as long as needed, in this case for as long as someone holds onto a reference to the function object returned by the LET
form. The anonymous function is called a closure because it "closes over" the binding created by the LET
.
The key thing to understand about closures is that it's the binding, not the value of the variable, that's captured. Thus, a closure can not only access the value of the variables it closes over but can also assign new values that will persist between calls to the closure. For instance, you can capture the closure created by the previous expression in a global variable like this:
(defparameter *fn* (let ((count 0)) #'(lambda () (setf count (1+ count)))))
Then each time you invoke it, the value of count will increase by one.
CL-USER> (funcall *fn*)
1
CL-USER> (funcall *fn*)
2
CL-USER> (funcall *fn*)
3
A single closure can close over many variable bindings simply by referring to them. Or multiple closures can capture the same binding. For instance, the following expression returns a list of three closures, one that increments the value of the closed over count
binding, one that decrements it, and one that returns the current value:
(let ((count 0))
(list
#'(lambda () (incf count))
#'(lambda () (decf count))
#'(lambda () count)))
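All three closures share the single count binding. Capturing the list in a variable (the name *counter-fns* is just for illustration) shows the sharing:

```lisp
(defparameter *counter-fns*
  (let ((count 0))
    (list
     #'(lambda () (incf count))
     #'(lambda () (decf count))
     #'(lambda () count))))

(funcall (first *counter-fns*))  ; ==> 1
(funcall (first *counter-fns*))  ; ==> 2
(funcall (second *counter-fns*)) ; ==> 1  (decrements the same count)
(funcall (third *counter-fns*))  ; ==> 1  (reads the same count)
```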
Dynamic, a.k.a. Special, Variables
Lexically scoped bindings help keep code understandable by limiting the scope, literally, in which a given name has meaning. This is why most modern languages use lexical scoping for local variables. Sometimes, however, you really want a global variable—a variable that you can refer to from anywhere in your program. While it's true that indiscriminate use of global variables can turn code into spaghetti nearly as quickly as unrestrained use of goto
, global variables do have legitimate uses and exist in one form or another in almost every programming language.[73] And as you'll see in a moment, Lisp's version of global variables, dynamic variables, are both more useful and more manageable.
Common Lisp provides two ways to create global variables: DEFVAR
and DEFPARAMETER
. Both forms take a variable name, an initial value, and an optional documentation string. After it has been DEFVAR
ed or DEFPARAMETER
ed, the name can be used anywhere to refer to the current binding of the global variable. As you've seen in previous chapters, global variables are conventionally named with names that start and end with *
. You'll see later in this section why it's quite important to follow that naming convention. Examples of DEFVAR
and DEFPARAMETER
look like this:
(defvar *count* 0
"Count of widgets made so far.")
(defparameter *gap-tolerance* 0.001
"Tolerance to be allowed in widget gaps.")
The difference between the two forms is that DEFPARAMETER
always assigns the initial value to the named variable while DEFVAR
does so only if the variable is undefined. A DEFVAR
form can also be used with no initial value to define a global variable without giving it a value. Such a variable is said to be unbound.
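For example (the name *db-connection* is hypothetical):

```lisp
;; No initial value: the variable is defined but unbound.
(defvar *db-connection*)

;; Referring to it before assigning a value signals an
;; UNBOUND-VARIABLE error.
*db-connection* ; ==> ERROR
```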
Practically speaking, you should use DEFVAR
to define variables that will contain data you'd want to keep even if you made a change to the source code that uses the variable. For instance, suppose the two variables defined previously are part of an application for controlling a widget factory. It's appropriate to define the *count*
variable with DEFVAR
because the number of widgets made so far isn't invalidated just because you make some changes to the widget-making code.[74]
On the other hand, the variable *gap-tolerance*
presumably has some effect on the behavior of the widget-making code itself. If you decide you need a tighter or looser tolerance and change the value in the DEFPARAMETER
form, you'd like the change to take effect when you recompile and reload the file.
After defining a variable with DEFVAR
or DEFPARAMETER
, you can refer to it from anywhere. For instance, you might define this function to increment the count of widgets made:
(defun increment-widget-count () (incf *count*))
The advantage of global variables is that you don't have to pass them around. Most languages store the standard input and output streams in global variables for exactly this reason—you never know when you're going to want to print something to standard out, and you don't want every function to have to accept and pass on arguments containing those streams just in case someone further down the line needs them.
However, once a value, such as the standard output stream, is stored in a global variable and you have written code that references that global variable, it's tempting to try to temporarily modify the behavior of that code by changing the variable's value.
For instance, suppose you're working on a program that contains some low-level logging functions that print to the stream in the global variable *standard-output*
. Now suppose that in part of the program you want to capture all the output generated by those functions into a file. You might open a file and assign the resulting stream to *standard-output*
. Now the low-level functions will send their output to the file.
This works fine until you forget to set *standard-output*
back to the original stream when you're done. If you forget to reset *standard-output*
, all the other code in the program that uses *standard-output*
will also send its output to the file.[75]
What you really want, it seems, is a way to wrap a piece of code in something that says, "All code below here—all the functions it calls, all the functions they call, and so on, down to the lowest-level functions—should use this value for the global variable *standard-output*
." Then when the high-level function returns, the old value of *standard-output*
should be automatically restored.
It turns out that that's exactly what Common Lisp's other kind of variable—dynamic variables—let you do. When you bind a dynamic variable—for example, with a LET
variable or a function parameter—the binding that's created on entry to the binding form replaces the global binding for the duration of the binding form. Unlike a lexical binding, which can be referenced by code only within the lexical scope of the binding form, a dynamic binding can be referenced by any code that's invoked during the execution of the binding form.[76] And it turns out that all global variables are, in fact, dynamic variables.
Thus, if you want to temporarily redefine *standard-output*
, the way to do it is simply to rebind it, say, with a LET
.
(let ((*standard-output* *some-other-stream*))
(stuff))
In any code that runs as a result of the call to stuff
, references to *standard-output*
will use the binding established by the LET
. And when stuff
returns and control leaves the LET
, the new binding of *standard-output*
will go away and subsequent references to *standard-output*
will see the binding that was current before the LET
. At any given time, the most recently established binding shadows all other bindings. Conceptually, each new binding for a given dynamic variable is pushed onto a stack of bindings for that variable, and references to the variable always use the most recent binding. As binding forms return, the bindings they created are popped off the stack, exposing previous bindings.[77]
A simple example shows how this works.
(defvar *x* 10)
(defun foo () (format t "X: ~d~%" *x*))
The DEFVAR
creates a global binding for the variable *x*
with the value 10. The reference to *x*
in foo
will look up the current binding dynamically. If you call foo
from the top level, the global binding created by the DEFVAR
is the only binding available, so it prints 10.
CL-USER> (foo)
X: 10
NIL
But you can use LET
to create a new binding that temporarily shadows the global binding, and foo
will print a different value.
CL-USER> (let ((*x* 20)) (foo))
X: 20
NIL
Now call foo
again, with no LET
, and it again sees the global binding.
CL-USER> (foo)
X: 10
NIL
Now define another function.
(defun bar ()
(foo)
(let ((*x* 20)) (foo))
(foo))
Note that the middle call to foo
is wrapped in a LET
that binds *x*
to the new value 20. When you run bar
, you get this result:
CL-USER> (bar)
X: 10
X: 20
X: 10
NIL
As you can see, the first call to foo
sees the global binding, with its value of 10. The middle call, however, sees the new binding, with the value 20. But after the LET
, foo
once again sees the global binding.
As with lexical bindings, assigning a new value affects only the current binding. To see this, you can redefine foo
to include an assignment to *x*
.
(defun foo ()
(format t "Before assignment~18tX: ~d~%" *x*)
(setf *x* (+ 1 *x*))
(format t "After assignment~18tX: ~d~%" *x*))
Now foo
prints the value of *x*
, increments it, and prints it again. If you just run foo
, you'll see this:
CL-USER> (foo)
Before assignment X: 10
After assignment X: 11
NIL
Not too surprising. Now run bar
.
CL-USER> (bar)
Before assignment X: 11
After assignment X: 12
Before assignment X: 20
After assignment X: 21
Before assignment X: 12
After assignment X: 13
NIL
Notice that *x*
started at 11—the earlier call to foo
really did change the global value. The first call to foo
from bar
increments the global binding to 12. The middle call doesn't see the global binding because of the LET
. Then the last call can see the global binding again and increments it from 12 to 13.
So how does this work? How does LET
know that when it binds *x*
it's supposed to create a dynamic binding rather than a normal lexical binding? It knows because the name has been declared special.[78] The name of every variable defined with DEFVAR
and DEFPARAMETER
is automatically declared globally special. This means whenever you use such a name in a binding form—in a LET
or as a function parameter or any other construct that creates a new variable binding—the binding that's created will be a dynamic binding. This is why the naming convention
is so important—it'd be bad news if you used a name for what you thought was a lexical variable and that variable happened to be globally special. On the one hand, code you call could change the value of the binding out from under you; on the other, you might be shadowing a binding established by code higher up on the stack. If you always name global variables according to the *
naming convention, you'll never accidentally use a dynamic binding where you intend to establish a lexical binding.
It's also possible to declare a name locally special. If, in a binding form, you declare a name special, then the binding created for that variable will be dynamic rather than lexical. Other code can locally declare a name special in order to refer to the dynamic binding. However, locally special variables are relatively rare, so you needn't worry about them.[79]
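For the curious, a local special declaration looks roughly like this; both the binder and the reader declare the name special so they share a dynamic binding. The function names here are made up for illustration:

```lisp
(defun show-x ()
  (declare (special x))   ; refer to the current dynamic binding of X
  (print x))

(defun call-show-x ()
  (let ((x 42))
    (declare (special x)) ; this LET binding is dynamic, not lexical
    (show-x)))            ; prints 42
```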
Dynamic bindings make global variables much more manageable, but it's important to notice that they still allow action at a distance. Binding a global variable has two at-a-distance effects—it can change the behavior of downstream code, and it also opens the possibility that downstream code will assign a new value to a binding established higher up on the stack. You should use dynamic variables only when you need to take advantage of one or both of these characteristics.
Constants
One other kind of variable I haven't mentioned at all is the oxymoronic "constant variable." All constants are global and are defined with DEFCONSTANT
. The basic form of DEFCONSTANT
is like DEFPARAMETER
.
(defconstant name initial-value-form [ documentation-string ])
As with DEFVAR
and DEFPARAMETER
, DEFCONSTANT
has a global effect on the name used—thereafter the name can be used only to refer to the constant; it can't be used as a function parameter or rebound with any other binding form. Thus, many Lisp programmers follow a naming convention of using names starting and ending with +
for constants. This convention is somewhat less universally followed than the *
-naming convention for globally special names but is a good idea for the same reason.[80]
Another thing to note about DEFCONSTANT
is that while the language allows you to redefine a constant by reevaluating a DEFCONSTANT
with a different initial-value-form, what exactly happens after the redefinition isn't defined. In practice, most implementations will require you to reevaluate any code that refers to the constant in order to see the new value since the old value may well have been inlined. Consequently, it's a good idea to use DEFCONSTANT
only to define things that are really constant, such as the value of NIL. For things you might ever want to change, you should use DEFPARAMETER
instead.
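For example, a definition of a truly constant value, following the + naming convention, might look like this (the name is invented for illustration):

```lisp
(defconstant +seconds-per-minute+ 60
  "The number of seconds in one minute; this will never change.")
```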
Assignment
Once you've created a binding, you can do two things with it: get the current value and set it to a new value. As you saw in Chapter 4, a symbol evaluates to the value of the variable it names, so you can get the current value simply by referring to the variable. To assign a new value to a binding, you use the SETF
macro, Common Lisp's general-purpose assignment operator. The basic form of SETF
is as follows:
(setf place value)
Because SETF
is a macro, it can examine the form of the place it's assigning to and expand into appropriate lower-level operations to manipulate that place. When the place is a variable, it expands into a call to the special operator SETQ
, which, as a special operator, has access to both lexical and dynamic bindings.[81] For instance, to assign the value 10 to the variable x
, you can write this:
(setf x 10)
As I discussed earlier, assigning a new value to a binding has no effect on any other bindings of that variable. And it doesn't have any effect on the value that was stored in the binding prior to the assignment. Thus, the SETF
in this function:
(defun foo (x) (setf x 10))
will have no effect on any value outside of foo
. The binding that was created when foo
was called is set to 10, immediately replacing whatever value was passed as an argument. In particular, a form such as the following:
(let ((y 20))
(foo y)
(print y))
will print 20, not 10, as it's the value of y
that's passed to foo
where it's briefly the value of the variable x
before the SETF
gives x
a new value.
SETF
can also assign to multiple places in sequence. For instance, instead of the following:
(setf x 1)
(setf y 2)
you can write this:
(setf x 1 y 2)
SETF
returns the newly assigned value, so you can also nest calls to SETF
as in the following expression, which assigns both x
and y
the same random value:
(setf x (setf y (random 10)))
Generalized Assignment
Variable bindings, of course, aren't the only places that can hold values. Common Lisp supports composite data structures such as arrays, hash tables, and lists, as well as user-defined data structures, all of which consist of multiple places that can each hold a value.
I'll cover those data structures in future chapters, but while we're on the topic of assignment, you should note that SETF
can assign any place a value. As I cover the different composite data structures, I'll point out which functions can serve as "SETF
able places." The short version, however, is if you need to assign a value to a place, SETF
is almost certainly the tool to use. It's even possible to extend SETF
to allow it to assign to user-defined places though I won't cover that.[82]
In this regard SETF
is no different from the =
assignment operator in most C-derived languages. In those languages, the =
operator assigns new values to variables, array elements, and fields of classes. In languages such as Perl and Python that support hash tables as a built-in data type, =
can also set the values of individual hash table entries. Table 6-1 summarizes the various ways =
is used in those languages.
Table 6-1. Assignment with = in Other Languages

Assigning to ...     | Java, C, C++   | Perl                | Python
... variable         | x = 10;        | $x = 10;            | x = 10
... array element    | a[0] = 10;     | $a[0] = 10;         | a[0] = 10
... hash table entry | —              | $hash{'key'} = 10;  | hash['key'] = 10
... field in object  | o.field = 10;  | $o->{'field'} = 10; | o.field = 10
SETF
works the same way—the first "argument" to SETF
is a place to store the value, and the second argument provides the value. As with the =
operator in these languages, you use the same form to express the place as you'd normally use to fetch the value.[83] Thus, the Lisp equivalents of the assignments in Table 6-1—given that AREF
is the array access function, GETHASH
does a hash table lookup, and field
might be a function that accesses a slot named field
of a user-defined object—are as follows:
Simple variable: (setf x 10)
Array: (setf (aref a 0) 10)
Hash table: (setf (gethash 'key hash) 10)
Slot named 'field': (setf (field o) 10)
Note that SETF
ing a place that's part of a larger object has the same semantics as SETF
ing a variable: the place is modified without any effect on the object that was previously stored in the place. Again, this is similar to how =
behaves in Java, Perl, and Python.[84]
Other Ways to Modify Places
While all assignments can be expressed with SETF
, certain patterns involving assigning a new value based on the current value are sufficiently common to warrant their own operators. For instance, while you could increment a number with SETF
, like this:
(setf x (+ x 1))
or decrement it with this:
(setf x (- x 1))
it's a bit tedious, compared to the C-style ++x
and --x
. Instead, you can use the macros INCF
and DECF
, which increment and decrement a place by a certain amount that defaults to 1.
(incf x) === (setf x (+ x 1))
(decf x) === (setf x (- x 1))
(incf x 10) === (setf x (+ x 10))
INCF
and DECF
are examples of a kind of macro called modify macros. Modify macros are macros built on top of SETF
that modify places by assigning a new value based on the current value of the place. The main benefit of modify macros is that they're more concise than the same modification written out using SETF
. Additionally, modify macros are defined in a way that makes them safe to use with places where the place expression must be evaluated only once. A silly example is this expression, which increments the value of an arbitrary element of an array:
(incf (aref *array* (random (length *array*))))
A naive translation of that into a SETF
expression might look like this:
(setf (aref *array* (random (length *array*)))
(1+ (aref *array* (random (length *array*)))))
However, that doesn't work because the two calls to RANDOM
won't necessarily return the same value—this expression will likely grab the value of one element of the array, increment it, and then store it back as the new value of a different element. The INCF
expression, however, does the right thing because it knows how to take apart this expression:
(aref *array* (random (length *array*)))
to pull out the parts that could possibly have side effects to make sure they're evaluated only once. In this case, it would probably expand into something more or less equivalent to this:
(let ((tmp (random (length *array*))))
(setf (aref *array* tmp) (1+ (aref *array* tmp))))
In general, modify macros are guaranteed to evaluate both their arguments and the subforms of the place form exactly once each, in left-to-right order.
The macro PUSH
, which you used in the mini-database to add elements to the *db*
variable, is another modify macro. You'll take a closer look at how it and its counterparts POP
and PUSHNEW
work in Chapter 12 when I talk about how lists are represented in Lisp.
Finally, two slightly esoteric but useful modify macros are ROTATEF
and SHIFTF
. ROTATEF
rotates values between places. For instance, if you have two variables, a
and b
, this call:
(rotatef a b)
swaps the values of the two variables and returns NIL
. Since a
and b
are variables and you don't have to worry about side effects, the previous ROTATEF
expression is equivalent to this:
(let ((tmp a)) (setf a b b tmp) nil)
With other kinds of places, the equivalent expression using SETF
would be quite a bit more complex.
SHIFTF
is similar except instead of rotating values it shifts them to the left—the last argument provides a value that's moved to the second-to-last argument while the rest of the values are moved one to the left. The original value of the first argument is simply returned. Thus, the following:
(shiftf a b 10)
is equivalent—again, since you don't have to worry about side effects—to this:
(let ((tmp a)) (setf a b b 10) tmp)
Both ROTATEF
and SHIFTF
can be used with any number of arguments and, like all modify macros, are guaranteed to evaluate them exactly once, in left to right order.
With the basics of Common Lisp's functions and variables under your belt, now you're ready to move onto the feature that continues to differentiate Lisp from other languages: macros.
7. Macros: Standard Control Constructs
While many of the ideas that originated in Lisp, from the conditional expression to garbage collection, have been incorporated into other languages, the one language feature that continues to set Common Lisp apart is its macro system. Unfortunately, the word macro describes a lot of things in computing to which Common Lisp's macros bear only a vague and metaphorical similarity. This causes no end of misunderstanding when Lispers try to explain to non-Lispers what a great feature macros are.[85] To understand Lisp's macros, you really need to come at them fresh, without preconceptions based on other things that also happen to be called macros. So let's start our discussion of Lisp's macros by taking a step back and looking at various ways languages support extensibility.
All programmers should be used to the idea that the definition of a language can include a standard library of functionality that's implemented in terms of the "core" language—functionality that could have been implemented by any programmer on top of the language if it hadn't been defined as part of the standard library. C's standard library, for instance, can be implemented almost entirely in portable C. Similarly, most of the ever-growing set of classes and interfaces that ship with Java's standard Java Development Kit (JDK) are written in "pure" Java.
One advantage of defining languages in terms of a core plus a standard library is it makes them easier to understand and implement. But the real benefit is in terms of expressiveness—since much of what you think of as "the language" is really just a library—the language is easy to extend. If C doesn't have a function to do some thing or another that you need, you can write that function, and now you have a slightly richer version of C. Similarly, in a language such as Java or Smalltalk where almost all the interesting parts of the "language" are defined in terms of classes, by defining new classes you extend the language, making it more suited for writing programs to do whatever it is you're trying to do.
While Common Lisp supports both these methods of extending the language, macros give Common Lisp yet another way. As I discussed briefly in Chapter 4, each macro defines its own syntax, determining how the s-expressions it's passed are turned into Lisp forms. With macros as part of the core language it's possible to build new syntax—control constructs such as WHEN
, DOLIST
, and LOOP
as well as definitional forms such as DEFUN
and DEFPARAMETER
—as part of the "standard library" rather than having to hardwire them into the core. This has implications for how the language itself is implemented, but as a Lisp programmer you'll care more that it gives you another way to extend the language, making it a better language for expressing solutions to your particular programming problems.
Now, it may seem that the benefits of having another way to extend the language would be easy to recognize. But for some reason a lot of folks who haven't actually used Lisp macros—folks who think nothing of spending their days creating new functional abstractions or defining hierarchies of classes to solve their programming problems—get spooked by the idea of being able to define new syntactic abstractions. The most common cause of macrophobia seems to be bad experiences with other "macro" systems. Simple fear of the unknown no doubt plays a role, too. To avoid triggering any macrophobic reactions, I'll ease into the subject by discussing several of the standard control-construct macros defined by Common Lisp. These are some of the things that, if Lisp didn't have macros, would have to be built into the language core. When you use them, you don't have to care that they're implemented as macros, but they provide a good example of some of the things you can do with macros.[86] In the next chapter, I'll show you how you can define your own macros.
WHEN and UNLESS
As you've already seen, the most basic form of conditional execution—if x, do y; otherwise do z—is provided by the IF
special operator, which has this basic form:
(if condition then-form [else-form])
The condition is evaluated and, if its value is non-NIL
, the then-form is evaluated and the resulting value returned. Otherwise, the else-form, if any, is evaluated and its value returned. If condition is NIL
and there's no else-form, then the IF
returns NIL
.
(if (> 2 3) "Yup" "Nope") ==> "Nope"
(if (> 2 3) "Yup") ==> NIL
(if (> 3 2) "Yup" "Nope") ==> "Yup"
However, IF
isn't actually such a great syntactic construct because the then-form and else-form are each restricted to being a single Lisp form. This means if you want to perform a sequence of actions in either clause, you need to wrap them in some other syntax. For instance, suppose in the middle of a spam-filtering program you wanted to both file a message as spam and update the spam database when a message is spam. You can't write this:
(if (spam-p current-message)
(file-in-spam-folder current-message)
(update-spam-database current-message))
because the call to update-spam-database
will be treated as the else clause, not as part of the then clause. Another special operator, PROGN
, executes any number of forms in order and returns the value of the last form. So you could get the desired behavior by writing the following:
(if (spam-p current-message)
(progn
(file-in-spam-folder current-message)
(update-spam-database current-message)))
That's not too horrible. But given the number of times you'll likely have to use this idiom, it's not hard to imagine that you'd get tired of it after a while. "Why," you might ask yourself, "doesn't Lisp provide a way to say what I really want, namely, 'When x is true, do this, that, and the other thing'?" In other words, after a while you'd notice the pattern of an IF
plus a PROGN
and wish for a way to abstract away the details rather than writing them out every time.
This is exactly what macros provide. In this case, Common Lisp comes with a standard macro, WHEN
, which lets you write this:
(when (spam-p current-message)
(file-in-spam-folder current-message)
(update-spam-database current-message))
But if it wasn't built into the standard library, you could define WHEN
yourself with a macro such as this, using the backquote notation I discussed in Chapter 3:[87]
(defmacro when (condition &rest body)
`(if ,condition (progn ,@body)))
A counterpart to the WHEN
macro is UNLESS
, which reverses the condition, evaluating its body forms only if the condition is false. In other words:
(defmacro unless (condition &rest body)
`(if (not ,condition) (progn ,@body)))
Admittedly, these are pretty trivial macros. There's no deep black magic here; they just abstract away a few language-level bookkeeping details, allowing you to express your true intent a bit more clearly. But their very triviality makes an important point: because the macro system is built right into the language, you can write trivial macros like WHEN
and UNLESS
that give you small but real gains in clarity that are then multiplied by the thousands of times you use them. In Chapters 24, 26, and 31 you'll see how macros can also be used on a larger scale, creating whole domain-specific embedded languages. But first let's finish our discussion of the standard control-construct macros.
COND
Another time raw IF
expressions can get ugly is when you have a multibranch conditional: if a do x, else if b do y; else do z. There's no logical problem writing such a chain of conditional expressions with just IF
, but it's not pretty.
(if a
(do-x)
(if b
(do-y)
(do-z)))
And it would be even worse if you needed to include multiple forms in the then clauses, requiring PROGN
s. So, not surprisingly, Common Lisp provides a macro for expressing multibranch conditionals: COND
. This is the basic skeleton:
(cond
(test-1 form*)
  ...
(test-N form*))
Each element of the body represents one branch of the conditional and consists of a list containing a condition form and zero or more forms to be evaluated if that branch is chosen. The conditions are evaluated in the order the branches appear in the body until one of them evaluates to true. At that point, the remaining forms in that branch are evaluated, and the value of the last form in the branch is returned as the value of the COND
as a whole. If the branch contains no forms after the condition, the value of the condition is returned instead. By convention, the branch representing the final else clause in an if/else-if chain is written with a condition of T
. Any non-NIL
value will work, but a T
serves as a useful landmark when reading the code. Thus, you can write the previous nested IF
expression using COND
like this:
(cond (a (do-x))
(b (do-y))
(t (do-z)))
AND, OR, and NOT
When writing the conditions in IF
, WHEN
, UNLESS
, and COND
forms, three operators that will come in handy are the boolean logic operators, AND
, OR
, and NOT
.
NOT
is a function so strictly speaking doesn't belong in this chapter, but it's closely tied to AND
and OR
. It takes a single argument and inverts its truth value, returning T
if the argument is NIL
and NIL
otherwise.
AND
and OR
, however, are macros. They implement logical conjunction and disjunction of any number of subforms and are defined as macros so they can short-circuit. That is, they evaluate only as many of their subforms—in left-to-right order—as necessary to determine the overall truth value. Thus, AND
stops and returns NIL
as soon as one of its subforms evaluates to NIL
. If all the subforms evaluate to non-NIL
, it returns the value of the last subform. OR
, on the other hand, stops as soon as one of its subforms evaluates to non-NIL
and returns the resulting value. If none of the subforms evaluate to true, OR
returns NIL
. Here are some examples:
(not nil) ==> T
(not (= 1 1)) ==> NIL
(and (= 1 2) (= 3 3)) ==> NIL
(or (= 1 2) (= 3 3)) ==> T
Looping
Looping constructs are the other main kind of control constructs. Common Lisp's looping facilities are—in addition to being quite powerful and flexible—an interesting lesson in the have-your-cake-and-eat-it-too style of programming that macros provide.
As it turns out, none of Lisp's 25 special operators directly support structured looping. All of Lisp's looping control constructs are macros built on top of a pair of special operators that provide a primitive goto facility.[88] Like many good abstractions, syntactic or otherwise, Lisp's looping macros are built as a set of layered abstractions starting from the base provided by those two special operators.
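To give a rough sense of those primitives, here's how a simple counting loop could be written directly in terms of the special operators TAGBODY and GO. This is only a sketch of the kind of code the looping macros expand into, not something you'd normally write by hand:

```lisp
(let ((i 0))
  (tagbody
   top                          ; a tag--a target for GO
     (when (>= i 3) (go done))  ; jump out when the count is reached
     (print i)
     (incf i)
     (go top)                   ; jump back to the top of the loop
   done))
```

This prints 0, 1, and 2, then falls out of the TAGBODY.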
At the bottom (leaving aside the special operators) is a very general looping construct, DO
. While very powerful, DO
suffers, as do many general-purpose abstractions, from being overkill for simple situations. So Lisp also provides two other macros, DOLIST
and DOTIMES
, that are less flexible than DO
but provide convenient support for the common cases of looping over the elements of a list and counting loops. While an implementation can implement these macros however it wants, they're typically implemented as macros that expand into an equivalent DO
loop. Thus, DO
provides a basic structured looping construct on top of the underlying primitives provided by Common Lisp's special operators, and DOLIST
and DOTIMES
provide two easier-to-use, if less general, constructs. And, as you'll see in the next chapter, you can build your own looping constructs on top of DO
for situations where DOLIST
and DOTIMES
don't meet your needs.
Finally, the LOOP
macro provides a full-blown mini-language for expressing looping constructs in a non-Lispy, English-like (or at least Algol-like) language. Some Lisp hackers love LOOP
; others hate it. LOOP
's fans like it because it provides a concise way to express certain commonly needed looping constructs. Its detractors dislike it because it's not Lispy enough. But whichever side one comes down on, it's a remarkable example of the power of macros to add new constructs to the language.
DOLIST and DOTIMES
I'll start with the easy-to-use DOLIST
and DOTIMES
macros.
DOLIST
loops across the items of a list, executing the loop body with a variable holding the successive items of the list.[89] This is the basic skeleton (leaving out some of the more esoteric options):
(dolist (var list-form)
body-form*)
When the loop starts, the list-form is evaluated once to produce a list. Then the body of the loop is evaluated once for each item in the list with the variable var holding the value of the item. For instance:
CL-USER> (dolist (x '(1 2 3)) (print x))
1
2
3
NIL
Used this way, the DOLIST
form as a whole evaluates to NIL
.
If you want to break out of a DOLIST
loop before the end of the list, you can use RETURN
.
CL-USER> (dolist (x '(1 2 3)) (print x) (if (evenp x) (return)))
1
2
NIL
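RETURN can also be given a value to return from the loop as a whole; for instance, to find the first even number in a list:

```lisp
CL-USER> (dolist (x '(1 2 3 4)) (when (evenp x) (return x)))
2
```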
DOTIMES
is the high-level looping construct for counting loops. The basic template is much the same as DOLIST
's.
(dotimes (var count-form)
body-form*)
The count-form must evaluate to an integer. Each time through the loop var holds successive integers from 0 to one less than that number. For instance:
CL-USER> (dotimes (i 4) (print i))
0
1
2
3
NIL
As with DOLIST
, you can use RETURN
to break out of the loop early.
Because the body of both DOLIST
and DOTIMES
loops can contain any kind of expressions, you can also nest loops. For example, to print out the times tables from 1 × 1 = 1
to 20 × 20 = 400
, you can write this pair of nested DOTIMES
loops:
(dotimes (x 20)
(dotimes (y 20)
(format t "~3d " (* (1+ x) (1+ y))))
(format t "~%"))
DO
While DOLIST
and DOTIMES
are convenient and easy to use, they aren't flexible enough to use for all loops. For instance, what if you want to step multiple variables in parallel? Or use an arbitrary expression to test for the end of the loop? If neither DOLIST
nor DOTIMES
meet your needs, you still have access to the more general DO
loop.
Where DOLIST
and DOTIMES
provide only one loop variable, DO
lets you bind any number of variables and gives you complete control over how they change on each step through the loop. You also get to define the test that determines when to end the loop and can provide a form to evaluate at the end of the loop to generate a return value for the DO
expression as a whole. The basic template looks like this:
(do (variable-definition*)
(end-test-form result-form*)
statement*)
Each variable-definition introduces a variable that will be in scope in the body of the loop. The full form of a single variable definition is a list containing three elements.
(var init-form step-form)
The init-form will be evaluated at the beginning of the loop and the resulting value bound to the variable var. Before each subsequent iteration of the loop, the step-form will be evaluated and the new value assigned to var. The step-form is optional; if it's left out, the variable will keep its value from iteration to iteration unless you explicitly assign it a new value in the loop body. As with the variable definitions in a LET, if the init-form is left out, the variable is bound to NIL. Also as with LET, you can use a plain variable name as shorthand for a list containing just the name.
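For instance, here's a small sketch (my own example, not one from the standard) combining all three variable-definition variants: a variable with both an init-form and a step-form, one with only an init-form that's updated in the body, and a plain name bound to NIL.

```lisp
(do ((i 0 (1+ i))   ; init-form and step-form
     (sum 0)        ; init-form only; updated in the body
     flag)          ; plain name, bound to NIL
    ((= i 5) (list sum flag))
  (setf sum (+ sum i))
  (when (> sum 5) (setf flag t)))
```

Evaluating this form returns (10 T): sum accumulates 0 + 1 + 2 + 3 + 4, and flag is set once the running sum exceeds 5.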
At the beginning of each iteration, after all the loop variables have been given their new values, the end-test-form is evaluated. As long as it evaluates to NIL, the iteration proceeds, evaluating the statements in order.
When the end-test-form evaluates to true, the result-forms are evaluated, and the value of the last result form is returned as the value of the DO expression.
At each step of the iteration the step forms for all the variables are evaluated before assigning any of the values to the variables. This means you can refer to any of the other loop variables in the step forms.[90] That is, in a loop like this:
(do ((n 0 (1+ n))
     (cur 0 next)
     (next 1 (+ cur next)))
    ((= 10 n) cur))
the step forms (1+ n), next, and (+ cur next) are all evaluated using the old values of n, cur, and next. Only after all the step forms have been evaluated are the variables given their new values. (Mathematically inclined readers may notice that this is a particularly efficient way of computing the eleventh Fibonacci number.)
This example also illustrates another characteristic of DO—because you can step multiple variables, you often don't need a body at all. Other times, you may leave out the result form, particularly if you're just using the loop as a control construct. This flexibility, however, is the reason that DO expressions can be a bit cryptic. Where exactly do all the parentheses go? The best way to understand a DO expression is to keep in mind the basic template.
(do (variable-definition*)
    (end-test-form result-form*)
  statement*)
The six parentheses in that template are the only ones required by the DO itself. You need one pair to enclose the variable declarations, one pair to enclose the end test and result forms, and one pair to enclose the whole expression. Other forms within the DO may require their own parentheses—variable definitions are usually lists, for instance. And the test form is often a function call. But the skeleton of a DO loop will always be the same. Here are some example DO loops:
(do ((i 0 (1+ i)))
    ((>= i 4))
  (print i))
Notice that the result form has been omitted. This is, however, not a particularly idiomatic use of DO, as this loop is much more simply written using DOTIMES.[91]
(dotimes (i 4) (print i))
As another example, here's the bodiless Fibonacci-computing loop:
(do ((n 0 (1+ n))
     (cur 0 next)
     (next 1 (+ cur next)))
    ((= 10 n) cur))
Finally, the next loop demonstrates a DO loop that binds no variables. It loops while the current time is less than the value of a global variable, printing "Waiting" once a minute. Note that even with no loop variables, you still need the empty variables list.
(do ()
    ((> (get-universal-time) *some-future-date*))
  (format t "Waiting~%")
  (sleep 60))
The Mighty LOOP
For the simple cases you have DOLIST and DOTIMES. And if they don't suit your needs, you can fall back on the completely general DO. What more could you want?
Well, it turns out a handful of looping idioms come up over and over again, such as looping over various data structures: lists, vectors, hash tables, and packages. Or accumulating values in various ways while looping: collecting, counting, summing, minimizing, or maximizing. If you need a loop to do one of these things (or several at the same time), the LOOP macro may give you an easier way to express it.
The LOOP macro actually comes in two flavors—simple and extended. The simple version is as simple as can be—an infinite loop that doesn't bind any variables. The skeleton looks like this:
(loop
  body-form*)
The forms in body are evaluated each time through the loop, which will iterate forever unless you use RETURN to break out. For example, you could write the previous DO loop with a simple LOOP.
(loop
  (when (> (get-universal-time) *some-future-date*)
    (return))
  (format t "Waiting~%")
  (sleep 60))
The extended LOOP is quite a different beast. It's distinguished by the use of certain loop keywords that implement a special-purpose language for expressing looping idioms. It's worth noting that not all Lispers love the extended LOOP language. At least one of Common Lisp's original designers hated it. LOOP's detractors complain that its syntax is totally un-Lispy (in other words, not enough parentheses). LOOP's fans counter that that's the point: complicated looping constructs are hard enough to understand without wrapping them up in DO's cryptic syntax. It's better, they say, to have a slightly more verbose syntax that gives you some clues what the heck is going on.
For instance, here's an idiomatic DO loop that collects the numbers from 1 to 10 into a list:
(do ((nums nil) (i 1 (1+ i)))
    ((> i 10) (nreverse nums))
  (push i nums)) ==> (1 2 3 4 5 6 7 8 9 10)
A seasoned Lisper won't have any trouble understanding that code—it's just a matter of understanding the basic form of a DO loop and recognizing the PUSH/NREVERSE idiom for building up a list. But it's not exactly transparent. The LOOP version, on the other hand, is almost understandable as an English sentence.
(loop for i from 1 to 10 collecting i) ==> (1 2 3 4 5 6 7 8 9 10)
The following are some more examples of simple uses of LOOP. This sums the first ten squares:
(loop for x from 1 to 10 summing (expt x 2)) ==> 385
This counts the number of vowels in a string:
(loop for x across "the quick brown fox jumps over the lazy dog"
      counting (find x "aeiou")) ==> 11
This computes the eleventh Fibonacci number, similar to the DO loop used earlier:
(loop for i below 10
      and a = 0 then b
      and b = 1 then (+ b a)
      finally (return a))
The symbols across, and, below, collecting, counting, finally, for, from, summing, then, and to are some of the loop keywords whose presence identifies these as instances of the extended LOOP.[92]
I'll save the details of LOOP for Chapter 22, but it's worth noting here as another example of the way macros can be used to extend the base language. While LOOP provides its own language for expressing looping constructs, it doesn't cut you off from the rest of Lisp. The loop keywords are parsed according to loop's grammar, but the rest of the code in a LOOP is regular Lisp code.
And it's worth pointing out one more time that while the LOOP macro is quite a bit more complicated than macros such as WHEN or UNLESS, it is just another macro. If it hadn't been included in the standard library, you could implement it yourself or get a third-party library that does.
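To make that concrete, here's a sketch (mine, not how any real implementation defines LOOP) of how the simple, infinite flavor could be built as an ordinary macro out of the BLOCK, TAGBODY, and GO special operators, which are covered in a later chapter:

```lisp
(defmacro my-simple-loop (&body body)
  `(block nil        ; RETURN returns from the block named NIL
     (tagbody
      top
        ,@body
        (go top))))  ; jump back to the top, forever
```

It would be used just like the built-in simple LOOP, relying on RETURN to break out:

```lisp
(my-simple-loop
  (when (> (get-universal-time) *some-future-date*) (return))
  (format t "Waiting~%")
  (sleep 60))
```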
With that I'll conclude our tour of the basic control-construct macros. Now you're ready to take a closer look at how to define your own macros.
8. Macros: Defining Your Own
Now it's time to start writing your own macros. The standard macros I covered in the previous chapter hint at some of the things you can do with macros, but that's just the beginning. Common Lisp doesn't support macros so every Lisp programmer can create their own variants of standard control constructs any more than C supports functions so every C programmer can write trivial variants of the functions in the C standard library. Macros are part of the language to allow you to create abstractions on top of the core language and standard library that move you closer toward being able to directly express the things you want to express.
Perhaps the biggest barrier to a proper understanding of macros is, ironically, that they're so well integrated into the language. In many ways they seem like just a funny kind of function—they're written in Lisp, they take arguments and return results, and they allow you to abstract away distracting details. Yet despite these many similarities, macros operate at a different level than functions and create a totally different kind of abstraction.
Once you understand the difference between macros and functions, the tight integration of macros in the language will be a huge benefit. But in the meantime, it's a frequent source of confusion for new Lispers. The following story, while not true in a historical or technical sense, tries to alleviate the confusion by giving you a way to think about how macros work.
The Story of Mac: A Just-So Story
Once upon a time, long ago, there was a company of Lisp programmers. It was so long ago, in fact, that Lisp had no macros. Anything that couldn't be defined with a function or done with a special operator had to be written in full every time, which was rather a drag. Unfortunately, the programmers in this company—though brilliant—were also quite lazy. Often in the middle of their programs—when the tedium of writing a bunch of code got to be too much—they would instead write a note describing the code they needed to write at that place in the program. Even more unfortunately, because they were lazy, the programmers also hated to go back and actually write the code described by the notes. Soon the company had a big stack of programs that nobody could run because they were full of notes about code that still needed to be written.
In desperation, the big bosses hired a junior programmer, Mac, whose job was to find the notes, write the required code, and insert it into the program in place of the notes. Mac never ran the programs—they weren't done yet, of course, so he couldn't. But even if they had been completed, Mac wouldn't have known what inputs to feed them. So he just wrote his code based on the contents of the notes and sent it back to the original programmer.
With Mac's help, all the programs were soon completed, and the company made a ton of money selling them—so much money that the company could double the size of its programming staff. But for some reason no one thought to hire anyone to help Mac; soon he was single- handedly assisting several dozen programmers. To avoid spending all his time searching for notes in source code, Mac made a small modification to the compiler the programmers used. Thereafter, whenever the compiler hit a note, it would e-mail him the note and wait for him to e-mail back the replacement code. Unfortunately, even with this change, Mac had a hard time keeping up with the programmers. He worked as carefully as he could, but sometimes— especially when the notes weren't clear—he would make mistakes.
The programmers noticed, however, that the more precisely they wrote their notes, the more likely it was that Mac would send back correct code. One day, one of the programmers, having a hard time describing in words the code he wanted, included in one of his notes a Lisp program that would generate the code he wanted. That was fine by Mac; he just ran the program and sent the result to the compiler.
The next innovation came when a programmer put a note at the top of one of his programs containing a function definition and a comment that said, "Mac, don't write any code here, but keep this function for later; I'm going to use it in some of my other notes." Other notes in the same program said things such as, "Mac, replace this note with the result of running that other function with the symbols x and y as arguments."
This technique caught on so quickly that within a few days, most programs contained dozens of notes defining functions that were only used by code in other notes. To make it easy for Mac to pick out the notes containing only definitions that didn't require any immediate response, the programmers tagged them with the standard preface: "Definition for Mac, Read Only." This—as the programmers were still quite lazy—was quickly shortened to "DEF. MAC. R/O" and then "DEFMACRO."
Pretty soon, there was no actual English left in the notes for Mac. All he did all day was read and respond to e-mails from the compiler containing DEFMACRO notes and calls to the functions defined in the DEFMACROs. Since the Lisp programs in the notes did all the real work, keeping up with the e-mails was no problem. Mac suddenly had a lot of time on his hands and would sit in his office daydreaming about white-sand beaches, clear blue ocean water, and drinks with little paper umbrellas in them.
Several months later the programmers realized nobody had seen Mac for quite some time. When they went to his office, they found a thin layer of dust over everything, a desk littered with travel brochures for various tropical locations, and the computer off. But the compiler still worked—how could it be? It turned out Mac had made one last change to the compiler: instead of e-mailing notes to Mac, the compiler now saved the functions defined by DEFMACRO notes and ran them when called for by the other notes. The programmers decided there was no reason to tell the big bosses Mac wasn't coming to the office anymore. So to this day, Mac draws a salary and from time to time sends the programmers a postcard from one tropical locale or another.
Macro Expansion Time vs. Runtime
The key to understanding macros is to be quite clear about the distinction between the code that generates code (macros) and the code that eventually makes up the program (everything else). When you write macros, you're writing programs that will be used by the compiler to generate the code that will then be compiled. Only after all the macros have been fully expanded and the resulting code compiled can the program actually be run. The time when macros run is called macro expansion time; this is distinct from runtime, when regular code, including the code generated by macros, runs.
It's important to keep this distinction firmly in mind because code running at macro expansion time runs in a very different environment than code running at runtime. Namely, at macro expansion time, there's no way to access the data that will exist at runtime. Like Mac, who couldn't run the programs he was working on because he didn't know what the correct inputs were, code running at macro expansion time can deal only with the data that's inherent in the source code. For instance, suppose the following source code appears somewhere in a program:
(defun foo (x)
  (when (> x 10) (print 'big)))
Normally you'd think of x as a variable that will hold the argument passed in a call to foo. But at macro expansion time, such as when the compiler is running the WHEN macro, the only data available is the source code. Since the program isn't running yet, there's no call to foo and thus no value associated with x. Instead, the values the compiler passes to WHEN are the Lisp lists representing the source code, namely, (> x 10) and (print 'big). Suppose that WHEN is defined, as you saw in the previous chapter, with something like the following macro:
(defmacro when (condition &rest body)
  `(if ,condition (progn ,@body)))
When the code in foo is compiled, the WHEN macro will be run with those two forms as arguments. The parameter condition will be bound to the form (> x 10), and the form (print 'big) will be collected into a list that will become the value of the &rest body parameter. The backquote expression will then generate this code:
(if (> x 10) (progn (print 'big)))
by interpolating in the value of condition and splicing the value of body into the PROGN.
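You can watch this happen at the REPL with the function MACROEXPAND-1, described later in this chapter. Assuming WHEN were defined exactly as above, the transcript would look like this (an actual implementation is free to expand WHEN differently, so your output may vary):

```lisp
CL-USER> (macroexpand-1 '(when (> x 10) (print 'big)))
(IF (> X 10) (PROGN (PRINT 'BIG)))
T
```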
When Lisp is interpreted, rather than compiled, the distinction between macro expansion time and runtime is less clear because they're temporally intertwined. Also, the language standard doesn't specify exactly how an interpreter must handle macros—it could expand all the macros in the form being interpreted and then interpret the resulting code, or it could start right in on interpreting the form and expand macros when it hits them. In either case, macros are always passed the unevaluated Lisp objects representing the subforms of the macro form, and the job of the macro is still to produce code that will do something rather than to do anything directly.
DEFMACRO
As you saw in Chapter 3, macros really are defined with DEFMACRO forms, though it stands—of course—for DEFine MACRO, not Definition for Mac. The basic skeleton of a DEFMACRO is quite similar to the skeleton of a DEFUN.
(defmacro name (parameter*)
  "Optional documentation string."
  body-form*)
Like a function, a macro consists of a name, a parameter list, an optional documentation string, and a body of Lisp expressions.[93] However, as I just discussed, the job of a macro isn't to do anything directly—its job is to generate code that will later do what you want.
Macros can use the full power of Lisp to generate their expansion, which means in this chapter I can only scratch the surface of what you can do with macros. I can, however, describe a general process for writing macros that works for all macros from the simplest to the most complex.
The job of a macro is to translate a macro form—in other words, a Lisp form whose first element is the name of the macro—into code that does a particular thing. Sometimes you write a macro starting with the code you'd like to be able to write, that is, with an example macro form. Other times you decide to write a macro after you've written the same pattern of code several times and realize you can make your code clearer by abstracting the pattern.
Regardless of which end you start from, you need to figure out the other end before you can start writing a macro: you need to know both where you're coming from and where you're going before you can hope to write code to do it automatically. Thus, the first step of writing a macro is to write at least one example of a call to the macro and the code into which that call should expand.
Once you have an example call and the desired expansion, you're ready for the second step: writing the actual macro code. For simple macros this will be a trivial matter of writing a backquoted template with the macro parameters plugged into the right places. Complex macros will be significant programs in their own right, complete with helper functions and data structures.
After you've written code to translate the example call to the appropriate expansion, you need to make sure the abstraction the macro provides doesn't "leak" details of its implementation. Leaky macro abstractions will work fine for certain arguments but not others or will interact with code in the calling environment in undesirable ways. As it turns out, macros can leak in a small handful of ways, all of which are easily avoided as long as you know to check for them. I'll discuss how in the section "Plugging the Leaks."
To sum up, the steps to writing a macro are as follows:
1. Write a sample call to the macro and the code it should expand into, or vice versa.
2. Write code that generates the handwritten expansion from the arguments in the sample call.
3. Make sure the macro abstraction doesn't "leak."
A Sample Macro: do-primes
To see how this three-step process works, you'll write a macro do-primes that provides a looping construct similar to DOTIMES and DOLIST except that instead of iterating over integers or elements of a list, it iterates over successive prime numbers. This isn't meant to be an example of a particularly useful macro—it's just a vehicle for demonstrating the process.
First, you'll need two utility functions, one to test whether a given number is prime and another that returns the next prime number greater than or equal to its argument. In both cases you can use a simple, but inefficient, brute-force approach.
(defun primep (number)
  (when (> number 1)
    (loop for fac from 2 to (isqrt number) never (zerop (mod number fac)))))

(defun next-prime (number)
  (loop for n from number when (primep n) return n))
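A quick sanity check of these helpers at the REPL might look like this (my own transcript, following the definitions above):

```lisp
CL-USER> (primep 13)
T
CL-USER> (primep 15)
NIL
CL-USER> (next-prime 10)
11
```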
Now you can write the macro. Following the procedure outlined previously, you need at least one example of a call to the macro and the code into which it should expand. Suppose you start with the idea that you want to be able to write this:
(do-primes (p 0 19)
  (format t "~d " p))
to express a loop that executes the body once for each prime number greater than or equal to 0 and less than or equal to 19, with the variable p holding the prime number. It makes sense to model this macro on the form of the standard DOLIST and DOTIMES macros; macros that follow the pattern of existing macros are easier to understand and use than macros that introduce gratuitously novel syntax.
Without the do-primes macro, you could write such a loop with DO (and the two utility functions defined previously) like this:
(do ((p (next-prime 0) (next-prime (1+ p))))
    ((> p 19))
  (format t "~d " p))
Now you're ready to start writing the macro code that will translate from the former to the latter.
Macro Parameters
Since the arguments passed to a macro are Lisp objects representing the source code of the macro call, the first step in any macro is to extract whatever parts of those objects are needed to compute the expansion. For macros that simply interpolate their arguments directly into a template, this step is trivial: simply defining the right parameters to hold the different arguments is sufficient.
But this approach, it seems, will not suffice for do-primes. The first argument to the do-primes call is a list containing the name of the loop variable, p; the lower bound, 0; and the upper bound, 19. But if you look at the expansion, the list as a whole doesn't appear in the expansion; the three elements are split up and put in different places.
You could define do-primes with two parameters, one to hold the list and a &rest parameter to hold the body forms, and then take apart the list by hand, something like this:
(defmacro do-primes (var-and-range &rest body)
  (let ((var (first var-and-range))
        (start (second var-and-range))
        (end (third var-and-range)))
    `(do ((,var (next-prime ,start) (next-prime (1+ ,var))))
         ((> ,var ,end))
       ,@body)))
In a moment I'll explain how the body generates the correct expansion; for now you can just note that the variables var, start, and end each hold a value, extracted from var-and-range, that's then interpolated into the backquote expression that generates do-primes's expansion.
However, you don't need to take apart var-and-range "by hand" because macro parameter lists are what are called destructuring parameter lists. Destructuring, as the name suggests, involves taking apart a structure—in this case the list structure of the forms passed to a macro.
Within a destructuring parameter list, a simple parameter name can be replaced with a nested parameter list. The parameters in the nested parameter list will take their values from the elements of the expression that would have been bound to the parameter the list replaced. For instance, you can replace var-and-range with a list (var start end), and the three elements of the list will automatically be destructured into those three parameters.
Another special feature of macro parameter lists is that you can use &body as a synonym for &rest. Semantically &body and &rest are equivalent, but many development environments will use the presence of a &body parameter to modify how they indent uses of the macro—typically &body parameters are used to hold a list of forms that make up the body of the macro.
So you can streamline the definition of do-primes and give a hint to both human readers and your development tools about its intended use by defining it like this:
(defmacro do-primes ((var start end) &body body)
  `(do ((,var (next-prime ,start) (next-prime (1+ ,var))))
       ((> ,var ,end))
     ,@body))
In addition to being more concise, destructuring parameter lists also give you automatic error checking—with do-primes defined this way, Lisp will be able to detect a call whose first argument isn't a three-element list and will give you a meaningful error message just as if you had called a function with too few or too many arguments. Also, in development environments such as SLIME that indicate what arguments are expected as soon as you type the name of a function or macro, if you use a destructuring parameter list, the environment will be able to tell you more specifically the syntax of the macro call. With the original definition, SLIME would tell you do-primes is called like this:
(do-primes var-and-range &rest body)
But with the new definition, it can tell you that a call should look like this:
(do-primes (var start end) &body body)
Destructuring parameter lists can contain &optional, &key, and &rest parameters and can contain nested destructuring lists. However, you don't need any of those options to write do-primes.
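For illustration only, here's a hypothetical macro (mine, not from the text) whose parameter list nests a second destructuring list and uses &optional with a default value. Note that, like the current do-primes, it makes no attempt yet to guard against the evaluation leaks discussed in "Plugging the Leaks"; it's only meant to show the parameter list.

```lisp
(defmacro do-range ((var (start end) &optional (step 1)) &body body)
  `(do ((,var ,start (+ ,var ,step)))
       ((> ,var ,end))
     ,@body))

;; (do-range (i (1 10) 3) (format t "~d " i)) prints: 1 4 7 10
```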
Generating the Expansion
Because do-primes is a fairly simple macro, after you've destructured the arguments, all that's left is to interpolate them into a template to get the expansion.
For simple macros like do-primes, the special backquote syntax is perfect. To review, a backquoted expression is similar to a quoted expression except you can "unquote" particular subexpressions by preceding them with a comma, possibly followed by an at (@) sign. Without an at sign, the comma causes the value of the subexpression to be included as is. With an at sign, the value—which must be a list—is "spliced" into the enclosing list.
Another useful way to think about the backquote syntax is as a particularly concise way of writing code that generates lists. This way of thinking about it has the benefit of being pretty much exactly what's happening under the covers—when the reader reads a backquoted expression, it translates it into code that generates the appropriate list structure. For instance, `(,a b) might be read as (list a 'b). The language standard doesn't specify exactly what code the reader must produce as long as it generates the right list structure.
Table 8-1 shows some examples of backquoted expressions along with equivalent list-building code and the result you'd get if you evaluated either the backquoted expression or the equivalent code.[94]
Table 8-1. Backquote Examples
Backquote Syntax     | Equivalent List-Building Code           | Result
`(a (+ 1 2) c)       | (list 'a '(+ 1 2) 'c)                   | (a (+ 1 2) c)
`(a ,(+ 1 2) c)      | (list 'a (+ 1 2) 'c)                    | (a 3 c)
`(a (list 1 2) c)    | (list 'a '(list 1 2) 'c)                | (a (list 1 2) c)
`(a ,(list 1 2) c)   | (list 'a (list 1 2) 'c)                 | (a (1 2) c)
`(a ,@(list 1 2) c)  | (append (list 'a) (list 1 2) (list 'c)) | (a 1 2 c)
It's important to note that backquote is just a convenience. But it's a big convenience. To appreciate how big, compare the backquoted version of do-primes to the following version, which uses explicit list-building code:
(defmacro do-primes-a ((var start end) &body body)
  (append '(do)
          (list (list (list var
                            (list 'next-prime start)
                            (list 'next-prime (list '1+ var)))))
          (list (list (list '> var end)))
          body))
As you'll see in a moment, the current implementation of do-primes doesn't handle certain edge cases correctly. But first you should verify that it at least works for the original example. You can test it in two ways. You can test it indirectly by simply using it—presumably, if the resulting behavior is correct, the expansion is correct. For instance, you can type the original example's use of do-primes to the REPL and see that it indeed prints the right series of prime numbers.
CL-USER> (do-primes (p 0 19) (format t "~d " p))
2 3 5 7 11 13 17 19
NIL
Or you can check the macro directly by looking at the expansion of a particular call. The function MACROEXPAND-1 takes any Lisp expression as an argument and returns the result of doing one level of macro expansion.[95] Because MACROEXPAND-1 is a function, to pass it a literal macro form you must quote it. You can use it to see the expansion of the previous call.[96]
CL-USER> (macroexpand-1 '(do-primes (p 0 19) (format t "~d " p)))
(DO ((P (NEXT-PRIME 0) (NEXT-PRIME (1+ P))))
    ((> P 19))
  (FORMAT T "~d " P))
T
Or, more conveniently, in SLIME you can check a macro's expansion by placing the cursor on the opening parenthesis of a macro form in your source code and typing C-c RET to invoke the Emacs function slime-macroexpand-1, which will pass the macro form to MACROEXPAND-1 and "pretty print" the result in a temporary buffer.
However you get to it, you can see that the result of macro expansion is the same as the original handwritten expansion, so it seems that do-primes works.
Plugging the Leaks
In his essay "The Law of Leaky Abstractions," Joel Spolsky coined the term leaky abstraction to describe an abstraction that "leaks" details it's supposed to be abstracting away. Since writing a macro is a way of creating an abstraction, you need to make sure your macros don't leak needlessly.[97]
As it turns out, a macro can leak details of its inner workings in three ways. Luckily, it's pretty easy to tell whether a given macro suffers from any of those leaks and to fix them.
The current definition suffers from one of the three possible macro leaks: namely, it evaluates the end subform too many times. Suppose you were to call do-primes with, instead of a literal number such as 19, an expression such as (random 100) in the end position.
(do-primes (p 0 (random 100))
  (format t "~d " p))
Presumably the intent here is to loop over the primes from zero to whatever random number is returned by (random 100). However, this isn't what the current implementation does, as MACROEXPAND-1 shows.
CL-USER> (macroexpand-1 '(do-primes (p 0 (random 100)) (format t "~d " p)))
(DO ((P (NEXT-PRIME 0) (NEXT-PRIME (1+ P))))
    ((> P (RANDOM 100)))
  (FORMAT T "~d " P))
T
When this expansion code is run, RANDOM will be called each time the end test for the loop is evaluated. Thus, instead of looping until p is greater than an initially chosen random number, this loop will iterate until it happens to draw a random number less than or equal to the current value of p. While the total number of iterations will still be random, it will be drawn from a much different distribution than the uniform distribution RANDOM returns.
This is a leak in the abstraction because, to use the macro correctly, the caller needs to be aware that the end form is going to be evaluated more than once. One way to plug this leak would be to simply define this as the behavior of do-primes. But that's not very satisfactory—you should try to observe the Principle of Least Astonishment when implementing macros. And programmers will typically expect the forms they pass to macros to be evaluated no more times than absolutely necessary.[98] Furthermore, since do-primes is built on the model of the standard macros, DOTIMES and DOLIST, neither of which causes any of the forms except those in the body to be evaluated more than once, most programmers will expect do-primes to behave similarly.
You can fix the multiple evaluation easily enough; you just need to generate code that evaluates end once and saves the value in a variable to be used later. Recall that in a DO loop, variables defined with an initialization form and no step form don't change from iteration to iteration. So you can fix the multiple evaluation problem with this definition:
(defmacro do-primes ((var start end) &body body)
  `(do ((ending-value ,end)
        (,var (next-prime ,start) (next-prime (1+ ,var))))
       ((> ,var ending-value))
     ,@body))
Unfortunately, this fix introduces two new leaks to the macro abstraction.
One new leak is similar to the multiple-evaluation leak you just fixed. Because the initialization forms for variables in a DO loop are evaluated in the order the variables are defined, when the macro expansion is evaluated, the expression passed as end will be evaluated before the expression passed as start, opposite to the order they appear in the macro call. This leak doesn't cause any problems when start and end are literal values like 0 and 19. But when they're forms that can have side effects, evaluating them out of order can once again run afoul of the Principle of Least Astonishment.
This leak is trivially plugged by swapping the order of the two variable definitions.
(defmacro do-primes ((var start end) &body body)
  `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
        (ending-value ,end))
       ((> ,var ending-value))
     ,@body))
The last leak you need to plug was created by using the variable name ending-value
. The problem is that the name, which ought to be a purely internal detail of the macro implementation, can end up interacting with code passed to the macro or in the context where the macro is called. The following seemingly innocent call to do-primes
doesn't work correctly because of this leak:
(do-primes (ending-value 0 10)
(print ending-value))
Neither does this one:
(let ((ending-value 0))
(do-primes (p 0 10)
(incf ending-value p))
ending-value)
Again, MACROEXPAND-1
can show you the problem. The first call expands to this:
(do ((ending-value (next-prime 0) (next-prime (1+ ending-value)))
(ending-value 10))
((> ending-value ending-value))
(print ending-value))
Some Lisps may reject this code because ending-value
is used twice as a variable name in the same DO
loop. If not rejected outright, the code will loop forever since ending-value
will never be greater than itself.
The second problem call expands to the following:
(let ((ending-value 0))
  (do ((p (next-prime 0) (next-prime (1+ p)))
       (ending-value 10))
      ((> p ending-value))
    (incf ending-value p))
  ending-value)
In this case the generated code is perfectly legal, but the behavior isn't at all what you want. Because the binding of ending-value established by the LET outside the loop is shadowed by the variable with the same name inside the DO, the form (incf ending-value p) increments the loop variable ending-value instead of the outer variable with the same name, creating another infinite loop.[99]
Clearly, what you need to patch this leak is a symbol that will never be used outside the code generated by the macro. You could try using a really unlikely name, but that's no guarantee. You could also protect yourself to some extent by using packages, as described in Chapter 21. But there's a better solution.
The function GENSYM returns a unique symbol each time it's called. This is a symbol that has never been read by the Lisp reader and never will be because it isn't interned in any package. Thus, instead of using a literal name like ending-value, you can generate a new symbol each time do-primes is expanded.
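At the REPL, each call to GENSYM returns a fresh, uninterned symbol; the particular names shown here are illustrative, since the generated names vary by implementation and session:

```lisp
CL-USER> (gensym)
#:G849
CL-USER> (gensym)
#:G850
CL-USER> (eq (gensym) (gensym))
NIL
```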
(defmacro do-primes ((var start end) &body body)
  (let ((ending-value-name (gensym)))
    `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
          (,ending-value-name ,end))
         ((> ,var ,ending-value-name))
       ,@body)))
Note that the code that calls GENSYM isn't part of the expansion; it runs as part of the macro expander and thus creates a new symbol each time the macro is expanded. This may seem a bit strange at first—ending-value-name is a variable whose value is the name of another variable. But really it's no different from the parameter var whose value is the name of a variable—the difference is the value of var was created by the reader when the macro form was read, and the value of ending-value-name is generated programmatically when the macro code runs.
With this definition the two previously problematic forms expand into code that works the way you want. The first form:
(do-primes (ending-value 0 10)
  (print ending-value))
expands into the following:
(do ((ending-value (next-prime 0) (next-prime (1+ ending-value)))
     (#:g2141 10))
    ((> ending-value #:g2141))
  (print ending-value))
Now the variable used to hold the ending value is the gensymed symbol, #:g2141. The name of the symbol, G2141, was generated by GENSYM but isn't significant; the thing that matters is the object identity of the symbol. Gensymed symbols are printed in the normal syntax for uninterned symbols, with a leading #:.
The other previously problematic form:
(let ((ending-value 0))
  (do-primes (p 0 10)
    (incf ending-value p))
  ending-value)
looks like this if you replace the do-primes form with its expansion:
(let ((ending-value 0))
  (do ((p (next-prime 0) (next-prime (1+ p)))
       (#:g2140 10))
      ((> p #:g2140))
    (incf ending-value p))
  ending-value)
Again, there's no leak since the ending-value variable bound by the LET surrounding the do-primes loop is no longer shadowed by any variables introduced in the expanded code.
Not all literal names used in a macro expansion will necessarily cause a problem—as you get more experience with the various binding forms, you'll be able to determine whether a given name is being used in a position that could cause a leak in a macro abstraction. But there's no real downside to using a gensymed name just to be safe.
With that fix, you've plugged all the leaks in the implementation of do-primes. Once you've gotten a bit of macro-writing experience under your belt, you'll learn to write macros with these kinds of leaks preplugged. It's actually fairly simple if you follow these rules of thumb:
• Unless there's a particular reason to do otherwise, include any subforms in the expansion in positions that will be evaluated in the same order as the subforms appear in the macro call.
• Unless there's a particular reason to do otherwise, make sure subforms are evaluated only once by creating a variable in the expansion to hold the value of evaluating the argument form and then using that variable anywhere else the value is needed in the expansion.
• Use GENSYM at macro expansion time to create variable names used in the expansion.
Macro-Writing Macros
Of course, there's no reason you should be able to take advantage of macros only when writing functions. The job of macros is to abstract away common syntactic patterns, and certain patterns come up again and again in writing macros that can also benefit from being abstracted away.
In fact, you've already seen one such pattern—many macros will, like the last version of do-primes, start with a LET that introduces a few variables holding gensymed symbols to be used in the macro's expansion. Since this is such a common pattern, why not abstract it away with its own macro?
In this section you'll write a macro, with-gensyms, that does just that. In other words, you'll write a macro-writing macro: a macro that generates code that generates code. While complex macro-writing macros can be a bit confusing until you get used to keeping the various levels of code clear in your mind, with-gensyms is fairly straightforward and will serve as a useful but not too strenuous mental limbering exercise.
You want to be able to write something like this:
(defmacro do-primes ((var start end) &body body)
  (with-gensyms (ending-value-name)
    `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
          (,ending-value-name ,end))
         ((> ,var ,ending-value-name))
       ,@body)))
and have it be equivalent to the previous version of do-primes. In other words, the with-gensyms needs to expand into a LET that binds each named variable, ending-value-name in this case, to a gensymed symbol. That's easy enough to write with a simple backquote template.
(defmacro with-gensyms ((&rest names) &body body)
  `(let ,(loop for n in names collect `(,n (gensym)))
     ,@body))
Note how you can use a comma to interpolate the value of the LOOP expression. The loop generates a list of binding forms where each binding form consists of a list containing one of the names given to with-gensyms and the literal code (gensym). You can test what code the LOOP expression would generate at the REPL by replacing names with a list of symbols.
CL-USER> (loop for n in '(a b c) collect `(,n (gensym)))
((A (GENSYM)) (B (GENSYM)) (C (GENSYM)))
After the list of binding forms, the body argument to with-gensyms is spliced in as the body of the LET. Thus, in the code you wrap in a with-gensyms you can refer to any of the variables named in the list of variables passed to with-gensyms.
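with-gensyms is handy in any macro that needs a leak-free temporary, not just do-primes. As a hypothetical illustration (square is not from the text), here is a macro that avoids evaluating its argument twice by binding it to a gensymed variable first:

```lisp
(defmacro square (x)
  (with-gensyms (val)
    `(let ((,val ,x))       ; evaluate the argument exactly once
       (* ,val ,val))))

;; (square (print 3)) prints 3 once and returns 9
```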
If you macro-expand the with-gensyms form in the new definition of do-primes, you should see something like this:
(let ((ending-value-name (gensym)))
  `(do ((,var (next-prime ,start) (next-prime (1+ ,var)))
        (,ending-value-name ,end))
       ((> ,var ,ending-value-name))
     ,@body))
Looks good. While this macro is fairly trivial, it's important to keep clear about when the different macros are expanded: when you compile the DEFMACRO of do-primes, the with-gensyms form is expanded into the code just shown and compiled. Thus, the compiled version of do-primes is just the same as if you had written the outer LET by hand. When you compile a function that uses do-primes, the code generated by with-gensyms runs, generating the do-primes expansion, but with-gensyms itself isn't needed to compile a do-primes form since it has already been expanded, back when do-primes was compiled.
Another Classic Macro-Writing Macro: ONCE-ONLY
Another classic macro-writing macro is once-only, which generates code that evaluates certain macro arguments once only and in a particular order. However, the implementation of once-only is a bit too involved for a blow-by-blow explanation, as it relies on multiple levels of backquoting and unquoting.
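For reference, the implementation of once-only commonly seen in the Lisp literature looks like the following; treat it as a sketch to study rather than code explained here, since it nests backquotes two levels deep:

```lisp
(defmacro once-only ((&rest names) &body body)
  (let ((gensyms (loop for n in names collect (gensym))))
    `(let (,@(loop for g in gensyms collect `(,g (gensym))))
       `(let (,,@(loop for g in gensyms for n in names collect ``(,,g ,,n)))
          ,(let (,@(loop for n in names for g in gensyms collect `(,n ,g)))
             ,@body)))))
```

Using it, a leak-free do-primes could be written almost as simply as the original leaky version:

```lisp
(defmacro do-primes ((var start end) &body body)
  (once-only (start end)
    `(do ((,var (next-prime ,start) (next-prime (1+ ,var))))
         ((> ,var ,end))
       ,@body)))
```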
Beyond Simple Macros
I could, of course, say a lot more about macros. All the macros you've seen so far have been fairly simple examples that save you a bit of typing but don't provide radical new ways of expressing things. In upcoming chapters you'll see examples of macros that allow you to express things in ways that would be virtually impossible without macros. You'll start in the very next chapter, in which you'll build a simple but effective unit test framework.
9. Practical: Building a Unit Test Framework
In this chapter you'll return to cutting code and develop a simple unit testing framework for Lisp. This will give you a chance to use some of the features you've learned about since Chapter 3, including macros and dynamic variables, in real code.
The main design goal of the test framework will be to make it as easy as possible to add new tests, to run various suites of tests, and to track down test failures. For now you'll focus on designing a framework you can use during interactive development.
The key feature of an automated testing framework is that the framework is responsible for telling you whether all the tests passed. You don't want to spend your time slogging through test output checking answers when the computer can do it much more quickly and accurately. Consequently, each test case must be an expression that yields a boolean value—true or false, pass or fail. For instance, if you were writing tests for the built-in + function, these might be reasonable test cases:[100]
(= (+ 1 2) 3)
(= (+ 1 2 3) 6)
(= (+ -1 -3) -4)
Functions that have side effects will be tested slightly differently—you'll have to call the function and then check for evidence of the expected side effects.[101] But in the end, every test case has to boil down to a boolean expression, thumbs up or thumbs down.
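As a hypothetical sketch of what that looks like, you can perform the side-effecting calls and then reduce the evidence to a single boolean; PUSH stands in here for whatever side-effecting operation you're actually testing:

```lisp
(let ((stack ()))
  (push 1 stack)          ; the side-effecting calls under test
  (push 2 stack)
  (equal stack '(2 1)))   ; the evidence, boiled down to a boolean: T on success
```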
Two First Tries
If you were doing ad hoc testing, you could enter these expressions at the REPL and check that they return T. But you want a framework that makes it easy to organize and run these test cases whenever you want. If you want to start with the simplest thing that could possibly work, you can just write a function that evaluates the test cases and ANDs the results together.
(defun test-+ ()
  (and
   (= (+ 1 2) 3)
   (= (+ 1 2 3) 6)
   (= (+ -1 -3) -4)))
Whenever you want to run this set of test cases, you can call test-+.
CL-USER> (test-+)
T
As long as it returns T, you know the test cases are passing. This way of organizing tests is also pleasantly concise—you don't have to write a bunch of test bookkeeping code. However, as you'll discover the first time a test case fails, the result reporting leaves something to be desired. When test-+ returns NIL, you'll know something failed, but you'll have no idea which test case it was.
So let's try another simple—even simpleminded—approach. To find out what happens to each test case, you could write something like this:
(defun test-+ ()
  (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ 1 2) 3) '(= (+ 1 2) 3))
  (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
  (format t "~:[FAIL~;pass~] ... ~a~%" (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))
Now each test case will be reported individually. The ~:[FAIL~;pass~] part of the FORMAT directive causes FORMAT to print "FAIL" if the first format argument is false and "pass" otherwise.[102] Then you label the result with the test expression itself. Now running test-+ shows you exactly what's going on.
CL-USER> (test-+)
pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
pass ... (= (+ -1 -3) -4)
NIL
This time the result reporting is more like what you want, but the code itself is pretty gross. The repeated calls to FORMAT as well as the tedious duplication of the test expression cry out to be refactored. The duplication of the test expression is particularly grating because if you mistype it, the test results will be mislabeled.
Another problem is that you don't get a single indicator whether all the test cases passed. It's easy enough, with only three test cases, to scan the output looking for "FAIL"; however, when you have hundreds of test cases, it'll be more of a hassle.
Refactoring
What you'd really like is a way to write test functions as streamlined as the first test-+ that return a single T or NIL value but that also report on the results of individual test cases like the second version. Since the second version is close to what you want in terms of functionality, your best bet is to see if you can factor out some of the annoying duplication.
The simplest way to get rid of the repeated similar calls to FORMAT is to create a new function.
(defun report-result (result form)
  (format t "~:[FAIL~;pass~] ... ~a~%" result form))
Now you can write test-+ with calls to report-result instead of FORMAT. It's not a huge improvement, but at least now if you decide to change the way you report results, there's only one place you have to change.
(defun test-+ ()
  (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
  (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
  (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4)))
Next you need to get rid of the duplication of the test case expression, with its attendant risk of mislabeling of results. What you'd really like is to be able to treat the expression as both code (to get the result) and data (to use as the label). Whenever you want to treat code as data, that's a sure sign you need a macro. Or, to look at it another way, what you need is a way to automate writing the error-prone report-result calls. You'd like to be able to say something like this:
(check (= (+ 1 2) 3))
and have it mean the following:
(report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
Writing a macro to do this translation is trivial.
(defmacro check (form)
  `(report-result ,form ',form))
Now you can change test-+ to use check.
(defun test-+ ()
  (check (= (+ 1 2) 3))
  (check (= (+ 1 2 3) 6))
  (check (= (+ -1 -3) -4)))
Since you're on the hunt for duplication, why not get rid of those repeated calls to check? You can define check to take an arbitrary number of forms and wrap them each in a call to report-result.
(defmacro check (&body forms)
  `(progn
     ,@(loop for f in forms collect `(report-result ,f ',f))))
This definition uses a common macro idiom of wrapping a PROGN around a series of forms in order to turn them into a single form. Notice also how you can use ,@ to splice in the result of an expression that returns a list of expressions that are themselves generated with a backquote template.
With the new version of check you can write a new version of test-+ like this:
(defun test-+ ()
  (check
   (= (+ 1 2) 3)
   (= (+ 1 2 3) 6)
   (= (+ -1 -3) -4)))
that is equivalent to the following code:
(defun test-+ ()
  (progn
    (report-result (= (+ 1 2) 3) '(= (+ 1 2) 3))
    (report-result (= (+ 1 2 3) 6) '(= (+ 1 2 3) 6))
    (report-result (= (+ -1 -3) -4) '(= (+ -1 -3) -4))))
Thanks to check, this version is as concise as the first version of test-+ but expands into code that does the same thing as the second version. And now any changes you want to make to how test-+ behaves, you can make by changing check.
Fixing the Return Value
You can start with fixing test-+ so its return value indicates whether all the test cases passed. Since check is responsible for generating the code that ultimately runs the test cases, you just need to change it to generate code that also keeps track of the results.
As a first step, you can make a small change to report-result so it returns the result of the test case it's reporting.
(defun report-result (result form)
  (format t "~:[FAIL~;pass~] ... ~a~%" result form)
  result)
Now that report-result returns the result of its test case, it might seem you could just change the PROGN to an AND to combine the results. Unfortunately, AND doesn't do quite what you want in this case because of its short-circuiting behavior: as soon as one test case fails, AND will skip the rest. On the other hand, if you had a construct that worked like AND without the short-circuiting, you could use it in the place of PROGN, and you'd be done. Common Lisp doesn't provide such a construct, but that's no reason you can't use it: it's a trivial matter to write a macro to provide it yourself.
Leaving test cases aside for a moment, what you want is a macro—let's call it combine-results—that will let you say this:
(combine-results
  (foo)
  (bar)
  (baz))
and have it mean something like this:
(let ((result t))
  (unless (foo) (setf result nil))
  (unless (bar) (setf result nil))
  (unless (baz) (setf result nil))
  result)
The only tricky bit to writing this macro is that you need to introduce a variable—result in the previous code—in the expansion. As you saw in the previous chapter, using a literal name for variables in macro expansions can introduce a leak in your macro abstraction, so you'll need to create a unique name. This is a job for with-gensyms. You can define combine-results like this:
(defmacro combine-results (&body forms)
  (with-gensyms (result)
    `(let ((,result t))
       ,@(loop for f in forms collect `(unless ,f (setf ,result nil)))
       ,result)))
Now you can fix check by simply changing the expansion to use combine-results instead of PROGN.
(defmacro check (&body forms)
  `(combine-results
     ,@(loop for f in forms collect `(report-result ,f ',f))))
With that version of check, test-+ should emit the results of its three test expressions and then return T to indicate that everything passed.[103]
CL-USER> (test-+)
pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
pass ... (= (+ -1 -3) -4)
T
And if you change one of the test cases so it fails,[104] the final return value changes to NIL.
CL-USER> (test-+)
pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
FAIL ... (= (+ -1 -3) -5)
NIL
Better Result Reporting
As long as you have only one test function, the current result reporting is pretty clear. If a particular test case fails, all you have to do is find the test case in the check form and figure out why it's failing. But if you write a lot of tests, you'll probably want to organize them somehow, rather than shoving them all into one function. For instance, suppose you wanted to add some test cases for the * function. You might write a new test function.
(defun test-* ()
  (check
   (= (* 2 2) 4)
   (= (* 3 5) 15)))
Now that you have two test functions, you'll probably want another function that runs all the tests. That's easy enough.
(defun test-arithmetic ()
  (combine-results
   (test-+)
   (test-*)))
In this function you use combine-results instead of check since both test-+ and test-* will take care of reporting their own results. When you run test-arithmetic, you'll get the following results:
CL-USER> (test-arithmetic)
pass ... (= (+ 1 2) 3)
pass ... (= (+ 1 2 3) 6)
pass ... (= (+ -1 -3) -4)
pass ... (= (* 2 2) 4)
pass ... (= (* 3 5) 15)
T
Now imagine that one of the test cases failed and you need to track down the problem. With only five test cases and two test functions, it won't be too hard to find the code of the failing test case. But suppose you had 500 test cases spread across 20 functions. It might be nice if the results told you what function each test case came from.
Since the code that prints the results is centralized in report-result, you need a way to pass information about what test function you're in to report-result. You could add a parameter to report-result to pass this information, but check, which generates the calls to report-result, doesn't know what function it's being called from, which means you'd also have to change the way you call check, passing it an argument that it simply passes on to report-result.
This is exactly the kind of problem dynamic variables were designed to solve. If you create a dynamic variable that each test function binds to the name of the function before calling check, then report-result can use it without check having to know anything about it.
Step one is to declare the variable at the top level.
(defvar *test-name* nil)
Now you need to make another tiny change to report-result to include *test-name* in the FORMAT output.
(format t "~:[FAIL~;pass~] ... ~a: ~a~%" result *test-name* form)
With those changes, the test functions will still work but will produce the following output because *test-name* is never rebound:
CL-USER> (test-arithmetic)
pass ... NIL: (= (+ 1 2) 3)
pass ... NIL: (= (+ 1 2 3) 6)
pass ... NIL: (= (+ -1 -3) -4)
pass ... NIL: (= (* 2 2) 4)
pass ... NIL: (= (* 3 5) 15)
T
For the name to be reported properly, you need to change the two test functions.
(defun test-+ ()
  (let ((*test-name* 'test-+))
    (check
     (= (+ 1 2) 3)
     (= (+ 1 2 3) 6)
     (= (+ -1 -3) -4))))

(defun test-* ()
  (let ((*test-name* 'test-*))
    (check
     (= (* 2 2) 4)
     (= (* 3 5) 15))))
Now the results are properly labeled.
CL-USER> (test-arithmetic)
pass ... TEST-+: (= (+ 1 2) 3)
pass ... TEST-+: (= (+ 1 2 3) 6)
pass ... TEST-+: (= (+ -1 -3) -4)
pass ... TEST-*: (= (* 2 2) 4)
pass ... TEST-*: (= (* 3 5) 15)
T
An Abstraction Emerges
In fixing the test functions, you've introduced several new bits of duplication. Not only does each function have to include the name of the function twice—once as the name in the DEFUN and once in the binding of *test-name*—but the same three-line code pattern is duplicated between the two functions. You could remove the duplication simply on the grounds that duplication is bad. But if you look more closely at the root cause of the duplication, you can learn an important lesson about how to use macros.
The reason both these functions start the same way is because they're both test functions. The duplication arises because, at the moment, test function is only half an abstraction. The abstraction exists in your mind, but in the code there's no way to express "this is a test function" other than to write code that follows a particular pattern.
Unfortunately, partial abstractions are a crummy tool for building software. Because a half abstraction is expressed in code by a manifestation of the pattern, you're guaranteed to have massive code duplication with all the normal bad consequences that implies for maintainability. More subtly, because the abstraction exists only in the minds of programmers, there's no mechanism to make sure different programmers (or even the same programmer working at different times) actually understand the abstraction the same way. To make a complete abstraction, you need a way to express "this is a test function" and have all the code required by the pattern be generated for you. In other words, you need a macro.
Because the pattern you're trying to capture is a DEFUN plus some boilerplate code, you need to write a macro that will expand into a DEFUN. You'll then use this macro, instead of a plain DEFUN, to define test functions, so it makes sense to call it deftest.
(defmacro deftest (name parameters &body body)
  `(defun ,name ,parameters
     (let ((*test-name* ',name))
       ,@body)))
With this macro you can rewrite test-+ as follows:
(deftest test-+ ()
  (check
   (= (+ 1 2) 3)
   (= (+ 1 2 3) 6)
   (= (+ -1 -3) -4)))
A Hierarchy of Tests
Now that you've established test functions as first-class citizens, the question might arise, should test-arithmetic be a test function? As things stand, it doesn't really matter—if you did define it with deftest, its binding of *test-name* would be shadowed by the bindings in test-+ and test-* before any results are reported.
But now imagine you've got thousands of test cases to organize. The first level of organization is provided by test functions such as test-+ and test-* that directly call check. But with thousands of test cases, you'll likely need other levels of organization. Functions such as test-arithmetic can group related test functions into test suites. Now suppose some low-level test functions are called from multiple test suites. It's not unheard of for a test case to pass in one context but fail in another. If that happens, you'll probably want to know more than just what low-level test function contains the test case.
If you define the test suite functions such as test-arithmetic with deftest and make a small change to the *test-name* bookkeeping, you can have results reported with a "fully qualified" path to the test case, something like this:
pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3)
Because you've already abstracted the process of defining a test function, you can change the bookkeeping details without modifying the code of the test functions.[105] To make *test-name* hold a list of test function names instead of just the name of the most recently entered test function, you just need to change this binding form:
(let ((*test-name* ',name))
to the following:
(let ((*test-name* (append *test-name* (list ',name))))
Since APPEND returns a new list made up of the elements of its arguments, this version will bind *test-name* to a list containing the old contents of *test-name* with the new name tacked onto the end.[106] When each test function returns, the old value of *test-name* will be restored.
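A small sketch (the function names are illustrative) shows how this dynamic rebinding builds up the path on the way in and restores the old value on the way out:

```lisp
(defvar *test-name* nil)

(defun inner ()
  (let ((*test-name* (append *test-name* (list 'inner))))
    (print *test-name*)))          ; prints (OUTER INNER)

(let ((*test-name* (append *test-name* (list 'outer))))
  (inner))

(print *test-name*)                ; back to NIL once both LETs have exited
```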
Now you can redefine test-arithmetic with deftest instead of DEFUN.
(deftest test-arithmetic ()
  (combine-results
   (test-+)
   (test-*)))
The results now show exactly how you got to each test expression.
CL-USER> (test-arithmetic)
pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3)
pass ... (TEST-ARITHMETIC TEST-+): (= (+ 1 2 3) 6)
pass ... (TEST-ARITHMETIC TEST-+): (= (+ -1 -3) -4)
pass ... (TEST-ARITHMETIC TEST-*): (= (* 2 2) 4)
pass ... (TEST-ARITHMETIC TEST-*): (= (* 3 5) 15)
T
As your test suite grows, you can add new layers of test functions; as long as they're defined with deftest, the results will be reported correctly. For instance, the following:
(deftest test-math ()
  (test-arithmetic))
would generate these results:
CL-USER> (test-math)
pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ 1 2) 3)
pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ 1 2 3) 6)
pass ... (TEST-MATH TEST-ARITHMETIC TEST-+): (= (+ -1 -3) -4)
pass ... (TEST-MATH TEST-ARITHMETIC TEST-*): (= (* 2 2) 4)
pass ... (TEST-MATH TEST-ARITHMETIC TEST-*): (= (* 3 5) 15)
T
Wrapping Up
You could keep going, adding more features to this test framework. But as a framework for writing tests with a minimum of busywork and easily running them from the REPL, this is a reasonable start. Here's the complete code, all 26 lines of it:
(defvar *test-name* nil)

(defmacro deftest (name parameters &body body)
  "Define a test function. Within a test function we can call
   other test functions or use 'check' to run individual test
   cases."
  `(defun ,name ,parameters
     (let ((*test-name* (append *test-name* (list ',name))))
       ,@body)))

(defmacro check (&body forms)
  "Run each expression in 'forms' as a test case."
  `(combine-results
     ,@(loop for f in forms collect `(report-result ,f ',f))))

(defmacro combine-results (&body forms)
  "Combine the results (as booleans) of evaluating 'forms' in order."
  (with-gensyms (result)
    `(let ((,result t))
       ,@(loop for f in forms collect `(unless ,f (setf ,result nil)))
       ,result)))

(defun report-result (result form)
  "Report the results of a single test case. Called by 'check'."
  (format t "~:[FAIL~;pass~] ... ~a: ~a~%" result *test-name* form)
  result)
It's worth reviewing how you got here because it's illustrative of how programming in Lisp often goes.
You started by defining a simple version of your problem—how to evaluate a bunch of boolean expressions and find out if they all returned true. Just ANDing them together worked and was syntactically clean but revealed the need for better result reporting. So you wrote some really simpleminded code, chock-full of duplication and error-prone idioms, that reported the results the way you wanted.
The next step was to see if you could refactor the second version into something as clean as the former. You started with a standard refactoring technique of extracting some code into a function, report-result. Unfortunately, you could see that using report-result was going to be tedious and error-prone since you had to pass the test expression twice, once for the value and once as quoted data. So you wrote the check macro to automate the details of calling report-result correctly.
While writing check, you realized that as long as you were generating code, you could make a single call to check generate multiple calls to report-result, getting you back to a version of test-+ about as concise as the original AND version.
At that point you had the check API nailed down, which allowed you to start mucking with how it worked on the inside. The next task was to fix check so the code it generated would return a boolean indicating whether all the test cases had passed. Rather than immediately hacking away at check, you paused to indulge in a little language design by fantasy. What if—you fantasized—there was already a non-short-circuiting AND construct. Then fixing check would be trivial. Returning from fantasyland, you realized there was no such construct but that you could write one in a few lines. After writing combine-results, the fix to check was indeed trivial.
At that point all that was left was to make a few more improvements to the way you reported test results. Once you started making changes to the test functions, you realized those functions represented a special category of function that deserved its own abstraction. So you wrote deftest to abstract the pattern of code that turns a regular function into a test function.
With deftest providing an abstraction barrier between the test definitions and the underlying machinery, you were able to enhance the result reporting without touching the test functions.
Now, with the basics of functions, variables, and macros mastered, and a little practical experience using them, you're ready to start exploring Common Lisp's rich standard library of functions and data types.
10. Numbers, Characters, and Strings
While functions, variables, macros, and 25 special operators provide the basic building blocks of the language itself, the building blocks of your programs will be the data structures you use. As Fred Brooks observed in The Mythical Man-Month, "Representation is the essence of programming."[107]
Common Lisp provides built-in support for most of the data types typically found in modern languages: numbers (integer, floating point, and complex), characters, strings, arrays (including multidimensional arrays), lists, hash tables, input and output streams, and an abstraction for portably representing filenames. Functions are also a first-class data type in Lisp—they can be stored in variables, passed as arguments, returned as return values, and created at runtime.
And these built-in types are just the beginning. They're defined in the language standard so programmers can count on them being available and because they tend to be easier to implement efficiently when tightly integrated with the rest of the implementation. But, as you'll see in later chapters, Common Lisp also provides several ways for you to define new data types, define operations on them, and integrate them with the built-in data types.
For now, however, you can start with the built-in data types. Because Lisp is a high-level language, the details of exactly how different data types are implemented are largely hidden. From your point of view as a user of the language, the built-in data types are defined by the functions that operate on them. So to learn a data type, you just have to learn about the functions you can use with it. Additionally, most of the built-in data types have a special syntax that the Lisp reader understands and that the Lisp printer uses. That's why, for instance, you can write strings as "foo"; numbers as 123, 1/23, and 1.23; and lists as (a b c). I'll describe the syntax for different kinds of objects when I describe the functions for manipulating them.
In this chapter, I'll cover the built-in "scalar" data types: numbers, characters, and strings. Technically, strings aren't true scalars—a string is a sequence of characters, and you can access individual characters and manipulate strings with a function that operates on sequences. But I'll discuss strings here because most of the string-specific functions manipulate them as single values and also because of the close relation between several of the string functions and their character counterparts.
Numbers
Math, as Barbie says, is hard.[108] Common Lisp can't make the math part any easier, but it does tend to get in the way a lot less than other programming languages. That's not surprising given its mathematical heritage. Lisp was originally designed by a mathematician as a tool for studying mathematical functions. And one of the main projects of the MAC project at MIT was the Macsyma symbolic algebra system, written in Maclisp, one of Common Lisp's immediate predecessors. Additionally, Lisp has been used as a teaching language at places such as MIT where even the computer science professors cringe at the thought of telling their students that 10/4 = 2, leading to Lisp's support for exact ratios. And at various times Lisp has been called upon to compete with FORTRAN in the high-performance numeric computing arena.
One of the reasons Lisp is a nice language for math is its numbers behave more like true mathematical numbers than the approximations of numbers that are easy to implement in finite computer hardware. For instance, integers in Common Lisp can be almost arbitrarily large rather than being limited by the size of a machine word.[109] And dividing two integers results in an exact ratio, not a truncated value. And since ratios are represented as pairs of arbitrarily sized integers, ratios can represent arbitrarily precise fractions.[110]
On the other hand, for high-performance numeric programming, you may be willing to trade the exactitude of rationals for the speed offered by using the hardware's underlying floating-point operations. So, Common Lisp also offers several types of floating-point numbers, which are mapped by the implementation to the appropriate hardware-supported floating-point representations.[111] Floats are also used to represent the results of a computation whose true mathematical value would be an irrational number.
Finally, Common Lisp supports complex numbers—the numbers that result from doing things such as taking square roots and logarithms of negative numbers. The Common Lisp standard even goes so far as to specify the principal values and branch cuts for irrational and transcendental functions on the complex domain.
Numeric Literals
You can write numeric literals in a variety of ways; you saw a few examples in Chapter 4. However, it's important to keep in mind the division of labor between the Lisp reader and the Lisp evaluator—the reader is responsible for translating text into Lisp objects, and the Lisp evaluator then deals only with those objects. For a given number of a given type, there can be many different textual representations, all of which will be translated to the same object representation by the Lisp reader. For instance, you can write the integer 10 as 10, 20/2, #xA, or any of a number of other ways, but the reader will translate all these to the same object. When numbers are printed back out—say, at the REPL—they're printed in a canonical textual syntax that may be different from the syntax used to enter the number. For example:
CL-USER> 10
10
CL-USER> 20/2
10
CL-USER> #xa
10
The syntax for integer values is an optional sign (+ or -) followed by one or more digits. Ratios are written as an optional sign and a sequence of digits, representing the numerator, a slash (/), and another sequence of digits representing the denominator. All rational numbers are "canonicalized" as they're read—that's why 10 and 20/2 are both read as the same number, as are 3/4 and 6/8. Rationals are printed in "reduced" form—integer values are printed in integer syntax and ratios with the numerator and denominator reduced to lowest terms.
It's also possible to write rationals in bases other than 10. If preceded by #B or #b, a rational literal is read as a binary number with 0 and 1 as the only legal digits. An #O or #o indicates an octal number (legal digits 0-7), and #X or #x indicates hexadecimal (legal digits 0-F or 0-f). You can write rationals in other bases from 2 to 36 with #nR where n is the base (always written in decimal). Additional "digits" beyond 9 are taken from the letters A-Z or a-z. Note that these radix indicators apply to the whole rational—it's not possible to write a ratio with the numerator in one base and denominator in another. Also, you can write integer values, but not ratios, as decimal digits terminated with a decimal point.[112] Some examples of rationals, with their canonical, decimal representation, are as follows:
123 ==> 123
+123 ==> 123
-123 ==> -123
123. ==> 123
2/3 ==> 2/3
-2/3 ==> -2/3
4/6 ==> 2/3
6/3 ==> 2
#b10101 ==> 21
#b1010/1011 ==> 10/11
#o777 ==> 511
#xDADA ==> 56026
#36rABCDEFGHIJKLMNOPQRSTUVWXYZ ==> 8337503854730415241050377135811259267835
You can also write floating-point numbers in a variety of ways. Unlike rational numbers, the syntax used to notate a floating-point number can affect the actual type of number read. Common Lisp defines four subtypes of floating-point number: short, single, double, and long. Each subtype can use a different number of bits in its representation, which means each subtype can represent values spanning a different range and with different precision. More bits gives a wider range and more precision.[113]
The basic format for floating-point numbers is an optional sign followed by a nonempty sequence of decimal digits possibly with an embedded decimal point. This sequence can be followed by an exponent marker for "computerized scientific notation."[114] The exponent marker consists of a single letter followed by an optional sign and a sequence of digits, which are interpreted as the power of ten by which the number before the exponent marker should be multiplied. The letter does double duty: it marks the beginning of the exponent and indicates what floating-point representation should be used for the number. The exponent markers s, f, d, l (and their uppercase equivalents) indicate short, single, double, and long floats, respectively. The letter e indicates that the default representation (initially single-float) should be used.
Numbers with no exponent marker are read in the default representation and must contain a decimal point followed by at least one digit to distinguish them from integers. The digits in a floating-point number are always treated as base 10 digits—the #B, #X, #O, and #R syntaxes work only with rationals. The following are some example floating-point numbers along with their canonical representation:
1.0 ==> 1.0
1e0 ==> 1.0
1d0 ==> 1.0d0
123.0 ==> 123.0
123e0 ==> 123.0
0.123 ==> 0.123
.123 ==> 0.123
123e-3 ==> 0.123
123E-3 ==> 0.123
0.123e20 ==> 1.23e+19
123d23 ==> 1.23d+25
Finally, complex numbers are written in their own syntax, namely, #C or #c followed by a list of two real numbers representing the real and imaginary part of the complex number. There are actually five kinds of complex numbers because the real and imaginary parts must either both be rational or both be the same kind of floating-point number.
But you can write them however you want—if a complex is written with one rational and one floating-point part, the rational is converted to a float of the appropriate representation. Similarly, if the real and imaginary parts are both floats of different representations, the one in the smaller representation will be upgraded.
However, no complex numbers have a rational real component and a zero imaginary part—since such values are, mathematically speaking, rational, they're represented by the appropriate rational value. The same mathematical argument could be made for complex numbers with floating-point components, but for those complex types a number with a zero imaginary part is always a different object than the floating-point number representing the real component. Here are some examples of numbers written in the complex number syntax:
#c(2 1) ==> #c(2 1)
#c(2/3 3/4) ==> #c(2/3 3/4)
#c(2 1.0) ==> #c(2.0 1.0)
#c(2.0 1.0d0) ==> #c(2.0d0 1.0d0)
#c(1/2 1.0) ==> #c(0.5 1.0)
#c(3 0) ==> 3
#c(3.0 0.0) ==> #c(3.0 0.0)
#c(1/2 0) ==> 1/2
#c(-6/3 0) ==> -2
Basic Math
The basic arithmetic operations—addition, subtraction, multiplication, and division—are supported for all the different kinds of Lisp numbers with the functions +, -, *, and /. Calling any of these functions with more than two arguments is equivalent to calling the same function on the first two arguments and then calling it again on the resulting value and the rest of the arguments. For example, (+ 1 2 3) is equivalent to (+ (+ 1 2) 3). With only one argument, + and * return the value; - returns its negation and / its reciprocal.[115]
(+ 1 2) ==> 3
(+ 1 2 3) ==> 6
(+ 10.0 3.0) ==> 13.0
(+ #c(1 2) #c(3 4)) ==> #c(4 6)
(- 5 4) ==> 1
(- 2) ==> -2
(- 10 3 5) ==> 2
(* 2 3) ==> 6
(* 2 3 4) ==> 24
(/ 10 5) ==> 2
(/ 10 5 2) ==> 1
(/ 2 3) ==> 2/3
(/ 4) ==> 1/4
If all the arguments are the same type of number (rational, floating point, or complex), the result will be the same type except in the case where the result of an operation on complex numbers with rational components yields a number with a zero imaginary part, in which case the result will be a rational. However, floating-point and complex numbers are contagious—if all the arguments are reals but one or more are floating-point numbers, the other arguments are converted to the nearest floating-point value in a "largest" floating-point representation of the actual floating-point arguments. Floating-point numbers in a "smaller" representation are also converted to the larger representation. Similarly, if any of the arguments are complex, any real arguments are converted to the complex equivalents.
(+ 1 2.0) ==> 3.0
(/ 2 3.0) ==> 0.6666667
(+ #c(1 2) 3) ==> #c(4 2)
(+ #c(1 2) 3/2) ==> #c(5/2 2)
(+ #c(1 1) #c(2 -1)) ==> 3
Because / doesn't truncate, Common Lisp provides four flavors of truncating and rounding for converting a real number (rational or floating point) to an integer: FLOOR truncates toward negative infinity, returning the largest integer less than or equal to the argument. CEILING truncates toward positive infinity, returning the smallest integer greater than or equal to the argument. TRUNCATE truncates toward zero, making it equivalent to FLOOR for positive arguments and to CEILING for negative arguments. And ROUND rounds to the nearest integer. If the argument is exactly halfway between two integers, it rounds to the nearest even integer.
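A few REPL examples make the four behaviors concrete. (Only the primary return value is shown; each of these functions actually returns the remainder as a second value.)

```lisp
(floor 2.5)     ==> 2    ; toward negative infinity
(floor -2.5)    ==> -3
(ceiling 2.5)   ==> 3    ; toward positive infinity
(ceiling -2.5)  ==> -2
(truncate 2.5)  ==> 2    ; toward zero
(truncate -2.5) ==> -2
(round 2.5)     ==> 2    ; halfway cases round to even
(round 3.5)     ==> 4
```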
Two related functions are MOD and REM, which return the modulus and remainder of a truncating division on real numbers. These two functions are related to the FLOOR and TRUNCATE functions as follows:
(+ (* (floor (/ x y)) y) (mod x y)) === x
(+ (* (truncate (/ x y)) y) (rem x y)) === x
Thus, for positive quotients they're equivalent, but for negative quotients they produce different results.[116]
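For example, MOD follows the sign of the divisor (matching FLOOR) while REM follows the sign of the dividend (matching TRUNCATE):

```lisp
(mod 10 3)   ==> 1
(rem 10 3)   ==> 1
(mod -10 3)  ==> 2    ; result has the sign of the divisor
(rem -10 3)  ==> -1   ; result has the sign of the dividend
(mod 10 -3)  ==> -2
(rem 10 -3)  ==> 1
```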
The functions 1+ and 1- provide a shorthand way to express adding and subtracting one from a number. Note that these are different from the macros INCF and DECF. 1+ and 1- are just functions that return a new value, but INCF and DECF modify a place. The following equivalences show the relation between INCF/DECF, 1+/1-, and +/-:
(incf x) === (setf x (1+ x)) === (setf x (+ x 1))
(decf x) === (setf x (1- x)) === (setf x (- x 1))
(incf x 10) === (setf x (+ x 10))
(decf x 10) === (setf x (- x 10))
Numeric Comparisons
The function = is the numeric equality predicate. It compares numbers by mathematical value, ignoring differences in type. Thus, = will consider mathematically equivalent values of different types equivalent while the generic equality predicate EQL would consider them inequivalent because of the difference in type. (The generic equality predicate EQUALP, however, uses = to compare numbers.) If it's called with more than two arguments, it returns true only if they all have the same value. Thus:
(= 1 1) ==> T
(= 10 20/2) ==> T
(= 1 1.0 #c(1.0 0.0) #c(1 0)) ==> T
The /= function, conversely, returns true only if all its arguments are different values.
(/= 1 1) ==> NIL
(/= 1 2) ==> T
(/= 1 2 3) ==> T
(/= 1 2 3 1) ==> NIL
(/= 1 2 3 1.0) ==> NIL
The functions <, >, <=, and >= order rationals and floating-point numbers (in other words, the real numbers). Like = and /=, these functions can be called with more than two arguments, in which case each argument is compared to the argument to its right.
(< 2 3) ==> T
(> 2 3) ==> NIL
(> 3 2) ==> T
(< 2 3 4) ==> T
(< 2 3 3) ==> NIL
(<= 2 3 3) ==> T
(<= 2 3 3 4) ==> T
(<= 2 3 4 3) ==> NIL
To pick out the smallest or largest of several numbers, you can use the function MIN or MAX, which takes any number of real number arguments and returns the minimum or maximum value.
(max 10 11) ==> 11
(min -12 -10) ==> -12
(max -1 2 -3) ==> 2
Some other handy functions are ZEROP, MINUSP, and PLUSP, which test whether a single real number is equal to, less than, or greater than zero. Two other predicates, EVENP and ODDP, test whether a single integer argument is even or odd. The P suffix on the names of these functions is a standard naming convention for predicate functions, functions that test some condition and return a boolean.
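A few examples of these predicates:

```lisp
(zerop 0)    ==> T
(minusp -1)  ==> T
(plusp 0)    ==> NIL
(evenp 2)    ==> T
(oddp 2)     ==> NIL
```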
Higher Math
The functions you've seen so far are the beginning of the built-in mathematical functions. Lisp also supports logarithms: LOG; exponentiation: EXP and EXPT; the basic trigonometric functions: SIN, COS, and TAN; their inverses: ASIN, ACOS, and ATAN; hyperbolic functions: SINH, COSH, and TANH; and their inverses: ASINH, ACOSH, and ATANH. It also provides functions to get at the individual bits of an integer and to extract the parts of a ratio or a complex number. For a complete list, see any Common Lisp reference.
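A few examples, restricted to cases with exact results (floating-point results would print with implementation-dependent precision); NUMERATOR, DENOMINATOR, REALPART, and IMAGPART are among the part-extracting functions just mentioned:

```lisp
(expt 2 10)        ==> 1024
(expt 2 -1)        ==> 1/2
(numerator 2/3)    ==> 2
(denominator 2/3)  ==> 3
(realpart #c(1 2)) ==> 1
(imagpart #c(1 2)) ==> 2
```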
Characters
Common Lisp characters are a distinct type of object from numbers. That's as it should be—characters are not numbers, and languages that treat them as if they are tend to run into problems when character encodings change, say, from 8-bit ASCII to 21-bit Unicode.[117] Because the Common Lisp standard didn't mandate a particular representation for characters, today several Lisp implementations use Unicode as their "native" character encoding despite Unicode being only a gleam in a standards body's eye at the time Common Lisp's own standardization was being wrapped up.
The read syntax for character objects is simple: #\ followed by the desired character. Thus, #\x is the character x. Any character can be used after the #\, including otherwise special characters such as ", (, and whitespace. However, writing whitespace characters this way isn't very (human) readable; an alternative syntax for certain characters is #\ followed by the character's name. Exactly what names are supported depends on the character set and on the Lisp implementation, but all implementations support the names Space and Newline. Thus, you should write #\Space instead of #\ followed by an actual space character, though the latter is technically legal. Other semistandard names (that implementations must use if the character set has the appropriate characters) are Tab, Page, Rubout, Linefeed, Return, and Backspace.
Character Comparisons
The main thing you can do with characters, other than putting them into strings (which I'll get to later in this chapter), is to compare them with other characters. Since characters aren't numbers, you can't use the numeric comparison functions, such as < and >. Instead, two sets of functions provide character-specific analogs to the numeric comparators; one set is case-sensitive and the other case-insensitive.
The case-sensitive analog to the numeric = is the function CHAR=. Like =, CHAR= can take any number of arguments and returns true only if they're all the same character. The case-insensitive version is CHAR-EQUAL.
The rest of the character comparators follow this same naming scheme: the case-sensitive comparators are named by prepending the analogous numeric comparator with CHAR; the case-insensitive versions spell out the comparator name, separated from the CHAR with a hyphen. Note, however, that <= and >= are "spelled out" with the logical equivalents NOT-GREATERP and NOT-LESSP rather than the more verbose LESSP-OR-EQUALP and GREATERP-OR-EQUALP. Like their numeric counterparts, all these functions can take one or more arguments. Table 10-1 summarizes the relation between the numeric and character comparison functions.
Table 10-1. Character Comparison Functions
Numeric Analog | Case-Sensitive | Case-Insensitive
= | CHAR= | CHAR-EQUAL
/= | CHAR/= | CHAR-NOT-EQUAL
< | CHAR< | CHAR-LESSP
> | CHAR> | CHAR-GREATERP
<= | CHAR<= | CHAR-NOT-GREATERP
>= | CHAR>= | CHAR-NOT-LESSP
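For example:

```lisp
(char= #\a #\a)      ==> T
(char= #\a #\A)      ==> NIL
(char-equal #\a #\A) ==> T
(char< #\a #\b)      ==> T
(char-lessp #\B #\a) ==> NIL  ; case-insensitive: B is not less than a
```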
Other functions deal with characters, providing, among other things, ways of testing whether a given character is alphabetic or a digit character, testing the case of a character, obtaining a corresponding character in a different case, and translating between numeric values representing character codes and actual character objects. Again, for complete details, see your favorite Common Lisp reference.
Strings
As mentioned earlier, strings in Common Lisp are really a composite data type, namely, a one-dimensional array of characters. Consequently, I'll cover many of the things you can do with strings in the next chapter when I discuss the many functions for manipulating sequences, of which strings are just one type. But strings also have their own literal syntax and a library of functions for performing string-specific operations. I'll discuss these aspects of strings in this chapter and leave the others for Chapter 11.
As you've seen, literal strings are written enclosed in double quotes. You can include any character supported by the character set in a literal string except double quote (") and backslash (\). And you can include these two as well if you escape them with a backslash. In fact, backslash always escapes the next character, whatever it is, though this isn't necessary for any character except for " and itself. Table 10-2 shows how various literal strings will be read by the Lisp reader.
Table 10-2. Literal Strings
Literal | Contents | Comment
"foobar" | foobar | Plain string.
"foo\"bar" | foo"bar | The backslash escapes the quote.
"foo\\bar" | foo\bar | The first backslash escapes the second backslash.
"\"foobar\"" | "foobar" | The backslashes escape the quotes.
"foo\bar" | foobar | The backslash "escapes" the b.
Note that the REPL will ordinarily print strings in readable form, adding the enclosing quotation marks and any necessary escaping backslashes, so if you want to see the actual contents of a string, you need to use a function such as FORMAT designed to print human-readable output. For example, here's what you see if you type a string containing an embedded quotation mark at the REPL:
CL-USER> "foo\"bar"
"foo\"bar"
FORMAT, on the other hand, will show you the actual string contents:[118]
CL-USER> (format t "foo\"bar")
foo"bar
NIL
String Comparisons
You can compare strings using a set of functions that follow the same naming convention as the character comparison functions except with STRING as the prefix rather than CHAR (see Table 10-3).
Table 10-3. String Comparison Functions
Numeric Analog | Case-Sensitive | Case-Insensitive
= | STRING= | STRING-EQUAL
/= | STRING/= | STRING-NOT-EQUAL
< | STRING< | STRING-LESSP
> | STRING> | STRING-GREATERP
<= | STRING<= | STRING-NOT-GREATERP
>= | STRING>= | STRING-NOT-LESSP
However, unlike the character and number comparators, the string comparators can compare only two strings. That's because they also take keyword arguments that allow you to restrict the comparison to a substring of either or both strings. The arguments—:start1, :end1, :start2, and :end2—specify the starting (inclusive) and ending (exclusive) indices of substrings in the first and second string arguments. Thus, the following:
(string= "foobarbaz" "quuxbarfoo" :start1 3 :end1 6 :start2 4 :end2 7)
compares the substring "bar" in the two arguments and returns true. The :end1 and :end2 arguments can be NIL (or the keyword argument omitted altogether) to indicate that the corresponding substring extends to the end of the string.
The comparators that return true when their arguments differ—that is, all of them except STRING= and STRING-EQUAL—return the index in the first string where the mismatch was detected.
(string/= "lisp" "lissome") ==> 3
If the first string is a prefix of the second, the return value will be the length of the first string, that is, one greater than the largest valid index into the string.
(string< "lisp" "lisper") ==> 4
When comparing substrings, the resulting value is still an index into the string as a whole. For instance, the following compares the substrings "bar" and "baz" but returns 5 because that's the index of the r in the first string:
(string< "foobar" "abaz" :start1 3 :start2 1) ==> 5 ; N.B. not 2
Other string functions allow you to convert the case of strings and trim characters from one or both ends of a string. And, as I mentioned previously, since strings are really a kind of sequence, all the sequence functions I'll discuss in the next chapter can be used with strings. For instance, you can discover the length of a string with the LENGTH function and can get and set individual characters of a string with the generic sequence element accessor function, ELT, or the generic array element accessor function, AREF. Or you can use the string-specific accessor, CHAR. But those functions, and others, are the topic of the next chapter, so let's move on.
11. Collections
Like most programming languages, Common Lisp provides standard data types that collect multiple values into a single object. Every language slices up the collection problem a little bit differently, but the basic collection types usually boil down to an integer-indexed array type and a table type that can be used to map more or less arbitrary keys to values. The former are variously called arrays, lists, or tuples; the latter go by the names hash tables, associative arrays, maps, and dictionaries.
Lisp is, of course, famous for its list data structure, and most Lisp books, following the ontogeny-recapitulates-phylogeny principle of language instruction, start their discussion of Lisp's collections with lists. However, that approach often leads readers to the mistaken conclusion that lists are Lisp's only collection type. To make matters worse, because Lisp's lists are such a flexible data structure, it is possible to use them for many of the things arrays and hash tables are used for in other languages. But it's a mistake to focus too much on lists; while they're a crucial data structure for representing Lisp code as Lisp data, in many situations other data structures are more appropriate.
To keep lists from stealing the show, in this chapter I'll focus on Common Lisp's other collection types: vectors and hash tables.[119] However, vectors and lists share enough characteristics that Common Lisp treats them both as subtypes of a more general abstraction, the sequence. Thus, you can use many of the functions I'll discuss in this chapter with both vectors and lists.
Vectors
Vectors are Common Lisp's basic integer-indexed collection, and they come in two flavors. Fixed-size vectors are a lot like arrays in a language such as Java: a thin veneer over a chunk of contiguous memory that holds the vector's elements.[120] Resizable vectors, on the other hand, are more like arrays in Perl or Ruby, lists in Python, or the ArrayList class in Java: they abstract the actual storage, allowing the vector to grow and shrink as elements are added and removed.
You can make fixed-size vectors containing specific values with the function VECTOR, which takes any number of arguments and returns a freshly allocated fixed-size vector containing those arguments.
(vector) ==> #()
(vector 1) ==> #(1)
(vector 1 2) ==> #(1 2)
The #(...) syntax is the literal notation for vectors used by the Lisp printer and reader. This syntax allows you to save and restore vectors by PRINTing them out and READing them back in. You can use the #(...) syntax to include literal vectors in your code, but as the effects of modifying literal objects aren't defined, you should always use VECTOR or the more general function MAKE-ARRAY to create vectors you plan to modify.
MAKE-ARRAY is more general than VECTOR since you can use it to create arrays of any dimensionality as well as both fixed-size and resizable vectors. The one required argument to MAKE-ARRAY is a list containing the dimensions of the array. Since a vector is a one-dimensional array, this list will contain one number, the size of the vector. As a convenience, MAKE-ARRAY will also accept a plain number in the place of a one-item list. With no other arguments, MAKE-ARRAY will create a vector with uninitialized elements that must be set before they can be accessed.[121] To create a vector with the elements all set to a particular value, you can pass an :initial-element argument. Thus, to make a five-element vector with its elements initialized to NIL, you can write the following:
(make-array 5 :initial-element nil) ==> #(NIL NIL NIL NIL NIL)
MAKE-ARRAY is also the function to use to make a resizable vector. A resizable vector is a slightly more complicated object than a fixed-size vector; in addition to keeping track of the memory used to hold the elements and the number of slots available, a resizable vector also keeps track of the number of elements actually stored in the vector. This number is stored in the vector's fill pointer, so called because it's the index of the next position to be filled when you add an element to the vector.
To make a vector with a fill pointer, you pass MAKE-ARRAY a :fill-pointer argument. For instance, the following call to MAKE-ARRAY makes a vector with room for five elements; but it looks empty because the fill pointer is zero:
(make-array 5 :fill-pointer 0) ==> #()
To add an element to the end of a resizable vector, you can use the function VECTOR-PUSH. It adds the element at the current value of the fill pointer and then increments the fill pointer by one, returning the index where the new element was added. The function VECTOR-POP returns the most recently pushed item, decrementing the fill pointer in the process.
(defparameter *x* (make-array 5 :fill-pointer 0))
(vector-push 'a *x*) ==> 0
*x* ==> #(A)
(vector-push 'b *x*) ==> 1
*x* ==> #(A B)
(vector-push 'c *x*) ==> 2
*x* ==> #(A B C)
(vector-pop *x*) ==> C
*x* ==> #(A B)
(vector-pop *x*) ==> B
*x* ==> #(A)
(vector-pop *x*) ==> A
*x* ==> #()
However, even a vector with a fill pointer isn't completely resizable. The vector *x* can hold at most five elements. To make an arbitrarily resizable vector, you need to pass MAKE-ARRAY another keyword argument: :adjustable.
(make-array 5 :fill-pointer 0 :adjustable t) ==> #()
This call makes an adjustable vector whose underlying memory can be resized as needed. To add elements to an adjustable vector, you use VECTOR-PUSH-EXTEND, which works just like VECTOR-PUSH except it will automatically expand the array if you try to push an element onto a full vector—one whose fill pointer is equal to the size of the underlying storage.[122]
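For instance, VECTOR-PUSH-EXTEND happily grows the vector past its original capacity, where VECTOR-PUSH would simply return NIL and leave the vector unchanged. (The variable *y* here is just for illustration.)

```lisp
(defparameter *y* (make-array 2 :fill-pointer 0 :adjustable t))
(vector-push-extend 'a *y*) ==> 0
(vector-push-extend 'b *y*) ==> 1
(vector-push-extend 'c *y*) ==> 2  ; full, so the underlying storage is grown
*y* ==> #(A B C)
```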
Subtypes of Vector
All the vectors you've dealt with so far have been general vectors that can hold any type of object. It's also possible to create specialized vectors that are restricted to holding certain types of elements. One reason to use specialized vectors is they may be stored more compactly and can provide slightly faster access to their elements than general vectors. However, for the moment let's focus on a couple kinds of specialized vectors that are important data types in their own right.
One of these you've seen already—strings are vectors specialized to hold characters. Strings are important enough to get their own read/print syntax (double quotes) and the set of string-specific functions I discussed in the previous chapter. But because they're also vectors, all the functions I'll discuss in the next few sections that take vector arguments can also be used with strings. These functions will fill out the string library with functions for things such as searching a string for a substring, finding occurrences of a character within a string, and more.
Literal strings, such as "foo", are like literal vectors written with the #() syntax—their size is fixed, and they must not be modified. However, you can use MAKE-ARRAY to make resizable strings by adding another keyword argument, :element-type. This argument takes a type descriptor. I won't discuss all the possible type descriptors you can use here; for now it's enough to know you can create a string by passing the symbol CHARACTER as the :element-type argument. Note that you need to quote the symbol to prevent it from being treated as a variable name. For example, to make an initially empty but resizable string, you can write this:
(make-array 5 :fill-pointer 0 :adjustable t :element-type 'character) ==> ""
Bit vectors—vectors whose elements are all zeros or ones—also get some special treatment. They have a special read/print syntax that looks like #*00001111 and a fairly large library of functions, which I won't discuss, for performing bit-twiddling operations such as "anding" together two bit arrays. The type descriptor to pass as the :element-type to create a bit vector is the symbol BIT.
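For example, you can create and combine bit vectors like this (BIT-AND and BIT-IOR are two of the bit-twiddling functions just alluded to):

```lisp
(make-array 4 :element-type 'bit :initial-element 0) ==> #*0000
(bit-and #*1100 #*1010) ==> #*1000
(bit-ior #*1100 #*1010) ==> #*1110
```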
Vectors As Sequences
As mentioned earlier, vectors and lists are the two concrete subtypes of the abstract type sequence. All the functions I'll discuss in the next few sections are sequence functions; in addition to being applicable to vectors—both general and specialized—they can also be used with lists.
The two most basic sequence functions are LENGTH, which returns the length of a sequence, and ELT, which allows you to access individual elements via an integer index. LENGTH takes a sequence as its only argument and returns the number of elements it contains. For vectors with a fill pointer, this will be the value of the fill pointer. ELT, short for element, takes a sequence and an integer index between zero (inclusive) and the length of the sequence (exclusive) and returns the corresponding element. ELT will signal an error if the index is out of bounds. Like LENGTH, ELT treats a vector with a fill pointer as having the length specified by the fill pointer.
(defparameter *x* (vector 1 2 3))
(length *x*) ==> 3
(elt *x* 0) ==> 1
(elt *x* 1) ==> 2
(elt *x* 2) ==> 3
(elt *x* 3) ==> error
ELT is also a SETFable place, so you can set the value of a particular element like this:
(setf (elt *x* 0) 10)
*x* ==> #(10 2 3)
Sequence Iterating Functions
While in theory all operations on sequences boil down to some combination of LENGTH, ELT, and SETF of ELT operations, Common Lisp provides a large library of sequence functions.
One group of sequence functions allows you to express certain operations on sequences such as finding or filtering specific elements without writing explicit loops. Table 11-1 summarizes them.
Table 11-1. Basic Sequence Functions
Name | Required Arguments | Returns
COUNT | Item and sequence | Number of times item appears in sequence
FIND | Item and sequence | Item or NIL
POSITION | Item and sequence | Index into sequence or NIL
REMOVE | Item and sequence | Sequence with instances of item removed
SUBSTITUTE | New item, item, and sequence | Sequence with instances of item replaced with new item
Here are some simple examples of how to use these functions:
(count 1 #(1 2 1 2 3 1 2 3 4)) ==> 3
(remove 1 #(1 2 1 2 3 1 2 3 4)) ==> #(2 2 3 2 3 4)
(remove 1 '(1 2 1 2 3 1 2 3 4)) ==> (2 2 3 2 3 4)
(remove #\a "foobarbaz") ==> "foobrbz"
(substitute 10 1 #(1 2 1 2 3 1 2 3 4)) ==> #(10 2 10 2 3 10 2 3 4)
(substitute 10 1 '(1 2 1 2 3 1 2 3 4)) ==> (10 2 10 2 3 10 2 3 4)
(substitute #\x #\b "foobarbaz") ==> "fooxarxaz"
(find 1 #(1 2 1 2 3 1 2 3 4)) ==> 1
(find 10 #(1 2 1 2 3 1 2 3 4)) ==> NIL
(position 1 #(1 2 1 2 3 1 2 3 4)) ==> 0
Note how REMOVE and SUBSTITUTE always return a sequence of the same type as their sequence argument.
You can modify the behavior of these five functions in a variety of ways using keyword arguments. For instance, by default these functions look for elements in the sequence that are the same object as the item argument. You can change this in two ways: First, you can use the :test keyword to pass a function that accepts two arguments and returns a boolean. If provided, it will be used to compare item to each element instead of the default object equality test, EQL.[123] Second, with the :key keyword you can pass a one-argument function to be called on each element of the sequence to extract a key value, which will then be compared to the item in place of the element itself. Note, however, that functions such as FIND that return elements of the sequence continue to return the actual element, not just the extracted key.
(count "foo" #("foo" "bar" "baz") :test #'string=) ==> 1
(find 'c #((a 10) (b 20) (c 30) (d 40)) :key #'first) ==> (C 30)
To limit the effects of these functions to a particular subsequence of the sequence argument, you can provide bounding indices with :start and :end arguments. Passing NIL for :end or omitting it is the same as specifying the length of the sequence.[124]
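To make the bounding indices concrete, here's a short sketch (the particular strings are just for illustration):

```lisp
(count #\a "foobarbaz")           ==> 2
(count #\a "foobarbaz" :start 5)  ==> 1        ; only "rbaz" is examined
(remove #\a "foobarbaz" :end 5)   ==> "foobrbaz" ; only the first five characters are touched
```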
If a non-NIL :from-end argument is provided, then the elements of the sequence will be examined in reverse order. By itself, :from-end can affect the results of only FIND and POSITION. For instance:
(find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first) ==> (A 10)
(find 'a #((a 10) (b 20) (a 30) (b 40)) :key #'first :from-end t) ==> (A 30)
However, the :from-end argument can affect REMOVE and SUBSTITUTE in conjunction with another keyword parameter, :count, that's used to specify how many elements to remove or substitute. If you specify a :count lower than the number of matching elements, then it obviously matters which end you start from:
(remove #\a "foobarbaz" :count 1) ==> "foobrbaz"
(remove #\a "foobarbaz" :count 1 :from-end t) ==> "foobarbz"
And while :from-end can't change the results of the COUNT function, it does affect the order in which elements are passed to any :test and :key functions, which could possibly have side effects. For example:
CL-USER> (defparameter *v* #((a 10) (b 20) (a 30) (b 40)))
*V*
CL-USER> (defun verbose-first (x) (format t "Looking at ~s~%" x) (first x))
VERBOSE-FIRST
CL-USER> (count 'a *v* :key #'verbose-first)
Looking at (A 10)
Looking at (B 20)
Looking at (A 30)
Looking at (B 40)
2
CL-USER> (count 'a *v* :key #'verbose-first :from-end t)
Looking at (B 40)
Looking at (A 30)
Looking at (B 20)
Looking at (A 10)
2
Table 11-2 summarizes these arguments.
Table 11-2. Standard Sequence Function Keyword Arguments
Argument | Meaning | Default
:test | Two-argument function used to compare item (or value extracted by :key function) to element. | EQL
:key | One-argument function to extract key value from actual sequence element. NIL means use element as is. | NIL
:start | Starting index (inclusive) of subsequence. | 0
:end | Ending index (exclusive) of subsequence. NIL indicates end of sequence. | NIL
:from-end | If true, the sequence will be traversed in reverse order, from end to start. | NIL
:count | Number of elements to remove or substitute, or NIL to indicate all (REMOVE and SUBSTITUTE only). | NIL
Higher-Order Function Variants
For each of the functions just discussed, Common Lisp provides two higher-order function variants that, in place of the item argument, take a function to be called on each element of the sequence. One set of variants is named the same as the basic function with an -IF appended. These functions count, find, remove, and substitute elements of the sequence for which the function argument returns true. The other set of variants is named with an -IF-NOT suffix and count, find, remove, and substitute elements for which the function argument does not return true.
(count-if #'evenp #(1 2 3 4 5)) ==> 2
(count-if-not #'evenp #(1 2 3 4 5)) ==> 3
(position-if #'digit-char-p "abcd0001") ==> 4
(remove-if-not #'(lambda (x) (char= (elt x 0) #\f))
#("foo" "bar" "baz" "foom")) ==> #("foo" "foom")
According to the language standard, the -IF-NOT variants are deprecated. However, that deprecation is generally considered to have itself been ill-advised. If the standard is ever revised, it's more likely the deprecation will be removed than the -IF-NOT functions. For one thing, the REMOVE-IF-NOT variant is probably used more often than REMOVE-IF. Despite its negative-sounding name, REMOVE-IF-NOT is actually the positive variant—it returns the elements that do satisfy the predicate.[125]
The -IF and -IF-NOT variants accept all the same keyword arguments as their vanilla counterparts except for :test, which isn't needed since the main argument is already a function.[126] With a :key argument, the value extracted by the :key function is passed to the function instead of the actual element.
(count-if #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first) ==> 2
(count-if-not #'evenp #((1 a) (2 b) (3 c) (4 d) (5 e)) :key #'first) ==> 3
(remove-if-not #'alpha-char-p
#("foo" "bar" "1baz") :key #'(lambda (x) (elt x 0))) ==> #("foo" "bar")
The REMOVE family of functions also supports a fourth variant, REMOVE-DUPLICATES, which has only one required argument, a sequence, from which it removes all but one instance of each duplicated element. It takes the same keyword arguments as REMOVE, except for :count, since it always removes all duplicates.
(remove-duplicates #(1 2 1 2 3 1 2 3 4)) ==> #(1 2 3 4)
Whole Sequence Manipulations
A handful of functions perform operations on a whole sequence (or sequences) at a time. These tend to be simpler than the other functions I've described so far. For instance, COPY-SEQ and REVERSE each take a single argument, a sequence, and each returns a new sequence of the same type. The sequence returned by COPY-SEQ contains the same elements as its argument while the sequence returned by REVERSE contains the same elements but in reverse order. Note that neither function copies the elements themselves—only the returned sequence is a new object.
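For instance, assuming a fresh vector:

```lisp
(defparameter *seq* (vector 1 2 3))
(reverse *seq*)  ==> #(3 2 1)
(copy-seq *seq*) ==> #(1 2 3)
*seq*            ==> #(1 2 3) ; the original is untouched; both results are new objects
```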
The CONCATENATE function creates a new sequence containing the concatenation of any number of sequences. However, unlike REVERSE and COPY-SEQ, which simply return a sequence of the same type as their single argument, CONCATENATE must be told explicitly what kind of sequence to produce in case the arguments are of different types. Its first argument is a type descriptor, like the :element-type argument to MAKE-ARRAY. In this case, the type descriptors you'll most likely use are the symbols VECTOR, LIST, or STRING.[127] For example:
(concatenate 'vector #(1 2 3) '(4 5 6)) ==> #(1 2 3 4 5 6)
(concatenate 'list #(1 2 3) '(4 5 6)) ==> (1 2 3 4 5 6)
(concatenate 'string "abc" '(#\d #\e #\f)) ==> "abcdef"
Sorting and Merging
The functions SORT and STABLE-SORT provide two ways of sorting a sequence. They both take a sequence and a two-argument predicate and return a sorted version of the sequence.
(sort (vector "foo" "bar" "baz") #'string<) ==> #("bar" "baz" "foo")
The difference is that STABLE-SORT is guaranteed not to reorder any elements considered equivalent by the predicate, while SORT guarantees only that the result is sorted and may reorder equivalent elements.
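The guarantee matters when elements compare as equivalent under the predicate. In this sketch the two sublists whose first element is 1 are equivalent under the predicate, and STABLE-SORT is required to keep them in their original relative order (SORT may or may not):

```lisp
(stable-sort (vector (list 1 :a) (list 2 :b) (list 1 :c)) #'< :key #'first)
==> #((1 :A) (1 :C) (2 :B))
```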
Both these functions are examples of what are called destructive functions. Destructive functions are allowed—typically for reasons of efficiency—to modify their arguments in more or less arbitrary ways. This has two implications: one, you should always do something with the return value of these functions (such as assign it to a variable or pass it to another function), and, two, unless you're done with the object you're passing to the destructive function, you should pass a copy instead. I'll say more about destructive functions in the next chapter.
Typically you won't care about the unsorted version of a sequence after you've sorted it, so it makes sense to allow SORT and STABLE-SORT to destroy the sequence in the course of sorting it. But it does mean you need to remember to write the following:[128]
(setf my-sequence (sort my-sequence #'string<))
rather than just this:
(sort my-sequence #'string<)
Both these functions also take a keyword argument, :key, which, like the :key argument in other sequence functions, should be a function and will be used to extract the values to be passed to the sorting predicate in place of the actual elements. The extracted keys are used only to determine the ordering of elements; the sequence returned will contain the actual elements of the argument sequence.
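For example, to sort strings by length rather than lexicographically:

```lisp
(sort (vector "foo" "ab" "c") #'< :key #'length) ==> #("c" "ab" "foo")
```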
The MERGE function takes two sequences and a predicate and returns a sequence produced by merging the two sequences according to the predicate. It's related to the two sorting functions in that if each sequence is already sorted by the same predicate, then the sequence returned by MERGE will also be sorted. Like the sorting functions, MERGE takes a :key argument. Like CONCATENATE, and for the same reason, the first argument to MERGE must be a type descriptor specifying the type of sequence to produce.
(merge 'vector #(1 3 5) #(2 4 6) #'<) ==> #(1 2 3 4 5 6)
(merge 'list #(1 3 5) #(2 4 6) #'<) ==> (1 2 3 4 5 6)
Subsequence Manipulations
Another set of functions allows you to manipulate subsequences of existing sequences. The most basic of these is SUBSEQ, which extracts a subsequence starting at a particular index and continuing to a particular ending index or the end of the sequence. For instance:
(subseq "foobarbaz" 3) ==> "barbaz"
(subseq "foobarbaz" 3 6) ==> "bar"
SUBSEQ is also SETFable, but it won't extend or shrink a sequence; if the new value and the subsequence to be replaced are different lengths, the shorter of the two determines how many characters are actually changed.
(defparameter *x* (copy-seq "foobarbaz"))
(setf (subseq *x* 3 6) "xxx") ; subsequence and new value are same length
*x* ==> "fooxxxbaz"
(setf (subseq *x* 3 6) "abcd") ; new value too long, extra character ignored.
*x* ==> "fooabcbaz"
(setf (subseq *x* 3 6) "xx") ; new value too short, only two characters changed
*x* ==> "fooxxcbaz"
You can use the FILL function to set multiple elements of a sequence to a single value. The required arguments are a sequence and the value with which to fill it. By default every element of the sequence is set to the value; :start and :end keyword arguments can limit the effects to a given subsequence.
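A quick sketch; note that FILL, like SORT, modifies its sequence argument, which is why the literal string is copied first:

```lisp
(fill (vector 1 2 3 4 5) 0)             ==> #(0 0 0 0 0)
(fill (copy-seq "foobar") #\x :start 3) ==> "fooxxx"
```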
If you need to find a subsequence within a sequence, the SEARCH function works like POSITION except the first argument is a sequence rather than a single item.
(position #\b "foobarbaz") ==> 3
(search "bar" "foobarbaz") ==> 3
On the other hand, to find where two sequences with a common prefix first diverge, you can use the MISMATCH function. It takes two sequences and returns the index of the first pair of mismatched elements.
(mismatch "foobarbaz" "foom") ==> 3
It returns NIL if the strings match. MISMATCH also takes many of the standard keyword arguments: a :key argument for specifying a function to use to extract the values to be compared; a :test argument to specify the comparison function; and :start1, :end1, :start2, and :end2 arguments to specify subsequences within the two sequences. And a :from-end argument of T specifies the sequences should be searched in reverse order, causing MISMATCH to return the index, in the first sequence, where whatever common suffix the two sequences share begins.
(mismatch "foobar" "bar" :from-end t) ==> 3
Sequence Predicates
Four other handy functions are EVERY, SOME, NOTANY, and NOTEVERY, which iterate over sequences testing a boolean predicate. The first argument to all these functions is the predicate, and the remaining arguments are sequences. The predicate should take as many arguments as the number of sequences passed. The elements of the sequences are passed to the predicate—one element from each sequence—until one of the sequences runs out of elements or the overall termination test is met: EVERY terminates, returning false, as soon as the predicate fails. If the predicate is always satisfied, it returns true. SOME returns the first non-NIL value returned by the predicate or returns false if the predicate is never satisfied. NOTANY returns false as soon as the predicate is satisfied or true if it never is. And NOTEVERY returns true as soon as the predicate fails or false if the predicate is always satisfied. Here are some examples of testing just one sequence:
(every #'evenp #(1 2 3 4 5)) ==> NIL
(some #'evenp #(1 2 3 4 5)) ==> T
(notany #'evenp #(1 2 3 4 5)) ==> NIL
(notevery #'evenp #(1 2 3 4 5)) ==> T
These calls compare elements of two sequences pairwise:
(every #'> #(1 2 3 4) #(5 4 3 2)) ==> NIL
(some #'> #(1 2 3 4) #(5 4 3 2)) ==> T
(notany #'> #(1 2 3 4) #(5 4 3 2)) ==> NIL
(notevery #'> #(1 2 3 4) #(5 4 3 2)) ==> T
Sequence Mapping Functions
Finally, the last of the sequence functions are the generic mapping functions. MAP, like the sequence predicate functions, takes an n-argument function and n sequences. But instead of a boolean value, MAP returns a new sequence containing the result of applying the function to subsequent elements of the sequences. Like CONCATENATE and MERGE, MAP needs to be told what kind of sequence to create.
(map 'vector #'* #(1 2 3 4 5) #(10 9 8 7 6)) ==> #(10 18 24 28 30)
MAP-INTO is like MAP except instead of producing a new sequence of a given type, it places the results into a sequence passed as the first argument. This sequence can be the same as one of the sequences providing values for the function. For instance, to sum several vectors—a, b, and c—into one, you could write this:
(map-into a #'+ a b c)
If the sequences are different lengths, MAP-INTO affects only as many elements as are present in the shortest sequence, including the sequence being mapped into. However, if the sequence being mapped into is a vector with a fill pointer, the number of elements affected isn't limited by the fill pointer but rather by the actual size of the vector. After a call to MAP-INTO, the fill pointer will be set to the number of elements mapped. MAP-INTO won't, however, extend an adjustable vector.
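For instance, mapping into a vector with a fill pointer of zero both stores the results and resets the fill pointer to the number of elements mapped:

```lisp
(defparameter *out* (make-array 5 :fill-pointer 0))
(map-into *out* #'+ #(1 2 3) #(10 20 30))
*out* ==> #(11 22 33) ; three elements mapped, so the fill pointer is now 3
```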
The last sequence function is REDUCE, which does another kind of mapping: it maps over a single sequence, applying a two-argument function first to the first two elements of the sequence and then to the value returned by the function and subsequent elements of the sequence. Thus, the following expression sums the numbers from one to ten:
(reduce #'+ #(1 2 3 4 5 6 7 8 9 10)) ==> 55
REDUCE is a surprisingly useful function—whenever you need to distill a sequence down to a single value, chances are you can write it with REDUCE, and it will often be quite a concise way to express what you want. For instance, to find the maximum value in a sequence of numbers, you can write (reduce #'max numbers). REDUCE also takes a full complement of keyword arguments (:key, :from-end, :start, and :end) and one unique to REDUCE (:initial-value). The latter specifies a value that's logically placed before the first element of the sequence (or after the last if you also specify a true :from-end argument).
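The :initial-value argument is particularly handy when the sequence may be empty, and :from-end changes the grouping with nonassociative functions:

```lisp
(reduce #'+ #(1 2 3) :initial-value 10) ==> 16
(reduce #'max #() :initial-value 0)     ==> 0  ; no elements, so the initial value is returned
(reduce #'- #(1 2 3))                   ==> -4 ; (- (- 1 2) 3)
(reduce #'- #(1 2 3) :from-end t)       ==> 2  ; (- 1 (- 2 3))
```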
Hash Tables
The other general-purpose collection provided by Common Lisp is the hash table. Where vectors provide an integer-indexed data structure, hash tables allow you to use arbitrary objects as the indexes, or keys. When you add a value to a hash table, you store it under a particular key. Later you can use the same key to retrieve the value. Or you can associate a new value with the same key—each key maps to a single value.
With no arguments MAKE-HASH-TABLE makes a hash table that considers two keys equivalent if they're the same object according to EQL. This is a good default unless you want to use strings as keys, since two strings with the same contents aren't necessarily EQL. In that case you'll want a so-called EQUAL hash table, which you can get by passing the symbol EQUAL as the :test keyword argument to MAKE-HASH-TABLE. Two other possible values for the :test argument are the symbols EQ and EQUALP. These are, of course, the names of the standard object comparison functions, which I discussed in Chapter 4. However, unlike the :test argument passed to sequence functions, MAKE-HASH-TABLE's :test can't be used to specify an arbitrary function—only the values EQ, EQL, EQUAL, and EQUALP. This is because hash tables actually need two functions, an equivalence function and a hash function that computes a numerical hash code from the key in a way compatible with how the equivalence function will ultimately compare two keys. However, although the language standard provides only for hash tables that use the standard equivalence functions, most implementations provide some mechanism for defining custom hash tables.
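To see the difference the :test argument makes with string keys, consider this sketch; COPY-SEQ guarantees the lookup key is a distinct string object from the one stored:

```lisp
(defparameter *names* (make-hash-table :test 'equal))
(setf (gethash "foo" *names*) 42)
(gethash (copy-seq "foo") *names*) ==> 42 ; an EQL table would return NIL here
```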
The GETHASH function provides access to the elements of a hash table. It takes two arguments—a key and the hash table—and returns the value, if any, stored in the hash table under that key or NIL.[129] For example:
(defparameter *h* (make-hash-table))
(gethash 'foo *h*) ==> NIL
(setf (gethash 'foo *h*) 'quux)
(gethash 'foo *h*) ==> QUUX
Since GETHASH returns NIL if the key isn't present in the table, there's no way to tell from the return value the difference between a key not being in a hash table at all and being in the table with the value NIL. GETHASH solves this problem with a feature I haven't discussed yet—multiple return values. GETHASH actually returns two values; the primary value is the value stored under the given key or NIL. The secondary value is a boolean indicating whether the key is present in the hash table. Because of the way multiple values work, the extra return value is silently discarded unless the caller explicitly handles it with a form that can "see" multiple values.
I'll discuss multiple return values in greater detail in Chapter 20, but for now I'll give you a sneak preview of how to use the MULTIPLE-VALUE-BIND macro to take advantage of GETHASH's extra return value. MULTIPLE-VALUE-BIND creates variable bindings like LET does, filling them with the multiple values returned by a form.
The following function shows how you might use MULTIPLE-VALUE-BIND; the variables it binds are value and present:
(defun show-value (key hash-table)
(multiple-value-bind (value present) (gethash key hash-table)
(if present
(format nil "Value ~a actually present." value)
(format nil "Value ~a because key not found." value))))
(setf (gethash 'bar *h*) nil) ; provide an explicit value of NIL
(show-value 'foo *h*) ==> "Value QUUX actually present."
(show-value 'bar *h*) ==> "Value NIL actually present."
(show-value 'baz *h*) ==> "Value NIL because key not found."
Since setting the value under a key to NIL leaves the key in the table, you'll need another function to completely remove a key/value pair. REMHASH takes the same arguments as GETHASH and removes the specified entry. You can also completely clear a hash table of all its key/value pairs with CLRHASH.
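For example, using a fresh table:

```lisp
(defparameter *table* (make-hash-table))
(setf (gethash 'a *table*) 1)
(remhash 'a *table*) ==> T   ; true because an entry was actually removed
(gethash 'a *table*) ==> NIL
(clrhash *table*)            ; removes all entries and returns the now-empty table
```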
Hash Table Iteration
Common Lisp provides a couple of ways to iterate over the entries in a hash table. The simplest of these is via the function MAPHASH. Analogous to the MAP function, MAPHASH takes a two-argument function and a hash table and invokes the function once for each key/value pair in the hash table. For instance, to print all the key/value pairs in a hash table, you could use MAPHASH like this:
(maphash #'(lambda (k v) (format t "~a => ~a~%" k v)) *h*)
The consequences of adding or removing elements from a hash table while iterating over it aren't specified (and are likely to be bad) with two exceptions: you can use SETF with GETHASH to change the value of the current entry, and you can use REMHASH to remove the current entry. For instance, to remove all the entries whose value is less than ten, you could write this:
(maphash #'(lambda (k v) (when (< v 10) (remhash k *h*))) *h*)
The other way to iterate over a hash table is with the extended LOOP macro, which I'll discuss in Chapter 22.[130] The LOOP equivalent of the first MAPHASH expression would look like this:
(loop for k being the hash-keys in *h* using (hash-value v)
do (format t "~a => ~a~%" k v))
I could say a lot more about the nonlist collections supported by Common Lisp. For instance, I haven't discussed multidimensional arrays at all or the library of functions for manipulating bit arrays. However, what I've covered in this chapter should suffice for most of your general-purpose programming needs. Now it's finally time to look at Lisp's eponymous data structure: lists.
12. They Called It LISP for a Reason: List Processing
Lists play an important role in Lisp—for reasons both historical and practical. Historically, lists were Lisp's original composite data type, though it has been decades since they were its only such data type. These days, a Common Lisp programmer is as likely to use a vector, a hash table, or a user-defined class or structure as to use a list.
Practically speaking, lists remain in the language because they're an excellent solution to certain problems. One such problem—how to represent code as data in order to support code-transforming and code-generating macros—is particular to Lisp, which may explain why other languages don't feel the lack of Lisp-style lists. More generally, lists are an excellent data structure for representing any kind of heterogeneous and/or hierarchical data. They're also quite lightweight and support a functional style of programming that's another important part of Lisp's heritage.
Thus, you need to understand lists on their own terms; as you gain a better understanding of how lists work, you'll be in a better position to appreciate when you should and shouldn't use them.
"There Is No List"
Spoon Boy: Do not try and bend the list. That's impossible. Instead . . . only try to realize the truth.
Neo: What truth?
Spoon Boy: There is no list.
Neo: There is no list?
Spoon Boy: Then you'll see that it is not the list that bends; it is only yourself.[131]
The key to understanding lists is to understand that they're largely an illusion built on top of objects that are instances of a more primitive data type. Those simpler objects are pairs of values called cons cells, after the function CONS used to create them.
CONS takes two arguments and returns a new cons cell containing the two values.[132] These values can be references to any kind of object. Unless the second value is NIL or another cons cell, a cons is printed as the two values in parentheses separated by a dot, a so-called dotted pair.
(cons 1 2) ==> (1 . 2)
The two values in a cons cell are called the CAR and the CDR after the names of the functions used to access them. At the dawn of time, these names were mnemonic, at least to the folks implementing the first Lisp on an IBM 704. But even then they were just lifted from the assembly mnemonics used to implement the operations. However, it's not all bad that these names are somewhat meaningless—when considering individual cons cells, it's best to think of them simply as an arbitrary pair of values without any particular semantics. Thus:
(car (cons 1 2)) ==> 1
(cdr (cons 1 2)) ==> 2
Both CAR and CDR are also SETFable places—given an existing cons cell, it's possible to assign a new value to either of its values.[133]
(defparameter *cons* (cons 1 2))
*cons* ==> (1 . 2)
(setf (car *cons*) 10) ==> 10
*cons* ==> (10 . 2)
(setf (cdr *cons*) 20) ==> 20
*cons* ==> (10 . 20)
Because the values in a cons cell can be references to any kind of object, you can build larger structures out of cons cells by linking them together. Lists are built by linking together cons cells in a chain. The elements of the list are held in the CARs of the cons cells while the links to subsequent cons cells are held in the CDRs. The last cell in the chain has a CDR of NIL, which—as I mentioned in Chapter 4—represents the empty list as well as the boolean value false.
This arrangement is by no means unique to Lisp; it's called a singly linked list. However, few languages outside the Lisp family provide such extensive support for this humble data type.
So when I say a particular value is a list, what I really mean is it's either NIL or a reference to a cons cell. The CAR of the cons cell is the first item of the list, and the CDR is a reference to another list, that is, another cons cell or NIL, containing the remaining elements. The Lisp printer understands this convention and prints such chains of cons cells as parenthesized lists rather than as dotted pairs.
(cons 1 nil) ==> (1)
(cons 1 (cons 2 nil)) ==> (1 2)
(cons 1 (cons 2 (cons 3 nil))) ==> (1 2 3)
When talking about structures built out of cons cells, a few diagrams can be a big help. Box-and-arrow diagrams represent cons cells as a pair of boxes like this:
The box on the left represents the CAR, and the box on the right is the CDR. The values stored in a particular cons cell are either drawn in the appropriate box or represented by an arrow from the box to a representation of the referenced value.[134] For instance, the list (1 2 3), which consists of three cons cells linked together by their CDRs, would be diagrammed like this:
However, most of the time you work with lists you won't have to deal with individual cons cells—the functions that create and manipulate lists take care of that for you. For example, the LIST function builds cons cells under the covers for you and links them together; the following LIST expressions are equivalent to the previous CONS expressions:
(list 1) ==> (1)
(list 1 2) ==> (1 2)
(list 1 2 3) ==> (1 2 3)
Similarly, when you're thinking in terms of lists, you don't have to use the meaningless names CAR and CDR; FIRST and REST are synonyms for CAR and CDR that you should use when you're dealing with cons cells as lists.
(defparameter *list* (list 1 2 3 4))
(first *list*) ==> 1
(rest *list*) ==> (2 3 4)
(first (rest *list*)) ==> 2
Because cons cells can hold any kind of values, so can lists. And a single list can hold objects of different types.
(list "foo" (list 1 2) 10) ==> ("foo" (1 2) 10)
The structure of that list would look like this:
Because lists can have other lists as elements, you can also use them to represent trees of arbitrary depth and complexity. As such, they make excellent representations for any heterogeneous, hierarchical data. Lisp-based XML processors, for instance, usually represent XML documents internally as lists. Another obvious example of tree-structured data is Lisp code itself. In Chapters 30 and 31 you'll write an HTML generation library that uses lists of lists to represent the HTML to be generated. I'll talk more next chapter about using cons cells to represent other data structures.
Common Lisp provides quite a large library of functions for manipulating lists. In the sections "List-Manipulation Functions" and "Mapping," you'll look at some of the more important of these functions. However, they will be easier to understand in the context of a few ideas borrowed from functional programming.
Functional Programming and Lists
The essence of functional programming is that programs are built entirely of functions with no side effects that compute their results based solely on the values of their arguments. The advantage of the functional style is that it makes programs easier to understand. Eliminating side effects eliminates almost all possibilities for action at a distance. And since the result of a function is determined only by the values of its arguments, its behavior is easier to understand and test. For instance, when you see an expression such as (+ 3 4), you know the result is uniquely determined by the definition of the + function and the values 3 and 4. You don't have to worry about what may have happened earlier in the execution of the program since there's nothing that can change the result of evaluating that expression.
Functions that deal with numbers are naturally functional since numbers are immutable. A list, on the other hand, can be mutated, as you've just seen, by SETFing the CARs and CDRs of the cons cells that make up its backbone. However, lists can be treated as a functional data type if you consider their value to be determined by the elements they contain. Thus, any list of the form (1 2 3 4) is functionally equivalent to any other list containing those four values, regardless of what cons cells are actually used to represent the list. And any function that takes a list as an argument and returns a value based solely on the contents of the list can likewise be considered functional. For instance, the REVERSE sequence function, given the list (1 2 3 4), always returns a list (4 3 2 1). Different calls to REVERSE with functionally equivalent lists as the argument will return functionally equivalent result lists. Another aspect of functional programming, which I'll discuss in the section "Mapping," is the use of higher-order functions: functions that treat other functions as data, taking them as arguments or returning them as results.
Most of Common Lisp's list-manipulation functions are written in a functional style. I'll discuss later how to mix functional and other coding styles, but first you should understand a few subtleties of the functional style as applied to lists.
The reason most list functions are written functionally is that it allows them to return results that share cons cells with their arguments. To take a concrete example, the function APPEND takes any number of list arguments and returns a new list containing the elements of all its arguments. For instance:
(append (list 1 2) (list 3 4)) ==> (1 2 3 4)
From a functional point of view, APPEND's job is to return the list (1 2 3 4) without modifying any of the cons cells in the lists (1 2) and (3 4). One obvious way to achieve that goal is to create a completely new list consisting of four new cons cells. However, that's more work than is necessary. Instead, APPEND actually makes only two new cons cells to hold the values 1 and 2, linking them together and pointing the CDR of the second cons cell at the head of the last argument, the list (3 4). It then returns the cons cell containing the 1. None of the original cons cells has been modified, and the result is indeed the list (1 2 3 4). The only wrinkle is that the list returned by APPEND shares some cons cells with the list (3 4). The resulting structure looks like this:
In general, APPEND must copy all but its last argument, but it can always return a result that shares structure with the last argument.
Other functions take similar advantage of lists' ability to share structure. Some, like APPEND, are specified to always return results that share structure in a particular way. Others are simply allowed to return shared structure at the discretion of the implementation.
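You can observe APPEND's sharing directly with EQ, since the last argument isn't copied:

```lisp
(defparameter *tail* (list 3 4))
(defparameter *whole* (append (list 1 2) *tail*))
(eq (cddr *whole*) *tail*) ==> T ; the last two cons cells are the very cells of *tail*
```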
"Destructive" Operations
If Common Lisp were a purely functional language, that would be the end of the story. However, because it's possible to modify a cons cell after it has been created by SETFing its CAR or CDR, you need to think a bit about how side effects and structure sharing mix.
Because of Lisp's functional heritage, operations that modify existing objects are called destructive—in functional programming, changing an object's state "destroys" it since it no longer represents the same value. However, using the same term to describe all state-modifying operations leads to a certain amount of confusion since there are two very different kinds of destructive operations, for-side-effect operations and recycling operations.[135]
For-side-effect operations are those used specifically for their side effects. All uses of SETF are destructive in this sense, as are functions that use SETF under the covers to change the state of an existing object, such as VECTOR-PUSH or VECTOR-POP. But it's a bit unfair to describe these operations as destructive—they're not intended to be used in code written in a functional style, so they shouldn't be described using functional terminology. However, if you mix nonfunctional, for-side-effect operations with functions that return structure-sharing results, then you need to be careful not to inadvertently modify the shared structure. For instance, consider these three definitions:
(defparameter *list-1* (list 1 2))
(defparameter *list-2* (list 3 4))
(defparameter *list-3* (append *list-1* *list-2*))
After evaluating these forms, you have three lists, but *list-3* and *list-2* share structure just like the lists in the previous diagram.
*list-1* ==> (1 2)
*list-2* ==> (3 4)
*list-3* ==> (1 2 3 4)
Now consider what happens when you modify *list-2*.
(setf (first *list-2*) 0) ==> 0
*list-2* ==> (0 4) ; as expected
*list-3* ==> (1 2 0 4) ; maybe not what you wanted
The change to *list-2* also changes *list-3* because of the shared structure: the first cons cell in *list-2* is also the third cons cell in *list-3*. SETFing the FIRST of *list-2* changes the value in the CAR of that cons cell, affecting both lists.
On the other hand, the other kind of destructive operations, recycling operations, are intended to be used in functional code. They use side effects only as an optimization. In particular, they reuse certain cons cells from their arguments when building their result. However, unlike functions such as APPEND that reuse cons cells by including them, unmodified, in the list they return, recycling functions reuse cons cells as raw material, modifying the CAR and CDR as necessary to build the desired result. Thus, recycling functions can be used safely only when the original lists aren't going to be needed after the call to the recycling function.
To see how a recycling function works, let's compare REVERSE, the nondestructive function that returns a reversed version of a sequence, to NREVERSE, a recycling version of the same function. Because REVERSE doesn't modify its argument, it must allocate a new cons cell for each element in the list being reversed. But suppose you write something like this:
(setf *list* (reverse *list*))
By assigning the result of REVERSE back to *list*, you've removed the reference to the original value of *list*. Assuming the cons cells in the original list aren't referenced anywhere else, they're now eligible to be garbage collected. However, in many Lisp implementations it'd be more efficient to immediately reuse the existing cons cells rather than allocating new ones and letting the old ones become garbage.
NREVERSE allows you to do exactly that. The N stands for non-consing, meaning it doesn't need to allocate any new cons cells. The exact side effects of NREVERSE are intentionally not specified—it's allowed to modify any CAR or CDR of any cons cell in the list—but a typical implementation might walk down the list changing the CDR of each cons cell to point to the previous cons cell, eventually returning the cons cell that was previously the last cons cell in the old list and is now the head of the reversed list. No new cons cells need to be allocated, and no garbage is created.
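In practice, then, the idiom is the same as with REVERSE except that you must capture the return value and abandon the old reference, since the cons cell the old variable points at is typically no longer the head of anything useful:

(defparameter *list* (list 1 2 3 4))
(setf *list* (nreverse *list*))
*list* ==> (4 3 2 1)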
Most recycling functions, like NREVERSE, have nondestructive counterparts that compute the same result. In general, the recycling functions have names that are the same as their nondestructive counterparts except with a leading N. However, not all do; exceptions include several of the more commonly used recycling functions such as NCONC, the recycling version of APPEND, and DELETE, DELETE-IF, DELETE-IF-NOT, and DELETE-DUPLICATES, the recycling versions of the REMOVE family of sequence functions.
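For example, DELETE-IF can stand in for REMOVE-IF when the original list is expendable; as with NREVERSE, only the return value is reliable, so assign it back rather than counting on the old variable:

(defparameter *numbers* (list 1 2 3 4 5 6))
(setf *numbers* (delete-if #'oddp *numbers*))
*numbers* ==> (2 4 6)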
In general, you use recycling functions in the same way you use their nondestructive counterparts except it's safe to use them only when you know the arguments aren't going to be used after the function returns. The side effects of most recycling functions aren't specified tightly enough to be relied upon.
However, the waters are further muddied by a handful of recycling functions with specified side effects that can be relied upon. They are NCONC, the recycling version of APPEND, and NSUBSTITUTE and its -IF and -IF-NOT variants, the recycling versions of the sequence functions SUBSTITUTE and friends.
Like APPEND, NCONC returns a concatenation of its list arguments, but it builds its result in the following way: for each nonempty list it's passed, NCONC sets the CDR of the list's last cons cell to point to the first cons cell of the next nonempty list. It then returns the first list, which is now the head of the spliced-together result. Thus:
(defparameter *x* (list 1 2 3))
(nconc *x* (list 4 5 6)) ==> (1 2 3 4 5 6)
*x* ==> (1 2 3 4 5 6)
NSUBSTITUTE and its variants can be relied on to walk down the list structure of the list argument, to SETF the CARs of any cons cells holding the old value to the new value, and to otherwise leave the list intact. They then return the original list, which now has the same value as would've been computed by SUBSTITUTE.[136]
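Because those side effects are guaranteed, NSUBSTITUTE can be called purely for effect, though using its return value remains good style:

(defparameter *bits* (list 1 0 1 0))
(nsubstitute 2 0 *bits*) ==> (1 2 1 2)
*bits*                   ==> (1 2 1 2)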
The key thing to remember about NCONC and NSUBSTITUTE is that they're the exceptions to the rule that you can't rely on the side effects of recycling functions. It's perfectly acceptable—and arguably good style—to ignore the reliability of their side effects and use them, like any other recycling function, only for the value they return.