Lazy sequences in (a few lines of) Common Lisp

Copyright (C) 2016 F. Peschanski, CC-BY-SA 3.0

The Clojure programming language comes with a versatile lazy sequence abstraction, which also exists under other names such as generators in e.g. Python, (implicitly) lazy lists in e.g. Haskell, streams in e.g. Ocaml, etc.

There is nothing like it in the Common Lisp standard; in particular, the CL sequences are IMHO less interesting (providing too restricted a kind of polymorphism).

In this document I address the following question:

  • is it easy to implement such an abstraction in Common Lisp?

And as is almost always the case in Common Lisp, the answer is:

  • yes it is!

And the remainder of this document shows the way I would do it.

Disclaimer: the Lisp code below is not of production quality, and it is by no means a port of the Clojure code, although I guess there aren't a million ways to implement the lazy sequence abstraction...

What is a lazy sequence?

Let's start with the basics.

A mathematical detour

According to https://en.wikipedia.org/wiki/Sequence,

a (mathematical) sequence is an ordered collection of objects in which repetitions are allowed.

Put in other terms, this is an ordered way to encode a multiset, and the simplest encoding is to associate with each element of the multiset a natural number giving its "position" in the ordering relation. The first element is at position 0, the second at position 1, etc. This gives a functional interpretation of a sequence. In this interpretation a sequence can have a finite or an infinite domain, and accordingly we say that the sequence is finite or infinite.
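As a tiny illustration of this functional interpretation, an infinite sequence is just a total function from positions to elements. The function name evens below is mine, purely for illustration:

```lisp
;; The functional interpretation of a sequence: a function from
;; positions (natural numbers) to elements.  `evens' is a made-up
;; name for illustration, not part of the implementation below.
(defun evens (n)
  "The infinite sequence 0, 2, 4, ... viewed as a function of position."
  (* 2 n))

(evens 0) ; => 0  (the first element)
(evens 3) ; => 6  (the fourth element)
```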

There is also an inductive definition of a finite sequence, being the smallest set composed of either:

  • the empty sequence nil

  • or a non-empty sequence, constructed from a first element and the rest of the sequence.

We then obtain the definition of a list, our ubiquitous data structure (well, its proper, immutable variant)! The guarantee that the sequence is finite comes from taking the smallest set satisfying the rules.

But what about infinite sequences?

There is indeed a related mathematical definition of an infinite sequence, being the largest set composed of

  • a head element followed by an infinite tail sequence

Note that there is no base case, and moreover we consider this time the largest set satisfying the unique rule. Such a definition is said to be coinductive. Although coinduction is a very interesting theory, and a very concrete take on infinity, we now have everything we need to go back to the practical side of things.

Interestingly, the inductive and coinductive definitions are compatible, identifying first with head, and rest with tail.

So our short mathematical detour leads to the following, conceptually important but somewhat bare Lisp code:


In [1]:
(defgeneric head (sequence)
    (:documentation "Taking the head of the `sequence`."))


Out[1]:
#<STANDARD-GENERIC-FUNCTION CL-JUPYTER-USER::HEAD (0)>

In [2]:
(defgeneric tail (sequence)
    (:documentation "Taking the tail of the `sequence`"))


Out[2]:
#<STANDARD-GENERIC-FUNCTION CL-JUPYTER-USER::TAIL (0)>

For pragmatic reasons we will also need a helper function for printing sequences.


In [3]:
(defgeneric print-cell (c out)
    (:documentation "A printer for cells."))


Out[3]:
#<STANDARD-GENERIC-FUNCTION CL-JUPYTER-USER::PRINT-CELL (0)>

Strict vs. lazy sequences

Let's go back to computers and programming... In a computer everything is, in a way, finite, so the finite/infinite dichotomy is at best a view of the mind. We can perhaps more interestingly talk about strict vs. lazy sequences. Both are in fact finite, even though it is sometimes useful to consider lazy sequences as "infinite" (please do mind the double quotes).

A strict sequence follows the inductive definition above: it is either empty or non-empty, providing a head value and a strict tail sequence. A consequence is that all the elements of strict sequences are computed values.

As long as we abstain from setf-ing things, we can safely identify strict sequences with proper lists, hence the following Lisp definitions.


In [4]:
(defmethod head ((l list))
    (first l))


Out[4]:
#<STANDARD-METHOD CL-JUPYTER-USER::HEAD (LIST) {10072B5C03}>

In [5]:
(defmethod tail ((l list))
    (rest l))


Out[5]:
#<STANDARD-METHOD CL-JUPYTER-USER::TAIL (LIST) {100736F163}>

In [6]:
(defmethod print-cell ((l list) out)
  (when l
    (format out "~A" (first l))
    (when (rest l)
      (format out " ")
      (print-cell (rest l) out))))


Out[6]:
#<STANDARD-METHOD CL-JUPYTER-USER::PRINT-CELL (LIST T) {10074A14C3}>

In [7]:
(head '(1 2 3 4 5))


Out[7]:
1

In [8]:
(tail '(1 2 3 4 5))


Out[8]:
(2 3 4 5)

In [9]:
(print-cell '(1 2 3 4 5) *standard-output*)


1 2 3 4 5
Out[9]:
NIL

Now, a lazy sequence is designed after the coinductive case: we know that it has a head and a tail, and since we want to be lazy, we do not want to say too early what exactly the head and tail are. In particular, the head is not considered a computed value until we need it, and the only thing we know about the tail is that it is itself a lazy sequence.

Strict sequences in Lisp

It is now time to get some actual code supporting the notions discussed above. First, we consider the case of strict sequences. Of course, implementing strict sequences is neither difficult nor incredibly useful since, as we've seen previously, proper lists serve as quite a good representation of strict sequences. However, I think it is interesting to show the slight modifications involved when going from the strict to the lazy case.

According to the c2 wiki (cf. http://c2.com/cgi/wiki?ConsCell):

ConsCells are the building blocks of lists in LispLanguages.

Our version of a cons cell structure for strict sequences is as follows:


In [10]:
(defstruct cell
    (hd nil)
    (tl nil))


Out[10]:
CELL

So a strict cell has two slots: a computed head hd and a strict tail tl.

And much of what is already done for lists can be done for this simple structure.


In [11]:
(defun ccons (hd tl)
    (make-cell :hd hd :tl tl))


Out[11]:
CCONS

In [12]:
(defmethod head ((c cell))
    (cell-hd c))


Out[12]:
#<STANDARD-METHOD CL-JUPYTER-USER::HEAD (CELL) {1007C776C3}>

In [13]:
(defmethod tail ((c cell))
    (cell-tl c))


Out[13]:
#<STANDARD-METHOD CL-JUPYTER-USER::TAIL (CELL) {1007D4AC53}>

In [14]:
(defmethod print-cell ((c cell) out)
    (format out "~A" (cell-hd c))
    (when (cell-tl c)
        (format out " ")
        (print-cell (cell-tl c) out)))


Out[14]:
#<STANDARD-METHOD CL-JUPYTER-USER::PRINT-CELL (CELL T) {1007E98D73}>

In [15]:
(defmethod print-object ((c cell) out)
    (format out "#<strict:")
    (print-cell c out)
    (format out ">"))


Out[15]:
#<STANDARD-METHOD COMMON-LISP:PRINT-OBJECT (CELL T) {1007FCE7D3}>

In [16]:
(defparameter s1 (ccons 1 (ccons 2 (ccons 3 nil))))


Out[16]:
S1

In [17]:
s1


Out[17]:
#<strict:1 2 3>

In [18]:
(head s1)


Out[18]:
1

In [19]:
(tail s1)


Out[19]:
#<strict:2 3>

Let's stop reimplementing a list-like data structure in Lisp right here: there's something disturbing about the whole exercise!

Lazy sequences in Lisp

Instead of seeing strict sequences as completely disjoint from the lazy ones, I would rather emphasize their similarities. And we can in fact upgrade our strict cells to make them lazier.


In [20]:
(defstruct (lazy-cell (:include cell))
    (genfn nil))


Out[20]:
LAZY-CELL

A lazy cell is a specific kind of cell with two possible states:

  • either it is computed, in which case the genfn slot is nil and its hd and tl slots (included from the cell structure) are directly accessible,

  • or it is not yet computed in which case both the hd and tl slots are nil, and the genfn slot contains a generator function.

Initially a lazy cell must be in the not yet computed state; that's the whole point of laziness!

Before asking for a head or tail, it is necessary to switch to the computed state, which happens thanks to the following function.


In [21]:
(defun compute-lazy-cell (c)
  (let ((val (funcall (lazy-cell-genfn c))))
    (setf (lazy-cell-hd c) (head val))
    (setf (lazy-cell-tl c) (tail val))
    (setf (lazy-cell-genfn c) nil)))


Out[21]:
COMPUTE-LAZY-CELL

An important requirement is that the function above may only be called in the not yet computed state. When it is called, the generator function is invoked (note that it takes no argument). The generator should return a sequence with a computed head and a sequence as tail. For truly lazy sequences, the tail should be either empty or a lazy cell. We will see how to fulfill these requirements below. For now, let's just remember that the extracted information is stored in the two slots hd and tl.
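To make the contract concrete, here is a hand-written generator. The variable name one-cell is mine: when called, the generator returns a cons whose car is the computed head and whose cdr is the tail, here empty.

```lisp
;; A minimal generator fulfilling the contract: when called, it
;; yields a cons with a computed head (42) and an empty tail.
;; The name `one-cell' is only for illustration.
(defparameter one-cell
  (make-lazy-cell :hd nil :tl nil
                  :genfn (lambda () (cons 42 nil))))

(compute-lazy-cell one-cell) ; switch to the computed state
(lazy-cell-hd one-cell)      ; => 42
(lazy-cell-genfn one-cell)   ; => NIL, the cell is now computed
```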

Now we can define the head and tail methods in a straightforward way.


In [22]:
(defmethod head ((c lazy-cell))
  (when (lazy-cell-genfn c)
    (compute-lazy-cell c))
  (lazy-cell-hd c))


Out[22]:
#<STANDARD-METHOD CL-JUPYTER-USER::HEAD (LAZY-CELL) {10088CEF13}>

In [23]:
(defmethod tail ((c lazy-cell))
  (when (lazy-cell-genfn c)
    (compute-lazy-cell c))
  (lazy-cell-tl c))


Out[23]:
#<STANDARD-METHOD CL-JUPYTER-USER::TAIL (LAZY-CELL) {10089A4A03}>

In both cases the first step is to check whether we are in the computed state (i.e. genfn is nil) or not. If not, the compute-lazy-cell function is called; then the value of the hd (resp. tl) slot is returned.

And that is about all we need for manipulating lazy sequences; only printing them requires some care.


In [24]:
(defmethod print-cell ((c lazy-cell) out)
    (let ((cell (loop :for cell = c :then (lazy-cell-tl cell)
                      :for sep = "" :then " "
                      :while (and cell (null (lazy-cell-genfn cell)))
                      :do (progn (princ sep out)
                                 (princ (lazy-cell-hd cell) out))
                      :finally (return cell))))
         (when (and cell (lazy-cell-genfn cell))
            (princ "..." out))))


Out[24]:
#<STANDARD-METHOD CL-JUPYTER-USER::PRINT-CELL (LAZY-CELL T) {1004959443}>

In [25]:
(defmethod print-object ((c lazy-cell) out)
    (format out "#<lazy:") (print-cell c out) (format out ">"))


Out[25]:
#<STANDARD-METHOD COMMON-LISP:PRINT-OBJECT (LAZY-CELL T) {10053DB643}>

Constructing lazy sequences

In order to construct lazy sequences, we can define a lazy variant of cons.


In [27]:
(defmacro lazy-cons (hd tl)
    `(make-lazy-cell :hd nil :tl nil :genfn (lambda () (cons ,hd ,tl))))


Out[27]:
LAZY-CONS

Note that we have to define it as a macro: otherwise, both the hd and tl arguments would be evaluated eagerly. The (lambda () ...) is the generator function used to delay the computation of the head and tail. That's almost where all the magic (say "lambda") is.
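A quick way to convince ourselves that lazy-cons indeed delays its arguments (the variable name trap is mine): constructing a cell with an erroneous head is harmless as long as the head is never demanded.

```lisp
;; Building the cell does not evaluate its arguments: the error form
;; ends up inside the generator lambda, so it is only signalled if
;; the head (or tail) is ever demanded.  `trap' is just for illustration.
(defparameter trap (lazy-cons (error "too eager!") nil))

;; trap is constructed without complaint; evaluating (head trap)
;; would signal the "too eager!" error.
```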

Constructing a finite sequence

Any finite sequence should be constructible in a lazy way, so let's try.


In [28]:
(defparameter ls1 (lazy-cons (+ 1 0) (lazy-cons (+ 1 1) (lazy-cons (+ 1 2) nil))))


Out[28]:
LS1

In [29]:
LS1


Out[29]:
#<lazy:...>

The output above confirms that nothing has been computed yet. The ellipsis ... marks the not yet computed part of the sequence (everything, for now). Of course, taking either the head or the tail of the lazy sequence will trigger some computation.


In [30]:
(head ls1)


Out[30]:
1

The computed value of the head is 1 as expected, and we can see the effect on the sequence itself.


In [31]:
ls1


Out[31]:
#<lazy:1...>

The computed part of the sequence now comprises the first element. Of course, we can dig a little deeper.


In [32]:
(head (tail ls1))


Out[32]:
2

In [33]:
ls1


Out[33]:
#<lazy:1 2...>

To make the sequence fully computed, we need to dig a little deeper.


In [34]:
(tail (tail (tail ls1)))


Out[34]:
NIL

In [35]:
ls1


Out[35]:
#<lazy:1 2 3>

Note that the list is still considered lazy, but it is in the computed state now, so there is no computation left to delay. At this point, lazy and strict sequences coincide (another example is the empty sequence, which is both strict and lazy since there is no computation to perform at all).

One important takeaway from the previous example is that laziness can only be obtained through side effects, although very controlled (and, overall, "safe") ones.

Constructing an "infinite" sequence

Once computed, a sequence must of course be finite, but the lazy sequence abstraction allows us to define intentionally infinite sequences. A good example is the sequence of natural numbers. Let's try a definition using the lazy-cons macro.


In [36]:
(defun nats (&optional (n 0))
    (lazy-cons n (nats (1+ n))))


Out[36]:
NATS

In [37]:
(defparameter ls2 (nats))


Out[37]:
LS2

In [38]:
LS2


Out[38]:
#<lazy:...>

In [39]:
(head ls2)


Out[39]:
0

In [40]:
(head (tail ls2))


Out[40]:
1

In [41]:
(head (tail (tail ls2)))


Out[41]:
2

In [42]:
ls2


Out[42]:
#<lazy:0 1 2...>

In this state, only the finite prefix 0 1 2 of the lazy sequence has been computed. To dig deeper, we can start to implement some utility functions.

The first one consists in taking a finite prefix of the sequence, materializing it (of course!) as a list.


In [43]:
(defun take (n s)
    "Returns the list of the `n` first elements of the sequence `s`."
    (loop repeat n
          for cell = s then (tail cell)
          when cell
            collect (head cell)))


Out[43]:
TAKE

In [44]:
(take 10 ls2)


Out[44]:
(0 1 2 3 4 5 6 7 8 9)

In [45]:
ls2


Out[45]:
#<lazy:0 1 2 3 4 5 6 7 8 9...>

Note that since we keep a reference to the first cell (the one with head 0), the more we consume of the LS2 sequence, the more memory is required to store the computed cells. There is indeed a Clojure proverb saying:

"Don't hold on to your head!"

The example above is a good illustration of what happens when you do.

One way to skip unwanted cells is to use the drop function defined below:


In [46]:
(defun drop (n s)
    "Returns the suffix of the sequence `s` starting at its `n`-th element."
    (loop repeat n
          for cell = s then (tail cell)
          when (not cell) do (return cell)
          finally (return cell)))


Out[46]:
DROP

Let's try to drop a few natural numbers.


In [47]:
(drop 10 (nats 0))


Out[47]:
#<lazy:...>

The result is a lazy sequence without any computed cell... Of course, it's more useful if we take some information from the sequence:


In [48]:
(head (drop 10 (nats 0)))


Out[48]:
9

In [49]:
(take 10 (drop 10000 (nats 1)))


Out[49]:
(10000 10001 10002 10003 10004 10005 10006 10007 10008 10009)

Of course, be careful not to hold on to your head, as in the take case.


In [50]:
(drop 100 ls2)


Out[50]:
#<lazy:...>

In [51]:
ls2


Out[51]:
#<lazy:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98...>
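A remedy, sketched below with a throwaway variable s of mine, is to keep a reference only to the suffix we still need, so that the consumed prefix becomes garbage:

```lisp
;; Keep only the suffix we still need: after the setf, nothing
;; references the computed prefix anymore, so it may be reclaimed.
;; The variable `s' is only for illustration.
(defparameter s (nats))

(take 3 s)          ; computes and returns the prefix (0 1 2)
(setf s (drop 4 s)) ; retain only the suffix starting at element 3
(head s)            ; => 3
```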

The lazy-seq macro

Clojure does not provide (anymore) a lazy-cons macro, but rather allows mixing strict and lazy sequences using a simple albeit useful lazy-seq macro.

The way I understand it, the cons part remains implicit, as follows:


In [52]:
(defmacro lazy-seq (expr)
    `(make-lazy-cell :hd nil :tl nil :genfn (lambda () ,expr)))


Out[52]:
LAZY-SEQ

The consing part is now the responsibility of the expr argument of lazy-seq.
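Finite sequences fit the same mold: we simply stop generating when there is nothing left, as in the following sketch (the function name countdown is mine, not part of the Clojure-inspired API):

```lisp
;; A finite lazy sequence built with lazy-seq: the recursion stops
;; by returning nil instead of a new cell.  `countdown' is a made-up
;; example for illustration.
(defun countdown (n)
  "The finite lazy sequence n, n-1, ..., 1."
  (when (plusp n)
    (lazy-seq (cons n (countdown (1- n))))))

(take 10 (countdown 3)) ; => (3 2 1)
```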

Let's port the iterate function of the Clojure standard library using this macro.


In [53]:
(defun iterate (f x)
    (lazy-seq (cons x (iterate f (funcall f x)))))


Out[53]:
ITERATE

In [54]:
(take 10 (iterate #'1+ 1))


Out[54]:
(1 2 3 4 5 6 7 8 9 10)

The three stooges: map, filter and reduce

Lazy sequences, as you can see, are not difficult to introduce in Common Lisp. And thanks to the lazy-seq macro it is rather easy to write generator functions such as iterate above. We will also see that sequences are an adequate tool to perform traversals of structured data.

But once the sequences have been created, what we want is to manipulate them. The three most important manipulations are mappings, filterings and reductions. The corresponding functions are only a few lines away.

Because there is already a map in Common Lisp (although not that useful), let's call our version maps, for mapping over a sequence.


In [56]:
(defun maps (f s)
    (when s
        (lazy-seq (cons (funcall f (head s)) (maps f (tail s))))))


Out[56]:
MAPS

In [57]:
(take 10 (maps (lambda (x) (* x 10)) (nats 1)))


Out[57]:
(10 20 30 40 50 60 70 80 90 100)

Filtering is not much more difficult.


In [58]:
(defun filters (pred s)
    (when s
        (if (funcall pred (head s))
            (lazy-seq (cons (head s) (filters pred (tail s))))
            (filters pred (tail s)))))


Out[58]:
FILTERS

In [59]:
(take 10 (filters #'evenp (nats 1)))


Out[59]:
(2 4 6 8 10 12 14 16 18 20)

In [60]:
(take 10 (filters #'oddp (nats 1)))


Out[60]:
(1 3 5 7 9 11 13 15 17 19)

The reduction function is very powerful, although unlike the map and filter cases, it requires the whole sequence to be computed. Termination is the concern here: we have to be careful not to feed the reduction an infinite sequence.


In [61]:
(defun reduces (f init s)
    (loop for cell = s then (tail cell)
          and acc = init then (funcall f acc (head cell))
          when (not cell) do (return acc)))


Out[61]:
REDUCES

In [62]:
(reduces #'+ 0 (take 10 (nats 1)))


Out[62]:
55

Reading list

Of course there are many things to read if you are interested in lazy sequences and related topics (and I will happily read more and update this section if needed).

On the theoretical side of things, there is a rich literature on the subject but this is not an academic paper so I will only point again to an excellent starting point: the book Introduction to Bisimulation and Coinduction, cf. http://www.cs.unibo.it/~sangio/IntroBook.html.

The Scheme literature is a good source of information; there, lazy sequences are most generally called (computed) streams. The SICP (https://mitpress.mit.edu/sicp/) is of course quite an interesting source of information.

Clojure (https://clojure.org/) is interesting in that it is a modern take on the subject. A significant part of the Clojure library is based on a similar lazy sequence abstraction, and it is obviously my main source of inspiration. The interesting books about Clojure (well, not all of them are) are in general rather short on the technical details concerning some aspects of the implementation, especially lazy sequences. This is probably related to the fact that most of these technical details are on the Java side of things... Of course, Haskell is an excellent example, in that lazy sequences are simply ... lists, since (almost) everything is lazy there. There are related concepts in many languages, e.g. Python generators, Ruby blocks, Java streams, etc.

As far as Common Lisp is concerned, there is an interesting discussion in PAIP (http://norvig.com/paip.html) where the focus is on minimizing the memory usage. I would say that it is less critical now than it was in the 90's so I prefer to rely on structures (and I think it would not hurt much to use classes, although I like to compartmentalize a little bit: see classes as classes, and structures as (extensible) records).

The final word ...

In a few lines of Lisp we were able to implement (a prototypal version of) the lazy sequence abstraction. The lisp-lazy-seq library contains much more facilities for your enjoyment and you now know how all this works.

That's all folks!

