Copyright (C) 2016 F. Peschanski, CC-BY-SA 3.0
The Clojure programming language comes with a versatile lazy sequence abstraction, which also exists under other names: generators in e.g. Python, (implicitly) lazy lists in e.g. Haskell, streams in e.g. OCaml, etc.
There is nothing like it in the Common Lisp standard; in particular, the CL sequences are IMHO less interesting (they provide too restricted a kind of polymorphism).
In this document I address the following question: is it possible (and reasonably easy) to get Clojure-style lazy sequences in Common Lisp?
And as is almost always the case in Common Lisp, the answer is: yes, we can roll our own.
The rest of this document shows the way I would do it.
Disclaimer: the Lisp code below is not of production quality, and it is by no means a port of the Clojure code, although I guess there aren't a million ways to implement the lazy sequence abstraction...
Let's start with the basics.
According to https://en.wikipedia.org/wiki/Sequence,
a (mathematical) sequence is an ordered collection of objects in which repetitions are allowed.
Put in other terms, this is an ordered way to encode a multiset, and the simplest encoding is to associate to each element of the multiset a natural number giving its "position" in the ordering relation. The first element is at position 0, the second at position 1, etc. This gives a functional interpretation of a sequence. In this interpretation the domain of the function can be finite or infinite, and we say accordingly that the sequence is finite or infinite.
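Just to make the functional interpretation concrete, here is a small illustrative sketch (the name square-at is mine, chosen for the example): the infinite sequence of squares 0, 1, 4, 9, ... seen as a function from positions to elements.

(defun square-at (n)
  "The element at position `n` of the sequence of squares."
  (* n n))

(square-at 0)  ; => 0, the first element
(square-at 3)  ; => 9, the fourth element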
There is also an inductive definition of a finite sequence, as the smallest set composed of either:
the empty sequence nil,
or a non-empty sequence, constructed (with cons) from a first element and the rest of the sequence.
You then obtain the definition of a list, our ubiquitous data structure (well, its proper, immutable variant)! The guarantee that the sequence is finite comes from taking the smallest set satisfying the rules.
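In Lisp terms this is nothing new, just plain (proper) lists; for instance:

nil                    ; the empty sequence
(cons 1 (cons 2 nil))  ; => (1 2), a two-element sequence
(first '(1 2))         ; => 1
(rest '(1 2))          ; => (2)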
But what about infinite sequences?
There is indeed a related mathematical definition of an infinite sequence, as the largest set composed of head elements followed by an infinite tail sequence. Note that there is no base case, and moreover we consider this time the largest set satisfying the unique rule. Such a definition is said to be coinductive and, though it is a very interesting theory and a very concrete take on infinity, we already have everything we need to go back to the practical side of things.
Interestingly, the inductive and coinductive definitions are compatible, identifying first with head, and rest with tail.
So our short mathematical detour leads to the following, conceptually important but somewhat bare Lisp code:
In [1]:
(defgeneric head (sequence)
  (:documentation "Taking the head of the `sequence`."))
Out[1]:
In [2]:
(defgeneric tail (sequence)
  (:documentation "Taking the tail of the `sequence`."))
Out[2]:
For pragmatic reasons we will also need a helper function for printing sequences.
In [3]:
(defgeneric print-cell (c out)
  (:documentation "A printer for cells."))
Out[3]:
Let's go back to computers and programming... In a computer everything is, in a way, finite, so the finite/infinite dichotomy is at best a view of the mind. We can perhaps more interestingly talk about strict vs. lazy sequences. Both are in fact finite, even though it is sometimes useful to consider lazy sequences as "infinite" (please do mind the double quotes).
A strict sequence follows the inductive definition above: it is either empty or non-empty, providing a head value and a strict tail sequence. A consequence is that all the elements of strict sequences are computed values.
As long as you abolish setf-ing things, we can safely identify strict sequences with proper lists, hence the following Lisp definitions.
In [4]:
(defmethod head ((l list))
  (first l))
Out[4]:
In [5]:
(defmethod tail ((l list))
  (rest l))
Out[5]:
In [6]:
(defmethod print-cell ((l list) out)
  (when l
    (format out "~A" (first l))
    (when (rest l)
      (format out " ")
      (print-cell (rest l) out))))
Out[6]:
In [7]:
(head '(1 2 3 4 5))
Out[7]:
In [8]:
(tail '(1 2 3 4 5))
Out[8]:
In [9]:
(print-cell '(1 2 3 4 5) *standard-output*)
Out[9]:
Now, a lazy sequence is designed after the coinductive case: we know that it has a head and a tail and, since we want to be lazy, we do not want to say too early what exactly the head and tail are. In particular, the head is not considered a computed value until we need it, and the only thing we know about the tail is that it is itself a lazy sequence.
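The key implementation trick, sketched below in a purely illustrative way (the name *thunk* is mine), is the thunk: wrapping a computation in a zero-argument lambda delays it until the closure is actually called.

;; Nothing is computed when the thunk is built...
(defparameter *thunk* (lambda () (print "computing...") (* 6 7)))

;; ...the computation only happens when we force the thunk.
(funcall *thunk*)  ; prints "computing..." and returns 42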
It is now time to get some actual code supporting the notions discussed above. First, we consider the case of strict sequences. Of course, implementing strict sequences is neither difficult nor incredibly useful since, as we've seen previously, proper lists serve as quite a good representation of strict sequences. However, I think it is interesting to show the slight modifications involved when going from the strict to the lazy case.
According to the c2 wiki (cf. http://c2.com/cgi/wiki?ConsCell):
ConsCells are the building blocks of lists in LispLanguages.
Our version of a cons cell structure for strict sequences is as follows:
In [10]:
(defstruct cell
  (hd nil)
  (tl nil))
Out[10]:
So a strict cell has two slots: a computed head hd and a strict tail tl.
And much of what is already done for lists can be done for this simple structure.
In [11]:
(defun ccons (hd tl)
  (make-cell :hd hd :tl tl))
Out[11]:
In [12]:
(defmethod head ((c cell))
  (cell-hd c))
Out[12]:
In [13]:
(defmethod tail ((c cell))
  (cell-tl c))
Out[13]:
In [14]:
(defmethod print-cell ((c cell) out)
  (format out "~A" (cell-hd c))
  (when (cell-tl c)
    (format out " ")
    (print-cell (cell-tl c) out)))
Out[14]:
In [15]:
(defmethod print-object ((c cell) out)
  (format out "#<strict:")
  (print-cell c out)
  (format out ">"))
Out[15]:
In [16]:
(defparameter s1 (ccons 1 (ccons 2 (ccons 3 nil))))
Out[16]:
In [17]:
s1
Out[17]:
In [18]:
(head s1)
Out[18]:
In [19]:
(tail s1)
Out[19]:
We will stop here rather than fully reimplement a list-like data structure in Lisp: there's something disturbing about the whole exercise! Let's instead move on to the interesting part: lazy cells.
In [20]:
(defstruct (lazy-cell (:include cell))
  (genfn nil))
Out[20]:
A lazy cell is a specific kind of cell with two possible states:
either it is computed, in which case the genfn slot is nil and its hd and tl slots (included from the cell structure) are directly accessible,
or it is not yet computed, in which case both the hd and tl slots are nil, and the genfn slot contains a generator function.
Initially a lazy cell must be in the not yet computed state: that's the whole point of laziness!
Before asking for a head or tail, it is necessary to switch to the computed state, which happens thanks to the following function.
In [21]:
(defun compute-lazy-cell (c)
  ;; Call the generator, store the resulting head and tail in the cell,
  ;; then discard the generator: the cell is now in the computed state.
  (let ((val (funcall (lazy-cell-genfn c))))
    (setf (lazy-cell-hd c) (head val))
    (setf (lazy-cell-tl c) (tail val))
    (setf (lazy-cell-genfn c) nil)))
Out[21]:
An important requirement is that the function above may only be called in the not yet computed state. When that happens, the generator function is called (with no arguments); it should return a sequence with a computed head and a sequence as tail. For truly lazy sequences, the tail should be either empty or a lazy cell. We will see how to fulfill these requirements below. For now, let's just remember that the extracted information is stored in the two slots hd and tl.
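To make this contract concrete, here is a hand-built lazy cell (the name *lcell* and the values are purely illustrative): its generator returns a cons whose car is the head value and whose cdr is the tail sequence, here simply nil.

(defparameter *lcell*
  (make-lazy-cell :hd nil :tl nil
                  :genfn (lambda () (cons 42 nil))))

(compute-lazy-cell *lcell*)   ; switch to the computed state
(lazy-cell-hd *lcell*)        ; => 42
(lazy-cell-genfn *lcell*)     ; => NIL, nothing left to compute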
Now we can define the head and tail methods in a straightforward way.
In [22]:
(defmethod head ((c lazy-cell))
  (when (lazy-cell-genfn c)
    (compute-lazy-cell c))
  (lazy-cell-hd c))
Out[22]:
In [23]:
(defmethod tail ((c lazy-cell))
  (when (lazy-cell-genfn c)
    (compute-lazy-cell c))
  (lazy-cell-tl c))
Out[23]:
In both cases the first step is to check whether we are in the computed state (i.e. genfn is nil) or not. If not, the compute-lazy-cell function is called first, and then the value of the hd (resp. tl) slot is returned.
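A pleasant consequence of this scheme is memoization: the generator of a given cell runs at most once. A small illustrative probe (the name *probe* is mine, and the print is only there to observe the call):

(defparameter *probe*
  (make-lazy-cell :genfn (lambda () (print "generator called") (cons 1 nil))))

(head *probe*)  ; prints "generator called" and returns 1
(head *probe*)  ; returns 1 silently: genfn is now nil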
And that is about all we need for manipulating lazy sequences; only printing them requires a little more care.
In [24]:
(defmethod print-cell ((c lazy-cell) out)
  ;; Print the already-computed prefix, then an ellipsis if a
  ;; not-yet-computed part remains.
  (let ((cell (loop :for cell = c :then (lazy-cell-tl cell)
                    :for sep = "" :then " "
                    :while (and cell (null (lazy-cell-genfn cell)))
                    :do (progn (princ sep out)
                               (princ (lazy-cell-hd cell) out))
                    :finally (return cell))))
    (when (and cell (lazy-cell-genfn cell))
      (princ "..." out))))
Out[24]:
In [25]:
(defmethod print-object ((c lazy-cell) out)
  (format out "#<lazy:")
  (print-cell c out)
  (format out ">"))
Out[25]:
In [27]:
(defmacro lazy-cons (hd tl)
  `(make-lazy-cell :hd nil :tl nil :genfn (lambda () (cons ,hd ,tl))))
Out[27]:
Note that we have to define it as a macro: a function would evaluate both the hd and tl arguments right away. The (lambda () ...) is the generator function used to delay the computation of the head and tail. That's almost where all the magic (say "lambda") is.
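To see the delaying at work, we can expand a call by hand (the exact printed form may vary slightly across implementations):

(macroexpand-1 '(lazy-cons (+ 1 0) nil))
;; => (MAKE-LAZY-CELL :HD NIL :TL NIL
;;                    :GENFN (LAMBDA () (CONS (+ 1 0) NIL)))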
Any finite sequence should be constructible in a lazy way, so let's try.
In [28]:
(defparameter ls1 (lazy-cons (+ 1 0) (lazy-cons (+ 1 1) (lazy-cons (+ 1 2) nil))))
Out[28]:
In [29]:
LS1
Out[29]:
The output above confirms that nothing has been computed yet. The ellipsis ... marks the not yet computed part of the sequence (everything, for now). Of course, taking either the head or the tail of the lazy sequence will trigger some computation.
In [30]:
(head ls1)
Out[30]:
The computed value of the head is 1 as expected, and we can see the effect on the sequence itself.
In [31]:
ls1
Out[31]:
The computed part of the sequence now comprises the first element. Of course, we can dig a little deeper.
In [32]:
(head (tail ls1))
Out[32]:
In [33]:
ls1
Out[33]:
To make the sequence fully computed, we have to go all the way to its end.
In [34]:
(tail (tail (tail ls1)))
Out[34]:
In [35]:
ls1
Out[35]:
Note that the list is still considered lazy, but it is in the computed state now, so there's no computation to delay anymore. This is a point where lazy and strict sequences coincide (another example is the empty list, which is both strict and lazy since there is no computation to perform at all).
One important takeaway of the previous example is that laziness can only be obtained through side effects, although very controlled (and overall "safe") ones. Now let's exploit laziness for real and build the "infinite" sequence of natural numbers.
In [36]:
(defun nats (&optional (n 0))
  "The lazy sequence of natural numbers starting at `n`."
  (lazy-cons n (nats (1+ n))))
Out[36]:
In [37]:
(defparameter ls2 (nats))
Out[37]:
In [38]:
LS2
Out[38]:
In [39]:
(head ls2)
Out[39]:
In [40]:
(head (tail ls2))
Out[40]:
In [41]:
(head (tail (tail ls2)))
Out[41]:
In [42]:
ls2
Out[42]:
In this state, only the finite prefix 0 1 2 of the lazy sequence has been computed. To dig deeper, we can start to implement some utility functions.
The first one consists in taking a finite prefix of the sequence, materializing it (of course!) as a list.
In [43]:
(defun take (n s)
  "Returns the list of the `n` first elements of the sequence `s`."
  (loop repeat n
        for cell = s then (tail cell)
        when cell
          collect (head cell)))
Out[43]:
In [44]:
(take 10 ls2)
Out[44]:
In [45]:
ls2
Out[45]:
Note that since we keep a reference to the first cell (with head 0), the more we consume information from the LS2 sequence, the more we increase the amount of memory required to store the sequence cells. There is indeed a Clojure proverb saying:
"Don't hold on to your head!"
The example above is a good illustration of what happens when you do.
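To illustrate (ls3 is just a throwaway name for the example), compare keeping the sequence in a variable with consuming an anonymous one: in the first case every computed cell stays reachable through ls3, in the second case the traversed cells can be reclaimed by the garbage collector.

;; Holds the head: the 100000 computed cells remain reachable via ls3.
(defparameter ls3 (nats))
(length (take 100000 ls3))     ; => 100000

;; Does not hold the head: the cells behind the traversal can be collected.
(length (take 100000 (nats)))  ; => 100000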
One way to skip unwanted cells is to use the drop function defined below:
In [46]:
(defun drop (n s)
  "Drops the first `n` elements of the sequence `s`."
  (loop repeat n
        for cell = s then (tail cell)
        when (not cell) do (return cell)
        finally (return cell)))
Out[46]:
Let's try to drop a few natural numbers.
In [47]:
(drop 10 (nats 0))
Out[47]:
The result is a lazy sequence without any computed cell... Of course, it's more useful if we take some information from the sequence:
In [48]:
(head (drop 10 (nats 0)))
Out[48]:
In [49]:
(take 10 (drop 10000 (nats 1)))
Out[49]:
Of course, be careful not to hold on to your head, as in the take case.
In [50]:
(drop 100 ls2)
Out[50]:
In [51]:
ls2
Out[51]:
In [52]:
(defmacro lazy-seq (expr)
  `(make-lazy-cell :hd nil :tl nil :genfn (lambda () ,expr)))
Out[52]:
The lazy-seq macro above is a slight generalization of lazy-cons (and is named after Clojure's): the cons-ing part is now the responsibility of the expr argument of lazy-seq.
Let's port the iterate function of the Clojure standard library using this macro.
In [53]:
(defun iterate (f x)
  (lazy-seq (cons x (iterate f (funcall f x)))))
Out[53]:
In [54]:
(take 10 (iterate #'1+ 1))
Out[54]:
Lazy sequences, as you can see, are not difficult to introduce in Common Lisp. And thanks to the lazy-seq macro it is rather easy to write generator functions such as iterate above. We will also see that sequences are an adequate tool to perform traversals of structured data.
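For instance, the Fibonacci numbers are only a few keystrokes away (fibs is an illustrative name, not a port of any Clojure function):

(defun fibs (&optional (a 0) (b 1))
  "The lazy sequence of Fibonacci numbers."
  (lazy-seq (cons a (fibs b (+ a b)))))

(take 10 (fibs))  ; => (0 1 1 2 3 5 8 13 21 34)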
But once the sequences have been created, what we want to do is manipulate them. The three most important manipulations are mapping, filtering and reduction. The corresponding functions are only a few lines away.
Because there is already a map in Common Lisp (although not that useful), let's call our version maps, for mapping over a sequence.
In [56]:
(defun maps (f s)
  (when s
    (lazy-seq (cons (funcall f (head s)) (maps f (tail s))))))
Out[56]:
In [57]:
(take 10 (maps (lambda (x) (* x 10)) (nats 1)))
Out[57]:
Filtering is not much more difficult.
In [58]:
(defun filters (pred s)
  (when s
    (if (funcall pred (head s))
        (lazy-seq (cons (head s) (filters pred (tail s))))
        (filters pred (tail s)))))
Out[58]:
In [59]:
(take 10 (filters #'evenp (nats 1)))
Out[59]:
In [60]:
(take 10 (filters #'oddp (nats 1)))
Out[60]:
The reduction function is very powerful although, unlike the map and filter cases, it requires the whole sequence to be computed. Efficiency is a concern here, and we have to be careful not to feed the reduction an infinite sequence.
In [61]:
(defun reduces (f init s)
  ;; The `and` makes the stepping parallel: acc is folded with the head of
  ;; the cell from the previous iteration, so (head nil) is never evaluated.
  (loop for cell = s then (tail cell)
        and acc = init then (funcall f acc (head cell))
        when (not cell) do (return acc)))
Out[61]:
In [62]:
(reduces #'+ 0 (take 10 (nats 1)))
Out[62]:
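Putting the pieces together, here is a small illustrative pipeline: the sum of the first ten even squares.

(reduces #'+ 0
         (take 10 (filters #'evenp
                           (maps (lambda (x) (* x x)) (nats 1)))))
;; => 1540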
Of course there are many things to read if you are interested in lazy sequences and related topics (and I will happily read more and update this section if needed).
On the theoretical side of things, there is a rich literature on the subject but this is not an academic paper so I will only point again to an excellent starting point: the book Introduction to Bisimulation and Coinduction, cf. http://www.cs.unibo.it/~sangio/IntroBook.html.
The Scheme literature is a good source of information; lazy sequences are there most often called (computed) streams. The SICP (https://mitpress.mit.edu/sicp/) is of course quite an interesting source of information.
Clojure (https://clojure.org/) is interesting in that it is a modern take on the subject. A significant part of the Clojure library is based on a similar lazy sequence abstraction, and it is obviously my main source of inspiration. There are interesting books about Clojure (well, not all of them are), but they are in general rather short on the technical details concerning some aspects of the implementation, especially lazy sequences. This is probably related to the fact that most of these technical details are on the Java side of things... Of course, Haskell is an excellent example, in that lazy sequences are simply ... lists, since (almost) everything is lazy there. There are related concepts in many languages, e.g. Python generators, Ruby blocks, Java streams, etc.
As far as Common Lisp is concerned, there is an interesting discussion in PAIP (http://norvig.com/paip.html) where the focus is on minimizing the memory usage. I would say that it is less critical now than it was in the 90's so I prefer to rely on structures (and I think it would not hurt much to use classes, although I like to compartmentalize a little bit: see classes as classes, and structures as (extensible) records).