Lists and collections are the basis of Clojure. But, as a Clojure noob, I was constantly getting back unexpected results from applying conj, cons, concat, mapcat, into, etc. to my data structures. My elements would appear at the head or tail position, or the returned collection would be nested. Sometimes my code would even throw exceptions. I found numerous resources and Stack Overflow posts to answer my individual questions, but nothing that answered all of them. So, I wrote up the following summary and cheat sheet.
Handy Tables
Intro to Clojure data structures
First, what’s the difference between a seq and a collection? At a high level, seq is an interface that provides a way to operate on individual items as a stream, while a collection is simply a data structure. Since a seq is a stream, it can be lazy, but doesn’t have to be. Formally, a seq is anything that responds to first and rest. At a practical level, all of the common data structures (vector, list, map, set) are collections, and list is also a concrete implementation of seq. Furthermore, any non-empty collection can be converted to a seq using (seq coll); for an empty collection, (seq coll) returns nil. A note on performance: count is an expensive operation, O(n), on a lazy seq, and it never returns on an infinite one. In contrast, count is constant, O(1), for the common collections, including lists.
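To make the seq vs. collection distinction concrete, here is a small sketch you can paste into a REPL. The names v and l are just placeholders I picked for illustration; coll?, seq?, seq, and counted? are all in clojure.core.

```clojure
;; A vector is a collection but not a seq; a list is both.
(def v [1 2 3])
(def l '(1 2 3))

(coll? v)  ;=> true
(seq? v)   ;=> false
(coll? l)  ;=> true
(seq? l)   ;=> true

;; (seq coll) gives a seq view of a collection, or nil if it is empty.
(seq? (seq v))  ;=> true
(seq {:a 1})    ;=> ([:a 1])
(seq [])        ;=> nil

;; counted? reports whether count runs in constant time.
(counted? v)            ;=> true
(counted? l)            ;=> true
(counted? (map inc v))  ;=> false (lazy seq, so count has to walk it)
```

The (seq []) result is why (seq coll) is often used as a "not empty" check.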
Vectors and lists behave differently under different circumstances, and respond differently to conj. More information on Stack Overflow. Generally, we prefer using vectors over lists, but be aware that Clojure code itself is made of lists, not vectors. From Clojure for the Brave and True, “A good rule of thumb is that if you need to easily add items to the beginning of a sequence or if you’re writing a macro, you should use a list. Otherwise, you should use a vector.”
Code Examples
Below is a summary of a few of the common functions applied to different data structures. Note that these functions often do things that you probably don’t want when applied to maps and sets. Also, be aware that the return type from these functions is often NOT what you put in, which has some interesting consequences, especially when chaining operations.
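As a quick illustration of the chaining problem, here is a sketch using conj and cons, both of which are covered in detail in the sections that follow. The name v is only a placeholder.

```clojure
(def v [1 2])

;; conj on a vector appends and returns a vector.
(conj v 3)                 ;=> [1 2 3]

;; cons returns a seq, so a later conj prepends instead of appending.
(conj (cons 0 v) 3)        ;=> (3 0 1 2)

;; Turning the seq back into a vector restores the append behavior.
(conj (vec (cons 0 v)) 3)  ;=> [0 1 2 3]
```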
Conj
Conj (conj coll x & xs): “Conjoin” onto a collection or seq, returns a new collection of the type of coll.
- Existing collection is passed first.
- Adds the element(s) in the most efficient location possible.
Vector — adds the element at the end.
user=> (conj [1 2] 3)
[1 2 3]
List — adds the element at the beginning.
user=> (conj '(1 2) 3)
(3 1 2)
Map — combines the maps, overwriting values for existing keys. Can add vectors of length 2 to a map.
user=> (conj {:a 1 :b 2} {:c 3 :b 4})
{:a 1, :b 4, :c 3}
user=> (conj {:a 1 :b 2} [:c 3])
{:a 1, :b 2, :c 3}
Set — adds the element if not present.
user=> (conj #{1 2 3} 3 4)
#{1 4 3 2}
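One more thing that tripped me up: when you pass several items, conj adds them one at a time, so the order on a list ends up reversed. A small sketch of that, plus what happens when the collection is nil:

```clojure
;; Each item is appended in turn, so order is preserved on a vector.
(conj [1 2] 3 4)   ;=> [1 2 3 4]

;; Each item is prepended in turn, so the last one added ends up first.
(conj '(1 2) 3 4)  ;=> (4 3 1 2)

;; conj onto nil returns a list.
(conj nil 1)       ;=> (1)
```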
Cons
Cons (cons x seq): “Construct” a sequence, returns a seq.
- Existing collection is passed second.
- First item is not expanded.
- Usually the returned seq is a cons object, but not always.
Vector — adds the element at the beginning.
user=> (cons 3 [1 2])
(3 1 2)
user=> (cons [1 2] [3 4])
([1 2] 3 4)
List — adds the element at the beginning.
user=> (cons 3 '(1 2))
(3 1 2)
user=> (cons '(1 2) '(3 4))
((1 2) 3 4)
Map: treats the map as a seq of key-value pairs (which print like vectors) and prepends the argument.
user=> (cons {:a 1 :b 2} {:b 3})
({:a 1, :b 2} [:b 3])
Set — converts the second argument to a seq, prepends the first argument to it, regardless of whether it was in the set or not.
user=> (cons 2 #{1 2})
(2 1 2)
An example of cons returning a List instead of a Cons object:
user=> (class (cons 1 nil))
clojure.lang.PersistentList
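For comparison, here is a sketch of the more usual case, where cons hands back a Cons object. The class names are what I would expect from the standard JVM implementation of Clojure.

```clojure
;; Consing onto a non-nil collection yields a Cons.
(class (cons 1 [2 3]))   ;=> clojure.lang.Cons
(class (cons 1 '(2 3)))  ;=> clojure.lang.Cons

;; Consing onto nil yields a one-element list.
(class (cons 1 nil))     ;=> clojure.lang.PersistentList

;; A Cons is a seq, but it is not a list.
(seq? (cons 1 [2 3]))    ;=> true
(list? (cons 1 [2 3]))   ;=> false
```

So if your code checks list? on the result of cons, it will usually get false.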
Concat
Concat (concat c1 c2 & colls): Merges existing collections, returns a lazy seq.
- Takes collections only, and flattens them one level.
Vectors — combines in order and returns a lazy seq.
user=> (concat [1 2] [3] [4 5])
(1 2 3 4 5)
Lists — combines in order and returns a lazy seq.
user=> (concat '(1 2) '(3) '(4 5))
(1 2 3 4 5)
Maps (see merge to return a map) — converts key-value pairs to vector pairs.
user=> (concat {:a 1 :b 2} {:c 3})
([:a 1] [:b 2] [:c 3])
Sets (see union to return a set) — adds each element to the returned lazy seq, including duplicates.
user=> (concat #{1 2} #{2} #{4 5})
(1 2 2 4 5)
Can combine types
user=> (concat '(1 2) [3] #{4 5})
(1 2 3 4 5)
One gotcha: concat returns a lazy seq, so calling it repeatedly inside a loop or reduce builds up nested lazy seqs, and realizing the result can throw a StackOverflowError. More information here: https://stuartsierra.com/2015/04/26/clojure-donts-concat.
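One way to sidestep this, as far as I can tell, is to accumulate eagerly with into rather than lazily with concat. Here is a minimal sketch; collect-eagerly is a hypothetical helper name I made up for illustration, not something from clojure.core.

```clojure
;; Hypothetical helper: pour each chunk into one vector. into is eager,
;; so no chain of nested lazy seqs builds up.
(defn collect-eagerly [chunks]
  (reduce into [] chunks))

(collect-eagerly [[1 2] [3] [4 5]])  ;=> [1 2 3 4 5]

;; The risky pattern looks like the line below. With many chunks,
;; realizing the result can overflow the stack:
;; (reduce concat [] chunks)
```

The trade-off is that you give up laziness and get a vector back instead of a seq.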
Into
Into (into to from), (into to xform from): Uses conj repeatedly to put the items of “from” into “to”. Returns an object of the type of “to”.
user=> (into [1 2] '(3 4))
[1 2 3 4]
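Because into is built on conj, the result depends on the type of the target. A short sketch covering the other collection types and the xform arity (the transducer here is just (map inc), chosen for illustration):

```clojure
;; Into a list: each item is prepended, so the order is reversed.
(into '(1 2) [3 4])             ;=> (4 3 1 2)

;; Into a map: each item must be a key-value pair.
(into {:a 1} [[:b 2]])          ;=> {:a 1, :b 2}

;; Into a set: duplicates are dropped.
(= (into #{1 2} [2 3]) #{1 2 3})  ;=> true

;; With a transducer: transform the items on the way in.
(into [] (map inc) [1 2 3])     ;=> [2 3 4]
```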
Mapcat
Mapcat (mapcat f & colls): Map then concat. Returns the result of applying concat to the result of applying map to f and colls. Returns a lazy seq.
user=> (mapcat reverse [[1 3] [4 5]])
(3 1 5 4)
Be aware that the consequence of applying concat is essentially one level of flattening. For instance, when using just map with the above example, the returned list has nested lists:
user=> (map reverse [[1 3] [4 5]])
((3 1) (5 4))
Thus, if you’re getting an error like IllegalArgumentException Don't know how to create ISeq from: java.lang.Long clojure.lang.RT.seqFrom (RT.java:542), that probably means your function returns a single value rather than a collection, and you can just use map instead of mapcat.
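Here is a sketch of both situations side by side. The name pairs is a placeholder, and the anonymous function at the end is just an illustration of a function that does return a collection.

```clojure
(def pairs [[1 3] [4 5]])

;; mapcat behaves like applying concat to the results of map.
(mapcat reverse pairs)               ;=> (3 1 5 4)
(apply concat (map reverse pairs))   ;=> (3 1 5 4)

;; inc returns a number, not a collection, so map is the right tool.
(map inc [1 2 3])                    ;=> (2 3 4)
;; (mapcat inc [1 2 3]) throws the IllegalArgumentException once realized.

;; If each call returns a collection, mapcat flattens them one level.
(mapcat (fn [x] [x (inc x)]) [1 3])  ;=> (1 2 3 4)
```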
Prepend vs. Append
No discussion would be complete without touching on the infamous prepending vs. appending to vectors and lists. Vectors are designed for appending, while lists are designed for prepending. If you want to prepend to a vector, say :foo onto [:bar :baz], you can do something like:
user=> (into [:foo] [:bar :baz])
[:foo :bar :baz]
However, be aware that the performance of this technique is O(n). Likewise, if you want to append to a list, you can do:
user=> (concat '(1 2) [3])
(1 2 3)
However, be aware that this returns a lazy seq, which you will need to convert back to a list if required.
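If you do need an actual list back, one option is (apply list ...). A minimal sketch, where appended is just a name I picked for the result:

```clojure
;; concat alone gives a lazy seq, not a list.
(list? (concat '(1 2) [3]))  ;=> false

;; Append to a list, then turn the lazy seq back into a list.
(def appended (apply list (concat '(1 2) [3])))

appended          ;=> (1 2 3)
(list? appended)  ;=> true
```

Like the vector technique above, this walks every element, so it is O(n).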
For next time…
So, that was complicated. In my next post, I’m going to ask the question, why is adding to collections so complex?