Protocols: Clojure’s Polymorphic Magic

Published in

helpshift-engineering

9 min readFeb 28, 2024

In this post, we will try to understand how protocols work in Clojure. As an example, we will see how clojure/core.cache, a well-known Clojure library, implements caching algorithms using protocols.

Polymorphism In Clojure

Polymorphism is a design principle which proposes using a single interface for multiple (more specific) methods. In other words, it refers to the practice of using a common interface for a general class of actions, where the entity-specific implementations can differ based on the type of the entity. The advantage of using this practice is that a lay user can use the interface for any given entity, without having to worry about the specific implementation for that entity. More significantly, one need not evaluate each available implementation to determine the one that fits the entity at hand.

In Clojure, polymorphism is implemented using protocols. Because protocols work with records, let’s understand what records are.

Understanding records

Using defrecord we can create a map-like data type namely record, which is a key-value data structure. They work much like a Clojure map: we can destructure them, look-up values by using keyword, i.e., (:keyword m), use assoc to add new key-value pairs, etc.

But there are important differences between a simple Clojure map and a record. For example, when we create a record using defrecord, we are creating a Java class with a constructor, where the different key-value pairs represent the different members of the class. The constructor is generated by Clojure automatically. Because every Java Class inherits the methods of the Object class, methods of Object class such as hashCode and equals are available for records as well. Note that the default implementation of these two methods is overridden by defrecord’s constructor, such that two instances of the same type will have the same hashcode and be considered equal if their constituents (key-value pairs) are the same.

(defrecord Cube [width length height])

(def my-cube (Cube. 11 11 11))
(def another-cube (Cube. 11 11 11))

(.hashCode my-cube)
;;=> -1060357495

(.hashCode another-cube)
;;=> -1060357495

(.equals my-cube another-cube)
;;=> true

There are other differences between a Clojure map and a record: records are more performant, take less memory, they cannot act as functions etc.

Understanding Protocols

The most crucial difference between a record and a map is that a record can be used to implement a protocol in Clojure (because it is a Java class).

Protocols are a set of function signatures — grouped under a common name. They are like interfaces in Java — the implementation of each function is left to the specific classes themselves. This means that we can define a protocol with its constituent methods, and leave the implementations to each specific classes that wish to extend those methods to themselves.

(defprotocol SimpleProtocol
  "A simple protocol for showing how to declare protocols"
  (foo [x y] "A foo method for this simple protocol")
  (bar [a] "A bar method for this simple protocol"))

In the above example, we created a protocol (SimpleProtocol) which has two methods (foo and bar). Note how we declare the signature of the methods along with a helpful doc-string but do not specify their exact implementations.

It’s important here to note that the first argument of a protocol method should always be an instance of a Java Class (which includes a record). This is because the type of the first argument is used to dispatch the type-specific implementation of that function.

Using `extend-type` to implement a protocol

We can use extend-type to extend a protocol to an existing data-type (such as a record).

Let’s first create a protocol called Log which can be used to implement different forms of logging for different data-types. To keep things simple, this protocol has only two methods log-only - for simply logging a given payload and log-level - for logging the given payload along with some additional data.

(defprotocol Log
  (log-only [m] "Log the object")
  (log-level [m severity] "Log the object with severity information"))

Let’s implement this for String class first:

(extend-type String
  Log
  (log-only [m] (println m))
  (log-level [m severity] (println (format "[%s]\n%s" severity m))))
  
(log-only "Server started...")
;;=> Server started...
(log-level "Server failed to start!" "ERROR")
;;=> [ERROR]
;;=> Server failed to start!

Now, let’s create a new record:

(defrecord ExampleMessage [ts msg severity])

(ExampleMessage. (System/currentTimeMillis) "System working" "Info")
;;=> {:ts 1707877144399, :msg "System working", :severity "Info"}

We can now implement the Log protocol for this new record:

(extend-type ExampleMessage
  Log
  (log-only [m] (clojure.pprint/pprint m))
  (log-level [m severity] (println (format "%s:\n" (clojure.string/upper-case severity))
                                   (select-keys m [:msg :ts]))))
                                   
(def sample-message (ExampleMessage. (System/currentTimeMillis) "Exception XYZ at ABC ..." "CRITICAL"))

(log-only sample-message)
;;=> {:ts 1707877683229,
;;=>  :msg "Exception XYZ at ABC ...",
;;=>  :severity "CRITICAL"}

(log-level sample-message (:severity sample-message))
;;=> CRITICAL:
;;=> {:msg Exception XYZ at ABC ..., :ts 1707877683229}

Important to note here is that with extend-type we can extend any protocol to an existing object. As shown above, the String class was not written to implement our custom protocol Log but we could still do it using extend-type. This means protocols defined by third-party libraries can be implemented for our application-specific object, without a sweat! And the converse is also possible, that is, we can extend application-specific protocols to third-party or built-in classes.

Implementing a protocol within `defrecord`

There is another way to implement a protocol for a record type: we can implement a protocol while declaring the record itself:

(defrecord ExampleMessage [ts msg severity]
  Log
  (log-only [m] (clojure.pprint/pprint m))
  (log-level [m severity] (println (format "%s:\n" (clojure.string/upper-case severity))
                                   (select-keys m [:msg :ts]))))

(def another-message (ExampleMessage. (System/currentTimeMillis) "System working..." "INFO"))
(log-only another-message)
;;=> {:ts 1707877985573, :msg "System working...", :severity "INFO"}

(log-level another-message (:severity another-message))
;;=> INFO:
;;=>  {:msg System working..., :ts 1707877985573}

Applying protocols within the defrecord form has its advantages: methods implemented this way are several times more performant than those extended to a record type using extend-type. Also, this allows functions to refer to the fields of the record type directly, i.e., using ts directly in a form rather than (:ts m)

Beauty of clojure.core/cache

While thinking about caching, choosing the most optimum cache replacement policy becomes crucial. There are many commonly used cache replacement algorithms, including Least-Recently Used (LRU), First-In-First-Out (FIFO), Least-Frequently-Used(LFU), etc.

One way to write a caching library would be to write a separate and independent set of methods for each type of caching policy. The downside of doing this is that the end-user cannot assume any commonality in the API of the corresponding functions of the different types of caches. Thus, one would have to go through the implementation of each operation to be sure of the respective context.

Another — more idiomatic — way of doing this would be to use a protocol: we define the common operations related to any form of cache and then leave the implementations to the specific caching types. This is exactly how clojure/core.cache does it.

(defprotocol CacheProtocol
  "This is the protocol describing the basic cache capability."
  (lookup [cache e]
          [cache e not-found]
   "Retrieve the value associated with `e` if it exists, else `nil` in
   the 2-arg case.  Retrieve the value associated with `e` if it exists,
   else `not-found` in the 3-arg case.")
  (has?    [cache e]
   "Checks if the cache contains a value associated with `e`")
  (hit     [cache e]
   "Is meant to be called if the cache is determined to contain a value
   associated with `e`")
  (miss    [cache e ret]
   "Is meant to be called if the cache is determined to **not** contain a
   value associated with `e`")
  (evict  [cache e]
   "Removes an entry from the cache")
  (seed    [cache base]
   "Is used to signal that the cache should be created with a seed.
   The contract is that said cache should return an instance of its
   own type."))

Thus, irrespective of the specific implementation of the cache you use, you know the function signature will remain the same for the respective method, and so will its context. Only the underlying implementation will differ — something the end-user need not be concerned about while interacting with the library.

clojure/cache.core uses defcache for creating a new type of cache and implementing the CacheProtocol . defcache is a convenience macro, that apart from implementing the CacheProtocol, also implements “Clojure's associative data structure protocols as well” as per the CacheProtocol implementation. That is to say, it ensures that when we use assoc, on a cache object, the corresponding implementation of miss will be triggered, and when we use contains?, the respective has implementation will be triggered, and so on.

Creating a custom cache using CacheProtocol

Now, let’s create a new type of cache which implements the Random Replacement (RR) Policy. As per this algorithm, when the cache reaches its capacity, a key would be randomly selected for eviction.

To create this cache, we must pass the following parameters:

cache: a map
threshold: the capacity for the cache

(:require [clojure.core.cache :as ccc])

(ccc/defcache RRCache [cache threshold]
...
)

Let’s start with implementing the trivial methods: lookup has? hit

(:require [clojure.core.cache :as ccc])

(ccc/defcache RRCache [cache ks threshold]
  ccc/CacheProtocol
  (ccc/lookup [_ k]
          (get cache k))
  (ccc/lookup [_ k not-found]
          (get cache k not-found))
  (ccc/has? [_ k]
        (contains? cache k)) 
  (ccc/hit [this k]
           this))

Let’s implement miss now. If the cache size does not reach the specified threshold, we can simply return a new cache object, with the new key-value pair. If, however, the cache is at full capacity, we will have to first select a random key and then return a new cache object without this randomly selected key.

(:require [clojure.core.cache :as ccc])

(ccc/defcache RRCache [cache ks threshold]
  ccc/CacheProtocol
  ...
  (ccc/miss [_ k v]
            (if (>= (count cache) threshold)
              (let [ks (keys cache)
                    evicting-k (rand-nth ks)
                    new-cache (dissoc cache evicting-k)]
                (RRCache. (assoc new-cache k v)
                          threshold))
              (RRCache. (assoc cache k v)
                        threshold))))

Let’s implement the remaining methods., evict and seed. Note that seed is used for populating the cache with a set of key-value pairs.

(:require [clojure.core.cache :as ccc])

(ccc/defcache RRCache [cache ks threshold]
  ccc/CacheProtocol
  ...
  (ccc/evict [_ k]
         (RRCache. (dissoc cache k)
                   threshold))
  (ccc/seed [_ base]
        (RRCache. base
                  threshold)))

This completes our own implementation of RRCache. Now, to create an instance of this cache, we need to create a cache-factory

(:require [clojure.core.cache :as ccc])

(defn rr-cache-factory
  [base & {threshold :threshold :or {threshold 32}}]
  (ccc/seed (RRCache. {} threshold) base))

Let’s try this out:

(def my-cache (atom (rr-cache-factory {:a 11 :b 22 :c 33 :d 44} :threshold 5)))
;;=> #'user.core/my-cache

(swap! my-cache ccc/miss :e 555)
;;=> {:a 11, :b 22, :c 33, :d 44, :e 555}

(swap! my-cache ccc/miss :f 666)
;;=> {:b 22, :c 33, :d 44, :e 555, :f 666}

(swap! my-cache ccc/miss :g 777)
;;=> {:c 33, :d 44, :e 555, :f 666, :g 777}

(swap! my-cache ccc/hit :f)
;;=> {:c 33, :d 44, :e 555, :f 666, :g 777}

(swap! my-cache ccc/hit :a)
;;=> {:c 33, :d 44, :e 555, :f 666, :g 777}

(ccc/has? @my-cache :a)
;;=> false

(ccc/has? @my-cache :f)
;;=>true

(ccc/lookup @my-cache :f)
;;=> 666

Conclusion

In this post, we explored how the concept of polymorphism is available in Clojure: using protocols and records, we can make our code extensible and re-usable in new and different contexts, by allowing a set of given operations (or methods) to be custom-implemented for a wide variety of different data-types.

This lets Clojure make the best of both the worlds of functional programming and object-oriented programming. In an object-oriented programming language, like Java, objects are first-class entities and, therefore, methods, can reside only inside objects. Thus, only the respective class can decide whether to extend a protocol or not. However, in Clojure, being a functional programming language, functions are first-class entities. This means that ccc/has? has an independent existence (like any other function) and different types can choose to implement this function. As a consequence of this, we can extend any protocol to a built-in class (such as Exception or String) or a third-party class, even though that class did not implement this protocol in the first place. This is not possible in an object-oriented programming language.