How To Implement Functional Data Structures: Examine List Implementation of Java Library VAVR
A functional data structure is a data structure that is both immutable and persistent. Immutability is one of the main characteristics of functional programming which allows writing side effect free functions. It is also easy to safely work and share immutable resources between different threads.
But when we consider data structures like lists it is not always desirable to create and use forever. We need to add elements. The persistent data structure will be able to modify the data structure and get a new data structure while keeping the previous versions available. So it will make the references effectively immutable and allows to modify and get new references.
This also means the modification methods are side effect-free which also means if a function is called with the same arguments the same result is expected. This characteristic is also known as referential transparency in functional programming which means you can effectively replace the invocation of a function with its return value if the function arguments are not changing.
After Java 8 now we can write more functional code with the help of lambdas. But still, Java is not perfect when we need to write pure functional code. One such gap is Java data structures are not persistent. Let’s consider List, we can create immutable lists using methods like List.of and convert existing ones into immutable using Collections.unmodifiableList methods. But we can’t modify them later.
VAVR is a popular Java library that tries to reduce gaps in Java language when it comes to writing more functional style code. It has an entire collection of persistent data structures. Let’s see an example to understand what a persistent list looks like.
import io.vavr.collection.List;
public class Main {
public static void main(String[] args) {
List<Integer> original = List.of(1, 2, 3);
List<Integer> firstCopy = original.prepend(4);
List<Integer> secondCopy = original.prepend(4);
System.out.println(original); // List(1, 2, 3)
System.out.println(firstCopy); // List(4, 1, 2, 3)
System.out.println(secondCopy); // List(4, 1, 2, 3)
}
}
The original one is not modified. both firstCopy and secondCopy are similar in data, hence the append method is preferentially transparent which means you can cache it if this operation is performed multiple times.
How is this implemented under the hood?
You might already guess that it might copy the first list into an entirely new one and then append the element. This is a valid option to implement, but it is very inefficient. These operations need to be implemented using fewer modifications and sharing as much as possible. Since all instances are effectively immutable it is safe to share the structure. This technique is also called structural sharing. Let’s reverse-engineer the code to better understand how this is implemented in the List class by starting with the of method
public static <T> List<T> of(T... elements) {
Objects.requireNonNull(elements, "elements is null");
List<T> result = Nil.instance();
for (int i = elements.length - 1; i >= 0; i--) {
result = result.prepend(elements[i]);
}
return result;
}
The list is an abstract class. You can see it is prepended data by starting with calling Nil.instance method returned List object. Nil represents the empty result and it uses a singleton pattern because it can be reused in all lists. The prepend method adds the elements to the top, that's why the loop is run in the reverse order.
In the above example, 3 is prepended to Nil object. Let’s understand how the value 3 is represented by checking the prepend method
public final List<T> prepend(T element) {
return new Cons<>(element, this);
}
The actual list implementation with value is the Cons inner class where the basic components to support structural sharing reside. Let’s check the state variable of this class and the constructor.
private final T head;
private final List<T> tail;
private final int length;
private Cons(T head, List<T> tail) {
this.head = head;
this.tail = tail;
this.length = 1 + tail.length();
}
The head is the element that is pretending and the tail is the rest. Length also can be calculated at this point and no need to recalculate since it will remain unchanged. With this knowledge let's visualize how previous versions managed
Now I think it is clearer how structural sharing is implemented more efficiently. To get the more clear let’s consider the update method which updates a value of an index.
public final List<T> update(int index, T element) {
if (isEmpty()) {
throw new IndexOutOfBoundsException("update(" + index + ", e) on Nil");
}
if (index < 0) {
throw new IndexOutOfBoundsException("update(" + index + ", e)");
}
List<T> preceding = Nil.instance();
List<T> tail = this;
for (int i = index; i > 0; i--, tail = tail.tail()) {
if (tail.isEmpty()) {
throw new IndexOutOfBoundsException("update(" + index + ", e) on List of length " + length());
}
preceding = preceding.prepend(tail.head());
}
if (tail.isEmpty()) {
throw new IndexOutOfBoundsException("update(" + index + ", e) on List of length " + length());
}
// skip the current head element because it is replaced
List<T> result = tail.tail().prepend(element);
for (T next : preceding) {
result = result.prepend(next);
}
return result;
}
I hope you can understand the code but let’s visualise it with an example.
I will stop here. There are lots of methods available and you can examine them. The list is a bit straightforward but implementations like map can be more complex, so it is always best to use a battle-tested library like VAVR if you need persistent data structures.
In the above example, you might already notice the fact that the append method is more costly. But the thing is although such operation cost is high once you perform such an operation you can cache the results and reuse them. Anyway, these have slightly more performance overhead but might be useful in some cases like highly concurrent applications. So add this knowledge also to your bag of tools and use it if you think that they are worth the use.
If you are using more functional languages like Clojure and Scala these are normally baked into the language. So I thought it is better to know these concepts even if we never use them.