Data Structure Stories: The Linked List

The linked list is unlike a typical array. Though still a collection of items, it’s elements are not stored in contiguous memory. A great advantage of this is that linked lists are limited only by the memory available to a program. They do not run out of reserved space like an array and do not require expensive allocation operations.

The way I like to think of them is a chain. Each element is a “node”. These nodes each contain data, and a pointer to the next node in the list. Thus, the whole list forms a chain of pointers between nodes. Because we have these explicit references, it is not necessary for the nodes to be stored next to each other in memory. One thing to note about this configuration is that you often get poorer cache performance as a result.

A simple linked list. Each node contains a value and the memory address of the next node. The Head of the list is the first element. The rest of the list is the Tail.

If I know the first node of the list, the head, I can follow all the pointers and traverse the whole list. The rest of the list is often called the tail. It is the sub-list of everything except the head. The above example is the simplest form of linked list, often referred to as a singly linked list. As you can probably tell, we can go forwards through the list, but we have no pointers to traverse backwards with. Furthermore, what happens when we reach the end of the list? Every node has a pointer to the next, but there is no next at the final node. Usually this value is assigned as null, to signal the end of the list. This leads to the infamous traversal condition “while(node.next != null)”.

An empty linked list obviously consists of no nodes, however, a sentinel linked list always contains some nodes. This consists of always holding placeholder nodes in memory, namely a start and end node. The normal list can then be attached to these nodes. These sentinel nodes are intended to simplify list operations and allow safe de-referencing of all other nodes. Evidently, the drawback here is the need to occupy more memory with these sentinel nodes.

Linked lists are not without issue, but they are also very powerful for fast insertion and deletion of nodes. In fact they are far better than arrays at those operations! Furthermore, singly linked lists can easily be decomposed to heads and tails as we traverse the list. This leads to fluent and easily understood recursive algorithms. Linked lists can be used to implement other data structures like stacks or queues.

History

The linked list was invented in 1955–56 by Herbert Simon, Alan Newell, and Cliff Shaw at RAND corporation. Its original purpose was as a core data structure in their IPL language for Artificial Intelligence. Through many years it has seen more usage in major programming languages like LISP.

Performance

Lookup

As we have seen, we must traverse a linked list from the head via its pointers in order to reach nodes. This means we get a linear indexing performance of O(n), far from desirable when compared to an array.

To find the node with the value 3 I have to traverse twice from the head of the list.

Insertion and Deletion

A clear advantage of the linked list approach is that we do not have to shift elements around to insert or delete nodes. We simply have to rearrange pointers, a constant, or O(1) operation. Since we may often know the head or last node of a list, we can directly insert new nodes there in O(1) time. On the other hand, if we are to insert into the middle of the list we have to incur the time it takes to traverse and then insert the element, an average complexity case of O(n/2) + O(1).

Speeding up Lookups

There are two ways that linked lists can be sped up. The first is to use a move-to-front heuristic. When an element is found it is moved to the front of the list. This provides a small version of caching; one hopes that recently used elements will be accessed again soon. A second option is to use an external data structure to index the linked list. This data structure could hold references to nodes. The disadvantage of this approach is the need to update the external data structure in response to manipulations of the list.

Drawbacks

I alluded to the first drawback of linked lists previously. If the nodes are not contiguous in memory, then we can’t take advantage of caching as much as we would like. Secondly, we must traverse the list to perform most operations. Finally, it is important to consider the fact that the pointers we use take up memory, as well as the actual data in the nodes.

Alternatives

Though we have looked at singly linked lists, there are also doubly linked lists. In a doubly linked list, each node has a pointer to the next node, but also a pointer to the previous node. This gives one more flexibility in manipulating the list, as backwards traversal is now possible. Conversely, the list now occupies even more memory, with twice as many pointers as before.

A doubly linked list. Each node contains a pointer to the next and the previous nodes.

There are also multiply linked lists, where each node has two or more references to other nodes. This type is a generalised super-set of the doubly linked list. Circular linked lists are the last type. In a circular linked list, the final node’s next node is the head of the list, forming a cycle from end to start.