An overview of Data Structures and their implementation. Part 1 — Arrays and Linked Lists

Harveyroper
9 min read · Jul 7, 2022


Hi there! This article serves as a survey of my understanding of data structures, including the classic examples from computer science (linked lists, trees, etc.), as well as, hopefully in later parts, an exploration of the representation of mathematical structures such as groups, vector spaces, and fields, and methods of implementing them in programming languages.

I will mostly stick with C-style pseudocode for the sake of generality; however, where necessary I may take advantage of the rich standard library of Java to ease the process in places.

What is a Data Structure? Why Care?

Data structures are generally something only talked about in computer science; in pure mathematics there is no need for the notion of a linked list or an array. Mathematics has the privilege of infinity: sets allow for an unordered collection of any objects, they can be extended to multisets, and relations and functions allow for the addition of direction. More simply, we also have tuples, which are fixed, ordered collections of objects. This means mathematics is free to collect objects and manipulate them without ever worrying about ‘how’ we might physically do so. In programming, however, we are always limited to a fixed memory space, and must assume that it takes time to access information. Various techniques for arranging data must therefore be studied, so that we can more efficiently access and edit the data we wish to hold. Data structures study ways in which we can… structure data.

Arrays

An array is the most fundamental data structure. Perhaps a variable is more fundamental, but classically we do not consider variables to be data structures. It could be argued from a set-theoretic perspective that, just as all mathematical objects are sets, all objects in programming are arrays (and C permits this view in some sense with array–pointer duality), with variables perhaps some equivalent of singleton sets.

a
b = 10
c = [1,2,3]
d = [[1,2,3],[4,5,6],[7,8,9]]
e = [b,c,d]

It’s a nice idea to consider arrays as being very akin to sets, but they are not analogous to each other. Arrays are necessarily ordered and generally fixed in length (their size is immutable). They can also contain the same element twice, e.g. [1,1,2,3].
NB: Python throws a spanner in the works here because lists, arrays, and tuples (though often confused) are distinct objects, with some subtle but important differences.

One further difference between mathematical sets and arrays is the standard enforcement of type for arrays. For example, in an array of integers, the inclusion of a floating point number would be invalid.

Anyway, none of that is really relevant when studying data structures, but I do think there is something interesting about the parallels and differences between the most basic collections of objects in mathematics and programming.

How do arrays really work?

Arrays in the most primitive sense are simply a memory buffer. The program asks the operating system to set aside a fixed amount of contiguous memory, which you can then fill with whatever you want! Breaking that down a little, contiguous means that the memory cells allocated will be located one after the other, with nothing in between. If we create an array of characters (assume a char is one byte) of length 4, then somewhere in memory there will be four adjacent bytes which, for the runtime of the program, will only contain what you put in them. The benefit of contiguous storage is O(1) lookup for any element of the array if you know its index. In case the reason for that isn’t immediately obvious, consider the following C code:

int array[] = {1,2,3,4,5}; // Array of integers
int* p = array; // Pointer to the first element (an array decays to a pointer)
int b;
b = *p; // b = 1
b = *(p+2); // b = 3

Pointer arithmetic allows us to calculate the address of any element of the array, simply by adding its index to the address of the start of the array, and then get its value by dereferencing.
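To make the address calculation concrete, here is a small sketch (function names are my own, not standard C) showing that indexing really is just base address plus scaled offset:

```c
#include <stdint.h>

/* Return element i of an int array by explicit pointer arithmetic
 * rather than the [] operator. array[i] is defined as *(array + i);
 * the compiler scales the offset i by sizeof(int) automatically. */
int get_by_pointer(const int *array, int i) {
    return *(array + i);
}

/* The same calculation spelled out in raw bytes: one addition and
 * one dereference, regardless of i. This is why lookup is O(1). */
int get_by_byte_address(const int *array, int i) {
    uintptr_t base = (uintptr_t)array;                  // address of element 0
    uintptr_t addr = base + (uintptr_t)i * sizeof(int); // jump straight to element i
    return *(const int *)addr;
}
```

Both functions return the same value for any index, because `[]` is nothing more than sugar for this arithmetic.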

The drawback of contiguous memory allocation is that you only have as much space as you allocate: you can’t make the array bigger, because the memory located after your array will most likely be in use, and even if it isn’t, there is no guarantee you can just access that space; in fact it is fundamentally unsafe for the operating system to let a programmer do so. A number of exploits surround this, the most basic being the buffer overflow attack.
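For completeness, the usual workaround (and roughly what dynamic array types do behind the scenes) is to allocate a larger buffer elsewhere and copy the old contents across; a minimal sketch, where `grow` is my own hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/* Grow a fixed-size int buffer by allocating a bigger one and copying.
 * Note the O(n) copy this incurs: the price of the array's O(1) lookup. */
int *grow(int *old, size_t old_len, size_t new_len) {
    int *bigger = malloc(new_len * sizeof(int)); // new, larger buffer
    if (bigger == NULL) return NULL;             // allocation can fail
    memcpy(bigger, old, old_len * sizeof(int));  // copy existing elements over
    free(old);                                   // release the old buffer
    return bigger;
}
```

The old address becomes invalid after the call, which is why languages with growable arrays hide this behind an abstraction.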

Therefore we require a way to collect items in such a way that we could continue to add more items to our collection, and from here we begin to look at more complex abstract data structures. The simplest of which is the Linked List.

Linked Lists:

A diagram of a linked list
Source: geeksforgeeks.org

One of the first times someone explained a Linked List to me, they used the analogy of a train, containing multiple cars. One can imagine a series of carriages connected together, as trains often are. This can give the illusion of the train being one contiguous entity, however obviously it is not, and if it were necessary to add more carriages, then it would be possible to do so.

Likewise, as trains are made up of individual cars, a linked list consists of nodes. Nodes themselves contain at the very least two pieces of information: the data (or payload, in cooler terms) and a pointer to the next node. In this sense the definition of a linked list is beautifully simple: two pieces of information are required, a value and a pointer.

The above describes what we’d call a singly linked list, so named because each node is linked in only one direction; this has the effect that you can only traverse the list in one direction. This can be quite annoying, since unless you store a permanent reference to the head (first node) of the linked list (a good idea anyway), you will lose the entries in the list that exist before the one you are currently at. For this we create a doubly linked list, where each node contains data and a pointer to both the previous node and the next node in the list. Handy.
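As a concrete C sketch (struct and field names are my own), a doubly linked node just carries one extra pointer, plus a small helper to wire two nodes together:

```c
#include <stddef.h>

/* A doubly linked list node; the payload is an int for simplicity. */
struct DNode {
    int payload;
    struct DNode *prev; // previous node (NULL at the head)
    struct DNode *next; // next node (NULL at the tail)
};

/* Attach b directly after a, wiring both directions. */
void link_after(struct DNode *a, struct DNode *b) {
    a->next = b;
    b->prev = a;
}
```

With `prev` in place, you can walk backwards from any node without keeping a separate reference to everything you have passed.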

In a linear linked list (that is, a doubly or singly linked list), we consider the next pointer of the tail node, and the previous pointer (for doubly linked lists) of the head node, to point to NULL. This is a nod to the idea that the end of the list points to nothing: rather than creating special structures to accommodate these boundary nodes, we simply set their pointers to NULL.

Pseudocode of a singly linked list:

structure LinkedListNode{
    Type Payload
    LinkedListNode* next //* means a pointer
}

That’s it! The simplicity of implementation makes linked lists very easy to work with. In some sense there is a recursive nature to the definition of a linked list node, since its definition contains itself; this can be confusing, but fear not, the compiler knows what to do.

Note: There is another kind of linked list which, in truth, I am aware of but have never really needed to use: the circular linked list. A circular linked list is simply one in which the tail node of the list points to the head node, allowing you to traverse the list in a circular manner. This has a few implications which I recommend thinking about; a few include the removal of the issue of losing entries in singly linked lists, small improvements in traversal times, and possibly some applications to modular arithmetic. It naturally follows that doubly circular linked lists are also a thing (a doubly linked list that is also circular).
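A quick sketch of what circular traversal looks like (names are my own): starting from any node and following next pointers eventually brings you back to where you began, which serves as the stopping condition in place of NULL.

```c
struct CNode {
    int payload;
    struct CNode *next;
};

/* Count the nodes by walking until we arrive back at the start.
 * Assumes the list is non-empty and properly circular. */
int circular_length(struct CNode *start) {
    int count = 1;
    for (struct CNode *n = start->next; n != start; n = n->next)
        count++;
    return count;
}
```

Note that any node can play the role of the starting point, which is exactly the “no lost entries” property mentioned above.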

Performance of a linked list:

Unlike arrays, which have constant lookup time, linked lists have linear lookup time, O(n). This is because if you wish to get the data from the nth node in the list, you must traverse all n-1 nodes before it.
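Compare the pointer-arithmetic lookup we had for arrays with this C sketch of lookup by index in a singly linked list (names are my own): there is no address shortcut, only one hop per node.

```c
#include <stddef.h>

struct Node {
    int payload;
    struct Node *next;
};

/* Walk from the head, one node at a time, until index i is reached.
 * Each step is a pointer dereference, so the cost grows with i: O(n).
 * Returns NULL if the list is shorter than i + 1 nodes. */
struct Node *get(struct Node *head, size_t i) {
    struct Node *curr = head;
    while (curr != NULL && i > 0) {
        curr = curr->next;
        i--;
    }
    return curr;
}
```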

This means there is no universally good choice between an array and a linked list; it is the job of a good programmer to judiciously decide which is best for their purposes.

Stacks and Queues:

You may be familiar with the terms stack and queue, and you may even have a good idea of how they operate. If, like me, you struggled to understand the difference between a stack, a queue, and a linked list, I’ll briefly describe it here. A stack is any kind of list which operates on a LIFO (last in, first out) scheme. This means that you can only access the piece of data at the start of the list, or in better terms, on the top of the stack. You can imagine a stack of cards, in which you can only remove the card from the top of the pile. In programming this means that you have to remove the item on top of the list to get to the next one. This paradigm is useful in many areas of computing: consider algorithms such as recursive backtracking, or anything which requires some kind of history of actions, where the lower an element sits on the stack, the older it is. Stacks are also used in the flow of programs in which function calls, recursion, iteration, etc. are used. For fun: the website Stack Overflow gets its name from the programming error in which the stack allocated to a program gets full and more data is added to it, hence “stack overflow”.

A queue is essentially the opposite: it operates on a FIFO (first in, first out) scheme, and as you may be able to guess, works based on the real-world idea of a queue. Imagine lining up at the bank: whoever gets there first gets served first. I imagine it is not difficult to think of many ways a queue is a useful data structure.

Now, importantly, a linked list is the more abstract idea; stacks and queues are restrictions imposed on the general idea of a list to enforce certain behaviour. A linked list is one method of implementing both of these things. Note that it is not the only way: commonly a queue is made using a primitive array, which we know imposes a fixed size. If we knew that a queue would never have more than 10 elements, implementation with a linked list might not be the best approach.

Stacks and queues each require two methods to be fully implemented. The two methods perform essentially the same roles but classically have different names: stacks feature push and pop methods, in which elements are pushed onto the stack and popped off, while queues have enqueue and dequeue, performing the equivalent actions.
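Here is a minimal sketch of a stack built on a singly linked list (names are my own): the head of the list serves as the top of the stack, which makes both operations O(1).

```c
#include <stdlib.h>

struct StackNode {
    int payload;
    struct StackNode *next;
};

/* Push: allocate a node and make it the new head (top of stack). */
void push(struct StackNode **top, int value) {
    struct StackNode *node = malloc(sizeof *node);
    node->payload = value;
    node->next = *top;
    *top = node;
}

/* Pop: remove the head and return its payload.
 * The caller must ensure the stack is non-empty. */
int pop(struct StackNode **top) {
    struct StackNode *node = *top;
    int value = node->payload;
    *top = node->next;
    free(node);
    return value;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1: the LIFO behaviour described above, enforced purely by only ever touching the head.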

Operations on linked lists:

Discussions of the algorithms associated with arrays and linked lists, such as searching and sorting algorithms among a sea of others, are outside the scope of this article. They’re something I would love to cover in another article. However, there are certain operations so useful and fundamental to these data structures that it feels important to mention them.

These include: reversing a linked list, rotating a linked list, and insertion and deletion at a given index.

Reversing a linked list is the process of making each node point to the previous node instead of the next one, effectively flipping the order. i.e.:
1 -> 2 -> 3 -> 4 -> 5 becomes 5 -> 4 -> 3 -> 2 -> 1.

Pseudocode might look something like:

function reverse(node* head){
    prev = NULL
    current = head
    while(current != NULL){
        next = current.next // NB Implementing in C would require the -> operator
        current.next = prev
        prev = current
        current = next
    }
    head = prev
    return head
}
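The same algorithm written as real C, showing the -> operator mentioned in the pseudocode (struct and variable names are my own):

```c
#include <stddef.h>

struct Node {
    int payload;
    struct Node *next;
};

/* Reverse a singly linked list in place; returns the new head. */
struct Node *reverse(struct Node *head) {
    struct Node *prev = NULL;
    struct Node *current = head;
    while (current != NULL) {
        struct Node *next = current->next; // remember the rest of the list
        current->next = prev;              // flip this node's pointer
        prev = current;                    // step both cursors forward
        current = next;
    }
    return prev; // prev now points at the old tail, the new head
}
```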

Round-up notes: So far we have covered primitive arrays and linked lists. For the sake of brevity, I will cover non-linear data structures (trees, graphs, hash maps, etc.) in a part 2. However, it is important to stress that there are many more flavours of linear data structures, such as dynamic arrays, associative arrays, unrolled linked lists, skip lists, and multi-level linked lists. I do encourage you to look into these if you are interested.
