Data Structures and Their Impact on Performance

Héla Ben Khalfallah
Published in Tech x Talent
17 min read · Sep 22, 2022

How to choose the right data structures?

Introduction

Why do algorithms and data structures matter?

Suppose you are going to travel. Depending on the luggage you are going to bring, you need different sizes of suitcases. The more you bring, the bigger the suitcase becomes:

Resources Sizes (Image licensed to the author)

But you have to be careful: airlines limit the size and weight of suitcases because the storage areas in the plane are limited and fixed. If the luggage exceeds the standard, the plane cannot take off either:

Luggage box sizes (Image licensed to the author)

In the land of programming, the plane is the computer, you are the passenger and the suitcases are the program, the instructions and the data you want to store in memory.

If a program is larger than the available memory, it cannot be loaded. An OutOfMemoryException is thrown when there is not enough memory to continue the execution of a program, and a StackOverflowException is thrown when the execution stack exceeds the stack size.

This means that all resources inside a computer are limited. Therefore, we must design our programs so that they require fewer instructions to execute and less memory to use. We cannot fix resource limits simply by increasing them; that would increase both costs and size!

In my last article, I explained algorithms, their impact on performance, and how to choose the right algorithm. In this article, to highlight the importance of the choice of data structures, I will rely on well-known data structures: Array, Linked List, Doubly Linked List and Hash Table. In my future article, I will explain many other advanced data structures such as Trees, Tries, Heap, and Graph. So let’s get started!

Memory representation

Before we begin, let’s imagine our simplified representation of memory:

Computer Memory Representation (Image by the author)
  • A bit is the basic unit of memory. At any given time, it is either 1 (on) or 0 (off).
  • Computer memory is a collection of many bits.
  • A group of 8 bits is called a byte.
  • Computer memory is a collection of contiguous blocks of bytes.
  • Each individual block is called a cell (memory cell).
  • Each cell is associated with a unique numeric address (also called the physical memory address).
  • Data in memory is stored in arrays of cells.

More practically, if in the programming language we are using a character is represented by 8 bits (= 1 byte), then the representation of the “Hello” string will be as follows:

String Memory Representation (Image by the author)

It can also be represented as:

String Memory Representation (Image by the author)

If we want to represent an array of integers, it will be like this:

Array Memory Representation (Image by the author)

In this simplified model, an integer takes 8 bits (= 1 byte).

Now that we know how things work internally and have our simplified mental model for representing data in memory, it’s time to discover data structures! 🙌

So, what’s a Data Structure?

A data structure is a specialized format for storing (write), organizing (sort), processing (update), and retrieving (read) data in memory.

Whoa! What a great and concise definition!

♦️ Possible operations (CRUD): storing (write), organizing (sort), processing (update), and retrieving (read).

♦️ Specialized formats: arrays, stacks, queues, hash tables, linked lists, heaps, tries, trees, …

Wait a moment, why do we have so many specialized formats? Having only arrays is not enough? 😏

Well, as we will see below, arrays are not effective in all cases. Let’s start digging!

Arrays

Consider this array of integers:

Array (Image by the author)

When a program declares an array, it allocates a contiguous set of empty cells for use in the program. So if we create an array meant to hold ten elements, the computer must find a group of ten empty cells in a row and designate it to serve as the array:

Memory allocations for ten contiguous empty cells (Image by the author)

If the memory is mostly unused, it will be easy to find ten empty cells in a row. Otherwise, the array cannot be allocated and some programming languages throw an OutOfMemoryException.

📌 This is the first limitation of Arrays!

Let’s see together how many steps are needed for, and the limits of, each of the operations below! 🙌

Retrieving element by index (read)

A computer can read from an array in just one step!

When the computer reads a value at a particular index of an array, it can jump straight to that index because:

  • a computer can jump to any memory address in one step.
  • whenever a computer allocates an array, it also makes a note of the memory address at which the array begins, whoa!
  • the computer can then find the value at any index by performing a simple addition.

If we asked the computer to find the value at index 4, it would simply take the memory address of index 0 and add 4. Memory addresses are contiguous! Tada! ✨

The cost of reading an element at index = one step regardless of the index

Reading from an array is an efficient operation, since the computer can read any index by jumping to any memory address in one step.

An operation that takes just one step is the fastest type of operation.
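The address arithmetic described above can be sketched as follows. The base address and the one-byte cell size are illustrative assumptions from our simplified memory model, not real engine internals:

```javascript
// Simplified model: the array starts at a known base address,
// and every cell occupies exactly one byte.
const BASE_ADDRESS = 1010; // hypothetical address of index 0
const CELL_SIZE = 1;       // 1 byte per cell in our model

// Reading index i is a single addition: base + i * cellSize.
function addressOfIndex(index) {
  return BASE_ADDRESS + index * CELL_SIZE;
}

console.log(addressOfIndex(0)); // 1010
console.log(addressOfIndex(4)); // 1014 (one step, regardless of the index)
```

However large the array, the computation is always one addition: that is why reading by index is O(1).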

Will it be the same for other operations? Let’s see how many steps are needed for the insert operation!

Insert element at index

⚫ Insert 43 at index = 0

Insert element at index = 0 (Image by the author)

We need 11 steps to do this operation!

⚫ Insert 43 at index = 2

Insert element at index = 2 (Image by the author)

We need 9 steps to do this operation!

⚫ Insert 43 at the end

Insert element at the end (Image by the author)

We need only one step to insert an element at the end. Finally, an operation that does not take many steps!

🔴 Let’s recap:

Insertion number of steps (Image by the author)

The worst-case scenario for insertion into an array is inserting the element at the beginning: we need to shift all the other values one cell to the right, then finally execute the insertion step.

🔶 Insertion in the worst-case scenario takes N+1 steps for an array containing N elements.

⛔ Insertion efficiency decreases further if the array is ordered. Before inserting, we need to find the specific position, shift the elements, and then insert the element.
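The shifting just described can be sketched as a minimal step-counting function. `insertAt` and the sample array are illustrative, not taken from the article’s figures:

```javascript
// Insert `value` at `index`, shifting later elements one cell to the right.
// Returns the number of steps taken (shifts + the insertion itself).
function insertAt(arr, index, value) {
  let steps = 0;
  // Shift elements right, starting from the end.
  for (let i = arr.length; i > index; i--) {
    arr[i] = arr[i - 1];
    steps++;
  }
  arr[index] = value; // the final insertion step
  steps++;
  return steps;
}

const a = [5, 8, 12, 19, 27, 34, 41, 56, 63, 70]; // hypothetical 10-element array
console.log(insertAt(a, 0, 43)); // 11 steps: 10 shifts + 1 insertion
```

With the same 10-element array, `insertAt(a, 2, 43)` would report 9 steps (8 shifts + 1 insertion), matching the recap above.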

Remove element at index

⚫ Remove element at index = 0

Remove element at index = 0 (Image by the author)

We need 10 steps to do this operation!

⚫ Remove element at index = 2

Remove element at index = 2 (Image by the author)

We need 8 steps to do this operation!

⚫ Remove element at the end

Remove element at the end (Image by the author)

We need only one step to delete an element at the end.

🔴 Let’s recap:

Deletion number of steps (Image by the author)

The worst-case scenario for deletion from an array is deleting the element at the beginning: index 0 becomes empty, and we need to shift all the remaining elements to the left to fill the gap.

🔶 Deletion in the worst-case scenario takes N steps for an array containing N elements: O(N).
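The left-shifting described above can be sketched the same way; `removeAt` and the sample array are illustrative:

```javascript
// Remove the element at `index`, shifting later elements one cell to the
// left to fill the gap. Returns the number of steps (deletion + shifts).
function removeAt(arr, index) {
  let steps = 1; // the deletion step itself
  for (let i = index; i < arr.length - 1; i++) {
    arr[i] = arr[i + 1]; // shift left to fill the gap
    steps++;
  }
  arr.length -= 1; // drop the now-duplicated last cell
  return steps;
}

const a = [5, 8, 12, 19, 27, 34, 41, 56, 63, 70];
console.log(removeAt(a, 0)); // 10 steps: 1 deletion + 9 shifts
```

Removing at index 2 on a fresh 10-element array would report 8 steps, and removing the last element only 1, matching the recap.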

Searching an element (Linear Search)

⚫ Linear searching for a value at index = 0

Searching for a value at index = 0 (Image by the author)

Linear searching for a value at the beginning will only take one step!

Whoa! 👏

⚫ Linear searching for a value at index = 5

Searching for a value at index = 5 (Image by the author)

Linear searching for a value at the index = 5 will take 5 steps!

⚫ Linear searching for a value at the end

Searching for a value at the end (Image by the author)

Linear searching for a value at the end will take 9 steps.

➡️ More generally, N - 1 steps with N being the array size.

⚫ Linear searching for a value that does not exist

Linear search for a value that does not exist will take 10 steps (array length). We have to go through the whole array.

➡️ More generally, N steps with N being the array size.

🔴 Let’s recap:

Linear searching number of steps (Image by the author)

So, for N cells in an array, linear search would take a maximum of N steps. This means that for an array of 10 cells, the maximum number of linear search steps would be 10. For an array of 1000 cells, the maximum number of linear search steps would be 1000.

⛔ Linear search is slow in both ordered and unordered arrays.

Searching an element (binary search)

The time complexity of the binary search algorithm is O(log n).

✅ Binary search is more efficient when the array is ordered.

You can find here more details about algorithms efficiency.

Remove element by value

Instead of deleting an element by its index, we would first have to find the value, then the index, then proceed with the deletion.

Assuming we are looking for item 84, let’s see what happens for these different cases:

Unordered Array:

Looking for value 84
  • finding the value depends on the algorithm we’ll use: linear search (O(N)) or binary search (O(log N)). As the array is unordered, we’ll use linear search.
  • obtaining the value index will be done in a single step: O(1).
  • at this level, the complexity of the rest will be equal to an ordinary remove operation: O(N) .

The worst-case complexity of removing an item by value from an unordered array is: O(N) + O(1) + O(N) = O(N).

Ordered Array:

Looking for value 84
  • binary search (O(log N)) will take 2 steps to find 84, whoa!
  • obtaining the value index will be done in a single step: O(1).
  • the complexity of the rest will be equal to an ordinary remove operation: O(N) .

The worst-case complexity of removing an item by value from an ordered array is: O(log N) + O(1) + O(N) = O(N).

⛔ Deletion is slow in both ordered and unordered arrays.

Array with duplicates:

Looking for value 84

⛔ When the array contains duplicates, efficiency decreases further because we have to remove all occurrences and not just the first one.
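Even with duplicates, a single left-to-right pass is enough: the sketch below (with an illustrative name) compacts the surviving elements as it scans, so removing every occurrence stays O(N) overall:

```javascript
// Remove EVERY occurrence of `target` in one pass, copying each surviving
// element to its final position (the classic two-pointer compaction).
function removeAllOccurrences(arr, target) {
  let write = 0;
  for (let read = 0; read < arr.length; read++) {
    if (arr[read] !== target) {
      arr[write] = arr[read]; // keep this element
      write++;
    }
  }
  arr.length = write; // drop the leftover cells at the end
  return arr;
}

console.log(removeAllOccurrences([84, 5, 84, 19, 84], 84)); // [ 5, 19 ]
```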

Sorting an array

Sort algorithms efficiency

So why not use arrays for all cases (array efficiency)?

Let’s recap the efficiency of Array operations:

Efficiency of Array operations (Image by the author)

According to the obtained results we can deduce that:

  • arrays are not efficient when inserting an element at a position other than the end, because we need to shift the later elements to the right.
  • the same goes for deletion: arrays are not efficient when deleting an element at a position other than the end, because the remaining elements must be shifted to the left.
  • searching and sorting depend on the algorithm used.

The complexity of the insertion, deletion, search, and sort operations depends closely on the number of elements in the array: these operations consume more and more steps as the array grows. Only reading keeps a constant complexity whatever the size of the array: a single step, O(1).

Another issue is that the size of the array should be fixed and known in advance because we need to allocate contiguous memory cells:

  • If our prediction is too large, we waste memory.
  • If our prediction is too small, we get an overflow exception.

Ordered arrays are useful where searches are frequent. Unordered arrays are useful where insertions are frequent.

Oh, it’s hard! You see why we can’t use an array for all cases!

Let’s continue our journey, maybe we will come across a magical data structure !

Handling temporary data

Temporary data is information that loses meaning and significance after processing, so we may delete it once we are done with it.

Let’s take print jobs as an example of temporary data. What’s special about it? Generally the printer processes the jobs one after the other and in the order of their arrival. Once a document is printed, it is removed from the waiting list.

If we use a regular array to manage print jobs:

  • we should always choose the first element (array[0]).
  • new elements must always be inserted at the end.
  • every time we read an element, we have to delete it.
  • print waiting list is always bounded.

Our array would have to be equipped with all these additional behaviors. What if we had a magical data structure that did all of this for us? This is where stacks and queues shine, let’s see!

Queues (FIFO)

A queue stores data in the same way as arrays (contiguous memory cells). The difference is that queues have the following constraints:

  • data can be inserted only at the end.
  • data can be deleted only from the front.
  • only the element at the front can be read.
  • the first item added to the queue is the first item to be removed.

This is why we use the acronym FIFO (First In First Out) to describe queue operations.

Queue and operations (Image by the author)

✅ Inserting a new item into a queue is called enqueuing.

✅ Removing an item from a queue is called dequeuing.

✅ The first item enqueued is the first item dequeued.

Some programming languages, such as Java and C++, have built-in queues, so we don’t need to implement them ourselves.

One real use of queues is the Callback Queue inside the JavaScript engine:

How JavaScript works
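The print-job scenario can be sketched with a minimal array-backed queue. Note that `Array.prototype.shift` re-indexes the remaining elements, so this naive version dequeues in O(N); it is a teaching sketch, not a production implementation:

```javascript
// A minimal FIFO queue backed by a plain array.
class Queue {
  constructor() {
    this.items = [];
  }
  enqueue(item) {
    this.items.push(item); // insert only at the end
  }
  dequeue() {
    return this.items.shift(); // remove only from the front (O(N) here!)
  }
  front() {
    return this.items[0]; // only the front element can be read
  }
}

const printJobs = new Queue();
printJobs.enqueue('report.pdf');
printJobs.enqueue('invoice.pdf');
console.log(printJobs.dequeue()); // 'report.pdf' (first in, first out)
```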

Stacks (LIFO)

A stack stores data in the same way as arrays (contiguous memory cells). The difference is that stacks have the following constraints:

  • data can be inserted only at the end.
  • data can be deleted only from the end.
  • only the last element can be read.

This is why we use the acronym LIFO (Last In, First Out) to describe stack operations.

Stack and operations (Image by the author)

✅ Inserting a new item into the stack is called pushing onto the stack.

✅ Removing an item from the top of the stack is called popping from the stack.

✅ The last item pushed onto the stack is always the first item popped from it.

Some programming languages, such as Java and C++, have built-in stacks, so we don’t need to implement them ourselves.

One real use of stacks is the Call Stack inside the JavaScript engine:

How JavaScript works
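A stack is even simpler to back with an array, because both constrained operations happen at the end, where arrays are fast. A minimal sketch (with illustrative names):

```javascript
// A minimal LIFO stack backed by an array: push and pop both work at
// the end of the array, so each is O(1).
class Stack {
  constructor() {
    this.items = [];
  }
  push(item) {
    this.items.push(item); // insert only at the end
  }
  pop() {
    return this.items.pop(); // remove only from the end
  }
  peek() {
    return this.items[this.items.length - 1]; // read only the top
  }
}

const undoHistory = new Stack();
undoHistory.push('type "Hello"');
undoHistory.push('delete word');
console.log(undoHistory.pop()); // 'delete word' (last in, first out)
```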

Let’s wrap up

Stack and Queue (Image by the author)

✅ By adding some constraints on how to add and remove elements, compared to a regular array, Stack and Queue are more efficient (we always insert at the end).

🔴 Stack and Queue do not allow arbitrary access. Only two options are available: FIFO or LIFO.

🔴 Therefore, Stack and Queue are not efficient for search situations.

✅ A Queue is an interesting data structure when we want FIFO access to items, like a sequential HTTP request queue (the first request to arrive is the first to be served; the others are pending). Note that we never need to search for a request or access one randomly.

✅ A Stack is an interesting data structure when we want LIFO access to items, like the “undo” function of an editor or an operating system.

🔴 If the stack is implemented using an array, its size must be fixed and predictable (contiguous memory cells), otherwise a stack overflow exception will occur.

Well, the question now, how can we have more flexible memory management?

Let’s see if linked lists can help!

Linked Lists

Linked List vs Array

Linked List vs Array (Image by the author)

💠Like an array, a Linked List is a data structure that represents a list of items.

💠 However, instead of being a contiguous block of memory, the data from Linked Lists can be scattered across different cells throughout the computer’s memory. Whoa!

This can be illustrated as follows:

Linked List vs Array memory allocation (Image by the author)

💠Each piece of connected data dispersed throughout memory is called a node.

💠Each node represents one item in the list.

💠 Each node has a little extra information about the memory address of the next node: link.

💠The first node is called the “head” and the last is the “tail”.

Reading efficiency of Linked List

We know that a computer can read from an array in O(1) time whatever the index. Let’s see the efficiency of reading from a Linked List!

✅ Reading item at index = 0 : this will take a single step since the item to read is the head (O(1)).

🔶 Reading item at index = 3 :

Linked List of characters (Image by the author)
  • Go to the first item and get the memory address of the item at index = 1 which is 1752.
  • Go to the item at index = 1 and get the memory address of the item at index = 2 which is 1893.
  • Go to the item at index = 2 and get the memory address of the item at index = 3 which is 1978.
  • Go to the item at index = 3 and read the value.

We need 4 steps to reach the item at index = 3 (the fourth item)!

🔴 To get any node, we always start with the first node and follow the chain of nodes until we reach the wanted node.

🔴 If we were to read from the last node in the list, it would take N steps for N nodes in the list!

🔻Linked Lists have a worst-case read of O(N) which is a major disadvantage compared with arrays (O(1)).
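The chain-following read described above can be sketched with a tiny singly linked list. `Node` and `readAt` are illustrative names, and the “Hello” list mirrors the earlier character example:

```javascript
// A node holds a value plus a link (reference) to the next node.
class Node {
  constructor(value, next = null) {
    this.value = value;
    this.next = next;
  }
}

// Reading index i always starts at the head and follows i links: O(N).
function readAt(head, index) {
  let current = head;
  for (let i = 0; i < index; i++) {
    if (current === null) return undefined; // ran off the end of the list
    current = current.next; // follow the link to the next node
  }
  return current ? current.value : undefined;
}

// Build the list 'H' -> 'e' -> 'l' -> 'l' -> 'o'
const head = new Node('H', new Node('e', new Node('l', new Node('l', new Node('o')))));
console.log(readAt(head, 0)); // 'H' (one step)
console.log(readAt(head, 3)); // 'l' (four steps of link-following)
```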

Let’s dig into searching efficiency !

Searching efficiency of Linked List

🔻We have seen that linear search on an array has a speed of O(N) .

🔻Linked Lists also have a search speed of O(N). We always start with the first node and follow the links.

I’m starting to despair, let’s see if we can win something during the insertion!

Insertion efficiency of Linked List

Let’s analyze different use cases:

Linked List vs Array insertion performances (Image by the author)

Arrays favor insertion at the end, while Linked Lists favor insertion at the beginning. An important point to note!

Deletion efficiency of Linked List

Let’s analyze different use cases:

Linked List vs Array deletion performances (Image by the author)

Let’s recap!

I always refer to my beautiful Big O!

Linked List vs Array (Image by the author)

I think I know what you are saying now: why should one know and use this Linked List? I’ll answer this question after the next section !😳

Doubly Linked List ✨

A Doubly Linked List is like a Linked List but with these peculiarities:

  • Each node has two links: one that points to the next node and another that points to the previous node.
  • Doubly Linked List always keeps track of both the first and last nodes instead of just the first node.

Wonderful! How can this enhance the results?

✅ Since a Doubly Linked List always knows where both its first and last nodes are, we can access each of them in a single step, O(1). Amazing!

✅ Just as we can read, insert, or delete from the beginning of the list in O(1), we can do the same from the end of the list in O(1)!

✅ With a Linked List, we can only move forward. However, with a Doubly Linked List we can move forward and backward (from the start to the end or from the end to the start)!

So, back to the previous question about the importance of Linked Lists: which data structure has immediate access to both the front and the end? Yes, exactly: the Queue!

Since a Doubly Linked List has immediate access to both the front and the end of the list, it can insert data on either side in O(1) as well as delete on either side in O(1), making it a perfect underlying data structure for a Queue!

Aha, now you know how to implement an efficient Queue if there isn’t one built in! Tada!
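A minimal sketch of such a queue (all names here are illustrative): the list tracks both head and tail, so enqueue at the tail and dequeue at the head are both O(1), with no shifting:

```javascript
// A node with links in BOTH directions.
class DNode {
  constructor(value) {
    this.value = value;
    this.prev = null;
    this.next = null;
  }
}

// A FIFO queue built on a doubly linked list.
class LinkedQueue {
  constructor() {
    this.head = null; // front of the queue
    this.tail = null; // end of the queue
  }
  enqueue(value) {
    const node = new DNode(value);
    if (this.tail === null) {
      this.head = this.tail = node; // first node: head and tail coincide
    } else {
      node.prev = this.tail;
      this.tail.next = node; // attach after the current tail
      this.tail = node;
    }
  }
  dequeue() {
    if (this.head === null) return undefined;
    const value = this.head.value;
    this.head = this.head.next; // the second node becomes the new head
    if (this.head === null) this.tail = null;
    else this.head.prev = null;
    return value;
  }
}

const q = new LinkedQueue();
q.enqueue(1);
q.enqueue(2);
console.log(q.dequeue()); // 1 (O(1), no shifting required)
```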

Hash tables

The magic of Hash Table

Most programming languages have a magic built-in data structure called a Hash Table.

Hash Tables have different names in various programming languages: hashes, maps, hash maps, dictionaries, …

Hash Tables keep data in pairs: a key and a value.

Hash Table Key and Values (Image by the author)

Hash Tables have an amazing superpower: fast reading! Accessing a value by its key takes one step: O(1). How is this possible? Let’s look at the picture below:

Inside Hash Table (Image by the author)

Whoa! The hashing function maps a key to an index for storing the key’s value!

A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. Hash table — Wikipedia

✅ A hash function needs to meet only one criterion to be valid: it must convert the same string to the same number every single time it’s applied.

✅ So functions that use random numbers or the current time as part of their calculation are invalid!

✅ For every key-value pair, each value is stored at the index of the key, after the key has been hashed.

✅ We can do O(1) lookups only when using a key to find the value. If we use a value to find its associated key, we cannot take advantage of the Hash Table’s fast lookup magic!
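To make the hashing step concrete, here is a deliberately naive hash function: it sums character codes and wraps the result into a hypothetical bucket range. Real hash tables use far better functions, so treat this purely as an illustration of the deterministic key-to-index mapping:

```javascript
const TABLE_SIZE = 16; // hypothetical number of buckets

// Naive hash: sum the character codes of the key, then wrap the sum
// into the bucket range. Deterministic: same key -> same index, always.
function hashKey(key) {
  let sum = 0;
  for (const char of key) {
    sum += char.charCodeAt(0);
  }
  return sum % TABLE_SIZE;
}

console.log(hashKey('cat'));                    // 8 (99 + 97 + 116 = 312; 312 % 16 = 8)
console.log(hashKey('cat') === hashKey('cat')); // true: the criterion above holds
```

Because the function is deterministic, looking a key up later re-computes the same bucket index, which is what makes the O(1) read possible.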

A very important feature of Hash Tables: dynamic resizing.

Hash values depend on the size of the table. Therefore, the hashes of the keys change on resizing, and the algorithm cannot simply copy data from the old storage to the new: every key must be re-hashed into the new table.

Very interesting information! 💯

Our first journey inside the data structures ends at this level, it’s time to recap everything we’ve seen!

When to use what ?

Data Structures efficiency (Time Complexity)

Data structure efficiency (Image by the author)

Data Structures advantages and disadvantages

Data structures advantages and disadvantages (Image by the author)

Based on these tables, we can easily optimize our code! Whoa! 👏

Techniques for Code Optimization

Determine your current Big O

Algorithms and their impact on performance | Better Programming

Change the Data Structure

As we have seen above, the choice of the data structure has a great impact on the efficiency of the program: time and memory.

Greedy Algorithms

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. Greedy algorithm — Wikipedia

Very interesting approach!

Dynamic programming

Dynamic programming is an approach for optimizing recursive problems. Among the optimization techniques we can mention: memoization.

Memoization is a simple and efficient technique for reducing recursive calls by remembering previously computed results, whoa!

For more details about memoization.
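As an illustration, here is a memoized Fibonacci sketch. The naive recursive version makes an exponential number of calls; the memoized one computes each value once:

```javascript
// Memoized Fibonacci: the cache remembers every computed value, turning
// O(2^n) recursive calls into O(n). The default parameter gives each
// top-level call a fresh cache, shared by its recursive calls.
function fibMemo(n, cache = new Map()) {
  if (n <= 1) return n;
  if (cache.has(n)) return cache.get(n); // reuse the remembered result
  const result = fibMemo(n - 1, cache) + fibMemo(n - 2, cache);
  cache.set(n, result);
  return result;
}

console.log(fibMemo(10)); // 55
console.log(fibMemo(40)); // 102334155, computed instantly thanks to memoization
```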

Conclusion

We have seen a lot of things in this journey!

Choosing the right algorithm and the right data structure can dramatically affect the performance of our code (time and memory).

Creating efficient software involves evaluating the trade-offs of the available options.

Thank you for reading my article!

Want to connect? You can find me on GitHub: https://github.com/helabenkhalfallah


Hi! I'm Hela Ben Khalfallah, Senior Frontend Expert specializing in JavaScript, React, Node.js, software architecture, and FrontendOps.