# Algorithms II: Fundamental Data Structures

In the early days of my career as a programmer, I was writing in the C programming language. This was my first language to program in, and I didn’t hate the experience. One of the projects I set out to build as a way to learn the key components of the language and test out my mastery was a point of sale system, similar to what one would have at the checkout from your local supermarket.

The only problem working with **C **to build such a system was how to store and manage data. For instance, like any other checkout, I need to store a master list of commodities and their matching prices, I had to store metadata, for instance, whether or not the item had a discount and how much that discount was. At the time the only data structure I knew how to work with within the C programming language was **primitive arrays. **This made things really difficult and eventually I dropped the project to build a game.

Devising algorithms without proper data structures can be really challenging. This is why most experienced programmers would consider the supported data structures in a programming language before they decide to use it for a project, at least, that’s what I always do after my experience with **C**. It is important to use a language that supports multiple data structures, considering that use cases and problem types change. It is also important to ensure the language implements the default algorithms with the best performance, for instance, one should prioritise a language that provides the best performance overall for search, fetch, insert, delete and sort operations on data structures such as list, this is because software systems will most probably perform these operations at least a few million or billion times per second in the actual production environment.

Consider the following language performance benchmarks. I don't want to start a fight here, but if performance is top on your list, you probably will stay clear of **python3 **according to these benchmarks.

Am not the first person to notice this. Learn more about python’s lack of performance and efficiency here: http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/

Later on, in my career, I settled on Java as the language with the best support for these data structures that proved critical for designing even the simplest of algorithms. With Java collections, I felt ( and still do) invincible against any domain-specific problem thrown at me. I just need to pick out a data structure (Map, HashMap, List, ArrayList, Set e.t.c) from the collections API and am good to go. It is important to note that most (I think all) modern programming languages have extensive support for data structures, the distinguishing factor would be how the speed and efficiency of operations on these data structures vary from language to language.

A **data structure** can be defined as a **scheme **of organizing other data items. In this article, we are going to look at a few key data structures that define the day to day life of a programmer regardless of which language they use to design and implement algorithms.

These are **Linear data structures, Graphs, Trees, Sets and Dictionaries.**

**Linear Data Structures**

**The Array**

The **array **and **Linked List **are arguably the most important data structures. The array can be one-dimensional or multi-dimensional usually two-dimensional. A one-dimensional array is a sequence of items **n **of the same data type that are stored contiguously and whose values can be accessed by a number indicating the **index **of the array.

The elements of an array can be assumed to use the same amount of computer memory, also each element of the array can be accessed in a predictable and constant amount of time regardless of their position or index in the array. These properties distinguish the array from the linked list and other data structures. The array is used to define other types of data structures such as the **string**.

**The Linked List**

The **Linked List **is a sequence of zero or more elements called **nodes **each containing two kinds of information; some data and one or more links to other nodes (called **pointers**)** **to other nodes in the linked list. A special pointer called “**null**” is used to indicate the absence of the successor of the node (end of the linked list).

To access an element of a linked list you must first start with the first node of the linked list and traverse the list until you arrive at the intended node. Therefore, the time needed to access an element in the linked list depends upon the position of the element’s node in the linked list.

The advantage of linked lists is that they do not require any preliminary reservation of computer memory as nodes can be added or removed dynamically, as opposed to arrays.

Linked lists sometimes contain a **header **that may provide information about the linked list, for instance, the size of the linked list, the pointer to the nodes containing the first and last elements of the linked list.

A linked list can be enhanced into a **double linked list, **in which every node contains a pointer to the next and previous node.

**Simple or Linear List**

Linked lists and arrays are used to represent a more abstract data structure called a linear list or a simple list. Lists support basic operations such as inserting, deleting and searching elements in the list.

The list allows us to derive more important data structures such as **queues **and **stacks**.

The stack is a data structure that allows operations on only one end of the data structure, the top. The data structure operates in the **LI-FO **format, that is, ** last in, fast out. **Addition (also known as a push operation) and deletion (also known as a pop operation) occurs at the top element.

The queue is a data structure that supports operations on either end of the linear data structure. The addition of elements is done at the **rear **end of the data structure and this operation is referred to as **en queue**, while deletion of elements is done from the **front **end of the data structure also referred to as **de queue. **Therefore, queues operate in the **FIFO **format, (*first in, first out*).

**Graphs**

A graph can be defined (informally) as a collection of points (**nodes or vertices**)connected by line segments (**edges or arcs**).

A graph can also be defined (formally) as a **G **= [V, E]. **G **is the graph represented by a pair of two sets, **V **being a set of vertices and **E **being a set of edges.

If the pair of vertices (*u,v*) is equal to the pair of vertices (v,u) then the graph is considered a **directed graph, **otherwise, the graph is considered an **undirected graph. **Directed graphs are also called **Diagraphs.**

**Graph representations**

Computer algorithms can represent graphs in two formats; **adjacency matrix **or **adjacency linked list. **The adjacency matrix of a graph is an **n-by-n** boolean matrix (represented by 1s and 0s) with one row and one column for each vertex. if there is a connecting edge from the** ith row** to the

**the value is**

*jth*column**1**if a connecting edge -does not exist- then the value is

**0**.

On the other hand, the **adjacency linked** list is a collection of a linked list for each vertex that contains all the vertices adjacent to the lists vertex i,e connected to the said vertex by an edge. The graph representation utilizes the Linked list header on which it adds the name of the vertex for which the linked list is derived.

**Graphs: Weighted graphs**

A weighted graph (or weighted digraph) is a graph or a digraph with numbers assigned to its edges. The number assigned to its edges is called **weight **and **costs**. To represent weighted graphs in the adjacency matrix, if an edge does not exist (therefore does not have a weight assigned to it) a special symbol such as **∞ **can be used.

Weighted graphs have numerous real-world applications, such as finding the shortest path between two points in a transportation or communication network, or the travelling salesman problem.

**Graphs: Paths and cycles**

A path from vertex *u** *to vertex *v** *of graph **G **can be defined as a sequence of adjacent vertices that starts with *u** *and ends with *v**. *The length of the path is the total number of vertices in a vertex sequence defining the path minus one, which is the same as the total number of edges in the chosen path.

A graph is connected if there is a connecting edge for every pair of vertices (u,v).

**A cycle **in a graph is a path that starts and ends at the same vertex in the graph. A graph with no cycles is called **acyclic.**

**Trees**

A tree is a connected acyclic graph.

A graph that has no cycles but is not connected is called a **forest, **as each of its connected parts is a tree.

For every two vertices in a tree, there always exists one simple path from one of these vertices to the other. This makes it possible to select one of the vertices in the tree and make it the **root **of the tree, hence converting the tree into a **rooted tree.**

A common application of a tree in computer science is in describing hierarchies in file directories and organizational charts for enterprises.

**Ordered trees**

An ordered tree is a rooted tree in which all the children of each vertex are ordered.

A **binary tree **is an ordered tree in each every vertex has no more than two children, and each child is either designated as a **left child **or a **right child **of its parent vertex.

**Sets and Dictionaries**

A set originates directly from mathematics. A set can be described as an unordered collection of distinct items. Notation for sets differ, one involves explicit listing of the elements like so;

**s = {1,2,3,4,5}**

or by declaring a property that all the elements of the set have in common, for example;

**s = {n:n is prime number and n < 10}**

The most important set operation is checking membership and finding unions between multiple sets. Some programing languages will provide a set data structure out of the box but if one has to explicitly define a set, there exists a couple of ways to do so. One involves bit strings, also called **bit vector i.e**

**s = {2,3,5,7} becomes 011010100**

Another way of representing sets is by using the **list **data structure.

**Dictionary**

A data structure that implements set operations (search, insert, delete etc) is called a dictionary.

**Conclusion**

There exists a very intimate relationship between data and operations in computer science. Therefore, the design, implementation and application of algorithms rely heavily on the data structures employed for these operations.

The data structure will influence the speed and efficiency of the algorithm. Therefore, it is very important to consider the data structures available to you before defining a strategy to develop an algorithmic solution to a domain problem.

So instead of just looking at the popularity and whether or not **Facebook **and **Google **claim to highly utilize a new, hip and shinny programming language, it is important for you to interrogate which data structures (ADT to be precise) are supported by the language, what is the speed and efficiency score for these ADTs and how easy are they to us while defining new solutions to new problems.

This is the second part of a four-part series on the topic of Algorithms. If you haven’t, kindly check out the prequel to this article here: https://medium.datadriveninvestor.com/algorithm-problem-types-afdfc811d8af

Support me by cheking out my small app : Life Planner

Thank you for reading…