Nerd For Tech
Published in

Nerd For Tech

Algorithms II: Fundamental Data Structures

src: USA today: Amazon warehouse

In the early days of my career as a programmer, I was writing in the C programming language. This was my first language to program in, and I didn’t hate the experience. One of the projects I set out to build as a way to learn the key components of the language and test out my mastery was a point of sale system, similar to what one would have at the checkout from your local supermarket.

The only problem working with C to build such a system was how to store and manage data. For instance, like any other checkout, I need to store a master list of commodities and their matching prices, I had to store metadata, for instance, whether or not the item had a discount and how much that discount was. At the time the only data structure I knew how to work with within the C programming language was primitive arrays. This made things really difficult and eventually I dropped the project to build a game.

Devising algorithms without proper data structures can be really challenging. This is why most experienced programmers would consider the supported data structures in a programming language before they decide to use it for a project, at least, that’s what I always do after my experience with C. It is important to use a language that supports multiple data structures, considering that use cases and problem types change. It is also important to ensure the language implements the default algorithms with the best performance, for instance, one should prioritise a language that provides the best performance overall for search, fetch, insert, delete and sort operations on data structures such as list, this is because software systems will most probably perform these operations at least a few million or billion times per second in the actual production environment.

Consider the following language performance benchmarks. I don't want to start a fight here, but if performance is top on your list, you probably will stay clear of python3 according to these benchmarks.

Am not the first person to notice this. Learn more about python’s lack of performance and efficiency here: http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/

Java Vs Python Tried and true Vs “modern and new”https://belitsoft.com/java-development-services/java-vs-python-tried-and-true-vs-modern-and-new
Programming languages and tools: Medium https://medium.com/travelling-developers/top-5-programming-languages-to-learn-in-2019-70247ec3e729

Later on, in my career, I settled on Java as the language with the best support for these data structures that proved critical for designing even the simplest of algorithms. With Java collections, I felt ( and still do) invincible against any domain-specific problem thrown at me. I just need to pick out a data structure (Map, HashMap, List, ArrayList, Set e.t.c) from the collections API and am good to go. It is important to note that most (I think all) modern programming languages have extensive support for data structures, the distinguishing factor would be how the speed and efficiency of operations on these data structures vary from language to language.

A data structure can be defined as a scheme of organizing other data items. In this article, we are going to look at a few key data structures that define the day to day life of a programmer regardless of which language they use to design and implement algorithms.

These are Linear data structures, Graphs, Trees, Sets and Dictionaries.

Linear Data Structures

The Array

The array and Linked List are arguably the most important data structures. The array can be one-dimensional or multi-dimensional usually two-dimensional. A one-dimensional array is a sequence of items n of the same data type that are stored contiguously and whose values can be accessed by a number indicating the index of the array.

The elements of an array can be assumed to use the same amount of computer memory, also each element of the array can be accessed in a predictable and constant amount of time regardless of their position or index in the array. These properties distinguish the array from the linked list and other data structures. The array is used to define other types of data structures such as the string.

The array: Medium: https://medium.com/@vandersonramos/moving-the-first-item-to-the-last-position-in-the-array-javascript-c8ba435efee3

The Linked List

Single Linked List: source https://www.educative.io/edpresso/what-is-a-singly-linked-list

The Linked List is a sequence of zero or more elements called nodes each containing two kinds of information; some data and one or more links to other nodes (called pointers) to other nodes in the linked list. A special pointer called “null” is used to indicate the absence of the successor of the node (end of the linked list).

To access an element of a linked list you must first start with the first node of the linked list and traverse the list until you arrive at the intended node. Therefore, the time needed to access an element in the linked list depends upon the position of the element’s node in the linked list.

The advantage of linked lists is that they do not require any preliminary reservation of computer memory as nodes can be added or removed dynamically, as opposed to arrays.

accessing a single linked list

Linked lists sometimes contain a header that may provide information about the linked list, for instance, the size of the linked list, the pointer to the nodes containing the first and last elements of the linked list.

A linked list can be enhanced into a double linked list, in which every node contains a pointer to the next and previous node.

Double linked list

Simple or Linear List

Linked lists and arrays are used to represent a more abstract data structure called a linear list or a simple list. Lists support basic operations such as inserting, deleting and searching elements in the list.

The list allows us to derive more important data structures such as queues and stacks.

The stack is a data structure that allows operations on only one end of the data structure, the top. The data structure operates in the LI-FO format, that is, last in, fast out. Addition (also known as a push operation) and deletion (also known as a pop operation) occurs at the top element.

The queue is a data structure that supports operations on either end of the linear data structure. The addition of elements is done at the rear end of the data structure and this operation is referred to as en queue, while deletion of elements is done from the front end of the data structure also referred to as de queue. Therefore, queues operate in the FIFO format, (first in, first out).

Graphs

A graph can be defined (informally) as a collection of points (nodes or vertices)connected by line segments (edges or arcs).

A graph can also be defined (formally) as a G = [V, E]. G is the graph represented by a pair of two sets, V being a set of vertices and E being a set of edges.

If the pair of vertices (u,v) is equal to the pair of vertices (v,u) then the graph is considered a directed graph, otherwise, the graph is considered an undirected graph. Directed graphs are also called Diagraphs.

Graph representations

Computer algorithms can represent graphs in two formats; adjacency matrix or adjacency linked list. The adjacency matrix of a graph is an n-by-n boolean matrix (represented by 1s and 0s) with one row and one column for each vertex. if there is a connecting edge from the ith row to the jth column the value is 1 if a connecting edge -does not exist- then the value is 0.

Illustration of a graph representation using an adjacency matrix. Source. research gate: https://www.researchgate.net/figure/Different-types-of-graphs-and-their-corresponding-adjacency-matrix-representations-The_fig1_347300725

On the other hand, the adjacency linked list is a collection of a linked list for each vertex that contains all the vertices adjacent to the lists vertex i,e connected to the said vertex by an edge. The graph representation utilizes the Linked list header on which it adds the name of the vertex for which the linked list is derived.

Illustration of the adjacency linked list. Source, Research gate: https://www.researchgate.net/figure/Different-ways-to-represent-a-static-graph-Beyond-the-adjacency-matrix-representation-as_fig2_347300725

Graphs: Weighted graphs

A weighted graph (or weighted digraph) is a graph or a digraph with numbers assigned to its edges. The number assigned to its edges is called weight and costs. To represent weighted graphs in the adjacency matrix, if an edge does not exist (therefore does not have a weight assigned to it) a special symbol such as can be used.

weighted graph. source javatpoint https://www.javatpoint.com/graph-representation

Weighted graphs have numerous real-world applications, such as finding the shortest path between two points in a transportation or communication network, or the travelling salesman problem.

Graphs: Paths and cycles

A path from vertex u to vertex v of graph G can be defined as a sequence of adjacent vertices that starts with u and ends with v. The length of the path is the total number of vertices in a vertex sequence defining the path minus one, which is the same as the total number of edges in the chosen path.

Finding a “not-shortest” path between two vertices: https://mathematica.stackexchange.com/questions/4084/finding-a-not-shortest-path-between-two-vertices
Time-dependent graphs: https://link.springer.com/article/10.1007/s41019-019-00105-0

A graph is connected if there is a connecting edge for every pair of vertices (u,v).

https://www.gatevidyalay.com/tag/cyclic-graph-definition/

A cycle in a graph is a path that starts and ends at the same vertex in the graph. A graph with no cycles is called acyclic.

https://www.gatevidyalay.com/tag/cyclic-graph-definition/

Trees

A tree is a connected acyclic graph.

A graph that has no cycles but is not connected is called a forest, as each of its connected parts is a tree.

For every two vertices in a tree, there always exists one simple path from one of these vertices to the other. This makes it possible to select one of the vertices in the tree and make it the root of the tree, hence converting the tree into a rooted tree.

Rooted tree: https://media.geeksforgeeks.org/wp-content/uploads/minHeightTree.png

A common application of a tree in computer science is in describing hierarchies in file directories and organizational charts for enterprises.

Ordered trees

An ordered tree is a rooted tree in which all the children of each vertex are ordered.

A binary tree is an ordered tree in each every vertex has no more than two children, and each child is either designated as a left child or a right child of its parent vertex.

Binary tree: https://www.geeksforgeeks.org/binary-tree-data-structure/

Sets and Dictionaries

Sets

A set originates directly from mathematics. A set can be described as an unordered collection of distinct items. Notation for sets differ, one involves explicit listing of the elements like so;

s = {1,2,3,4,5}

or by declaring a property that all the elements of the set have in common, for example;

s = {n:n is prime number and n < 10}

The most important set operation is checking membership and finding unions between multiple sets. Some programing languages will provide a set data structure out of the box but if one has to explicitly define a set, there exists a couple of ways to do so. One involves bit strings, also called bit vector i.e

s = {2,3,5,7} becomes 011010100

Another way of representing sets is by using the list data structure.

Dictionary

A data structure that implements set operations (search, insert, delete etc) is called a dictionary.

Conclusion

There exists a very intimate relationship between data and operations in computer science. Therefore, the design, implementation and application of algorithms rely heavily on the data structures employed for these operations.

The data structure will influence the speed and efficiency of the algorithm. Therefore, it is very important to consider the data structures available to you before defining a strategy to develop an algorithmic solution to a domain problem.

So instead of just looking at the popularity and whether or not Facebook and Google claim to highly utilize a new, hip and shinny programming language, it is important for you to interrogate which data structures (ADT to be precise) are supported by the language, what is the speed and efficiency score for these ADTs and how easy are they to us while defining new solutions to new problems.

This is the second part of a four-part series on the topic of Algorithms. If you haven’t, kindly check out the prequel to this article here: https://medium.datadriveninvestor.com/algorithm-problem-types-afdfc811d8af

Support me by cheking out my small app : Life Planner

Thank you for reading…

--

--

--

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Recommended from Medium

Devise Authentication with Rails 5

Java Syntax

Hyperconvergence? What’s That?

Agile Development: What to do if I have two different teams with huge Velocity differences?

Creating your First Docker Image & Container

Perks of volunteering a hackathon

AWS CSA: Creating Key Pair, Security Group , EBS Volume and Attach them to AWS Instance using…

Deep Dive into Kong Authentication Plugin

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Samuel Owino

Samuel Owino

Software Engineer. Co-Founder, Life Planner

More from Medium

Linked Lists — Data Structures

[LeetCode]#1974. Minimum Time to Type Word Using Special Typewriter

LeetCode 129. Sum Root to Leaf Numbers

The Choose Languages for your career