Data Structures: The Basics

Kristy Parker
The Startup
Published in
7 min readAug 25, 2020

I recently graduated from a coding bootcamp. By the end, I now feel comfortable creating a web application from ideation through to completion. However, I know that there are gaps in my knowledge. In my first mock technical interview, I was asked questions on data structures right off the bat. The curriculum I completed did not cover data structures explicitly. I was able to muck my way through some of the questions logically, but now I’m taking the time to fill in that knowledge for myself, and to give an overview for any other fresh grads who are running into a similar experience.

DATA STRUCTURES

Data structures are ways of organizing data so that it can be used effectively, while taking into account the space and time complexity of different tasks.

A phylogeny of data structures; showing both primitive and non-primitive forms.

This post will assume that if you’re reading this, you’re already familiar with the primitive data structures, and will focus on the nonprimitive data structures, beginning with simple ordered lists.

ARRAYS

An array is an ordered collection of elements, each with its own index. Each element itself could be a value, an array, or an object. The elements of an array are stored in contiguous memory locations.

An array declaration containing a list of name strings of great scientists, separated by commas.

Since the position of each element can be directly computed, it’s not necessary to navigate the data structure in order to access a particular element, which is part of what makes an array one of the most efficient ways to store and access a sequence of values or to store a collection of data with similar elements.

Advantages and Uses:

Simple to create and use.

Iterating on arrays is faster compared to other data structures.

Large datasets can be stored in an array instead of declaring many variables, which can improve code readability.

Arrays can be used to store data in more than one dimension.

They are the foundational building block for complex data structures, and can be used to implement other structures.

Disadvantages:

In many languages, if you want to use an array, you have to know the number of elements you want to store when it is declared. It is impossible to add additional elements after it has been declared.

STACKS

A stack is a linear structure. Insertion or removal of any elements happens at the top of the stack. It follows the Last In First Out (LIFO) pattern.

A diagram of a stack of objects, with insertion or removal happening at the top of the stack.

An example of a stack is your browser history. If you hit the back button, you don’t want to be taken to a site you visited last week or last month; you want to see the last element added, or the last website you visited, which, fortunately, is at the top of the stack.

There are basically three operations that you can perform on stacks. You can add (push) an element to a stack, remove (pop) an item from the stack, or display (peek or top) the contents of the stack.

Advantages

Memory can be allocated dynamically, unlike an array.

Reading from and writing to stack variables is very fast and easy.

Disadvantages

Inflexibility. Most memory that people need is not in a sequential or chronological order, so if you aren’t looking for the item on the top of the stack, the computer must take the path from the last set of information to the set you want, moving through the stack.

QUEUES

Similar to stacks, queues are sequential, linear structures. However, queues process elements in the order that they were entered. A queue follows the First In, First Out (FIFO) rule.

A diagram of a queue, showing the first in element being dequeued first, and an element being enqueued to the end.

Queues can be used for CPU task scheduling, or where a large number of requests might be received, to store each request to be processed in the order in which it was received.

Advantages

Memory can be allocated dynamically, unlike an array.

Ensures the oldest data is processed first.

Low runtime.

Disadvantages

Flexibility. Only intended for retrieval of the oldest element.

LINKED LISTS

Linked lists are linear data structures that use a referencing system rather than an index or element position in the line. They consist of elements stored in nodes that each contain a pointer(link) to the next node, from head to tail.

A metaphor for this could be a scavenger hunt where each item includes a clue that points you to the next one.

Diagram of a linked list with each element connected to the next with a pointer, starting at the head.

If any elements are swapped out, the pointers will need to be reset so the linked list remains unbroken. Similarly, if a new element is added to the first position, the head will need to be set to the new element, and a pointer added that directs to the next element, which was previously the head.

So why bother?

Advantages and Uses

Linked lists are easier and more efficient to restructure.

Elements can be inserted into linked lists indefinitely, while an array may either fill up or need to be resized.

Best used when data must be quickly added or removed from unknown locations.

Disadvantages

While arrays allow random access, linked lists only allow sequential access to elements, making them unsuitable for applications that need the ability to look up an element by its index.

They use more memory than an array.

It is inefficient to traverse the list backwards.

TREES

Trees are nonlinear data structures primarily used to represent hierarchical relationships.

Similar to a linked list, a node contains both elements of data and pointers marking its relation to immediate nodes.

Diagram of a tree, showing the highest height at the top of the tree, and the highest depth at the ends of the branches.

The root contains references to all elements directly below it, which are known as its “child nodes”.

A common type of tree is the binary search tree, which is used to easily search stored data. These search operations are highly efficient, but must follow very strict rules.

Advantages and Uses

Excellent for storing hierarchical data.

Quick and easy insertion and deletion operations.

Dynamic size.

Trees are easier to search than linked lists. Binary search trees are very quick at searching, with tame worst-case behavior.

Can be used to find a shortest path in networking.

Disadvantages

Child nodes do not contain any information about their parent node.

It is slow to rearrange nodes.

Hash tables are faster in most cases.

GRAPHS

Graphs are used to represent networks. This could include social networks, or flow of computation, among others.

In a social network, each vertex might represent a single person, and hold data for that individual.

A diagram of an undirected graph with nodes connected to differing numbers of other nodes.

Graphs can be undirected, as in the above diagram, or directed.

In a directed graph, the edges are directional. An example of this is a website, where the vertices represent web pages and the directed edges represent links from one page to another.

Graphs are most commonly represented by either an adjacency matrix or an adjacency list, depending on the situation.

Advantages

Able to model a diverse number of subjects, and can allow any number of edges from each node.

They can be used to find shortest paths.

They make large data simpler to work with.

Disadvantages

Must choose a representation based on the task, with each having trade-offs.

Adjacency matrices have a large memory complexity.

Adjacency lists have no quick way to determine whether a given edge is present in the graph.

HASH TABLE

Hash tables store data in an associative manner. Data is stored in an array format, with each data value having its own unique index value.

Diagram of a hash function taking in keys and pairing them to an index associated with a value in a table.

A hash table uses a hash function to generate an index where an element is to be inserted or is to be located from.

The actions that are done on a hash table are search, insert, and delete.

Advantages

Access of data with a known index is very fast, irrespective of the size of the data.

Disadvantages

There is a point at which a hash table becomes too full, and the table must be enlarged, which can result in unwanted behavior.

Collisions can occur, which require error handling.

MORE RESOURCES

For more on data structures, see Overview of Data Structures on GeeksforGeeks

For more information on trees, I found Data Structures 101: a deep dive into trees with Java by Amanda Fawcett useful.

For more information about graphs and their representations, see Graph Data Structures by baeldung.

THANKS!

I am forever improving. If you have corrections or feedback, I would be happy to learn and make fixes.

--

--

Kristy Parker
The Startup

I’m a scientist turned software engineer who is excited to help modernize health and research. Connect on LinkedIn: www.linkedin.com/in/kristynparker/