# Data Structures Primer

## Depending on what you’re optimizing for, there are many different approaches to holding or organizing your data.

As its name suggests, a data structure is a structure for holding data. In roughly descending order of importance for an interview, the common data structures are:

**Array**

An array is the most straightforward way to hold a set of objects. It stores items in a simple list of objects. Looking up an object is fast if you know the index(position) but slow otherwise. For example, it’s fast to retrieve the 12th person in a list but slow to find all people named “Alex” (since you have to look through all people).

An array cannot “grow” in length after being created in most languages. You must specify the array’s length upfront, which cannot be changed afterward.

**Problem 1 — **Given a sorted array of positive integers with an empty spot at the end, **insert an element in sorted order**.

We can imagine that our array looks something like this (with a blank spot at the end):

1 4 7 8 9 _

If we need to insert an element like 6, we can’t just insert it at the end. We are supposed to put it in order.

1 4 6 7 8 9

This requires “shifting” all elements down to make space for 6 and then inserting it. There are two ways of approaching this problem:

**Approach 1: Shift From Back, Then Insert**

The first approach shifts all the elements over and then inserts the value x. We have to be careful not to overwrite values as we’re inserting. Instead of shifting from the front, we can move forward from the back.

1 4 7 8 9 _

We would first copy 9 into the empty spot. Then 8 into where 9 was. Then 7 into where 8 was, and so on. When we find the appropriate spot for x, we stop and insert x.

We return true if we could insert the element or false if there is an error.

**Approach 2: Swap Elements Moving Forward**

Alternatively, we could iterate forwards through the array. We don’t do anything for the initial elements in the array (those less than x). Those won’t be moved. However, when we find where x should be inserted, we swap x and the current element in the array. The value of x will now equal the old element in the array. When we get to the next element, we want to swap x for that value. We continue doing this for each element in the array until we get to the end.

insert 6 into 2, 3, 7, 8, 9, _

set x = 6

start i at A[0]

move i to A[1]

move i to A[2]

swap A[2] and x.

A = {2, 3, 6, 8, 9, _}

x = 7

swap A[3] and x.

A = {2, 3, 6, 7, 9, _}

x = 8

swap A[4] and x.

A = {2, 3, 6, 7, 8, _}

x = 9

swap A[5] and x.

A = {2, 3, 6, 7, 8, 9}

x = _

The following code implements this algorithm.

Note that once the if statement on line 7 becomes true, it will always be true.

Both algorithms will take **O(N) time**.

**Problem 2 — Reverse the order of elements** in an array (without creating a new array).

At first glance, we might want just to create a second array, iterate over the elements in order, and insert them in reverse order into the new array. Unfortunately, the question says not to create a second array.

Let’s look at any example:

Original: 0, 1, 2, 3, 4, 5, 6

Reversed: 6, 5, 4, 3, 2, 1, 0

You might notice that by reversing the array, we’re putting the 0 where the 6 is and the 6 where the 0 is. Likewise, the 5 and the 1 are put in each other’s places. That is, we’re swapping values!

We can iterate through the array rather than create a second array, swapping the left values with the corresponding values on the right. We only need to iterate through the left half of the array since the right half will have already been taken care of.

Be very careful with the arithmetic on lines 2 and 5.

The algorithm will take **O(N) time**.

**Hashtable**

A hashtable (sometimes called a “dictionary” or a “hashmap”) allows you to map a “key” to a “value.” This key is often a number or string, and the value can be any type of object. You might use it to map from a person’s ID number to some object with other information about them.

This is a very useful data structure because it allows for a very fast lookup. We generally assume that a hashtable is **O(1)** (constant time, regardless of the amount of data) to **insert and lookup elements**.

**Problem 3 — **Given two lists (A and B) of unique strings, write a program to **determine if A is a subset of B**. Check if all the elements from A are contained in B.

We’re told that the two lists contain unique strings, so we only need to check if all the elements in one list are contained in the other.

**Approach 1: Brute Force**

We can approach this by “brute force.” For each element in A, check if it is in B. As soon as we find an element in A that is not in B, we can return false because we know A is not a subset. We know we could find every element if we reach A’s end and haven’t returned yet. We return true.

This algorithm takes **O(a*b) time**, where a is A’s length, and b is B’s length.

**Approach 2: Hashtable**

The earlier approach is slow because we have to search through B for each element. Wouldn’t it be nice just to look up if an element is in B? This is what a hashtable allows us to do. We can build a hashtable of all the elements in B. Then, when we want to look up if an element is in B, we just use that hashtable.

This algorithm takes **O(a+b) time**, where a is the length of A and b is the length of B. It takes O(b) additional memory to hold the hashtable.

**Problem 4 — **You are given a two-dimensional array of sales data where the first column is a product ID and the second column is the quantity. Write a function to take this list of data and return a new two-dimensional array with the **total sales for each product ID**.

Example:

Input:

211,4

262,3

211,5

216,6

Output:

211,9

262,3

216,6

The output for this method needs to be a list of product IDs and their total counts.

We can straightforwardly do this by using a hashtable. We iterate through the list of (productID, quantity) pairs. For each value, we increment its entry in the hashtable or insert it if it’s not already in there. Finally, we convert the hashtable back into an array.

Don’t worry if you don’t know the specific commands for keySet and containsKey. Your interviewer shouldn’t care about things like this. The important thing is that you know how to translate an approach into something that resembles workable code.

This algorithm takes **O(N) time**, where N is the number of lines in the input.

**Graph and Tree**

A **graph **is a set of nodes that are connected through edges. Not all the nodes need to be connected — you could have two entirely separate subgraphs — and the edges can be either “directed” or “undirected.” A directed edge can be thought of as a one-way street, with an undirected edge being like a two-way street. If the graph is directed, an edge from v to w is not an edge from w to v. Therefore, you might be able to “drive” from node n to node m, but not the other way around.

A **tree **is a type of graph in which any two nodes are connected through one, and only one, path. A tree will not have cycles since there can only be one path between any two nodes.

A tree can come in many forms, but the most common is the **binary tree **by far. A binary tree is a tree where each node has only two child nodes. We call these nodes the left node and the right node. There cannot be any “cycles” on the tree (no paths from a node back to itself). Because of these restrictions, a binary tree can be represented in a strictly hierarchical fashion like this:

Commonly, we work with **binary search trees**. A binary search tree is a tree in which all nodes in the left subtree are less than the node’s value, which is, in turn, less than the values of all the nodes in the right subtree. The above tree is a binary search tree.

If a binary search tree is balanced (and usually we deal with balanced binary search trees), inserting an element, as well as finding an element, is** O(log n)**, where n is the number of nodes.

**Problem 5 — Insert an element into a binary search tree** (in order). You may assume that the binary search tree contains integers.

In a binary search tree, lesser values are put on the left of a node and greater values on the right.

The easiest way to implement this is recursively. Start with the root and compare the value you want to insert, x. If x is less than the root, call insert on the *root.left*. Call insert on the right side when x is greater than the root. Repeat this until you don’t have a left or right child. Insert x there.

The **time to insert a node will depend on the tree’s height.** If the tree is relatively **balanced**, it should have **height O(log N),** where N is the number of nodes in the tree. However, if the tree is **very imbalanced** (for example, basically a straight line down of nodes all on one side), the **height could be as much as N.**

**Problem 6 — **Given a **binary search tree** that contains integers as values, **calculate the sum** **of all the numbers.**

If we approach this problem recursively, it is surprisingly simple. In this case, the “right” perspective means.

Suppose we want to compute the sum of the nodes in a tree like this:

We could traverse the tree, collapse it into an array, and then compute the sum of those values. That’s a lot more complicated than is necessary. The simpler way is to think about the problem’s subproblems. The sum of the entire tree will be the *sum of the left subtree + sum of the right subtree + sum of the root*.

sum(tree_at_20) = sum(tree_at_10) + sum(tree_at_30) + value_at_node_20

Getting the sum at node 10 can then be defined in terms of its subproblems.

sum(tree_at_10) = sum(tree_at_5) + sum(tree_at_15) + value_at_node_10

We can almost directly translate this into code.

If we hit the end of a path (a null node), we return 0.

The runtime will be **O(N)**, where N is the number of nodes in the tree. One way to see the runtime is to realize that sum will be called exactly once for each node in the tree.

**Linked Lists**

Like a binary tree, a linked list is a data structure composed of nodes, where each node has a pointer to other nodes. The node only has a pointer to its next node in a singly-linked list. The node points to its previous and next nodes in a **doubly-linked list**. It is generally considered highly problematic and possibly a violation of the linked list structure if a linked list has a cycle.

**Inserting **a node into the **front **of a linked list can be done in **O(1) **time. However, if the **list is sorted** and you wish to **insert the node in order**, it will take **O(N)** time, where N is the number of nodes because you must first find the right spot, which requires searching through the whole list.

**Finding a node** in a linked list is **O(N)**, whether or not the list is sorted.

**Problem 7 — Insert a node into a sorted linked list.**

To insert a number in order into a linked list, we first need to find the right place to insert the node. Then, we need to insert it.

The tricky bit is figuring out how to handle inserting a node into the front of the linked list. Imagine we call an *insertInOrder *method that looks like this, and it (for this particular case) needs to insert n into the front of the linked list:

*void insertInOrder(LinkedListNode nd, int value)*

Just inserting node *n* and having *n.next* point to nd is not enough. Whoever uses the linked list doesn’t know that the real **head **of the linked list has been updated from *nd *to *n*. They only have a reference to *nd*. Therefore, in an insert method, you need to return the new head of the linked list. Most of the time, the head will be the same as before you called an insert. Sometimes it will change, though, and you need to notify the caller.

This algorithm takes **O(N) time**, where N is the number of nodes.

**Problem 8 — “Sort” a linked list that contains just 0s and 1s.** That is, modify the list such that all 0s come before all 1s.

**Approach 1: Build Two Linked Lists**

One of the simplest ways is to build a “zeros list” and an “ones list” and then join them at the end.

Observe that we need to return the new head of the linked list, as it might have changed.

**Approach 2: Count the Zeros**

We’re not required to use the same actual objects that we were given. If we moved values instead of nodes, that would fit the problem requirements. Therefore, we can just iterate through the linked list once, counting the number of 0s. Then, we iterate through it again, setting the first k values to 0 and the rest to 1.

In this approach, we’re moving values, not nodes. The actual reference to the head won’t change, so we don’t need to return anything.

**Approach 3: Swap the Values**

Since we only need to move the values, we can also iterate through the linked list, swapping the 0s and 1s as we find them. This approach works by two pointers, p, and q. The p pointer looks for 1s, and the q pointer looks for 0s. When they find their values, they swap.

1. Start p at the head.

0->0->0->1->1->0->1->0->1->0

p

2. Move p to first 1.

0->0->0->1->1->0->1->0->1->0

p

3. Start q at p.next

0->0->0->1->1->0->1->0->1->0

p q

4. Move q to next 0.

0->0->0->1->1->0->1->0->1->0

p q

5. Swap values at p and q.

0->0->0->0->1->1->1->0->1->0

p q

6. Repeat at step 4:

// move p to next 1

0->0->0->0->1->1->1->0->1->0

p q

// move q to next 0

0->0->0->0->1->1->1->0->1->0

p q

// swap

0->0->0->0->0->1->1->1->1->0

p q

// move p to next 1

0->0->0->0->0->1->1->1->1->0

p q

// move q to next 0

0->0->0->0->0->1->1->1->1->0

p q

// swap

0->0->0->0->0->0->1->1->1->1

p q

In other words, p always points to the first 1, and q points to the first out of place 0 (which is the first 0 after p). Whenever q finds a 0, we know the 0 is out of place. We swap its value with p and move p to the next node.

This approach might be the least intuitive for some people, but it leads to fairly short code with the use of a helper function — it leads to fairly short code.

These three approaches are all **O(N)**.

**Stack**

A stack is a data structure that defines a precise order for elements to be inserted and removed. When an element is added or “pushed,” it is inserted on the top of the stack. When an element is removed, it is “popped” from the top of the stack.

A stack is said to be a LIFO (last-in-first-out) data structure since the last (most recent) element added is the first one to be removed.

This way, it acts as a stack of plates in real life. When you add a plate onto a stack of plates, you add it on the top. When you remove a plate, you always remove it from the top.

**Inserting and removing** from a stack is **O(1)**. A stack is not a good data structure choice if you need to find an element with a particular value, as it would require removing all the elements, one by one.

**Problem 9 — **Write a function that takes a stack as input and **returns a new stack with the elements reversed**.

The most straightforward way is to create a new stack and pop the elements from the first stack onto the second. This will put the top element from the original stack on the bottom of the new stack.

The only problem with this is that our original stack gets completely emptied in the process.

If this is a problem, you can use an additional stack to hold all the popped values. We push the popped values onto both the temp and the reversed stack. (These stacks will have the same elements.) Once we’re done popping the elements from the stack, we push them back from temp onto the original stack.

Both approaches will have **O(N) runtime**.

The second one will go through two passes instead of one, but constants don’t affect the big O time. Remember that big O is not an expression of how many seconds something takes. It expresses how the time scales (in this case, linear) as the input size gets longer and longer.

**Problem 10 — **Write a function that **removes all the even numbers from a stack**. You should return the original stack, not a new one.

For this problem, we can rely on the same instinct as the second approach from the prior problem: reversing something twice puts the elements back in their original order.

We can just pop the stack, element by element. If the element is odd (that is, not even), push it onto a new, temporary stack. Then, once we’re all done, push them back onto their original stack.

This algorithm will take **O(N) time**. Since you have to go through every element, you can’t solve the problem faster than this.

**Queue**

A queue is essentially the opposite of a stack. Rather than removing the newest item with a LIFO (last-in-first-out) principle, it removes the oldest item. It is said to be “FIFO” (first-in-first-out) since the first item you add will be the first one you remove.

It acts as a queue (or line) in real life. When people are in a queue for movie tickets, the first person to get in line is the first person who will be served. This is, of course, how the data structure gets its name.

**Inserting **(or “enqueuing”) and **removing **(or “dequeuing”) from a queue is **O(1)**. As in a stack, it is not used for finding an element.

**Problem 11 — **Write a function to **check if two queues are identical** (same values in the same order). It’s okay to modify/destroy the two queues.

We are allowed to modify the two queues, which should give us a clue that we need to do just that. We can repeatedly remove the front of each linked list and compare the values. If the values are not equal, then we immediately return false.

What happens when one list is emptied? That depends. If both lists are empty, we know the linked lists are identical (nothing has failed yet). However, if only one list is empty and the other is not, we know the lists are different. After all, we’re removing the elements in the same order.

This algorithm takes **O(N) time**, where N is the length of the smaller list. Why the smaller? Because we exit as soon as either list is empty. That will happen to the smaller list first. It doesn’t matter how big the bigger list is; it won’t affect the runtime.

**Problem 12 — **Write a function to **remove the Kth element from a queue** (keep all the other elements in place and in the same order).

Observe that if we continuously remove elements from the front and add them to the back, we’ll wind up with the same list. Therefore, to remove the Kth element, we can just remove each element and re-add it — skipping the Kth element.

This algorithm takes **O(N) time**, where N is the number of nodes.