Search Algorithms in Ruby

Michelle
8 min read · Sep 11, 2017

--

Having tackled some common sorting algorithms, next up was search. We use search for nearly EVERYTHING, so it’s clearly a very important area of study. So, from the lowest, most fundamental level: how does search work?

Linear Search

This is the rough let’s-get-this-done search algorithm; it gets the work done, but it’s not very efficient. Imagine you have a business card and you’re trying to find this contact, “Jon Snow”, in a phone book. Maybe you start at the beginning of the “J” section and maybe you skip around a bit, but you’re still comparing each entry in the phone book to the name on the card until you find him. This is linear search: comparing the key (the item to find) to each item in a list until it has been found. Clearly not very efficient, because in the worst case you touch every single item in the array — especially painful if the item you’re looking for is at the very end of the list. But it is relatively simple to understand.

Ruby Implementation:

  1. Set the method to take in an array and a key (item you’re looking for).
  2. If the key is not found, return -1.
  3. If the key is found, print a statement in the form “key at index”.

With the sample array, the output is “3 at index 6”.

Ruby might not be the best language to show how slow this algorithm can be, since the Array#index method and its under-the-hood magic returns results almost instantly. So I wrote it a different way to try to show the comparison of the key with every single element of the array.
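A minimal sketch of that element-by-element version (the sample array here is my own stand-in, chosen so that 3 lands at index 6 to match the output above):

```ruby
def linear_search(array, key)
  array.each_with_index do |element, index|
    # Compare the key against every single element, one at a time.
    return "#{key} at index #{index}" if element == key
  end
  -1 # the key is not anywhere in the array
end

puts linear_search([8, 1, 4, 9, 7, 2, 3], 3) # => "3 at index 6"
```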

Binary Search

If linear search is the brute-force version of search, binary search is the sibling that knows how to work smarter, not harder — but only under certain conditions. Binary search works only on sorted arrays; it divides an array into two subarrays, looks for the key element in one of the two halves, and ignores the other half. At each step, half of the remaining elements are eliminated.

There are two ways to write binary search: recursive (in which the method will call binary_search on a subarray) or iterative (in which the method will continue to run within a loop condition). I started with the iterative version.

Ruby Implementation:

  1. The method will take in two parameters: an array and a key value that you’re looking for.
  2. Create two new variables low and high and set them equal to 0 and array.length - 1, respectively. These variables represent the beginning and ending points of a range. (We use array.length - 1 because arrays in Ruby start at index 0, so the last index is actually length - 1.)
  3. Set the method to run only during the condition low <= high .
  4. Create a variable mid to store the middle index that will divide the array. We can calculate it as (low + high) >> 1, which gives the same result as integer division by 2. At first, I thought to use (low + high) / 2 or even array.length / 2 to calculate the mean of two numbers, and in Ruby that’s actually safe, but as the Rosetta wiki on binary search outlines, in languages with fixed-width integers the sum low + high can overflow (integer overflow happens when a math operation attempts to create a number that is outside the range of values that can be represented with a specific number of bits), so the classic defensive form in those languages is low + (high - low) / 2. Ruby’s integers grow as needed, so this bug can’t bite here, but it’s worth knowing. To fully understand the >> syntax, I had to dive into Ruby’s bitwise operators. The linked article explains it much better than I ever could, but essentially, an integer is treated as a sequence of 1 and 0 bits, and the right shift operator >> shifts the integer’s bits to the right by a given number of positions (in this case, 1 position), which halves the value.
  5. Ruby’s spaceship operator <=> compares value A with value B: if A > B, it returns 1; if A < B, it returns -1; and if A == B, it returns 0.
  6. We use a case statement on key <=> array[mid] to check the relationship between the key and the middle element. If key > array[mid], we set the new lower bound low to mid + 1. If key < array[mid], we set the new upper bound high to mid - 1. If key == array[mid], we return mid as the index where the key was found. Otherwise, we repeat the loop with the new lower or upper bounds.
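Putting those steps together, the iterative version might look something like this (the sorted sample array is my own):

```ruby
def binary_search(array, key)
  low = 0
  high = array.length - 1

  while low <= high
    mid = (low + high) >> 1 # same as (low + high) / 2

    case key <=> array[mid]
    when 1  then low = mid + 1  # key is in the upper half
    when -1 then high = mid - 1 # key is in the lower half
    when 0  then return mid     # found it
    end
  end

  -1 # key is not in the array
end

sorted = [2, 3, 7, 12, 19, 25, 31]
puts binary_search(sorted, 19) # => 4
```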

Depth First Search

The next two sets of search algorithms focus on traversing tree and graph data structures. Graphs are a collection of nodes with relationships (edges) between them. We’re less concerned with the geometric, high-school kind of graph and more focused on the relationships between the nodes. Directed graphs show relationships with arrows; node A may have a relationship with node B while node B has no relationship with node A (the arrow points one way, from A to B). Depth First Search is the process of finding whether or not a path exists from one node to another by going as deep as possible along one line of nodes before backtracking.

A real-life application of this algorithm could be navigating through a maze; when faced with multiple forks in the road, you can only choose one path to go down, and you must follow it to its end to determine whether or not there is an exit there. If there is no exit, you have to retrace your steps back to the fork and try a second path. But how will you know whether or not you’ve already walked a certain path? In the maze, you could drop items to make a breadcrumb trail; in the algorithm, you need a flag for each element to indicate whether or not it has been visited. Without breadcrumbs or flags, you could end up retracing the same steps over and over and over in an infinite loop. For DFS in graphs, we’ll make use of matrices and stacks to implement the search.

Big picture:

  1. Visit an adjacent, previously unvisited vertex. Flag as visited. Push the vertex element to a stack.
  2. Continue with step 1 until no adjacent vertex is found. Pop the last vertex from the stack.
  3. Repeat step 1 and step 2 until no more vertices exist in the stack.

The algorithm makes use of a few things I haven’t touched upon before, including an adjacency matrix. To recap, a stack behaves in a LIFO fashion (the last item added is the first item removed).

Pseudocode steps:

  1. Create an adjacency matrix by mapping out whether or not there are paths from vertex to vertex. If there is a path (edge) from Vertex A to Vertex B, place 1 in the cell matrix[A][B]; else place 0. Do this for all vertices in the graph until the matrix is complete.
  2. Create an array for visited indices and set value of each index to 0 (has not been visited yet).
  3. Create a stack to push and pop visited vertices. The stack will keep track of which nodes to backtrack.
  4. Start with a vertex and set it equal to i.
  5. If visited[j] == 0 AND matrix[i][j] == 1 (which means the vertex has not been visited yet and there is an edge from vertex i to vertex j), then the next item to be pushed to the stack is j.
  6. If the above condition is not met, increment j and repeat step 5.
  7. When the condition in step 5 is met, we push that item to the stack and set i equal to that index.
  8. Each time a new i is set, j gets reset to 0.
  9. Repeat steps 5–8, but this time, if all of the indices in the visited array have a value of 1 (meaning every reachable vertex has been visited), the top value of the stack is popped and i is set to the new top value of the stack.

Ruby Implementation

note: I apologize. I took a shortcut. I didn’t build my own adjacency matrix. In Ruby this kind of data structure can be represented by an array of arrays, so I’m just passing it into the method as a parameter.

  1. The method will take 3 parameters: the adjacency matrix representing vertices and common edges, a source index (where the search starts), and a terminal index (the item you’re looking for).
  2. First create a Ruby array to represent our stack by adding the source index to the array. To simulate stack behavior in an array, we’ll only work with the pop method (deletes last item in array) and the push method (adds items to end of array).
  3. Create a loop that, on each pass, pops the last item off node_stack into current_node, and terminates on certain conditions; otherwise it keeps running. The loop ends if current_node is nil (no more nodes to go through) or if current_node is equal to the node we’re looking for.
  4. Otherwise, we will create an array of child nodes for each current_node that we visit. Use (0..adj_matrix.length - 1) to create a range of indices (length - 1 so we don’t run out of bounds!) and call .select on it to return an array of children for which adj_matrix[current_node][i] == 1 (a 1 at this intersection means there is a path from current_node to i). Ruby’s .select method returns a new array of all elements for which the passed block is true.
  5. Update the node_stack by adding the array of children to it.
  6. The loop will continue to run until it reaches the above conditions in step 3.
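A sketch of those steps — the example graph and the true/false return values are my own additions (the post only describes checking whether a path exists):

```ruby
# The adjacency matrix is passed in as an array of arrays (1 = edge, 0 = no edge).
def dfs(adj_matrix, source_index, terminal_index)
  node_stack = [source_index]
  visited = Array.new(adj_matrix.length, 0)

  loop do
    current_node = node_stack.pop        # take the last node added (LIFO)
    return false if current_node.nil?    # stack is empty: no path exists
    return true if current_node == terminal_index

    next if visited[current_node] == 1   # skip nodes we've already seen
    visited[current_node] = 1

    # Children are the indices reachable from current_node.
    children = (0..adj_matrix.length - 1).select do |i|
      adj_matrix[current_node][i] == 1
    end
    node_stack += children
  end
end

# Directed graph: 0 -> 1, 0 -> 2, 2 -> 3
matrix = [
  [0, 1, 1, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 1],
  [0, 0, 0, 0]
]
puts dfs(matrix, 0, 3) # => true  (path 0 -> 2 -> 3)
puts dfs(matrix, 1, 3) # => false (no edges leave vertex 1)
```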

Breadth First Search

Imagine a scenario in which the node you’re looking for is just one node down but on a neighboring branch. If you use DFS, you would’ve gone all the way down one branch and backtracked upwards before finding the node you want. That would be pretty inefficient! Luckily, there’s Breadth First Search. Instead of searching for an element by going deep into a single path, Breadth First Search checks all neighboring elements of a node first before visiting the next level down. We will still use an adjacency matrix to conduct this search, but now we’ll use a queue instead of a stack. To recap, a queue has FIFO behavior: the first element added is the first element removed.

General breadth first search steps:

  1. Start with a vertex, add it to the queue, and mark it as visited.
  2. Dequeue the first visited element.
  3. Use the adjacency matrix to find vertices connected with the first element, add them to the queue, and mark them as visited.
  4. Dequeue the elements.
  5. Repeat steps 3 and 4 until all nodes have been visited and the queue is empty.

Pseudocode Steps:

  1. Repeat essentially the same as above for DFS, except we will implement a queue instead of a stack.

Ruby Implementation

  1. Since we’re using Ruby arrays to mimic stacks and queues, we can reuse the structure of the DFS algorithm and change only how new elements are added to the node_queue. Instead of appending the children to the end of the node_queue, we will push the children nodes to the front of the node_queue while continuing to remove elements from the end of the array, which gives us the FIFO behavior of a queue.
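In other words, the same sketch as DFS with one line changed — children are prepended rather than appended (again, the example graph and return values are my own):

```ruby
def bfs(adj_matrix, source_index, terminal_index)
  node_queue = [source_index]
  visited = Array.new(adj_matrix.length, 0)

  loop do
    current_node = node_queue.pop       # still remove from the end...
    return false if current_node.nil?   # queue is empty: no path exists
    return true if current_node == terminal_index

    next if visited[current_node] == 1
    visited[current_node] = 1

    children = (0..adj_matrix.length - 1).select do |i|
      adj_matrix[current_node][i] == 1
    end
    node_queue = children + node_queue  # ...but add children to the front (FIFO)
  end
end

# Same directed graph as before: 0 -> 1, 0 -> 2, 2 -> 3
matrix = [
  [0, 1, 1, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 1],
  [0, 0, 0, 0]
]
puts bfs(matrix, 0, 3) # => true
puts bfs(matrix, 1, 3) # => false
```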

These four took a bit of time to get through, especially getting familiar with graph structures and visualizing how all the pieces were moving. And the search for searching continues!
