Algorithms & Data Structures: String Searching

Introduction

Recently, I’ve decided to revisit some core computer science fundamentals and take a look at some of the most common algorithms and data structures. My background in college was MIS so we didn’t study much computer science theory or algorithms and data structures. Since I’ll be on the job hunt in the next few months, I thought it’d be a great idea to revisit some topics I had learned throughout the years and on the job. This post and subsequent posts titled “Algorithms & Data Structures I & II” are a review of the video series on Pluralsight but rewritten in Ruby. You can view the series here, assuming you have a Pluralsight account,https://www.pluralsight.com/courses/ads-part1 &https://www.pluralsight.com/courses/ads2

Topic

In this following post, we cover two popular string searching algorithms, the Naive search algorithm and the Boyer-Moore-Horspool algorithm. The process is simple; we pass in a string to find and a string to search. We’ll examine the implementation of both and also look at the differences when comparing the two.

Naive Search

Believe it or not, but you may have used the naive search implementation before and never noticed it. The process is as follows:

  1. Loop through the length of the string to search
  2. Set a variable called match count to 0
  3. For each iteration, while the position at match count + 1 in our string to search is equal to the character in the string to find AT the position of the match count, increment the match count + 1
  4. Repeat step 4 until match count equals the length of the string to find
  5. If step 5 returns true, we know we’ve looped through each character in our string to find and we’ve found our match

All it is a for loop with a nested while loop inside. The nested while loop keeps checking against the string to search using the match count as our position in our comparison. We can even display a message with how many loops it took to find our string. In the example above, it took a whopping 562 loops. Next, let’s take a look at an algorithm that can handle string searching larger strings more efficiently.

Boyer-Moore-Horspool

The Boyer-Moore-Horspool algorithm is intended to search the string and skip as many iterations as possible. This is done in two steps. Step 1 consists of using something called a “bad match table” to compare against the string to search. Step 2 is the actual comparison being done. Below is a snippet from the Pluralsight video that details how exactly the bad match table is created and how the comparisons are made.

The above video should give you the requirements you need for implementing step 2. Below is the full code to guide you through. There’s a lot of moving pieces so take it slow and really understand what each step is doing.

With the Boyer-Moore-Horspool algorithm created, we can see the first test took 83 loops and the second test took 56 loops. So in actuality, the longer our string to find is, the less loops we go through. Remember, in our naive search it took 562 loops no matter what.

Conclusion

We’ve taken a look at two different string searching algorithms, and also noted the differences between the two. To play around and test it out yourself, check out this repl.it https://repl.it/CZAE/16

Like what you read? Give Jon Shyu a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.