What is an algorithm?
As a programmer, it is important to have a wide arsenal of tools at one’s disposal. This may include programming languages, text editors, packages, etc. Perhaps most importantly, it is crucial to have a firm understanding of the various types of algorithms that exist. These will becomes one’s most important set of tools in tech interviews and in the workplace.
For those who do not know, an algorithm is not a challenging concept to understand. Algorithms are simply well-defined, step-by-step solutions to problems. For example, there is an exercise in many computer science classes where students are asked to explain how to make peanut butter and jelly sandwiches. Shockingly, the process of doing so is an algorithm. An MIT summer camp handout provides insight as to what this algorithm might look like:
1. Pick up a slice of bread
2. Open the jar of peanut butter by twisting the lid counter clockwise
3. Pick up a knife by the handle
4. Insert the knife into the jar of peanut butter
5. Withdraw the knife from the jar of peanut butter and run it across the slice of bread
6. Take a second slice of bread
7. Repeat steps 2-5 with the second slice of bread and the jar of jelly.
8. Press the two slices of bread together such that the peanut butter and jelly meet
As you can see, each step is incredibly explicit, leaving no room for error. This is because algorithms are typically interpreted by computers, which are objectively dumb. For instance, say you give a computer a list of 7 things and ask for the 8th item. An intelligent human would merely reply, “There are only seven things on the list, thus I cannot give you the eighth thing.” A computer, however, would likely break.
On the plus side, computers are fast, which allows them to run really complicated algorithms more efficiently than humans ever could. For instance, a computer can sort a list of 100 things in a fraction of a second, provided it received explicit instructions detailing how to do so.
What types of algorithms exist?
Computers are asked to do a wide variety of tasks, and are expected to do them quickly. Sorting a contact list by each contact’s last name, searching for a specific video on YouTube, the list goes on. Since before the first computer was ever invented, scholars were researching algorithms: designing novel, efficient ways to solve problems. I’ll go over two of the more well-known algorithmic paradigms:
- The brute force approach
Imagine you are looking for the word “enigma” in the dictionary. The intelligent person would simply flip to the section containing words with the letter e, and go from there. Pretty efficient, right? Unfortunately, computers do not understand english too well. Thus, one may propose using the brute force approach to solve this problem.
The brute force approach would entail looking at every word on every page, until the word enigma came up. This is quite an inefficient task. In fact, we can say that, if there are n words in the dictionary before the word enigma, we would have to perform n operations (an operation, in this case, would be checking to see if we reached the word enigma) before finding the correct word.
Let’s imagine the worst case scenario. Say you wanted to search for the word “zzzzzz.” Obviously, this word does not exist, but what if you wanted to double check? Using the brute force approach, this would require you to flip through every page, looking at every word, until you reach the last word in the dictionary (Zyzzyva) in order to determine that “zzzzzz” is not a word. Similar to the enigma example, this would require n operations if we assume there are n words in the dictionary. There must be a more efficient way to solve this problem.
- Divide and conquer
Now let’s suppose that the dictionary you are looking at is ordered, as most are. This means that you could split the dictionary into pieces and determine if the words in each piece occur before or after “zzzzzz” alphabetically.
The workflow would look like this :
1. Find the center of the dictionary2. Determine if "zzzzzz" is at the center. If so, our job here is done3. If not, determine if "zzzzzz" comes before or after the center word in the dictionary alphabetically4. If "zzzzzz" comes before the center, repeat these steps with the first half of the dictionary. Otherwise, repeat these steps with the second half of the dictionary
Eventually, you will split the dictionary so many times that you will be left with either one or two words. In either case, the process of determining if any of the remaining words is “zzzzzz” is trivially easy. Unlike with the brute force approach, you were able to reach this point while skipping a large portion of the dictionary.
The approach of dividing a workload into smaller and smaller pieces falls under the name divide and conquer. This algorithmic paradigm utilizes what is referred to as recursion, where a problem is defined in terms of a simpler version of itself. In this case, the problem is searching for “zzzzzz” within the whole dictionary. The “simpler version” would be searching for “zzzzzz” within a smaller section of the dictionary.
Just how much more efficient is this solution? In order to determine this, we need to calculate how many operations we performed. For simplicity, we will say that steps 1 through 4 in the workflow above count as one operation. We can do this because the amount of trivial operations performed at each step do not contribute much towards inefficiency: it’s the amount of steps taken that make code run slow.
So at each step, we perform a single operation. How many steps are we taking? Well, at each step we are either finding “zzzzzz” or are dividing the size of the dictionary in half. Thus, if the full dictionary contains n words, the amount of words that are disregarded at each step are:
n/2 -> n/4 -> n/8 -> n/16 -> n/32 -> .... -> 1
With a little math manipulation, we eventually realize that as opposed to requiring n operations in the worst case, we now only require log n. So, if the dictionary contained 1,000,000 words, the brute force approach would require 1,000,000 operations in the worst case, where divide and conquer would only require about 20. Not bad!
Note that the log, in this case, would be log base 2. However, similar to how the amount of operations performed at each step can be disregarded, so can this. For an explanation of why, feel free to check out this link: https://cs.stackexchange.com/questions/78083/why-is-the-log-in-the-big-o-of-binary-search-not-base-2
Why are we only concerned with the worst case?
After talking so much about worst case scenarios, computer scientists probably sound like a group of chronic pessimists. However, this could not be further from the truth.
We worry about the worst case scenario because it is our hope to minimize the negative effects (or the probability of experiencing) the worst case. The negative effects can include anything from excessive memory consumption, a long run-time, or potentially stalling ad infinitum.
There is a certain notation tied with analyzing worst case runtimes. It is called Big O notation. In reading this article, you already know the Big O efficiency of two major searching algorithms:
- The technical name of the brute force searching strategy I outlined is Sequential Search. Recall that, in the worst case, this algorithm will require n steps/operations. Thus, the Big O efficiency of this algorithm is written as O(n)
- The technical name of the divide-and-conquer strategy is Binary Search. Since it required log n operations in its worst case, its Big O efficiency is O(log n)
And there you have it! You now know two terribly inefficient methods of searching through the dictionary. You also now know how to measure and describe their efficiency. I sincerely hope, however, that you decide to just Google the definitions of words.
Thank you for taking the time to read this. Please let me know if you found this helpful, or if you have any suggestions of computer science topics I should cover in the future. Happy hacking!