Computer Science for people who hate math — Big-O notation — Part 2

Debbie Lamont
5 min read · Dec 21, 2019


In my last post in this series, we defined what Big-O notation is, why it matters in computer science, and why you should care about it in your day-to-day work as a software engineer.

As a quick refresher, Big-O notation is just a standardized way of comparing the worst-case growth rate (and therefore the efficiency) of a given algorithm used to solve some type of problem.

In that post we looked at O(1), where no matter how big the inputs are, there is no "growth rate": whatever your algorithm does with those inputs takes the same number of steps however big they grow. We also looked at O(n), where the number of steps your algorithm takes increases in direct proportion to the size of your inputs.
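Here's what those two growth rates might look like in a quick Python sketch (these little functions are just my own illustrations, not anything from the last post; the point is the shape of the work, not what the code does):

```python
def first_book(pile):
    # O(1): grabbing the top book is one step, whether the
    # pile holds five books or five million.
    return pile[0]

def count_long_titles(pile):
    # O(n): we must look at every book exactly once, so the
    # number of steps grows in step with the size of the pile.
    longs = 0
    for title in pile:
        if len(title) > 20:
            longs += 1
    return longs
```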

I used a nice, concrete example to illustrate all this, and in this post we’re going to expand on it a little more, and yes, we will start looking at those tech interview classics — sorting algorithms — to help us understand more about Big-O notation.

So, let’s get started!

Remember that bookshelf with all the books on it that we were going to finally get organized? Here were the tasks that we needed to take care of:

  1. Create a space on the floor to put all the books. (This is our O(1) task).
  2. Pull all the books off the bookshelf one by one and put them in our working space on the floor. (This is our O(n) task).
  3. Sort the books into “keep” and “don’t keep” piles. (I added a step! And this, too, is an O(n) task, because as the number of books increases, the time this step takes grows by a fixed amount for each book added).
  4. Organize the “keep” books into alphabetical order and get them back on the bookshelf. (This is where we need to sort our books in the most efficient way possible).
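Those four tasks can be sketched in Python like this (the `want_to_keep` check is just a stand-in for however you'd actually decide, so I've left it as a function you pass in):

```python
def tidy_bookshelf(shelf, want_to_keep):
    # Step 1, O(1): clear a working space. One action, no matter
    # how many books we own.
    floor = []

    # Step 2, O(n): pull the books off the shelf one by one.
    while shelf:
        floor.append(shelf.pop())

    # Step 3, O(n): one keep/don't-keep decision per book.
    keep = [book for book in floor if want_to_keep(book)]
    dont_keep = [book for book in floor if not want_to_keep(book)]

    # Step 4: getting `keep` into alphabetical order is the
    # interesting part, and what the rest of this post is about.
    return keep, dont_keep
```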

Now, it’s this last step that we want to look at a little more closely because there are several ways we can alphabetize, or sort, our books. Depending on which method we choose, it can be done really efficiently … or really inefficiently.

So let’s look at our first sorting option.

One method we can use for sorting our books is one of the simplest to understand. We could just pick up the book at the top of the pile of “books to keep” and see if the title starts with an “A”. If not, we put it back in the same spot in the pile, pick up the next book down in the pile, and see if the title of that one starts with an “A”; and we keep going until we find a book that has a title starting with “A”. Then we’d repeat for the letter “B”, then “C”, and so on until we either run out of letters or books.
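Here's that letter-by-letter approach as a Python sketch. To keep it simple, I'm sorting strictly by the first character of each title, articles and all, and making one full pass over the pile per letter:

```python
import string

def alphabetize_by_scanning(pile):
    # For each letter A to Z, scan the WHOLE pile and pull out
    # any titles starting with that letter.
    sorted_books = []
    for letter in string.ascii_uppercase:
        for title in pile:  # a full linear scan, every single time
            if title.startswith(letter):
                sorted_books.append(title)
    return sorted_books
```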

We can put some concrete data in here to make it easier to think about.

Suppose we have just five books in our “keep” pile. The titles are:

  • War and Peace
  • Oliver Twist
  • A Midsummer Night’s Dream
  • Pride and Prejudice
  • The Time Machine

These five books are in our “keep” pile in the same order as listed above, with War and Peace at the top of the pile, and The Time Machine at the bottom.

To alphabetize the books, we’d want them in this order:

  • A Midsummer Night’s Dream
  • Oliver Twist
  • Pride and Prejudice
  • The Time Machine
  • War and Peace

Our first step would be to find any books starting with A.

War and Peace is our first book, and it doesn’t start with A, so back on the pile it goes.
Oliver Twist is the second book down, and also doesn’t start with A, so back on the pile it goes.
A Midsummer Night’s Dream is our third book and it does start with A, so we put it at the top of our “keep” pile.

Now our books are organized like this:

  • A Midsummer Night’s Dream
  • War and Peace
  • Oliver Twist
  • Pride and Prejudice
  • The Time Machine

We’d then search for any books starting with B, then C, and so on.

It took us three steps to find our first book, and it started with the letter “A”! But what if A Midsummer Night’s Dream had been at the bottom of the pile? We would have had to search through all five books to get to the one we were looking for! Every time we’re looking for a book starting with a given letter, our worst-case scenario is that we’ll have to search through ALL the books. This linear search is therefore O(n): as the number of books grows, the time each search takes grows in direct proportion.
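In code, one of those single-letter passes might look like this (I've added a `checks` counter just to make the cost visible; in the worst case the loop runs once per book, which is exactly where the O(n) comes from):

```python
def find_book(pile, letter):
    # Check each book from the top of the pile down, counting
    # how many books we had to look at along the way.
    checks = 0
    for position, title in enumerate(pile):
        checks += 1
        if title.startswith(letter):
            return position, checks
    return None, checks  # worst case: we checked every book
```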

For us humans, dealing with only five books, we probably wouldn’t ever use a linear search method for alphabetizing them. After all, we only have five starting letters to worry about. (Or, if the starting letters match for two or more books, we’d have to look at second or even third letters … but let’s keep it simple.) And wouldn’t we pull each book out into a separate pile every time, so the original pile would get smaller and smaller as we went?

Remember that we’re talking about computers (not humans), and the computer program (this algorithm) may have to process lots of different piles of books: some with only two books, some with two hundred, maybe even two million. There is no way for a computer, dealing with unpredictable data, to know ahead of time that it only has to look for five letters, or three, or twenty, or even how many books it’s going to have. As programmers, we might have some idea based on knowledge of specific data, but we always have to ask whether the method we’re writing will only ever be used in the way we originally intended. (I’ll answer that for you: almost never).

However, linear search, although easy to understand, is not really the best option we have here for searching, even if we, as programmers, don’t know anything about the actual data we’ll be getting. We do know what “shape” the data will be in, and there are better ways of sorting it than using a brute force kind of approach like this.

What’s Next?

Was this helpful? Was there anything you didn’t understand? Did you spot any typos or things I could have explained better? Leave a comment and let me know!

In part 3, we’re going to look at binary search and use it to help us understand O(log n), which falls somewhere between O(1) and O(n) in terms of efficiency.

If you missed it, you can read the first part of this series here:
Computer Science for people who hate math — Big-O notation — Part 1
