How I taught myself sorting algorithms and Big O in just 3 days, and you can too
To all the developers who may read this, I’m sure you’ll find a few errors. In fact, I hope you do: I welcome your corrections in the comments. Thank you!
Nothing compels the human mind to learn something new more than curiosity and raw necessity.
And so it was that three days ago I was compelled to learn about algorithms; something called the Big O; data sorting; binary search (and trees); and more.
(OK technically, because it took me the better part of two days to write this up, I should clarify: Today is Friday. Monday morning, I had no idea what any of this stuff was; by Wednesday I’d learned everything I’m writing about here and was inspired to whip up a first draft late that evening; last night I did a first round of edits; and I’ve spent the last two hours this morning editing, expanding, and finalizing.)
First, a bit about me: a technically-minded-yet-decidedly-non-developer
I’m not a developer. Yes I have a solid foundation of math and science concepts (mostly physics). Yes I’ve been a founder and head of product for two startups over the past seven years. And yes I have a pretty solid grasp of fundamental web technologies (HTML, CSS, some JS; I’m learning my way around Bootstrap and this awesome Bootstrap editor), and I’m pretty comfortable finding my way around Chrome’s Inspector, mocking up styling and copy edits with aplomb, and generally learning what does what.
In fact, I even built, singlehandedly, Twibble’s homepage — and all ancillary pages — using a (heavily modified) Squarespace template with perhaps several pages’ worth of custom CSS and third party JS snippets for things like Google Analytics; Optimizely; Mixpanel; of course Font Awesome for some icons; and so much more besides. Our pricing page was especially challenging for me to learn how to style, but this tutorial was super helpful.
I also taught myself and published, in just one afternoon, a cool little Chrome plugin for our HashTest.io real-time, color-coded #hashtag discovery tool. (Yes it’s basically just a super simple iframe running HashTest, but still. With about 2000 installs and about 4.7 stars, I’m pretty pleased for my first ever Chrome plugin.)
I took PIC10A at UCLA to grab some basic C++ skills (long since forgotten). I completed numerous tutorials online — I loved SoloLearn especially for CSS, JS, and, along with Py, Python, too — but I still have lots to learn: while I can solve the exercise problems with near perfect accuracy, I still can’t code from scratch.
And while I can explain the ins and outs of computers, mobile devices, and the internet generally — even what happens when you type a URL into Chrome, say, including things like SYN/ACK and so on — the truth is, I’m absolutely not a bona fide developer.
And now I’m just starting to dip my toes — fingers? — into Angular, Node.JS, and Express, too.
But what about the more foundational stuff like, say, algorithms?
And that’s when it hit me: I didn’t know what the heck an algorithm really is beyond a vague, ambiguous, layperson’s understanding.
Some googling later, and it turns out I have no idea what this ridiculous Big O thing is either. Or sorting. Or binary (tree) search.
So then. It was time to learn.
Many (many) browser tabs, wikipedia pages, paper scribbles, and YouTube videos later, however, spanning just three days, I’m pleased to say that I finally understand all of these topics at a pretty passable level now, even if I can’t yet do any of them on my own. (I am getting (marginally) better with pseudocode, at least.)
Specifically, I’ve now learned about Big O; sorting algorithms including bubble sort, merge sort, insertion sort, and quick sort; data structures including arrays, stacks, queues, and binary trees; and binary search.
(As an aside, in case you’re wondering, like I was, and keep googling the heck out of it, binary trees are a type of data structure comprised of nodes and no more than two children per node; binary search tree is a binary tree organized in a specific way (items larger than the node on the right; items smaller than the node on the left); and binary search (not any type of tree but which utilizes the concept of binary search trees) is a super awesome fast way to find a single value in a huge — but sorted — array. There’s a great discussion clarifying all this here and here.)
So in effort to give back to the (extraordinary) community from which I’ve learned so much, so quickly, I thought it would be appropriate — and hopefully helpful, too — for me to try to teach back what I’ve learned, if only to provide the perspective from someone not a developer.
So let’s get started. I promise to keep this short(ish), sweet, succinct, to the point, easy, and of course, fun.
Also, I’m not going to write any code. Because I’m assuming that, like me, you’re not a developer, either. And besides, please remember, I still can’t really write any proper code yet, anyway.
I hope this helps you learn something too.
What’s an algorithm?
It’s a fancy word for a function in a computer program that does something.
And what is Big O notation?
This is probably what you googled or why you started reading this in the first place.
Well to start with, the good news is that Big O is not hard to understand. It’s not rocket science. It’s actually super easy to learn. And you can learn it without knowing any code whatsoever, just like I did.
So here we go.
Big O is just a thing that shows how long, in a worst case scenario, it could take an algorithm to run.
(Aside: Algorithms do not run instantly; no computer program does. They take time. So the question becomes: how long will this algorithm take to run, in a worst case scenario? (It doesn’t matter if it runs quicker than the worst case; that would just be a bonus.)
More specifically — mathematically — it places an asymptotic upper bound on the time it could take for the algorithm to run.
So Big O is just a function that plots the time for the algorithm to complete, against the number of elements on which that algorithm is operating.
(Another aside: When I say “elements,” just imagine stuff lined up; perhaps sorted in some order, perhaps not. For example, physical papers in folders. Numbers in a list. Popsicles lined up in their freezer trays. Whatever.
Programmatically, to be precise — at least for this discussion — we’ll be talking about numbers in an array; a list, if you will. Some sorted, some not.)
So time is along the y-axis; the increasing number of elements in our list of stuff, n, is along the x-axis. (I have some pretty pictures below to help illustrate this, don’t worry.)
Just remember that the algorithm will probably be able to run in less time than the Big O function indicates; the important thing is that it will never run longer than that.
Why does this matter?
Because you need to know whether an algorithm can work its magic in seconds, days, minutes, hours, or millennia. You don’t want to wait until our Sun has gone nova and swallowed the Earth whole while your algorithm is still just 17% complete.
That would suck.
I don’t get it…
Some algorithms are fast. Some are slow.
Some are fast when dealing with a few things but slower when you give them more things to do.
Some are slow when dealing with a few things, but faster when you give them more things. Those are awesome algorithms.
For example, suppose you want to find a random number in a list of 10 numbers; and suppose this random number happens to be in the 10th spot. But of course you don’t know this. (That’s the whole point.)
Well you could easily write an algorithm that checks each and every number from the first to the tenth number; it would take 10 checks, right?
Just think about it: it would start at the first number and check it; then move to the second number; then then third, and so on, until finally finding it at the tenth position. Something like this:
1 (nope), 2 (nope), 3 (nope), …, 9 (nope), 10 (yeah, that’s the one!).
With pretty much any computer, this will be calculated almost instantly.
But believe it or not, if you had say, a million numbers; or a trillion numbers… well this will actually take a really long time, going one number of the other.
I mean sure, if the number you were looking for just happened to be in one of the first few spots, well great, that would be awesome!
But if we’re assuming a random list of tons of numbers, then the chance of it appearing randomly in the first spot, say, is like, 1/[tons of numbers], right?
In fact, as you can probably see intuitively then, the amount of time it would take to run through each and every single number in the list just to find that one number you’re looking for would increase proportionally to however many numbers you had in the list.
So scanning through 10 numbers might take, say, 10 milliseconds; but scanning through 1 billion numbers would take 1 billion milliseconds, which is like, 11 days.
On the other hand, what if you could do this more efficiently? What if I told you to just guess where the number is. No algorithms. Seriously, let’s try it.
Suppose in a list from 1 to 100, I’m thinking of 47. But of course you don’t know this.
So you guess 50, subconsciously yet intuitively soundly deciding to start right in the middle of that list of 100 numbers.
I say nope, too high. So you eliminate 50 to 100: if 50 is too high already, then you know all numbers greater than 49 are too high.
So that leaves just half of what you had before: 1 to 49.
So now you guess 25. I say nope, too low.
Ok, so now you know you can eliminate 1 to 25 leaving only 26 to 49.
Now you guess something roughly between 26 to 49, say, 38.
I say nope, too low again. So now you eliminate everything below 38 and you’re left with only 39 to 49.
This time you guess 45, at which point I say again, nope, too low, but this time, you’re now left with only 46 to 49 as your possible guessing range, and finally you succeed by successfully guessing 47.
So how many steps (guesses) did this take? Five. Just five guesses.
In contrast, however, imagine if you checked each and every number sequentially from 1 all the way up until you hit 47. Well, that would have taken you 47 steps instead of 5. So the guessing method was like 9 times faster or more efficient, right?
The point is, if you had to find one number out of our previous list of 1 billion sorted numbers, this “divide and conquer” approach would similarly be way more efficient than just going one number at a time.
So the point is: For an algorithm acting on a small number of things — say numbers in a list — the type of algorithm you use probably won’t matter. Just use whatever is fastest and easiest. But for a large number of things — say finding 1 number in a list of 1 billion numbers — well now it’s going to matter a lot what sort of algorithm you use, and the choice could mean the difference between seconds or days… or even years.
And this is what the Big O notation is all about.
So... what’s the Big O notation, finally?
It’s a way to understand — in relative terms — how efficient an algorithm is with larger and larger sets of things (numbers, popsicles, whatever).
From our discussion above, we’d basically say something like this:
The Big O of searching for the needle in the haystack one number at a time is way bigger, or way worse than searching for it using that cool “divide & conquer” method where you keep splitting the list of numbers into smaller and smaller halves.
If any of this is sounding vaguely familiar mathematically, great! Here’s how we’d describe the previous examples in math:
The time required for the slow step-by-step method increases linearly as the number of elements n in your list increases; so we would say this algorithm operates in O(n), pronounced “big oh of n” or “on the order of n.”
The fast, divide-and-conquer method is actually going faster and faster over time; so we would call this O(log n), pronounced “big oh of log n” or “on the order of log n.”
And those mean pretty much what you think they do. Still not seeing it? Try this instead:
Replace the O with y= and the (n) with x.
Now you have:
O(n) → y=x
O(log n) → y=log x.
Or alternatively, you could think of them in terms of f(x) such that:
O(n) → y=x → f(x)=x
O(log n) → y=log x → f(x)=log x.
Is this starting to make sense? Why don’t you try graphing these yourself; just type them into Google:
See what’s going on here?
The red line is the algorithm described by O(n) and the blue line is the algorithm described by O(log n).
Basically we just showed that one algorithm — the “divide & conquer” one — grows at the rate of O(log n) or f(x)=log x if you prefer; the other one grows at the rate of O(n) or f(x)=x. (Potentially. Remember, these are worst case scenarios only.)
And guess what. There’s other types of algorithms too.
Some remain constant regardless how many elements you’re working with. These are called O(1).
Others grow exponentially and are called — you guessed it! — O(n²).
And just for fun, there’s some totally silly ones that grow stupidly fast: O(n!). (Actually, they’re not that silly: they’re used for solving those so-called “traveling salesman” problems when you try to figure out all the possible combinations to travel between a bunch of cities.)
Take a look:
Green is O(n²).
Red is O(n).
Orange is O(1).
Blue is O(log n).
So to be super clear, the vertical y-axis is showing time (say, seconds; milliseconds; whatever; it doesn’t matter — it’s just relative comparisons, remember) while the x-axis is showing the number of things in your list of things.
And so what these graphs of various algorithms’ Big O functions are showing us is the so-called “time complexity” of those algorithms; these graphs show us the longest it could ever take for those algorithms to finish what they’re doing.
Pretty cool, right? I sure thought so when I figured this out!
OK, but how do you calculate the Big O for an algorithm?
Ah. Good question. This is where things admittedly get a bit tricky, but I’m going to do an academically frowned upon thing and skip the underlying theory and just cut to the chase to explain how to do this, thereby upholding my promise — more or less — of avoiding any code.
The idea is to basically look at your code and imagine you’re the computer running through each line of code.
These types of things run in O(1) time: Suppose you’re simply defining a variable. Well, that’s just happening once, right? It doesn’t matter how many things you have in your list of stuff. So we’d say it runs in O(1).
These types of things run in O(n) time: Now suppose you’ve got some sort of a “for loop” and you want it to run for the duration of all numbers in your list of numbers: if you have 10 numbers, it’s going to run 10 times, right? So it runs in O(n).
These types of things run in O(n²) time: Now if you have a loop within that loop that also runs with O(n), that means you’ve got to run that loop and the first loop, which means — intuitively, right? — that you’ll need to multiply them together: n x n = n² and we say it runs in O(n²).
And these types of things run in O(log n) time: What if you’re doing that “divide & conquer” method from earlier; there’s something going on with division, right? Well, because of the way it keeps reducing everything by smaller and smaller halves, well, that means it’s going to be running in O of log n.
The actual work to calculate these results involves some clever thinking combined with some basic math, but I think it’s worth showing at least one super simple example.
Suppose you have a line of code which says something like “Set our variable i to 1; as long as it’s less than or equal to n, where n=10, continue to increase it by 1.”
In pseudo code we’d write something like
For i=0, i ≤ n, i++.
So to calculate the time complexity of this line of code, we need to think about each element in that line of code, and think how many times it needs to be run. So let’s see:
i=0. Well, this is basically just saying, let’s set i to 0. Is it dependent upon how many items we have in our list of stuff? No. We’re just setting it to 0 once, and that’s it. Therefore, it has a time cost of 1.
i≤n: Basically, this is just checking that i is always less than, or equal to, n, in this case, 10. So it’s going to need to check 10 times (obviously) plus one more time to confirm that 10 is as high as it can go; in other words, it’s going to test 10+1=11 also just to make sure that there’s nothing else beyond 10. So we say it has a time cost of n+1 meaning it’s going to run the check n times plus one more time (therefore 11 times) to make sure there’s really nothing after 10.
i++: This is just a mechanical thing to keep adding +1 to whatever value of i we have so far. And because we know it’s only going to do this n=10 times, we say it has a time cost of n.
So let’s take all three parts’ time costs and add them together; we get:
(1) + (n+1) + (n) = 2n + 2.
Now for Big O notation, it turns out the constants don’t matter; we just care about the magnitude. And regardless of constants, n < n² always, no matter what constant is in front of n, right? So let’s just get rid of the 2s and we’re left with:
n → so we have O(n), a good example of how you’d look linearly through a list of numbers in an array. Fast at first, but growing proportionally as the number of things in that array, n, gets larger.
If you’re still not clear on why i≤10 is n+1 and i++ is n, I apologize; I’m not sure I explained it very well. If anybody could help clear it up in the comments, I’ll be sure to edit and accredit you accordingly. Thanks.
So to summarize:
O(1): Your algorithm is totally unaffected by the number of elements in your list of stuff: it only needs to act once. A coding example would be something like defining a variable.
O(log n): The time for your algorithm to complete its task will basically plateau irrespective of the number of elements involved. A coding example would be a loop that keeps dividing things.
O(n): Your algorithm’s time to run will increase as the number of elements in your list of stuff increases: 10 will take 10 seconds; 1000 will take 1000 seconds; and so on. This would be something like a loop that keeps adding 1 to a variable.
O(n²): Your algorithm’s time to run will increase exponentially as the number of elements in your list of stuff increases: so 10 will take 100 seconds; 100 will take 10,000 seconds; and so on. A nested loop within another loop is what we’re talking about here. (Add yet another nested loop? Now you’re looking at an O(n³) algorithm.)
So there you have it! That’s Big O notation: It’s simply a way to measure the relative time efficiency of an algorithm by letting you know the worst case scenario — i.e., the longest it could take — for that algorithm to run.
Wrapping it up…
Ok so this wasn’t anywhere near as short as I’d hoped; but hopefully you’ll agree it was simple to follow, concise, and logical.
That said, due to its length, I’m definitely not going to dive into related topics like sorting algorithms or binary (search) trees in this article; but if enough people ask me to write a follow-up where I teach those topics too, then I’ll be happy to do so.
Until then, have fun, keep learning, and always stay happy and positive!
PS. Totally off-topic but, do you like autonomous cars? If so, I have a podcast called — rather unimaginatively — Autonomous Cars with Marc Hoag; it appears to be the only regularly updated (typically 2–3x per week) podcast in the world dedicated entirely to autonomous cars. So I’m pretty excited to be doing it. Episodes are usually bitesized 15-minute segments; sometimes up to 30 minutes. Anyway, I’m always looking for contributors or guests, so have a listen and feel free to reach out if you’d like to be on the show!