Big-O Notation
Imagine this:
It’s been weeks since you first started writing your code for this killer app that you swear will change the way that the world views technology. You’re almost done with it, except for this one particular feature that simply refuses to be debugged. You scour countless Stack Overflow threads and watch just about all of YouTube trying to find a solution, but you come up with nothing. In frustration, you take a break and go walk/run/nap/take a bathroom break and suddenly, it hits you.
A solution.
But it certainly is not the world’s prettiest solution; it involves a bunch of iterators and conditionals and even more iterators. But when you run your program, it works. You let out a sigh of relief, pat yourself on the back and promise yourself that you will refactor the code tomorrow. Perhaps there’s a more elegant solution that you’ve just missed and perhaps after a full night’s sleep, you’ll find it. Or perhaps not. Your code works and that’s good, right?
Well, not really… At the end of the day, the truth of the matter is that not all code is designed the same. Some programs eat up less memory than others, some are elegant one-liners and some are simply more efficient than others. But how do we quantify the efficiency of a specific algorithm? Enter big-O notation.
So, what is big-O notation? The formal definition of it is as follows:
For some functions f(x) and g(x), defined on some subset of the real numbers,
f(x) = O(g(x)) as x → ∞ if and only if there exists a positive constant C and a constant N such that |f(x)| ≤ C|g(x)| for all x > N.
Rather than trying to parse the definition, let’s first discuss big-O in layman’s terms. Simply put, the “O” in big-O refers to the “order” of an algorithm, and the order of an algorithm tells you how fast or slow your code runs (the runtime of the algorithm) depending on the size of your input/parameters. Figuring out the exact runtime of your algorithm when it is executed requires a somewhat deep understanding of how your language is structured, but here are some general rules:
- We assume that the size of our input is “n”. So, the runtime will be a function of n.
- We assume the worst-case scenario as we are finding the upper bound of our runtime. Sometimes our input is nicely formatted, so our actual runtime may be faster than our expected runtime, but never slower.
- Coefficients of the terms, as well as the slower-growing terms, do not matter when considering the runtime (e.g. a runtime of 3n² + 5n + 2 simplifies to O(n²)).
But instead of dealing with hypotheticals, let’s look at some examples (listed from the slowest growing to the fastest growing runtime)!
O(1) Operations — Constant Time
The best runtime for any algorithm is constant time, and such an algorithm is considered an O(1) operation. This means that no matter the size of our input/parameter, the computer will execute the code in the same amount of time. Some examples of algorithms that have a constant runtime are accessing a given index of an array/hash, adding two numbers together or determining the parity of a binary number. So, for instance, whether you’re accessing the 1007th element of a given array or the 1000000007th element, your computer should take about the same amount of time because both operations are O(1).
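For instance, here is a quick Ruby sketch of a few O(1) operations (the array and values are just placeholders):

```ruby
arr = [3, 1, 4, 1, 5, 9, 2, 6]

arr[5]   # accessing a given index of an array -- same cost for any index
2 + 2    # adding two numbers together
9 & 1    # checking the parity of a number by looking at its last bit (1 = odd)
```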
O(log n) Operations — Logarithmic Time
Operations done in logarithmic time are somewhat trickier to explain, but it helps to think of this as an O(log₂ n) operation rather than simply O(log n), since base 2 makes it easier to relate to the concept of halving. A classic example of an O(log n) operation is a binary search on a sorted array, in which we try to find an element by essentially cutting the array into halves. We first compare the element that we’re trying to find (call it the target) with the term at the middle of the array and, depending on whether the target is bigger or smaller, we look to the right or to the left of the array, respectively. We then repeat the process for the subarray that we chose, and we keep on doing this until we eventually find the target or find the index where it should be. This concept of constantly halving the size of the input is the key to identifying O(log n) operations, and when it comes to working with large sorted databases, algorithms with an O(log n) runtime are the preferred way to go.
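As a rough illustration, a binary search in Ruby might look something like this (a generic sketch; the method name is just for illustration):

```ruby
# Each pass through the loop halves the remaining search space: O(log n).
def binary_search(sorted_array, target)
  low = 0
  high = sorted_array.length - 1
  while low <= high
    mid = (low + high) / 2
    return mid if sorted_array[mid] == target
    if sorted_array[mid] < target
      low = mid + 1    # the target must be in the right half
    else
      high = mid - 1   # the target must be in the left half
    end
  end
  low  # the index where the target would belong if it were inserted
end
```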
O(n) Operations — Linear Time
Algorithms done in linear time are considered O(n) operations, and as the size of our input increases, our runtime increases proportionately. One example of an algorithm done in linear time is going through a given array and searching for a specific value. So, if we have an array of 200 numbers and an array of 100 numbers and we’re looking for the index of the number 1 in each of them, it’s going to take longer for the same computer to look through the 200-element array since the size of our input is bigger. Now, we could be lucky and find that the index of the number 1 is 0 (e.g. 1 is the first element of the array), but remember that whenever we’re calculating the runtime, we assume the worst-case scenario because we’re always finding the upper bound of our runtime.
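A bare-bones sketch of that kind of linear search (the method name is illustrative):

```ruby
# Worst case: we look at every element before finding (or not finding) the value -- O(n).
def index_of(num_array, value)
  num_array.each_with_index do |element, index|
    return index if element == value
  end
  nil  # the value isn't in the array
end

index_of([5, 3, 1, 7], 1)  # => 2
```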
O(n²) Operations — Quadratic Time (And Other Polynomial Time)
For algorithms done in linear time, if the size of our input increases, our runtime increases proportionately. However, for algorithms done in quadratic time, our runtime is directly proportional to the square of our input size. Algorithms done in quadratic time are considered O(n²) operations, and any time that an iterator is used inside another iterator (nested iterators), the operation is done in at least quadratic time. So, for example, if we are given an array of n numbers and we want to match every element in the array with every other element as well as itself, our code would look something like this:
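Here is one possible Ruby sketch of that kind of nested iteration (the method name and return value are purely illustrative):

```ruby
# For each element, iterate over the whole array again: n * n = n^2 pairings.
def all_pairs(num_array)
  pairs = []
  num_array.each do |a|
    num_array.each do |b|
      pairs << [a, b]
    end
  end
  pairs
end
```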
And if we use even more nested iterators, our runtime quickly goes from quadratic to higher-degree polynomials (cubic, quartic, etc.), and we consider those O(n^c) operations, where c is the number of nested iterators. Much like algorithms done in linear time, remember that we are going to consider the worst-case scenario.
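For example, adding a third nested iterator pushes the runtime to O(n³) (again, a purely illustrative sketch):

```ruby
# Three nested loops over the same array: n * n * n = n^3 iterations in total.
def all_triples(num_array)
  triples = []
  num_array.each do |a|
    num_array.each do |b|
      num_array.each do |c|
        triples << [a, b, c]
      end
    end
  end
  triples
end
```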
O(2^n) Operations — Exponential Time
Algorithms done in exponential time are considered O(2^n) operations, and they are the fastest growing operations that I will cover in this blog post. Specifically for O(2^n) operations, with each addition to the input, our runtime will essentially double. An example of this type of algorithm is the recursive method to get the nth Fibonacci number:
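A bare-bones recursive version in Ruby might look like this:

```ruby
# Each call (for n >= 2) spawns two more recursive calls,
# so the number of calls roughly doubles as n grows: O(2^n).
def fibonacci(n)
  return n if n < 2   # fib(0) = 0, fib(1) = 1
  fibonacci(n - 1) + fibonacci(n - 2)
end
```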
At first, it may be hard to remember the order of growth for each kind of runtime, especially when you’re first learning about big-O. But thankfully, the great thing about big-O is that it’s heavily intertwined with mathematics, so we can approach the topic in multiple ways. For all the visual learners out there, if we were to graph the parent functions of all the runtimes, we could see how fast each type of graph rises: the higher the graph, the faster the growth in runtime.
Now that we have a somewhat better understanding of big-O, let’s see it in action! We’ll take a look at two specific methods that both accomplish the same goal of seeing if two numbers in an array add up to a target number, and compare their runtimes. To figure out the runtime of each method, let’s first go through each of them and write down what’s happening.
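Here is a sketch of what method 1 might look like in Ruby (the method name, `num_array` and `target` are placeholders):

```ruby
# Method 1: brute force -- compare every element against every other element.
def two_sum_brute_force(num_array, target)
  num_array.each_with_index do |n, i|
    num_array.each_with_index do |n2, j|
      next if i == j                       # don't pair an element with itself
      return [n, n2] if target - n == n2   # found two numbers that add up to target
    end
  end
  []  # no pair adds up to the target
end
```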
For method 1:
- We loop through the num_array, with one “pointer” called n.
- While n stays on one element, we once again loop through the num_array with a different “pointer” called n2.
- We compare if the target number minus n is equal to n2.
- We return an array (whether it’s the array of two numbers or the empty array).
For the first step, what would be the runtime? Well, we first loop through the num_array, meaning that the runtime depends solely on the size of the num_array. If there are 10 elements in the num_array, we’re going to do 10 operations, whereas if there were 75 elements, we’d do 75 operations. So that’s an O(n) operation.
The second step is a little bit weird. We’re still looping through the num_array, so that’s still going to be an O(n) operation. But how should that be combined with the first step? Should we be adding the runtimes or multiplying them? Well, for this specific case, we’re iterating through the num_array once for each of its elements, so we would actually be multiplying them. This is the nested iterator that we talked about before when discussing O(n²) runtime.
For the third step, we’re actually doing two things at once: we’re subtracting two numbers and seeing if the result is equal to another number. So, that’s two O(1) operations, or O(1) + O(1).
In the last step, returning an array is also an O(1) operation, since all the computer has to do is allocate some memory and create the array.
But note that for step 3, we’re doing it inside the nested loops. So, all in all, our method takes about O(n)*(O(n)*O(1)) + O(1), or O(n² + 1), runtime. But remember that for big-O notation, we don’t care about coefficients and the slower-growing terms. So the overall runtime of the first method is O(n²). Not bad, but not good either.
Let’s do the same thing for method 2:
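A sketch of a hash-based version (again, the names are placeholders):

```ruby
# Method 2: remember each needed complement in a hash as we go -- one pass, O(n).
def two_sum_hash(num_array, target)
  complements = {}
  num_array.each do |n|
    # If n is already a key in the hash, an earlier element was waiting for it.
    return [target - n, n] if complements[n]
    complements[target - n] = true   # remember which number would complete this one
  end
  []  # no pair adds up to the target
end
```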
- We create a hash.
- We loop through the num_array, with one “pointer” called n.
- We make a comparison by accessing a hash with the key n.
- We return an array (whether it’s the array of two numbers or the empty array).
For the first step, the creation of a hash is an O(1) operation (because once again, all that a computer needs to do is allocate some memory for the hash).
For the second step of looping through the num_array, we are running an O(n) operation.
For the third step, we’re accessing a hash, which as described previously, is an O(1) operation and then, comparing the result to another value, which is another O(1) operation.
Returning an array is still an O(1) operation.
So, for method 2, our overall runtime is O(1) + (O(n)*O(1)) + O(1) or O(n+2). Remembering to get rid of the less significant terms and coefficients, we get that the overall runtime for method 2 is O(n), which is much better than method 1’s runtime. Two functions that accomplish the same goal, but one is definitively better than the other in terms of runtime!
So, now that we feel a lot better with big-O notation, let’s try to see if we can make sense of the formal definition. Here it is, once again:
For some functions f(x) and g(x), defined on some subset of the real numbers,
f(x) = O(g(x)) as x → ∞ if and only if there exists a positive constant C and a constant N such that |f(x)| ≤ C|g(x)| for all x > N.
Referring back to our example of finding the runtime for method 1, we can set the following:
- f(x) as the overall runtime of the method in terms of x (f(x)=x²+1)
- g(x) as the parent function of the overall runtime (g(x) = x²)
- C as 100
- N as 2
Now, when we make the substitution into the right side of the bi-conditional statement, we get the true statement:
|x²+1| ≤ 100|x²| for all x > 2
Since we were able to find a positive constant C and a constant N, g(x) is one of our possible runtimes. (Indeed, for x > 2 we have x² > 1, so |x² + 1| < 2|x²| ≤ 100|x²|.) Note that if g(x) were a lower-degree polynomial, the inequality would not hold for all values of x past a certain point. In fact, notice that g(x) could easily have been a higher-degree polynomial such as g(x) = x³ or even g(x) = x!. This is because whenever we calculate runtime, we are looking for an upper bound on its growth, not necessarily the least upper bound. So, while it is technically correct to say that the runtime for method 1 is O(n³) or even O(2^n), we generally want to find the tightest bound so that the information we have is not only accurate, but useful as well.