The big D with big O
Navigating big O notation with a high school dropout
It’s no secret I don’t have a computer science background. I share my personal life with our industry at every chance I get because I want people to know that even those they respect and look up to didn’t have a traditional path. I’ll die on this hill that you don’t need a traditional education to become a developer. All you need is a passion for learning and the drive to build software. That being said, learning computer science-y stuff will help you become a better engineer, and give you the tools you need to propel yourself to the future you want.
In this series, I’m going to guide you through my process of going from ‘code hitchhiker’ to confident problem solver, and we’re starting with the most important computer science topic I’ve learned — the crux of optimization: Big O notation.
So what is Big O notation?
The mathematical explanation is that it is the asymptotic analysis of time and space. I know what you’re thinking: this is not helping. Let’s take a step back.
WTF is O? Why is it big? Notation?
O is an abbreviation for Order. Order has a number of different meanings in mathematics (you’re probably familiar with the idea of “order of operations”), and in this case we are referring to the order of magnitude. Magnitude is just a fancy ass way of saying size. Now that we know we’re dealing with the size of something, we can get a bit of perspective.
If we look back at the terminology I naively threw you “asymptotic analysis of time and space” — we can start to understand what information we are ultimately looking for. Time and space both have the concept of size. So we are ultimately analyzing the size of time and space.
In what context does the size of time and space matter in programming?
In the web, let’s say a user just started playing a video game called League of Legends. They want to learn about some champion, say Zyra. They get to her page and they are greeted by lovely, animated splash art.
We know that every asset we load has a cost associated with it. This asset is a video, so the tradeoff is that we are willing to use a higher-cost asset for the user’s experience. There is a limit on the capacity for space in the user’s browser, the user’s device, and their bandwidth for downloading assets. We have to budget and be aware that we have a large asset to account for.
We’ll optimize our assets to make them smaller, in fact, we want them to be the smallest size possible to conserve both time and space. If our asset is too large, it could take too long to load, and our user may just leave, and never play League of Legends ever again. 😱
When do you analyze time and space costs in your current role? When do you consider tradeoffs? Tradeoffs can be between time and space, but they can also be between time and space and user experience, readability, or even business-y stuff like launch dates and quality.
Consider a Minimum Viable Product (MVP) discussion. You trade time for volume of work. You may determine that the MVP in the time limit is too degraded, so you are willing trade quality for time, and adjust the launch date and MVP to include a feature that will take longer to complete than the business originally hoped for.
Now that you know that you are already doing analysis on time and space, let’s go back to the terrifying notion of Big O.
Big just means biggest. In mathematical analysis, we say “leading-order term”. This sounds misleading, but if we go back to the idea that order in this context is referring to size, we can deduce that the leading-order term is the term with the highest magnitude. The term with the highest magnitude will then be the term with the largest size.
If we go back to the Zyra splash video, when we analyze how long the page takes to load, we’ll find that video may be the bottleneck. We want our pages to be fast and light, so we start optimizing at the bottleneck. We wouldn’t start optimizing with some small SVG icon, because the effect on the user’s load time would be negligible. If the time it takes to load the page is 5 seconds and the video takes 5 seconds to load, the only way we are going to reduce the page load time is by optimizing (or even removing) the video.
Do you see you’re already analyzing bottlenecks in the context of the size of time and space?
What kind of bottlenecks in your current role do you consider and optimize for? What kind of limits or constraints do you have when you are optimizing? Think about things like database sizes, third-party APIs, inherent timeouts in browsers and memory.
Consider the Zyra splash page. Let’s look at a number of assets that may be on the first load. Even though we can use dev tools to get the real data, let’s be theoretical, because the real data itself doesn’t matter in this context. We’re just looking for a relationship between the size of each asset.
- Video: 5 seconds, 20MB
- CSS: 200ms, 3KB
- JS: 600ms, 50KB
- 3rd Party: 8 seconds, 600KB
- HTML: 200ms, 500KB
What is the time bottleneck? What is the space bottleneck? Which is most important?
If our page takes too long to load, our users will leave. The page takes S seconds to load, but we know that all of these items are loading asynchronously. We only care about the bottleneck. The bottleneck in the context of time is 8 seconds. So the time complexity of Page Load is notated as O(3rd Party). A notation is a shorthand way of expressing some longer idea. Here, O(3rd Party) means “the bottleneck of Page Load is the 3rd Party asset”.
Let’s say we realized we can offload and cache the 3rd Party call to reduce its load time to 200ms. Great job! But our page still takes 5 seconds to load, and even worse, product just sent us a feature request to add more stuff to the page. 😩
The time complexity of Page Load was reduced to O(Video), but product wants to add a slider of videos for all of the different splash art pieces available for Zyra. Let’s say Zyra has six different splash art videos.
How did the time complexity of Page Load change, assuming we just load all of the videos on page load? 6 times! Oh no!! So now we have a time complexity of O(6 ✗ Video). Or do we?
We can’t reduce the number of videos we need to load (we’re ignoring that we don’t have to load them all at once), and we wouldn’t want to only optimize one of the videos. Our limit is still dependent on the time a single Video asset takes to load.
Let’s graph it.
The time increases linearly, which really just means that as the number videos increases, the amount of time it takes for the videos to load increases at a constant rate. This means that the time complexity of the bottleneck (a video) is linear. If we optimize a video to load 🔥 blazing fast 🔥at 5ms, does the graph change?
Nope! The graph is still linear. So the Page Load’s time complexity is always O(Video), as long as the Video remains the bottleneck. The co-efficient of Video, then, is irrelevant.
Big O notation is the shorthand representation of the most complex part of set of operations in the context of time or space.
So what about asymptotic? What kind of fancy mathematical 💩 is this? A value is said to be asymptotic if it approaches an infinite limit but never reaches it. When we say that we are doing asymptotic analysis of space and time complexity, we’re saying that we are looking for the worst possible runtime or space needed to complete some operation.
How do we calculate Big O notation?
Let’s apply this knowledge to problem solving in the context of writing algorithms. An algorithm is just the process used in solving a problem. It’s the functions you’re writing taking some input and outputting the solution.
Programming is a double-edged sword when it comes to problem solving. Often times there’s more than one way to get the correct output, which can be really empowering when you get it working for the first time (🎉), but really frustrating when you go into code review (😰). This is where analyzing the bottleneck before you put in a Pull Request can help you ahead of time.
Let’s consider a function for a feature I wrote that takes a set of data from all of your Github repositories.
The goal of using this data was to create a graph representing the amount of code I’ve written in Github per language, to display next to my resume.
So the data needed to be a sum of all the bytes of each language across all of my repositories with the name of the language, along with the sum of all bytes of all languages across all of the repositories.
The data I got back was by repository, and then by language. So I needed to write an algorithm to reduce the data to a single array of languages and a total.
It’s impossible for us to consider the complexity until we actually work through the problem’s solution, but we should start thinking of it immediately, so we can consider how we can reduce it if it’s not optimal, or worse, if it’s horrible.
If we consider a high level iteration of our problem, we can consider the runtime we need to try to work down from.
The first step is creating an array. This is very fast, and we’re only doing it once. This takes a constant time, so its time complexity is O(1). It’s never going to be our complexity bottleneck.
We know we’re going to have an array of size r (for repositories) which is the number of repositories that we will have to loop through at least one time, depending on how we code. Its time complexity then is linear, so looping through each repository takes O(r) time.
Each repository has an array of size l (for languages) which is the number of languages it contains. We have to loop through each of these one time, and perform an operation of placing it in the new array. While this sounds like O(l + 1), we will have to match it to a unique language in the new array, the size of which we don’t actually know, and we are completing each sum operation l times.
Let’s say that the new array is a size of s (for summed languages). In order to find the matching language, we’ll need to loop over the new array, and then we can perform the summing operation. We are completing s ✗ 1 operations for each l, so we have to multiply the two terms. For each l, and for each s complete 1 summing operation. So the complexity of this operation is O(l ✗ s ✗ 1), which simplifies to O(ls).
Next, we need the sum of all of the summed language sizes, which we can get by looping through the new array, and adding the sum of each language to a running total sum. This will then have a complexity of O(s).
So what is the total worst-case complexity of this theoretical algorithm? We have O(1) + O(r) ✗ O(ls) + O(s). We multiply the second and fourth terms because we are doing l ✗ s operations per r.
Here’s where things get imprecise and hand-wavy. 👋
We don’t actually care about l or s. Why? Because we only care about how the complexity compares to our input. How does our runtime or space requirement grow as the number of our repositories increases? Since it grows with the repositories, we only care about the size of r. In notation, we don’t need to be specific for the variable name, so we just refer to as the input as n. So now we can change this to O(1) + O(n) ✗ O(n²) + O(n).
If we were to graph this, we would say: for each n, increase the number of operations by n times n. We are also increasing linearly, but we can ignore the lower terms, as we know they are arbitrary in this case. Why? Again, we care only about the growth of our time or space requirements compared to our input. We can consider n³ and n³ + n + 1 to be roughly equal because n + 1 has very little impact on a cubed value. Let’s say n = 10. 10³ = 1000. 10³ + 10 + 1 = 1011. See?
The leading-order term then, the part with the biggest size, is n³. So our Big O is O(n³). We’ll write n ∈ O(n³). ∈ just means “element of”. Like repository ∈ repositories.
We don’t need to be precise, we just want to know roughly how the runtime (or space requirements) will increase as our input gets larger.
If you’re thinking that this looks pretty horrible, you’d be right. We can see that we probably don’t want to code the algorithm the way we’ve described, and we’ll want to consider whether or not we can reduce our complexity before we write any of it.
Sometimes it’s not possible to avoid this type of worst-case complexity, depending on the data you get in and what you need to get out. This solution is not scalable, but for a really small n, does it need to scale? No. However, in this case, we can get an unknown number of repositories. Let’s see if we can reduce.
So we’ve identified at least one place we can reduce n³, while we identified a second spot, we’re going to first focus on the line that we know we can change. How can we change it? The key is in the first question we asked. How else can we hold a list of data? If we know we are going to compare and compute a running sum, is there any data type that will help us? If you guessed a hash map, you’d be spot on!
Our third loop then can be done away with, and we can use our previous loop to build and access it.
Even though moving the last step into a previous loop won’t change the Big O notation, we should still do it. If it’s an optimization that doesn’t trade off readability or much space, that’s a choice we can make. Be aware though that the variable we create will be living for the entirety of the first loop, which has a few operations, so this may not be right decision for every runtime optimization.
Let’s calculate the new complexity.
- We create a hash map: O(1).
- We loop through each repository: O(n).
a. We loop through each language: O(n).
— Add name as key to hashmap and size as value, if it exists, add to size to current value O(1)
— Add value to running sum of total size O(1)
Now you can see we only multiply n times n once, and again, we drop the O(1) terms, and we can see that we have a big O of O(n²).
This is undoubtedly the best upper bound complexity we are going to get, but there’s a catch. We know that there’s a fixed number of languages, so as our input is lower, our number of operations may actually be higher than n², and as we go above the fixed number of languages, our number of operations will be lower than n². 🤨
So what does this mean? Well, the worst-case complexity is still O(n²), and we don’t really care how precise that is, but maybe we can still optimize our code. We could find the number of unique languages available on Github, sure, but we would drop any coefficient of n² we added anyway (because it’s negligible for numbers high above n compared to n), so it’s better just to look to optimize our code knowing this piece of information. We can’t really improve in the scope of this function without changing the shape of the data of our input, so we will just leave it at n ∈ O(n²) until we can change the bottleneck causing the exponential complexity.
O can be Big, but can it ever be little?
What about cases where we can optimize in scope, and the runtime or space requirements will never reach their Big O notation?
Let’s consider this example:
Some naive initial consideration here would be to take the difference (d) between x and y, and take the second power to the base of d and loop through iterations the size of the product + d, giving us a big O of O(n²). But are all of those iterations necessary?
Half + d of the iterations are just the inverse of an earlier iteration! So we can iterate on ½n². This is an optimization we won’t express in big O, because the difference between n to n² and n to ½n² is negligible as you approach infinity. Even though say, 1 trillion is a lot more than 500 billion, they are both astronomically larger than 1. 1 trillion is 1 trillion times larger than 1, and 500 billion is a half a trillion times larger than 1. Our upper bound worst-case complexity won’t change, so we’ll still drop the coefficient of ½.
We can, however, express that our function will never run at n² with little o notation. We can just write o(n²) and that will communicate that it will explicitly never reach the theoretical limit of n², without having to get precise.
What about Omega and Theta?
ω (little omega): Just like little o, what if a function cannot ever reach its theoretical minimum limit of Ω(1)? What if your data is pre-filled with at least 3 repositories with at least 1 language? n ∈ ω(1).
Θ (Big Theta): If we know both the Big Ω and Big O complexities, we have an upper and a lower bound, and we know our runtime will always be between them. What if n ∈ O(n) && n ∈ Ω(n)? Then our runtime must always be n, and we say n ∈ Θ(n).
Here are some resources on Big O notation that will take you more into the practice and the mathematics:
Hi there! This webpage covers the space and time Big-O complexities of common algorithms used in Computer Science. When…bigocheatsheet.com
Read and learn for free about the following article: Big-O notationwww.khanacademy.org
Big O Notation on Brilliant, the largest community of math and science problem solvers.brilliant.org