A Priori Analysis of Algorithms
Algorithm Analysis — Part 3
Hello, and welcome back to the Practical Guide to Algorithm Analysis.
This is Part 3 of the series, and I hope that by the end of this post you will have learned the fundamental techniques for reading and analysing simple iterative algorithms that don't really do anything! Don't worry, we'll get to meaningful algorithms later in the series.
Also in this post, we'll get to touch some of the maths from the previous post, so be sure to go back there if you feel lost in the maths.
So, the technique that I'm going to teach you in this post is called frequency counting, which is a form of a priori analysis. The goal is to count how frequently each line of code is executed. The basic idea is that the higher the frequency (the more often a line executes), the longer a computer needs to complete the work.
As programmers, we aim to create programs which run fast and consume as little memory as possible. There are two properties that need emphasis here: time and space, which leads to what you typically read in algorithm books as the time complexity and the space complexity. For now, to give you a basic understanding, let's drop the "complexity" part of time and space.
In order for a program to run fast, it should perform as little work as possible. This is where frequency counting comes in: the higher the frequency count, the more work the computer needs to get done, and the more time it takes to complete that work.
So, before we rush into the action, let's first get familiar with some of the basic concepts of frequency counting:
1. Any line of code that performs an operation, as long as it isn't a function definition or a loop, is assumed to run in constant time, counted as 1 execution each time it is reached.
2. Any user input is assumed to have an arbitrary size, represented by n. Any operation performed n times, whether by a loop or a function, is counted as n executions.
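To make these two concepts concrete, here's a small Python sketch (my own illustration; the snippets in this post are pseudocode) that instruments a single loop with counters:

```python
def count_single_loop(n):
    """Simulate `for i <- 1 to n: do_work()` and count line executions."""
    header = 0  # times the loop condition i <= n is evaluated
    body = 0    # times the constant-time loop body runs
    i = 1
    while True:
        header += 1   # every check counts, including the final failing one
        if i > n:
            break     # the extra check that exits the loop
        body += 1     # one constant-time operation, counted as 1
        i += 1
    return header, body
```

For n = 5 this returns (6, 5): the header is checked n + 1 times and the body runs n times, which is exactly the kind of bookkeeping we'll now do by hand.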
Okay ready?
Now, let’s get down to business!
Consider the following snippet:
snippet1:
(1) for i <- 1 to n:
(2)     for j <- 1 to n:
(3)         print(i, j)
Fancy that loop? Nope? Okay.
What we're interested in is analysing the time complexity of the snippet above to determine how well it performs (fast or slow), by counting the number of times each line executes (the frequency count) and then making an estimation based on those counts (Big-Oh).
Starting with line (1), for i <- 1 to n: how many times will this execute?
For beginners, if it’s not quite obvious, we can rewrite the line as:
(1) for (i=1; i<=n; i++) // rewritten: for i <- 1 to n
Starting from i=1, this loop will execute n times.
However, there will be 1 more iteration which evaluates i <= n as false, thereby exiting the loop. So, the total number of executions is n + 1.
Using the same technique we used above, line (2), for j <- 1 to n, is also evaluated to have n + 1 executions.
However, notice that loop j in line (2) is nested inside loop i in line (1). So loop j will be repeated n times, once per iteration of loop i. In total, line (2) will be executed (n + 1)(n) times.
Line (3) is executed 1 time (as per basic concept #1 above). However, line (3) is written inside two nested loops and must be evaluated accordingly. In total, line (3) will execute (1)(n)(n) = n² times: the first n comes from the inner loop j, and the second n from the outer loop i.
In summary, the execution of each line is as follows:

Line (1): n + 1
Line (2): (n + 1)(n) = n² + n
Line (3): n²

Which yields a sum of:

f(n) = (n + 1) + (n² + n) + n² = 2n² + 2n + 1

As an estimate, snippet1 will have a worst-case performance of: O(n²)
Never heard of Big-Oh before? Don't worry. It's not as scary as it sounds. I will be covering Big-Oh, as well as the other asymptotic notations, in the next part of the series. For now, consider Big-Oh as the worst-case estimation of the code.
At worst, the loop in snippet1 will perform on the order of n² operations for an input of size n. This means that if we have the size n = 3, the print on line (3) is estimated to execute 9 times. The Big-Oh of n² is also known as quadratic time; more on these in the next part of the series.
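If you want to double-check the hand analysis, here's a little simulation of snippet1 in Python (my own sketch, not part of the original pseudocode) that counts each line's executions:

```python
def snippet1_counts(n):
    """Count executions of each line of snippet1 for input size n."""
    c1 = c2 = c3 = 0
    i = 1
    while True:
        c1 += 1        # line (1): evaluate i <= n
        if i > n:
            break
        j = 1
        while True:
            c2 += 1    # line (2): evaluate j <= n
            if j > n:
                break
            c3 += 1    # line (3): print(i, j) stands in for constant work
            j += 1
        i += 1
    return c1, c2, c3
```

For n = 3 this returns (4, 12, 9): that's n + 1, (n + 1)(n), and n² respectively, summing to 25 = 2(3)² + 2(3) + 1.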
The first one's easy! Okay, another example:
snippet2:
(1) for i <- 1 to n:
(2)     for j <- 1 to i:
(3)         print(i, j)
Line (1) is obviously n + 1, since i <= n is evaluated n times, plus the 1 last iteration that evaluates i <= n as false, telling the loop to stop.
Line (2) is a little bit tricky. If you evaluated it as (n + 1)(n), you're wrong. Notice that loop j depends not on n but on i, and i is a variable that takes the values 1 through n. So, to write an estimation, we need to write it in summation form as follows:

Σ (i + 1), for i = 1 to n
You may wonder why loop j is evaluated as the sum of i + 1. The +1 refers to the 1 iteration that evaluates j <= i as false, thereby exiting loop j.
Take time to digest it, because we will be using some of the arithmetic series formulas from the mathematical preliminaries in part 2 of this series.
We need to determine the frequency count of line (2) so we can later add it to lines (1) and (3). In order to do so, we need to simplify the summation as follows:

Σ (i + 1) = Σ i + Σ 1 = n(n + 1)/2 + n = (n² + 3n)/2, for i = 1 to n
Using the techniques we learned so far, we can evaluate line (3) as follows:

Σ i, for i = 1 to n

Which, if simplified, yields:

n(n + 1)/2 = (n² + n)/2
In summary, the execution of each line is as follows:

Line (1): n + 1
Line (2): (n² + 3n)/2
Line (3): (n² + n)/2

Which yields a sum of:

f(n) = (n + 1) + (n² + 3n)/2 + (n² + n)/2 = n² + 3n + 1

As an estimate, snippet2 will have a worst-case performance of: O(n²)
If it's not quite obvious yet, the Big-Oh is just the largest term of the sum, with its constant coefficient dropped. Yep.
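As before, we can sanity-check the summations with a quick simulation (my own sketch) of snippet2, where the inner bound is i instead of n:

```python
def snippet2_counts(n):
    """Count executions of each line of snippet2 for input size n."""
    c1 = c2 = c3 = 0
    i = 1
    while True:
        c1 += 1        # line (1): evaluate i <= n
        if i > n:
            break
        j = 1
        while True:
            c2 += 1    # line (2): evaluate j <= i (note: i, not n)
            if j > i:
                break
            c3 += 1    # line (3)
            j += 1
        i += 1
    return c1, c2, c3
```

For n = 4 this returns (5, 14, 10): that's n + 1 = 5, (n² + 3n)/2 = 14, and (n² + n)/2 = 10, summing to 29 = 4² + 3(4) + 1.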
Okay, so far we've covered loops with an arbitrary number of repetitions n. Now, let's try something concrete and a bit simpler…
snippet3:
(1) for i <- 1 to 100:
(2)     for j <- 1 to 50:
(3)         print(i, j)
Using the same technique from the previous examples, line (1) can be evaluated to execute 100 + 1 = 101 times.
Line (2): loop j is evaluated to execute 50+1 times, and repeated 100 times by the outer loop i. Which is: (50+1)(100) = 5100.
Line (3) is evaluated to execute 1 time, repeated by 50 times by loop j, and 100 times by the loop i. Which is: (1)(50)(100) = 5000.
The sum of the frequency counts per line yields f(n) = 101 + 5100 + 5000 = 10201. Since f(n) is a constant, the worst-case estimation is O(1), also known as constant time.
Regardless of how large f(n) is, as long as it represents a constant number, it will always be written as O(1) to represent a constant-time estimation.
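Since there's no n here at all, the whole count is plain arithmetic. A tiny Python check (my own sketch) of the totals:

```python
# Frequency counts for snippet3: every bound is a fixed constant.
line1 = 100 + 1          # outer header: 100 passing checks + 1 exit check
line2 = (50 + 1) * 100   # inner header: 51 checks, repeated 100 times
line3 = 1 * 50 * 100     # the print: once per pass of the inner body
total = line1 + line2 + line3   # 10201, a constant, hence O(1)
```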
How about the combination of something concrete and arbitrary?
Okay, here we go! I’ll be going fast this time.
snippet4:
(1) for i <- 1 to n:
(2)     for j <- 1 to 100:
(3)         print(i, j)
Line (1) executes n+1 times.
Line (2) executes 100 + 1 times, repeated n times by loop i. Which is: (100 + 1)(n) = 101n.
Line (3) executes 1 time, repeated 100 times by loop j and n times by loop i. Which is: (1)(100)(n) = 100n.
The sum of the frequency counts per line yields: f(n) = (n + 1) + 101n + 100n = 202n + 1; with a worst-case estimation of O(n).
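Mixing a constant bound with an arbitrary n simulates just as easily (again, my own sketch):

```python
def snippet4_counts(n):
    """Count executions of each line of snippet4 for input size n."""
    c1 = c2 = c3 = 0
    i = 1
    while True:
        c1 += 1        # line (1): evaluate i <= n
        if i > n:
            break
        j = 1
        while True:
            c2 += 1    # line (2): evaluate j <= 100
            if j > 100:
                break
            c3 += 1    # line (3)
            j += 1
        i += 1
    return c1, c2, c3
```

For n = 2 this returns (3, 202, 200), summing to 405 = 202(2) + 1.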
We’ve been going easy for a while.
Did you notice that the loop examples above always start at 1? What if we change the lower bound of the i loop to a number other than 1?
Let’s try and do that.
snippet5:
(1) for i <- 4 to n:
(2)     for j <- 1 to n:
(3)         print(i, j)
As you can see, loop i in line (1) starts at 4. We can use this handy formula to evaluate line (1):

frequency count of a loop header = (upper bound - lower bound + 1) + 1

The lower bound of loop i is 4, and the upper bound is n. Using the formula, the frequency count of line (1) is:

(n - 4 + 1) + 1 = n - 2
Line (2) is pretty straightforward. The line is executed n + 1 times, repeated n - 3 times by loop i.
Wait, why is loop j repeated n - 3 times, when loop i was just evaluated to execute n - 2 times?
Remember the +1 exit check in loop i? The n - 2 count includes that +1 exit check; excluding it to count only the iterations where the loop body actually runs yields n - 2 - 1 = n - 3.
Going back to line (2), the frequency count is:

(n + 1)(n - 3) = n² - 2n - 3
Line (3) is executed 1 time, repeated n times by the j loop, and n - 3 times by the i loop. Which is:

(1)(n)(n - 3) = n² - 3n
Summing up the frequency count for each line yields:

f(n) = (n - 2) + (n² - 2n - 3) + (n² - 3n) = 2n² - 4n - 5

With a worst-case estimation of: O(n²)
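To see the n - 2 and n - 3 counts fall out, here's a quick simulation (my own sketch) of snippet5 with its lower bound of 4:

```python
def snippet5_counts(n):
    """Count executions of each line of snippet5 for input size n."""
    c1 = c2 = c3 = 0
    i = 4              # the outer loop starts at 4, not 1
    while True:
        c1 += 1        # line (1): evaluate i <= n
        if i > n:
            break
        j = 1
        while True:
            c2 += 1    # line (2): evaluate j <= n
            if j > n:
                break
            c3 += 1    # line (3)
            j += 1
        i += 1
    return c1, c2, c3
```

For n = 6 this returns (4, 21, 18): that's n - 2 = 4, (n + 1)(n - 3) = 21, and n(n - 3) = 18, summing to 43 = 2(6)² - 4(6) - 5.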
If you’re into more adventure, we can also try changing the lower bound of the inner loop j.
snippet6:
(1) for i <- 1 to 2n:
(2)     for j <- 5 to i:
(3)         print(i, j)
Line (1) is simple, as you should be familiar with it by now: the line gets executed 2n + 1 times. You know where that +1 came from, right? Great.
Line (2) involves a few tricks. If you thought of summations, you're correct! But before we use them, we first need to determine how many times loop j repeats with respect to i when j's lower bound is 5.
Remember the handy formula? Guess 'twas handy after all:

(i - 5 + 1) + 1 = i - 3

Now, we can't take i - 3 as the frequency count directly, since i is a variable that runs from 1 to 2n. So we need to write the frequency count of line (2) in summation form, which is written as follows:

Σ (i - 3), for i = 1 to 2n
In order to sum it later with the rest of the code, we need to simplify the summation:

Σ (i - 3) = Σ i - Σ 3 = 2n(2n + 1)/2 - 3(2n) = 2n² + n - 6n = 2n² - 5n, for i = 1 to 2n
Evaluating line (3) will be a tad challenging. Line (3) executes 1 time, repeated i - 4 times by the inner j loop, summed over the 2n iterations of the outer i loop. Since j depends on i, which is a variable, the frequency count should be written in summation form:

Σ (i - 4), for i = 1 to 2n
By simplifying the summation, we get:

Σ (i - 4) = 2n(2n + 1)/2 - 4(2n) = 2n² + n - 8n = 2n² - 7n
In summary, the execution of each line is as follows:

Line (1): 2n + 1
Line (2): 2n² - 5n
Line (3): 2n² - 7n

Summing the frequency count of each line yields:

f(n) = (2n + 1) + (2n² - 5n) + (2n² - 7n) = 4n² - 10n + 1

At worst, the code performs: O(n²)
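One last simulation (my own sketch) of snippet6. Note that for i < 5 the inner condition is still checked once even though the body never runs, so the simulated counts sit a small constant above the idealized summations, which start at i = 1; the O(n²) estimate is unaffected:

```python
def snippet6_counts(n):
    """Count executions of each line of snippet6 for input size n."""
    c1 = c2 = c3 = 0
    i = 1
    while True:
        c1 += 1        # line (1): evaluate i <= 2n
        if i > 2 * n:
            break
        j = 5          # the inner loop starts at 5
        while True:
            c2 += 1    # line (2): evaluate j <= i
            if j > i:
                break
            c3 += 1    # line (3)
            j += 1
        i += 1
    return c1, c2, c3
```

For n = 5 this returns (11, 31, 21): 2n + 1 checks of the outer header, then quadratically growing inner counts.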
Fascinating, yes?
In part 4 of this series, we will be covering different asymptotic notations such as Big-Oh, Big-Omega, Big-Theta, Little-Oh, and Little-Omega. After which, we will explore analysing:
1. Three-nested loops.
2. Loops whose lower bound is a variable instead of a constant.
3. Reversed (descending) loops.
So far, we have discussed the analysis of simple iterative algorithms. We will discuss asymptotic notations in part 4, including more examples of loop analysis.
Further, we will tackle how to analyse recursive algorithms, famous for the divide-and-conquer approach (which solves a large problem by breaking it down into smaller, manageable sub-problems).
Only then can we discuss the analysis of algorithms that actually do something, like searching and sorting.
Until then, have a break, and take a good nap.
A long post deserves a potato.
That’s it for now!
Cheers!