Algorithms In Context #1: Greedy Algorithms

Can Bayar
Published in The Startup · Sep 9, 2020

If someone asked me to describe what a software engineer does in a single word, my answer would be optimization. The fact is, most of the time we look for neither the best solution nor the cheapest one, but for the optimal one. Sacrificing your time for the best solution is a noble cause, but there are times when the best solution is overkill and you don’t always have the resources (time, money, etc.) to chase your ideals. In fact, you don’t always need the best solution either; a fairly good one might suffice. I hereby welcome you to the first chapter of the series: greedy algorithms…

What is a greedy algorithm? The book definition says a greedy algorithm makes the locally optimal choice at each step in the hope of finding the globally optimal solution to the problem. What this old British man pretentiously tells you while smoking his pipe (that’s how they are in my twisted imagination) is actually a very simple idea. At any given point in your algorithm, instead of searching through all possible solutions and picking the best one, just pick the one that seems the best. Then on the next step, pick the one that seems the best again. Rinse and repeat until you reach the end of the road and just hope that your locally optimal choices lead you to the global optimum.

Everything up to this point was quite intangible but it gives the idea behind greedy algorithms. Soak it in and let’s dive into a real-life example.

Activity Selection Problem

This is a very well-known problem, almost a cliche. You are given a series of activities, each having a determined start and finish time. They share a common resource (your time maybe?) and some of them obviously overlap. Your task is to select the activity subset with the maximum cardinality, in which there are no overlaps.

Ten activities with start and finish times are given in the table above. Subset {2, 4, 8} does not contain any overlaps, while subset {5, 6} does. We need to make greedy choices to find the largest feasible subset. Here are some possible greedy approaches:

  • Choose the earliest starting activity first
  • Choose the earliest finishing activity first
  • Choose the shortest activity first

We have defined three very simple greedy approaches to the problem. Coming up with candidate approaches usually requires nothing more than good intuition. Some problems are much harder than others, but the idea is the same.

To me, it is obvious that we should always choose the activity that leaves the resource available for as many other activities as possible. So let’s start with the third approach: shortest activity first. First, sort the activities by their durations. Then iterate through them, and if the current activity starts after the previously selected one finishes, add it to the result set. Here is the C++ code, assuming the activities are already sorted.

vector<int> findMaximumSubset(const vector<Activity>& activities) {
    vector<int> result;
    result.push_back(0);
    int last = 0;  // index of the last selected activity
    for (size_t i = 1; i < activities.size(); i++) {
        if (activities[i].start >= activities[last].finish) {
            result.push_back(i);
            last = i;
        }
    }
    return result;
}

Note that when two activities have the same duration, the earlier-starting one comes first. This is a very simple algorithm, and running it on the table above produces the subset {2, 4, 8}. But wait, the optimal solution is the subset {1, 4, 7, 10} and our algorithm was not able to find it!

Worry not, let’s try another approach. This time we will choose the earliest-finishing activity first. The source code for the second approach is exactly the same as the previous one; the only thing that changes is the sorting step. This time, we sort the activities by their finish times and run the same algorithm. For activities with the same finish time, the earlier-starting one comes first.

When you execute the code, you will see that it produces the subset {1, 4, 7, 10}, which is exactly the optimal solution. Both approaches are greedy, but the first one didn’t give us the optimal solution. There might be times when it is okay not to find the optimal solution, but why settle for less when you can reach the optimum so easily?

We can actually prove that the earliest-finishing-activity-first approach always produces an optimal result. I will try to avoid formal proof language and keep it as simple as possible, but if you are still not interested in proofs, just skip the next paragraph.

Assume you have found a feasible subset of activities, sorted by finish time so that each activity starts after the previous one ends. Pick any activity in the subset. If there exists another activity that also starts after all the previous ones finish but ends earlier than the one you picked, you can swap it into your subset in place of your pick. The swap keeps the subset feasible and only frees up more of the resource, so the size of the subset stays the same at worst and may even grow with later choices. Repeating this argument from the first activity onward shows that taking the earliest-finishing compatible activity is never worse than any other choice.

That’s it for the activity selection problem. But before we are done with this post, I want to discuss another problem.

Knapsack Problem

In this section, we will discuss the knapsack problem, more specifically the fractional knapsack problem. In the knapsack problem, you are a thief with a bag that can carry at most 10 kg (all hail the metric system) and you are going to shoplift. For some weird reason, you are stealing flour, sugar and salt - maybe you want to slowly kill yourself by consuming unhealthy food over a long time. Your purpose is to make your bag as valuable as possible.

I believe the greedy nature of this problem is very obvious: steal the most valuable ingredient first until there is none left, then move on to the next until your bag is full. So if the corresponding prices for flour, sugar and salt are 10, 8 and 6 dollars per kg (I have no idea about the actual prices) and each of them has 4 kg of supply available, you would take 4 kg of flour, then 4 kg of sugar and finally 2 kg of salt.

Let’s change the problem a little bit. After a few years of shoplifting, you now want to play in the big kids’ league, and to prove yourself, you will take part in a bank job! Oblivious to the unfortunate events awaiting you, you have entered the bank vault with your faithful 10 kg bag. In the vault, there is one gold, one silver and one platinum ingot, weighing 4 kg, 6 kg and 2 kg respectively. The gold ingot is worth 100 dollars, the silver 120 dollars and the platinum 60 dollars. So per kg, platinum is the most valuable, then gold, then silver.

If we apply the same strategy we used back in the shoplifting days, we take the platinum ingot first since it is the most valuable per kg. Then we take the gold ingot, which gives us 160 dollars in total for 6 kg of our storage capacity. We cannot take the silver because we only have 4 kg of capacity left in our bag. For the optimal solution, we should have taken the gold and silver ingots, which would be worth 220 dollars in total for 10 kg of bag capacity.

You can try different approaches, but you won’t be able to find a greedy algorithm that yields the optimal solution for this problem. You might be satisfied with the profit you made and not want the optimal solution, which is totally okay. Then again, if you are a little like me and want more, I’ll see you in the next chapter: dynamic programming.
