Photo by lee bernd on Unsplash

Min Heaps in Go (Golang)

Gabriel G Baciu
The Startup
Published in
8 min readJun 14, 2020

--

Algorithmic runtime complexity is something that I’ve got to understand quite well most recently by solving LeetCode problems. The nice thing about the LeetCode platform is that it benchmarks your code against other people's, and tells you where your solutions sit in terms of runtime and memory usage. That is important because if you are competitive in nature, as I am, and want to get better at what you are doing, you oftentimes won't be satisfied with a solution that is in the middle of the pack, but perhaps would want to push yourself to further optimize your solution.

So I came across the LeetCode 759 problem, in which you are given a bunch of employee schedules and you have to figure out the finite times when everyone’s free. For example, if you have two employees in an organization and one works from 1 to 3, and the other from 4 to 6, you’d have about 1 hour, from 3 to 4, when everyone is free. If the schedules overlap, for example, 1–3 and 2–5, then you won't have any free time when everyone is available.

I immediately recognized a pattern, as I had solved a similar problem before, or so I thought at the time, which is “Merge Intervals”, LeetCode 56. Basically, my thought process was that if I merged all the overlapping intervals, I would get a list of mutually exclusive time ranges, and then I would just loop through that and collect the spaces in between. So I did that and it did solve the problem, but to my surprise, my runtime was beating only 28.57% of Go submissions, which means that 71% of people solved it better from a runtime perspective.

Here is the first solution:

type Interval struct {
Start int
End int
}

func employeeFreeTime(schedule [][]*Interval) []*Interval {

// Flatten the interval array
var flatI []*Interval

for i := 0; i < len(schedule); i++ {
flatI = append(flatI, schedule[i]...)
}

// sort the flatten list
sort.Slice(flatI, func(i, j int) bool {
return flatI[i].Start < flatI[j].Start
})

// determine if two intervals are overlapping
overlap := func(a, b *Interval) bool {
if a.Start > b.End {
return false
}
if b.Start > a.End {
return false
}
return true
}

var merged []*Interval
merged = append(merged, flatI[0])

// merge all overlapping schedules
for i := 1; i < len(flatI); i++ {
a := flatI[i]
b := merged[len(merged)-1]

if overlap(a, b) {
b.Start = int(math.Min(float64(a.Start), float64(b.Start)))
b.End = int(math.Max(float64(a.End), float64(b.End)))
} else {
merged = append(merged, a)
}
}

var solution []*Interval

// if all work on same common schedule then return
if len(merged) == 1 {

return []*Interval{}

} else {

// push in solution the delta between overlapping intervals
for i := 0; i < len(merged); i++ {

if i + 1 < len(merged) {
solution = append(solution, &Interval{
Start: merged[i].End,
End: merged[i+1].Start,
})
}
}

}

return solution
}

Although I was happy that I got to solve it fast enough, I was not pleased with the runtime rank and the fact that 71% of the people who wrote Golang provided a better solution. So I peeked at the solution, and they were suggesting to use a Heap. Both a Min or a Max Heap would work in this case.

Ok, so what is a Min Heap? A Min Heap is a binary tree-based data structure, where the values of all the children of a particular node are greater than or equal to the value of that node. In other words, the value of each node would always be less than or equal to the value of all its children.

You would use a Min Heap, for example, if you wanted to efficiently retrieve O(1) the minimum element from a list without traversing the entire list all the time O(n). The insertion operation in a Min Heap is worst-case O(log n), but after that initial expense, you always have an O(1) retrieval of the min element. The same is true for Max Heap for that matter, where you would be able to retrieve max element in O(1), constant time.

Ok, so now back to our problem: why would a Min Heap be a solution or an optimization? It’s because the optimal solution for this problem is to take two intervals at a time, with the lowest start dates (two min-heap pop operations), and see if they overlap, and if they don't, then collect the delta as free time and repeat the process.

The lines that were inefficient in my solution were:

   // sort the flatten list
sort.Slice(flatI, func(i, j int) bool {
return flatI[i].Start < flatI[j].Start
})

It’s because basically I casually insert an O(n log n) sorting operation. If we used a min-heap, we would have a worst-case a (n * heap push operation) = O(n log n). Given that each employee’s time intervals are sorted in ascending order as we are getting higher and higher start numbers, the heap insert operation becomes closer to O(1) than to O(log n).

The heap insert operation becomes less expensive with each employee schedule because every time a node is inserted into a min-heap, there are a series of swap operations, i.e. if the inserted node is smaller than its parent, then the new node and its parent are swapped, and this continues up until the tree fulfills the min-heap criteria. As the calendar start values become higher, the fewer swaps we need to do, to the point that we could get close to O(1) insert time on the highest values.

Here’s the new solution:

package main

import (
"container/heap"
)

// Slice of Intervals
type IntHeap []*Interval

func (h IntHeap) Len() int {
return len(h)
}
func (h IntHeap) Less(i, j int) bool {
return h[i].Start < h[j].Start
}
func (h IntHeap) Swap(i, j int) {
h[i], h[j] = h[j], h[i]
}

func (h *IntHeap) Push(x interface{}) {
// Push and Pop use pointer receivers because they modify the slice's length,
// not just its contents.
*h = append(*h, x.(*Interval))
}

func (h *IntHeap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}

func (h *IntHeap) Peek() interface{} {
old := *h
n := len(old)
x := old[n-1]
return x
}

func employeeFreeTime3(schedule [][]*Interval) []*Interval {

// Declare the heap structure so we insert intervals in sorted order
var IntervalHeap IntHeap
heap.Init(&IntervalHeap)

// Flatten the interval array
var flatI []*Interval
for i := 0; i < len(schedule); i++ {
flatI = append(flatI, schedule[i]...)
}

for i := 0; i < len(flatI); i++ {
heap.Push(&IntervalHeap, flatI[i])
}

// determine if two intervals are overlapping
overlap := func(a, b *Interval) bool {
if a.Start > b.End {
return false
}
if b.Start > a.End {
return false
}
return true
}

temp := heap.Pop(&IntervalHeap).(*Interval)
var result []*Interval

for IntervalHeap.Len() > 0 {
a := temp
b := heap.Pop(&IntervalHeap).(*Interval)

// if intervals overlap
if overlap(a, b) {

// take as temp the one with the higher end
if a.End < b.End {
temp = b
}

// id they don't overlap, then we need to record delta
} else {
result = append(result, &Interval{
Start: a.End,
End: b.Start,
})
temp = b
}
}

return result
}

And here’s the result:

And the runtime distribution:

Ok, much better. Still not in the 99 percentile, as I’d prefer for every solution, but closer to the 90 percentile, from both memory usage and a runtime standpoint, and I was fairly happy with this optimization. The algorithm did not change that much; the main thing that changed is the data structure from a regular Go Slice to a Min Heap, with the latest solution.

Another thing I wanted to talk about in this blog post is the “no magic” ethos of Golang and how that pertains to Heaps. Go has in the standard library a package called “container/heap.” Basically, if you declare a structure that conforms with a heap interface (Less, Len, Swap, Push, Pop), the heap container gives you helper methods to operate on the heap. Things like Init, Push, Pop. This means that one would need to write a lot of boilerplate code to provide all these methods and create a custom data structure, when in fact, the most important thing that is needed is the custom comparator (the Less function).

Languages like Java allow you to use a Priority Queue for example, where only a custom comparator is provided on initialization. There’s no need to implement a specific interface, the initial data structure needs to comply with a certain shape. In Python, you have heapq.heapify that transforms a list into a heap in constant time, as long as it is a list of tuples.

This is where the explicitness of Golang works a little against its purpose, as the language, in this case, is more verbose, less compact, for the sake of a “no magic approach”, instead of providing a simple heap constructor that takes a Slice, and perhaps a custom comparator, and that’s all. Perhaps authors of the language would have this in their backlog when they design the next version of the language.

There is a very fine line between high-level languages that use a lot of abstractions, where developers use just higher-order components, but they do not understand the under the hood implementation and at the other end languages like Go, where a lot of the implementation is exposed by being explicit and declaring specific types and specific interfaces if one wants to benefit of general-purpose utility methods. In this case, the language authors traded off productivity for the sake of legibility and transparency. It is one of the features of the language that makes it interesting.

It’s well known at this point to people that worked in Go before that it does not have Generics. That means that if you want to declare a method and you don’t really know the type of the arguments ahead of time, you would need to take the arguments as empty interfaces, i.e. interface{}. At runtime, you need to do type assertion and cast that variable into the type that you think it is.

The Heap interface, on the Push and Pop methods, is one such case where the structures in the interface can be anything, so the interface is very explicit in taking in and returning an empty interface{}. The interface does not allow you to return the specific type that you want. In my case, for the LeetCode 759 problem, I wanted to Push and Pop Intervals{}, but the Heap interface required the empty interface{} instead. That means that every time I pop or push something in my Heap, I need to type assert it into an Interval type. Again, this is where the Golang is explicit and transparent, but a bit rigid and verbose.

I know Golang v2.0 is around the corner and Generics was one of the most highly demanded features of the language so far, and the authors will implement it, so looking forward to that.

--

--

Gabriel G Baciu
The Startup

Software engineer, leader, entrepreneur, lately focusing on the science of leadership and success, healthy living, and low latency software systems.