i0exception
Published in

i0exception

Iterating over maps in Go

While the Go programming language specification only states that the iteration order over maps is not guaranteed to be the same across invocations, Go maps in action goes a step further to say that the order is randomized. So, when someone at work asked how I would design a set of integers that returned a random entry on every get , I suggested this neat trick

Turns out that this approach is incorrect because, while it returns a “random” number on every get, the probability for every element is not the same.

To test this implementation, let’s actually fill up a map with some values and see the distribution over a million runs.

This is the output you get on running this

That’s good, right? The distribution of each number is roughly equal. Let’s change the numbers a bit and see what happens. For the next run, I added the numbers 0 to 4 instead.

While the counts for 1 , 2 and 3 are roughly the same, 0 occurs almost 5 times as often. A truly random distribution would have been around 250000 occurrences of each number.

To explain this anomaly, it’s important to understand how maps are implemented in go. Unsurprisingly, maps are implemented using go. The map.go file in src/runtime contains the common parts of the implementation (there are some optimized map implementations for common types like integers and strings). The comments in map.go help lay out the structure of a map

Let’s take a look at what happens when you’re iterating over a map. If you disassemble the for loop, you’ll see something like this.

The call to mapiterinit is what sets up the iterator and then calls the mapiternext function to get the first element in the map. Here’s the part of the code in mapiterinit that actually computes where to start iterating —

We generate a random number using fastrand() and then use it to get the starting bucket and a random offset within that bucket (remember, maps in go are implemented as an array of buckets with 8 elements in each bucket). mapiternext then iterates over the elements to return the first valid entity — while doing so, it skips over any empty ones

Because the element we start with could be empty, the probability of getting a valid element is actually dependent on the number of empty buckets and elements immediately preceding it. For example, if there is 1 bucket with 2 valid entities like in the example below —

We’ll get 10 if we start with elements 0, 1 or 2 and 20 if we start with 3, 4, 5, 6 or 7. So the perceived probability of getting a 10 is 3/8 and for 20 is 5/8 .

While this was a toy problem that I was trying to solve, the broader learning for me was to not base solutions on ones interpretation of library documentation. It’s almost always a good idea to test how things behave in practice even if the documentation feels clear and correct.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store