Published in

i0exception

# Iterating over maps in Go

While the Go programming language specification only states that the iteration order over maps is not guaranteed to be the same across invocations, Go maps in action goes a step further to say that the order is randomized. So, when someone at work asked how I would design a set of integers that returned a random entry on every `get` , I suggested this neat trick

`type intSet map[int]struct{}func (s intSet) put(v int) {        s[v] = struct{}{}}func (s intSet) get() (int, bool) {        for k := range s {                return k, true        }        return 0, false}`

Turns out that this approach is incorrect because, while it returns a “random” number on every get, the probability for every element is not the same.

To test this implementation, let’s actually fill up a map with some values and see the distribution over a million runs.

`func main() {        s := make(intSet)        for i := 0; i < 8; i++ {                s.put(i)        }        counts := make(map[int]int)        for i := 0; i < 1024*1024; i++ {                v, ok := s.get()                if !ok {                        return                }                counts[v]++        }        for k, v := range counts {                fmt.Printf("Value: %v, Count: %v\n", k, v)        }}`

This is the output you get on running this

`code|⇒ ./codeValue: 1, Count: 131026Value: 7, Count: 130957Value: 3, Count: 131064Value: 5, Count: 131288Value: 2, Count: 131080Value: 0, Count: 130813Value: 4, Count: 131137Value: 6, Count: 131211`

That’s good, right? The distribution of each number is roughly equal. Let’s change the numbers a bit and see what happens. For the next run, I added the numbers `0 to 4` instead.

`code|⇒ ./codeValue: 1, Count: 131175Value: 2, Count: 131593Value: 3, Count: 130904Value: 0, Count: 654904`

While the counts for `1` , `2` and `3` are roughly the same, `0` occurs almost 5 times as often. A truly random distribution would have been around `250000` occurrences of each number.

To explain this anomaly, it’s important to understand how maps are implemented in `go`. Unsurprisingly, maps are implemented using `go`. The `map.go` file in `src/runtime` contains the common parts of the implementation (there are some optimized map implementations for common types like integers and strings). The comments in `map.go` help lay out the structure of a map

`// A map is just a hash table. The data is arranged// into an array of buckets. Each bucket contains up to// 8 key/value pairs. The low-order bits of the hash are// used to select a bucket. Each bucket contains a few// high-order bits of each hash to distinguish the entries// within a single bucket.//// If more than 8 keys hash to a bucket, we chain on// extra buckets.`

Let’s take a look at what happens when you’re iterating over a map. If you disassemble the for loop, you’ll see something like this.

`TEXT main.intSet.get(SB) /home/aniruddha/code/main.go  ...  main.go:10  0x488e56  4889442408   MOVQ AX, 0x8(SP)  main.go:10  0x488e5b  488d442418   LEAQ 0x18(SP), AX  main.go:10  0x488e60  4889442410   MOVQ AX, 0x10(SP)  main.go:10  0x488e65  e8763df8ff   CALL runtime.mapiterinit(SB)  main.go:10  0x488e6a  488b442418   MOVQ 0x18(SP), AX  ...  main.go:11  0x488e93  c3    RET`

The call to `mapiterinit` is what sets up the iterator and then calls the `mapiternext` function to get the first element in the map. Here’s the part of the code in `mapiterinit` that actually computes where to start iterating —

`r := uintptr(fastrand())if h.B > 31-bucketCntBits {  r += uintptr(fastrand()) << 31}it.startBucket = r & bucketMask(h.B)it.offset = uint8(r >> h.B & (bucketCnt - 1))it.bucket = it.startBucket`

We generate a random number using `fastrand()` and then use it to get the starting bucket and a random offset within that bucket (remember, maps in go are implemented as an array of buckets with 8 elements in each bucket). `mapiternext` then iterates over the elements to return the first valid entity — while doing so, it skips over any empty ones

`for ; i < bucketCnt; i++ {  offi := (i + it.offset) & (bucketCnt - 1)  if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {    // TODO: emptyRest is hard to use here, as we start iterating    // in the middle of a bucket. It's feasible, just tricky.        continue  }  ...}`

Because the element we start with could be empty, the probability of getting a valid element is actually dependent on the number of empty buckets and elements immediately preceding it. For example, if there is 1 bucket with 2 valid entities like in the example below —

`[NULL, NULL, 10, NULL, NULL, NULL, NULL, 20]`

We’ll get 10 if we start with elements 0, 1 or 2 and 20 if we start with 3, 4, 5, 6 or 7. So the perceived probability of getting a 10 is `3/8` and for 20 is `5/8` .

While this was a toy problem that I was trying to solve, the broader learning for me was to not base solutions on ones interpretation of library documentation. It’s almost always a good idea to test how things behave in practice even if the documentation feels clear and correct.

--

--

Wut?