Iterating over maps in Go

Aniruddha
Aniruddha
Jul 27, 2019 · 4 min read

While the Go programming language specification only states that the iteration order over maps is not guaranteed to be the same across invocations, Go maps in action goes a step further to say that the order is randomized. So, when someone at work asked how I would design a set of integers that returned a random entry on every get , I suggested this neat trick

type intSet map[int]struct{}func (s intSet) put(v int) {
s[v] = struct{}{}
}
func (s intSet) get() (int, bool) {
for k := range s {
return k, true
}
return 0, false
}

Turns out that this approach is incorrect because, while it returns a “random” number on every get, the probability for every element is not the same.

To test this implementation, let’s actually fill up a map with some values and see the distribution over a million runs.

func main() {
s := make(intSet)
for i := 0; i < 8; i++ {
s.put(i)
}
counts := make(map[int]int)
for i := 0; i < 1024*1024; i++ {
v, ok := s.get()
if !ok {
return
}
counts[v]++
}
for k, v := range counts {
fmt.Printf("Value: %v, Count: %v\n", k, v)
}
}

This is the output you get on running this

code|⇒ ./code
Value: 1, Count: 131026
Value: 7, Count: 130957
Value: 3, Count: 131064
Value: 5, Count: 131288
Value: 2, Count: 131080
Value: 0, Count: 130813
Value: 4, Count: 131137
Value: 6, Count: 131211

That’s good, right? The distribution of each number is roughly equal. Let’s change the numbers a bit and see what happens. For the next run, I added the numbers 0 to 4 instead.

code|⇒ ./code
Value: 1, Count: 131175
Value: 2, Count: 131593
Value: 3, Count: 130904
Value: 0, Count: 654904

While the counts for 1 , 2 and 3 are roughly the same, 0 occurs almost 5 times as often. A truly random distribution would have been around 250000 occurrences of each number.

To explain this anomaly, it’s important to understand how maps are implemented in go. Unsurprisingly, maps are implemented using go. The map.go file in src/runtime contains the common parts of the implementation (there are some optimized map implementations for common types like integers and strings). The comments in map.go help lay out the structure of a map

// A map is just a hash table. The data is arranged
// into an array of buckets. Each bucket contains up to
// 8 key/value pairs. The low-order bits of the hash are
// used to select a bucket. Each bucket contains a few
// high-order bits of each hash to distinguish the entries
// within a single bucket.
//
// If more than 8 keys hash to a bucket, we chain on
// extra buckets.

Let’s take a look at what happens when you’re iterating over a map. If you disassemble the for loop, you’ll see something like this.

TEXT main.intSet.get(SB) /home/aniruddha/code/main.go
...
main.go:10 0x488e56 4889442408 MOVQ AX, 0x8(SP)
main.go:10 0x488e5b 488d442418 LEAQ 0x18(SP), AX
main.go:10 0x488e60 4889442410 MOVQ AX, 0x10(SP)
main.go:10 0x488e65 e8763df8ff CALL runtime.mapiterinit(SB)
main.go:10 0x488e6a 488b442418 MOVQ 0x18(SP), AX
...
main.go:11 0x488e93 c3 RET

The call to mapiterinit is what sets up the iterator and then calls the mapiternext function to get the first element in the map. Here’s the part of the code in mapiterinit that actually computes where to start iterating —

r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))
it.bucket = it.startBucket

We generate a random number using fastrand() and then use it to get the starting bucket and a random offset within that bucket (remember, maps in go are implemented as an array of buckets with 8 elements in each bucket). mapiternext then iterates over the elements to return the first valid entity — while doing so, it skips over any empty ones

for ; i < bucketCnt; i++ {
offi := (i + it.offset) & (bucketCnt - 1)
if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
// TODO: emptyRest is hard to use here, as we start iterating
// in the middle of a bucket. It's feasible, just tricky.
continue
}
...
}

Because the element we start with could be empty, the probability of getting a valid element is actually dependent on the number of empty buckets and elements immediately preceding it. For example, if there is 1 bucket with 2 valid entities like in the example below —

[NULL, NULL, 10, NULL, NULL, NULL, NULL, 20]

We’ll get 10 if we start with elements 0, 1 or 2 and 20 if we start with 3, 4, 5, 6 or 7. So the perceived probability of getting a 10 is 3/8 and for 20 is 5/8 .

While this was a toy problem that I was trying to solve, the broader learning for me was to not base solutions on ones interpretation of library documentation. It’s almost always a good idea to test how things behave in practice even if the documentation feels clear and correct.

i0exception

Wut?

Aniruddha

Written by

Aniruddha

Currently, eng @mixpanel. Previously @twitter, @google

i0exception

Wut?

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade