Update for those just arriving: bboozoo over at Reddit has done some great analysis of this situation. It appears that the behaviour is caused by memory caching: maps are cached less efficiently than slices, and as they grow and are less able to fit in the cache, their performance decreases. The article below is left intact for your interest.
I’m in my second year at the University of Toronto. In our data structures class, we, like most of you, had to know the asymptotic efficiency of various types of data structures. Among these, binary search in a sorted list runs in O(log(N)) time, and lookups in a map run in constant O(1).
For one situation in an application I’m developing, we wanted to see if a number was in some list (and Bloom filters were not appropriate). Additionally, we wanted to assign that number a consecutive index. Like a naive student, I thought that since our dataset would be reasonably large, a map[int]uint would be more efficient than a sorted slice. Maps are constant time, after all! Right? Actually, no. At least, not in Go.
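For concreteness, here is a minimal sketch of the two contenders (the function names are mine, not from my actual code): a sorted slice, where the consecutive index falls out of the binary search itself, and a map[int]uint, where the index has to be stored explicitly.

```go
package main

import (
	"fmt"
	"sort"
)

// indexOf returns the consecutive index of n in the sorted slice s,
// or -1 if n is not present. Binary search: O(log N).
func indexOf(s []int, n int) int {
	i := sort.SearchInts(s, n)
	if i < len(s) && s[i] == n {
		return i
	}
	return -1
}

func main() {
	// Sorted-slice variant: membership and index from one search.
	sorted := []int{3, 14, 15, 92, 653}
	fmt.Println(indexOf(sorted, 15))  // 2
	fmt.Println(indexOf(sorted, 100)) // -1

	// Map variant: the index is stored as the value. O(1) lookup, in theory.
	m := make(map[int]uint, len(sorted))
	for i, v := range sorted {
		m[v] = uint(i)
	}
	idx, ok := m[15]
	fmt.Println(idx, ok) // 2 true
}
```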
My name is Connor, and I optimize prematurely. Cue the AA chorus, “Hi Connor.” Well, I try not to, but… I was benchmarking my map implementation of the data structure described above against a binary search implementation, and something odd appeared.
The source and raw dataset for the above graph are available on my Github. The test was conducted running Go 1.4 on a Windows machine. Care was taken to avoid counting the initial generation time (the timer was reset) or GC pauses (GC was disabled and called manually before the timer was reset in each test).
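A self-contained sketch of that benchmarking hygiene (not my actual harness; variable names are mine): turn off automatic GC, collect once after setup, and only then start the timer, so generation time and GC pauses stay out of the measurement.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// sumLookups performs one lookup per key; summing the results keeps the
// compiler from eliminating the loop as dead code.
func sumLookups(m map[int]uint) uint {
	var s uint
	for i := 0; i < len(m); i++ {
		s += m[i]
	}
	return s
}

func main() {
	const n = 1 << 20
	m := make(map[int]uint, n)
	for i := 0; i < n; i++ {
		m[i] = uint(i)
	}

	debug.SetGCPercent(-1) // disable automatic GC so pauses don't pollute timings
	runtime.GC()           // collect the setup garbage before the timer starts

	start := time.Now() // "reset the timer": generation time is excluded
	sum := sumLookups(m)
	elapsed := time.Since(start)
	fmt.Printf("%d lookups in %v (sum=%d)\n", n, elapsed, sum)
}
```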
I happened to have an interview a few days after discovering this phenomenon. The interviewer and I threw around some ideas; I suggested it could be due to collision checking in larger maps, he mentioned branch optimization as a possible culprit. Let’s go to the source and see!
Why is this happening?
A hashmap stores a value that corresponds to a unique key. Almost all implementations allow you to use strings as keys, so that’s what we’ll go with. Say I have a phonebook, and I want to map peoples’ names (keys) to telephone numbers (values).
When I add an entry to this map, the person’s name is put through a hash function, which produces an unsigned integer. The integer corresponds to a “bucket” in memory, a place for records to be stored. It allows the program to access that record by skipping straight to a certain memory address, without having to scan through and compare every record. This allows a perfect hash to work in O(1) time, rather than a dumb search’s O(N).
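To make the bucket idea concrete, here is a toy version of that step. FNV-1a stands in for the hash function here (Go's runtime uses its own hashing, and the names below are illustrative, not the runtime's):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numBuckets = 8 // illustrative; a real map grows its bucket array

// bucketFor hashes a name and reduces the hash to a bucket index --
// the "skip straight to a memory address" step described above.
func bucketFor(name string) uint64 {
	h := fnv.New64a() // FNV-1a stands in for the runtime's hash function
	h.Write([]byte(name))
	return h.Sum64() % numBuckets
}

func main() {
	// A toy phonebook: each name lands in the bucket its hash selects.
	buckets := make([][]string, numBuckets)
	for _, name := range []string{"Alice", "Bob", "Connor"} {
		b := bucketFor(name)
		buckets[b] = append(buckets[b], name)
	}
	for i, b := range buckets {
		fmt.Println(i, b)
	}
}
```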
Alas, the world is not perfect. When you hash a person’s name, data is lost — a name that is 20 bytes long might be represented by a hash that’s 8 bytes long. Therefore, one inevitably ends up with collisions, when two keys share the same hash.
One method of collision resolution is to put all conflicting records into a single bucket, storing them as a list, then scan that list to pull out the target.
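A toy chained hash table makes this concrete (illustrative code, not Go's implementation; the tiny bucket count is deliberate, to force collisions):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type entry struct{ name, phone string }

// chained resolves collisions by chaining: colliding entries share a
// bucket, and lookups scan that bucket's list linearly.
type chained struct{ buckets [][]entry }

func (c *chained) idx(name string) uint64 {
	h := fnv.New64a() // illustrative hash, not the runtime's
	h.Write([]byte(name))
	return h.Sum64() % uint64(len(c.buckets))
}

func (c *chained) put(name, phone string) {
	i := c.idx(name)
	c.buckets[i] = append(c.buckets[i], entry{name, phone})
}

func (c *chained) get(name string) (string, bool) {
	// Scan the chain: O(1) on average, O(chain length) in the worst case.
	for _, e := range c.buckets[c.idx(name)] {
		if e.name == name {
			return e.phone, true
		}
	}
	return "", false
}

func main() {
	c := &chained{buckets: make([][]entry, 4)} // tiny table forces collisions
	c.put("Alice", "555-0100")
	c.put("Bob", "555-0101")
	fmt.Println(c.get("Alice")) // 555-0100 true
	fmt.Println(c.get("Eve"))   //  false
}
```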
Now, let’s dive into Go’s algorithm and fun internals. We’ll be taking a look at mapaccess1 here.
First off, we calculate the offset of a bucket using the hashed value of our key. Each bucket is statically allocated to hold eight hash “tops.” These are smaller, 8-bit hashes used for a quick comparison before the entire hash of each key is compared. Go iterates through each top in the bucket. If it sees one that matches, it calculates the full address of the hash/value pair in memory. If the full hash matches, the value is returned (line 310). It gets away with storing only eight tops per bucket by simply evacuating old buckets when the map grows, copying memory as needed.
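The scheme can be sketched roughly like this. This is a heavily simplified model, not the runtime's code: field names and sizes are mine, keys here are plain uint64s, and real buckets also carry overflow pointers and special tophash marker values.

```go
package main

import "fmt"

const bucketCnt = 8 // eight slots per bucket, as in the runtime

// bucket mimics the layout idea: one cheap byte per slot ("top"),
// plus the full keys and values.
type bucket struct {
	tophash [bucketCnt]uint8
	keys    [bucketCnt]uint64
	values  [bucketCnt]uint64
}

// tophash takes the top byte of the 64-bit hash as the quick-check value.
func tophash(hash uint64) uint8 {
	return uint8(hash >> 56)
}

// lookup compares the cheap tophash bytes first, and only does the full
// key comparison on a tophash match.
func lookup(b *bucket, hash, key uint64) (uint64, bool) {
	top := tophash(hash)
	for i := 0; i < bucketCnt; i++ {
		if b.tophash[i] != top {
			continue // one-byte comparison rules out most slots
		}
		if b.keys[i] == key {
			return b.values[i], true
		}
	}
	return 0, false
}

func main() {
	b := &bucket{}
	hash := uint64(0xDEADBEEF00000000)
	b.tophash[0] = tophash(hash)
	b.keys[0] = 42
	b.values[0] = 7
	fmt.Println(lookup(b, hash, 42)) // 7 true
}
```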
So, why is this happening? Honestly, I’m not entirely sure; there’s nothing glaringly wrong, and indeed Go seems well-optimized for map lookups (at the expense of insertion/creation time).
Maybe as map hashes get tightly packed, the “top” hashes become more similar, and more comparisons have to be done to get the target value. If this is the case there should be a plateau in lookup time at some point, though I was unable to bench more than 10⁸ elements due to memory constraints. Maybe the interviewer was correct in that bad branch optimization causes a performance hit. Maybe a bit of both. If anyone finds a better explanation (or the explanation), feel free to Tweet to me at @ConnorPeet.
Update: @dgryski suggested that the processor cache could overshadow the effects of the algorithm. In an attempt to defeat this, I tested lookups by randomizing the searched index. This had very detrimental effects on the binary search time, but the map performance and the shape of the graph were largely unchanged. Plot at imgur.
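The randomized-order probing looks roughly like this (a sketch, not my exact harness): shuffle the indices up front, then walk the shuffled order, so the cache and prefetcher get no help from sequential access.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomSum probes the map in the given (shuffled) order; summing keeps
// the loop from being eliminated as dead code.
func randomSum(m map[int]uint, order []int) uint {
	var s uint
	for _, i := range order {
		s += m[i]
	}
	return s
}

func main() {
	const n = 1 << 20
	m := make(map[int]uint, n)
	for i := 0; i < n; i++ {
		m[i] = uint(i)
	}
	order := rand.Perm(n) // shuffled indices: no sequential locality

	start := time.Now()
	sum := randomSum(m, order)
	fmt.Printf("random-order lookups: %v (sum=%d)\n", time.Since(start), sum)
}
```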
Update 2: There’s been some discussion in the Reddit thread tossing around hypotheses for why this is occurring. Currently it looks like results might be influenced by the processor cache, but in a different way than what I understood Gryski to mean. I might write a follow-up post in a few days.
Disclaimer: this rather inflammatory claim that Go’s maps are not O(1) comes purely from the above benchmarks and data, not formal proofs or analysis. It may be that somewhere beyond a billion entries it does indeed plateau and exhibit constant time performance, but for fewer entries that is not the exhibited behaviour. Your input and corrections are welcome!