Golang Map

Angus
14 min readSep 23, 2019

--

什麼是Map?

In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.

Hash or Tree(Red Black Tree)

Golang Map實作

hmap
https://github.com/golang/go/blob/78e5288b5c720d996ea5132f1fa0348968ff0513/src/runtime/map.go#L115
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed
buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
extra *mapextra // optional fields
}
  • bmap
// A bucket for a Go map.
type bmap struct {
tophash [bucketCnt]uint8
}
type bmap struct {
topbits [8]uint8
keys [8]keytype
values [8]valuetype
pad uintptr
overflow uintptr
}
  • hmap overview
  • 如果Key Value都不是指針 且size小於128 bmap會把它標示為不含指針,去避免GC,但overflow是指針,所以這時會把overflow放到mapextra
  • bmap overview: key/value/…/key/value v.s. key/key/../value/value/..

Make Map

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {

...
// allocate initial hash table
if h.B != 0 {
var nextOverflow *bmap
h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
if nextOverflow != nil {
h.extra = new(mapextra)
h.extra.nextOverflow = nextOverflow
}
}
return h
}
// runtime/slice.go
type slice struct {
array unsafe.Pointer
len int
cap int
}

Note: map is pointer, slice is struct, copy(map) vs copy(slice)

Hash function: 如果有aes則用 否則用memhash

// a copy of runtime.typeAlgtype 
typeAlg struct {
hash func(key unsafe.Pointer, seed uintptr) uintptr
equal func(unsafe.Pointer, unsafe.Pointer) bool
}

Key position 64bit

10010111|000011110110110010001111001010100010010110010101010│01010

B=5, bucket = 10 從(01010)來,

mapaccess1 https://github.com/golang/go/blob/9c384cc570fa964cea1fecc061b17d6858cbcc0d/src/runtime/map.go#L394

  1. 算出hash 跟拿對應的bucket
  2. 然後看bucket裡頭cell有沒有對應的tophash, 沒有就找overflow bucket找到沒有為止; 有emptyRest 來減少iterator次數
  3. tophash一樣在比較key 都ㄧ樣才return value; 這樣機制可以減少有些key大於128bit被轉成pointer的效能問題, t.indirectkey()
  4. 都沒有return unsafe.Pointer(&zeroVal[0])
// key 定位公式 
k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
if t.indirectkey() {
k = *((*unsafe.Pointer)(k))
}
// value 定位公式
v := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))

// 是否有被搬完
func evacuated(b *bmap) bool {
h := b.tophash[0]
return h > emptyOne && h < minTopHash
}
emptyRest = 0
emptyOne = 1 // this cell is empty
evacuatedX = 2 // Entry has been evacuated to first half
evacuatedY = 3 // same as above, but evacuated to second half
evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
minTopHash = 5 // minimum tophash for a normal filled cell.

Map 兩種get

value := m[key]
value, exist := m[key]
// src/runtime/hashmap.go
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool)

Map Scale

scale時機
loadFactor := count / (2^B)
太滿:loadFactor > 6.5
太長:if B < 16, then noverflow > 2^B || B >= 16 then noverflow > 2^15
if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
hashGrow(t, h)
goto again
// Growing the table invalidates everything, so try again
}
  1. 比較單純,就是空間不太夠,B << 1
  2. 太多Overflow buckets,但又剛好沒觸發1的條件,新增一個新的bucket把舊的直接搬過去
hashGrow 分配新的buckets
https://github.com/golang/go/blob/4e2b84ffc59610c380df3354239f037df4d3e2b2/src/runtime/map.go#L1017

growWork:真正的搬移
func growWork(t *maptype, h *hmap, bucket uintptr) {
// make sure we evacuate the oldbucket corresponding
// to the bucket we're about to use
evacuate(t, h, bucket&h.oldbucketmask())
// evacuate one more oldbucket to make progress on growing
if h.growing() {
evacuate(t, h, h.nevacuate)
}
}

evacuate

  • 每次只搬一個跟自己有關那個bucket, 搬完再多搬一個
  • 只會根據hash(多看一位)去分桶XY, 有時擴容也可能都在某一邊沒有解決collision問題
  • 理解map key是無序的
  • 紀錄tophash 得知此bucket搬過沒, evacuatedX, evacuatedY
  • 整個過程不能concurrent 所以才有 “concurrent map write error”

Overflow example

Map iterator

  • 正常就是將buckets一個一個看 然後cells一個一個看 再看有沒有overflow bucket
hiter iterator結構就是隨機找一個bucket然後再隨機找一個cell 

https://github.com/golang/go/blob/4e2b84ffc59610c380df3354239f037df4d3e2b2/src/runtime/map.go#L164
func mapiterinit(t *maptype, h *hmap, it *hiter)
https://github.com/golang/go/blob/78e5288b5c720d996ea5132f1fa0348968ff0513/src/runtime/map.go#L797
// decide where to start
r := uintptr(fastrand())
if h.B > 31-bucketCntBits {
r += uintptr(fastrand()) << 31
}
it.startBucket = r & bucketMask(h.B)
it.offset = uint8(r >> h.B & (bucketCnt - 1))
  • 如果再擴容的時候怎辦?
假設startbucket=3, offset=2, 順序就是3 -> 0 -> 1 -> 2// 檢查oldbucket是否搬過, 因為般的過程會set這些值, 搬完的就直接看新的
func evacuated(b *bmap) bool {
h := b.tophash[0]
return h > empty && h < minTopHash
}
bucket 3找完就繼續找overflow 找到 e,f,g
然後找下一個bucket 0
因為沒被搬過 所以我們找oldbucket 0且未來會被放到new bucket 0的 b,c
然後bucket 1=> h
bucket 2找oldbucket 0, 未來會放到bucket 2的 a,d

Map set

mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer
https://github.com/golang/go/blob/4e2b84ffc59610c380df3354239f037df4d3e2b2/src/runtime/map.go#L571
1. if h.flags&hashWriting != 0 {
throw("concurrent map writes")
}
2. 必須要在bucket搬移完成後才能assign
2. 找到bucket然後找每個cell來比較可以assign就用 否則去overflow bucket
3. 找到key value位置, inserti 和 insertk 分别指向第一個empty的空間
5. 最後再檢查是否需要擴容
6. 更新tophash跟key
7. 回傳value address

Map Delete

mapdelete前面部分跟set都差不多 找到key, value後直接assign nil

Q: golang map serach時間複雜度是多少?

Q: 什麼情況下用 HashMap 跟 Tree Map?

Q: Golang map scale可能會有什麼問題?

Ref:

TODO: sync.Map https://colobu.com/2017/07/11/dive-into-sync-Map/

--

--