7 ways to implement a Bit set in Go

Val Deleplace
9 min read · Aug 17, 2020


Some data is best modeled as a bit set. For example, the essential information about which students, out of 3000, successfully passed a test consists of 3000 bits (only 375 bytes).

3000 bits ≡ 375 bytes ≡ 750 hex digits

It turns out that Go doesn’t have a Bitset type in the standard library!

Abstract type: interface

First, let’s say that a Bitset can be any type that implements these 3 methods:

type Bitset interface {
    GetBit(i int) bool
    SetBit(i int, value bool)
    Len() int
}

It is not necessary in general to start with an interface type. However, it has some advantages, and in our case it will help compare the different implementations.

The Len method returns a number of bits; however, its exact semantics are left to each implementation. We’ll focus mostly on GetBit and SetBit, and on fixed-size bitsets.
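
For instance, a helper written once against the interface works with every implementation below. Here is a minimal sketch (the CountSet function is an illustrative name, not part of the original code):

// CountSet counts the bits set to true in any Bitset implementation,
// using only the interface methods.
func CountSet(bs Bitset) int {
    count := 0
    for i := 0; i < bs.Len(); i++ {
        if bs.GetBit(i) {
            count++
        }
    }
    return count
}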

Concrete implementations

Slice of booleans

A []bool is a straightforward choice.

type BitsetBool []bool

func (b BitsetBool) GetBit(i int) bool {
    return b[i]
}

func (b BitsetBool) SetBit(i int, value bool) {
    b[i] = value
}

func (b BitsetBool) Len() int {
    return len(b)
}

We create trivial methods in order to satisfy the Bitset interface.

See the source on GitHub.

This works well when the size is set at creation and remains fixed:

bs := make(BitsetBool, M)

This implementation is not compact, as it takes up 1 byte per bit in memory, which is an 8x overhead.

However, it has advantages:

  • very simple to implement
  • simple to reason about
  • fast data access

Dynamic slice of booleans

type BitsetBoolDyn []bool

func (b *BitsetBoolDyn) SetBit(i int, value bool) {
    if i >= len(*b) {
        b.grow(1 + i)
    }
    (*b)[i] = value
}
...
...

See the full source.

The same underlying type []bool can be used to define a bitset type able to automatically grow when needed. To achieve this, the methods have a pointer receiver, as they need to update the slice header when growing. Also, bound checks are added to GetBit and SetBit, as the given index is not expected to always be within current bounds.
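
The elided read path might look roughly like this: a sketch of the bound-checked GetBit and of Len (the actual code is in the linked source):

// GetBit tolerates out-of-range indices: an index beyond the current
// length is simply a bit that was never set.
func (b *BitsetBoolDyn) GetBit(i int) bool {
    if i >= len(*b) {
        return false
    }
    return (*b)[i]
}

func (b *BitsetBoolDyn) Len() int {
    return len(*b)
}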

Map of booleans

type BitsetMap map[int]bool

See the full source.

A map of bool is a natural dynamic Bitset. It must be created before adding elements, with an optional size hint:

bs := make(BitsetMap)

or

bs := make(BitsetMap, M)

An interesting property is that if the Bitset is sparse (lots of false values), then the memory taken up by the map will be proportional to the number of true values, not to the potentially big highest index of a true value.
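
The methods might look roughly like this (a sketch; see the linked source for the actual code):

func (b BitsetMap) GetBit(i int) bool {
    return b[i] // a missing key yields the zero value: false
}

func (b BitsetMap) SetBit(i int, value bool) {
    if !value {
        // Deleting keeps the map proportional to the number of true bits.
        delete(b, i)
        return
    }
    b[i] = true
}

func (b BitsetMap) Len() int {
    return len(b) // one possible semantics: the number of true bits
}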

Big Integer

*big.Int implements the basic Bitset functionality, and automatically grows when needed.

See this source which encapsulates a *big.Int.

It is internally backed by a []uint, and its implementation details are more about arithmetic than about storing raw bits.
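
A wrapper can delegate to the Bit, SetBit and BitLen methods of math/big. A minimal sketch (the type and field names here are assumed, not necessarily those of the linked source):

import "math/big"

type BitsetBig struct {
    n *big.Int
}

func NewBig() *BitsetBig {
    return &BitsetBig{n: new(big.Int)}
}

func (b *BitsetBig) GetBit(i int) bool {
    return b.n.Bit(i) == 1
}

func (b *BitsetBig) SetBit(i int, value bool) {
    bit := uint(0)
    if value {
        bit = 1
    }
    // big.Int.SetBit writes the result into the receiver's operand.
    b.n.SetBit(b.n, i, bit)
}

func (b *BitsetBig) Len() int {
    return b.n.BitLen() // index of the highest true bit, plus one
}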

Slice of bytes

type BitsetUint8 []uint8

func NewUint8(n int) BitsetUint8 {
    return make(BitsetUint8, (n+7)/8)
}

func (b BitsetUint8) GetBit(index int) bool {
    pos := index / 8
    j := uint(index % 8)
    return (b[pos] & (uint8(1) << j)) != 0
}
...
...

See the full source. In Go, the types byte and uint8 are strictly equivalent.

Bytes are the smallest addressable units in Go, so this implementation is compact: each byte encodes 8 bits, and all bytes are contiguous in memory.

Each Get or Set access requires a small computation to locate the correct byte, and access the correct bit inside the byte.
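
The elided SetBit (its body also appears later in the compiler’s inlining output) sets or clears the target bit with a mask:

func (b BitsetUint8) SetBit(index int, value bool) {
    pos := index / 8
    j := uint(index % 8)
    if value {
        b[pos] |= uint8(1) << j
    } else {
        b[pos] &= ^(uint8(1) << j)
    }
}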

Slice of 64-bit integers

type BitsetUint64 []uint64

func NewUint64(n int) BitsetUint64 {
    return make(BitsetUint64, (n+63)/64)
}
...
...

See the full source.

Modern computers are often 64-bit platforms, so it makes sense to try to achieve compactness and speed by storing data in 64-bit words (8 bytes).

The implementation logic is the same as for []uint8.
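
For completeness, the bit twiddling with 64-bit words looks like this (a sketch; see the linked source):

func (b BitsetUint64) GetBit(index int) bool {
    pos := index / 64
    j := uint(index % 64)
    return (b[pos] & (uint64(1) << j)) != 0
}

func (b BitsetUint64) SetBit(index int, value bool) {
    pos := index / 64
    j := uint(index % 64)
    if value {
        b[pos] |= uint64(1) << j
    } else {
        b[pos] &= ^(uint64(1) << j)
    }
}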

Package willf/bitset

import wb "github.com/willf/bitset"

type BitsetWillf struct {
    wb.BitSet
}

See the source encapsulating willf’s data structure.

This is a popular, ready-to-use bitset package. It has many useful features (popcount, union, etc.).

The Bitset is implemented internally as a slice of uint64 plus an exact length in bits.
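
Adapting it to our Bitset interface is mostly a matter of converting indices to uint. A sketch, assuming the package’s Test, SetTo and Len methods (the linked source contains the actual adapter):

func (b *BitsetWillf) GetBit(i int) bool {
    return b.Test(uint(i))
}

func (b *BitsetWillf) SetBit(i int, value bool) {
    b.SetTo(uint(i), value)
}

func (b *BitsetWillf) Len() int {
    return int(b.BitSet.Len())
}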

Benchmark

The source of my benchmarks is on GitHub: the 4 files whose names contain “bench”.

As always with benchmarks, don’t take the results too seriously. YMMV.

Perf of write operations

Slice types []bool, []uint8, []uint64 are the fastest structures to write to. They are slightly faster than willf.Bitset, but also less sophisticated (they are fixed-sized, while willf’s embeds some auto-extend code that might not be inlined).

Big integers and maps of booleans, though nice and easy to use, are slower.

Perf of read operations

The numbers for read operations are closer to each other, and reads are faster than writes.

It turns out that []uint8 and []uint64 have the same memory compactness and the same read and write performance numbers. No 64-bit premium gain after all!

Cost of the Bitset interface abstraction

In a tight loop that reads or writes bits, it may be more efficient to use a specific Bitset’s “raw” concrete type than the Bitset interface type. Consider the difference between BenchmarkUint8IfcWrite and BenchmarkUint8RawWrite:

var M = 100000

func NewUint8(n int) BitsetUint8 {
    return make(BitsetUint8, (n+7)/8)
}

func BenchmarkUint8IfcWrite(b *testing.B) {
    bs := NewUint8(M)
    benchmarkWrite(bs, b, M)
}

// Helper benchmark func, compatible with all implementations
func benchmarkWrite(bs Bitset, b *testing.B, n int) {
    for i := 0; i < b.N; i++ {
        for j := 1; j < n; j += 7 {
            bs.SetBit(j, true)
        }
        for j := 1; j < n; j += 7 {
            bs.SetBit(j, false)
        }
    }
}

func BenchmarkUint8RawWrite(b *testing.B) {
    bs := NewUint8(M)
    for i := 0; i < b.N; i++ {
        for j := 1; j < M; j += 7 {
            bs.SetBit(j, true)
        }
        for j := 1; j < M; j += 7 {
            bs.SetBit(j, false)
        }
    }
}

For slices of bytes, the “raw” concrete version is 2x as fast as the “ifc” version.

For slices of booleans, the “raw” concrete version is 5x as fast as the “ifc” version!

The interface abstraction has a high cost when:

  • each method does a very small amount of work,
  • methods are called on an interface variable in a tight loop, forcing a dynamic dispatch to the concrete type’s method over and over,
  • compiler optimizations such as inlining, constant propagation, and bounds-check elimination are not possible at all without knowing the concrete type.

In the two famous interfaces Reader and Writer from package io:

type Reader interface {
    Read(p []byte) (n int, err error)
}

type Writer interface {
    Write(p []byte) (n int, err error)
}

the methods take a slice as argument, so it’s possible to have them do a large amount of work in a single call. This is not the case with our GetBit and SetBit abstractions, which deal with a single bit at a time.

When benchmarking the “raw” concrete type BitsetUint8,

...
./bitset_bench_write_raw_test.go:76:13: inlining call to BitsetUint8.SetBit method(BitsetUint8) func(int, bool) { pos := index / 8; j := uint(index % 8); if value { b[pos] |= uint8(1) << j } else { b[pos] &= ^(uint8(1) << j) } }
...

the short methods GetBit and SetBit are inlined, which saves the cost of a function call. This in turn lets the compiler perform other optimizations, possibly discovering that all slice indices are smaller than M and eliminating some bound checks.

If we want to benchmark each concrete type, we need to write the benchmark code for each of them: just calling the helper func benchmarkWrite would go through the interface and skew the results.

This is actually an important lesson: if performance is really important to you, and you’re calling an interface method millions of times in a tight loop, then you should consider working with a concrete type instead.

Usually though, it’s fine and useful to abstract some behaviors in an interface type.

Fixed size or dynamic growth

The simple versions of the slice types ([]bool, []uint8, []uint64) are assumed to have fixed size at creation time.

All other implementations are dynamic: *big.Int, map[int]bool, willf.Bitset.

Turning a “fixed-sized” slice type into a dynamically growing slice type is not very complex: use a pointer receiver in methods, and add explicit bound checks.

In my benchmark results, I reckon that the overhead of the bound checks is actually very small.

Choosing between the fixed-size version and the dynamic version depends on the intended use case; very often, a fixed-size bitset is really all we need.

One use case needs special attention, and a trade-off:

  • Either growing for index i allocates a new slice holding exactly (i+1) bits. This is a “compact” way of growing; however, if you start with an empty slice and then flip n bits incrementally from low indices to high indices, you end up with an extremely slow O(n²) runtime complexity. This must be avoided.
  • Or growing for index i uses append, which reserves ~25% of extra unused capacity, achieving a much more sensible amortized O(n) cost when flipping n bits (see the sketch below).
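
A grow method following the second strategy, for the BitsetBoolDyn type above, might look like this (a hypothetical sketch, not necessarily the strategy of the linked source):

func (b *BitsetBoolDyn) grow(n int) {
    if n <= len(*b) {
        return
    }
    // append over-allocates its backing array, so flipping n bits in
    // increasing index order costs amortized O(n) instead of O(n²).
    *b = append(*b, make([]bool, n-len(*b))...)
}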

Go versions

It’s easy and useful to install several versions of Go on one’s computer (example in bash):

for i in {10..15}; do
    go get golang.org/dl/go1.$i
    go1.$i download
done

Now I can run the benchmark on 6 major releases of Go, and look for differences.

Full benchmark run of Go 1.10 up to Go 1.15 — Smaller is better

First, most of the lines are dwarfed by 2 outliers, which are “writing bits to *big.Int in Go 1.10”. This part seems to have been tremendously optimized in Go 1.11!

Let’s zoom in.

The 2 lines above 400μs/op are “writing bits to map[int]bool”.

So far, we know that big ints and maps are not the most efficient for writing.

Let’s zoom in further.

While the chart still looks a bit messy, we can notice many peaks on Go 1.11, as if this specific release were less efficient than both the previous and the subsequent ones.

If we focus on Go1.12 and above, then we notice less dramatic changes from one Go version to another.

Benchmark run of Go 1.12 up to Go 1.15 (*big.Int and map[int]bool are cropped by zooming)

Comparing Go releases with regard to a benchmark is interesting because relying too much on the performance profile of some niche implementation detail sometimes leads to “fragile” micro-optimizations that may no longer hold with the next major release of the compiler.

Conclusion

A Bitset is conceptually a simple object that can be implemented with any of several backing types.

Using a fundamental data structure is not always about importing a package; it’s also fine to write your own implementation. The important thing is to understand the trade-offs.

Not everything is big data, and not every use case needs peak runtime performance or perfect memory compactness. However, being legible and intelligible is always extremely valuable, so if you need a Bitset, my advice is to choose one that’s simple to use and hard to misuse. I have a preference for the slice of booleans. Your choice may be different :)

What can we do with a Bitset? Bloom Filters, of course! That would be a story for another post…
