Speed up data access in Elixir

Working with data structures was straightforward and predictable in a days of Erlang. Most of the time you were getting exactly what you expect.

Elixir is a different beast. It allows more ways to work with maps, lists and keyword lists. Turns out, performance-wise some of them are radically different than others.

First, some nice graphs of access time depending on structure size and then comes explanation. Raw data and benchmarking code is in the end of the article.

access time for structures of size 1–128 elements
access time for structures of size 1–16 elements

Accessing Maps

There are a few ways to access maps in Elixir.

* Using pattern matching (the fastest)

# %{^key => value} = map
map = %{foo: 1, bar: 2}
%{^:foo => foo_value} = map

Using pattern matching in case or in function’s clause head is as fast.

* Using Map.fetch (slower)

Map.fetch is slower because it not just matches the value, but also creates extra tuple.

map = %{foo: 1, bar: 2}
{:ok, foo_value} = Map.fetch map, :foo

* Using Access module (much slower)

map = %{foo: 1, bar: 2}
foo_value = map[:foo]

This is the most obscure and non-intuitive behavior in Elixir. If you coming from most other programming languages, you will immediately recognize [] notion as key/index access and most probably use it.

In Elixir [] allow you to do much more, for example addressing and updating all elements in map — I’ll show that in next article. But this comes with huge cost. [] is more than two times slower than pattern matching.

Accessing keyword lists

Erlang and Elixir, in turn, use lists with names tuples to pass around options. Before introduction of maps in Erlang it was only feasible way to have data structure with dynamic properties.

keyword_list = [ {:foo, 1}, {:bar, 2} ] 
# or compact form
keyword_list = [ foo: 1, bar: 2 ]

Because this is a list, accessing elements in it requires scan to find necessary element. So access speed is O(n) and you do not expected really use keyword lists with more than couple of elements.

* Using custom functions to speed-up lookups (fastest)

If you know that keyword list you will be using will be limited to 3–4 values, you may want to have own faster lookup function to get values from the keyword list.

def kw_list_access(kw_list, key, default\\nil) do
case kw_list do
[{h, v}|_] -> v
[_, {h, v}|_] -> v
[_, _, {h, v}|_] -> v
[_, _, _, {h, v}|_] -> v
_ -> Keyword.get kw_list, key, default
end
end

* Using Keyword.get (slow)

This is about 2 times slower than using custom function.


keyword_list = [ foo: 1, bar: 2 ]
foo_value = Keyword.get keyword_list, :foo

You can also supply default value to Keyword.get list, :foo, :default in case if key is not found in the list.

* Using Access module (slowest)

The same story as with maps. Access module is great to provide extra functionality, but it 3.5 times slower compared to custom function and 1.3 times slower compared to Keyword.get.

keyword_list = [ foo: 1, bar: 2 ]
foo_value = keyword_list[:foo]

Tuples

I will cover tuple access vs maps and lists in other article. Tuples are blazingly fast, but little bit less flexible.

How to benchmark

Clone repository from my github and run

mix bench bench/access_vs_map_fetch_vs_match.exs

Warning: If you are using any modern computer, your OS constantly changes processor speed in order to save energy and keep it cool and quiet. This is great to every day work, but terrible for benchmarks. Before running benchmarks manually set processor to some fixed speed. For example Debian instruction is here.

# show current settings
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Number in test name means how much elements were in the tested structure (1,2,3, …). Uncomment other lines for more elements in map/kwlist/etc.

  • baseline is just function call without any processing done at all, just to see how much function call costs for benchfella.
  • match is test of %{^key => value} lookup in map
  • map_fetch is a call to Map.fetch for map :)
  • kw_get_unroll - manually unrolled function for lookup in keyword lists up to 4 elements big, then switch to Keyword.get
  • kw.get - is a call to Keyword.get on keyword list
  • access_kw is a 'kw_list[index]` call
  • access_map is a 'map[index]` call

So you can see that access_kw1 is at least 2 times slower than manual kw_get_unroll1 or match1.

I know about flaws of microbenchmarks, but I also did optimization in real code based on this input and had significant speed ups. Access module is nice, but (very) slow. case map or matching in function head are the fastest.


Raw data


## AccessVsMapFetchVsMatch

benchmark name iterations average time
baseline2 100000000 0.04 µs/op
baseline128 100000000 0.04 µs/op
baseline8 100000000 0.04 µs/op
baseline4 100000000 0.04 µs/op
baseline3 100000000 0.04 µs/op
baseline1 100000000 0.04 µs/op
baseline16 100000000 0.04 µs/op
baseline32 100000000 0.04 µs/op
baseline64 100000000 0.04 µs/op
kw_get_unroll1 100000000 0.07 µs/op
match1 100000000 0.07 µs/op
map_fetch1 100000000 0.09 µs/op
match2 10000000 0.11 µs/op
kw_get_unroll2 10000000 0.11 µs/op
kw_get1 10000000 0.13 µs/op
kw_get_unroll3 10000000 0.14 µs/op
match3 10000000 0.14 µs/op
access_map1 10000000 0.15 µs/op
kw_get_unroll4 10000000 0.16 µs/op
map_fetch2 10000000 0.16 µs/op
access_kw1 10000000 0.17 µs/op
match4 10000000 0.17 µs/op
map_fetch3 10000000 0.22 µs/op
kw_get2 10000000 0.24 µs/op
kw_get_unroll8 10000000 0.26 µs/op
map_fetch4 10000000 0.28 µs/op
access_map2 10000000 0.28 µs/op
kw_get3 10000000 0.32 µs/op
match8 10000000 0.33 µs/op
access_kw2 10000000 0.39 µs/op
access_map3 10000000 0.39 µs/op
kw_get_unroll16 10000000 0.45 µs/op
kw_get4 10000000 0.45 µs/op
access_kw3 10000000 0.46 µs/op
access_map4 10000000 0.51 µs/op
map_fetch8 10000000 0.53 µs/op
access_kw4 10000000 0.71 µs/op
match16 10000000 0.76 µs/op
kw_get_unroll32 10000000 0.85 µs/op
kw_get8 10000000 0.89 µs/op
access_map8 1000000 1.04 µs/op
map_fetch16 1000000 1.14 µs/op
access_kw8 1000000 1.44 µs/op
kw_get_unroll64 1000000 1.66 µs/op
kw_get16 1000000 1.79 µs/op
match32 1000000 2.08 µs/op
access_map16 1000000 2.72 µs/op
access_kw16 500000 2.81 µs/op
map_fetch32 1000000 2.81 µs/op
kw_get_unroll128 500000 3.20 µs/op
kw_get32 500000 4.49 µs/op
match64 500000 5.20 µs/op
access_kw32 500000 6.20 µs/op
map_fetch64 500000 6.66 µs/op
access_map32 500000 6.97 µs/op
match128 200000 8.70 µs/op
map_fetch128 100000 11.49 µs/op
kw_get64 100000 11.52 µs/op
access_map64 100000 12.01 µs/op
access_kw64 100000 15.42 µs/op
access_map128 100000 22.48 µs/op
kw_get128 50000 31.74 µs/op
access_kw128 50000 37.11 µs/op