How we adopted partial ordered_set in Erlang

Max Lapshin
5 min readDec 13, 2018

--

Erlang offers and even forces programmer to use concept of messages for distributing computations and sharing data between computation units: processes.

It is good to speak about “one process is sending message to another”, but programmers use to call methods on objects and Erlang does it excellently, much better than usual OOP languages can do: with serialised and timeoutable calls.

Standard module gen_server (I don’t know how many red eyes and white hairs it cost to developers, but it is really polished) gives a good way for a process to ask another process for some data. All calls will be serialised, so C++ developer can think about a process like about an object, protected with per-instance mutex (in fact there are many mutexes to make things work scalable).

There is very important property of such gen_server inter-process calls: they are serialised, so it means that if process (object) A is calling process (object) B and B need to ask A for some data, they will get into dead lock. Erlang will auto resolve this deadlock soon, but it will crash both processes.

If process B has data in memory, but right now is busy with writing something to disk, then process A will have to stand in queue and wait his turn to ask B for data.

So we have a problem: data is in memory, but process that holds it, may be very busy and all his callers will get blocked. This is why we sometimes in Flussonic replace gen_server calls with ETS access.

ETS is an in-memory database with off-heap storage of data. It is very fast. No, it is VERY fast (I’m repeating wrong word “fast” but in high loaded Erlang application we should not use this word, because multicore world with software and hardware locks has another very deep meaning of this word).

Simplicity ends when modifier process needs to modify several rows at once. Ets do not offer transactions, so if one process modifies several rows, other process may see intermediate state. We have either to use mnesia, either to do some tricks.

Another issue with ets that it do not offers secondary indexes, so we either use mnesia, either use tricks.

So: non-atomic adding/modification/deleting of keys and no secondary indexes. How can we live with it?

We have found in documentation briefly documented feature of ordered_set. Ets allows to choose primary index nature: hash table or binary tree. Ordered set is a bit slower (again this word) on very large amounts of data because hash table promises us constant access time and binary table offers logarithmic growth of access time. However logarithm is growing so slowly, that no one should not worry about its “slowness”.

What is our data do we store and what is the trick? We are recording video on disk and video frames are collected together in video segments with different durations: from 1 to 10 seconds depending on user settings. Video source produces about 6–60 segments per minute and 400–3600 segments per hour. All hour segments are written into 1-hour big files on disk: they are loaded and deleted from disk together at once. Each segment has it’s timestamp of the beginning (in UTC seconds). All segments are grouped by different stream names.

We need to load segments for a requested time from disk, store them in memory and cache for a while to reduce disk reads. If I was writing in SQL, I would create an index on (name, hour(timestamp), timestamp). If I was writing in C or writing complex structure in erlang, it would be a multilayered tree: first group data by stream names, then by hour, then by timestamp.

The cool thing about ordered_set is that you can do such index with ordered set. We are writing primary key in ets in format {Name, Hour, Timestamp}

Here starts some magic. How can we select all segments for one hour?

We need to make a ets:select with matchspec where all first elements in tuple are filled with static values. It will automatically reduce all further searches to a small subtree that lives inside ordered_set.

Here we have some problem: matchspec must be filled with static values and is that a living man cannot fill matchspec properly without ets:fun2ms, so we make small trick:

match_spec_hour(Name, Hour, Return) ->
[{Head,Cond,_}] = ets:fun2ms(fun
(#segment{utc={_,_,_}} = Segment) ->
true
end),
Head1 = Head#segment{utc={Name,Hour,’_’}},
Out2 = case Return of
true -> true;
segment -> ‘$_’
end,
[{Head1,Cond,[Out2]}].
read_hour_segments(Name, Hour) ->
From = minute:timestamp(Hour) div 3600,
MS = match_spec_range_hour(Name, From, segment),
Segments = ets:select(stab(Name), MS),
[Seg#segment{utc = UTC} ||
#segment{utc = {_,_,UTC}} = Seg
<- Segments].

What is happening here? We create match spec with ets:fun2ms parse_transform function and we are sure about it’s structure: [{Head, Condition, Return}] Then we are rewriting Head. fun2ms is very convenient here, because it will fill automatically all other fields of structure and if we modify header and change structure, this code will be recompiled without attracting our attention to it.

It is important that we put into match spec static values. Usual fun2ms do not allow to do it, because it creates matched variables for internal function. Such hack allows us to create match spec with some value without knowing it on compile time.

Then we return true (for deleting) or whole segment (for fetching). One function for deleting or fetching whole hour in near-constant time.

With such approach we can drop a whole branch of tree when we unload whole stream from memory or only unload some hour blocks and we can select either whole hour at once, either make a ranged select which is very highly optimised in ets:select :

read_segments(Name, From, To) ->
MS = match_spec_range(Name, From, To, segment),
Segments = ets:select(stab(Name), MS),
[Seg#segment{opened_at = UTC} ||
#segment{opened_at = {_,_,UTC}} = Seg
<- Segments].
match_spec_range(Name, From, To, Return) ->
[{Head,Cond,_}] = ets:fun2ms(fun(#segment{opened_at={_,_,Time}} = Segment) when Time >= From andalso Time < To ->
true
end),
Head1 = Head#segment{opened_at={Name,'_','$1'}},
Out2 = case Return of
true -> true;
segment -> '$_'
end,
[{Head1,Cond,[Out2]}].

It was required to modify a bit previous match_spec_range to fullfill requirements of enabling fast access. By word “fast” I mean avoiding fullscan of ets: when we have million of records in memory, we have to carefully use indexes, full scan with plain ets:select will take too much time.

Conclusion

Erlang ets is a simple memory tuple storage, however it has some features that distinguish it from dump key-value storage. Use them!

--

--