Beyond the Limits of ETS

Brujo Benavides
Erlang Battleground
7 min readDec 20, 2016

--

I recently gave an introductory Erlang course at Cali, Colombia. The students there were by far the most inquisitive ones I ever had. This article started as one of their multiple What if…? questions.

To boldly go where no man has gone before — Original from Pinterest

The Lesson: ETS

The lesson I was delivering that day was about ETS tables. So eventually I get to the point where I explain how to use ets:first/1 and ets:next/1 to iterate over the keys of a table.

For those who don’t know, ets:first/1

Returns the first key Key in table Tab. For an ordered_set table, the first key in Erlang term order is returned. For other table types, the first key according to the internal order of the table is returned. If the table is empty, ‘$end_of_table’ is returned.

…while ets:next/2

Returns the next key Key2, following key Key1 in table Tab. For table type ordered_set, the next key in Erlang term order is returned. For other table types, the next key according to the internal order of the table is returned. If no next key exists, ‘$end_of_table’ is returned.

The Question

As with a couple of other classical OTP functions (more examples below), the students spotted an edge case very easily. Precisely after I have shown my slide and before I could even get to the console to show them how these two functions can be used in a dummy example, one of them asked…

What happens if ‘$end_of_table’ is one of the keys in my table?

It didn’t actually take me by surprise. I’ve made the same kind of observation myself more than once. Each time I find a function that returns some special result (in this case ‘$end_of_table’) to indicate an abnormal status (in this case the fact that there is no next key), I wonder: what if that result is not so special after all? But even when I had seen this special result, I never checked what happened in this particular scenario.

My response to the student was, of course: Let’s see! And what was going to be a boring recursive function showing how to use ets:first/1 and ets:next/2 turned out to be a nice live coding session that eventually morphed into this blog post.

So What Happens?

Covering the Bases

First, we need to check the basic stuff:

1> spawn(fun() -> ets:new(wat, [named_table, public]), receive _ -> ok end end).
<0.63.0>
2> ets:first(wat).
'$end_of_table'
3> ets:next(wat, '$end_of_table').
** exception error: bad argument
in function ets:next/2
called as ets:next(wat,'$end_of_table')
6> ets:next(wat, a).
** exception error: bad argument
in function ets:next/2
called as ets:next(wat,a)
6> ets:insert(wat, {a, 1}).
true
7> ets:first(wat).
a
8> ets:next(wat, a).
'$end_of_table'
9> ets:next(wat, '$end_of_table').
** exception error: bad argument
in function ets:next/2
called as ets:next(wat,'$end_of_table')
10>

We first spawn a process to hold a public table called wat. We do that to avoid exceptions in the console killing our table (because ets tables are linked to the process who created them).

Then we check that everything works as expected:

  • If you ask for the first key of an empty table, you get ‘$end_of_table’.
  • If you ask for the next key of an inexistent key, you get an exception.
  • if you ask for the next key of the last key in your table, you get ‘$end_of_table’.

Adding the Strange Key

So far, so good. Now, let’s try with our crazy key ‘$end_of_table’ atom…

10> ets:insert(wat, {'$end_of_table', 2}).
true
11> ets:first(wat).
'$end_of_table'

12> ets:next(wat, '$end_of_table').
a
13> ets:next(wat, a).
'$end_of_table'
14>

Suddenly, our table is empty! (At least, according to the docs, right?). Well, to be fair, now ‘$end_of_table’ is actually a key in our table. So, what happens if we ask for the next key after it? Of course, we get the next one: a. And after a we get ‘$end_of_table’, since a is, in fact, the last key in that table.

Making Things Even More Complex

Then again, why is ‘$end_of_table’ the first key and not a? That is actually related to the internal order of the table (as stated in the docs). Let’s add a key that occupies a lower position than ‘$end_of_table’ to our table:

14> ets:insert(wat, {'_a', 0}).
true
15> ets:first(wat).
'_a'
16> ets:next(wat, '_a').
'$end_of_table'
17> ets:next(wat, '$end_of_table').
a
18> ets:next(wat, a).
'$end_of_table'
19>

As you can see, there is no way to be sure that ‘$end_of_table’ will be the first or the last element in your table, which might have been convenient (e.g. if you want to create a ring).

The Impact

Well, now that we’ve played for a bit, let’s see what can actually happen if we add an element to an ets table with ‘$end_of_table’ as the key…

There might be more, but I would like to show you 2 ways in which you can use ets:first/1 and ets:next/2 to traverse an ets table. Check them out in this handy module:

For matching/1,2 we let the iterator end when it finds the value ‘$end_of_table’ instead of a key.

For catching/1,2 we don’t rely on that value, we determine we reached the end of the table when there is no next key (that is, when ets:next/2 fails with a badarg error).

In the simplest scenario, both work as expected:

1> spawn(fun() -> ets:new(wat, [named_table, public]), receive _ -> ok end end).
<0.129.0>
2> iterator:catching(wat).
done
3> iterator:matching(wat).
done
4> ets:insert(wat, [{'_a', 0}, {b, 1}, {c, 2}]).
true
5> iterator:matching(wat).
'_a': [{'_a',0}]
b: [{b,1}]
c: [{c,2}]
done
6> iterator:catching(wat).
'_a': [{'_a',0}]
b: [{b,1}]
c: [{c,2}]
done
7>

Once we add ‘$end_of_table’, none of them work anymore:

7> ets:insert(wat, {'$end_of_table', 3}).
true
8> iterator:matching(wat).
'_a': [{'_a',0}]
done
9> iterator:catching(wat).
'_a': [{'_a',0}]
'$end_of_table': [{'$end_of_table',3}]
b: [{b,1}]
c: [{c,2}]
'$end_of_table': [{'$end_of_table',3}]
b: [{b,1}]
c: [{c,2}]
'$end_of_table': [{'$end_of_table',3}]
b: [{b,1}]

So, if you want to break a system and somehow you find out its turning user input into atoms, using those atoms as keys on a ets table and later traversing it using ets:first/1 and ets:next/2, now you know how to break it.

What’s going on here?

I’m sure this time the underlying situation is pretty clear, but I would like to remark a couple of things that are happening, just in case:

First of all, ets:next/1 is not an iterator. It doesn’t work like ets:match/1 or ets:select/1 where there is a Continuation object that moves through the results of a match or select. ets:next/1 knows nothing about the actual data on the table or the purpose you have in mind when using it. This function just returns the key that goes after the one you passed in as a parameter in the table’s index. ets:next/1 does not check if the Key you provided is actually ‘$end_of_table’ or not, and that’s correct: it should not do that, since that’s not its responsibility.

The main problem here comes from a valid result for both ets:first/1 and ets:next/2 that has two perfectly valid meanings: ‘$end_of_table’ can both mean “the key ‘$end_of_table’” and “The key you provided is the last one on the table”. It’s a semantic overload.

How to Solve the Problem?

Semantic overload is a problem that affects many other functions, not just the ones in the ets module. And there are a couple of ways to deal with it.

Tagging Results

The simplest one is to tag your results. ets:first/1 and ets:next/2 instead of replying with just the keys, could use tuples to differentiate keys from semantically charged atoms. In other words instead of their current specs…

-spec first(tab()) -> term() | '$end_of_table'.
-spec next(tab(), term()) -> term() | '$end_of_table'.

…they could be…

-spec first(tab()) -> {ok, term()} | '$end_of_table'.
-spec next(tab(), term()) -> {ok, term()} | '$end_of_table'.

This way, there is no way you can confuse ‘$end_of_table’ for an actual key. This is how functions like lists:keyfind/3 and its friends work. The downside of this method is that functions like the ones in our iterator module would become more verbose since they would have to discard the tags.

Using Exceptions

I know people like Robert, Juan, Federico (if I recall correctly) and others will never recommend this, but there is another way (my favorite one, I must admit): Instead of returning something when the caller reaches the end of the table, our functions may instead throw an exception. In a sense, this is semantically correct: you are asking for the first key in an empty table, there is no first key, the function can’t give you a result, therefore it throws an exception. This is how erlang:hd/1 works.

But this is costly: throwing exceptions or raising errors generates an overhead that may very well impact the performance of systems using ets tables heavily. And there are lots of systems relying heavily on ets tables.

Filtering the Input

Of course, you can prevent this whole thing from happening in the first place by simply not allowing ‘$end_of_table’ to be a key in any ets table. This solution is what erlang:register/2, erlang:whereis/1 and ! use.

In our case, it would be responsibility of ets:insert, ets:insert_new, etc. to prevent ‘$end_of_table’ from being used as a key. Since the functions in the ets module are the only ones allowed to insert things in ets tables, by using them to prevent callers from inserting invalid keys in the table we would be defusing the semantic overload altogether. There will be no doubt about the meaning of ‘$end_of_table’ as a result since it could never be a key.

Final Words

The example above (about register, whereis and !) is tricky but documented: you basically can’t register a process as undefined and therefore undefined ! Something fails instead of trying to find a process called undefined to send the message to. It will be the subject of an upcoming blog post, but it also came to my attention when the students at Cali asked “Why should ! throw an error if we use an atom on the left side when it doesn’t do that if we use a Pid there?”.

So, if you’re going to take something away from this article, besides the ideas on how to deal with semantic overload written above, let it be this:

When learning something, be inquisitive. Asking even the weirdest questions can lead you (and the ones with you) to an amazing (or at least funny) journey of discovery!

--

--