Advanced List Incomprehensions

Published in

Erlang Battleground

7 min readOct 31, 2017

Do you remember List Incomprehensions? Well… there is more! And it will take us all the way down to the core of Erlang! Fasten your seatbelt, this will be a hell of a ride!

What we know so far

If you don’t remember the previous article, in a nutshell, this is what we’ve found so far:

2> [1 || 1].
[]
3> [something || nothing].
[]
4> [2 || []].
[]
5> [x || 1, 2, 3].
[]
6> [1 || true].
[1]
7> [wat || 1 == 1].
[wat]
8> [2 || true, true, true].
[2]
9> [3 || true, false, true].
[]
10> [throw(x) || false, throw(y)].
[]
11> [throw(x) || true, throw(y)].
** exception throw: y

In other words: You can totally write list comprehensions without generators. You can use one or more filters and, if they’re all true the result will be a list with just one element: the evaluation of the expression. If one of the filters is not true (remember: it doesn’t need to be false), the other filters will not be evaluated and the result will be an empty list.

But that is not the whole story…

Let’s go further!

Let’s define a very simple function that I know most of you will be familiar with already:

12> Id = fun(X) -> X end.
#Fun<erl_eval.6.99386804>
13> Id(true).
true
14> Id(false).
false
15> Id(nothing).
nothing

Nothing new so far, let’s try building some LCs with it…

16> [x || Id(true)].
[x]
17> [x || Id(false)].
[]

…wait for it…

18> [x || Id(nothing)].
** exception error: bad filter nothing

And that’s not all…

Let’s try to do the same in a module…

As you can see, y/0 uses a named function id/1 while x/0 uses an anonymous one, like we did on the shell. That seems to make no difference, but…

19> c(x).
{ok, x}
20> x:x().
** exception error: no case clause matching nothing
     in function  x:x/0 (x.erl, line 9)
21> x:y().
** exception error: no case clause matching nothing
     in function  x:y/0 (x.erl, line 5)

It seems like when the code is compiled, instead of bad_filter we get a case_clause error. The plot thickens…

What’s going on here?

As usual: First try to find what’s happening on your own. I’ll wait…
OK, let me show you what I have found so far.

The first thing to notice is this rather obscure part of the Erlang documentation:

A list comprehension:

[Expr(E) || E <- List]

is basically translated to a local function:

'lc^0'([E|Tail], Expr) ->
    [Expr(E)|'lc^0'(Tail, Expr)];
'lc^0'([], _Expr) -> [].

Although it’s not exactly the root cause of the behavior seen above, it looks like a promising starting point. My hypothesis is that the code that translates LCs into functions has some particularities on how it treats function applications in generators or filters. Let’s see if we can prove that…

Compiling a File

As we saw above, different things happen when the code is on a file or evaluated on a shell. Let’s first check how list comprehensions are translated when a module is compiled from a file.

In this case, x:c(). fails with case_clause, whilex:d(). and x:e(nothing) just return an empty list. Adding to the weirdness, Erlang even produces warnings for d/0 but not for c/0(see below).

The trick to understand what’s going on is to check how code looks like in core bsm, to do that we compile the module like this:

$ erlc +dcbsm x.erl
x.erl:6: Warning: no clause will ever match
x.erl:6: Warning: the guard for this clause evaluates to 'false'

Despite the warnings, that command generated the following file.

If we translate the important functions to something more Erlang-y, we get:

d() -> [].

In this case, the compiler is smart enough to know that the LC will always result in an empty list (hence the warnings above) and so the compiled code is just that: a function that returns [].

e(X) ->
  case '<>' of
    '<>' when X =:= true -> ['x'];
    '<>' -> []
  end.

…or…

e(X) ->
  if X =:= true -> ['x']
   ; true -> []
  end.

In other words, Erlang is transforming the LC in a simple case/if so that if X is true it returns a list with one element, otherwise it returns an empty list. More or less what we expected considering what we already knew so far.

c() ->
  case id(nothing) of
    true -> ['x'];
    false -> []
    % Generated code has another clause but that clause is the one
    % added to case statements without catch-all clauses.
  end.

In this case Erlang compiler uses pattern-matching directly on the result of the function application instead of using guards. And it explicitly matches on boolean values. Why? I don’t know. Second clause could’ve been a catch-all clause, to be consistent with the guard behavior.

Evaluating Expressions in the Shell

As we’ve seen a while back, the shell doesn’t compile expressions like the compiler. It evaluates them through erl_eval:expr(…).

With the same procedure we used previously, we’ll try to see how the expressions to be evaluated look like in our case. To achieve that, we put the expression in an anonymous function and use erlang:fun_info/1 on it.

1> F = fun() -> [x || nothing], [x || x:id(nothing)] end.
#Fun<erl_eval.20.99386804>
2> F().
** exception error: bad filter nothing
3> rp(erlang:fun_info(F)).
[{pid,<0.166.0>},
 {module,erl_eval},
 {new_index,20},
 {new_uniq,<<189,144,182,154,187,236,207,96,89,18,34,161,
             52,152,16,216>>},
 {index,20},
 {uniq,99386804},
 {name,'-expr/5-fun-3-'},
 {arity,0},
 {env,[{[],
        {eval,#Fun<shell.21.98304428>},
        {value,#Fun<shell.5.98304428>},
        [{clause,1,[],[],
                 [{lc,1,{atom,1,x},[{atom,1,nothing}]},
                  {lc,1,
                      {atom,1,x},
                      [{call,1,
                             {remote,1,{atom,1,x},{atom,1,id}},
                             [{atom,1,nothing}]}]}]}]}]},
 {type,local}]
ok

I highlighted the important pieces, the ones that represent the two LCs. As you can see here LCs are not translated into cases or ifs or anything else, they have their own tuple representation.

Checking the code on erl_eval, we can see that lc tagged tuples are evaluated with eval_lc/6. In particular, what matters to us is this clause ofeval_lc1:

eval_lc1(E, [F|Qs], Bs0, Lf, Ef, Acc) ->
    CompFun = fun(Bs) -> eval_lc1(E, Qs, Bs, Lf, Ef, Acc) end,
    eval_filter(F, Bs0, Lf, Ef, CompFun, Acc).

It will be evaluated once for each lc tuple because they both have just one F in their list of Qs (declarative variable naming was not a priority in erl_eval 🙄). Let’s see what eval_filter/6 looks like…

eval_filter(F, Bs0, Lf, Ef, CompFun, Acc) ->
  case erl_lint:is_guard_test(F) of
    true ->
      case guard_test(F, Bs0, Lf, Ef) of
        {value,true,Bs1} -> CompFun(Bs1);
        {value,false,_} -> Acc
      end;
    false ->
      case expr(F, Bs0, Lf, Ef, none) of
        {value,true,Bs1} -> CompFun(Bs1);
        {value,false,_} -> Acc;
        {value,V,_} ->
          erlang:raise(error, {bad_filter,V}, stacktrace())
      end
  end.

The tricky part is erl_lint:is_guard_test/2: nothing is a guard, but x:id(nothing) is not. guard_test/4 evaluates guards within a try...catch statement and it always result in either true or false. But expressions that are not guards are evaluated recursively using expr/5 and therefore they may result in something that’s not a boolean (they might even raise an error). If expr/5 returns something that’s not a boolean, erl_eval raises a bad_filter error.

The question is once more: why this distinction? Why is eval_filter not implemented as follows?

eval_filter(F, Bs0, Lf, Ef, CompFun, Acc) ->
  case erl_lint:is_guard_test(F) of
…
    false ->
      case expr(F, Bs0, Lf, Ef, none) of
        {value,true,Bs1} -> CompFun(Bs1);
        {value,_,_} -> Acc
      end
  end.

And… should that case be a try to precisely match the behavior of guard_test/4? should eval_filter/6 not distinguish between guards and regular expressions?

If you find out, let me know in the comments, please.

Final Words

In conclusion, while LCs are still one of my favorite features of Erlang, there are a couple of things to remember while working with them:

The contents of this article apply to all LCs, with or without generators.
LCs are syntactic sugar, and as such, they require an extra translation phase, from LCs to functions.
Don’t trust the shell. You may get different results if you evaluate the same LC in the shell and in a compiled module. Same goes for escripts.
Mind the guards. When you’re writing LC filters, you have to remember that they will behave differently (particularly in case of errors or non boolean results) if they are guards or just regular expressions.

OffTopic Shameless Plug

Two amazing happening in the near future:

SpawnFest

We’re resurrecting SpawnFest! Marcos Almonacid and I are organizing the 2017 edition of this amazing contest. We both participated on the 2011 and 2012 editions organized by Yurii Rashkovskii and they were a blast!

SpawnFest is a 48 hour free online development competition for Beamers around the world. Your team (1–4 members) will get exactly one weekend (December 9th 00:00 UTC to December 10th 23:59 UTC) to create the best applications you can. You will be judged by a hand-chosen group of elite judges (including Fred Hebert, Christopher Meiklejohn and Joe Armstrong) and you can win great prizes provided by our sponsors.

Register your team for FREE!

Erlang Camp @ OpenCamps NY

But before SpawnFest (November 17–20) I’ll be in New York City for the first time of my life. I’ll be joining an impressively large group of speakers at OpenCamps, talking about Opaque Data Structures like I did at the EFLBA this year (but in English, of course).