Elixir: Basics of Metaprogramming

Published in

Elemental Elixir

Jan 13, 2024

Metaprogramming is a programming technique in which you write code that generates or transforms other code. It lets you treat code as data that can be inspected, modified and used to generate new code, either at compile time or at runtime. In Elixir, metaprogramming is done through macros, which manipulate, generate and transform source code at compile time. Their main applications in Elixir are removing boilerplate, adding extensibility through new features and syntax, and building domain-specific languages (DSLs). This article discusses the basics of metaprogramming in Elixir: its syntax, usage and applications.

Abstract Syntax Tree

Abstract Syntax Trees (ASTs) are tree-based hierarchical data structures that represent the structure of source code in a programming language. An AST serves as an intermediate representation of source code that compilers, interpreters and other language-processing tools create and use during compilation or interpretation. In Elixir, the compiler tokenises the source code and parses it into ASTs before eventually converting it into bytecode that runs on the BEAM virtual machine. Access to this lower-level representation gives you much of the control over your code that the compiler itself has. Metaprogramming in Elixir involves accessing and manipulating ASTs to inject and expand code during compilation.

Even though ASTs have a general tree structure, their concrete representation varies between languages. As mentioned above, metaprogramming treats code as data. Here, code means ASTs, and to treat ASTs as data they must be represented with data that the language can readily manipulate. In Elixir, AST nodes are mostly represented as three-element tuples containing other Elixir terms such as atoms, integers and lists. Because ASTs are built from these simple terms, you can manipulate code by directly manipulating the tuples, lists, atoms, integers and other terms that represent it.
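To make the "code as data" idea concrete, here is a minimal sketch (not from the original article) that obtains the AST of 1 + 2 in two ways: by parsing a source string with Code.string_to_quoted!/1, and by writing the three-element tuple by hand. It then evaluates both with Code.eval_quoted, which is covered later in the article.

```elixir
# Parse source text into an AST, as the compiler does before compiling it.
parsed = Code.string_to_quoted!("1 + 2")
IO.inspect(parsed)
# {:+, [line: 1], [1, 2]}

# The same code written by hand as a three-element tuple:
# {function name, metadata keyword list, argument list}
hand_built = {:+, [], [1, 2]}

# Both trees describe the same code, so both evaluate to 3
# (the empty list is the variable binding returned alongside the result).
{3, []} = Code.eval_quoted(parsed)
{3, []} = Code.eval_quoted(hand_built)
```

Only the metadata differs: the parser records the source line, while the hand-built tuple leaves the metadata empty.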

quote

In Elixir, the AST of any expression can be obtained with the quote macro, and the resulting representations are called quoted expressions. The syntax is quote followed by a do/end block containing the expressions to convert. As with other constructs in Elixir, a single-line expression can use the one-line do: syntax without the end keyword. Simple Elixir terms called quote literals, which comprise non-negative numbers, atoms and strings, are kept as they are when quoted. Lists and two-element tuples keep the same structure; only the elements inside them are converted into their respective quoted forms. Hence, if a list or two-element tuple contains only the simple terms mentioned above, its quoted form is identical to the original.

quote do: 1
1

quote do: 1.5
1.5

quote do: "hello"
"hello"

quote do: :ok
:ok

quote do: [1, 2]
[1, 2]

quote do: {1, 2}
{1, 2}

Structure of quoted expressions

Apart from the quote literals mentioned above, all other terms and expressions, such as negative numbers, tuples with more than two elements, maps, bitstring syntax (<<>>), variables, function calls and definitions, and module names, are converted into three-element tuples with the following general structure:

{atom | tuple, [metadata1: val,..], [arg1, arg2,..] | atom}

The first element of the tuple is usually an atom naming either the function being called or the variable being referenced. Most operators, such as +, -, / and *, have equivalent built-in functions in the Kernel module, such as Kernel.+/2 and Kernel.-/2, so an operator expression is represented like any other function call. Other constructs, such as tuple, map and bitstring syntax, are represented as calls to special forms such as {}, %{} and <<>>. Special forms are internally macros, and they are documented in the Kernel.SpecialForms module. Instead of an atom, the first element can also be another three-element tuple, as in a remote call that uses the dot operator, e.g. Module.function(arg).

The second element is a keyword list of metadata about the call or special form. Depending on the kind of expression being quoted, it may contain keys such as context, which holds the context (module) in which the expression was quoted, import, which names the module a function or macro was imported from, and alias, which carries alias information for module names.

The third element is the list of arguments passed to the function or special form. For a variable, the third element is an atom instead of a list, denoting the variable's context (scope).

quote do: -1
{:-, [context: Elixir, import: Kernel], [1]}

quote do: 1 + 2
{:+, [context: Elixir, import: Kernel], [1, 2]}

quote do: 3 * 4
{:*, [context: Elixir, import: Kernel], [3, 4]}

quote do: {1, 2, 3}
{:{}, [], [1, 2, 3]}

quote do: %{1 => 2}
{:%{}, [], [{1, 2}]}

quote do: <<1, 2>>
{:<<>>, [], [1, 2]}

quote do: Enum                         # module 
{:__aliases__, [alias: false], [:Enum]}

quote do: x                            # variable
{:x, [], Elixir}
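The three roles described above can be checked by pattern matching on quoted expressions. A small sketch, assuming only what the text states; the exact metadata differs between Elixir versions, so it is matched with _ or only checked to be a keyword list.

```elixir
# Local call: the first element is an atom naming the function or operator.
{:+, metadata, [1, 2]} = quote(do: 1 + 2)
true = Keyword.keyword?(metadata)

# Remote call: the first element is itself an AST node,
# a call to the `.` special form with the module alias and function name.
{{:., _, [{:__aliases__, _, [:String]}, :upcase]}, _, ["hi"]} =
  quote(do: String.upcase("hi"))

# Variable: the third element is an atom (the variable's context), not a list.
{:x, _, context} = quote(do: x)
true = is_atom(context)
```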

As the examples above show, simple expressions produce a single three-element tuple. For complex expressions with multiple operators, function calls and definitions, the tree structure becomes visible as three-element tuples nested inside one another.

quote do: (1 + 2) * 4
{:*, [context: Elixir, import: Kernel],
 [{:+, [context: Elixir, import: Kernel], [1, 2]}, 4]}

Let us now remove the metadata to simplify the visualization of the AST’s tree structure.

{:*, _, [{:+, _, [1, 2]}, 4]}

The addition is itself a quoted expression, and it appears as the first argument of the multiplication. Nesting tuples inside argument lists in this way produces the tree structure shown below.

   :*
   / \
 :+   4
 / \
1   2
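Because the tree is made of ordinary nested tuples, it can be traversed and rewritten like any other data. The sketch below (an illustration, not part of the original text) uses Macro.prewalk/2, which visits every node of a quoted expression and lets a function replace it, to turn each addition into a subtraction; Macro.to_string/1 then prints the result as source code.

```elixir
ast = quote do: (1 + 2) * 4

# Visit every node (parents before children) and replace
# each addition node with a subtraction node, keeping its metadata.
rewritten =
  Macro.prewalk(ast, fn
    {:+, meta, args} -> {:-, meta, args}
    other -> other
  end)

IO.puts(Macro.to_string(rewritten))
# (1 - 2) * 4
```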

quote do: <<1, 2::4>>
{:<<>>, [], [1, {:"::", [], [2, 4]}]}

   :<<>>
    / \
   1  :"::"
       / \
      2   4
--------------------------------------------------------------------------

quote do: x = 1
{:=, [], [{:x, [], Elixir}, 1]}

    :=
    / \
  :x   1
--------------------------------------------------------------------------       

quote do: String.length("hello")
{{:., [], [{:__aliases__, [alias: false], [:String]}, :length]}, [], ["hello"]}
   ___________________
   |        :.       |
   |       / \       |
   | :String :length |
   -------------------
            |
         "hello"
--------------------------------------------------------------------------

quote do
  defmodule Test do
    def double(x), do: 2 * x
  end
end
{:defmodule, [context: Elixir, import: Kernel],
 [
   {:__aliases__, [alias: false], [:Test]},
   [
     do: {:def, [context: Elixir, import: Kernel],
      [
        {:double, [context: Elixir], [{:x, [], Elixir}]},
        [
          do: {:*, [context: Elixir, import: Kernel],
           [2, {:x, [], Elixir}]}
        ]
      ]}
   ]
 ]}

    :defmodule
   /      \
:Test   [do: :def]
             \
             :double
             /   \
           :x  [do: :*]
                 /  \
                2   :x

As the code above shows, quoted expressions for complex code quickly become deeply nested. Elixir's Macro module provides many functions that operate on quoted expressions. One of them, Macro.to_string/1, converts a quoted expression back into a string of equivalent high-level source code.

{{:., [], [{:__aliases__, [alias: false], [:String]}, :length]}, [], ["hello"]} 
|> Macro.to_string()

"String.length(\"hello\")"
-------------------------------------------------------------------------------

{:defmodule, [context: Elixir, import: Kernel],
 [
   {:__aliases__, [alias: false], [:Test]},
   [
     do: {:def, [context: Elixir, import: Kernel],
      [
        {:double, [context: Elixir], [{:x, [], Elixir}]},
        [
          do: {:*, [context: Elixir, import: Kernel],
           [2, {:x, [], Elixir}]}
        ]
      ]}
   ]
 ]} |> Macro.to_string()

"defmodule Test do\n  def double(x) do\n    2 * x\n  end\nend"

Similar to the Macro module, Elixir offers the Code module, which provides functions for compiling and evaluating code. The Code.eval_quoted/3 function evaluates a quoted expression directly and returns a two-element tuple: the result of the evaluation and the list of variable bindings (empty in the examples here, since no bindings are passed in).

ast = quote do: elem({1, 2, 3}, 0)
{:elem, [context: Elixir, import: Kernel], [{:{}, [], [1, 2, 3]}, 0]}

Code.eval_quoted(ast)
{1, []}
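The optional second argument of Code.eval_quoted/3 supplies variable bindings. A minimal sketch, assuming a hand-built AST: the variable node is created explicitly with Macro.var(:a, nil), which gives it a nil context, the same context as a variable written directly in source code, so it picks up the value bound to the key :a.

```elixir
# Build the AST for `a + 1` by hand, with `a` as a nil-context variable.
ast = {:+, [], [Macro.var(:a, nil), 1]}
IO.puts(Macro.to_string(ast))
# a + 1

# Evaluate it with an explicit binding for `a`; the bindings come back
# as the second element of the result tuple.
{result, binding} = Code.eval_quoted(ast, a: 41)
IO.inspect(result)
# 42
IO.inspect(binding)
```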

unquote

When you use a variable from the outer scope directly inside a quote block, quote does not use the value bound to it; it treats the name as a reference to a variable in the quoted code. The unquote macro solves this: it evaluates the expression passed to it and injects the result into the quoted expression in its place. This works much like string interpolation, where an expression is evaluated and its value is injected into the string. And just as interpolation can only be used inside a string literal, unquote can only be used inside a quote block.

x = 1

"x" # a plain string containing the letter x; the value of x is not used
"x"

"#{x}" # string interpolation that injects the value of x in the string.
"1"

x = 1

ast = quote do: x * 2 # variable x used directly inside the quote macro
{:*, [context: Elixir, import: Kernel], [{:x, [], Elixir}, 2]}

Macro.to_string(ast)
"x * 2" # code containing x instead of its value 1

ast = quote do: unquote(x) * 2 # uses unquote to inject the value of x
{:*, [context: Elixir, import: Kernel], [1, 2]}

Macro.to_string(ast)
"1 * 2" # code containing the value 1 in place of the variable x

The unquote macro is widely used to build quoted expressions out of smaller quoted expressions and to inject quote literals bound to variables. The syntax is unquote(expression), placed inside the quote block wherever the code should be injected. For quote literals such as non-negative numbers, atoms, strings, and lists and two-element tuples whose elements are themselves literals, the quoted form is the same as the normal form, so the bound variable can be passed to unquote directly.

x = 1
ast = quote do: unquote(x) * 2
{:*, [context: Elixir, import: Kernel], [1, 2]}

Macro.to_string(ast)
"1 * 2"

Code.eval_quoted(ast)
{2, []}
---------------------------------------------------------------------------

a = [1, 2]
ast = quote do: Enum.sum(unquote(a))
{{:., [], [{:__aliases__, [alias: false], [:Enum]}, :sum]}, [], [[1, 2]]}

Code.eval_quoted(ast)
{3, []}
---------------------------------------------------------------------------

a = {1, 2}
ast = quote do: elem(unquote(a), 0) + elem(unquote(a), 1)
{:+, [context: Elixir, import: Kernel],
 [
   {:elem, [context: Elixir, import: Kernel], [{1, 2}, 0]},
   {:elem, [context: Elixir, import: Kernel], [{1, 2}, 1]}
 ]}

Code.eval_quoted(ast)
{3, []}
---------------------------------------------------------------------------

module = Enum
ast = quote do: unquote(module).sum([1, 2])
{{:., [], [Enum, :sum]}, [], [[1, 2]]}

Code.eval_quoted(ast)
{3, []}
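Building a larger expression from smaller quoted fragments works the same way. In this sketch (an illustration, not from the original), unquote inserts the AST of sum as a whole subtree, so the grouping of 1 + 2 is kept even though no parentheses appear in the quote; unlike string interpolation, the injection happens at the tree level, not the text level.

```elixir
# An independently built fragment and a literal value...
sum = quote do: 1 + 2
factor = 3

# ...combined into a larger expression. The fragment becomes the
# first argument of `*`, so it is evaluated as one unit.
product = quote do: unquote(sum) * unquote(factor)

IO.puts(Macro.to_string(product))
# (1 + 2) * 3
```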

However, passing a variable bound to a complex term directly to unquote causes an error when the resulting quoted expression is evaluated. When a term is written directly inside a quote block, it is converted into its quoted form as part of the overall quoted expression. unquote, by contrast, simply substitutes the raw value of the variable into the resulting quoted expression, much like a plain String.replace/3 call. In other words, the AST node that referred to the variable is replaced by the variable's raw value.

ast_1 = quote do: x * 2
{:*, [context: Elixir, import: Kernel], [{:x, [], Elixir}, 2]}

ast_2 = quote do: unquote(x) * 2
{:*, [context: Elixir, import: Kernel], [1, 2]}

If you look closely at the two quoted expressions above, the node that references the variable x, {:x, [], Elixir}, is replaced directly by the value of x, 1, when unquote is used.

For a quoted expression to be valid, all of its nodes must be valid AST nodes. Quote literals cause no problems: their raw form and quoted form are the same, so substituting the value in place of the variable's AST still leaves a valid AST node. For complex terms, however, directly injecting the raw form does not work, because the raw form is not a valid AST node. The value must first be converted into an AST node before it is injected into the resulting quoted expression.

ast = quote do: %{1 => 1}
{:%{}, [], [{1, 1}]} # valid AST node that can be executed without issues

Code.eval_quoted(ast)
{%{1 => 1}, []}
---------------------------------------------------------------------------
x = %{1 => 1}

ast = quote do: unquote(x)
%{1 => 1} # not a valid AST node since it is a raw form of map

Code.eval_quoted(ast)
error: invalid quoted expression: %{1 => 1}

Please make sure your quoted expressions are made of valid AST nodes. 
If you would like to introduce a value into the AST, such as a 
four-element tuple or a map, make sure to call Macro.escape/1 before
└─ nofile

** (CompileError) cannot compile code (errors have been logged)

This situation can be handled with Macro.escape/2, which converts (escapes) a value into its quoted expression so that it can then be injected with unquote.

x = %{1 => 1}

ast = Macro.escape(x)
{:%{}, [], [{1, 1}]}

Code.eval_quoted(ast)
{%{1 => 1}, []}

---------------------------------------------------------------------------
x = {1, 2, 3}

x_ast = Macro.escape(x)
{:{}, [], [1, 2, 3]}

ast = quote do: elem(unquote(x_ast), 0)
{:elem, [context: Elixir, import: Kernel], [{:{}, [], [1, 2, 3]}, 0]}

Code.eval_quoted(ast)
{1, []}

---------------------------------------------------------------------------
x = {1, 2, 3}

ast = quote do: elem(unquote(Macro.escape(x)), 0)
{:elem, [context: Elixir, import: Kernel], [{:{}, [], [1, 2, 3]}, 0]}

Code.eval_quoted(ast)
{1, []}

However, Macro.escape/2 only accepts values of certain data types: lists, tuples, maps, atoms, numbers, bitstrings, PIDs and remote functions in the capture format &Module.function/arity. A variable bound to a term of any other type, such as an anonymous function, therefore cannot be escaped and injected with unquote. If you already have the quoted expression of such a term rather than its raw value, you can still inject it with unquote, since it is already a valid AST node. The restriction applies only to raw values bound to a variable.

func = fn x -> x * x end

Macro.escape(func)
** (ArgumentError) cannot escape #Function<42.125776118/1 in :erl_eval.expr/6>.
The supported values are: lists, tuples, maps, atoms, numbers, bitstrings, 
PIDs and remote functions in the format &Mod.fun/arity
    (elixir 1.16.0) src/elixir_quote.erl:523: :elixir_quote.argument_error/1
    iex:95: (file)
------------------------------------------------------------------------------

func_ast = {:fn, [],
 [
   {:->, [],
    [
      [{:x, [], Elixir}],
      {:*, [context: Elixir, import: Kernel],
       [{:x, [], Elixir}, {:x, [], Elixir}]}
    ]}
 ]}

ast = quote do: unquote(func_ast).(2)
{{:., [],
  [
    {:fn, [],
     [
       {:->, [],
        [
          [{:x, [], Elixir}],
          {:*, [context: Elixir, import: Kernel],
           [{:x, [], Elixir}, {:x, [], Elixir}]}
        ]}
     ]}
  ]}, [], [2]}

Code.eval_quoted(ast)
{4, []}
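As a further sketch of the supported types (an illustration, not from the original), escaping also works for nested structures, such as a map holding a three-element tuple, and for a remote capture like &String.upcase/1, which is one of the forms listed in the error message above.

```elixir
# A map holding a 3-tuple: neither is a quote literal, so escape it first.
config = %{point: {1, 2, 3}}
ast = quote do: Map.get(unquote(Macro.escape(config)), :point)
{{1, 2, 3}, []} = Code.eval_quoted(ast)

# A remote capture can be escaped and evaluated back into a function.
upcase_ast = Macro.escape(&String.upcase/1)
{upcase, []} = Code.eval_quoted(upcase_ast)
"HI" = upcase.("hi")
```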

unquote has a variant, unquote_splicing/1, which takes a list and injects its elements individually into the enclosing list, instead of injecting the list as a single nested element.

a = [1, 2]

quote do: [0, unquote(a), 0]
[0, [1, 2], 0]

quote do: [0, unquote_splicing(a), 0]
[0, 1, 2, 0]
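The same splicing applies to a call's argument list, which is also a list in the AST. A short sketch using Kernel.max/2: unquote passes the whole list as one argument, while unquote_splicing spreads it into two arguments.

```elixir
args = [3, 7]

# unquote: the list becomes a single argument, i.e. max([3, 7]).
single = quote do: max(unquote(args))

# unquote_splicing: the elements become separate arguments, i.e. max(3, 7).
spliced = quote do: max(unquote_splicing(args))

IO.puts(Macro.to_string(spliced))
# max(3, 7)
```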

Macros

Now that we have seen how to construct and manipulate quoted expressions with quote and unquote, let's look at how macros inject these quoted expressions into source code. Macros are the building blocks of metaprogramming in Elixir. They resemble functions in many ways: they can only be defined inside a module, they can be public or private, and they can have multiple clauses with pattern-matched arguments and guards. The main differences are that every argument passed to a macro is automatically converted into a quoted expression before being bound to the parameter, that macros run at compile time whereas functions run at runtime, and that a macro must return a valid quoted expression; returning anything else leads to a compile-time error.

A macro takes quoted expressions as arguments and uses them to build another quoted expression, which is then injected into the source code during compilation. Many of Elixir's core constructs, such as defmodule, def and if, are themselves implemented as macros, and the same mechanism is available to anyone who wants to build on top of it.

Macros are defined with defmacro and defmacrop, which are themselves macros, for public and private macros respectively. The syntax is defmacro or defmacrop followed by the macro's name, its parameters and a do block. Any term or expression passed to a macro is automatically converted into a quoted expression and is available inside the macro's do block. A typical macro body uses quote to build a new quoted expression, injects the arguments into it with unquote, and returns it. Let us now create an if construct using macros and use it in code.

defmodule MacroTest do
  defmacro if(condition, do: do_block, else: else_block) do
    quote do
      case unquote(condition) do
        result when result != false and result != nil -> unquote(do_block)
        _ -> unquote(else_block)
      end
    end
  end
end

We have defined a macro called if/2 that takes a conditional expression as its first argument and a keyword list as its second. Since the keyword list is the last argument, the square brackets around its key-value pairs are optional. Because all arguments passed to a macro are converted into quoted expressions, the parameters condition, do_block and else_block are bound to quoted expressions rather than raw terms, so they can be passed directly to unquote. A macro must also return code in the form of a quoted expression, which will be expanded and injected into the caller during compilation. We could build the three-element tuples by hand, but as we saw, quoted expressions get complicated even for simple code. The quote macro lets us write the code in ordinary high-level Elixir syntax and converts it into the quoted form for us. Here, the generated code uses the existing case construct to run do_block if the condition evaluates to a truthy value and else_block otherwise. Note that unquote is used to inject all three arguments, condition, do_block and else_block, into the quoted expression.

Let us now use the above defined if macro in code.

defmodule Test do
  require MacroTest
    
  def even?(num) do
    MacroTest.if(rem(num, 2) == 0, do: true, else: false)
  end
end
---------------------------------------------------------------------------
Test.even?(5)
false

Test.even?(4)
true

In the code above, we call require/2 before using the if macro. Calling a function does not need this, because functions run at runtime, when the modules they live in have already been compiled and loaded. Macros, however, run at compile time, so the module that defines them must be compiled and available before the calling module is compiled. require ensures exactly this and makes the module's macros available during compilation. Alternatively, import/2 lets you call a module's functions and macros by their bare names, without the fully qualified module prefix.
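Both options can be sketched side by side. The module names Logic, UsesRequire and UsesImport and the macro my_unless are hypothetical, chosen only for this illustration; the macro reuses Kernel's if in its generated code.

```elixir
defmodule Logic do
  # Hypothetical macro: runs `block` only when `condition` is falsy.
  defmacro my_unless(condition, do: block) do
    quote do
      if unquote(condition), do: nil, else: unquote(block)
    end
  end
end

defmodule UsesRequire do
  # require: the macro must be called with its module prefix.
  require Logic
  def check(x), do: Logic.my_unless(x > 0, do: :non_positive)
end

defmodule UsesImport do
  # import: the macro can be called by its bare name.
  import Logic, only: [my_unless: 2]
  def check(x), do: my_unless(x > 0, do: :non_positive)
end

IO.inspect(UsesRequire.check(-1))
# :non_positive
IO.inspect(UsesImport.check(1))
# nil
```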

To verify that we are indeed getting access to the passed in arguments as quoted expressions, let’s inspect them inside the macro definition.

defmodule MacroTest do
  defmacro if(condition, do: do_block, else: else_block) do
    IO.inspect(condition, label: "Condition:")
    IO.inspect(do_block, label: "Do_block:")
    IO.inspect(else_block, label: "Else_block:")
    quote do
      case unquote(condition) do
        result when result != false and result != nil -> unquote(do_block)
        _ -> unquote(else_block)
      end
    end
  end
end
---------------------------------------------------------------------------
require MacroTest

MacroTest.if(5 == 5.0, do: IO.puts("do block"), else: :else_block)
Condition:: {:==, [line: 56], [5, 5.0]}
Do_block:: {{:., [line: 56], [{:__aliases__, [line: 56], [:IO]}, :puts]}, [line: 56], ["do block"]}
Else_block:: :else_block
do block
:ok
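Because a macro receives the AST of its argument rather than its value, it can use the code itself, not just its result. A minimal sketch with a hypothetical Show.show/1 macro: Macro.to_string/1 runs inside the macro body at compile time, and the generated code returns the source text together with the evaluated value.

```elixir
defmodule Show do
  defmacro show(expr) do
    # Runs at compile time, on the quoted argument.
    source = Macro.to_string(expr)

    # The generated code evaluates `expr` at runtime and pairs it
    # with the source text captured above.
    quote do
      {unquote(source), unquote(expr)}
    end
  end
end

require Show
IO.inspect(Show.show(1 + 2))
# {"1 + 2", 3}
```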

You can also use the multi line syntax with a do/else/end block similar to the original if construct.

num = 5
MacroTest.if rem(num, 2) == 0 do
  :even
else
  :odd
end
:odd

This syntax works because Elixir translates do/end and do/else/end blocks, passed as the last argument of a call, into the keyword lists [do: do_block] and [do: do_block, else: else_block] respectively. The block syntax is only available for the reserved keywords do, else, catch, rescue and after, so it cannot be used if a macro expects other custom keys in its keyword list. Also note that the arguments are not enclosed in parentheses when the macro is called; macros are called with the same syntax as functions, where parentheses are optional.
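Quoting both spellings shows that they produce the same AST. In this sketch my_if is a hypothetical name; quote does not need it to be defined, since nothing is evaluated.

```elixir
block_form =
  quote do
    my_if true do
      :a
    else
      :b
    end
  end

keyword_form = quote do: my_if(true, do: :a, else: :b)

# Both are a call to my_if with `true` and the keyword list [do: :a, else: :b].
{:my_if, _, [true, [do: :a, else: :b]]} = block_form
{:my_if, _, [true, [do: :a, else: :b]]} = keyword_form
```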

Macros behave like functions with respect to pattern matching and multiple clauses. Again, their arguments are quoted expressions, not raw terms, and must be pattern matched accordingly. Our if macro matches both the :do and :else keys in the keyword list passed as the second argument, so if we pass only the do key, pattern matching fails and a FunctionClauseError is raised.

MacroTest.if true, do: "Do block"
** (FunctionClauseError) no function clause matching in MacroTest.if/2
    expanding macro: MacroTest.if/2
    iex:56: (file)

Let’s fix this by adding another clause and using nil as the default else block expression.

defmodule MacroTest do
  defmacro if(condition, do: do_block) do
    quote do: MacroTest.if(unquote(condition), do: unquote(do_block), else: nil)
  end
  defmacro if(condition, do: do_block, else: else_block) do
    quote do
      case unquote(condition) do
        result when result != false and result != nil -> unquote(do_block)
        _ -> unquote(else_block)
      end
    end
  end
end
---------------------------------------------------------------------------
require MacroTest
MacroTest.if true, do: "Do Block"
"Do Block"

MacroTest.if false, do: "Do Block"
nil

Unlike a function, which can simply call another of its clauses, a macro must return a quoted expression. So the first clause generates a quoted expression that calls MacroTest.if/2 again, with nil as the default else_block.

It is also possible to build the same if construct with a plain function.

defmodule FuncTest do
  def if(condition, do: do_block, else: else_block) do
    case condition do
      result when result != nil and result != false -> do_block
      _ -> else_block
    end
  end
end
---------------------------------------------------------------------------
FuncTest.if(true, do: "Do block", else: "Else block")
"Do block"

The main problem is that when an expression is passed as an argument to a function, it is evaluated before execution reaches the function body. As a result, every argument is evaluated regardless of what the function body does with it. The following code demonstrates this.

FuncTest.if(true, do: IO.puts("Do block"), else: IO.puts("Else block"))
Do block
Else block
:ok
-----------------------------------------------------------------------------
InvalidModule.if(true, do: IO.puts("Do block"), else: IO.puts("Else block"))
Do block
Else block
** (UndefinedFunctionError) function InvalidModule.if/2 is undefined 
   (module InvalidModule is not available)
    InvalidModule.if(true, [do: :ok, else: :ok])
    iex:72: (file)

As shown above, both IO.puts calls are evaluated and print their output, and their return value, :ok, is passed as both do_block and else_block, regardless of what the function body does. In the second example, the arguments are evaluated even though the function does not exist. Macros behave differently: their arguments are not evaluated but converted as a whole into quoted expressions before being passed to the macro. They are evaluated only where the generated code places them, which in our if macro means only in the branch of the case expression that actually runs.

MacroTest.if(true, do: IO.puts("Do block"), else: IO.puts("Else block"))
Do block
:ok
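The difference can be made observable without printing. In this sketch, the hypothetical Lazy module defines the same "and" logic twice, once as a macro and once as a function, and passes raise(...) as the second argument: the macro places it in a branch that never runs, while the function evaluates it before the body starts.

```elixir
defmodule Lazy do
  # Macro: `right` is injected into only one branch of the generated
  # case, so it runs only when `left` is truthy.
  defmacro macro_and(left, right) do
    quote do
      case unquote(left) do
        result when result in [false, nil] -> result
        _ -> unquote(right)
      end
    end
  end

  # Function: both arguments are evaluated before the body runs.
  def function_and(left, right) do
    if left, do: right, else: left
  end
end

require Lazy

# The raise is never reached.
false = Lazy.macro_and(false, raise("never evaluated"))

# The raise happens while evaluating the arguments, before the call.
:raised =
  try do
    Lazy.function_and(false, raise("evaluated eagerly"))
  rescue
    RuntimeError -> :raised
  end
```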

Macro expansion

Macro execution and code injection in Elixir happen during compilation to transform the source code. Once compilation starts, the Elixir source files are read, tokenised and parsed into the initial ASTs. The initial ASTs reflect the code exactly as written in the source file, and every macro call is represented using the general AST tuple form {macro to be called, metadata, arguments for the macro}.

defmodule MacroTest do
  defmacro macro_1 do
    quote do
      macro_var_1 = 1
      MacroTest.macro_2(macro_var_1)
    end
  end
  
  defmacro macro_2(n) do
    quote do
      macro_var_2 = unquote(n) + 1
      MacroTest.macro_3(macro_var_2)
    end
  end

  defmacro macro_3(n) do
    quote do: unquote(n) + 1
  end
end
---------------------------------------------------------------------------
require MacroTest

initial_ast = quote do: MacroTest.macro_1()
{{:., [], [{:__aliases__, [alias: false], [:MacroTest]}, :macro_1]}, [], []}

In the code above, we have defined some simple macros to explain the macro expansion process. In the initial AST generated for MacroTest.macro_1, the macro call is represented in the {macro to be called, metadata, arguments} form, reflecting what is present in the source code; this is how every macro call appears in the initial AST. Once the compiler has generated the initial AST, the macro expansion phase begins. The compiler identifies all macro calls in the initial AST and executes them, passing in their arguments as quoted expressions. Each macro returns a quoted expression, which replaces the {macro to be called, metadata, arguments} node of the corresponding macro call in the initial AST. The Macro module provides Macro.expand_once/2, which can be used to simulate one step of this expansion.

initial_ast = quote do: MacroTest.macro_1()
{{:., [], [{:__aliases__, [alias: false], [:MacroTest]}, :macro_1]}, [], []}

expanded_ast = Macro.expand_once(initial_ast, __ENV__)
{:__block__, [],
 [
   {:=, [], [{:macro_var_1, [counter: -576460752303423260], MacroTest}, 1]},
   {{:., [],
     [
       {:__aliases__, [counter: -576460752303423260, alias: false],
        [:MacroTest]},
       :macro_2
     ]}, [], [{:macro_var_1, [counter: -576460752303423260], MacroTest}]}
 ]}

Macro.to_string(expanded_ast) |> IO.puts()
macro_var_1 = 1
MacroTest.macro_2(macro_var_1)
:ok

After expansion, the initial AST representing the MacroTest.macro_1 call has been replaced by something else. During expansion, MacroTest.macro_1 is identified as a macro call and executed. It returns the quoted expression from its definition, which replaces the AST of the original MacroTest.macro_1 call. You can verify this in the expanded AST, which contains nodes referring to the :macro_var_1 variable and to the MacroTest.macro_2 call, exactly as in the quoted expression returned by macro_1. Macro.to_string/1 prints the expanded AST as high-level source code, which makes this easier to see. The __ENV__ special form returns a Macro.Env struct with compile-time information such as the current file, line number and module; Macro.expand_once/2 and many other functions in the Macro module require it.

The initial AST has thus been transformed by expansion. However, the expanded AST still contains another macro call, MacroTest.macro_2, which came from the quoted expression injected by the first expansion. The compiler identifies this call too, executes it and replaces it with the quoted expression returned by macro_2. In general, during the expansion phase the compiler repeatedly traverses the AST, identifies macro calls, executes them and injects the quoted expressions they return, until all nested macro calls have been expanded. The fully expanded AST contains no macro calls, only function calls and special forms, which cannot be expanded any further.

Let's simulate the further expansion of the AST above. Unlike the compiler's expansion phase, Macro.expand_once/2 expands only the root node of the given AST; it does not traverse the child nodes to expand macro calls inside them. In our expanded AST, the root node is the __block__ special form, and the MacroTest.macro_2 call sits in one of its inner nodes, so calling Macro.expand_once/2 on the whole expanded AST would leave that call untouched. Instead, we manually extract the inner node representing the macro call and expand it.

expanded_ast = {:__block__, [],
 [
   {:=, [], [{:macro_var_1, [counter: -576460752303423260], MacroTest}, 1]},
   {{:., [],
     [
       {:__aliases__, [counter: -576460752303423260, alias: false],
        [:MacroTest]},
       :macro_2
     ]}, [], [{:macro_var_1, [counter: -576460752303423260], MacroTest}]}
 ]}
-----------------------------------------------------------------------------
# next expansion
{_block, _metadata, [_macro_var_1, macro_call_ast]} = expanded_ast 

expanded_ast_1 = Macro.expand_once(macro_call_ast, __ENV__)

{:__block__, [],
 [
   {:=, [],
    [
      {:macro_var_2, [counter: -576460752303423164], MacroTest},
      {:+, [context: MacroTest, import: Kernel],
       [{:macro_var_1, [counter: -576460752303423260], MacroTest}, 1]}
    ]},
   {{:., [],
     [
       {:__aliases__, [counter: -576460752303423164, alias: false],
        [:MacroTest]},
       :macro_3
     ]}, [], [{:macro_var_2, [counter: -576460752303423164], MacroTest}]}
 ]}

Macro.to_string(expanded_ast_1) |> IO.puts
macro_var_2 = macro_var_1 + 1
MacroTest.macro_3(macro_var_2)
:ok
-------------------------------------------------------------------------------
# final expansion

{_, _, [_, macro_call_ast]} = expanded_ast_1

final_ast = Macro.expand_once(macro_call_ast, __ENV__)
{:+, [context: MacroTest, import: Kernel],
 [{:macro_var_2, [counter: -576460752303423451], MacroTest}, 1]}

Macro.to_string(final_ast) |> IO.puts
macro_var_2 + 1
:ok

                           MacroTest.macro_1
                                   |
                                   V
                           MacroTest.macro_2
                                   |
                                   V
                           MacroTest.macro_3
                                   |
                                   V
                               final ast
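Extracting each inner node by hand works for a short chain, but the Macro module can also repeat the expansion for you. The sketch below uses two hypothetical macros, ChainDemo.double/1 and ChainDemo.add/2 (they are not the article’s MacroTest macros), to contrast Macro.expand_once/2, which expands the root node one step, Macro.expand/2, which keeps expanding the root node until it is no longer a macro call, and Macro.prewalk/2 combined with Macro.expand/2, which also reaches macro calls nested inside other nodes. The expansion runs inside a module function so that __ENV__ includes the require of ChainDemo.

```elixir
# Hypothetical macros forming a two-step chain: double/1 expands into add/2,
# and add/2 expands into a plain call to Kernel.+/2 (a function, not a macro).
defmodule ChainDemo do
  defmacro double(x), do: quote(do: ChainDemo.add(unquote(x), unquote(x)))
  defmacro add(a, b), do: quote(do: unquote(a) + unquote(b))
end

defmodule ExpandDemo do
  # The macros must be required for __ENV__ to be able to expand them
  require ChainDemo

  # One step: only the outer double/1 call is replaced
  def once do
    quote(do: ChainDemo.double(2))
    |> Macro.expand_once(__ENV__)
    |> Macro.to_string()
  end

  # Repeats expansion of the root node until it is no longer a macro call
  def root do
    quote(do: ChainDemo.double(2))
    |> Macro.expand(__ENV__)
    |> Macro.to_string()
  end

  # The root here is a list, so expand/2 alone would leave it unchanged;
  # prewalk/2 visits every node and expands the nested calls as well
  def nested do
    quote(do: [ChainDemo.double(1), ChainDemo.double(3)])
    |> Macro.prewalk(&Macro.expand(&1, __ENV__))
    |> Macro.to_string()
  end
end

ExpandDemo.once()   # "ChainDemo.add(2, 2)"
ExpandDemo.root()   # "2 + 2"
ExpandDemo.nested() # "[1 + 1, 3 + 3]"
```

The compiler’s own expansion behaves like the prewalk version: it visits every node, not just the root.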

Once the expansion phase is over and the initial AST has been completely expanded, the compiler performs a series of further steps, such as translating the Elixir AST into the Erlang abstract format and then into Core Erlang, before the code is finally compiled into bytecode and stored in .beam files. This is how macro expansion transforms source code during compilation, by injecting the code produced by macros.

Macro hygiene

Macros inject code, so two different contexts are involved: the macro’s context and the caller’s context. The macro’s context is the scope that binds and refers to all variables defined inside the macro’s definition, while the caller’s context is the scope of the function or module into which the macro’s code is injected. Macro hygiene keeps these contexts apart so that variables defined inside a macro don’t leak into the caller’s context. Even though the code from the macro’s definition is injected into the caller’s context, by default any variable defined and used inside the macro is not visible to the caller and does not clash with a caller variable of the same name.

defmodule MacroTest do
  defmacro test do
    quote do: x = "macro"
  end 
end
---------------------------------------------------------------------------
defmodule HygieneTest do
  require MacroTest
  def test do
    x = "caller"
    MacroTest.test()
    x
  end
end
---------------------------------------------------------------------------
HygieneTest.test()
"caller"

As you can see above, in the caller’s context, which is the function HygieneTest.test, we bind the value "caller" to the variable x. Next comes the MacroTest.test macro call, which is executed during compilation and injects code that binds the value "macro" to a variable with the same name, x. When x is returned from the function, it is still bound to "caller". The macro’s variable x and its binding to "macro" did not affect or leak into the caller’s context.

The main reason macro hygiene is enforced by default is to prevent pollution of the caller’s context and to make macros behave reliably. Callers don’t necessarily know how a macro works internally or which variable names it uses. Without hygiene, a caller that happened to use a variable with the same name as one inside the macro would have it silently overwritten behind the scenes, leading to unexpected results. There are, however, scenarios where a variable defined inside a macro needs to be exposed explicitly to the caller’s context. This can be done with var!, which marks a variable so that it bypasses macro hygiene and binds (or refers to) the variable in the caller’s context.

defmodule MacroTest do
  defmacro test do
    quote do: var!(x) = "macro"
  end 
end
---------------------------------------------------------------------------
defmodule HygieneTest do
  require MacroTest
  def test do
    x = "caller"
    MacroTest.test()
    x
  end
end
---------------------------------------------------------------------------
HygieneTest.test()
"macro"

As you can see above, inside the macro we define the variable x using the var! macro, which makes it bypass macro hygiene. When the macro call is expanded and its code is injected into the caller’s context, it rebinds the caller’s x, so the variable leaks into the caller’s context. In the AST, a normally defined variable has as its third element the context in which it was quoted (typically the module containing the quote, or Elixir in IEx), while a variable defined with var! has nil as its third element, which marks it as belonging to the caller and bypassing macro hygiene.

quote do: x
{:x, [], Elixir} # third element specifies context as Elixir

---------------------------------------------------------------------------
var_ast = quote do: var!(x)
{:var!, [context: Elixir, import: Kernel],
 [{:x, [], Elixir}]}

Macro.expand_once(var_ast, __ENV__)
{:x, [if_undefined: :raise], nil} # third element specifies context as nil
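The nil context is what does the work. As a sketch of the same idea without var!, Macro.var/2 builds a variable AST with a chosen context, and passing nil yields a variable that binds in the caller’s context, just like the var! version above. VarDemo and VarDemoCaller are hypothetical module names used only for illustration.

```elixir
defmodule VarDemo do
  defmacro set_x(value) do
    # {:x, [], nil}: a variable with nil context belongs to the caller
    x = Macro.var(:x, nil)
    quote do: unquote(x) = unquote(value)
  end
end

defmodule VarDemoCaller do
  require VarDemo

  def run do
    x = "caller"
    VarDemo.set_x("macro")
    x
  end
end

Macro.var(:x, nil)   # {:x, [], nil}
VarDemoCaller.run()  # "macro"
```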

Extracting information from AST

So far, we have written simple macros that compose AST by just injecting the AST arguments. But the incoming AST arguments contain a lot of information that can be extracted to create complex macros. This data extraction can be performed by pattern matching the AST arguments to extract a specific AST node or its metadata. But in order to pattern match the incoming AST, you should understand its detailed internal structure to know exactly where to obtain the information that you are looking for.

Let’s try to create a simple assert macro that checks whether the result of the left-hand side expression equals the result of the right-hand side expression. The macro takes a single argument, an expression of the form lhs == rhs. If the expression evaluates to true, the assertion passes; if it evaluates to false, the assertion fails and we raise an error describing the failure in the following format.

Assertion failed - #{expression}
LHS => #{lhs_expression} = #{lhs_result}
RHS => #{rhs_expression} = #{rhs_result}

We only get the whole expression as a single argument and we have to somehow extract the left hand side and the right hand side expressions separately to provide the above information. Let’s try to read the AST of a simple expression in the format lhs == rhs to understand its structure and figure out where we can get the lhs and the rhs expressions separately.

quote do: 1 == 2
{:==, [context: Elixir, import: Kernel], [1, 2]}

We have used a simple term 1 as the lhs expression and a simple term 2 as the rhs expression. If we study the resulting AST of the expression, we can see that it is in the format {operator, metadata, [lhs, rhs]}. Thus by reading the structure of a sample AST, we have determined that the lhs and rhs expressions can be obtained from the third element of the AST.

defmodule Assertion do
  defmacro assert({_, _, [lhs, rhs]} = expr) do
    [exp_str, lhs_str, rhs_str] = Enum.map([expr, lhs, rhs], &Macro.to_string/1)
    quote do
      lhs_result = unquote(lhs)
      rhs_result = unquote(rhs)
      if lhs_result == rhs_result do
        IO.puts("Assertion passed")
      else
        raise """
        Assertion failed - #{unquote(exp_str)}
        LHS => #{unquote(lhs_str)} = #{lhs_result}
        RHS => #{unquote(rhs_str)} = #{rhs_result}
        """
      end
    end
  end
end
------------------------------------------------------------------------------
require Assertion

Assertion.assert(String.length("hello") == 5)
Assertion passed
:ok

Assertion.assert(String.length("hello") == 2 + 4)
** (RuntimeError) Assertion failed - String.length("hello") == 2 + 4
LHS => String.length("hello") = 5
RHS => 2 + 4 = 6
    iex:86: (file)

Thus we have studied the internal structure of an expression’s AST and parsed it with pattern matching to create an assert macro that reports information extracted from the expression when the assertion fails. However, this was a fairly simple AST, so pattern matching on it was also simple. For complex ASTs with several levels of nested nodes, extracting deeply nested information with pattern matching alone becomes tedious. To help with this, the Macro module provides functions that can be combined with pattern matching to parse and manipulate complex ASTs.

Since ASTs are tree structures, they can be traversed with Macro.prewalk/2, Macro.prewalk/3, Macro.postwalk/2, Macro.postwalk/3 and Macro.traverse/4, which perform pre-order and post-order traversals of the AST nodes to read and transform information in complex nested ASTs. Macro.decompose_call/1 parses an AST representing a function call and returns a tuple containing the name of the function being called and its arguments (plus the remote module for calls such as String.length("hi")), or :error if the AST is not a call. Macro.update_meta/2 can be used to update the metadata of an AST node.
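As a small illustration of these functions, the sketch below collects every variable name in an expression with Macro.prewalk/3, which threads an accumulator through a pre-order traversal, and uses Macro.decompose_call/1 to split a few calls, including the lhs == rhs shape our assert macro matched on by hand. The clause that recognises a variable relies on a variable node’s third element being an atom (its context) rather than an argument list.

```elixir
ast = quote do: total = price * quantity + tax

# Pre-order walk: the accumulator collects variable names in visiting order
{_ast, names} =
  Macro.prewalk(ast, [], fn
    # Variables look like {name, meta, context}, where context is an atom
    {name, _meta, context} = node, acc when is_atom(name) and is_atom(context) ->
      {node, [name | acc]}

    # Any other node is returned unchanged
    node, acc ->
      {node, acc}
  end)

variables = Enum.reverse(names)
# [:total, :price, :quantity, :tax]

# Local call: {name, args}
{:foo, [1, 2]} = Macro.decompose_call(quote(do: foo(1, 2)))

# Operators are calls too, so lhs and rhs come back as the argument list
{:==, [lhs, rhs]} = Macro.decompose_call(quote(do: String.length("hello") == 5))
{Macro.to_string(lhs), rhs}
# {"String.length(\"hello\")", 5}

# Remote call: {remote, name, args}
{{:__aliases__, _, [:String]}, :length, ["hi"]} =
  Macro.decompose_call(quote(do: String.length("hi")))
```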

:bind_quoted

Whenever unquote is used inside a quote block, the AST passed to it is injected into the generated code at that spot, and the injected code is evaluated wherever it ends up. So if you unquote the same expression several times, the expression is injected several times and evaluated every time it appears in the generated code. This is wasteful for expensive expressions, and for expressions with side effects, such as logging a message, the side effect is performed once for every unquote. To avoid this, we can assign the unquoted expression to a variable inside the quote block and reuse that variable wherever the result is needed. This is what we did in our assert macro, assigning the results of unquote(lhs) and unquote(rhs) to variables and reusing them so that each expression is evaluated only once.
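To make the difference concrete, here is a minimal sketch with two hypothetical macros. naive_double/1 unquotes its argument twice, so the caller’s expression is injected, and evaluated, twice; safe_double/1 binds it to a variable first. The argument used here sends a :tick message to the current process each time it is evaluated, so counting the messages shows how often it ran.

```elixir
defmodule Twice do
  # The argument's AST is injected twice, so it runs twice
  defmacro naive_double(expr) do
    quote do: unquote(expr) + unquote(expr)
  end

  # The argument's AST is injected once and its result reused
  defmacro safe_double(expr) do
    quote do
      value = unquote(expr)
      value + value
    end
  end
end

defmodule TwiceDemo do
  require Twice

  # Each evaluation of the argument sends one :tick to the current process
  def naive, do: Twice.naive_double((send(self(), :tick); 1))
  def safe, do: Twice.safe_double((send(self(), :tick); 1))

  # Counts (and removes) the :tick messages waiting in the mailbox
  def ticks(count \\ 0) do
    receive do
      :tick -> ticks(count + 1)
    after
      0 -> count
    end
  end
end

TwiceDemo.naive()  # 2
TwiceDemo.ticks()  # 2: the side effect ran twice
TwiceDemo.safe()   # 2
TwiceDemo.ticks()  # 1: the side effect ran once
```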

The quote macro offers an alternative to this manual binding through its :bind_quoted option. It takes a keyword list of bindings, with variable names as atom keys and ASTs as values. Each value is injected once and bound to the variable named by its key, and that variable can then be reused anywhere inside the quote block without unquoting again. Note that when you use :bind_quoted, unquoting is disabled inside the quote block by default. Hence if you need both :bind_quoted and unquote inside the same quote block, you must also pass the unquote: true option explicitly. Let’s rewrite the assert macro with the :bind_quoted option.

defmodule Assertion do
  defmacro assert({_, _, [lhs, rhs]} = expr) do
    [exp_str, lhs_str, rhs_str] = Enum.map([expr, lhs, rhs], &Macro.to_string/1)
    quote bind_quoted: [lhs_result: lhs, rhs_result: rhs], unquote: true do
      if lhs_result == rhs_result do
        IO.puts("Assertion passed")
      else
        raise """
        Assertion failed - #{unquote(exp_str)}
        LHS => #{unquote(lhs_str)} = #{lhs_result}
        RHS => #{unquote(rhs_str)} = #{rhs_result}
        """
      end
    end
  end
end

As you can see above, the AST expressions lhs and rhs are automatically unquoted and their results are bound to their respective variables, which are reused inside the quote block. We are also passing the unquote: true option to explicitly enable the unquote macro as it is disabled due to the use of :bind_quoted option.
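The sketch below, run outside any macro for simplicity, shows what :bind_quoted generates. The value bound to x is an AST, as it would be for a macro argument; it is injected once as a plain assignment at the top of the block, and the body reuses the variable. It also shows that unquote is left untouched when :bind_quoted is used without unquote: true.

```elixir
# An AST, like the lhs/rhs arguments the assert macro receives
ast = quote do: 1 + 2

bound =
  quote bind_quoted: [x: ast] do
    x * x
  end

# The printed code assigns x = 1 + 2 once, then computes x * x
IO.puts(Macro.to_string(bound))

{result, _binding} = Code.eval_quoted(bound)
# result == 9

# Without unquote: true, unquote stays in the AST as an ordinary call
disabled =
  quote bind_quoted: [x: ast] do
    unquote(x)
  end

String.contains?(Macro.to_string(disabled), "unquote(x)")
# true
```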

In the :bind_quoted version of the assert macro, the if block inside the quote block performs the assertion and raises an error with details when it fails. Every time the assert macro is used, this if block and its logic are injected into the caller, which is not recommended: macros should inject as little code as possible into their callers. In such cases the common logic can be moved into a regular function that is called from within the quote block. That way only the function call is injected into callers, and the macro becomes easier to maintain.

defmodule Assertion do
  defmacro assert({_, _, [lhs, rhs]} = expr) do
    [exp_str, lhs_str, rhs_str] = Enum.map([expr, lhs, rhs], &Macro.to_string/1)
    quote bind_quoted: [lhs_result: lhs, rhs_result: rhs, exp_str: exp_str,
                        lhs_str: lhs_str, rhs_str: rhs_str] do
      Assertion.assert(exp_str, {lhs_str, lhs_result}, {rhs_str, rhs_result})
    end
  end

  def assert(exp_str, {lhs_str, lhs_result}, {rhs_str, rhs_result}) do
    if lhs_result == rhs_result do
      IO.puts("Assertion passed")
    else
      raise """
      Assertion failed - #{exp_str}
      LHS => #{lhs_str} = #{lhs_result}
      RHS => #{rhs_str} = #{rhs_result}
      """
    end
  end
end

Dynamic code generation using unquote fragments

Dynamic code, such as function definitions, can be generated from data using macros. Elixir also offers a simpler way to generate such code: unquote fragments. An unquote fragment uses unquote directly inside a definition, outside any quote block, to inject a value bound to a variable into that definition. It behaves the same way as unquote inside a quote block, injecting the value into the AST.

defmodule Test do
  x = :func
  def x, do: 1
end
---------------------------------------------------------------------------
Test.x()
1

Test.func()
** (UndefinedFunctionError) function Test.func/0 is undefined or private
    Test.func()

In the code above, we assign the atom :func to the variable x and then create a function definition with the def macro. The first argument passed to def is x itself and the second is the keyword list [do: 1], so a function named x is defined. But what if we instead wanted to define the function dynamically, using the value :func bound to x as its name rather than the name x itself? This is where unquote fragments come in.

defmodule Test do
  x = :func
  def unquote(x)(), do: 1
end
---------------------------------------------------------------------------
Test.x()
** (UndefinedFunctionError) function Test.x/0 is undefined or private
    Test.x()

Test.func()
1

We have used unquote to inject the value bound to x as the function name, creating a function definition dynamically. In the same way, unquote fragments can inject values bound to variables into function arguments and function bodies. Note that since function names are atoms, a value injected as a function name must be an atom.

defmodule Test do
  list = [one: 1, two: 2, three: 3]
  for {k, v} <- list do
    def unquote(k)(), do: unquote(v) 
  end 
end
---------------------------------------------------------------------------
Test.one
1
Test.two
2
Test.three
3
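The same technique works for function arguments and bodies, not only names. In this sketch the Weight module and its unit table are hypothetical; each loop iteration injects a unit atom into the argument pattern and a factor into the body, producing one to_grams/2 clause per unit. Values injected this way must be valid quoted literals (atoms, numbers, strings, lists, two-element tuples); other data such as maps would first need Macro.escape/1.

```elixir
defmodule Weight do
  # Hypothetical conversion table: unit => grams per unit
  @factors [t: 1_000_000, kg: 1000, g: 1]

  for {unit, factor} <- @factors do
    # The unit goes into the argument pattern, the factor into the body
    def to_grams(value, unquote(unit)), do: value * unquote(factor)
  end
end

Weight.to_grams(2, :kg)  # 2000
Weight.to_grams(3, :t)   # 3000000
```

A unit missing from the table matches no clause and raises FunctionClauseError.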

You can achieve the same with macros, but defining functions dynamically with unquote fragments is much simpler. Functions are often generated this way from external data such as files. Elixir itself uses this technique for its Unicode support: during compilation it reads files containing Unicode data and generates function definitions for the characters they describe, so thousands of definitions don’t have to be written by hand.
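A minimal sketch of the file-driven variant, assuming a made-up semicolon-separated file of HTTP status codes. The file is written to the system temp directory first so the example is self-contained; a real project would keep it in its repository. The file is read while the module body runs at compile time, one reason/1 clause is generated per line, and @external_resource tells the compiler to recompile the module when the file changes. This only illustrates the general idea and is not taken from Elixir’s source.

```elixir
# Create the hypothetical data file so the sketch runs on its own
data_path = Path.join(System.tmp_dir!(), "http_statuses.txt")
File.write!(data_path, "200;OK\n404;Not Found\n500;Internal Server Error\n")

defmodule HTTPStatus do
  @data_path Path.join(System.tmp_dir!(), "http_statuses.txt")

  # Recompile this module whenever the data file changes
  @external_resource @data_path

  # This runs at compile time, while the module body is evaluated
  lines = @data_path |> File.read!() |> String.split("\n", trim: true)

  for line <- lines do
    [code, text] = String.split(line, ";", parts: 2)
    def reason(unquote(String.to_integer(code))), do: unquote(text)
  end

  # Fallback clause for codes missing from the file
  def reason(_code), do: "Unknown"
end

HTTPStatus.reason(404)  # "Not Found"
HTTPStatus.reason(418)  # "Unknown"
```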

Macros for DSLs

Domain Specific Languages (DSLs) are languages designed to solve problems in a specific domain, as opposed to general-purpose programming languages, which can be used to build solutions across many domains. A DSL has a small, focused syntax that reflects its domain and is easier for domain experts to use, whereas a general-purpose language requires domain-specific solutions to be expressed in its broader, more complex syntax. DSLs increase productivity within their domain through concise, readable, domain-specific keywords. Elixir makes it possible to build DSLs with macros, so domain-specific problems can be expressed in domain-specific syntax instead of general-purpose code.

HTML is a markup language for structuring web pages. When we need to create or manipulate HTML in a backend language, we may have to resort to string manipulation. Done with general-purpose syntax, this quickly becomes tedious, requiring a lot of character escaping and formatting, which makes the code hard to read and error-prone. With Elixir’s macros you can define an HTML-specific DSL that lets you create HTML with a simple syntax that mirrors the original HTML tags.

html do
  head do
    title do: "HTML DSL"
  end
  body do
    p id: "p1", do: "Paragraph"
  end
end
--------------------------------------------------------------------------------------------
"<html><head><title>HTML DSL</title></head><body><p id=\"p1\">Paragraph</p></body></html>"

The code above shows a possible HTML DSL built with macros: custom, domain-specific syntax that could generate and inject a string of HTML content in place of the DSL macro calls during compilation. Another practical example of a macro-based DSL in Elixir is the Ecto library, which provides a DSL for database queries.
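As a rough sketch of how such a DSL might be wired up (the HTML module, its tag list and render/3 are hypothetical, and attribute values are not HTML-escaped), every tag can be a macro that takes a keyword list, pops its :do content, and injects a single call to a rendering function, in line with the earlier advice to keep injected code small. Unlike the description above, this sketch assembles the string at runtime rather than during compilation. The tag macros are generated with unquote fragments, and callers must import HTML so that nested calls such as head and body resolve to the macros.

```elixir
defmodule HTML do
  @tags [:html, :head, :title, :body, :p]

  # One macro per tag, e.g. p(id: "p1", do: "Paragraph")
  for tag <- @tags do
    defmacro unquote(tag)(opts) do
      # The :do entry is the content; the remaining entries are attributes
      {content, attrs} = Keyword.pop(opts, :do)
      tag = unquote(tag)

      quote do
        HTML.render(unquote(tag), unquote(attrs), unquote(children(content)))
      end
    end
  end

  # A do block with several expressions arrives as a __block__ node
  defp children({:__block__, _meta, exprs}), do: exprs
  defp children(nil), do: []
  defp children(expr), do: [expr]

  # Runtime helper: wraps the already rendered children in the tag
  def render(tag, attrs, children) do
    attr_str = Enum.map_join(attrs, fn {key, value} -> ~s( #{key}="#{value}") end)
    "<#{tag}#{attr_str}>#{Enum.join(children)}</#{tag}>"
  end
end

defmodule Page do
  import HTML

  def build do
    html do
      head do
        title do: "HTML DSL"
      end

      body do
        p id: "p1", do: "Paragraph"
      end
    end
  end
end

Page.build()
# "<html><head><title>HTML DSL</title></head><body><p id=\"p1\">Paragraph</p></body></html>"
```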

Even though metaprogramming via macros is a powerful feature that opens up many possibilities in Elixir, it also has downsides. Extensive use of macros blurs the line between Elixir’s native constructs and the custom syntax and constructs introduced by users. This makes the codebase harder to read, understand, debug, maintain and test, as a lot happens behind the scenes. Working with ASTs also requires great care to handle every edge case. Hence macros should only be used when the problem you are trying to solve cannot be solved with Elixir’s built-in constructs and features.