Elixir and NIF: A Case Study
Hello, devs! I’m Rodrigo Caldeira, Software Engineer at SumUp in São Paulo and I’ll be sharing some thoughts about Elixir and NIF with a real story. This is my first contribution here, so I welcome any feedback!
A little bit of context
I’ve been working at SumUp since Jan 2020 and after some reshuffles, I’m now in the Business Bank unit working directly with PIX features.
Along with other events and meetings, there are two main events that the EP&D (Engineering, Product and Design) departments attend:
Lunch and Learn: In some places this is known as a brown bag. The goal is to share knowledge over lunch every Friday, and anyone in EP&D can present on any related subject.
HackDay: Every two weeks an entire Friday is reserved for working on any non-SumUp-related project. We can work on literally anything on this day.
With that said, this post details a bit more about my experience with these events.
During one HackDay, I was studying a way to do Slack integrations (without success =( ), when suddenly a QA Engineer called me.
- QA Engineer: Hi, Caldeira! How are you? Could you help me with my HackDay project, please? I was trying to do something here, but now I’m stuck =/
- Me: Sure! How can I help you? (at this point I had already given up on my project)
- QA Engineer: Awesome! So, I’m trying to use a tool to automate some tests here, but it doesn’t have an Elixir plugin (we use Elixir here in the Bank). Looking in the docs I found out that it is possible to create a plugin with their C library. Do you know how to do that?
- Me: Whoa! That’s a tough one! I’ve never done that, but I know it’s possible with NIFs. Let’s create a simple project to study a little bit, and then we come back to that library. What do you think?
- QA Engineer: Great! Let’s do it!
Spoiler: We didn’t come back to that library.
So, that’s the whole scenario here. The QA Engineer and I started a case study about Elixir and NIFs on HackDay. The outcome of that HackDay was a Lunch and Learn that I presented two weeks later, and that’s what I’m sharing with you right now.
The case
Elixir systems run on the BEAM, the Erlang virtual machine, and NIFs (Native Implemented Functions) are the way to extend Erlang software by loading and executing native code. That code can be written in any language that compiles to native components, like Rust or C. In this example I’ll create a library in C and use it from a simple Elixir module.
So, let’s start with our native library. It’s a simple calculator that exposes four functions, each of which receives two integer parameters and returns one integer:
For non-Portuguese speakers, my guess is that the functions’ names should be straightforward, but here is the translation:
somar -> sum
subtrair -> subtract
multiplicar -> multiply
dividir -> divide
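A minimal sketch of such a library, assuming the four functions simply wrap C’s integer operators. The bodies below are an illustration built from the signatures described in this post, not necessarily the exact code in the repository:

```c
/* lib_calc.c: a sketch of a four-function integer calculator. */

/* sum */
int somar(int a, int b) { return a + b; }

/* subtract */
int subtrair(int a, int b) { return a - b; }

/* multiply */
int multiplicar(int a, int b) { return a * b; }

/* divide: integer division, truncating toward zero.
   Note that b == 0 is NOT handled here. */
int dividir(int a, int b) { return a / b; }
```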
Great! Now that we have our library defined, let’s compile it:
$ gcc -o lib_calc.so -c lib_calc.c
No errors, no warnings. Click noice!
BTW, I’m using Ubuntu on WSL2 for our example, but you should not face any problems with other distros.
Now it’s time to test our library. To do that, I’ll create another C program that will receive three parameters:
- An integer number
- The operator
- An integer number
And compiling it:
$ gcc -o calc calc.c lib_calc.so
Once again, no errors.
Now, let’s run our calculator:
$ ./calc 1 + 1
1 + 1 = 2
Awesome! It works!
Now, notice that in our calculator program I didn’t validate the parameters passed to it. So, if we run it with an unexpected input, this is the result:
$ ./calc 1
[1] 388 segmentation fault ./calc 1
That’s OK for us, since we’re not interested in the C calculator itself.
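For the curious, a crash like this typically comes from reading argv entries that were never supplied. Here is a minimal sketch of the missing validation, assuming a hypothetical helper named parse_args (it is not part of the original program) that checks argc before touching argv:

```c
#include <stdlib.h>

/*
 * parse_args is a hypothetical helper, not part of the original calc.c.
 * It expects: program name, integer, operator, integer (argc == 4).
 * Returns 1 and fills a, op and b on success; returns 0 on bad input.
 * The range of the parsed numbers is not checked in this sketch.
 */
int parse_args(int argc, char **argv, int *a, char *op, int *b)
{
    char *end;
    long value;

    /* Without this check, argv[2] and argv[3] may be NULL or out of bounds. */
    if (argc != 4)
        return 0;

    /* First operand: the whole string must be a number. */
    value = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0')
        return 0;
    *a = (int)value;

    /* Operator: exactly one character. */
    if (argv[2][0] == '\0' || argv[2][1] != '\0')
        return 0;
    *op = argv[2][0];

    /* Second operand: same rule as the first. */
    value = strtol(argv[3], &end, 10);
    if (end == argv[3] || *end != '\0')
        return 0;
    *b = (int)value;

    return 1;
}
```

A main function could call parse_args first and print a usage message when it returns 0, instead of crashing.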
So, with all that ready, how can we use our library inside an Elixir program?
The solution
To achieve that, we need to write another C program that will represent our NIF to Erlang and expose our calculator library to our Elixir module.
Holy moly! That’s a lot of code for a simple library! Let’s dig in.
The first line:
#include <erl_nif.h>
is the foundation of our NIF. It is the header that declares the functions and macros needed to create a NIF. On Ubuntu it ships in the erlang-dev package.
After that, we have four more functions, each one wrapping one of the functions in our C library. They all follow the same pattern, so I will focus on only one of them.
static ERL_NIF_TERM somar_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
int a, b, result;
enif_get_int(env, argv[0], &a);
enif_get_int(env, argv[1], &b);
result = somar(a, b);
return enif_make_int(env, result);
}
These lines declare a new static function called somar_nif that returns an ERL_NIF_TERM (a type that represents any Erlang term) and expects three arguments:
- ErlNifEnv* env: a pointer that represents an environment that can host Erlang terms. Let’s consider it as the environment that is running our NIF.
- int argc: the number of arguments that were passed to the function.
- const ERL_NIF_TERM argv[]: the arguments passed to the function.
This closely resembles a regular main function in any C program:
int main(int argc, char ** argv)
There, too, you read argv to get the values passed to your program, based on the number of arguments contained in argc.
And that’s exactly what is happening inside the function:
// Our C variables
int a, b, result;
// Reads the first value and stores it in a
enif_get_int(env, argv[0], &a);
// Reads the second value and stores it in b
enif_get_int(env, argv[1], &b);
// Our lib_calc function being called!
result = somar(a, b);
// Transforms the result into an ERL_NIF_TERM and returns it
return enif_make_int(env, result);
Here the argc is being totally ignored, as we already know that exactly 2 values are being passed as arguments.
After defining all our NIF functions, we have to inform the Erlang NIF API how to call them.
static ErlNifFunc nif_funcs[] = {
{"somar", 2, somar_nif},
{"subtrair", 2, subtrair_nif},
{"multiplicar", 2, multiplicar_nif},
{"dividir", 2, dividir_nif},
};

ERL_NIF_INIT(Elixir.Calc, nif_funcs, NULL, NULL, NULL, NULL)
The static ErlNifFunc nif_funcs[] is an array of ErlNifFunc structs. This struct has the following fields:
- name: The NIF function’s name, that will be exposed in our NIF
- arity: The NIF function’s arity
- function: The pointer to the function that will be called when the Erlang/Elixir module calls the NIF
There is a fourth field in the ErlNifFunc struct, flags, but it can be omitted in our example.
The last piece of code in our NIF is the ERL_NIF_INIT macro call, which receives the module name, the functions that will be exposed in our NIF, and pointers to functions that handle load, reload, upgrade and unload events (ignored here in our example).
Notice that the module name is Elixir.Calc, and not just Calc. That’s necessary because our goal is to use this NIF in an Elixir module, and all Elixir modules from the Erlang perspective start with Elixir.
Phew! A lot of work here! Let’s compile it and see what happens.
$ gcc -shared -o lib_calc_nif.so -fPIC lib_calc_nif.c lib_calc.so
Great! Again, no errors or warnings.
Notice the -fPIC flag passed to gcc. It tells gcc to generate position-independent code, that is, machine code that uses relative rather than absolute address references, which is what a shared library needs.
And now, the moment of truth! Let’s create an Elixir module!
This, dear devs, is our Elixir module that will call our NIF! Taking a look you will notice this:
@on_load :load_nifs

def load_nifs do
:erlang.load_nif('./lib_calc_nif', 0)
end
This defines a callback that will be executed when the module is loaded (@on_load :load_nifs), and the callback will load our NIF (:erlang.load_nif('./lib_calc_nif', 0)). Let’s see it in action!
$ iex
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1]

Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> c("Calc.ex")
[Calc]
iex(2)> Calc.somar(1, 2)
3
iex(3)>
Hooray! It works!
Now, notice all the other functions defined in our module. They’re just fallback functions, just in case the NIF was not loaded for whatever reason.
That’s really great! But…
Not so fast! There are a lot of things to be considered here before we move on with our excitement.
First, remember how the arguments are declared in our C library?
int somar(int a, int b);
This function expects integer values, and that expectation was carried along every step of our journey here.
In fact, our NIF tries to convert the first argument to an integer.
enif_get_int(env, argv[0], &a);
What happens if we try to pass a value from another type?
Like a string, for example:
iex(3)> Calc.somar("test", 1)
32753
What?
Let’s try with a float:
iex(7)> Calc.somar(1.0,1)
32753
iex(8)> Calc.somar(1,1.0)
1251267553
iex(9)>
oO???
What is happening here?!
Worse than that, if we keep calling the function with exactly the same value:
iex(8)> Calc.somar(1,1.0)
1251267553
iex(9)> Calc.somar(1,1.0)
1251267553
iex(10)> Calc.somar(1,1.0)
1251531297
iex(11)> Calc.somar(1,1.0)
1251531297
iex(12)> Calc.somar(1,1.0)
1251288249
iex(13)>
The return value changes!
This is because our NIF never checks whether the conversion succeeded. When enif_get_int fails, the C variable is never set, so we end up summing whatever junk happened to be in memory.
The NIF just tries to convert the arguments and then sums (or subtracts, multiplies or divides) them, without ever looking at whether the variables were set. Here that happens to do no harm to our module. But remember, we are dealing with C here. Not Elixir, not Erlang. C. Pointers, memory… Can you imagine the scenario?
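One way to close that gap is to look at what enif_get_int returns and raise badarg when the conversion fails. Here is a minimal sketch, assuming the somar function from our library and the same somar_nif name. It is one possible fix, not the code in the repository, and it registers only somar to keep things short:

```c
#include <erl_nif.h>

/* Declared in our calculator library. */
int somar(int a, int b);

static ERL_NIF_TERM somar_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
    int a, b;

    /* enif_get_int returns false when the term is not an integer
       (or does not fit in a C int), so bail out before using a or b. */
    if (argc != 2 ||
        !enif_get_int(env, argv[0], &a) ||
        !enif_get_int(env, argv[1], &b)) {
        return enif_make_badarg(env);
    }

    return enif_make_int(env, somar(a, b));
}

/* Only somar is registered in this sketch. */
static ErlNifFunc nif_funcs[] = {
    {"somar", 2, somar_nif},
};

ERL_NIF_INIT(Elixir.Calc, nif_funcs, NULL, NULL, NULL, NULL)
```

With a check like this, a call such as Calc.somar("test", 1) should raise an ArgumentError on the Elixir side instead of returning junk.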
Besides that, what if we try to divide by zero?
iex(13)> Calc.dividir(1,0)
[1] 501 floating point exception iex
The BEAM crashed! That’s a huge problem with using NIFs: if the NIF crashes during the execution, the entire BEAM crashes.
So this is a feature that must be used very, very, very carefully.
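For the division case specifically, one defensive option is to validate the operands in C before dividing, so the NIF can report an error instead of taking the VM down. Here is a minimal sketch, assuming a hypothetical helper named dividir_seguro ("safe divide"), which is not part of the original library:

```c
#include <limits.h>

/*
 * dividir_seguro is a hypothetical helper, not part of the original library.
 * Returns 1 and stores the quotient in *out when the division is defined.
 * Returns 0, leaving *out untouched, for the two undefined cases that
 * trap on common hardware such as x86: division by zero and INT_MIN / -1
 * (whose result does not fit in an int).
 */
int dividir_seguro(int a, int b, int *out)
{
    if (b == 0)
        return 0;
    if (a == INT_MIN && b == -1)
        return 0;

    *out = a / b;
    return 1;
}
```

A NIF wrapper could call this helper and return enif_make_badarg(env) when it gets 0 back, which surfaces as an ArgumentError in Elixir rather than a crashed BEAM.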
Final considerations
I really love to study these kinds of topics. Understanding how things work under the hood is a powerful method to discover new possibilities and explore and extend my knowledge about a subject.
I don’t intend to put any of this in production unless it’s absolutely necessary, for example:
- If there is no time to develop an entire feature that is already implemented in a native library.
- If there is a bug in Erlang modules that prevents you from deploying your feature.
- If that native lib is so exclusive, so unique, and solves a huge problem, and there is no alternative to it.
I think you get the point here.
The source code of this project can be found at https://github.com/rodrigocaldeira/nif_cgo and there is a bonus there!
Thank you so much for following along!