Journey into the machine: an Optimistic Killer
It happens now and then. I start very confident in my code and then:
What happened exactly?
I started looking into it while testing a program. I am a student at Holberton School, and my teachers are pretty adamant about writing good code. For the beginner that I am, that means handling all kinds of edge cases without failing. Since we are not given those cases, the hunt is on to find where my code can fail and fix it before submitting.
As it happens, at one point I learned about malloc, and with it came its corollary:
something = malloc( ... );
if (something == NULL)
And there was an exercise we had to do: write a function that returns a pointer to a newly created two-dimensional grid of integers whose members have all been set to 0.
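To make the exercise concrete, here is a minimal sketch of one way such a function could be written. The prototype `alloc_grid(int width, int height)` is an assumption based on how the function is called later in this post, and the `free_grid` helper is hypothetical, added only so the sketch can clean up after itself. It is not the code I submitted.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the first `height` rows, then the row array. */
void free_grid(int **grid, int height)
{
    int i;

    if (grid == NULL)
        return;
    for (i = 0; i < height; i++)
        free(grid[i]);
    free(grid);
}

/* Assumed prototype: returns a height x width grid of zeros, or NULL. */
int **alloc_grid(int width, int height)
{
    int **grid;
    int i, j;

    if (width <= 0 || height <= 0)
        return (NULL);

    /* One array of row pointers... */
    grid = malloc(sizeof(*grid) * height);
    if (grid == NULL)
        return (NULL);

    /* ...then one separate allocation per row. */
    for (i = 0; i < height; i++)
    {
        grid[i] = malloc(sizeof(**grid) * width);
        if (grid[i] == NULL)
        {
            /* Release the i rows already allocated before giving up. */
            free_grid(grid, i);
            return (NULL);
        }
        for (j = 0; j < width; j++)
            grid[i][j] = 0;
    }
    return (grid);
}
```

Every call to malloc is followed by its NULL check, and a failure halfway through releases what was already allocated before returning NULL.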
And I was missing a case… So this thought came up: “I never really test this NULL condition, maybe there is something wrong there?” So I set out to trigger this if statement. I created a main function that would make bigger and bigger grids, until one was too big for memory and I would see a NULL, or so I thought… Here is the code:
row = 1;
while (row < 50000)
printf("grid[%i][%i]\n", row, row);
g = alloc_grid(row, row);
if (g == NULL)
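The excerpt above lost some lines, so here is a hedged sketch of the same idea as a small function. The names `grid_allocator`, `first_failing_size` and `fake_alloc_grid` are hypothetical, and the step of one row per iteration is an assumption. The fake allocator only pretends that memory runs out at size 5, so the loop can be tried without exhausting the machine.

```c
#include <stdio.h>
#include <stdlib.h>

/* Any function with the same shape as alloc_grid(width, height). */
typedef int **(*grid_allocator)(int width, int height);

/*
 * Asks `alloc` for bigger and bigger square grids and never frees them.
 * Returns the first size for which `alloc` returned NULL,
 * or 0 if every size below `limit` succeeded.
 */
int first_failing_size(grid_allocator alloc, int limit)
{
    int row;

    for (row = 1; row < limit; row++)
    {
        printf("grid[%i][%i]\n", row, row);
        if (alloc(row, row) == NULL)
            return (row);
        /* The grid is deliberately leaked, as in the experiment. */
    }
    return (0);
}

/* Stand-in allocator: allocates nothing and "fails" from size 5 upwards. */
int **fake_alloc_grid(int width, int height)
{
    static int cell;
    static int *row = &cell;

    if (width >= 5 || height >= 5)
        return (NULL);
    return (&row);
}
```

Calling `first_failing_size(fake_alloc_grid, 50000)` stops at size 5. Passing a real allocator with the same limit reproduces the experiment described here, so on a system like mine the process may be killed before the function ever returns.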
I compiled, and ran my code.
137: command not found
$? holds the exit status of the last command; typing it alone at the prompt makes the shell try to run that number as a command, hence the message above. Here it is the exit status of the process that was launched when I ran my code. What is 137? It is nowhere in my code. According to the shell documentation, an exit status of 128 + n means the command was terminated by fatal signal n. In my case 137 = 128 + 9, so n = 9, which is the signal named SIGKILL, or kill signal. This signal causes the immediate termination of the process by the Linux kernel, and the process cannot catch or ignore it.
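The 128 + n arithmetic can be checked from C. This is a minimal sketch, assuming a POSIX system; the function names `shell_exit_status` and `status_after_signal` are hypothetical. A child process sends a signal to itself, and the parent converts what waitpid reports into the number a shell would show in $?.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Converts a waitpid() status into the value a shell would put in $?. */
int shell_exit_status(int wait_status)
{
    if (WIFSIGNALED(wait_status))
        return (128 + WTERMSIG(wait_status));
    if (WIFEXITED(wait_status))
        return (WEXITSTATUS(wait_status));
    return (-1);
}

/*
 * Forks a child that sends `sig` to itself, waits for it, and returns
 * the shell-style status of the child (or -1 if fork/waitpid failed).
 */
int status_after_signal(int sig)
{
    pid_t pid;
    int wait_status;

    pid = fork();
    if (pid == -1)
        return (-1);
    if (pid == 0)
    {
        raise(sig);
        /* Only reached if the signal did not terminate the child. */
        _exit(0);
    }
    if (waitpid(pid, &wait_status, 0) == -1)
        return (-1);
    return (shell_exit_status(wait_status));
}
```

With SIGKILL, which is signal 9 on Linux, `status_after_signal(SIGKILL)` gives 137, the same number my shell reported.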
What happened? With this code I was using more and more memory, and never stopped. At one point I reached the Out Of Memory condition. To deal with it, Linux has an Out of Memory (OOM) manager, and its weapon: the OOM Killer. When running processes take too much memory, the OOM manager does its accounting and picks the “best” candidate to kill. In my case the choice was obvious: it was me, making bigger and bigger grids and never freeing anything.
As I was testing my code and things were not turning out as I wanted, I thought maybe it was due to the way I created my grid. Basically, there are two ways illustrated below:
The process is killed earlier when I allocate rows separately, which makes sense as it uses more memory.
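The original illustration is not reproduced here, so as an assumption about what the alternative to allocating rows separately can look like, here is a minimal sketch that keeps all the cells in one contiguous block. The names `alloc_grid_block` and `free_grid_block` are hypothetical.

```c
#include <stdlib.h>

/*
 * Builds a height x width grid of zeros with two allocations only:
 * one array of row pointers and one block holding every cell.
 * Returns NULL on invalid sizes or if either allocation fails.
 */
int **alloc_grid_block(int width, int height)
{
    int **grid;
    int *cells;
    int i, j;

    if (width <= 0 || height <= 0)
        return (NULL);

    grid = malloc(sizeof(*grid) * height);
    if (grid == NULL)
        return (NULL);

    /* sizeof comes first so the product is computed as a size_t. */
    cells = malloc(sizeof(*cells) * width * height);
    if (cells == NULL)
    {
        free(grid);
        return (NULL);
    }

    for (i = 0; i < height; i++)
    {
        /* Row i starts i * width cells into the block. */
        grid[i] = cells + (size_t)i * width;
        for (j = 0; j < width; j++)
            grid[i][j] = 0;
    }
    return (grid);
}

/* Frees a grid built by alloc_grid_block. */
void free_grid_block(int **grid)
{
    if (grid == NULL)
        return;
    free(grid[0]);
    free(grid);
}
```

With separate rows, each call to malloc typically carries its own bookkeeping overhead, which fits the observation that the row-by-row version runs out of memory earlier.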
Then I tried to do the same thing without initializing the grid to 0. I tried to build a grid with up to 49999 rows and columns.
The answer is in the malloc man page: by default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer.
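On Linux this optimistic behaviour is controlled by the vm.overcommit_memory setting, exposed as the file /proc/sys/vm/overcommit_memory. The sketch below reads that value; the function names `read_overcommit_mode` and `describe_overcommit_mode` are hypothetical, and on a system without that file the reader simply returns -1.

```c
#include <stdio.h>

/*
 * Reads the integer stored in `path`, normally
 * "/proc/sys/vm/overcommit_memory". Returns -1 if the file
 * cannot be opened or does not start with an integer.
 */
int read_overcommit_mode(const char *path)
{
    FILE *file;
    int mode;

    file = fopen(path, "r");
    if (file == NULL)
        return (-1);
    if (fscanf(file, "%d", &mode) != 1)
        mode = -1;
    fclose(file);
    return (mode);
}

/* Short description of each documented mode. */
const char *describe_overcommit_mode(int mode)
{
    switch (mode)
    {
    case 0:
        return ("heuristic overcommit (the default)");
    case 1:
        return ("always overcommit");
    case 2:
        return ("strict accounting, no overcommit");
    default:
        return ("unknown");
    }
}
```

In mode 2 the kernel refuses allocations beyond its commit limit, so malloc is far more likely to return NULL there instead of the process being killed later.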
In my case, with the first strategy, I first use malloc to create an array of pointers, and then I access that memory when I initialize all those pointers. So at one point, the process gets killed. It works the same way as when I initialized all the values to 0. However, in the second case, I never try to access the memory allocated by malloc, and I can finally trigger my if statement.
So, my Linux is like this crooked trader: “You want memory? I have memory, loads of it, hardly ever runs out!”, and then, after it kills my process, “You never said you wanted to use it”. Or you can see it as “There are so many bad coders that this is a safety measure against malloc enthusiasts”.
As I am learning to code in C, I find it very interesting to see how intertwined the OS and C are. I always looked down on the hardware as a limitation to my creativity. But now that I see where my code fails, and understand a little of why, I marvel at the complexity and overall efficiency of the work the OS is doing.
Note: I obtained those results running Ubuntu 14.04.5 LTS inside Vagrant 1.8.6. I used gcc (Ubuntu 5.4.1-2ubuntu1~14.04) 5.4.1 20160904 to compile. Those results are machine dependent, so if you try this yourself, you might get something different depending on your OS, its configuration, and the version of malloc you use. Something to learn more about here…