Finding Memory Leaks in C - 2.0

Published in The Startup, Sep 15, 2020

The place matters

The main idea of this article is to describe an approach for finding memory leaks in C code on macOS. It covers one of the possible options and presents a skeleton that can be extended if necessary.

Unlike the approach in the previous article, this one is not fully POSIX compliant, since it uses the backtrace and backtrace_symbols functions, which are not part of the IEEE 1003.1 standard. If your operating system does not provide these functions, you will need to find alternatives on your own.

The approach was tested only on macOS Catalina 10.15.6 with clang 11.0.3.

The article is more practical than theoretical. It touches on some deep and exciting system mechanisms, but only mentions them and does not dive into the system runtime. That could be an exciting journey, but it is beyond the scope of this article.

The previous article considered a basic approach to detecting memory leaks. All that approach does is tell the developer whether any memory leaks happened while the application ran, and how many. In most cases, that is not enough. We usually want not only to know that the application leaks, but also to fix the leaks. To fix them, we need to know where they happened; more precisely, we want to know the exact place where each leaked piece of memory was allocated.

Below we consider one possible way to achieve that.

Just run

The process is much the same as in the previous article:

1. Download the file with the leak-checking code here.
2. Place it anywhere in your project. Just make sure it is compiled and linked with the rest of the project, so add it to your Makefile, project file, etc.
3. In your main.c file, add a declaration of the check_leaks function at the beginning of the file.
4. Call check_leaks just before main() returns.

Usage

5. Run your application.

6. Check the output.

If you have any memory leaks, you should see detailed information about them: their addresses, the sizes of the leaked blocks, and the call stacks where they were allocated.

Dig a little bit deeper

If you only need to run the code to detect memory leaks, you can stop reading here. This section covers some technical details of the main concepts behind our solution. The approach is based on two key concepts:

Intercepting malloc/free function calls.
Obtaining call stack information to find out where a leak happened.

First, let's define the terms. What is a memory leak? In the narrow sense, it is a piece of memory that was allocated and never freed. So our task is to detect every memory allocation and every deallocation in the application. We can then compare the allocation and deallocation counts: if they differ, we have memory leaks.

The other task is to locate, as precisely as possible, the places where the leaked pieces were allocated.

Let's look at these tasks one by one.
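Counting alone only tells how many leaks there are. To report addresses and sizes, each live allocation has to be remembered until it is freed. Below is a minimal sketch of such a registry, assuming a fixed-size table and the hypothetical helpers track_alloc, track_free and report_leaks (they are not part of the author's code); a fuller version would also store the call stack of each allocation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed capacity; a real tool would grow or hash the table. */
#define MAX_TRACKED 1024

/* One live allocation: where it is and how big it is. */
struct allocation {
    void *ptr;
    size_t size;
};

/* Static storage, so the tracker itself never calls malloc. */
static struct allocation live[MAX_TRACKED];
static size_t live_count = 0;

/* Record a successful allocation; returns 0 if it cannot be tracked. */
int track_alloc(void *ptr, size_t size)
{
    if (ptr == NULL || live_count == MAX_TRACKED)
        return 0;
    live[live_count].ptr = ptr;
    live[live_count].size = size;
    live_count++;
    return 1;
}

/* Drop a record on free; returns 0 for pointers that are not tracked. */
int track_free(void *ptr)
{
    for (size_t i = 0; i < live_count; i++) {
        if (live[i].ptr == ptr) {
            live[i] = live[live_count - 1];  /* swap with the last, then shrink */
            live_count--;
            return 1;
        }
    }
    return 0;
}

/* Print every allocation that is still live and return how many remain. */
size_t report_leaks(FILE *out)
{
    for (size_t i = 0; i < live_count; i++)
        fprintf(out, "leak: %p (%zu bytes)\n", live[i].ptr, live[i].size);
    return live_count;
}
```

In a malloc wrapper, track_alloc would receive the pointer returned by the real allocator, the free wrapper would call track_free, and at exit report_leaks(stderr) lists whatever is left.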

System functions interception

To intercept malloc and free, we use the dlsym function. dlsym is part of the POSIX standard, so you can find its description in man dlsym or in the POSIX standard document (see the References section). Here we just show how it can be used for our goals.
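A minimal sketch of this interception, reconstructed from the walkthrough below rather than copied from the author's gist. The numbered comments mirror the markers used in the text; real_free, free_counter and the use of RTLD_NEXT are assumptions. It is single-threaded and intercepts only malloc and free.

```c
/* RTLD_NEXT needs _GNU_SOURCE on glibc; macOS provides it regardless. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

/* (1) Pointer to the real malloc from the C library. */
static void *(*real_malloc)(size_t) = NULL;
/* Pointer to the real free (assumed counterpart of real_malloc). */
static void (*real_free)(void *) = NULL;

/* (2) Number of successful allocations; free_counter is assumed. */
static size_t malloc_counter = 0;
static size_t free_counter = 0;

/* Resolve the next malloc/free after this image, i.e. the C library's. */
static void malloc_init(void)
{
    real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    if (real_malloc == NULL || real_free == NULL)
        abort();  /* nothing sensible to fall back to */
}

/* (4) Same signature as the system malloc, so our calls land here. */
void *malloc(size_t size)
{
    /* (5) Initialize lazily on the first call. */
    if (real_malloc == NULL)
        malloc_init();
    /* (7) Let the real allocator do the work... */
    void *ptr = real_malloc(size);
    /* (6) ...and count only allocations that actually succeeded. */
    if (ptr != NULL)
        malloc_counter++;
    return ptr;
}

/* The same trick for free. */
void free(void *ptr)
{
    if (real_free == NULL)
        malloc_init();
    /* free(NULL) does nothing, so it must not be counted. */
    if (ptr != NULL)
        free_counter++;
    real_free(ptr);
}
```

On Linux with glibc older than 2.34, dlsym lives in libdl, so link with -ldl. Memory obtained through calls that are not intercepted (calloc, realloc, or library functions such as strdup) but released with free still bumps free_counter, so the two counters are only directly comparable for the program's own malloc/free pairs.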

The code snippet above demonstrates the main idea of intercepting system functions. We will follow it in call order rather than line by line.

At (4) we define a function with exactly the same signature as the system malloc function. Whenever malloc is called anywhere in our application, this implementation runs. Next, at (5), we check whether the leak detection logic has already been initialized; if not, we call malloc_init. malloc_init does the second trick: it calls dlsym, which returns a pointer to the real malloc function, and stores it in the real_malloc static variable declared at (1).

Then, at (6), we increment the malloc_counter variable declared at (2), which lets us count memory leaks at the end of the program.

Finally, at (7), we call real_malloc, which performs the actual allocation, and return the pointer to the caller.

We intercept free in the same way, and other functions can be intercepted too.

Note that many other C library functions, not just malloc and free, can be intercepted this way. This gives developers a lot of room to experiment with and tune their applications.

Obtaining call stack

Knowing where a malloc call was made lets the developer find and fix a leak quickly. The call stack is one way to get that information. It is not 100% precise, since it gives no file name or line number for the call, but it still provides a lot of information and makes the leaking call much easier to find.

Here is the example from man backtrace. It shows the main idea of how to obtain call stack information.
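A version of that example, wrapped in a function (the name print_call_stack is ours) and printing to stderr. The comments (1) to (3) follow the walkthrough below, and, as in the man page example, the returned array is released with a single free.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the current call stack to stderr; returns the number of frames. */
int print_call_stack(void)
{
    /* (1) Storage for up to 128 return addresses. */
    void *callstack[128];
    /* (2) Fill it with the active frames; the count is returned. */
    int frames = backtrace(callstack, 128);
    /* (3) Turn the addresses into printable strings. */
    char **strs = backtrace_symbols(callstack, frames);
    if (strs == NULL)
        return frames;  /* symbolization failed; the addresses are still valid */
    for (int i = 0; i < frames; ++i)
        fprintf(stderr, "%s\n", strs[i]);
    /* Only the array is freed, not the individual strings (man backtrace). */
    free(strs);
    return frames;
}
```

On Linux, function names usually show up in the output only when the program is linked with -rdynamic; otherwise you mostly see raw addresses.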

At (1) we declare an array of pointers to void, in which we store the return addresses found on the call stack. Then we call backtrace (2), which fills the array with the addresses of the active stack frames and returns their number. At (3) we call backtrace_symbols, which "converts" these addresses into readable strings. As a result, we get an array of C strings describing the functions on the stack, including their names.

Note that backtrace_symbols returns an array allocated with malloc: per man backtrace, the caller is responsible for freeing the array itself, but not the individual strings. On macOS, with its default two-level namespace, the C library's internal call to malloc is bound to the system allocator rather than to our wrapper, so calling backtrace_symbols from inside our malloc does not lead to infinite recursion and a stack overflow. For the same reason, the array should be released with the real free function obtained through dlsym rather than with the intercepted free; otherwise the free counter would be skewed.

After obtaining the call stack, we can parse it and use it however we like. See the full implementation for details. You may want to do something different, so feel free to write your own implementation that suits your needs.
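As one example of doing it differently (a design choice of ours, not necessarily what the full implementation does), the hook can store only the raw return addresses at allocation time and symbolize them later, when a leak is reported, using backtrace_symbols_fd. The GNU C Library manual documents that backtrace_symbols_fd does not call malloc, since it writes directly to a file descriptor. The struct and function names and the depth of 16 frames are hypothetical.

```c
#include <execinfo.h>
#include <unistd.h>

/* Hypothetical depth limit for each recorded stack. */
#define STACK_DEPTH 16

struct stack_record {
    void *frames[STACK_DEPTH];  /* raw return addresses, no strings */
    int depth;                  /* number of valid entries in frames */
};

/* Cheap part: called when memory is allocated. */
void record_stack(struct stack_record *rec)
{
    rec->depth = backtrace(rec->frames, STACK_DEPTH);
}

/* Expensive part: called only when a leak is reported.
   Writes one line per frame to stderr without building a malloc'd array. */
void print_recorded_stack(const struct stack_record *rec)
{
    backtrace_symbols_fd(rec->frames, rec->depth, STDERR_FILENO);
}
```

On Linux, the backtrace(3) man page warns that the first call to backtrace may itself allocate while libgcc is loaded, so a malloc hook should make one warm-up call before it starts recording.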

What’s next

Of course, not all allocation functions have been covered in this article. There are also calloc, realloc and others, which you can intercept too if your project uses them. You could also write logs to a file instead of stderr, group leaks that share the same call stack, and so on.

In fact, this approach gives you full control over memory allocation. If you are brave enough, you can experiment with the code and implement almost any logic for tracking memory leaks.
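Grouping, mentioned above, could look like the following sketch: given leak records that each hold the raw frames captured at allocation time, it prints one line per distinct call stack with the number of leaks and the total bytes. The struct layout, the depth limit of 16 and the name group_leaks are hypothetical, and the quadratic scan is chosen for simplicity, not speed.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical depth limit for each recorded stack. */
#define STACK_DEPTH 16

struct leak {
    void *ptr;                  /* leaked address */
    size_t size;                /* leaked bytes */
    void *frames[STACK_DEPTH];  /* return addresses captured at malloc time */
    int depth;                  /* number of valid entries in frames */
};

/* Two leaks belong to the same group when their stacks match exactly. */
static int same_stack(const struct leak *a, const struct leak *b)
{
    return a->depth == b->depth &&
           memcmp(a->frames, b->frames,
                  (size_t)a->depth * sizeof a->frames[0]) == 0;
}

/* Print one line per distinct stack; returns the number of groups. */
size_t group_leaks(const struct leak *leaks, size_t n, FILE *out)
{
    size_t groups = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip stacks already reported for an earlier entry. */
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = same_stack(&leaks[i], &leaks[j]);
        if (seen)
            continue;
        /* Sum up every leak that shares this stack. */
        size_t count = 0, bytes = 0;
        for (size_t j = i; j < n; j++) {
            if (same_stack(&leaks[i], &leaks[j])) {
                count++;
                bytes += leaks[j].size;
            }
        }
        fprintf(out, "%zu leak(s), %zu bytes, first frame %p\n", count, bytes,
                leaks[i].depth > 0 ? leaks[i].frames[0] : NULL);
        groups++;
    }
    return groups;
}
```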

Happy coding.

References

Source code for this solution on GitHub Gist.
POSIX IEEE 1003.1-2017 document, available for free download.
man dlsym
man backtrace: documentation for both the backtrace and backtrace_symbols functions.