C++ Lambda Under the Hood

Understand the closure generated from a lambda expression

EventHelix
Software Design
9 min read · Jun 29, 2019


Lambda expressions in C++ let developers define anonymous functions that can be used to inject code. Lambda expressions are more than just functions: they can capture context from the code enclosing the lambda. Here we go under the hood and see how lambda functions operate by generating a closure.

A simple lambda expression

Consider the code presented below: a function visit that iterates over a vector and invokes the passed visitor on each element. Note that visitor does not need to be a function; it could be an object of any type that overloads operator().

The main function passes the lambda function:

[] (int a) { printf("%d\n", a); }

#include <vector>
#include <cstdio>
template <typename T, typename F>
void visit (std::vector<T>& vec, const F& visitor) {
    for (auto item : vec) {
        visitor(item);
    }
}
int main() {
    std::vector<int> vec = {0, 1, 2};
    visit(vec, [](int a){ printf("%d\n", a);});
    return 0;
}

If we use cppinsights.io to convert the visit program above into plain C++ code, we observe a few key points:

  1. The visitor function call is really an invocation of the operator() overload.
  2. A class __lambda_15_16 has been generated for the lambda function. For reasons that will become clear later, this class is called the closure.
  3. The __lambda_15_16 class overloads operator() and includes the body of the lambda function [] (int a) { printf("%d\n", a); }.
  4. Because this lambda captures nothing, the closure also provides a conversion operator to a plain function pointer (retType_15_16, which is void (*)(int)), so the lambda can be used wherever a pointer to a function that returns void and takes an int as its only parameter is expected.
  5. Finally, the closure object of the type __lambda_15_16 is passed to the visit function.

These points are marked with numbered comments in the cppinsights.io generated code.

#include <vector>
#include <cstdio>
template <typename T, typename F>
void visit (std::vector<T>& vec, const F& visitor) {
    for (auto item : vec) {
        visitor(item);
    }
}
/* First instantiated from: insights.cpp:15 */
#ifdef INSIGHTS_USE_TEMPLATE
template<>
void visit<int, __lambda_15_16>(std::vector<int,
    std::allocator<int> > & vec, const __lambda_15_16 & visitor)
{
    {
        std::vector<int, std::allocator<int> > & __range1 = vec;
        __gnu_cxx::__normal_iterator<int *, std::vector<int, std::allocator<int> > > __begin0 = __range1.begin();
        __gnu_cxx::__normal_iterator<int *, std::vector<int, std::allocator<int> > > __end0 = __range1.end();
        for(; __gnu_cxx::operator!=(__begin0, __end0); __begin0.operator++())
        {
            int item = __begin0.operator*();
            // 1. Invoke the lambda via the overloaded () operator
            visitor.operator()(item);
        }
    }
}
#endif
int main()
{
    std::vector<int> vec = std::vector<int, std::allocator<int> > {std::initializer_list<int>{0, 1, 2}, std::allocator<int>()};
    // 2. Each lambda function is mapped to a light weight class
    // that overloads the () operator.
    class __lambda_15_16
    {
    public:
        // 3. Lambda implemented as an overload.
        inline void operator()(int a) const
        {
            printf("%d\n", a);
        }


        // 4. Operator overload for invoking the lambda
        // via function pointer.

        using retType_15_16 = void (*)(int);
        inline operator retType_15_16 () const
        {
            return __invoke;
        };

    private:
        static inline void __invoke(int a)
        {
            printf("%d\n", a);
        }
    };

    // 5. The closure object is passed to the visit function.
    visit(vec, __lambda_15_16{});
    return 0;
}

Looking at the cppinsights.io output, one might fear that the compiler will generate pretty complex code. The Compiler Explorer generated code sets those fears to rest.

The generated code is really efficient. The compiler performs the following optimizations:

  • Inlines the visit function.
  • Unrolls the for loop in visit.
  • Inlines the lambda function [] (int a) { printf("%d\n", a); }.

main: # @main
        push rax
        mov edi, offset .L.str.1
        xor esi, esi
        xor eax, eax
        call printf
        mov edi, offset .L.str.1
        mov esi, 1
        xor eax, eax
        call printf
        mov edi, offset .L.str.1
        mov esi, 2
        xor eax, eax
        call printf
        xor eax, eax
        pop rcx
        ret
.L.str.1:
        .asciz "%d\n"

The generated code shows the power of lambda functions over function pointers. Since the compiler could see the code for the lambda function, it was able to deeply optimize the code.

Lambda expression with value capture

Lambda expressions can capture outer-scope variables. The captured variables are stored in the closure object created for the lambda function.

The following code demonstrates how a lambda function can be executed in a new thread using std::async. The main function and the std::async lambda run in parallel.

Further, we see how variables in the outer scope can be captured by value into the std::async lambda function. Here the variable x is captured when the lambda is created, just before std::async launches the new thread. If the main thread then changes the value of x, the captured x in the lambda function is not affected.

In the following code, the lambda is the second argument to std::async. The [=] capture default specifies that any outer-scope variable the lambda body uses is captured by value. In this example, only x is used, so only x is captured. If you wish to be explicit about the captures, list them within the square brackets. For example, [x] would capture x by value.

#include <cstdio>
#include <chrono>
#include <thread>
#include <future>
// Simulate a long computation
int long_computation(int x, int duration_seconds,
                     const char identifier) {
    for (auto i = 0; i < duration_seconds*10; ++i)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds{100});
        printf("%c",identifier);
    }
    return 42 + x;
}
int main()
{
    auto x = 5, y = 6;
    printf("main thread : x = %d, y = %d\n", x, y);
    printf("main thread : launching lambda thread\n");
    // Execute the lambda function in a separate thread.
    // The function returns a future.
    auto result_future = std::async(std::launch::async, [=]() {
        printf("lambda thread : x = %d\n", x);
        auto answer = long_computation(x,6,'l');
        printf("\nlambda thread : long computation completed\n");
        printf("lambda thread : x = %d\n", x);
        return answer;}
    );
    // At this point main thread and lambda thread
    // are executing in parallel.
    x = 0;
    printf("main thread : x = %d, y = %d\n", x, y);
    auto main_thread_result = long_computation(x, 3, 'm');
    printf("\nmain thread : long computation completed\n");
    // Wait for the result from the lambda thread
    auto lambda_thread_result = result_future.get();
    printf("main thread : result = %d\n", main_thread_result);
    printf("lambda thread : result = %d\n", lambda_thread_result);
}

The output of the above code clearly illustrates the value capture of x.

  1. The main thread starts with x = 5.
  2. The std::async lambda function picks up x = 5 by value.
  3. The main thread sets x = 0. This value is not updated in the std::async lambda thread.

main thread : x = 5, y = 6
main thread : launching lambda thread
main thread : x = 0, y = 6
lambda thread : x = 5
lmlmmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmllmlm
main thread : long computation completed
llllllllllllllllllllllllllllll
lambda thread : long computation completed
lambda thread : x = 5
main thread : result = 42
lambda thread : result = 47

Now let’s get under the hood and look at the function object code generated for the lambda capture.

  1. The __lambda_22_54 function object captures the value of x via a constructor.
  2. The captured x is stored in the __lambda_22_54 object when the lambda expression is instantiated.
  3. The lambda expression is mapped as an operator () overload.
  4. Also, note that the lambda expression can no longer be passed as a regular C-style function pointer, as a plain function pointer has no place to store the captured x. The closure class does not contain a function-pointer conversion operator.

#include <cstdio>
#include <chrono>
#include <thread>
#include <future>
// Simulate a long computation
int long_computation(int x, int duration_seconds, const char identifier)
{
    for(int i = 0; i < duration_seconds * 10; ++i)
    {
        std::this_thread::sleep_for(std::chrono::duration<long, std::ratio<1, 1000> >{100});
        printf("%c", static_cast<int>(identifier));
    }

    return 42 + x;
}
int main()
{
    int x = 5;
    int y = 6;
    printf("main thread : x = %d, y = %d\n", x, y);
    printf("main thread : launching lambda thread\n");

    class __lambda_22_54
    {
        int x; // NOTE: Capture is saved as a value
    public:
        inline int operator()() const
        {
            printf("lambda thread : x = %d\n", x);
            int answer = long_computation(x, 6, 'l');
            printf("\nlambda thread : long computation completed\n");
            printf("lambda thread : x = %d\n", x);
            return answer;
        }

    public: __lambda_22_54(int _x)
        : x{_x}
        {}

    };


    std::future<int> result_future = std::async(std::launch::async, __lambda_22_54{x});
    x = 0;
    printf("main thread : x = %d, y = %d\n", x, y);
    int main_thread_result = long_computation(x, 3, 'm');
    printf("\nmain thread : long computation completed\n");
    int lambda_thread_result = result_future.get();
    printf("main thread : result = %d\n", main_thread_result);
    printf("lambda thread : result = %d\n", lambda_thread_result);
}

Lambda expression with reference capture

Lambda expressions may also capture by reference. When a reference capture is used, the closure stores a reference to the variable in the outer scope of the lambda rather than a copy.

In the following code, the lambda is again the second argument to std::async. The [&] capture default specifies that any outer-scope variable the lambda body uses is captured by reference. In this example, a reference to x has been captured. If you wish to be explicit about the captures, list them within the square brackets. For example, [&x] would capture x by reference.

#include <cstdio>
#include <chrono>
#include <thread>
#include <future>
// Simulate a long computation
int long_computation(int x, int duration_seconds,
                     const char identifier)
{
    // Print a character to mark the progress of the computation.
    for (auto i = 0; i < duration_seconds * 10; ++i)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds{100});
        printf("%c", identifier);
    }
    return 42 + x;
}
int main()
{
    auto x = 5, y = 6;
    printf("main thread : x = %d, y = %d\n", x, y);
    printf("main thread : launching lambda thread\n");
    // Execute the lambda function in a separate thread.
    // The function returns a future.
    auto result_future = std::async(std::launch::async, [&]() {
        printf("lambda thread : x = %d\n", x);
        auto answer = long_computation(x, 6, 'l');
        printf("\nlambda thread : long computation completed\n");
        printf("lambda thread : x = %d\n", x);
        return answer; }
    );
    // At this point main thread and lambda thread
    // are executing in parallel.
    x = 0;
    printf("main thread : x = %d, y = %d\n", x, y);
    auto main_thread_result = long_computation(x, 3, 'm');
    printf("\nmain thread : long computation completed\n");
    // Wait for the result from the lambda thread
    auto lambda_thread_result = result_future.get();
    printf("main thread : result = %d\n", main_thread_result);
    printf("lambda thread : result = %d\n", lambda_thread_result);
}

Now contrast the output when x is captured by reference with the previous example, where x was captured by value.

  1. The main thread starts with x = 5.
  2. The main thread sets x = 0. This value is reflected in the std::async lambda thread.
  3. The std::async lambda function picked up x = 0 through the reference. The initial value seen by the lambda thread happens to be 0 because it is quite likely that the main thread will set x = 0 before the lambda thread has a chance to run. There is, however, no guarantee. Due to the vagaries of scheduling, the lambda thread may find x to be 5 and then see it change to 0 once the main thread performs the assignment. Strictly speaking, this unsynchronized read and write of x from two threads is a data race, so production code should synchronize access, for example by making x a std::atomic<int>.

main thread : x = 5, y = 6
main thread : launching lambda thread
main thread : x = 0, y = 6
lambda thread : x = 0
mlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlmlm
main thread : long computation completed
lllllllllllllllllllllllllllllll
lambda thread : long computation completed
lambda thread : x = 0
main thread : result = 42
lambda thread : result = 42

The generated function object in the reference capture differs from the value capture case at just two points:

  1. A reference to x is passed in the constructor of the function object.
  2. The reference to x is saved in the function object.

#include <cstdio>
#include <chrono>
#include <thread>
#include <future>
// Simulate a long computation
int long_computation(int x, int duration_seconds, const char identifier)
{
    for(int i = 0; i < duration_seconds * 10; ++i)
    {
        std::this_thread::sleep_for(std::chrono::duration<long, std::ratio<1, 1000> >{100});
        printf("%c", static_cast<int>(identifier));
    }

    return 42 + x;
}
int main()
{
    int x = 5;
    int y = 6;
    printf("main thread : x = %d, y = %d\n", x, y);
    printf("main thread : launching lambda thread\n");

    class __lambda_26_55
    {
        int & x; // NOTE: Capture is saved as a reference
    public:
        inline int operator()() const
        {
            printf("lambda thread : x = %d\n", x);
            int answer = long_computation(x, 6, 'l');
            printf("\nlambda thread : long computation completed\n");
            printf("lambda thread : x = %d\n", x);
            return answer;
        }

    public: __lambda_26_55(int & _x)
        : x{_x}
        {}

    };


    std::future<int> result_future = std::async(std::launch::async, __lambda_26_55{x});
    x = 0;
    printf("main thread : x = %d, y = %d\n", x, y);
    int main_thread_result = long_computation(x, 3, 'm');
    printf("\nmain thread : long computation completed\n");
    int lambda_thread_result = result_future.get();
    printf("main thread : result = %d\n", main_thread_result);
    printf("lambda thread : result = %d\n", lambda_thread_result);
}

Explore more

Sample lambda function code

The examples presented above can be downloaded from the C++ tutorial repository on GitHub. The repository also includes the additional example of a this capture.

Lambda functions are syntactic sugar for function objects

The following video introduces lambda functions as a convenient syntax for using function objects.
