Solving memory bump while migrating from D1 to D2 using array stomping

Written by Shivendra Tiwari and Konstantinos Stamatiadis

Published in dunnhumby Science blog, Nov 4, 2019

Many changes to the D programming language affect the migration of source code from D1 to D2 [1]. We recently migrated our D applications from D1 to D2 and noticed a significant bump in memory usage. In this post, we walk through the steps we took to bring memory usage back down using array stomping. The technique is simple to use, but it requires awareness of the scenarios in which it is appropriate.

Introduction

D, also known as Dlang, is a programming language derived from C++. It is a multi-paradigm systems programming language created by Walter Bright at Digital Mars and released in 2001. The first version of D is referred to as D1. It redesigns some core C++ features while also sharing characteristics of other languages, notably Java, Python, Ruby, C#, and Eiffel [2]. We assume you have some basic knowledge of D; however, D code is quite similar to that of other well-known high-level languages, so it should be fairly easy to follow.

In the dunnhumby Media group, we have been using the D language to implement high-performance bidding applications for over a decade. The bidding systems need high availability and low response times (on the order of milliseconds). In 2010, a new version of the D language was introduced, referred to as D2. We have been migrating our D1 applications to D2 to benefit from language improvements around thread safety, variables, arrays, strings, templates, and more. The details of the D2 features are outside the scope of this discussion; they can be found in [1, 4, 5]. Staying on D1 raised the following challenges in our existing applications:

Cannot use the new language features until applications are migrated to D2.
Cannot use the new external libraries which work only in D2.
Maintenance overhead — the legacy libraries must be maintained in both D1 and D2 versions to support the applications in D1 and D2 separately.

Problem Definition

Issues with high memory usage

After we migrated one of our bidding applications, it started to consume more runtime memory, as shown in the graphs below. Fig 1 shows the memory usage of the application in D1, which is ~1 GB; in Fig 2, the memory usage of the same application in D2 reaches ~1.5 GB.

Fig 1. Memory usage per instance with D1: The blue line is “GC total” at ~1 GB level and the orange line is “GC used” at ~0.95 GB level

Fig 2. Memory usage per instance with D2: The blue line is “GC total” at ~1.5 GB level and the orange line is “GC used” at ~1.25 GB level

Reason — an aggressive memory allocation during array resizing in D2

Decreasing the length of a dynamic array and then increasing it again, or appending to it, can reuse the same chunk of memory and overwrite whatever was stored there; this is called array stomping. Array stomping is not allowed in D2 by default: when an operation could stomp, the runtime allocates a new block instead, and it may aggressively reserve extra memory for future appends. We resize arrays quite often in the bidding applications to adjust the data size and reuse the existing memory pool.

To illustrate the problem, we use simple examples. In the code snippet below, at line #5, D2 does not allocate additional memory for the new array b; instead, b just points to the memory location of the array a (see line #8), so the assert succeeds. As soon as D2 executes line #11 to append a new element to the array, it allocates a new array with additional memory. The problem is that this reallocation is sized in anticipation of similar append operations in the future. If you do not intend to perform more appends, the additional memory is wasted.
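A minimal sketch of the snippet described above, assuming a four-element int array with illustrative values (these are not taken from the original listing). It is arranged so that the slice, the first assert, and the append fall on lines 5, 8, and 11:

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    // Slicing allocates nothing: b is a view of a's first three elements.
    int[] b = a[0 .. 3];

    // Both slices start at the same address, so this assert passes.
    assert(a.ptr == b.ptr);

    // b ends before the block's used data does, so appending reallocates.
    b ~= 5;

    assert(a.ptr != b.ptr);          // b now lives in a new block
    assert(a == [1, 2, 3, 4]);       // a is untouched
    assert(b.capacity >= b.length);  // the new block may include spare room
}
```

Printing b.capacity after the append shows how much room the runtime reserved; the exact figure depends on the runtime version, so it is not asserted here.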

Now, imagine a huge number of places performing array resizing. It is even more costly if the allocation happens for every request. Our bidding applications receive over 400k requests per second. Therefore, this precautionary memory allocation resulted in a huge memory bump.
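To make the per-request cost concrete, here is a small sketch under stated assumptions: countReallocations is a hypothetical helper, and the clear-and-refill loop is only a stand-in for real request handling. It counts how often a reused buffer ends up in a new memory block under D2's default behavior:

```d
// Hypothetical stand-in for per-request work: clear the buffer, then refill it.
size_t countReallocations(size_t requests)
{
    int[] buffer = [1, 2, 3, 4];
    size_t moved = 0;

    foreach (request; 0 .. requests)
    {
        const before = buffer.ptr;

        buffer.length = 0;      // "clear" the buffer for the next request
        foreach (i; 0 .. 4)
            buffer ~= i;        // refill it

        if (buffer.ptr != before)
            ++moved;            // the refill landed in a new block
    }
    return moved;
}

void main()
{
    // Every simulated request ends up allocating a new block.
    assert(countReallocations(1_000) == 1_000);
}
```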

Another example, given below, shows similar behavior when the length of an array is decreased and then increased again. In line #11, the length of the array b is set to 4, which is the same as the size of the array a. Even after resizing, b could in principle use the same memory location as array a. However, setting the new length triggers a reallocation instead, because D2 will not overwrite the element beyond the shortened slice, which array a can still see. As a result, the arrays a and b point to different memory locations in line #14.
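A minimal sketch of this second case, again assuming a four-element int array with illustrative values, arranged so that the length increase is on line 11 and the final pointer check on line 14:

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a;

    // Shrinking only narrows the slice; nothing is allocated.
    b.length = 3;
    assert(a.ptr == b.ptr);

    // Growing back to 4 would expose a[3], so the runtime reallocates b.
    b.length = 4;

    // a and b now refer to different memory.
    assert(a.ptr != b.ptr);
    assert(a[3] == 4);   // a keeps its original last element
    assert(b[3] == 0);   // b's new element is int.init
}
```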

How to Solve the Memory Bump in D2?

Use assumeSafeAppend() method when possible

Calling assumeSafeAppend() on an array tells the runtime to assume that it is safe to append to this array's contiguous memory in place. Appends made to the array after the call write the new values into the same memory block instead of a fresh allocation. Note that this function should be used only when it is certain that the contiguous memory is safe to reuse. It also helps only when the existing block is large enough for the data being appended. If there are values in the target memory locations, those elements will be overwritten.
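The following sketch shows the effect, using illustrative values. The first half mirrors the slice-and-append case; the second half assumes a hypothetical clear-and-refill loop as a stand-in for per-request buffer reuse:

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] b = a[0 .. 3];

    // Promise the runtime that nothing beyond b's end is still needed.
    b.assumeSafeAppend();
    b ~= 5;

    assert(a.ptr == b.ptr);      // the append happened in place
    assert(a == [1, 2, 3, 5]);   // ...and overwrote a[3]

    // The same call lets a buffer be cleared and refilled without moving.
    int[] buffer = [10, 20, 30, 40];
    const block = buffer.ptr;
    foreach (request; 0 .. 1_000)
    {
        buffer.length = 0;
        buffer.assumeSafeAppend();
        foreach (i; 0 .. 4)
            buffer ~= i;
        assert(buffer.ptr == block);   // still the original block
    }
}
```

The overwritten a[3] is the stomp itself: it is harmless here only because nothing else relies on the old value.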

Why did the problem not occur in D1?

Array stomping is enabled by default in D1, so reallocation was not a problem in the D1 applications. D2, in contrast, prevents array stomping by default.

Further related tip: Use the Garbage Collector’s (GC) Runtime Configuration

Since version 2.067, the garbage collector can be configured through the command line, the environment, or options embedded into the executable. It is important to run the application with the correct GC parameters. We needed to use the same GC parameters in D2 that we had been using in D1.
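As a sketch of the mechanism only, options can be embedded in the executable through the rt_options array that the D runtime reads at startup. The option values below are illustrative assumptions and are not the setting used by the authors:

```d
// Illustrative values only; they are NOT the authors' setting.
// minPoolSize and incPoolSize are documented gcopt keys, given in MB.
extern(C) __gshared string[] rt_options = [ "gcopt=minPoolSize:16 incPoolSize:8" ];

void main()
{
    // The options are read when the runtime starts, before main runs.
    auto buffer = new ubyte[](1024);
    assert(buffer.length == 1024);
}
```

The same option string can instead be passed on the command line, for example ./app "--DRT-gcopt=minPoolSize:16 incPoolSize:8", where app is a placeholder for the executable name.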

After applying the above runtime GC parameter and calling assumeSafeAppend() before resizing the arrays, memory usage dropped drastically, to a level similar to that of the applications running in D1 (shown in Fig 3).

Fig 3. Memory usage per instance with D2 (after array stomping and GC configuration): The blue line is “GC total” at ~1 GB level and the orange line is “GC used” at ~0.95 GB level

Conclusion

Array stomping is nice to use when you know what you are doing. Enabling array stomping has two major benefits:

Performance: since memory is reallocated less often, the performance of the systems remains favorable.
Controlled memory usage: existing memory is reused, so resizing an array does not keep allocating new blocks.

On the other hand, array stomping has its own drawbacks:

The chances of making mistakes are high. A mistake can result in memory corruption, because stomping can overwrite memory that other slices or objects still refer to.
When using array stomping, the level of memory management required may defeat the benefit of the garbage collector.
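The first drawback can be sketched as follows. The names buffer and kept are hypothetical; kept stands for any other part of a program that still holds a slice of the same memory:

```d
void main()
{
    char[] buffer = "first".dup;
    char[] kept = buffer;         // another part of the program keeps the data

    buffer.length = 0;
    buffer.assumeSafeAppend();    // wrong here: kept still refers to this memory
    buffer ~= "other";

    assert(kept == "other");      // kept's contents were silently replaced
}
```

Nothing in the compiler or the runtime flags this; it is up to the caller to be sure that no live reference remains before calling assumeSafeAppend().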

Overall, we found array stomping a great help in bringing down the memory usage of our applications.

Credit

This improvement is a result of teamwork. We want to tender special thanks to our colleague Stefan Koch for his help on the topic.

References

1. Migrating D1 Code to D2 — D Programming Language

2. D (programming language) — Wikipedia

3. Real-time bidding — Wikipedia

4. D1->D2 Part 3: magic module | Sociomantic Labs

5. sociomantic-tsunami/d1to2fix: Tool to automatically port code from D1 to D2 (based on dfix)