Understanding storage of strings in C++ - Part 2: Stack

Heart Bleed, published in Obscure System, Feb 15, 2021

In the previous article (Understanding storage of strings in C++ - Part 1…. Stack or heap?), we tried to understand whether the string class stores the string on the stack or on the heap. In short, we concluded that small strings are stored on the stack and large strings are stored on the heap. How large? That is defined by the compiler, its standard library, and the underlying operating system.

In this article, we will try to understand how strings are stored on the stack. We will use GDB for this. GDB is the GNU debugger, which lets us debug programs and inspect the stack and heap of a program’s memory. Before we head on to the GDB analysis, let’s go over a few member functions of the string class that will help us understand the memory layout better.

size(): returns the actual length of the string (number of chars).
capacity(): returns the allocated storage of the string, in chars. This can be larger than or equal to the size, but never less.
c_str(): returns a C-style string (a pointer to the first character of a null-terminated character array).
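To make these three functions concrete, here is a minimal sketch that collects them into one snapshot. The names StringInfo and inspect are hypothetical helpers made up for this illustration; only size(), capacity() and c_str() come from the string class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper type: a snapshot of the three properties listed above.
struct StringInfo {
    std::size_t size;      // number of chars, excluding the null terminator
    std::size_t capacity;  // number of chars the current buffer can hold
    const char* c_str;     // pointer to the null-terminated character data
};

// Hypothetical helper: gathers size(), capacity() and c_str() of a string.
StringInfo inspect(const std::string& s) {
    StringInfo info{s.size(), s.capacity(), s.c_str()};
    // capacity() is never smaller than size().
    assert(info.capacity >= info.size);
    // c_str() is null-terminated at index size(), so strlen cannot exceed size().
    assert(std::strlen(info.c_str) <= info.size);
    return info;
}
```

Calling inspect on a string holding "testString1" should report a size of 11. The reported capacity depends on the standard library in use.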

Strings in Stack

Let’s begin our GDB analysis to understand how strings are stored on the stack. This article assumes you have a basic understanding of stack/heap memory and basic GDB commands. Let’s use a slightly modified program from the previous article.

In this program, we store a small string in a variable (testString1) and print it out. The string is shorter than 16 characters, so it is stored on the stack. Let’s GDB it.
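As a quick cross-check before the debugger session, the same claim can be tested from inside a program. The sketch below assumes a standard library that uses the small string optimization (as recent libstdc++ does). The helper names is_stored_inline and check_examples are hypothetical. The idea is that a small string keeps its characters inside the string object itself, so for a local variable they end up on the stack.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true when the character buffer lies inside the
// string object itself, false when it lives elsewhere (typically the heap).
bool is_stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}

// Example check, assuming a small-string-optimized implementation.
void check_examples() {
    const std::string small_string = "testString1";  // 11 chars
    const std::string large_string(100, 'x');        // 100 chars
    assert(is_stored_inline(small_string));
    assert(!is_stored_inline(large_string));
}
```

The check compares addresses as integers rather than raw pointers, because ordering unrelated pointers with < is not well defined in C++.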

$ g++ StringStack.cpp -g -o StringStack;./StringStack 
testString1

The -g flag is required to add debug info to the executable. We start GDB on the StringStack executable and break at line 13 of the program. Let’s see the contents of the variable. Also, let’s try all the string functions we saw above on it.

$ gdb ./StringStack 
...
Reading symbols from ./StringStack...
(gdb) break StringStack.cpp:13
Breakpoint 1 at 0x13d2: file StringStack.cpp, line 13.
(gdb) run
Starting program: /home/ragha/MyIntr/test/StringStack
testString1
Breakpoint 1, main () at StringStack.cpp:13
15 cout << testString1 << endl;
(gdb) print testString1
$1 = "testString1"
(gdb) print testString1.size()
$2 = 11
(gdb) print testString1.capacity()
$3 = 15
(gdb) print testString1.c_str()
$4 = 0x7fffffffdbc0 "testString1"

You can see that the capacity of the variable is 15. As discussed in the previous article, strings of up to 15 characters (16 bytes minus 1 for the null char) are stored on the stack; anything longer moves to the heap. So the default capacity is 15. This is again implementation-specific. It will not be the same across all operating systems and compilers.
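Since the threshold is implementation-specific, it can be useful to measure it rather than assume 15. The sketch below relies on an assumption that holds for the common implementations I am aware of, but is not guaranteed by the standard: an empty string reports the capacity of its built-in buffer. The names inline_capacity and check_threshold are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: capacity of a default-constructed string.
// Assumption: on small-string-optimized implementations this equals the
// size of the built-in buffer (15 in the session above).
std::size_t inline_capacity() {
    return std::string().capacity();
}

// Example check: a string that exactly fills the built-in buffer keeps the
// same capacity, while one char more forces a larger allocation.
void check_threshold() {
    const std::size_t cap = inline_capacity();
    const std::string fits(cap, 'a');
    const std::string spills(cap + 1, 'a');
    assert(fits.capacity() == cap);
    assert(spills.capacity() > cap);
}
```

Printing inline_capacity() on the system used in this article should match the 15 reported by GDB; other standard libraries may report a different number.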

Dump the memory of the string address

Let’s dump the memory where the string is stored. This is a 64-bit system, so addresses are 8 bytes wide. Let’s dump 2 giant words (64 bits each). GDB lets you dump data in various unit sizes (byte, halfword: 2 bytes, word: 4 bytes, giant word: 8 bytes). GDB also lets you dump data in various formats (string, char, int, hex, octal, etc.). Let’s dump the memory in both integer and hex.

(gdb) print &testString1
$5 = (std::string *) 0x7fffffffdbb0
(gdb) x/2gd 0x7fffffffdbb0
0x7fffffffdbb0: 140737488346048 11
(gdb) x/2gx 0x7fffffffdbb0
0x7fffffffdbb0: 0x00007fffffffdbc0 0x000000000000000b

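The memory dump command follows the format below.

x/[Count][Format][Unit] Address

Count is the number of units we want to dump.
Format specifies how we want the data represented (for example d for decimal, x for hex, s for string).
Unit is the size of each unit (b, h, w or g for byte, halfword, word or giant word).
The format and unit letters can appear in either order, so x/2gd means 2 giant words shown in decimal.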

Let’s represent the above dump in a more understandable format.

+----------------+-----------------+----------------+
| Address        | Value Int       | Value Hex      |
+----------------+-----------------+----------------+
| 0x7fffffffdbb0 | 140737488346048 | 0x7fffffffdbc0 |
| 0x7fffffffdbb8 | 11              | 0xb            |
+----------------+-----------------+----------------+

The value stored at address 0x7fffffffdbb8 is pretty evident: it is the length/size of the string. The value at address 0x7fffffffdbb0 looks like an address itself. In fact, 0x7fffffffdbc0 is the very next 8-byte slot after 0x7fffffffdbb8. Let’s dump that as a string.

(gdb) x/s 0x7fffffffdbc0
0x7fffffffdbc0: "testString1"

Let’s try to map all the data we have got so far to a table.

+----------------+----------------+
| Address        | Value          |
+----------------+----------------+
| 0x7fffffffdbb0 | 0x7fffffffdbc0 |
| 0x7fffffffdbb8 | 11             |
| 0x7fffffffdbc0 | "testString1"  |
+----------------+----------------+

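We can see that the string object holds the address of the first character (0x7fffffffdbc0), followed by the length (11). The characters themselves start at 0x7fffffffdbc0, which is only 16 bytes past the start of the object at 0x7fffffffdbb0, so they sit on the stack right after the pointer and the length.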
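The same two words can be read from inside the program instead of from GDB. This is only a sketch, and it leans on the layout observed above (a pointer followed by the length on a 64-bit system with libstdc++), which other compilers and standard libraries need not share. The names HeaderWords, read_header and check_header_layout are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical view of the first 16 bytes of the string object.
struct HeaderWords {
    std::uint64_t first;   // assumption: address of the first character
    std::uint64_t second;  // assumption: length of the string
};

// Hypothetical helper: copies the first two 8-byte words of the object,
// which is what x/2gx showed in the session above.
HeaderWords read_header(const std::string& s) {
    static_assert(sizeof(std::string) >= sizeof(HeaderWords),
                  "string object is smaller than two 8-byte words");
    HeaderWords words{};
    std::memcpy(&words, &s, sizeof(words));
    return words;
}

// Example check: holds only for the layout assumed above.
void check_header_layout() {
    const std::string s = "testString1";
    const HeaderWords w = read_header(s);
    assert(w.first == reinterpret_cast<std::uintptr_t>(s.data()));
    assert(w.second == s.size());
}
```

If the assertions fail on your system, that does not mean the program is wrong; it means the standard library there lays out its string object differently.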

Hack the string

Let’s try to make it more interesting by hacking the string a little. We will try to increase the size of the string manually in GDB.

(gdb) set {int}0x7fffffffdbb8 = 15

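Here we set the size to the maximum allocated capacity of the string (15). Setting it beyond 15 would make the string claim bytes outside its buffer and may mess up the stack. In a throwaway debugging session there is no lasting harm in trying; the program may just crash.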

(gdb) print testString1.size()
$6 = 15
(gdb) print testString1
$7 = "testString1\000UU"
(gdb) print testString1.c_str()
$8 = 0x7fffffffdbc0 "testString1"

Printing the size now returns 15 rather than 11. Note the difference between printing the variable as a C++ string and as a C string. The C++ string trusts the stored length of 15 and prints that many chars: the original string, its null terminator, and then whatever garbage follows in the buffer. The C string stops at the first null char and prints the expected text, disregarding the stored length.
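The same difference can be reproduced without touching memory in a debugger, by building a string that legitimately contains a null character. This sketch mimics the shape of the hacked output with 14 chars; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 14-char string: "testString1", a null char, then "UU".
std::string make_string_with_embedded_null() {
    const char raw[] = "testString1\0UU";
    // sizeof(raw) counts the implicit trailing terminator, so drop it.
    return std::string(raw, sizeof(raw) - 1);
}

// Length as the C++ string sees it: the stored length.
std::size_t length_seen_by_cpp(const std::string& s) {
    return s.size();
}

// Length as C code sees it: chars up to the first null.
std::size_t length_seen_by_c(const std::string& s) {
    return std::strlen(s.c_str());
}

// Example check of the two views.
void check_lengths() {
    const std::string s = make_string_with_embedded_null();
    assert(length_seen_by_cpp(s) == 14);
    assert(length_seen_by_c(s) == 11);
}
```

Streaming such a string to cout writes all 14 chars, while passing c_str() to a C function such as puts stops after "testString1".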

Summary

The GDB analysis shows that, in this implementation, the string object starts with an address followed by the length. The capacity is fixed while the string is stored on the stack. The C++ string trusts the stored length and prints that many chars. Let’s see how much of this holds when the string is stored on the heap in our next article - Understanding storage of strings in C++ - Part 3: Heap.
