Symbolism in NM

Heart Bleed
Published in Obscure System
Jul 1, 2019

nm is a GNU development tool available on Linux. It lists the symbols in an object file and shows what kind of symbol each one is. nm can report many types of symbols; in this article I plan to explain a few of them.

Before we understand the types of symbols, we need to understand the segments of a program.

A program, once built from object files and loaded, is laid out in several segments, as shown in the diagram. Every variable declared in the program ends up in one of these segments.

Text: Contains the program code. It is a read-only segment.

Data: Contains initialized global variables. It can be further divided into two segments: read-only and read-write.

BSS: Contains uninitialized global variables, that is, global variables with no explicit initial value. They are zero-filled when the program starts.

Stack: All the sections mentioned above are of fixed size. The stack and the heap can grow dynamically. The stack stores information such as local variables, command arguments and function calls.

Heap: Contains variables allocated dynamically by the program. Memory allocated using malloc, new, etc., lives in the heap.
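To make the list concrete, here is a minimal sketch in C with one object per segment. The names (`initialized_global`, `segment_tour` and so on) are made up for illustration and are not part of the program used later. The placement noted in the comments is what a typical GCC/Linux toolchain does, not something the C language guarantees.

```c
#include <stdlib.h>

int initialized_global = 10;      /* Data (read-write): explicit initial value */
const int read_only_global = 20;  /* Data (read-only) on typical toolchains */
int uninitialized_global;         /* BSS: no initializer, starts out as zero */

/* The machine code of this function lives in the Text segment. */
int segment_tour(void)
{
    int on_stack = 3;                        /* Stack: created on every call */
    int *on_heap = malloc(sizeof *on_heap);  /* Heap: allocated at run time */
    int total;

    if (on_heap == NULL)
        return -1;
    *on_heap = 4;

    total = initialized_global + read_only_global
          + uninitialized_global + on_stack + *on_heap;

    free(on_heap);
    return total;
}
```

Only the first three objects and the function itself have a fixed address at build time, which is why they are the kind of thing a symbol table can describe.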

Now let's get back to nm. nm basically tells us which segment each symbol belongs to. nm can be more specific than the diagram above, but every symbol ultimately ends up within one of the segments depicted there.

Let's write a simple program to understand the symbols.
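A minimal sketch of what nm_symbols.c could look like, pieced together from the declarations discussed further down. The function bodies, the initial value of staticInitializedVariable and the way main uses the variables are assumptions. The file is meant to be compiled with `gcc -c` only, because sub is deliberately left undefined.

```c
/* nm_symbols.c: hypothetical reconstruction, not the original listing */

int globalInitializedVariable = 10;
int globalUninitializedVariable;
static int staticInitializedVariable = 10;   /* value assumed */
static int staticUninitializedVariable;
const int readOnlyVariable = 20;
const static int readOnlyStatic = 5;

int sub(int a, int b);                       /* declared here, defined nowhere */

int add(int a, int b)
{
    return a + b;
}

int main()
{
    static int local_static = 20;
    int local = 20;

    /* Assumed usage: touch every variable so the compiler keeps them all. */
    local = add(local, local_static);
    local = add(local, globalInitializedVariable + globalUninitializedVariable);
    local = add(local, staticInitializedVariable + staticUninitializedVariable);
    local = add(local, readOnlyVariable + readOnlyStatic);

    return sub(local, 1);
}
```

Addresses and the numeric suffix on local_static depend on the compiler version, so output from this sketch may differ in those details from the listings shown in the article.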

If we try to compile the program into an executable, it will throw an error.

$ gcc nm_symbols.c -o nm_symbols.o
/tmp/cc2Dlrvz.o: In function `main':
nm_symbols.c:(.text+0x43): undefined reference to `sub'
collect2: error: ld returned 1 exit status

This is because of the undefined function sub. When an executable is built, every symbol must be defined. Let's compile the program into an object file instead. Object files cannot be executed and can live with undefined functions.

$ gcc -c nm_symbols.c -o nm_symbols.o

Let's use nm on the above object file and try to understand the various symbols.

$ nm nm_symbols.o
0000000000000000 T add
0000000000000000 D globalInitializedVariable
U _GLOBAL_OFFSET_TABLE_
0000000000000004 C globalUninitializedVariable
0000000000000008 d local_static.1807
0000000000000014 T main
0000000000000004 r readOnlyStatic
0000000000000000 R readOnlyVariable
0000000000000004 d staticInitializedVariable
0000000000000000 b staticUninitializedVariable
U sub

Let us take them one by one.

int add(int a, int b)

T add: add is a function. T means text. As expected, functions appear in the text section, because all code lives in the text section.

int globalInitializedVariable = 10;

D globalInitializedVariable: D is the Data segment, which holds initialized data. As we have initialized the global variable to a value, it makes sense that it falls within this segment.

static int staticInitializedVariable = 10;  /* initializer value assumed; any non-zero value gives d */

d staticInitializedVariable: d/D is the data segment, which holds initialized variables. The difference between d and D is that d marks a local symbol and D a global one. All variables declared static are local in scope and global in lifetime, irrespective of where they are declared. I will explain what I mean by this in a separate article. For now, just understand that static variables are local in scope. This is why all the static variables get a lower-case letter.
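A small sketch of what "local in scope, global in lifetime" looks like in practice. `next_ticket` and `ticket` are hypothetical names and are not part of nm_symbols.c.

```c
/* Hands out increasing ticket numbers: 101, 102, 103, ... */
int next_ticket(void)
{
    /* Global lifetime: initialized once, keeps its value between calls. */
    static int ticket = 100;

    ticket = ticket + 1;

    /* Local scope: the name `ticket` cannot be used outside this function. */
    return ticket;
}
```

Because the variable has to survive between calls, it cannot live on the stack; it needs a fixed home in the data segment, and that is what gives nm a symbol to show.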

static int staticUninitializedVariable;

b staticUninitializedVariable: This variable is uninitialized and therefore belongs to the b/B section. As above, it is lower case because it is a static variable. All uninitialized variables belong to the BSS segment.

int globalUninitializedVariable;

C globalUninitializedVariable: C is the common section for uninitialized data. The variable we have defined is global uninitialized data, so shouldn't it be in B (BSS), the uninitialized section? Let me take this up separately at the end of the article.

static int local_static = 20;

d local_static.1807: This is the local_static variable we defined inside the main function. The number at the end is name mangling added by the compiler to differentiate local static variables that have the same name.
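The need for that suffix shows up as soon as two functions in the same file use the same name for a local static. The sketch below uses hypothetical names (`count_apples`, `count_pears`, `counter`); with GCC, nm on the resulting object file would be expected to list two separate `d counter.<number>` entries, with the exact numbers depending on the compiler version.

```c
/* Two functions, each with its own static variable called `counter`. */
int count_apples(void)
{
    static int counter = 10;   /* first symbol named counter */
    return ++counter;
}

int count_pears(void)
{
    static int counter = 20;   /* a different object with the same name */
    return ++counter;
}
```

Both variables sit in the same data segment, so without a distinguishing suffix the two symbols would be indistinguishable in the symbol table.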

int main()

T main: There is nothing much to explain here as the main function is code and it should belong to the text section.

const static int readOnlyStatic = 5;

r readOnlyStatic: r/R is a read-only variable. As said before, r is for local and R is for global symbols, and all static variables are local in scope.

Since we added const at the beginning, it cannot be modified at run time, so it becomes read-only data.

const int readOnlyVariable = 20;

R readOnlyVariable: Similarly, this is a global read-only variable, so nm marks it with R. The const at the beginning makes it read only.

int sub(int a, int b);

U sub: U marks an undefined symbol. This is usually seen for undefined functions, and here sub is used but not defined. Had it been defined the way add is, it would have appeared as T.

So that covers all the variables. No, WAIT! Where is

int local = 20;

Local variables are created on the stack at run time by the program's code. So they are not in the Data or the BSS segment, and nm has no symbol to show for them.
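One way to see that a local variable is created per call rather than placed at a fixed address is to compare the addresses of two locals that are alive at the same time. This is only an illustrative sketch; `locals_are_per_call` and `callee_has_own_local` are hypothetical helpers, not part of nm_symbols.c.

```c
/* Returns 1 when this call's `local` is a different object
 * from the caller's `local`. */
static int callee_has_own_local(const int *callers_local)
{
    int local = 20;
    return &local != callers_local;
}

int locals_are_per_call(void)
{
    int local = 20;   /* same name and value as in the callee */
    return callee_has_own_local(&local);
}
```

Both variables are called `local`, yet each call gets its own copy, so there is no single address that a symbol table entry could record.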

As promised earlier, let's get back to C globalUninitializedVariable.

int globalUninitializedVariable;

Why is it C rather than B, since it is an uninitialized global variable? Let me explain it with some programs. What we built was an object file. It is still not an executable. One or more object files are combined to form an executable.

Now let's add another program that defines sub so that we can create an executable.
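A sketch of what nm_sub.c could contain. Only the signature of sub is known from the declaration in nm_symbols.c; the body is an assumption, and whether the original file declared anything else is not known.

```c
/* nm_sub.c: hypothetical reconstruction, not the original listing */

/* Supplies the definition that nm_symbols.o was missing. */
int sub(int a, int b)
{
    return a - b;   /* assumed body */
}
```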

Now let’s compile both nm_symbols.c and nm_sub.c.

$ gcc nm_symbols.c nm_sub.c -o nm_test.o
$ nm nm_test.o
00000000000005fa T add
...................................
0000000000200e00 d _DYNAMIC
000000000020101c D _edata
0000000000201028 B _end
00000000000006d4 T _fini
00000000000005f0 t frame_dummy
0000000000200df0 t __frame_dummy_init_array_entry
000000000000087c r __FRAME_END__
0000000000201010 D globalInitializedVariable
0000000000200fc0 d _GLOBAL_OFFSET_TABLE_
0000000000201024 B globalUninitializedVariable
w __gmon_start__
....................................
0000000000201018 d local_static.1807
000000000000060e T main
00000000000006e8 r readOnlyStatic
00000000000006e4 R readOnlyVariable
0000000000000560 t register_tm_clones
00000000000004f0 T _start
0000000000201014 d staticInitializedVariable
0000000000201020 b staticUninitializedVariable
0000000000000648 T sub
0000000000201020 D __TMC_END__

I have truncated the nm output so that it shows only the relevant lines. Where did all these new symbols come from? They are added by the toolchain (the C runtime start-up code and the linker) so that the executable can work.

The first thing to notice is T sub. sub has changed from U to T. This is because the function is now defined, so its code is part of the text segment.

Next, let's look at the important one, B globalUninitializedVariable. It has changed from C to B, which is what we expected earlier. Before I write the conclusion, let's do another small, similar experiment.
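A sketch of what nm_sub1.c could look like for this experiment: the same sub as before, plus an initialized definition of globalUninitializedVariable. The value 10 and the body of sub are assumptions. Note that GCC 10 and later default to -fno-common, under which an uninitialized global in one file and an initialized global of the same name in another are reported as a multiple definition at link time, so on such a compiler the experiment would presumably need -fcommon.

```c
/* nm_sub1.c: hypothetical reconstruction, not the original listing */

/* Same name as the uninitialized global in nm_symbols.c,
 * but this time with an initializer (value assumed). */
int globalUninitializedVariable = 10;

int sub(int a, int b)
{
    return a - b;   /* assumed body */
}
```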

nm_sub1.c is the same as nm_sub.c, except that it also initializes globalUninitializedVariable. Let's compile it and run nm on the result.

$ gcc nm_symbols.c nm_sub1.c -o nm_test.o
$ nm nm_test.o
00000000000005fa T add
..............................
0000000000200e00 d _DYNAMIC
0000000000201020 D _edata
0000000000201028 B _end
00000000000006d4 T _fini
00000000000005f0 t frame_dummy
0000000000200df0 t __frame_dummy_init_array_entry
000000000000087c r __FRAME_END__
0000000000201010 D globalInitializedVariable
0000000000200fc0 d _GLOBAL_OFFSET_TABLE_
000000000020101c D globalUninitializedVariable
w __gmon_start__
..................................
0000000000201018 d local_static.1807
000000000000060e T main
00000000000006e8 r readOnlyStatic
00000000000006e4 R readOnlyVariable
0000000000000560 t register_tm_clones
00000000000004f0 T _start
0000000000201014 d staticInitializedVariable
0000000000201024 b staticUninitializedVariable
0000000000000648 T sub
0000000000201020 D __TMC_END__

Look at the output above. The globalUninitializedVariable symbol is now marked D, meaning it is initialized. So I guess that gives a clear picture of what C means. As long as the object file has not been linked into an executable, the uninitialized global variable remains C. When it is linked into an executable, it changes to B if no file initializes it, or to D if another file initializes it.
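The C language has a single-file analogue of this behaviour, called a tentative definition, which may help in picturing what the linker does with C symbols. In the sketch below, `shared_counter` and `read_shared_counter` are hypothetical names. The first two declarations carry no initializer and simply fold into the third, much as a C symbol folds into a D definition coming from another object file.

```c
int shared_counter;        /* tentative definition: no storage decided yet */
int shared_counter;        /* repeating it is allowed, it is still tentative */
int shared_counter = 7;    /* the one real definition; the others fold into it */

int read_shared_counter(void)
{
    return shared_counter;
}
```

Had the third line been left out, the variable would simply have been zero-initialized, which corresponds to the B outcome of the first experiment.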

Conclusion: I hope this gives a good understanding of how variables are allocated to the various segments of a program.
