Demystifying Linking in Software Development

28 min read · Jul 3, 2023

Linking is an essential aspect of software development that plays a crucial role in ensuring the smooth execution of programs. From simple scripts to complex applications, understanding how linking works is vital for developers aiming to optimize performance and manage dependencies effectively. In this comprehensive guide, we delve into the world of linking, unraveling its intricacies, and shedding light on various types of linking, their purpose, and best practices. Whether you’re a seasoned developer or a newcomer to the field, this article will equip you with the knowledge needed to navigate the complexities of linking in software development.

Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. Linking can be performed at compile time when the source code is translated into machine code, at load time, when the program is loaded into memory and executed by the loader, and even at run time, by application programs. Linking is performed by programs called linkers. Linkers enable separate compilation. Instead of organizing a large application as one monolithic source file, we can decompose it into smaller, more manageable modules that can be modified and compiled separately. When we change one of these modules, we simply recompile it and relink the application, without having to recompile the other files.

To keep concepts concrete and understandable, we will couch our discussion in the context of an x86-64 system running Linux and using the standard ELF-64 (hereafter referred to as ELF) object file format. However, it is important to realize that the basic concepts of linking are universal, regardless of the operating system or the object file format. Details may vary, but the concepts are the same.

1. Compiler Drivers (Program Building Process)

A compiler driver is a software component that coordinates the execution of a compiler and other necessary tools to transform source code into an executable program, handling tasks such as parsing command-line arguments, invoking the appropriate compiler stages, and managing dependencies.

Consider a small C program that will serve as a running example and allow us to make some important points about how linkers work.

The example program consists of two source files, main.c and sum.c. The main function initializes an array of ints, and then calls the sum function to sum the array elements.

Most compilation systems provide a compiler driver that invokes the language preprocessor, compiler, assembler, and linker, as needed on behalf of the user. For example, to build the example program using the GNU compilation system, we might invoke the gcc driver by typing the following command to the shell:

linux> gcc -Og -o prog main.c sum.c

The following steps summarize the activities of the driver as it translates the example program from ASCII source files into an executable object file.

The driver first runs the C preprocessor (cpp), which translates the C source file main.c into an ASCII intermediate file main.i:

cpp [other arguments] main.c /tmp/main.i

Next, the driver runs the C compiler (cc1), which translates main.i into an ASCII assembly-language file main.s:

cc1 /tmp/main.i -Og [other arguments] -o /tmp/main.s

Then, the driver runs the assembler (as), which translates main.s into a binary relocatable object file main.o:

as [other arguments] -o /tmp/main.o /tmp/main.s

The driver goes through the same process to generate sum.o. Finally, it runs the linker program ld, which combines main.o and sum.o, along with the necessary system object files, to create the binary executable object file prog:

ld -o prog [system object files and args] /tmp/main.o /tmp/sum.o

To run the executable prog, we type its name on the Linux shell’s command line:

linux> ./prog

The shell invokes a function in the operating system called the loader, which copies the code and data in the executable file prog into memory, and then transfers control to the beginning of the program.

Let’s begin by exploring static linking, a straightforward form of linking that provides a clear understanding of how dependencies are resolved and bundled into the executable.

2. Static Linking

Static linkers such as the Linux ld program (which is used in compiler drivers) take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run. The input relocatable object files consist of various code and data sections, where each section is a contiguous sequence of bytes. Instructions are in one section, initialized global variables are in another section, and uninitialized variables are in yet another section.

To build the executable, the linker must perform two main tasks:

Step 1. Symbol resolution. Object files define and reference symbols, where each symbol corresponds to a function, a global variable, or a static variable (i.e., any C variable declared with the static attribute). The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.
Step 2. Relocation. Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location. The linker blindly performs these relocations using detailed instructions, generated by the assembler, called relocation entries.

As you read, keep in mind some basic facts about linkers: Object files are merely collections of blocks of bytes. Some of these blocks contain program code, others contain program data, and others contain data structures that guide the linker and loader. A linker concatenates blocks together, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks. Linkers have minimal understanding of the target machine. The compilers and assemblers that generate the object files have already done most of the work.

3. Object Files

Let’s describe forms of object files that help us to understand how linkers can work with them. Object files come in three forms:
Relocatable object file. Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file. Contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file. A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.

Compilers and assemblers generate relocatable object files (including shared object files). Linkers generate executable object files. Object files are organized according to specific object file formats, which vary from system to system. The first Unix systems from Bell Labs used the a.out format. (To this day, executables are still referred to as a.out files.) Windows uses the Portable Executable (PE) format. macOS uses the Mach-O format. Modern x86-64 Linux and Unix systems use the Executable and Linkable Format (ELF). Although our discussion will focus on ELF, the basic concepts are similar, regardless of the particular format.

4. Relocatable Object Files

As we continue our exploration of linking, it is essential to look closely at relocatable object files. They play a central role in the compilation and linking process: each one holds the compiled code and data of a single module in a form that the linker can combine with other modules. In this section, we describe their purpose and structure, starting with the layout of a typical ELF relocatable object file: an ELF header at the start, the sections in the middle, and, typically, a section header table at the end.

The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file. The rest of the ELF header contains information that allows a linker to parse and interpret the object file. This includes the size of the ELF header, the object file type (e.g., relocatable, executable, or shared), the machine type (e.g., x86-64), the file offset of the section header table, and the size and number of entries in the section header table. The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
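To make the header fields concrete, here is a minimal sketch that inspects the first bytes of a file image using the Elf64_Ehdr structure from the Linux <elf.h> header. The function name elf64_file_kind is hypothetical, and the sketch assumes the file's byte order matches the host's.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns a short label for the object file type recorded in an ELF-64
 * header, or NULL if buf does not start with a valid 64-bit ELF header.
 * Assumption: the file and the host use the same byte order. */
const char *elf64_file_kind(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return NULL;
    memcpy(&hdr, buf, sizeof hdr);      /* copy to avoid unaligned access */

    /* The first 16 bytes (e_ident) hold the magic number, the word size,
     * and the byte ordering. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;

    switch (hdr.e_type) {
    case ET_REL:  return "relocatable";
    case ET_EXEC: return "executable";
    case ET_DYN:  return "shared";      /* also used for position-independent executables */
    default:      return "other";
    }
}
```

A caller would read at least sizeof(Elf64_Ehdr) bytes from the start of a file such as main.o and pass them in. The remaining header fields mentioned above (e_machine, e_shoff, e_shentsize, e_shnum) can be read from the same structure.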
Sandwiched between the ELF header and the section header table are the sections themselves. A typical ELF relocatable object file contains the following sections:

.text The machine code of the compiled program.
.rodata Read-only data such as the format strings in printf statements, and jump tables for switch statements.
.data Initialized global and static C variables. Local C variables are maintained at run time on the stack and do not appear in either the .data or .bss sections.
.bss Uninitialized global and static C variables, along with any global or static variables that are initialized to zero. This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file. At run time, these variables are allocated in memory with an initial value of zero.
.symtab A symbol table with information about functions and global variables that are defined and referenced in the program. Some programmers mistakenly believe that a program must be compiled with the -g option to get symbol table information. In fact, every relocatable object file has a symbol table in .symtab (unless the programmer has specifically removed it with the strip command). However, unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables.
.rel.text A list of locations in the .text section that will need to be modified when the linker combines this object file with others. In general, any instruction that calls an external function or references a global variable will need to be modified. On the other hand, instructions that call local functions do not need to be modified. Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it.
.rel.data Relocation information for any global variables that are referenced or defined by the module. In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.
.debug A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file. It is only present if the compiler driver is invoked with the -g option.
.line A mapping between line numbers in the original C source program and machine code instructions in the .text section. It is only present if the compiler driver is invoked with the -g option.
.strtab A string table for the symbol tables in the .symtab and .debug sections and for the section names in the section headers. A string table is a sequence of null-terminated character strings.
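As a rough illustration of the section descriptions above, the following sketch annotates where each object in a small module would typically end up. All names are hypothetical, and the exact placement can vary with the compiler and its flags.

```c
int counter = 42;              /* initialized global            -> .data   */
int total;                     /* uninitialized global          -> .bss (or COMMON with some compiler flags) */
static int calls = 0;          /* static, initialized to zero   -> .bss    */
const char greeting[] = "hi";  /* read-only data                -> .rodata */

int bump(int step)             /* machine code of the function  -> .text   */
{
    int local = step * 2;      /* local variable: stack or register, no section */

    calls++;
    total += local;
    return counter + total;
}
```

Compiling such a file with gcc -c and listing the section headers of the result with readelf -S or objdump -h is one way to check the placement on a given system.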

5. Symbols and Symbol Tables

Symbols serve as essential markers that represent functions, variables, and other program elements, enabling efficient communication and coordination between different modules. In this section, we delve into the realm of symbols and symbol tables, unraveling their role in the linking process, their structure, and how they facilitate the seamless resolution of references, ultimately ensuring the smooth execution of complex software systems. Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m. In the context of a linker, there are three different kinds of symbols:
Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables.
Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to nonstatic C functions and global variables that are defined in other modules.
Local symbols that are defined and referenced exclusively by module m. These correspond to static C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules.

It is important to realize that local linker symbols are not the same as local program variables. The symbol table in .symtab does not contain any symbols that correspond to local nonstatic program variables. These are managed at run time on the stack and are not of interest to the linker.
Interestingly, local procedure variables that are defined with the C static attribute are not managed on the stack. Instead, the compiler allocates space in .data or .bss for each definition and creates a local linker symbol in the symbol table with a unique name. For example, suppose a pair of functions f and g in the same module each define a static local variable named x.
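A minimal sketch of such a pair of functions follows. The function bodies and initial values are assumptions, since the original listing is not reproduced here.

```c
int f(void)
{
    static int x = 0;   /* not on the stack; zero-initialized, so typically in .bss */
    return x;
}

int g(void)
{
    static int x = 1;   /* a different object from f's x; typically in .data */
    return x;
}
```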

In this case, the compiler exports a pair of local linker symbols with different names to the assembler. For example, it might use x.1 for the definition in function f and x.2 for the definition in function g.
Symbol tables are built by assemblers, using symbols exported by the compiler into the assembly-language .s file. An ELF symbol table is contained in the .symtab section. It contains an array of entries, each with the fields described next.

The name is a byte offset into the string table that points to the null-terminated string name of the symbol. The value is the symbol’s address. For relocatable modules, the value is an offset from the beginning of the section where the object is defined. For executable object files, the value is an absolute run-time address. The size is the size (in bytes) of the object. The type is usually either data or function. The symbol table can also contain entries for the individual sections and for the path name of the original source file. So there are distinct types for these objects as well. The binding field indicates whether the symbol is local or global.

Each symbol is assigned to some section of the object file, denoted by the section field, which is an index into the section header table. There are three special pseudo sections that don’t have entries in the section header table: ABS is for symbols that should not be relocated. UNDEF is for undefined symbols, that is, symbols that are referenced in this object module but defined elsewhere. COMMON is for uninitialized data objects that are not yet allocated. For COMMON symbols, the value field gives the alignment requirement, and size gives the minimum size. Note that these pseudo sections exist only in relocatable object files; they do not exist in executable object files.
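The fields described above map onto the Elf64_Sym structure in the Linux <elf.h> header: st_name (name), st_info (type and binding packed into one byte), st_shndx (section), st_value (value), and st_size (size). The sketch below decodes the binding, type, and section of one entry. The three helper names are hypothetical.

```c
#include <elf.h>

/* Binding of a symbol table entry. */
const char *sym_binding(const Elf64_Sym *sym)
{
    switch (ELF64_ST_BIND(sym->st_info)) {
    case STB_LOCAL:  return "local";
    case STB_GLOBAL: return "global";
    default:         return "other";
    }
}

/* Type of a symbol table entry: usually data or function, with
 * distinct types for section and source-file entries. */
const char *sym_type(const Elf64_Sym *sym)
{
    switch (ELF64_ST_TYPE(sym->st_info)) {
    case STT_OBJECT:  return "data";
    case STT_FUNC:    return "function";
    case STT_SECTION: return "section";
    case STT_FILE:    return "file";
    default:          return "other";
    }
}

/* Pseudo section of a symbol, or "regular" when st_shndx is an
 * ordinary index into the section header table. */
const char *sym_section(const Elf64_Sym *sym)
{
    switch (sym->st_shndx) {
    case SHN_UNDEF:  return "UNDEF";
    case SHN_ABS:    return "ABS";
    case SHN_COMMON: return "COMMON";
    default:         return "regular";
    }
}
```

For the running example, one would expect the entry for sum in main.o to decode as a global symbol in UNDEF, since main.c references sum but does not define it.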

NOTE: C programmers use the static attribute to hide variable and function declarations inside modules, much as you would use public and private declarations in Java and C++. In C, source files play the role of modules. Any global variable or function declared with the static attribute is private to that module. Similarly, any global variable or function declared without the static attribute is public and can be accessed by any other module. It is good programming practice to protect your variables and functions with the static attribute wherever possible.

6. Symbol Resolution

Let’s delve into the mechanics of symbol resolution performed by linkers to gain a clear understanding of how these crucial components establish connections between symbols, enabling seamless integration and execution of code across multiple modules.

The linker resolves symbol references by associating each reference with exactly one symbol definition from the symbol tables of its input relocatable object files. Symbol resolution is straightforward for references to local symbols that are defined in the same module as the reference. The compiler allows only one definition of each local symbol per module. The compiler also ensures that static local variables, which get local linker symbols, have unique names.

Resolving references to global symbols, however, is trickier. When the compiler encounters a symbol (either a variable or function name) that is not defined in the current module, it assumes that it is defined in some other module, generates a linker symbol table entry, and leaves it for the linker to handle. If the linker is unable to find a definition for the referenced symbol in any of its input modules, it prints an (often cryptic) error message and terminates. For example, suppose the source file linkerror.c declares a function foo and calls it from main, but never defines it. If we try to compile and link this file on a Linux machine,

then the compiler runs without a hitch, but the linker terminates when it cannot resolve the reference to foo:

linux> gcc -Wall -Og -o linkerror linkerror.c
/tmp/ccSz5uti.o: In function ‘main’:
/tmp/ccSz5uti.o(.text+0x7): undefined reference to ‘foo’

Symbol resolution for global symbols is also tricky because multiple object modules might define global symbols with the same name. In this case, the linker must either flag an error or somehow choose one of the definitions and discard the rest. The approach adopted by Linux systems involves cooperation between the compiler, assembler, and linker and can introduce some baffling bugs to the unwary programmer.

6.1 Linking with Static Libraries

So far, we have assumed that the linker reads a collection of relocatable object files and links them together into an output executable file. In practice, all compilation systems provide a mechanism for packaging related object modules into a single file called a static library, which can then be supplied as input to the linker. When it builds the output executable, the linker copies only the object modules in the library that are referenced by the application program.

The benefits of using static libraries include:

They keep executables small, because the linker copies only the library modules that a program actually references rather than the whole library.
They keep command lines short, because the programmer names a single library file instead of every object file that defines a needed function.
They make library code easier to maintain, because a function is fixed in one place and the library is rebuilt once. Applications pick up the fix when they are relinked.

The drawbacks of using static libraries include:

Library code is copied into every executable that uses it, so a fix reaches an existing program only after that program is relinked against the updated library.
Commonly used functions end up duplicated across many executables and running processes, which wastes disk space and memory.
The order of libraries and object files on the linker command line matters, and a wrong order can produce unresolved references.

Static libraries were developed as a practical way to make collections of related functions available to applications. Related functions can be compiled into separate object modules and then packaged in a single static library file. Application programs can then use any of the functions defined in the library by specifying a single filename on the command line. For example, a program that uses functions from the C standard library and the math library could be compiled and linked with a command of the form:

linux> gcc main.c /usr/lib/libm.a /usr/lib/libc.a

At link time, the linker will only copy the object modules that are referenced by the program, which reduces the size of the executable on disk and in memory. On the other hand, the application programmer only needs to include the names of a few library files. (In fact, C compiler drivers always pass libc.a to the linker, so the reference to libc.a mentioned previously is unnecessary.)

On Linux systems, static libraries are stored on disk in a particular file format known as an archive. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the .a suffix.

To make our discussion of libraries concrete, consider a pair of vector routines, addvec and multvec, which are the member object files of a library we will call libvector.
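The listings for the two routines are not reproduced here, so the following is a minimal sketch consistent with the description. The counter names addcnt and multcnt and the parameter layout are assumptions. Each routine would live in its own source file, addvec.c and multvec.c.

```c
/* addvec.c */
int addcnt = 0;          /* number of times addvec has been called */

void addvec(int *x, int *y, int *z, int n)
{
    int i;

    addcnt++;
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* multvec.c */
int multcnt = 0;         /* number of times multvec has been called */

void multvec(int *x, int *y, int *z, int n)
{
    int i;

    multcnt++;
    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}
```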

Each routine, defined in its own object module, performs a vector operation on a pair of input vectors and stores the result in an output vector. As a side effect, each routine records the number of times it has been called by incrementing a global variable. To create a static library of these functions, we would use the ar tool as follows:

linux> gcc -c addvec.c multvec.c
linux> ar rcs libvector.a addvec.o multvec.o

To use the library, we might write an application such as main2.c, which invokes the addvec library routine. The include (or header) file vector.h defines the function prototypes for the routines in libvector.a.

To build the executable, we would compile and link the input files main2.o and libvector.a:

linux> gcc -c main2.c
linux> gcc -static -o prog2c main2.o ./libvector.a

or equivalently,

linux> gcc -c main2.c
linux> gcc -static -o prog2c main2.o -L. -lvector

The -static argument tells the compiler driver that the linker should build a fully linked executable object file that can be loaded into memory and run without any further linking at load time. The -lvector argument is a shorthand for libvector.a, and the -L. argument tells the linker to look for libvector.a in the current directory. When the linker runs, it determines that the addvec symbol defined by addvec.o is referenced by main2.o, so it copies addvec.o into the executable. Since the program doesn’t reference any symbols defined by multvec.o, the linker does not copy this module into the executable. The linker also copies the printf.o module from libc.a, along with a number of other modules from the C run-time system.

6.2 How Linkers Use Static Libraries to Resolve References

The Linux linker utilizes static libraries to resolve external references, a process that can sometimes confuse programmers. During the symbol resolution phase, the linker scans the relocatable object files and archives in the sequential order they appear on the command line, maintaining sets for relocatable object files to be merged (E), unresolved symbols (U), and previously defined symbols (D). By matching symbols in unresolved set (U) with those defined in archive members, the linker gradually resolves references until a fixed point is reached. However, the ordering of libraries and object files on the command line is crucial, as incorrect ordering can result in unresolved references and link-time errors. To ensure correct symbol resolution, libraries should typically be placed at the end of the command line, with the order reflecting dependencies between library members. Alternatively, combining interdependent libraries into a single archive can resolve dependency issues effectively.
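The scan described above can be modeled in a few dozen lines. The sketch below is a toy simulation under simplifying assumptions, not how ld is implemented: symbols are plain strings, each module lists at most four definitions and four references, and the set E is implicit (a module joins E when take_module is called on it). All type and function names are hypothetical.

```c
#include <string.h>

#define MAX_SYMS 16

/* One relocatable object module: NULL-terminated lists of the symbols it
 * defines and the symbols it references. */
struct module {
    const char *defs[4];
    const char *refs[4];
};

/* One command-line input: a lone object file (one member) or an archive. */
struct input {
    int is_archive;
    int nmembers;
    const struct module *members;
};

struct symset {
    const char *names[MAX_SYMS];
    int count;
};

static int set_find(const struct symset *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

static void set_add(struct symset *s, const char *name)
{
    if (set_find(s, name) < 0 && s->count < MAX_SYMS)
        s->names[s->count++] = name;
}

static void set_remove(struct symset *s, const char *name)
{
    int i = set_find(s, name);
    if (i >= 0)
        s->names[i] = s->names[--s->count];
}

/* Add module m to the link: its definitions move from U to D, and its
 * references join U unless they are already defined. */
static void take_module(const struct module *m, struct symset *U, struct symset *D)
{
    for (int i = 0; i < 4 && m->defs[i]; i++) {
        set_remove(U, m->defs[i]);
        set_add(D, m->defs[i]);
    }
    for (int i = 0; i < 4 && m->refs[i]; i++)
        if (set_find(D, m->refs[i]) < 0)
            set_add(U, m->refs[i]);
}

static int defines_unresolved(const struct module *m, const struct symset *U)
{
    for (int i = 0; i < 4 && m->defs[i]; i++)
        if (set_find(U, m->defs[i]) >= 0)
            return 1;
    return 0;
}

/* Scans the inputs left to right and returns the number of symbols that
 * are still unresolved at the end (0 means the link would succeed). */
int resolve(const struct input *inputs, int ninputs)
{
    struct symset U = {0}, D = {0};

    for (int k = 0; k < ninputs; k++) {
        const struct input *in = &inputs[k];

        if (!in->is_archive) {          /* object files are always taken */
            take_module(&in->members[0], &U, &D);
            continue;
        }
        int changed = 1;
        while (changed) {               /* rescan the archive to a fixed point */
            changed = 0;
            for (int j = 0; j < in->nmembers; j++)
                if (defines_unresolved(&in->members[j], &U)) {
                    take_module(&in->members[j], &U, &D);
                    changed = 1;
                }
        }
    }
    return U.count;
}
```

Modeling main2.o followed by libvector.a leaves nothing unresolved, while placing the archive first leaves addvec unresolved: when the archive is scanned, U is still empty, so no member is taken. This is the ordering pitfall described above.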

7. Relocation

Relocation is a critical process performed by the linker to adjust the addresses of symbols and references within object files, enabling the correct execution of a program. Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (i.e., a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, where it merges the input modules and assigns run-time addresses to each symbol. Relocation consists of two steps:

Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type. For example, the .data sections from the input modules are all merged into one section that will become the .data section for the output executable object file. The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, each instruction and global variable in the program has a unique run-time memory address.
Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries, which we describe next.

The linker’s relocation process involves two main kinds of references: PC-relative references and absolute references. A PC-relative reference encodes the distance from the referencing instruction to the target, while an absolute reference encodes the target’s run-time address directly. When the linker processes a reference, the type recorded in its relocation entry tells it which computation to apply. For an absolute reference, it stores the symbol’s final address (plus any constant addend). For a PC-relative reference, it stores that address minus the run-time address of the reference itself.

The linker accomplishes relocation by patching each reference so that it resolves to the final address of its symbol. The new value is computed from the symbol’s final run-time address and, for PC-relative references, the run-time address of the reference. To perform relocation, the linker uses relocation entries (also called relocation records), which the assembler collects into relocation tables such as .rel.text and .rel.data. Each entry identifies the location to be patched, the symbol it refers to, and the type of relocation needed. These tables allow the linker to process and modify the addresses in the object files without having to understand the surrounding instructions.
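The arithmetic can be sketched as follows. The two helpers mirror the 32-bit PC-relative and 32-bit absolute relocation types defined for x86-64 (R_X86_64_PC32 and R_X86_64_32); the function names are hypothetical, and the addresses used in the example afterwards are made-up illustrative values.

```c
#include <stdint.h>

/* Value stored at a 32-bit PC-relative reference: the distance from the
 * reference's own run-time address to the symbol, adjusted by the addend. */
int32_t reloc_pc32(uint64_t sym_addr, int64_t addend, uint64_t ref_addr)
{
    return (int32_t)(sym_addr + (uint64_t)addend - ref_addr);
}

/* Value stored at a 32-bit absolute reference: the symbol's run-time
 * address, adjusted by the addend. */
uint32_t reloc_abs32(uint64_t sym_addr, int64_t addend)
{
    return (uint32_t)(sym_addr + (uint64_t)addend);
}
```

Suppose, purely for illustration, that the linker places the .text section of main.o at 0x4004d0 and sum at 0x4004e8, and that the call to sum has a relocation entry with offset 0xf and addend -4. The reference then lives at 0x4004d0 + 0xf = 0x4004df, and reloc_pc32(0x4004e8, -4, 0x4004df) yields 0x5. At run time the processor adds this value to the address of the next instruction (0x4004e3) and arrives at 0x4004e8, the address of sum.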

During the relocation process, the linker updates the addresses embedded in instructions and data so that they reflect their positions in the final executable. Link-time relocation should not be confused with position-independent code. Relocation fixes addresses once, when the executable is built. Position-independent code is generated by the compiler so that it can be loaded and executed at different memory locations without modification, which matters when a library is shared by multiple programs that load it at different addresses.

In summary, the linker’s relocation process adjusts the addresses of symbols and references within object files, using relocation entries to determine the necessary modifications. Once relocation is complete, every reference in the program points to the run-time address that the linker assigned to its symbol.

8. Executable Object Files

As software developers, we often encounter the term “executable object files” in the context of building and running our programs. These files represent the culmination of our coding efforts, containing machine code, data, and symbols that enable the execution of our applications. We have seen how the linker merges multiple object files into a single executable object file. Our example C program, which began life as a collection of ASCII text files, has been transformed into a single binary file that contains all of the information needed to load the program into memory and run it. The rest of this section summarizes the kinds of information in a typical ELF executable file.

The format of an executable object file is similar to that of a relocatable object file. The ELF header describes the overall format of the file. It also includes the program’s entry point, which is the address of the first instruction to execute when the program runs. The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses. The .init section defines a small function, called _init, that will be called by the program’s initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.

A fully linked executable of this kind therefore carries no link-time relocation entries. Executables that depend on shared libraries, or that are built as position-independent executables, do contain dynamic relocation information, which the dynamic linker uses at load time to adjust addresses based on where the program and its libraries are actually loaded.

Additionally, the executable object file may include other sections such as debug information, which aids in debugging and analysis of the program during development. Once the executable object file is created, it can be loaded into memory by the operating system. The operating system maps the sections of the object file into appropriate memory regions and starts the program’s execution by transferring control to the entry point specified in the file. Executable object files provide a standardized format for storing programs, so any system that supports the same format, architecture, and operating system interface can load and run them. They encapsulate the compiled code, data, and symbol information necessary for the operating system to correctly run the program.

In summary, executable object files are the result of the linking process, containing machine code, data, and symbol information, with every reference already relocated to its run-time address. These files serve as the bridge between the compiled program and the operating system, allowing for the execution of programs on computer systems.

9. Loading Executable Object Files

To run an executable object file prog, we can type its name on the Linux shell’s command line:

linux> ./prog

Since prog does not correspond to a built-in shell command, the shell assumes that prog is an executable object file, which it runs for us by invoking some memory-resident operating system code known as the loader. Any Linux program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from the disk into memory and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.
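What the shell does here can be sketched with fork, execve, and waitpid. This is a minimal illustration rather than how any particular shell is written, and the helper name run_program is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Runs the program at path with the given argument vector in a child
 * process. Returns the child's exit status, or -1 on failure. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask the kernel to load and run the new program.
         * execve returns only if it fails. */
        execve(path, argv, environ);
        _exit(127);
    }

    /* Parent: wait for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_program("./prog", argv) with argv set to {"prog", NULL} would load and run the example program in a child process and report the value returned by its main function.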

Every running Linux program has a run-time memory image with the following layout:

On Linux x86-64 systems, the code segment of a traditional (non-position-independent) executable starts at address 0x400000, followed by the data segment. The run-time heap follows the data segment and grows upward via calls to the malloc library. This is followed by a region that is reserved for shared modules. The user stack starts below the largest legal user address (2⁴⁸ − 1) and grows down, toward smaller memory addresses. The region above the stack, starting at address 2⁴⁸, is reserved for the code and data in the kernel, which is the memory-resident part of the operating system.
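One way to observe this layout is to sample an address from each region of a running process. The sketch below does that; the struct and function names are hypothetical, and converting a function pointer to an integer is a common compiler extension rather than strictly portable C.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 1;   /* lives in the data segment */

struct layout {
    uintptr_t code;    /* address of a function (code segment)   */
    uintptr_t data;    /* address of a global (data segment)     */
    uintptr_t heap;    /* address returned by malloc (heap)      */
    uintptr_t stack;   /* address of a local variable (stack)    */
};

/* Samples one address from each region of the calling process's image. */
struct layout sample_layout(void)
{
    struct layout l;
    int local = 0;
    void *block = malloc(16);

    l.code  = (uintptr_t)&sample_layout;
    l.data  = (uintptr_t)&initialized_global;
    l.heap  = (uintptr_t)block;     /* 0 if the allocation failed */
    l.stack = (uintptr_t)&local;
    free(block);
    return l;
}
```

On a typical Linux x86-64 system one would expect the values to increase in the order code, data, heap, stack, but the exact addresses depend on how the program was built and usually change from run to run because of address space layout randomization.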

When the loader runs, it creates a memory image similar to the one described above. Guided by the program header table, it copies chunks of the executable object file into the code and data segments. Next, the loader jumps to the program’s entry point, which is always the address of the _start function. This function is defined in the system object file crt1.o and is the same for all C programs. The _start function calls the system startup function, __libc_start_main, which is defined in libc.so. It initializes the execution environment, calls the user-level main function, handles its return value, and if necessary returns control to the kernel.

10. Dynamic Linking with Shared Libraries

Static libraries address many of the issues associated with making extensive collections of related functions available to application programs. However, static libraries still have some significant disadvantages. Static libraries, like all software, need to be maintained and updated periodically. If application programmers want to use the most recent version of a library, they must somehow become aware that the library has changed and then explicitly relink their programs against the updated library.

Another issue is that almost every C program uses standard I/O functions such as printf and scanf. At run time, the code for these functions is duplicated in the text segment of each running process. On a typical system that is running hundreds of processes, this can be a significant waste of scarce memory system resources. (An interesting property of memory is that it is always a scarce resource, regardless of how much there is in a system. Disk space and kitchen trash cans share this same property.)

Shared libraries are modern innovations that address the disadvantages of static libraries. A shared library is an object module that, at either run time or load time, can be loaded at an arbitrary memory address and linked with a program in memory. This process is known as dynamic linking and is performed by a program called a dynamic linker. Shared libraries are also referred to as shared objects, and on Linux systems they are indicated by the .so suffix. Microsoft operating systems make heavy use of shared libraries, which they refer to as DLLs (dynamic link libraries).

Shared libraries are “shared” in two different ways. First, in any given file system, there is exactly one .so file for a particular library. The code and data in this .so file are shared by all of the executable object files that reference the library, as opposed to the contents of static libraries, which are copied and embedded in the executables that reference them. Second, a single copy of the .text section of a shared library in memory can be shared by different running processes.

The following figure summarizes the dynamic linking process for the example program which we discussed earlier.

To build a shared library libvector.so of our example vector routines, we invoke the compiler driver with some special directives to the compiler and linker:

linux> gcc -shared -fpic -o libvector.so addvec.c multvec.c

The -fpic flag directs the compiler to generate position-independent code. The -shared flag directs the linker to create a shared object file. Once we have created the library, we would then link it into our example program:

linux> gcc -o prog2l main2.c ./libvector.so

This creates an executable object file prog2l in a form that can be linked with libvector.so at run time. The basic idea is to do some of the linking statically when the executable file is created, and then complete the linking process dynamically when the program is loaded. It is important to realize that none of the code or data sections from libvector.so are actually copied into the executable prog2l at this point. Instead, the linker copies some relocation and symbol table information that will allow references to code and data in libvector.so to be resolved at load time.

When the loader loads and runs prog2l, it first maps the partially linked executable into memory, using the techniques discussed earlier. Next, it notices that prog2l contains a .interp section, which holds the path name of the dynamic linker, itself a shared object (e.g., ld-linux.so on Linux systems). Instead of passing control to the application, as it would normally do, the loader loads and runs the dynamic linker. The dynamic linker then finishes the linking task by performing the following relocations:

Relocating the text and data of libc.so into some memory segment
Relocating the text and data of libvector.so into another memory segment
Relocating any references in prog2l to symbols defined by libc.so and libvector.so

Finally, the dynamic linker passes control to the application. From this point on, the locations of the shared libraries are fixed and do not change during execution of the program.

11. Position Independent Code and Library Interposing

Position Independent Code (PIC) refers to machine code that can execute properly regardless of its actual memory location. PIC is designed to be relocatable, allowing it to be loaded at different memory addresses without requiring modifications to the code itself. This property is particularly useful when sharing code across multiple processes or when the exact memory location is not known in advance. PIC enables greater flexibility and portability in software development.

Library interpositioning is a technique that allows a programmer to override or intercept function calls made to shared libraries. By interposing their own implementation of a function, the programmer can modify the behavior of the program without modifying the original library or the application’s source code. This technique is often used for debugging purposes, performance monitoring, or implementing custom functionality. Library interpositioning offers a powerful means to modify the behavior of a program dynamically, providing flexibility and control over its execution.


PIC achieves this by keeping absolute addresses out of the code itself. On x86-64, references within the same module use PC-relative addressing: the target is encoded as an offset from the current instruction, and that offset stays the same wherever the module is loaded. References to symbols defined in other modules go indirectly through a table in the data segment, the global offset table (GOT), which the dynamic linker fills in with absolute addresses at load time; calls to external functions are additionally routed through the procedure linkage table (PLT). Because the code segment is never patched, multiple processes can share a single copy of it.

Library interposing can be done at compile time, at link time, or at run time. At run time on Linux, you build a new shared library containing wrapper functions with the same names and signatures as the functions you want to intercept, and then list that library in the LD_PRELOAD environment variable. The dynamic linker searches the LD_PRELOAD libraries before any others when it resolves symbols, so calls to the interposed functions reach your wrappers first. The original library is still loaded, and a wrapper can forward to the original definition by looking it up with dlsym and the RTLD_NEXT handle.

Here are some of the benefits of using PIC and library interposing:

PIC lets code be loaded anywhere. Because PIC is not tied to a specific address, the same library can be mapped at a different address in each process without modification, and its code pages can be shared between those processes.
Library interposing can be used to debug code. If you want to trace a function in a shared library, you can interpose it with a wrapper that prints the arguments and the return value of each call. This can be helpful for tracking down bugs in the library or in the way your program uses it.
Library interposing can be used to add safety checks. A wrapper can validate the arguments to a function before forwarding the call, for example by rejecting an out-of-range length, which can help catch misuse that would otherwise become a security vulnerability.
Library interposing can be used to tune performance. A wrapper can substitute an implementation that is better suited to your workload, such as a different memory allocator, without changing or relinking the program.
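The debugging use case can be sketched with run-time interpositioning. The wrapper below intercepts puts, writes a trace line to standard error, and then forwards to the next definition of puts in the search order, which is normally the one in libc. The choice of puts, the "[trace]" prefix, and the file names in the commands that follow are all assumptions made for the illustration.

```c
#define _GNU_SOURCE      /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes raw bytes to stderr with write(2). Using stdio here could call
 * back into the function being interposed, so it is avoided. */
static void trace_bytes(const char *buf, size_t len) {
    ssize_t written = write(STDERR_FILENO, buf, len);
    (void)written;       /* tracing is best effort; errors are ignored */
}

/* Wrapper with the same name and signature as the C library's puts. */
int puts(const char *s) {
    static int (*real_puts)(const char *) = NULL;
    if (real_puts == NULL) {
        /* Find the next definition of puts after this one. */
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL)
            return EOF;
    }
    static const char prefix[] = "[trace] puts: ";
    trace_bytes(prefix, sizeof prefix - 1);
    trace_bytes(s, strlen(s));
    trace_bytes("\n", 1);
    return real_puts(s); /* forward to the original implementation */
}
```

Assuming the wrapper is saved as trace.c and the program to observe is ./prog, it could be built and used like this, with no change to prog itself:

linux> gcc -shared -fpic -o libtrace.so trace.c -ldl

linux> LD_PRELOAD=./libtrace.so ./prog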

Summary

Linking can be performed at compile time by static linkers and at load time and run time by dynamic linkers. Linkers manipulate binary files called object files, which come in three different forms: relocatable, executable, and shared. Relocatable object files are combined by static linkers into an executable object file that can be loaded into memory and executed. Shared object files (shared libraries) are linked and loaded by dynamic linkers at run time, either implicitly when the calling program is loaded and begins executing, or on demand, when the program calls functions from the dlopen library.

The two main tasks of linkers are symbol resolution, where each global symbol in an object file is bound to a unique definition, and relocation, where the ultimate memory address for each symbol is determined and where references to those objects are modified.

Static linkers are invoked by compiler drivers such as gcc. They combine multiple relocatable object files into a single executable object file. Multiple object files can define the same symbol, and the rules that linkers use for silently resolving these multiple definitions can introduce subtle bugs in user programs. Multiple object files can be concatenated in a single static library. Linkers use libraries to resolve symbol references in other object modules. The left-to-right sequential scan that many linkers use to resolve symbol references is another source of confusing link-time errors.

Loaders map the contents of executable files into memory and run the program. Linkers can also produce partially linked executable object files with unresolved references to the routines and data defined in a shared library. At load time, the loader maps the partially linked executable into memory and then calls a dynamic linker, which completes the linking task by loading the shared library and relocating the references in the program.

Shared libraries that are compiled as position-independent code can be loaded anywhere and shared at run time by multiple processes. Applications can also use the dynamic linker at run time in order to load, link, and access the functions and data in shared libraries.

Dear Reader,

Thank you for taking the time to explore this article on demystifying linking in software development. We hope that the insights provided have shed light on the intricacies of code integration and have enhanced your understanding of the linking process.

We understand that linking can be a complex topic, but by delving into concepts such as static libraries, relocatable object files, symbols and symbol tables, position independent code, and library interpositioning, we aimed to make this subject more approachable and accessible.

As a developer, having a solid grasp of linking is invaluable for building robust and efficient software solutions. By understanding how different components come together during the linking process, you can create applications that are modular, flexible, and easily maintainable.

We encourage you to continue exploring and deepening your knowledge of software development. By staying curious and embracing new concepts, you can unlock endless possibilities in your coding journey.

Once again, thank you for your time and interest in this article. We hope that it has provided you with valuable insights that will benefit you in your future software development endeavors.

Happy coding!