Demystifying the ELF Format: Unveiling the Secrets of Binary Executables

15 min read · Jul 3, 2023

Every time you run a program or see an application come to life, you’re witnessing the culmination of a complex dance 🕺💃 between hardware and software.

But what lies beneath the surface of these apparently mundane files?

What secrets do they hold within their binary fibers?

The answer lies in the ELF format, the key to unlocking the hidden wonders of executable files.

So, let’s embark on an adventure that will unravel the secrets and reveal the inner workings of the ELF format, the very foundation of the digital world we inhabit.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

TABLE OF CONTENTS:

1) INTRODUCTION

2) WHAT IS ELF?

3) WHY IS THE ELF FORMAT USED?

4) THE INFORMATION STORED IN THE ELF FORMAT

5) THE ARCHITECTURE OF ELF

6) THE readelf COMMAND

7) THE nm COMMAND

8) THE objdump COMMAND

9) CONCLUSION

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

1) INTRODUCTION

Ladies and gentlemen, gather round!
It’s time to explore the complex but wonderful world of the ELF format!

So, what’s the deal with it?
Well, it’s a file format for files that can be executable, linkable, or both:
it can store programs, shared libraries, object files, and even core dumps.

Moreover, it’s the VIP backstage pass for your programs to enter the magical world of Unix-like operating systems!

ELF is like the common language spoken by Unix-like operating systems.
It’s the format they understand and embrace with open arms.
While the ELF format can be used outside of Unix systems with appropriate tooling and compatibility layers, its native support, tooling ecosystem, and widespread adoption make it more suitable and prevalent within this operating system environment.
So if you want your program to run smoothly in the Unix world, ELF is your ticket!

And guess what? These ELF files are loaded with top-secret information!
They contain valuable insights about your program, such as its memory layout, its entry point, and more.
It’s like having a detailed roadmap for your software.

And let’s not forget the amazing tools at our disposal, like readelf, nm, and objdump.
They will let us peek into ELF files, uncover their secrets, and understand their inner workings.

Let’s go!

2) WHAT IS ELF?

ELF stands for “Executable and Linkable Format”.

The ELF format was first published as part of the System V Release 4 (SVR4) Application Binary Interface (ABI), by UNIX System Laboratories (USL).

🗒️Note:

System V Release 4 (SVR4) was a version of the Unix operating system developed by AT&T together with Sun Microsystems. AT&T’s Unix division later became UNIX System Laboratories (USL).

It was released in 1989 and was a significant step in the evolution of Unix.

SVR4 became a widely adopted Unix specification and formed the basis for many Unix variants and commercial Unix operating systems. It played a crucial role in the standardization and interoperability of Unix systems.

However, with the emergence of Linux and other open-source Unix-like operating systems, the influence of SVR4 diminished over time.

ELF is a binary executable file format used for saving compiled code.

It aimed to replace the previous a.out (“Assembler Output”) and COFF (“Common Object File Format”) formats, which had limitations in terms of flexibility and extensibility.

Therefore, ELF was designed to provide a more powerful, flexible, and versatile format for holding executable files, object code, shared libraries, and core dumps in Unix-like operating systems.

It contains machine code instructions (binary instructions) that the processor can understand and execute directly.

It also offers better support for modern features, such as dynamic linking and shared libraries.

ELF defines a structured format with specific sections and headers, which we will explore in the next sections of this post.

Here is a list of the ELF format’s benefits:

Flexibility and portability:

ELF provides a flexible structure that accommodates various types of data and sections, making it suitable for different program elements, including executables, object code, shared libraries, and debugging information.

Moreover, it is not tied to any particular central processing unit (CPU) or instruction set architecture. This has allowed other operating systems on different hardware platforms to adopt it, sometimes with additional tools and processes to make an ELF file compatible.

Debugging and Profiling:

ELF includes support for debugging symbols and other debugging information, which facilitates software debugging, profiling, and analysis.

Developers can extract valuable information about functions, variables, and line numbers to understand program behavior.

Security features:

ELF supports security features such as address space layout randomization (ASLR) through position-independent code and executables. ASLR is a security technique employed by operating systems. It randomizes the memory layout of a program to reduce some types of attacks and to make it difficult for attackers to predict and exploit memory locations.

It also supports read-only relocations (RELRO). Once the dynamic linker has finished applying relocations, some data structures, such as the global offset table, are made read-only so that attackers cannot overwrite them.

🗒️Note:

ASLR is not a foolproof security measure and may not protect against all types of attacks. Sophisticated attackers with advanced techniques or vulnerabilities in the system can still bypass ASLR.

However, it adds an additional layer of defense and raises the bar for exploitation, making it an important security feature in modern operating systems.

Standardization:

ELF has become the basic standard for executable and object code formats in many Unix-like operating systems.

Its widespread adoption has led to a rich ecosystem of tools, libraries, and support, making it easier for developers to work with ELF files.

3) WHY IS THE ELF FORMAT USED?

Here are the main uses of the ELF format:

For executable files:

The ELF format is primarily used to store executable files. These files contain machine code instructions that the operating system loads into memory and that the CPU (“Central Processing Unit”) executes directly.

For object files:

ELF format is also used to store object code, which is the compiled output of source code files. Object files contain machine code instructions and data that can be combined and linked together to create executable files or shared libraries.

For shared libraries:

ELF format can store shared libraries, separate files which contain precompiled code and data that can be used by multiple executable files.

Moreover, the ELF format supports dynamic linking and loading. Programs can load and link shared libraries at runtime, when needed, instead of including all the library code in the executable file itself. This results in smaller executables, promotes code reuse, reduces redundancy (code duplication), and improves system efficiency.

So while the ELF format stores shared libraries, the dynamic linker uses the information in this format to locate and load the shared libraries and link them into the memory space of the program.

The ELF format provides the information and structures needed to identify the shared libraries required by an executable file. This includes each library’s name (the NEEDED entries of the .dynamic section), the symbol versions it requires, and optional search paths (RPATH or RUNPATH).

For core dumps:

When a program crashes or encounters a critical error, it may generate a core dump file that contains the program’s memory and processor state at the time of the crash. ELF format is commonly used to store core dump files, allowing developers to analyze the program’s state and diagnose the cause of the crash.

4) THE INFORMATION STORED IN THE ELF FORMAT

🔦💡Tip:

To find an ELF file to examine, run this command in your terminal and copy the path of one of the results. Then inspect that file with readelf, because using “cat” on a binary file mostly prints unreadable bytes:

find . -type f -exec file {} \; | grep ': ELF'
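The file command recognizes ELF files by their magic number. Every ELF file starts with the four bytes 0x7f, ‘E’, ‘L’, ‘F’, which are covered in more detail below. If you prefer to check that signature yourself, here is a minimal Python sketch of the same search. The helper names is_elf and find_elf_files are our own, and the sketch only inspects regular files that it can open.

```python
import os

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def is_elf(path: str) -> bool:
    """Return True if the file at `path` starts with the ELF magic number."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:  # permission denied, broken symlink, etc.
        return False


def find_elf_files(root: str = "."):
    """Yield the paths of regular files under `root` that look like ELF files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() skips FIFOs and device files, which could block on open().
            if os.path.isfile(path) and is_elf(path):
                yield path
```

For example, looping over find_elf_files("/usr/bin") and printing each path should list the installed programs on a typical Linux system.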

🔦 It’s important to know that the exact order and presence of sections can vary depending on the architecture, the operating system, and the toolchain used for compilation and linking.

Different architectures may have additional sections or specific requirements for some sections.

The ELF file used as an example in this post may differ from yours.

An ELF file is made of three major parts: the ELF header, sections, and segments.
Each of these elements plays a different role in the linking and loading of ELF files.

🗒️Note:

Before going into detail, we need to point out the difference between a section and a segment.

In short, sections (in the Section Headers) organize the program’s data and code logically within the file, while segments (in the Program Headers) define the memory layout when the file is loaded into memory. Sections are used by development and analysis tools, while segments are crucial for the operating system’s loader to correctly map the program’s contents in memory.

Moreover, an ELF file can be seen from two different views: the linking view, based on the section headers, and the execution view, based on the program headers.
These two views offer different perspectives on how an ELF file is organized. The linking view focuses on the role of sections, and the execution view focuses on the role of segments.

Below is an example representing the main components of the ELF format: the ELF Header, the Section Headers, the Program Headers, and the Symbol Tables (in the order printed by “readelf --all [file]”):

For ELF Header:

The ELF header provides crucial information about the ELF file (fields listed in the order readelf prints them):

The magic number confirms the file format.

🗒️Note:

The magic number in the ELF header is a specific sequence of bytes that serves as a unique identifier (a signature) for ELF files. It consists of the bytes “7f 45 4c 46” in hexadecimal. The first byte, 0x7f, is a non-printable character. The next three bytes correspond to the ASCII characters ‘E’, ‘L’, ‘F’, the acronym for “Executable and Linkable Format”.

The class indicates it is an ELF64 file (64-bit architecture).
The data field specifies the 2’s complement method and the little-endian byte order.

🗒️Note:

“2’s complement” refers to the representation of signed integers using the two’s complement method, which is the commonly used method in computer systems.
“Little endian” indicates the byte order used to store multi-byte data types. In a little-endian system, the least significant byte is stored at the lowest memory address, while the most significant byte is stored at the highest memory address.
In practical terms, when you look at the raw bytes of a multi-byte integer in the file, the least significant byte comes first. The bytes therefore appear in reverse order compared with how the number is usually written.

The opposite of little endian is big endian.
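To see what “reverse order” means in practice, here is a small Python sketch that uses the standard struct module. The four bytes were chosen for illustration only. The same bytes decode to very different numbers depending on the byte order you assume.

```python
import struct

# Four bytes exactly as they would appear, one after another, in a file.
raw = bytes([0x28, 0x3A, 0x00, 0x00])

# "<I": unsigned 32-bit integer, little-endian (first byte = least significant).
little = struct.unpack("<I", raw)[0]
# ">I": the same bytes interpreted as big-endian (first byte = most significant).
big = struct.unpack(">I", raw)[0]

print(hex(little))  # 0x3a28
print(hex(big))     # 0x283a0000
# int.from_bytes(raw, "little") gives the same result as "<I".
print(int.from_bytes(raw, "little") == little)  # True
```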

The version indicates the ELF file version.
The OS/ABI field identifies the operating system or ABI extensions the file targets (for example, “UNIX - System V”).
The ABI version specifies the version of the targeted ABI. Here, the value is 0, which generally means that the file does not require a specific ABI version.

🗒️Note:

The ABI version field in the ELF header indicates the version of the Application Binary Interface (ABI) targeted by the ELF file. The ABI defines the low-level interface between the operating system and the executable code.

The type is DYN, the shared object file type, which is also used for position-independent executables.

🗒️Note:

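The “Type” value “DYN” means that the file is a shared object. Two kinds of files use this type: shared libraries, which are dynamically linked into other programs at runtime, and position-independent executables (PIE), which can be loaded at any address. This example file also has an entry point and an INTERP segment, so it is most likely a PIE executable. Recent versions of readelf label such files explicitly as position-independent executables.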

The machine identifies the target architecture (AMD X86-64).
The entry point address marks the start of program execution.
The “Start of program headers” and “Start of section headers” fields give the file offsets at which the two header tables begin.

🗒️Note:

The offset indicates a position relative to the beginning of the file. It is expressed in bytes and represents the distance from the start of the file to the start of the section or segment.

Flags denote any processor-specific flags associated with the file. In this case, the value is “0x0”, which means that no flags are set.
Size of this header field specifies the size of the ELF header in bytes. Here, it is 64 bytes.
Size of program headers field specifies the size of each program header entry. In this example, it is 56 bytes.
Number of program headers field indicates the total number of program header entries present in the ELF file. Here, it is 13.
Size of section headers field specifies the size of each section header entry. In this case, it is 64 bytes.
Number of section headers field indicates the total number of section header entries present in the ELF file. Here, it is 31.
Section header string table index, the last field, indicates the index of the section header string table in the section header table. In this example, it is 30.
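To make these fields concrete, here is a minimal Python sketch that decodes a 64-byte ELF64 header with the standard struct module. It assumes a little-endian ELF64 file (class ELF64, data little-endian), like the example above, and it translates only a few e_type values. The ElfHeader type and the parse_elf64_header function are illustrative names of our own.

```python
import struct
from typing import NamedTuple

# Elf64_Ehdr: 16-byte e_ident, then fixed-size fields ("<" = little-endian, no padding).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes in total

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}  # e_type values


class ElfHeader(NamedTuple):
    elf_class: int    # e_ident[4]: 1 = ELF32, 2 = ELF64
    data: int         # e_ident[5]: 1 = little-endian, 2 = big-endian
    osabi: int        # e_ident[7]: 0 = UNIX - System V
    abi_version: int  # e_ident[8]
    e_type: str       # REL, EXEC, DYN or CORE
    machine: int      # 62 = x86-64
    entry: int        # entry point address
    phoff: int        # start of program headers (file offset)
    shoff: int        # start of section headers (file offset)
    flags: int
    ehsize: int       # size of this header
    phentsize: int    # size of each program header
    phnum: int        # number of program headers
    shentsize: int    # size of each section header
    shnum: int        # number of section headers
    shstrndx: int     # section header string table index


def parse_elf64_header(blob: bytes) -> ElfHeader:
    """Decode the ELF header at the start of `blob` (little-endian ELF64 only)."""
    if blob[:4] != b"\x7fELF":
        raise ValueError("bad magic number: not an ELF file")
    if blob[4] != 2 or blob[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64 files")
    (ident, e_type, machine, _version, entry, phoff, shoff, flags,
     ehsize, phentsize, phnum, shentsize, shnum, shstrndx) = EHDR64.unpack_from(blob, 0)
    return ElfHeader(ident[4], ident[5], ident[7], ident[8],
                     ELF_TYPES.get(e_type, hex(e_type)), machine, entry, phoff, shoff,
                     flags, ehsize, phentsize, phnum, shentsize, shnum, shstrndx)
```

To try it, pass it the first 64 bytes of any ELF64 file, for example read with open(path, "rb").read(64). For the example file in this post, the fields should match what readelf printed: 13 program headers of 56 bytes, 31 section headers of 64 bytes, and string table index 30.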

For Section Headers:

The “Section Headers” part of the output lists 31 section headers, starting at offset 0x3a28.

Some common sections that you could find in the ELF file:

.interp: section containing the path of the dynamic linker/loader.
.dynsym: section containing the symbol table for dynamic linking.
.dynstr: section containing the strings (such as symbol names) referenced by the dynamic symbol table.
.rela.plt: section containing relocation information for the procedure linkage table (PLT).
.init: section containing initialization code executed before the program’s main function runs.
.text: section containing the executable instructions of the program.
.dynamic: section containing dynamic linking information.
.data: section containing initialized data variables.
.bss: section containing uninitialized (zero-initialized) data variables. It takes up no space in the file.

Here, each section header provides specific information about the corresponding section’s purpose, size, type, and attributes (fields listed in order of presentation):

[Nr]: the section number or index of the section header.
Name: the name of the section.
Type: the type or category of the section.
Address: the virtual address at which the section will be loaded during runtime.
Offset: the offset of the section within the file.
Size: the size of the section in bytes.
EntSize: the size of each entry within the section. This is relevant for sections that contain multiple entries, such as symbol tables.
Flags: flags that specify various attributes of the section, such as permissions (W for write, A for allocation, X for execute), merging behavior (M), string section indicator (S), and more (see the “Key to Flags” legend that readelf prints after the section list).
Link: contains a section index that has special meaning depending on the section type. It provides a reference to another section, such as a string table or symbol table, that is associated with the current section.
Info: contains extra information or other indices that depend on the section type.
Align: specifies the required alignment of the section in memory. The section’s address must be a multiple of this value. Alignment is used to optimize memory access and improve performance.
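The Name of each section header is not stored as text. It is an offset (sh_name) into the section header string table, whose index appears in the ELF header (30 in our example). The Python sketch below decodes the 64-byte entries of the section header table, resolves their names, and renders the W/A/X/M/S flag letters. It assumes a little-endian ELF64 image already read into memory, plus the shoff, shnum and shstrndx values from the ELF header. Section, read_sections and flag_letters are illustrative names of our own.

```python
import struct
from typing import List, NamedTuple

# Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize.
SHDR64 = struct.Struct("<IIQQQQIIQQ")  # 64 bytes per entry

# A few sh_flags bits, in the order readelf prints their letters.
SHF_LETTERS = ((0x1, "W"), (0x2, "A"), (0x4, "X"), (0x10, "M"), (0x20, "S"))


class Section(NamedTuple):
    name: str
    type: int
    flags: int
    addr: int
    offset: int
    size: int
    link: int
    info: int
    addralign: int
    entsize: int


def flag_letters(flags: int) -> str:
    """Render sh_flags with readelf-style letters (only the bits listed above)."""
    return "".join(letter for bit, letter in SHF_LETTERS if flags & bit)


def read_sections(blob: bytes, shoff: int, shnum: int, shstrndx: int) -> List[Section]:
    """Decode the section header table of a little-endian ELF64 image."""
    raw = [SHDR64.unpack_from(blob, shoff + i * SHDR64.size) for i in range(shnum)]
    # The section at index shstrndx holds the NUL-terminated section names.
    str_offset, str_size = raw[shstrndx][4], raw[shstrndx][5]
    names = blob[str_offset:str_offset + str_size]

    def name_at(index: int) -> str:
        return names[index:names.index(b"\x00", index)].decode("ascii")

    return [Section(name_at(r[0]), *r[1:]) for r in raw]
```

With the example file from this post, the entry named .dynsym should show the values discussed just below: type 11 (SHT_DYNSYM), size 0x108, entsize 0x18, and link 7.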

Let’s go through one example, by analyzing the “.dynsym” section:

The type of the section is DYNSYM, indicating that it contains the dynamic symbol table.
The virtual memory address where the section will be loaded at runtime is 0x3c8.
The offset in the file where the section data starts is 0x3c8.
The size of the section is 0x108 (264) bytes.
Each entry in the dynamic symbol table has an entsize of 0x18 (24) bytes.
For flags, the section is allocated in memory at runtime (A flag).
The link field contains a section index whose meaning depends on the section type. Here, its value is 7. For a symbol table, this is the index of the associated string table: section 7, .dynstr, which holds the symbol names.
The info field also depends on the section type. For a symbol table, it is one greater than the index of the last local symbol. Here, its value of 1 means that only entry 0 (the mandatory null symbol) is local, and all the other symbols are global or weak.
For the align field, the required alignment of the section in memory is 8 bytes.

For Program Headers:

The program header table (“Program Headers” in readelf’s output) plays an essential role in an ELF file: it describes the segments of the executable or shared object.

Segments define the memory layout when the file is loaded into memory.

So their primary goal is to provide information necessary for the operating system or dynamic linker to properly load and execute the program.

readelf first prints some general information about this table:

The ELF file type is DYN, the shared object type, which is also used for position-independent executables.
The entry point of the program is at address 0x10e0.
There are 13 program headers, starting at offset 64.

After that, readelf describes each segment of the file: its type, offset, virtual address, physical address, file size, memory size, flags, and alignment.

Now, let’s examine the details of each program header:

PHDR: this segment describes the program header table itself.
INTERP: this segment contains the path of the program interpreter, “/lib64/ld-linux-x86-64.so.2”. This is the dynamic linker/loader, which loads the shared libraries the program needs and prepares it for execution.
LOAD: this segment represents a loadable segment, which the loader maps into memory. It has a virtual address and a memory size. It can be marked as readable (R), writable (W), or executable (E), and it may contain code or different kinds of data (such as constants, and initialized and uninitialized data).
DYNAMIC: this segment holds the .dynamic section, which lists the dynamic linking information. This includes the needed libraries and the locations of the dynamic symbol table and relocation entries.
NOTE: this segment contains various notes, which are arbitrary data provided by the system or tools.
GNU_PROPERTY: this segment contains GNU properties associated with the file.
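GNU_EH_FRAME: this segment points to the exception handling frame information (the .eh_frame_hdr section), which is used for stack unwinding.
GNU_STACK: this segment’s flags define the stack’s permissions, for example whether the stack is executable. Its size fields are usually 0.
GNU_RELRO: this segment marks the memory region that the dynamic linker makes read-only after relocations are applied (read-only relocations, RELRO).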
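Like the section header table, the program header table is just an array of fixed-size entries. In our example it holds 13 entries of 56 bytes each, starting at offset 64. The Python sketch below decodes these entries and shows readelf-style type names and R/W/E flags. It assumes a little-endian ELF64 image already in memory, and it only recognizes the segment types listed in this post. Segment, perm_string and read_segments are illustrative names of our own.

```python
import struct
from typing import List, NamedTuple

# Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align.
PHDR64 = struct.Struct("<IIQQQQQQ")  # 56 bytes per entry

# p_type values for the segment types discussed in this post.
PT_NAMES = {
    1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR",
    0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK",
    0x6474E552: "GNU_RELRO", 0x6474E553: "GNU_PROPERTY",
}


class Segment(NamedTuple):
    type: str
    flags: str   # readelf-style permissions, e.g. "R E" or "RW"
    offset: int
    vaddr: int
    paddr: int
    filesz: int
    memsz: int
    align: int


def perm_string(p_flags: int) -> str:
    """Render p_flags like readelf: PF_R = 4, PF_W = 2, PF_X = 1 (shown as E)."""
    letters = ((4, "R"), (2, "W"), (1, "E"))
    return "".join(c if p_flags & bit else " " for bit, c in letters).rstrip()


def read_segments(blob: bytes, phoff: int, phnum: int) -> List[Segment]:
    """Decode the program header table of a little-endian ELF64 image."""
    segments = []
    for i in range(phnum):
        p_type, p_flags, *rest = PHDR64.unpack_from(blob, phoff + i * PHDR64.size)
        segments.append(Segment(PT_NAMES.get(p_type, hex(p_type)), perm_string(p_flags), *rest))
    return segments
```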

The “Section to Segment mapping”, included in the Program Headers, shows the relationship (association) between sections and the corresponding segments in the ELF file. It helps the operating system and dynamic linker understand the organization of the ELF file, facilitating the loading, linking, and execution of the program.

For example, segment “02” (a “LOAD” segment, the third entry in the Program Headers list after “PHDR” at index 00 and “INTERP” at index 01) is associated with:

the program interpreter (.interp)
various notes (.note.gnu.property, .note.gnu.build-id, .note.ABI-tag)
the global symbol hash table (.gnu.hash)
the dynamic symbol table (.dynsym)
the dynamic string table (.dynstr)
versioning information (.gnu.version, .gnu.version_r)
and relocation information (.rela.dyn, .rela.plt).
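How does readelf decide which sections belong to which segment? A simplified rule of thumb is address containment: an allocated section (A flag) belongs to a segment when its whole address range fits inside the segment’s range in memory. The Python sketch below illustrates that rule. readelf’s actual check handles more special cases, so treat sections_in_segment, a name of our own, as an approximation.

```python
from typing import Iterable, List, Tuple

SHF_ALLOC = 0x2  # the section occupies memory at runtime (readelf's "A" flag)

# (name, address, size, flags) for each section, e.g. taken from the section headers.
SectionInfo = Tuple[str, int, int, int]


def sections_in_segment(seg_vaddr: int, seg_memsz: int,
                        sections: Iterable[SectionInfo]) -> List[str]:
    """Simplified section-to-segment mapping by virtual-address containment."""
    seg_end = seg_vaddr + seg_memsz
    return [name for name, addr, size, flags in sections
            if flags & SHF_ALLOC and seg_vaddr <= addr and addr + size <= seg_end]
```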

For Symbol tables:

This part of the readelf output covers two symbol tables: “.dynsym” and “.symtab”.

Symbol tables are used to store information about the symbols (functions, variables, etc.) defined and referenced within the ELF file.

“.dynsym”:

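This table contains the symbols needed for dynamic linking at runtime, which is why its section is loaded into memory (A flag). By contrast, .symtab holds the symbols used during static linking and debugging. It is not loaded at runtime and can be removed with strip.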

🗒️Note:

What is the difference between dynamic linking and static linking ?

Dynamic linking: process of linking external libraries at runtime for program execution. Allows for smaller executables and easy library updates.

Static linking: process of linking all libraries at compile time, resulting in self-contained executables (all the required code and libraries are in the executable). Provides portability (the program can be moved without concerns about library availability or compatibility) but larger executable sizes.

In this example, .dynsym includes 11 entries, numbered from 0 to 10. Each entry represents a symbol and provides information about it:

The “Num” column represents the symbol number.
The “Value” column shows the address or value associated with the symbol.
The “Size” column indicates the size of the symbol (in bytes).
The “Type” column specifies the symbol type (for example: NOTYPE, FUNC, OBJECT).
The “Bind” column represents the symbol binding (for example: LOCAL, GLOBAL, WEAK).

🗒️Note:

A symbol binding in ELF files is determined by the compiler or linker based on the symbol’s declaration or definition in the source code.

Local symbols are used for internal functions or variables that are not intended to be referenced outside the object file.

Global symbols are used for functions or variables that are intended to be used by other parts of the program.

Weak symbols provide a way to define a default or fallback implementation that can be overridden if a stronger symbol is available.

The “Vis” column denotes the symbol visibility (e.g., DEFAULT, HIDDEN).
The “Ndx” column indicates the index of the section in which the symbol is defined (UND for undefined symbols, which must be resolved from another file).
The “Name” column displays the name of the symbol.
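Two details of the raw symbol entries are worth seeing in code. First, readelf’s Type and Bind columns are packed together into a single st_info byte, with the binding in the high 4 bits and the type in the low 4 bits. Second, every ELF64 symbol entry is 24 bytes, so the number of entries is simply the section size divided by its entsize. For .dynsym, 0x108 / 0x18 = 264 / 24 = 11, which matches the 11 entries listed. The Python sketch below assumes a little-endian ELF64 entry, and it leaves the symbol name as an offset into the linked string table (.dynstr or .strtab). Symbol, decode_symbol and symbol_count are illustrative names of our own.

```python
import struct
from typing import NamedTuple

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size.
SYM64 = struct.Struct("<IBBHQQ")  # 24 bytes per entry

BINDS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
VISIBILITY = {0: "DEFAULT", 1: "INTERNAL", 2: "HIDDEN", 3: "PROTECTED"}
SPECIAL_NDX = {0: "UND", 0xFFF1: "ABS", 0xFFF2: "COM"}


class Symbol(NamedTuple):
    name_offset: int  # offset of the name in the linked string table
    value: int
    size: int
    type: str
    bind: str
    vis: str
    ndx: str


def decode_symbol(entry: bytes) -> Symbol:
    """Decode one 24-byte little-endian Elf64_Sym entry."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = SYM64.unpack(entry)
    sym_type, sym_bind = st_info & 0xF, st_info >> 4  # low / high 4 bits
    return Symbol(
        name_offset=st_name,
        value=st_value,
        size=st_size,
        type=TYPES.get(sym_type, str(sym_type)),
        bind=BINDS.get(sym_bind, str(sym_bind)),
        vis=VISIBILITY[st_other & 0x3],  # visibility lives in the low 2 bits
        ndx=SPECIAL_NDX.get(st_shndx, str(st_shndx)),
    )


def symbol_count(section_size: int, entsize: int) -> int:
    """Number of entries in a symbol table section: size / entsize."""
    return section_size // entsize
```

For example, symbol_count(0x108, 0x18) returns 11.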

“.symtab”:

This table contains symbols used during static linking and debugging.

It includes 69 entries, numbered from 0 to 68. Each entry provides information similar to the “.dynsym” table, including the symbol’s value, size, type, binding, visibility, section index, and name.

Some entries (of type SECTION) represent sections rather than functions or variables. They provide the same fields, listed above, for the various sections present in the ELF file.

Symbols related to code and data are also present in the “.symtab” table, with their respective names, addresses, and other details.

These are the main headers and tables found in ELF files. Other headers may or may not be present, depending on the specific characteristics and requirements of each ELF file.

5) THE ARCHITECTURE OF ELF

[Figures: diagrams of the ELF architecture, showing the ELF Header, Section Headers, Program Headers, and Symbol tables (“.dynsym” and “.symtab”)]

Written by Razika Bengana