PE and ELF Code Caves

Will Ryan
Jul 31 · 19 min read

In this post, we will explore the Executable and Linkable Format (ELF) and Portable Executable (PE) file formats for Linux and Windows executables, respectively. Along the way, we’ll touch on the similarities and differences between the two formats and how we can interact with them in C/C++. Since the broader context of this post is an introduction to malware-like techniques, we will do this by writing two programs, one for each format, to perform the following task: the injection of code into what is known as a “code cave” in a target executable.

What is a code cave?

A code cave can be broadly defined as a contiguous series of unused or null bytes that exists when a program is loaded into memory. This series of unused bytes typically results from the file’s alignment characteristics. This is very similar to a page in a book or other print publication (a similarity reflected in the closely associated concepts of “pagination” and “page size”). Assuming a consistent font type and size, each page can contain a fixed amount of text. If the text exceeds what one page can contain, another page is added. Unless the size of the text is an exact multiple of the amount which a page can contain, part of the last page (if the page is white, the “white space”) will not be filled by the text. The same is true in memory, only instead of white space this remainder is “padded” with null bytes (0x00). This unused portion, the null-valued or hollow area of the code, is what is known as a “code cave.” Let’s illustrate this with a concrete example:

Figure 1 — Program headers for our “Hello, world!” target ELF executable (discussed below).

These are the program headers for the ELF executable that we will attempt to inject code into. We’ll return later to exactly what program headers are and what elements they contain. For now, notice three elements in the highlighted row: Offset (0x1000), MemSiz (0x25c), and Align (0x1000). Offset indicates the offset (in bytes) of the beginning of the segment from the beginning of the file and Align refers to the value to which segments are aligned in memory. (Note also that 0x1000, or 4096 in decimal, is the page size of this machine.) Since the next segment is also aligned to 0x1000, the next segment’s offset is 0x2000 and the highlighted segment occupies 0x1000 bytes. However, the size of the data in this segment, indicated by MemSiz, is only 0x25c. Since this data is loaded at the beginning of the segment, this leaves a remainder of 0x1000 - 0x25c = 0xda4 unused bytes (a 0xda4-sized code cave) starting at offset 0x1000 + 0x25c = 0x125c. The beginning of this code cave can be seen in the following hexdump of this same executable:

Figure 2 — Hexdump of target file showing code cave at 0x125c.
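The arithmetic above can be written as a few lines of C. This is only a sketch of the calculation, not code from the injector itself; the function names align_up, cave_start and cave_size are made up for illustration, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>

/* Round value up to the next multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* File offset of the first padding byte after a segment's data. */
uint32_t cave_start(uint32_t offset, uint32_t data_size)
{
    return offset + data_size;
}

/* Number of padding bytes between the end of the data and the next
 * aligned boundary, where the following segment begins. */
uint32_t cave_size(uint32_t offset, uint32_t data_size, uint32_t align)
{
    uint32_t end = offset + data_size;
    return align_up(end, align) - end;
}
```

With the values from Figure 1, cave_start(0x1000, 0x25c) is 0x125c and cave_size(0x1000, 0x25c, 0x1000) is 0xda4.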

Something very similar happens in the case of our PE file. Observe the .text section header for the PE equivalent of the above file:

Figure 3 — .text section header for PE equivalent of “Hello, world!” target executable.

In particular, notice Pointer to Raw Data (0x400), which is equivalent, in this case, to the Offset element in the ELF executable, and Virtual Size (0xc66f), which is equivalent to MemSiz. In this case, the alignment is not given within the section header, but in the Optional Header (more on this, too, later) as the value of File Alignment (0x200). Since the size of the section on disk must be aligned to this value and the Virtual Size is 0xc66f, the size of the section, found as the value of Size of Raw Data, is 0xc800, the first multiple of 0x200 greater than 0xc66f. Adding the Pointer to Raw Data (or offset) to the Virtual Size gives 0xca6f, the end of the section’s data, and adding it to Size of Raw Data gives 0xcc00, the end of the section in the file. The difference between the two is a remainder (and, thus, a code cave) of 0x191 bytes, or 0x190 bytes counting from 0xca70, the 16-byte hexdump row highlighted in Figure 4. This can be seen in the following hexdump beginning from the highlighted line:

Figure 4 — Hexdump of target file showing code cave at 0xca70.
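The same calculation for a PE section can be sketched as follows. The struct pe_cave and both function names are hypothetical helpers written for this illustration; they take the raw header values as plain integers rather than the windows.h structures.

```c
#include <stdint.h>

/* Start offset and length of the padding at the end of a section on disk. */
struct pe_cave {
    uint32_t start;
    uint32_t size;
};

/* Size of Raw Data as the linker would compute it: Virtual Size rounded up
 * to the next multiple of File Alignment. */
uint32_t pe_raw_size(uint32_t virtual_size, uint32_t file_alignment)
{
    return (virtual_size + file_alignment - 1) / file_alignment * file_alignment;
}

/* Padding between the end of the section's data and the end of its raw data.
 * Returns a zero-sized cave if the data fills (or exceeds) the raw size. */
struct pe_cave pe_section_cave(uint32_t pointer_to_raw_data,
                               uint32_t virtual_size,
                               uint32_t size_of_raw_data)
{
    struct pe_cave cave = {0, 0};
    if (virtual_size < size_of_raw_data) {
        cave.start = pointer_to_raw_data + virtual_size;
        cave.size = size_of_raw_data - virtual_size;
    }
    return cave;
}
```

For the values in Figure 3, pe_raw_size(0xc66f, 0x200) gives 0xc800 and the cave starts at file offset 0xca6f. Counted from that first byte after the data, its length is 0xc800 - 0xc66f = 0x191 bytes.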

Now that we have a basic idea of what a code cave is, let’s discuss how we can inject code into it and, more importantly, how this differs between the two formats.

Basic Elements and Steps

Figure 5 — Code for our “Hello, world!” target executable (for both PE and ELF).

In order to understand how these two programs work, let’s get a sense of their basic elements: (1) a target executable (in this case, the “Hello, world!” program seen in Figure 5) into which we will inject our code (assuming there is a sufficiently sized code cave), (2) a program, which we will write, which will inject this code into the target executable and, in the case of our ELF-based program, (3) a separate executable which will contain the code to be injected. (I’ll go into more detail later about why it will be implemented this way.) Now that we know the basic elements, let’s get a better understanding of how the injecting program, which contains most of the code, is to be constructed. (The source code for these programs can be found here)

In order to highlight the similarities and differences between these two formats, I’ve attempted to standardize the overall process to the greatest extent possible. This involved, first and foremost, breaking each program down into a series of generic steps common to both programs. There are some steps which could not be perfectly aligned (sometimes due to difference between the file formats, sometimes because of implementation choices). These are shown in sub-steps. The steps and sub-steps are as follows:

Figure 6 — Generic steps common to our two programs.

From this we can see that the basic logic of these two programs is the same, but that there are some major differences. For example, 3a-b includes some additional steps in order to determine the payload size. Additionally, 6d shows that, in the case of the PE files, we must write the addresses of the WinAPI functions used in the payload. There is, however, one repeatedly mentioned difference: the references to segments (ELF) and sections (PE) in 4a and 6b with respect to the same sub-steps. This is somewhat confusing because the ELF format contains references to both segments and sections, and the sections of ELF and PE executables have similar names and functions; yet here it is ELF segments and PE sections which are discussed in parallel. For this reason, let’s devote a little more time to these concepts.

Sections and Segments

As mentioned above, the ELF file format contains the notions of both sections and segments. Though these two notions correspond to two independent series of headers (section headers for sections and program headers for segments), these two series of headers represent two views of the same data. The first view, known as the “linking” view, divides this data on the basis of the section headers which represent different data categories (instructions, data, symbol table, relocation information, etc.). The second view, known as the “execution” view, divides this data on the basis of the program headers which contain information about how to create the process image. Of particular importance to us are segments of the PT_LOAD type, which are mapped into memory.

Though these are indeed two ways of viewing the same data, this does not mean that there is no correspondence between sections and segments. In fact, each segment “contains” one or more sections, i.e. the data of one section is contained in one segment. Which sections are contained in which segments can be seen in the section to segment mapping table produced by the readelf utility:

Figure 7— Section to Segment mapping for target ELF executable.
Figure 8— Section headers with sections in executable load segment highlighted.

If we refer back to Figure 1, we can see that the third segment is our executable load segment and that, from this mapping (Figure 7), this segment contains the .init, .plt, .plt.got, .text and .fini sections. That these are two views which correspond to the same data can be seen from the fact that the executable load segment and the above sections are both located between offsets 0x1000 and 0x125c. In the case of the executable load segment, 0x1000 is Offset and 0x125c is equal to Offset (0x1000) + MemSiz (0x25c). In the case of the sections already mentioned, 0x1000 is the Off of the first section, .init, and 0x125c is equal to the Off (0x1248) + Size (0x14) of the last section, .fini.
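The containment check described here can be sketched with the structures from elf.h. This is a simplified, file-offset-only test written for illustration (the function name section_in_segment is made up); readelf applies further rules, for example for sections such as .bss that occupy no space in the file.

```c
#include <elf.h>
#include <stdbool.h>

/* True when the section's bytes in the file lie entirely inside the
 * segment's bytes in the file. Simplification: only file offsets are
 * compared, so sections without file contents are not handled. */
bool section_in_segment(const Elf32_Shdr *sh, const Elf32_Phdr *ph)
{
    return sh->sh_offset >= ph->p_offset
        && sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

Using the figures above, a .fini section at 0x1248 with size 0x14 ends at 0x125c, which is exactly the end of a segment at offset 0x1000 whose file size is 0x25c, so the check succeeds.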

The PE format, on the other hand, lacks a distinction between linking and execution views and only contains the notion of sections. Thus, while the ELF executable contains program headers which instruct how the data (and sections) are to be mapped into memory segments, the PE format simply maps relevant sections into memory.

Now that we understand how the programs will work and some of the basic underlying concepts, let’s take a look at these programs.

1) Open and Map Target File

The first step of our programs is to open the file, get the file size and map it into memory. In both programs, the steps are the same but the functions which we will use are different. Specifically, we are using standard C library and POSIX functions (such as mmap) for the ELF executable and WinAPI functions for the PE equivalent. They can be seen in the source code linked above. Since all of this is well documented and doesn’t pertain to the ELF and PE file formats, let’s skip discussing how these functions work, what they return, etc. and move on to the next section.
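For reference, a minimal sketch of this step on Linux might look like the following. The helper name map_file is hypothetical and the linked source may differ in detail; on Windows the corresponding calls would presumably be CreateFile, CreateFileMapping and MapViewOfFile.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read/write and shared, so that edits made through the
 * mapping reach the file on disk. Returns the mapping address and stores
 * the file size in *size_out, or returns NULL on failure. */
void *map_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return addr;
}
```

The caller is expected to release the mapping with munmap(addr, size) once the target has been patched.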

2) Get Target File Header(s)

The next step is to find the file headers. It is at this point that differences in the two programs and formats begin to emerge. A header is a structure or a series of structures, usually at the beginning or “head” of a file, which contains important information about the file. The ELF executable is simpler in that it has only one file header. Its contents are as follows:

Figure 9— ELF header structure.
Figure 10— ELF Header of our target executable as shown by the readelf utility.

Figures 9 and 10 show the structure of the ELF header and the contents of our target executable’s ELF header. This header contains a lot of important information, but let’s focus on what it has to say about the program’s sections and segments. In particular, it contains the offset of the program and section headers (e_phoff = 52 bytes and e_shoff = 14329 bytes, respectively), the size of those headers (e_phentsize = 32 bytes and e_shentsize = 40 bytes), and their quantity (e_phnum = 11 and e_shnum = 30). This information will be important, as we shall see later, for locating these headers and iterating over them.

The ELF header is found, using structures contained in the elf.h header file, by casting the pointer to the beginning of the file in memory (returned by the mmap function) to a pointer to the Elf32_Ehdr struct:

Figure 11 — Code to find the ELF Header where target_addr is the address of the mapping returned by the mmap function.
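Since the figure is an image, here is a hedged sketch of what such a cast can look like. The function name get_elf32_header and the sanity checks on the magic bytes and the file class are additions for illustration and are not necessarily part of the original code.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Interpret the start of a mapped file as a 32-bit ELF header.
 * Returns NULL if the buffer is too small or is not a 32-bit ELF file. */
Elf32_Ehdr *get_elf32_header(void *target_addr, size_t size)
{
    if (size < sizeof(Elf32_Ehdr))
        return NULL;

    Elf32_Ehdr *ehdr = (Elf32_Ehdr *)target_addr;

    /* e_ident must begin with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;

    /* The Elf32_* structures only describe 32-bit files. */
    if (ehdr->e_ident[EI_CLASS] != ELFCLASS32)
        return NULL;

    return ehdr;
}
```

After the cast, members such as e_phoff, e_phentsize and e_phnum can be read directly through the returned pointer.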

The equivalent information in the PE format is distributed among multiple headers and sub-headers. The PE file format has two main headers: the MS-DOS and PE headers. The PE header, in turn, is divided into three elements: the signature, the COFF header and the Optional header.

The PE format begins with the MS-DOS stub, a small MS-DOS executable consisting of the MS-DOS header and a small program. The original purpose of this header was to make the file a valid MS-DOS executable which, when run in DOS mode, would indicate that the file cannot run by simply printing the message “This program cannot be run in DOS mode.” From the perspective of the PE format, the most important member of this header structure is the e_lfanew member, which specifies the offset of the PE header. The contents of the latter are the following:

Figure 12— PE header structure.

As we can see, the PE header contains three elements: the signature (“PE\0\0”, “PE” followed by two null bytes), which identifies the file as a PE format image file, and two separate headers: the file header and the optional header. Despite its name, the optional header is optional only for object files; image files such as our executable require it. Information about this header and a number of other elements similar to the elements of the ELF header are found in the file header.

Figure 13 — PE file header structure.
Figure 14 — File header of our target PE executable.

The file header contains some elements which we have already seen (machine type, number of sections, a pointer to the symbol table, etc.), but it also contains the size of the optional header: when the optional header is empty (as is the case for object files), SizeOfOptionalHeader is set to zero. Otherwise, it contains the size in bytes of the IMAGE_OPTIONAL_HEADER structure. This structure contains 31 elements:

Figure 15 —PE optional header structure.
Figure 16 — Optional header of our target PE executable.

This is probably not the place to examine each of these elements. Let’s simply call attention to those elements which mirror those of the ELF header or other information which we have mentioned. We see, for instance, AddressOfEntryPoint, which specifies the address of the entry point relative to the image base (the preferred load address, itself given by ImageBase). This header also contains the file and section alignment, notions to which we were introduced in our description of a code cave. Lastly, we can see SizeOfHeaders, the combined size of the headers, section headers included.

As was the case for the ELF file, the IMAGE_DOS_HEADER is obtained by casting the address of the mapped file to a pointer to the IMAGE_DOS_HEADER structure. (Strictly speaking, CreateFileMapping returns a handle to a mapping object; the address of the mapped view comes from MapViewOfFile.) The PE header, in turn, is obtained by adding the file offset of the PE header (e_lfanew) to the address of the DOS header. Finally, the Optional Header can then be accessed from the PE header pointer (pinh in the code) using the -> operator:

Figure 17— Code to find the MS-DOS and PE headers where lpTarget is the address returned by CreateFileMapping.
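The original code relies on the structures from windows.h. As a portable illustration of the same lookup, the sketch below reads the two fields it needs by their fixed positions instead: the “MZ” magic at offset 0 and e_lfanew at offset 0x3C of the MS-DOS header. The function name find_pe_header is hypothetical, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_MAGIC       0x5A4Du /* "MZ" read as a little-endian 16-bit value */
#define E_LFANEW_OFFSET 0x3Cu   /* position of e_lfanew in the MS-DOS header */

/* Return the file offset of the PE signature, or -1 if the buffer does not
 * look like a PE file. Assumes a little-endian host. */
long find_pe_header(const unsigned char *map, size_t size)
{
    uint16_t magic;
    uint32_t e_lfanew;

    if (size < E_LFANEW_OFFSET + sizeof e_lfanew)
        return -1;

    memcpy(&magic, map, sizeof magic);
    if (magic != DOS_MAGIC)
        return -1;

    /* e_lfanew holds the file offset of the PE header. */
    memcpy(&e_lfanew, map + E_LFANEW_OFFSET, sizeof e_lfanew);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;

    /* The PE header starts with the signature "PE\0\0". */
    if (memcmp(map + e_lfanew, "PE\0\0", 4) != 0)
        return -1;

    return (long)e_lfanew;
}
```

With windows.h available, the same offset is what gets added to the DOS header address before casting to the PE header structure.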

3) Get Payload Size

The next section of our two programs attempts to find the size of the payload which is to be injected into the code cave. While there are some major differences between how the two programs go about finding this size, these stem from the manner in which the payload is implemented. If, for instance, we had chosen to simply include this payload as shellcode (that is, as a string of hexadecimal-encoded instruction bytes), there wouldn’t have been any major differences between the two implementations; it would simply have been a matter of finding the length of each string. However, since I wanted to emphasize some of the differences in the assembly instructions which our payloads contain and since the available means of doing this are different in the two cases, the programs have some major differences in implementation.

The Windows example is quite simple in that we can use the __declspec keyword with the extended modifier “naked” to create a function (indicating the start of the payload) containing the assembly code without a prologue or epilogue. This makes it relatively simple to find the size of the payload: we create an empty function immediately after this function and then subtract the address of the former from that of the latter, which assumes that the compiler lays the two functions out in source order.

Figure 18 — Code containing payload to be injected into code cave. This payload writes “POC” to the console using the WinAPI function WriteConsoleA, which requires us to first get a handle to stdout, and then returns execution to the original entry point.
Figure 19— Code calculating the size of the payload by subtracting the address of the Payload function, which contains the instructions, from that of the empty function PayloadEnd.

The way that this is implemented in our ELF-based program is significantly more complicated. This is because Linux lacks (as far as I know) a simple way of getting the same functionality introduced by the Microsoft-based __declspec keyword and the “naked” extended modifier (a function without prologue or epilogue) in the context of a function declaration, thus inhibiting our ability to define the boundaries of the payload which we used to determine its size. In order to find the payload size while also exhibiting our assembly instructions (and with the added benefit of interacting with sections in ELF files), the payload is contained not in our injecting program, but in a separate executable. That executable is compiled from the following x86 assembly instructions:

Figure 20 — Assembly instructions of payload for ELF program contained in separate ELF binary. These instructions write ‘POC’ to stdout using the sys_write Linux syscall — rather than the WinAPI function WriteConsoleA — and then return execution to the original entry point.

Finding these instructions involves opening and mapping our payload executable and finding its headers in the same way as we have done for our target executable. It’s at this point, however, that the differences between the PE and ELF programs again emerge. Recall that, whereas the PE format has only sections, the ELF executable has both sections and segments (its “two views”) and that, from the perspective of execution, it is ELF segments that correspond to PE sections. Since we are attempting to find the instructions contained in the payload’s .text section (visible in Figure 20), and only those instructions, and since the segment containing these instructions may also contain other information, we will have to find the executable’s .text section.

Finding the .text section is — compared to the PE format — a relatively complex process, which can be seen in the following code (preceded by the ELF section header structure):

Figure 21 — ELF section header structure.
Figure 22 — Code to find the .text section of the ELF executable containing our payload and, thereby, to find the size of the payload.

First we must find the section header table. The section header table is a sequence of contiguous section headers representing all the sections which the file contains, each of the Elf32_Shdr type. This table (or, more specifically, the first header in the sequence) is obtained by adding the section header table’s file offset (e_shoff in the ELF header) to the mapping address. Each of these section headers (which together make up the section header table) can be obtained by indexing into the section header table up to the number of section headers (given in the ELF header by e_shnum).

However, this is still not enough to find the section name, which is not contained in the section header. Rather, these names are contained in the section name string table. This table is also represented by a section header, which is obtained by indexing into the section header table by the value of e_shstrndx found in the ELF header. The table itself can then be found by adding the file offset of the string table (the sh_offset member of the section name string table’s section header) to the mapping address. From there, the name of each section can be found by adding the index into the section name string table for the relevant section (the value of sh_name in each section header) to the address of the string table and casting the result to a char pointer. For each section, the section name thus obtained is compared with the string “.text”.

Having found the text section, we can acquire the size of the payload in a slightly easier way: by finding the value of the sh_size member of the section header.
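The lookup described over the last three paragraphs can be sketched as below. The function name find_section is made up for this illustration, and bounds checks on the offsets are omitted to keep the steps visible, so the sketch should only be pointed at a well-formed file.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Look up a section header by name in a mapped 32-bit ELF file.
 * Returns NULL if no section has that name. No bounds checking. */
Elf32_Shdr *find_section(void *map, const char *name)
{
    unsigned char *base = (unsigned char *)map;
    Elf32_Ehdr *ehdr = (Elf32_Ehdr *)base;

    /* The section header table starts e_shoff bytes into the file. */
    Elf32_Shdr *shdrs = (Elf32_Shdr *)(base + ehdr->e_shoff);

    /* e_shstrndx selects the header of the section name string table;
     * its sh_offset is where the names themselves are stored. */
    const char *strtab =
        (const char *)(base + shdrs[ehdr->e_shstrndx].sh_offset);

    for (int i = 0; i < ehdr->e_shnum; i++) {
        /* sh_name is an offset into the string table, not the name. */
        if (strcmp(strtab + shdrs[i].sh_name, name) == 0)
            return &shdrs[i];
    }
    return NULL;
}
```

The payload size is then the sh_size member of the header returned by find_section(map, ".text"), and the payload bytes begin at its sh_offset.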

4) Find Code Cave

The next task is to find a sufficiently sized code cave, that is, a sequence of contiguous null bytes which is greater than or equal to the size of the payload. In order to find such a code cave, we must first decide where in the file we shall look for it. In theory, this could be anywhere in memory that a sufficiently sized code cave exists, but we will take a slightly narrower approach. I’ve chosen to use the executable load segment of the ELF binary and the .text section of the PE binary. I have chosen these for two reasons: (1) this segment and section are already set to “executable” and (2) while we can certainly set other sections and segments to executable, if we are considering malware, some antivirus software will flag unexpected executable sections or segments as suspicious or malicious.

First we must locate this segment and section. In terms of our PE binary, finding the .text section is much simpler than what we saw in the previous section. It can be found in the following way (preceded by the PE section header structure):

Figure 23 — PE section header structure.
Figure 24 — Code to find PE .text section.

Whereas the ELF binary required us to add the section header offset to the mapping address to obtain the section header, the Windows headers provide a macro, IMAGE_FIRST_SECTION(), which takes the PE header as input. Furthermore, whereas, in an ELF binary, you can iterate over section headers by indexing into the section header table up to the number of section headers, with the Windows structures you can simply increment the section header pointer. Finally, the manner in which the name of the section is stored is also slightly different: unlike the ELF section header, which contains an index into the section name string table, the PE section header has a member (Name) which contains the name of the section itself, as an 8-byte, null-padded string.
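To show the name comparison without depending on windows.h, the sketch below declares its own stand-in for the 40-byte section header; the struct pe_section_header and the function find_pe_section are hypothetical names. In the real program the first header would come from IMAGE_FIRST_SECTION() and the count from the file header’s number of sections.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 40-byte PE section header, declared here so the sketch
 * compiles without windows.h. Field order follows the PE specification. */
struct pe_section_header {
    uint8_t  name[8]; /* null-padded; no terminator if 8 characters long */
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;
    uint32_t pointer_to_relocations;
    uint32_t pointer_to_linenumbers;
    uint16_t number_of_relocations;
    uint16_t number_of_linenumbers;
    uint32_t characteristics;
};

/* Walk `count` consecutive section headers, advancing the pointer each
 * time, and return the one whose name matches, or NULL if none does. */
struct pe_section_header *find_pe_section(struct pe_section_header *first,
                                          unsigned count, const char *name)
{
    uint8_t padded[8] = {0};
    size_t n = strlen(name);

    if (n > sizeof padded)
        return NULL;
    memcpy(padded, name, n); /* compare all 8 bytes, padding included */

    for (struct pe_section_header *sec = first; count > 0; count--, sec++) {
        if (memcmp(sec->name, padded, sizeof padded) == 0)
            return sec;
    }
    return NULL;
}
```

Comparing the full 8 bytes avoids treating the name as a C string, which matters because an 8-character section name has no terminating null byte.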

For our ELF binary, we will search for the beginning of the load segment which is set to executable and its size in bytes in order to iterate over it. We do this in the following way (preceded by the ELF program header structure):

Figure 25 — ELF program header structure.
Figure 26 — Code to find the beginning and size of the executable load segment in which we will search for a code cave.

To do this we first locate the first program header. This is done by adding the program header offset (the value of the member e_phoff in the ELF header) to the mapping address. The way we iterate through the program headers is different from the techniques which we have seen in the case of both the ELF section headers and PE section headers. It is a question neither of indexing into the section headers (ELF) nor of incrementing the section header (PE) but, in this case, of adding the program header size to the previous program header address. This size is the value of e_phentsize found in the ELF header. Once we are able to iterate through the headers it is a matter of locating the executable load segment, that is, the segment which is both of the PT_LOAD type (indicated by the p_type member of the program header) and which has the executable flag set (indicated by the p_flags member of the program header).
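A hedged sketch of this iteration follows; the function name find_exec_load_segment is an assumption for illustration and bounds checks are omitted.

```c
#include <elf.h>
#include <stddef.h>

/* Return the first PT_LOAD program header with the executable flag set,
 * or NULL if the file has none. No bounds checking. */
Elf32_Phdr *find_exec_load_segment(void *map)
{
    unsigned char *base = (unsigned char *)map;
    Elf32_Ehdr *ehdr = (Elf32_Ehdr *)base;

    /* The first program header sits e_phoff bytes into the file. */
    unsigned char *cursor = base + ehdr->e_phoff;

    /* Step from one header to the next by adding e_phentsize bytes. */
    for (int i = 0; i < ehdr->e_phnum; i++, cursor += ehdr->e_phentsize) {
        Elf32_Phdr *ph = (Elf32_Phdr *)cursor;
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X))
            return ph;
    }
    return NULL;
}
```

The search for a code cave would then cover the file range beginning at the returned header’s p_offset.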

Once we have found the section or segment in which the code cave will be sought, the process for finding it is both trivial and the same in each case and doesn’t require any explanation (it can be found in the source code for these programs). The same is also true for section 5 of our programs, which injects the payloads into the code cave, so we will pass over both.
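For completeness, one way to write such a scan is sketched below. It is a generic search for a run of null bytes, not necessarily the loop used in the linked source, and the name find_code_cave is made up.

```c
#include <stddef.h>

/* Scan buf[start .. start+len) for the first run of at least `needed`
 * zero bytes. Returns the offset of the run's first byte, or -1 if the
 * range contains no such run. */
long find_code_cave(const unsigned char *buf, size_t start, size_t len,
                    size_t needed)
{
    size_t run = 0;

    if (needed == 0)
        return -1;

    for (size_t i = start; i < start + len; i++) {
        run = (buf[i] == 0x00) ? run + 1 : 0;
        if (run == needed)
            return (long)(i + 1 - needed);
    }
    return -1;
}
```

One caveat is that a run of zeros is not guaranteed to be padding: it may be the tail of legitimate data, so a more careful tool would also compare the offset against the end of the section’s or segment’s data.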

6) Patch Target File

All that remains to be done is to patch the target file to run the payload. Traditionally, a code cave would also return execution to the original target file but I will skip this step (to be honest, I had a couple problems with doing this symmetrically for the two programs).

Configuring the executable to run our payload involves making the entry point “point” to the beginning of the code cave. The way this is done is slightly different in the two cases. For the ELF binary, we use the following procedure:

Figure 27 — Code to set the entry point of the target executable to point to our code cave.

First, we save the original entry point (oep), which we need to return execution to the original flow of execution. Next, we must locate the base address. This is the value of the first and lowest virtual address (p_vaddr) of the PT_LOAD segments. We find this in a similar way as we did when we searched for the executable load segment (a more precise formulation would have truncated this value to the greatest multiple of the page size less than or equal to it, but in our case the value of p_vaddr is equal to this multiple). Once this is found, we add the base address to the offset of our code cave and assign it to the e_entry member of the ELF header.
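The base address calculation, including the truncation to a page boundary mentioned in the parenthesis, might be sketched like this. The names elf_base_address, elf_entry_for_cave and CAVE_PAGE_SIZE are hypothetical, and the 4 KiB page size is an assumption taken from the machine described earlier.

```c
#include <elf.h>
#include <stdint.h>

#define CAVE_PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */

/* Lowest p_vaddr among the PT_LOAD segments, truncated to a page boundary.
 * Returns 0 if the file has no PT_LOAD segment. No bounds checking. */
uint32_t elf_base_address(void *map)
{
    unsigned char *base = (unsigned char *)map;
    Elf32_Ehdr *ehdr = (Elf32_Ehdr *)base;
    unsigned char *cursor = base + ehdr->e_phoff;
    uint32_t lowest = UINT32_MAX;
    int found = 0;

    for (int i = 0; i < ehdr->e_phnum; i++, cursor += ehdr->e_phentsize) {
        Elf32_Phdr *ph = (Elf32_Phdr *)cursor;
        if (ph->p_type == PT_LOAD && ph->p_vaddr <= lowest) {
            lowest = ph->p_vaddr;
            found = 1;
        }
    }
    return found ? (lowest & ~(CAVE_PAGE_SIZE - 1)) : 0;
}

/* New e_entry for a cave found at cave_offset in the file. Only valid when
 * the segment holding the cave is mapped at base + its file offset. */
uint32_t elf_entry_for_cave(uint32_t base, uint32_t cave_offset)
{
    return base + cave_offset;
}
```

The original entry point should be copied out of e_entry before the new value is assigned, since the payload needs it to hand control back.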

This procedure is slightly different for our PE file:

Figure 28 — Code to set the AddressOfEntryPoint of the target executable to our code cave.

Here, rather than adding the base address to the code cave offset, we find the address of the code cave relative to the image base. An address which is relative to the image base is what is known as a Relative Virtual Address (RVA) and it is in this form that the AddressOfEntryPoint member is expressed. We find this by subtracting the .text section’s PointerToRawData (the section’s file offset) from its VirtualAddress (which, in actual fact, is its RVA) and adding to this the file offset of the code cave.
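Written out, the conversion is a single expression. The function name file_offset_to_rva is hypothetical; the parameters correspond to the section header members named above.

```c
#include <stdint.h>

/* Convert a file offset inside a section to a Relative Virtual Address:
 * the distance from the start of the section's raw data, added to the
 * RVA at which the section is loaded. */
uint32_t file_offset_to_rva(uint32_t file_offset,
                            uint32_t pointer_to_raw_data,
                            uint32_t virtual_address)
{
    return file_offset - pointer_to_raw_data + virtual_address;
}
```

As an example with a made-up VirtualAddress of 0x1000 (the real value is in the section header), a cave at file offset 0xca6f in a section whose raw data begins at 0x400 maps to the RVA 0xd66f.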

The second important task is to make our payload return execution to the original entry point and the target file’s normal flow of execution (printing “Hello, world!”). This is done in both cases by overwriting the placeholder for this address in our payload with the original entry point. This address is pushed onto the stack and returned to with the ret instruction. The code to do this is pretty much the same in both cases:

Figure 29 — Code to overwrite the placeholder for the original entry point, which allows us to return to the target ELF executable’s original flow of execution.
Figure 30 — Code to perform the same task for the target PE executable.

The third important task is to augment the size of the relevant sections and segments:

Figure 31— Code to augment the size of .text section (PE).
Figure 32— Code to augment the size of the executable load segment (ELF).

Our last task, unique to our PE-related program, is to replace the placeholders for function addresses left in our payload as seen in Figure 18 (0xAAAAAAAA for GetStdHandle and 0xBBBBBBBB for WriteConsoleA):

Figure 33 — Code to find the addresses of the GetStdHandle and WriteConsoleA functions and to insert them into our payload.

In order to do this, we must first get a handle to the kernel32.dll which contains these functions using the LoadLibrary function. After having done this, we find the address of these functions using GetProcAddress. We then search our injected shellcode for the placeholders and overwrite them with the appropriate address.
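The search-and-overwrite step can be sketched independently of the WinAPI calls. The helper below, with the made-up name patch_placeholder, scans a buffer for a 32-bit marker in the host’s byte order and replaces the first match; the addresses passed to it would come from GetProcAddress in the real program. The same routine would serve for the original entry point placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace the first 4-byte occurrence of `placeholder` in buf with `value`.
 * Both are compared and written in the host's byte order.
 * Returns 1 on success, 0 if the placeholder was not found. */
int patch_placeholder(unsigned char *buf, size_t len,
                      uint32_t placeholder, uint32_t value)
{
    for (size_t i = 0; i + sizeof placeholder <= len; i++) {
        if (memcmp(buf + i, &placeholder, sizeof placeholder) == 0) {
            memcpy(buf + i, &value, sizeof value);
            return 1;
        }
    }
    return 0;
}
```

Markers such as 0xAAAAAAAA work well here because they are unlikely to occur by chance in the surrounding instruction bytes; if one did, the wrong four bytes would be overwritten.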

Compiling and running our programs

At this point, all that remains is to compile and run these two programs:

Figure 34 — Compiling and running our code for our ELF binaries.
Figure 35 — Compiling and running our code for our PE binaries.

In both cases, we compile our injecting (codecave.c) and “Hello, world!” (hello.c) programs. In the case of the ELF program, we also compile our payload (payload.asm) program. We then run the hello executables to show that they are functioning as expected. After this, we run our code cave executable to inject our payload into the hello executable. Finally, that this was successful can be seen by running the hello executable again, which, this time, prints out “POC” (Proof Of Concept) before returning to the original functionality and printing “Hello, world!” Notice that in both cases ASLR (Address Space Layout Randomization) is disabled, which allows us to know, without a lot of extra work, what the address of the original entry point will be during execution. In the case of the ELF binary, this change produces a segmentation fault (I’m not exactly sure why). Thanks for reading!

Thanks

In writing this and learning about these concepts, I drew heavily on two tutorials linked below and I’d like to shout out to their authors, dtm and pico, whoever they may be. I hope some of the clarity and generosity found in their tutorials is reflected here!

References:

https://0x00sec.org/t/pe-file-infection/401

https://0x00sec.org/t/elfun-file-injector/410

http://www.cs.yale.edu/homes/aspnes/pinewiki/attachments/ELF(20)format/ELF_format.pdf

https://docs.microsoft.com/en-us/previous-versions/bb985992(v=msdn.10)?redirectedfrom=MSDN

https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

Cyber Unbound

From philosophy to cybersecurity (and back)
