Exploring ELF files using pyelftools

Roman Storozhenko
Published in Analytics Vidhya
Oct 11, 2020

Introduction

There are many tools for exploring executable files in the ELF format. Most of them are designed to provide a single piece of information extracted from such a binary. They are great, but sometimes we need a universal yet highly specialized tool that can do much more than the standard tools. This is where pyelftools comes into play.

In this article I would like to show some usage examples of pyelftools. I don’t show how to use pyelftools itself, that is, its classes and other features, as you can find that in the documentation and the source code. Instead, I concentrate on applying the tool to particular purposes.

Prerequisites

Environment

My test environment is shown below; yours may differ:

hedin@home:~/projects/elf$ lsb_release -a
LSB Version: core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 20.04.1 LTS
Release: 20.04
Codename: focal
hedin@home:~/projects/elf$ python3 --version
Python 3.8.5

Requirements

Scripts given in this article require:

  • Python version 3.6 or higher.
  • pyelftools

Installation

Some Linux distributions ship without python3 or pip3, and we also need to install pyelftools. The commands below install everything mentioned on Debian-based distros:

sudo apt install python3-pip
pip3 install --upgrade pip
pip3 install pyelftools
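To confirm that the installation succeeded before running the scripts in this article, a quick check along these lines may help. This is only a sketch: the helper name is_installed is made up for illustration, and it relies on the fact that pyelftools is imported under the package name elftools.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by the import system."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # pyelftools is installed under the import name "elftools".
    if is_installed("elftools"):
        print("pyelftools is available")
    else:
        print("pyelftools is missing; run: pip3 install pyelftools")
```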

Usage Examples

OK, now we have installed pyelftools. But what next? How and why should we use it? I will show some output of the standard GNU Binutils tools and then provide pyelftools-based code snippets. All the code snippets are available in my ELF GitHub repository.

Segments of different sizes for in-memory and on-disk representations

The following quote from the specification provides us with information about segments in ELF-files:

An executable or shared object file’s program header table is an array of structures, each describing a segment or other information the system needs to prepare the program for execution. An object file segment contains one or more sections. Program headers are meaningful only for executable and shared object files.

From a programming point of view, the structure below represents a segment entry in the program header table of an ELF file:

typedef struct {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
} Elf32_Phdr;
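To make the layout more tangible, here is a minimal sketch that decodes one such entry with Python's standard struct module. It assumes a 32-bit, little-endian file; the names Elf32Phdr and parse_phdr are made up for this illustration, and the sample values are invented. Note that 64-bit files such as /bin/ps use a different layout (wider fields, with p_flags placed right after p_type); pyelftools picks the right one for you.

```python
import struct
from collections import namedtuple

# Field order follows the Elf32_Phdr definition above: eight 32-bit words.
Elf32Phdr = namedtuple(
    "Elf32Phdr",
    "p_type p_offset p_vaddr p_paddr p_filesz p_memsz p_flags p_align",
)

# "<" = little-endian (an assumption; big-endian files would need ">").
PHDR_FORMAT = "<8I"
PHDR_SIZE = struct.calcsize(PHDR_FORMAT)  # 32 bytes


def parse_phdr(raw: bytes) -> Elf32Phdr:
    """Decode one 32-bit program header entry from raw bytes."""
    return Elf32Phdr(*struct.unpack(PHDR_FORMAT, raw[:PHDR_SIZE]))


# Build a fake PT_LOAD (type 1) entry and decode it again.
raw = struct.pack(PHDR_FORMAT, 1, 0x1000, 0x8000, 0x8000, 0x200, 0x300, 6, 0x1000)
phdr = parse_phdr(raw)
print(hex(phdr.p_filesz), hex(phdr.p_memsz))
```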

Two members of the structure are interesting for us:

p_filesz — This member gives the number of bytes in the file image of the segment; it may be zero.

p_memsz — This member gives the number of bytes in the memory image of the segment; it may be zero.

Now we see that a segment’s size in the file and in memory can differ. The difference depends on the number and type of sections included in the segment, the alignment of those sections, and several other factors. We are interested in finding segments whose two sizes differ. First, let’s look at how the great readelf tool lets us extract this information:

hedin@home:~/projects/elf$ readelf --wide --segments /bin/ps

I reduced the full output to the segment information:

....................................................................
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0002d8 0x0002d8 R 0x8
INTERP 0x000318 0x0000000000000318 0x0000000000000318 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x009a88 0x009a88 R 0x1000
LOAD 0x00a000 0x000000000000a000 0x000000000000a000 0x00bbf1 0x00bbf1 R E 0x1000
LOAD 0x016000 0x0000000000016000 0x0000000000016000 0x006318 0x006318 R 0x1000
LOAD 0x01cf70 0x000000000001df70 0x000000000001df70 0x004190 0x025478 RW 0x1000
DYNAMIC 0x020ac0 0x0000000000021ac0 0x0000000000021ac0 0x000210 0x000210 RW 0x8
NOTE 0x000338 0x0000000000000338 0x0000000000000338 0x000020 0x000020 R 0x8
NOTE 0x000358 0x0000000000000358 0x0000000000000358 0x000044 0x000044 R 0x4
GNU_PROPERTY 0x000338 0x0000000000000338 0x0000000000000338 0x000020 0x000020 R 0x8
GNU_EH_FRAME 0x019e1c 0x0000000000019e1c 0x0000000000019e1c 0x0007b4 0x0007b4 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x01cf70 0x000000000001df70 0x000000000001df70 0x004090 0x004090 R 0x1
....................................................................

We have all of the required information, but it doesn’t appear in a form that is convenient for us. To find the appropriate segments we need to check each of them and compare the FileSiz and MemSiz columns. Let’s write our own basic script that shows the segments whose in-memory and on-disk sizes differ. At its core, this script is a simple loop that goes through all the segments of the ELF binary and prints the ones that satisfy the condition p_filesz != p_memsz:

#!/usr/bin/env python3
import sys

from elftools.elf.elffile import ELFFile

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("You must provide this script with an elf binary file you want to examine")
        sys.exit(1)

    print(f"Segments of the file {sys.argv[1]} which size on disk and in memory differs")
    with open(sys.argv[1], 'rb') as elffile:
        # Walk the program header table and report entries whose sizes differ.
        for segment in ELFFile(elffile).iter_segments():
            seg_head = segment.header
            if seg_head.p_filesz != seg_head.p_memsz:
                print(f"Type: {seg_head.p_type}\nOffset: {hex(seg_head.p_offset)}\nSize in file:{hex(seg_head.p_filesz)}\nSize in memory:{hex(seg_head.p_memsz)}")

Now let’s find required segments using this script:

hedin@home:~/projects/elf$ python3 segments.py /bin/ps
Segments of the file /bin/ps which size on disk and in memory differs
Type: PT_LOAD
Offset: 0x1cf70
Size in file:0x4190
Size in memory:0x25478

As you can see, only one segment of the standard /bin/ps tool has different sizes in memory and on disk.
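Why is this segment larger in memory? According to the ELF specification, when p_memsz exceeds p_filesz the extra bytes follow the initialized part of the segment and hold the value 0; typically this is the room reserved for the .bss section. The sketch below simply works out that gap from the FileSiz and MemSiz values in the readelf output shown earlier. The helper name zero_filled_bytes is hypothetical.

```python
def zero_filled_bytes(p_filesz: int, p_memsz: int) -> int:
    """Number of bytes that exist only in the memory image of a segment."""
    return max(p_memsz - p_filesz, 0)


# (type, p_filesz, p_memsz) for the four LOAD segments of /bin/ps shown earlier.
load_segments = [
    ("LOAD", 0x9A88, 0x9A88),
    ("LOAD", 0xBBF1, 0xBBF1),
    ("LOAD", 0x6318, 0x6318),
    ("LOAD", 0x4190, 0x25478),
]

for seg_type, filesz, memsz in load_segments:
    extra = zero_filled_bytes(filesz, memsz)
    if extra:
        # Only the last LOAD segment qualifies: 0x25478 - 0x4190 = 0x212e8.
        print(f"{seg_type}: {hex(extra)} bytes are not backed by the file")
```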

Representation of mappings between segments and sections

From the previous example we know that each segment can contain many sections. The readelf tool gives us the mapping between the former and the latter. I reduced the output to the mapping itself, as there is a lot of information that does not relate to the mapping. This is what it looks like:

hedin@home:~/projects/elf$ readelf --segments --wide /bin/ps

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .plt.got .plt.sec .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .data.rel.ro .dynamic .got .data .bss
06 .dynamic
07 .note.gnu.property
08 .note.gnu.build-id .note.ABI-tag
09 .note.gnu.property
10 .eh_frame_hdr
11
12 .init_array .fini_array .data.rel.ro .dynamic .got

Well… Not very informative or user-friendly, is it? To fix this, I created a little script that shows the mapping in a more convenient form. But before I present the script, I will share the definition of the section header structure from the ELF specification:

typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;

This structure shows us that a section can be identified by its name (sh_name). In addition, I would like to highlight one more interesting member:

sh_addr — If the section will appear in the memory image of a process, this member gives the address at which the section’s first byte should reside. Otherwise, the member contains 0.

I think that a section in the script’s output should be identified by its name and its address, like this: (section_name, sh_addr).
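One detail worth knowing: in the file itself, sh_name is not the text of the name but a byte offset into the section header string table, where names are stored as NUL-terminated strings. pyelftools performs this lookup for us and exposes the result as section.name. The following is a rough sketch of the lookup on a hand-made string table; the helper section_name and all the values are invented for illustration.

```python
def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Read the NUL-terminated string that starts at offset `sh_name`."""
    end = shstrtab.index(b"\x00", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# A hand-made string table; offset 0 holds the empty name.
shstrtab = b"\x00.interp\x00.text\x00.bss\x00"

# Pair each resolved name with an address in the (section_name, sh_addr) form.
# The offsets match the table above; the addresses are made up.
for sh_name, sh_addr in [(1, 0x318), (9, 0xA000), (15, 0x21000)]:
    print((section_name(shstrtab, sh_name), hex(sh_addr)))
```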

What does the script do? It’s quite simple: it collects the segments, assigns every section to each segment that contains it, and then prints each segment together with its sections:

#!/usr/bin/env python3
import sys

from elftools.elf.elffile import ELFFile

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("You must provide this script with an elf binary file you want to examine")
        sys.exit(1)

    print(f"Mapping between segments and sections in the file {sys.argv[1]}")
    with open(sys.argv[1], 'rb') as f:
        elffile = ELFFile(f)

        # One entry per segment: the segment itself plus the sections it contains.
        segments = [{'segment': segment, 'sections': []}
                    for segment in elffile.iter_segments()]

        # A section may belong to several segments, so test it against each one.
        for section in elffile.iter_sections():
            for segment in segments:
                if segment['segment'].section_in_segment(section):
                    segment['sections'].append(section)

        for segment in segments:
            seg_head = segment['segment'].header
            print("Segment:")
            # Note: p_paddr is printed in decimal, unlike the other fields; wrap it in hex() if you prefer.
            print(f"Type: {seg_head.p_type}\nOffset: {hex(seg_head.p_offset)}\nVirtual address: {hex(seg_head.p_vaddr)}\nPhysical address: {(seg_head.p_paddr)}\nSize in file: {hex(seg_head.p_filesz)}\nSize in memory: {hex(seg_head.p_memsz)}\n")
            if segment['sections']:
                print("Segment's sections:")
                print([(section.name, hex(section['sh_addr'])) for section in segment['sections']])
            else:
                print('Segment contains no sections')
            print('\n--------------------------------------------------------------------------------')

The following is the output of this script (I cut off most of the segments to reduce the size):

hedin@home:~/projects/elf$ python3 segments_sections.py /bin/ps
Mapping between segments and sections in the file /bin/ps
Segment:
Type: PT_PHDR
Offset: 0x40
Virtual address: 0x40
Physical address: 64
Size in file: 0x2d8
Size in memory: 0x2d8
Segment contains no sections
----------------------------------------------------------------
Segment:
Type: PT_INTERP
Offset: 0x318
Virtual address: 0x318
Physical address: 792
Size in file: 0x1c
Size in memory: 0x1c
Segment's sections:
[('.interp', '0x318')]
----------------------------------------------------------------
Segment:
Type: PT_LOAD
Offset: 0x0
Virtual address: 0x0
Physical address: 0
Size in file: 0x9a88
Size in memory: 0x9a88
Segment's sections:
[('', '0x0'), ('.interp', '0x318'), ('.note.gnu.property', '0x338'), ('.note.gnu.build-id', '0x358'), ('.note.ABI-tag', '0x37c'), ('.gnu.hash', '0x3a0'), ('.dynsym', '0x3e8'), ('.dynstr', '0xdc0'), ('.gnu.version', '0x121e'), ('.gnu.version_r', '0x12f0'), ('.rela.dyn', '0x13a0'), ('.rela.plt', '0x91e8')]
----------------------------------------------------------------
....................................................................
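The section_in_segment call does the real work in the script above. pyelftools applies a more elaborate set of rules there; the sketch below is a deliberately simplified approximation that only compares address ranges, so it should not be read as the library's actual logic. The function name address_range_contains is mine, and the numbers come from the readelf output shown earlier.

```python
def address_range_contains(p_vaddr: int, p_memsz: int,
                           sh_addr: int, sh_size: int) -> bool:
    """Simplified check: does the section's address range lie inside the segment's?"""
    seg_end = p_vaddr + p_memsz
    return p_vaddr <= sh_addr and sh_addr + sh_size <= seg_end


# .interp starts at 0x318; its size (0x1c) is taken from the INTERP segment,
# which contains nothing but this section.
interp = (0x318, 0x1C)

print(address_range_contains(0x318, 0x1C, *interp))     # INTERP segment
print(address_range_contains(0x0, 0x9A88, *interp))     # first LOAD segment
print(address_range_contains(0xA000, 0xBBF1, *interp))  # second LOAD segment
```

The first two checks succeed and the third fails, which agrees with .interp being listed under both the PT_INTERP segment and the first PT_LOAD segment in the output above.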

Sections that do not reside in memory

My last example shows some special sections that are not loaded into memory, that is, sections whose sh_addr == 0:

#!/usr/bin/env python3
import sys

from elftools.elf.elffile import ELFFile

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("You must provide this script with an elf binary file you want to examine")
        sys.exit(1)

    print(f"Sections of the file {sys.argv[1]} that are not loaded into memory")
    with open(sys.argv[1], 'rb') as elffile:
        for section in ELFFile(elffile).iter_sections():
            # sh_addr is 0 for sections that get no address in the process image.
            if not section.header.sh_addr:
                print(section.name)

This is the output for /bin/ps:

hedin@home:~/projects/elf$ python3 sections_not_in_memory.py /bin/ps
Sections of the file /bin/ps that are not loaded into memory
.gnu_debuglink
.shstrtab
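Testing sh_addr works well for executables and shared objects such as /bin/ps. In relocatable object files (.o), however, sections generally have sh_addr set to 0 because addresses have not been assigned yet, so the SHF_ALLOC bit of sh_flags is a more direct indicator there. Below is a minimal sketch of that alternative check; the flag values come from the ELF specification, while the helper name occupies_memory and the sample flag combinations are only illustrative.

```python
SHF_WRITE = 0x1      # section is writable at run time
SHF_ALLOC = 0x2      # section occupies memory during process execution
SHF_EXECINSTR = 0x4  # section contains executable instructions


def occupies_memory(sh_flags: int) -> bool:
    """Return True if the SHF_ALLOC bit is set in a section's sh_flags."""
    return bool(sh_flags & SHF_ALLOC)


# Typical flag combinations (illustrative values, not read from /bin/ps).
print(occupies_memory(SHF_ALLOC | SHF_EXECINSTR))  # e.g. .text
print(occupies_memory(SHF_ALLOC | SHF_WRITE))      # e.g. .data
print(occupies_memory(0))                          # e.g. .shstrtab
```

With pyelftools the flags of a section are available as section['sh_flags'], in the same way the earlier script reads section['sh_addr'].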

Summary

pyelftools is a very flexible and convenient tool for examining ELF binaries. Its scope goes far beyond the simple examples given in this article, and it allows you to create full-fledged exploration tools.

References

  1. Executable and Linkable Format
  2. pyelftools
  3. GNU Binutils
  4. ELF scripts repository
