Embedded Linux — Boot-time reduction

Michael Wiesing
Published in smartmechatronics
6 min read, Jul 8, 2021

Nowadays, embedded devices commonly use Systems on Chip (SoCs) that run a Linux OS. A frequent requirement is that such a device can be turned on and is ready to use within a short time. This is understandable, especially when you think of a smart watch, mobile phone, kitchen appliance or other everyday electronics.

Many manufacturers of these SoCs do not deliver a Linux customized to your product's special needs, but a Board Support Package (BSP) that you can use to build a reference image with tools like BitBake and the Yocto Project.

As this reference image is usually not optimal for your use case, boot-time reduction is an important topic during development. In this article, I describe how we reduced the boot time of the device shown in this video. The Linux OS is built with Yocto, and we decided to use “Das U-Boot” as the bootloader to demonstrate how a speed-up can be achieved with this very commonly used component.

In the linked video we use a Raspberry Pi 4 with a touch-GUI. This GUI consists of different pages, each showing different information or giving the user the possibility to interact with the device.

The main requirement is that the user can access the user interface as soon as possible. Therefore, the application providing the graphical user interface has to start as early as possible, whereas other applications may be started later.

Boot process

Boot process used on the device. Colored boxes mark the components targeted for optimization.

The figure above shows all components in the boot process. The colored components can be optimized on every embedded Linux device, while the other ones are target-specific. In the following sections, I look at each of them one by one. I show which tools you can use for measuring timings, analyzing the measured data and identifying the items that slow down the boot process. Additionally, I show some possible actions to reduce the boot time and demonstrate the resulting speed-up.

Init-System and Applications

Let us start with the processes in the user space of the device. On most modern embedded Linux systems, systemd is the first process started after the kernel. It starts system applications and services in a common, default order. This means that applications, such as a graphical application, have to wait until the services ordered before them have been started.

Tools used for measuring:
— Grabserial
— systemd-analyze

Typical optimizations:
— Reordering the applications' startup
— Starting the GUI application before the init application

Starting point: The applications to show sensor data on the touch-GUI are divided into two different programs. The sensor application gathers all the sensor data and sends it out to an internal messaging bus. The graphical application controls the touch-GUI and reads the sensor data from the internal messaging bus.

systemd controls both applications. It starts the graphical application as part of the default boot target multi-user.target. The sensor application is a dependency of the graphical application and is therefore started automatically beforehand: systemd waits until the sensor application has started before it starts the graphical application.

Actions: The first action is to start the graphical application in an earlier boot target. Choosing an earlier boot target means that the systemd service for the graphical application must explicitly declare all dependencies on other services, devices or targets it needs. Other default dependencies can be removed.

In addition, the application's internal startup sequence is modified: the graphical initialization no longer waits for the messaging bus and can run before the bus is up.

Before: Output of systemd-analyze for the graphical application
After: Output of systemd-analyze for the graphical application (modified boot sequence)
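
To compare such measurements across iterations, the text output of systemd-analyze can be ranked with a few lines of Python. The following is a minimal sketch, assuming lines in the style printed by `systemd-analyze blame` (a time span followed by a unit name). The unit names and durations in the sample are hypothetical and are not taken from the device described here.

```python
import re

# Unit suffixes used in systemd-style time spans, mapped to seconds.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|min|s|h)$")


def parse_timespan(tokens):
    """Convert tokens such as ['1min', '2.5s'] into seconds."""
    total = 0.0
    for token in tokens:
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized time token: {token!r}")
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total


def parse_blame(text):
    """Return (unit, seconds) pairs, slowest unit first."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or incomplete lines
        *time_tokens, unit = parts
        entries.append((unit, parse_timespan(time_tokens)))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Hypothetical sample; unit names and values are made up for illustration.
SAMPLE = """\
     1.200s example-gui.service
      350ms example-sensor.service
1min 2.000s example-slow.service
"""

for unit, seconds in parse_blame(SAMPLE):
    print(f"{seconds:8.3f}s  {unit}")
```

Note that the time a unit needs to initialize says nothing about ordering; whether the graphical application waits for another unit still has to be read from its dependencies.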

Results:
Before optimization: 3.56s
After: 0.55s
Thus we saved about 3.01s

These changes already save a lot of time. A few hundred milliseconds more could be squeezed out by starting the graphical application before systemd, as part of the initramfs. As this is more time-consuming to implement, we move on to the next component.

Kernel

Next we take a look at the kernel, which is responsible for setting up the interfaces, loading drivers and passing control to the init process.

Tools used for measuring:
— Grabserial
— Bootgraph

Typical optimizations:
— Removing unneeded peripherals
— Moving the loading of kernel modules that are needed later to application space
— Removing unneeded kernel modules
— Removing console output

Starting point: Default kernel configuration of the Raspberry Pi (shipped with the Yocto layer).

Actions:
First of all, it is necessary to set up the measurement of the kernel start sequence.
To measure with bootgraph, we recompile the kernel with CONFIG_PRINTK_TIME enabled and add initcall_debug to the kernel command line.
Now we can take a first measurement of the starting point by booting the device and executing dmesg > boot.log after logging in. This results in a boot log like the one shown below.

Log created with dmesg showing impact of probing serial device

To get a first impression of the boot process, we can obtain a graphical representation by executing linux/scripts/bootgraph.pl boot.log > boot.svg. This command must be run from within the kernel repository.

Bootgraph visualization of the boot.log

In the log we see that probing the serial device takes a long time. In the graph we can see that the vc4 driver also takes a long time. Neither device is needed at this stage of the boot process, so their drivers are compiled as modules and loaded later.
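
Besides reading the log by eye, the slowest initcalls can be extracted from boot.log with a short script. This is a minimal sketch, assuming the usual initcall_debug line format ("initcall <symbol> returned <code> after <n> usecs"). The function names and durations in the sample are hypothetical placeholders, not the values measured on the device.

```python
import re

# Matches lines such as:
# [    1.234567] initcall foo_init+0x0/0x40 returned 0 after 1234 usecs
# An optional " [module]" suffix after the symbol is tolerated.
INITCALL = re.compile(
    r"initcall (?P<symbol>\S+)(?: \[\S+\])? "
    r"returned (?P<code>-?\d+) after (?P<usecs>\d+) usecs"
)


def slowest_initcalls(log_text, limit=5):
    """Return up to `limit` (function, microseconds) pairs, slowest first."""
    calls = []
    for line in log_text.splitlines():
        match = INITCALL.search(line)
        if match:
            # Drop the "+0x0/0x40" offset part of the symbol, if present.
            name = match.group("symbol").split("+")[0]
            calls.append((name, int(match.group("usecs"))))
    return sorted(calls, key=lambda call: call[1], reverse=True)[:limit]


# Hypothetical log excerpt; names and timings are made up for illustration.
SAMPLE_LOG = """\
[    0.100000] calling  example_fast_init+0x0/0x20 @ 1
[    0.100500] initcall example_fast_init+0x0/0x20 returned 0 after 480 usecs
[    0.900000] initcall example_serial_init+0x0/0x40 returned 0 after 750000 usecs
[    1.300000] initcall example_gpu_init+0x0/0x30 [example_gpu] returned 0 after 390000 usecs
"""

for name, usecs in slowest_initcalls(SAMPLE_LOG):
    print(f"{usecs / 1000:10.1f} ms  {name}")
```

For a real log, the file content can be passed in instead of the sample, for example `slowest_initcalls(open("boot.log").read())`.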

After a few iterations a few more drivers are removed or modularized.

Results:
Before optimization: 1.9s
After: 0.6s
Thus we saved about 1.3s

Bootloader

Last but not least, we look into “Das U-Boot”. It is a very common bootloader on embedded Linux devices and supports many devices out of the box. This generic implementation gives us the opportunity to make target-specific customizations.

Tools used for measuring:
— Grabserial
— Tracing

Typical optimizations:
— Removing the boot delay
— Creating a custom start script
— Removing unneeded functions and hardware support
— Reducing the size of kernel and device tree

Starting point: The default U-Boot configuration has a 2-second boot delay and a generic boot script, which enables booting over USB, Ethernet and SD card.
The boot script tests which hardware is available and whether it can load the kernel and device tree from it, and uses the first working one. For each of the possible interfaces, drivers are loaded at boot time.
The kernel and device tree are loaded from a FAT partition of a standard SD card (SDHC-1, class 10). The kernel is about 6MB in size, which results in a load time of roughly 1.3s during boot.
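
The two figures above give a rough effective read throughput, which helps to estimate what a smaller kernel would gain. The following is a back-of-the-envelope sketch. It assumes that load time scales linearly with image size, and the 4 MB target size is a hypothetical example, not a measured result.

```python
KERNEL_SIZE_MB = 6.0  # approximate kernel size given in the text
LOAD_TIME_S = 1.3     # approximate load time given in the text


def effective_throughput(size_mb, seconds):
    """Effective read throughput in MB/s."""
    return size_mb / seconds


def estimated_load_time(size_mb, throughput_mb_s):
    """Estimated load time in seconds, assuming constant throughput."""
    return size_mb / throughput_mb_s


throughput = effective_throughput(KERNEL_SIZE_MB, LOAD_TIME_S)
print(f"effective throughput: {throughput:.1f} MB/s")

# Hypothetical smaller kernel of 4 MB, same throughput assumed.
smaller = estimated_load_time(4.0, throughput)
print(f"estimated load time for 4 MB: {smaller:.2f} s")
```

With these numbers, every megabyte removed from the kernel would be worth roughly 0.2s of load time, as long as the linear assumption holds.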

Actions:
Removing the boot delay is easy and has a huge impact. As the device is meant to boot only from SD card, the start script is rewritten so that it loads the kernel and device tree directly. The drivers for booting over network or USB are removed.
Additionally, the commands U-Boot offers by default are stripped to the bare minimum.

Results: Removing the boot delay saved 2s, rewriting the start script ~190ms, and removing modules and commands ~200ms.
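
Delays like these show up as gaps in a timestamped serial log. The sketch below assumes a log captured with Grabserial with timestamps enabled, where each line starts with "[elapsed delta]" in seconds; this prefix format is an assumption about the capture settings. The log lines and timings in the sample are hypothetical.

```python
import re

# Assumed prefix: "[<elapsed seconds> <seconds since previous line>] text"
STAMPED = re.compile(
    r"^\[\s*(?P<elapsed>\d+\.\d+)\s+(?P<delta>\d+\.\d+)\]\s?(?P<text>.*)$"
)


def largest_gaps(log_text, limit=3):
    """Return up to `limit` (delta, elapsed, text) tuples, largest gap first."""
    rows = []
    for line in log_text.splitlines():
        match = STAMPED.match(line)
        if match:
            rows.append(
                (
                    float(match.group("delta")),
                    float(match.group("elapsed")),
                    match.group("text"),
                )
            )
    return sorted(rows, reverse=True)[:limit]


# Hypothetical capture; the timings are made up for illustration.
SAMPLE = """\
[0.000001 0.000001] U-Boot example banner
[0.400000 0.399999] Hit any key to stop autoboot
[2.500000 2.100000] Loading kernel
[3.700000 1.200000] Starting kernel ...
"""

for delta, elapsed, text in largest_gaps(SAMPLE):
    print(f"+{delta:.3f}s at {elapsed:.3f}s before: {text}")
```

The delta belongs to the time before a line was printed, so a large gap points at whatever ran after the previous line. In the sample, the wait before "Loading kernel" would be the autoboot countdown.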

The kernel size is not reduced at this point. This kind of optimization is more complex and thus more time-consuming. Besides, the kernel size is commonly reduced anyway while hardening the kernel.

Conclusion

All modifications together lead to a total speed-up of 6.71s. Getting to that point took roughly 3 days of work.
Further boot-time reduction is still possible in user space, kernel and bootloader. For instance, starting the graphical application before the init system can save about 0.5s. In the kernel's startup sequence we could save a few hundred milliseconds by examining more parts of the kernel. Reducing the kernel's size leads to faster loading, so removing all unneeded modules and features is worthwhile.
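
As a quick cross-check, the individual savings reported in the sections above can be added up. The snippet uses only the rounded figures from this article; the step labels are my own shorthand. The rounded values sum to about 6.70s, and the small difference from the quoted total is presumably due to rounding of the individual measurements (an assumption).

```python
# Savings per step in seconds, using the rounded figures from the article.
SAVINGS_S = {
    "init system and applications": 3.01,
    "kernel": 1.30,
    "U-Boot: boot delay": 2.00,
    "U-Boot: start script": 0.19,
    "U-Boot: modules and commands": 0.20,
}

total = sum(SAVINGS_S.values())

for step, seconds in SAVINGS_S.items():
    print(f"{step:<32}{seconds:5.2f} s")
print(f"{'total':<32}{total:5.2f} s")
```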

Thanks for reading and have a nice time with Linux.
