Realtime Linux

Setting up and operating the RT kernel

Patrick Dahlke
Feb 6, 2018
Source: https://nowardev.files.wordpress.com/2010/11/power_up_linux.jpg

To meet the requirements of a real-time system, a system must react to an external event, such as an interrupt, within a defined time frame. To that end, several mechanisms, configurations and implementation rules have to be considered. Giving the Linux kernel real-time capabilities is not difficult:

  • download the kernel
  • download the PREEMPT_RT patch
  • patch the kernel
  • build the kernel
  • restart your system
  • choose the RT kernel

It is also easy to check whether the response behavior of the newly created kernel has actually improved:

  • start cyclictest and wait for a few hours
  • analyze and judge the result

What can be done if excessive latencies are found, i.e. if the kernel occasionally starts the user program too late? There are different measurement methods for tracking down the cause, but applying them is not quite as easy as building the RT kernel.

For this reason, the individual measurement methods

  • breaktrace with subsequent trace analysis
  • continuous latency recording with peak detection

are described and explained with examples. In addition, there are frequently occurring latency sources such as

  • frequency modulation and
  • sleeping phases

which must be avoided.

Recapitulation: How do I create a realtime capable Linux kernel, e.g. on an Intel PC with a standard distribution?

First select the appropriate RT kernel, whose version is as close as possible to the kernel version of the respective distribution. To check the kernel version on Linux, use the uname command:

$ uname -r

or, as an alternative, the cat command:

$ cat /proc/version
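
As a small sketch, the leading numeric part of such a version string can be isolated with standard tools, which makes it easier to compare against the versions listed on the RT download page. The function name kernel_numbers is made up for this illustration, and the example assumes the string starts with three dot-separated numbers.

```shell
# Print the leading numeric components of a kernel version string,
# separated by spaces. kernel_numbers is a hypothetical helper name.
kernel_numbers() {
    # Drop everything from the first character that is neither a digit
    # nor a dot, then turn the dots into spaces.
    printf '%s\n' "$1" | sed 's/[^0-9.].*$//' | tr '.' ' '
}

# Usage with the running kernel:
kernel_numbers "$(uname -r)"
```

Called with the string 4.9.51-1, the function prints 4 9 51.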

For example, you can obtain the following information: linux-image-4.9.0-4-amd 4.9.51-1. This means you have kernel version 4, patch level 9 and sublevel 51. On the download page of the Linux RT project you will find that the RT patch patch-4.9.47-rt37.patch.xz is available there. This is the RT patch that is closest to the current non-RT version. The following commands download the matching source code and the patch, unpack the archive and apply the patch:

mkdir -p /usr/src/kernels
cd /usr/src/kernels
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.9.47.tar.xz
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.9/older/patch-4.9.47-rt37.patch.xz
tar xf linux-4.9.47.tar.xz
mv linux-4.9.47 linux-4.9.47-rt37
cd linux-4.9.47-rt37
xz -d ../patch-4.9.47-rt37.patch.xz
patch -p1 <../patch-4.9.47-rt37.patch
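
For other kernel versions, the two download URLs used above can be derived from the version numbers. The following is only a sketch: the helper name rt_download_urls is made up, it assumes a three-part version such as 4.9.47, and it assumes that the patch sits in the "older" subdirectory as in the commands above, which is not the case for every patch.

```shell
# Print the kernel tarball URL and the RT patch URL, one per line.
# $1: upstream kernel version, e.g. 4.9.47
# $2: RT suffix, e.g. rt37
# rt_download_urls is a hypothetical helper; its variables are global.
rt_download_urls() {
    kver=$1
    rtver=$2
    major=${kver%%.*}     # "4" for 4.9.47
    series=${kver%.*}     # "4.9" for 4.9.47
    base=https://cdn.kernel.org/pub/linux/kernel
    echo "$base/v$major.x/linux-$kver.tar.xz"
    # Assumption: the patch is stored under "older", as in the example above.
    echo "$base/projects/rt/$series/older/patch-$kver-$rtver.patch.xz"
}

rt_download_urls 4.9.47 rt37
```

With the arguments 4.9.47 and rt37 this prints exactly the two URLs passed to wget above.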

To ensure that the RT kernel supports the current distribution as well as possible, the first step is to take over the distribution's kernel configuration. On Debian, as on many other distributions, it is located in the /boot directory in a file whose name matches the kernel version number. In this case, it is the file /boot/config-4.9.0-4-amd64, which is now copied to the root directory of the kernel source code under the name .config:

cp /boot/config-4.9.0-4-amd64 .config

In the last step before the kernel can be compiled, the new kernel has to be configured so that the functionality introduced by the RT patch is actually used. Run the command make menuconfig and select Processor type and features -> Preemption Model -> Fully Preemptible Kernel (RT).
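
Before starting the long build, it can be worth checking that the selection actually ended up in .config. The sketch below assumes that the menu entry corresponds to the option CONFIG_PREEMPT_RT_FULL, which is the name I would expect for the 4.9 RT patch series but which the steps above do not state; the function name is made up as well.

```shell
# Succeed if the given kernel config file enables the fully preemptible model.
# Assumption: "Fully Preemptible Kernel (RT)" sets CONFIG_PREEMPT_RT_FULL=y.
# rt_enabled_in_config is a hypothetical helper name.
rt_enabled_in_config() {
    grep -q '^CONFIG_PREEMPT_RT_FULL=y$' "$1"
}

# Usage from the root directory of the kernel source code:
if rt_enabled_in_config .config 2>/dev/null; then
    echo "RT preemption model selected"
else
    echo "RT preemption model not selected (or .config not found)"
fi
```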

The kernel is then compiled and installed, and the system is rebooted.

make -j4 
make modules_install install
reboot

During the restart we select the RT kernel in the boot menu.

Now we have to verify the system’s response behavior.
As a first important step after rebooting, you should make sure that the new kernel is properly configured — at least formally. This is indicated by whether the flags PREEMPT and RT are included in the output of the program uname.

uname -v | cut -d" " -f1-4 
#1 SMP PREEMPT RT

which is obviously the case here. A much better proof, however, is to create asynchronous events and check how long it takes until a user-space process with real-time priority is able to react to them. The expected time span can be calculated with the rule of thumb

maximum_latency = clock_interval * 10⁵

This means that, for example, in a system with a clock frequency of 1 GHz and thus a clock interval of 1 ns, a maximum latency of less than 100 µs can be expected. The test program cyclictest, which is included in the rt-tests suite, has proven itself for measuring the maximum reaction time of a user-space process and can be obtained from the following repository:

git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
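
Before running the measurement, the rule of thumb given earlier can be evaluated for the machine at hand, so that there is a target value to compare the cyclictest result against. This is a minimal sketch with a made-up function name; it takes the clock frequency in MHz and uses integer arithmetic, so the result is rounded down.

```shell
# Rule of thumb: maximum_latency = clock_interval * 10^5.
# For a clock frequency f in MHz the clock interval is 1/f µs,
# so the expected maximum latency in µs is 100000 / f.
# expected_max_latency_us is a hypothetical helper name.
expected_max_latency_us() {
    echo $(( 100000 / $1 ))
}

expected_max_latency_us 1000   # 1 GHz, prints 100
```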

The program cyclictest also contains a histogram function with which so-called latency plots in the standard form can be produced. In this standard form, latency classes are plotted on the x-axis in µs with a granularity of one microsecond, and on the y-axis the frequencies of measurements per class are displayed in logarithmic representation. This is a histogram with the special feature that the logarithmic y-axis shows both very low and very high frequencies. A shell script can be used to execute cyclictest and generate a standard latency plot from the measured values:

#!/bin/bash

# 1. Run cyclictest
cyclictest -l100000000 -m -Sp90 -i200 -h400 -q >output

# 2. Get maximum latency
max=`grep "Max Latencies" output | tr " " "\n" | sort -n | tail -1 | sed s/^0*//`

# 3. Grep data lines, remove empty lines and create a common field separator
grep -v -e "^#" -e "^$" output | tr " " "\t" >histogram

# 4. Set the number of cores, for example
cores=4

# 5. Create two-column data sets with latency classes and frequency values for each core, for example
for i in `seq 1 $cores`
do
  column=`expr $i + 1`
  cut -f1,$column histogram >histogram$i
done

# 6. Create plot command header
echo -n -e "set title \"Latency plot\"\n\
set terminal png\n\
set xlabel \"Latency (us), max $max us\"\n\
set logscale y\n\
set xrange [0:400]\n\
set yrange [0.8:*]\n\
set ylabel \"Number of latency samples\"\n\
set output \"plot.png\"\n\
plot " >plotcmd

# 7. Append plot command data references
for i in `seq 1 $cores`
do
  if test $i != 1
  then
    echo -n ", " >>plotcmd
  fi
  cpuno=`expr $i - 1`
  if test $cpuno -lt 10
  then
    title=" CPU$cpuno"
  else
    title="CPU$cpuno"
  fi
  echo -n "\"histogram$i\" using 1:2 title \"$title\" with histeps" >>plotcmd
done

# 8. Execute plot command
gnuplot -persist <plotcmd
# Source: https://www.osadl.org/uploads/media/mklatencyplot.bash

The script requires the prior installation of the gnuplot program, which is included in most Linux distributions, and an image viewer to display the latency plot plot.png that it generates.
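
If only the numbers are of interest and not the plot, the "Max Latencies" line that step 2 of the script greps for can also be summarized per core. The following is a sketch with a hypothetical function name; the sample line is made up for illustration and merely mimics the format the script expects (zero-padded values, one per core).

```shell
# Print the per-core maxima and the overall maximum from the
# "# Max Latencies:" line of a cyclictest histogram output file.
# summarize_max_latencies is a hypothetical helper name.
summarize_max_latencies() {
    awk '/Max Latencies/ {
        max = 0
        # Fields 1 to 3 are "#", "Max" and "Latencies:"; values start at 4.
        for (i = 4; i <= NF; i++) {
            v = $i + 0                      # numeric value, leading zeros dropped
            printf "CPU%d: %d us\n", i - 4, v
            if (v > max) max = v
        }
        printf "overall: %d us\n", max
    }' "$1"
}

# Made-up sample data, not a real measurement:
sample=$(mktemp)
printf '# Max Latencies: 00019 00015 00012 00017\n' >"$sample"
summarize_max_latencies "$sample"
rm -f "$sample"
```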

Figure 1: good realtime properties of a multicore (4 CPU) architecture

If the script is used unchanged, it runs for more than 5 hours and collects data from 100 million cycles. For a meaningful result, suitable stress scenarios must be set up during the measurement. A typical result is shown in Figure 1 above. The x-axis range is deliberately very wide so that the result can be compared with slower processors or with systems that have unsatisfactory real-time properties. What is desirable is the steepest possible right-hand flank of the curve. Since this is a processor with a clock frequency of 2.5 GHz and thus a clock interval of 0.4 ns, a maximum latency of 40 µs would be permissible according to the rule of thumb mentioned above. The measured value of 19 µs is considerably lower, so investigating the causes of this latency and trying to reduce it further is probably not worthwhile.
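
The runtime follows directly from the cyclictest options used in the script: the loop count (-l100000000) multiplied by the interval in microseconds (-i200). A minimal sketch of that arithmetic, with a made-up function name and ignoring startup overhead:

```shell
# Estimated cyclictest runtime in seconds for a given loop count (-l)
# and interval in microseconds (-i).
# cyclictest_runtime_s is a hypothetical helper name.
cyclictest_runtime_s() {
    echo $(( $1 * $2 / 1000000 ))
}

secs=$(cyclictest_runtime_s 100000000 200)
printf '%d s = %d h %d min\n' "$secs" $(( secs / 3600 )) $(( secs % 3600 / 60 ))
# prints: 20000 s = 5 h 33 min
```

A shorter trial run, for example with a loop count of 1000000, finishes in 200 seconds and can be used to check the setup before committing to the full measurement.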

Analysis of a system with unsatisfying real-time properties

Figure 2: Bad latency plot of a Uniprocessor x86 system

The latency plot shown in Figure 2 comes from a uniprocessor system with x86 architecture. This architecture often relies on so-called System Management Interrupts (SMIs), which are used, for example, to handle certain communication protocols, thermal control measures, and microcode patches. Since the operating system has no way of preventing SMIs, an SMI whose execution time exceeds the acceptable latency of the system can make such a system unusable for real-time tasks.
Sometimes a BIOS fix from the manufacturer can eliminate the SMIs or at least shorten them to such an extent that they no longer interfere with the real-time properties of the system.

— Thanks

Inspired by a talk and slides by Dr. Carsten Emde
