Cgroup v1 in detail
In this article, I describe cgroup version 1 in detail and its usage with systemd.
This article requires prior knowledge of systemd and cgroup: see my previous articles.
Cgroup filesystem
The cgroup file system is in /sys/fs/cgroup. It is a virtual file system: it only exists in memory and disappears when you shut down the machine.
In the /sys/fs/cgroup directory, you see something like this:
blkio
cpu -> cpu,cpuacct
cpu,cpuacct
cpuacct -> cpu,cpuacct
cpuset
devices
freezer
hugetlb
memory
net_cls -> net_cls,net_prio
net_cls,net_prio
net_prio -> net_cls,net_prio
perf_event
pids
rdma
systemd
unified
Each of these directories represents a cgroup resource controller (also called a subsystem, or simply a controller).
Controllers
The following controllers are available in cgroup v1:
blkio: short for Block Input/Output. It allows you to set limits on how fast processes and users can read from or write to block devices. (A block device is something such as a hard drive or a hard drive partition.)
cpu, cpuacct: these two controllers are combined into a single controller. It lets you control CPU usage for either processes or users. On a multi-tenant system, it also lets you monitor per-user CPU usage.
cpuset: on systems with multiple CPU cores, this allows you to assign a process to one specific CPU core or to a set of CPU cores. This can enhance performance by forcing a process to use a portion of the CPU cache that has already been filled with the data and instructions the process needs. By default, the Linux kernel scheduler can move processes from one CPU core to another, or from one set of cores to another. Every time this happens, the running process must access main system memory to refill the CPU cache. This costs extra CPU cycles and can hurt performance.
devices: allows controlling access to system devices.
freezer: allows you to suspend running processes in a cgroup. This is useful when you need to move a process from one cgroup to another.
memory: allows you to set limits on the amount of system memory that a process or a user can use. It also generates automatic reports on the memory resources used by those processes and users.
net_cls, net_prio: allows you to tag network packets with a class identifier (classid) that enables the Linux traffic controller (the tc command) and Linux firewalls to identify packets that originate from a particular process. This is useful to control and prioritize network traffic for various cgroups.
pids: sets a limit on the number of processes that can run in a cgroup.
perf_event: groups tasks for monitoring by the perf performance monitoring and reporting utility.
rdma: Remote Direct Memory Access allows one computer to directly access the memory of another computer without involving either computer's operating system. This controller is mainly used for parallel computing clusters.
hugetlb: allows you to limit the usage of huge memory pages by cgroup processes.
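As a quick cross-check that does not require any extra tools, the kernel exposes this same list in /proc/cgroups. A minimal sketch, assuming a standard Linux /proc:

```shell
# List the cgroup controllers known to the kernel, with their
# v1 hierarchy ID, number of cgroups, and enabled flag.
cat /proc/cgroups

# Keep only the names of the enabled controllers (4th column == 1).
awk 'NR > 1 && $4 == 1 { print $1 }' /proc/cgroups
```

The enabled column tells you whether a controller is actually usable on the running kernel.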
Tuning options
Each controller directory has a set of files representing the cgroup
tuning options. These files hold information about any resource control or tuning parameters that you would set.
For example, for blkio, you find:
blkio.bfq.io_service_bytes
blkio.bfq.io_service_bytes_recursive
blkio.bfq.io_serviced
blkio.bfq.io_serviced_recursive
blkio.reset_stats
blkio.throttle.io_service_bytes
blkio.throttle.io_service_bytes_recursive
blkio.throttle.io_serviced
blkio.throttle.io_serviced_recursive
blkio.throttle.read_bps_device
blkio.throttle.read_iops_device
blkio.throttle.write_bps_device
blkio.throttle.write_iops_device
cgroup.clone_children
cgroup.procs
cgroup.sane_behavior
init.scope
machine.slice
notify_on_release
release_agent
system.slice
tasks
user.slice
Each file represents a parameter you can tune for best performance. We also see directories such as init.scope, machine.slice, system.slice, and user.slice. Each of these also has its own set of tuning parameters.
Controlling resources
Resource controllers v1
To examine the resource controllers of cgroup v1, install cgroup-tools:
sudo apt install cgroup-tools
Once installed, you can run lssubsys
to view the active resource controllers:
$ lssubsys
cpuset
cpu,cpuacct
blkio
memory
devices
freezer
net_cls,net_prio
perf_event
hugetlb
pids
rdma
In this case, all the cgroup v1 resource controllers are active.
If you go to the /sys/fs/cgroup directory, you see that each resource controller has its own directory:
$ ls /sys/fs/cgroup
blkio
cpu -> cpu,cpuacct
cpu,cpuacct
cpuacct -> cpu,cpuacct
cpuset
devices
freezer
hugetlb
memory
net_cls -> net_cls,net_prio
net_cls,net_prio
net_prio -> net_cls,net_prio
perf_event
pids
rdma
systemd
unified
Note
Ignore the systemd and unified directories:
systemd is for the root cgroup
unified is for the version 2 controllers
There are two symbolic links for the cpu and cpuacct controllers because these two formerly separate controllers are now combined into one. The same applies to the net_cls and net_prio controllers.
Here we focus on three resource controllers: cpu, memory, and blkio, because they can be directly configured via systemd.
Controlling user CPU usage
Imagine a user consuming 100% of your CPU…
First, you need to identify the user with the systemd-cgls command to get the user ID (let's say 1001 here).
If you want, for example, to drastically reduce that user's CPU quota to 10%, run:
sudo systemctl set-property user-1001.slice CPUQuota=10%
This command creates new files under the /etc/systemd directory, so you need to run:
sudo systemctl daemon-reload
…because you created a new unit file.
Once done, the user is limited to 10%, but let's explain in more detail: CPUQuota is expressed relative to one full CPU, so on a 4-core system, a CPU quota of 10% means at most 2.5% per core when the load is spread across all 4 cores.
Likewise, if you allocate CPUQuota=100%, in reality you allocate 25% per core in this case.
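The arithmetic can be sketched quickly in the shell (the 10% quota and 4 cores are the example numbers from this section):

```shell
# CPUQuota is expressed relative to one full CPU, so a quota spread
# evenly over all cores gives quota / cores percent per core.
quota=10   # CPUQuota, in percent of one CPU
cores=4    # number of CPU cores on the example system
awk -v q="$quota" -v n="$cores" 'BEGIN { printf "%.1f%% per core\n", q / n }'
# prints: 2.5% per core
```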
You can simulate CPU stress on a Linux system by installing stress-ng:
sudo apt install stress-ng
To stress 4 CPU cores, you can run the following as the 1001 user, for example:
# As user 1001: stress 4 CPU cores
stress-ng -c 4
In this case, user 1001 takes all the CPU resources, because by default on Linux all users have unlimited use of system resources.
The first time you execute a systemctl set-property
command, you create the system.control
directory under the /etc/systemd
directory, which looks like this:
$ cd /etc/systemd
$ ls -ld system.control/
drwxr-xr-x 3 root root 4096 Jul 14 19:59 system.control/
Under this directory you find the 1001 user slice, and under that user slice directory you find the configuration file for its CPUQuota:
$ cd /etc/systemd/system.control
$ ls -l
total 4
drwxr-xr-x 2 root root 4096 Jul 14 20:25 user-1001.slice.d
$ cd user-1001.slice.d/
$ ls -l
total 4
-rw-r--r-- 1 root root 143 Jul 14 20:25 50-CPUQuota.conf
$ cat 50-CPUQuota.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
CPUQuota=10%
Now, be aware that you only need to do a daemon-reload
when you create this file for the first time. Any subsequent changes you make to this file with the systemctl set-property
command will take effect immediately.
In the cgroup file system, under the 1001 user slice directory, you see the current CPUQuota setting in the cpu.cfs_quota_us file. Here is what it looks like when set to 10%:
$ cd /sys/fs/cgroup/cpu/user.slice/user-1001.slice
$ cat cpu.cfs_quota_us
10000
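The 10000 value makes sense once you know that cpu.cfs_quota_us is measured in microseconds per scheduling period, and that the period (cpu.cfs_period_us) defaults to 100000 µs (100 ms). A small sketch of the conversion, using that default period:

```shell
quota_us=10000    # value read from cpu.cfs_quota_us
period_us=100000  # default value of cpu.cfs_period_us (100 ms)
# The quota as a percentage of one CPU is quota / period * 100.
awk -v q="$quota_us" -v p="$period_us" 'BEGIN { printf "CPUQuota = %d%%\n", q * 100 / p }'
# prints: CPUQuota = 10%
```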
Controlling service CPU usage
To show how to control CPU usage for a service, I create a specific service, cputest.service, managed by root, and reuse the stress-ng tool from the previous section.
To create the service, run:
sudo systemctl edit --full --force cputest.service
This creates a file in /usr/lib/systemd/system, and its content should look like this, assuming we have 4 CPU cores:
[Unit]
Description=CPU stress test service
[Service]
ExecStart=/usr/bin/stress-ng -c 4
Start the service:
sudo systemctl daemon-reload
sudo systemctl start cputest.service
At this time, the service should take 100% of the CPU resources.
As you did for the user, you can set a CPU quota on the service, for example 90% in this case, by running:
sudo systemctl set-property cputest.service CPUQuota=90%
sudo systemctl daemon-reload
This command creates a directory in /etc/systemd/system.control/:
$ cd /etc/systemd/system.control
$ ls -l
total 8
drwxr-xr-x 2 root root 4096 Jul 15 19:15 cputest.service.d
drwxr-xr-x 2 root root 4096 Jul 15 17:53 user-1001.slice.d
Inside the /etc/systemd/system.control/cputest.service.d/ directory, you'll see the 50-CPUQuota.conf file:
$ cd /etc/systemd/system.control/cputest.service.d
$ cat 50-CPUQuota.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Service]
CPUQuota=90%
This allows cputest.service to use only about 22.5% of each CPU core (remember the per-core math from the previous section).
In the cgroup
file system, you can see the CPUQuota
set to 90%:
$ cd /sys/fs/cgroup/cpu/system.slice/cputest.service
$ cat cpu.cfs_quota_us
90000
This limit is only placed on the service, and not on the root user who owns the service. The root user can still run other programs and services without any limits.
Another way to limit the CPU usage of this service is to specify the quota directly inside the service file. First, undo the set-property configuration:
#Stop the service
$ sudo systemctl stop cputest.service
#Delete the cputest.service.d/ directory that you created with the systemctl set-property command
$ cd /etc/systemd/system.control
$ sudo rm -rf cputest.service.d/
#Reload systemd, since you removed a drop-in file
$ sudo systemctl daemon-reload
Then edit the unit file:
$ sudo systemctl edit --full cputest.service
Add the CPUQuota=90%
line, so that the file now looks like this:
[Unit]
Description=CPU stress test service
[Service]
ExecStart=/usr/bin/stress-ng -c 4
CPUQuota=90%
Save the file and start the service.
Controlling the memory usage for a user
Again, imagine the 1001 user running a process that uses all the system memory.
You can simulate high memory usage by the 1001 user by running stress-ng:
stress-ng --brk 4
If you want to limit the 1001 user's programs to a maximum of 1 GB of memory, for example, run the following command and then a daemon-reload:
sudo systemctl set-property user-1001.slice MemoryLimit=1G
sudo systemctl daemon-reload
As with the CPU quota, this command creates a configuration file in /etc/systemd/system.control/user-1001.slice.d:
$ cd /etc/systemd/system.control/user-1001.slice.d
$ cat 50-MemoryLimit.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
MemoryLimit=1073741824
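The 1073741824 value is simply the 1G you passed on the command line, converted to bytes using binary units:

```shell
# systemd stores MemoryLimit=1G in bytes: 1G = 1024^3 bytes.
echo $((1024 * 1024 * 1024))
# prints: 1073741824
```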
If you want the modification to be temporary, just add the --runtime option:
sudo systemctl set-property --runtime user-1001.slice MemoryLimit=1G
sudo systemctl daemon-reload
In this case, you lose your modifications when you reboot the system: the command creates a temporary configuration file in the /run/systemd/system.control/user-1001.slice.d/ directory, which looks like this:
$ cd /run/systemd/system.control/user-1001.slice.d
$ cat 50-MemoryLimit.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
MemoryLimit=1073741824
Controlling the memory usage for a service
In short, adding memory control to a service is similar to controlling a user's processes. Execute the following command (here an example with the Apache2 service):
sudo systemctl set-property apache2.service MemoryLimit=1G
Alternatively, to set the limit in the unit itself, edit the Apache2 service file and, in the [Service] section, add the new parameter:
[Service]
...
MemoryLimit=1G
Next, run sudo systemctl daemon-reload
.
Controlling IO usage for a user
Controlling IO means using the blkio controller.
To monitor IO usage, you can install iotop:
sudo apt install iotop
iotop lets you view disk IO consumption live.
Imagine a 1001 user whose IO you want to manage on a specific partition. To simulate this, first create a large file with the dd command, running as the 1001 user:
$ dd if=/dev/zero of=afile bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 17.4288 s, 602 MB/s
We have created a 10 GB file; next, the user copies the contents of this file to the /dev/null device.
$ dd if=afile of=/dev/null
20480000+0 records in
20480000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 69.2341 s, 151 MB/s
It appears that the user reads this file at an average rate of 151 MB per second.
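The rate dd reports is just the byte count divided by the elapsed time, in MB (10^6 bytes) per second; you can reproduce it from the numbers above:

```shell
bytes=10485760000  # bytes copied, from the dd output above
seconds=69.2341    # elapsed time, from the dd output above
# dd reports bytes / seconds, scaled to MB (10^6 bytes) per second.
awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.0f MB/s\n", b / s / 1000000 }'
# prints: 151 MB/s
```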
First, we have to know which device the user reads the file from:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 99.4M 1 loop /snap/core/11316
. . .
. . .
sda 8:0 0 1T 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 930.5G 0 part
└─sda3_crypt 253:0 0 930.5G 0 crypt
├─debian--vg-root 253:1 0 23.3G 0 lvm /
├─debian--vg-var 253:2 0 9.3G 0 lvm /var
├─debian--vg-swap_1 253:3 0 976M 0 lvm [SWAP]
├─debian--vg-tmp 253:4 0 1.9G 0 lvm /tmp
└─debian--vg-home 253:5 0 895.1G 0 lvm /home
sdb 8:16 0 10G 0 disk
└─sdb1 8:17 0 10G 0 part /media/backup
sr0 11:0 1 1024M 0 rom
The 1001 user has their own home directory. The /home directory is mounted as a logical volume on the /dev/sda3 partition.
Let's say we want to limit the read bandwidth to 1 MB per second. In this case we run:
sudo systemctl set-property user-1001.slice BlockIOReadBandwidth="/dev/sda3 1M"
sudo systemctl daemon-reload
This modification is permanent: it creates a new file in /etc/systemd/system.control/user-1001.slice.d:
$ cd /etc/systemd/system.control/user-1001.slice.d
$ cat 50-BlockIOReadBandwidth.conf
# This is a drop-in unit file extension, created via "systemctl set-property"
# or an equivalent operation. Do not edit.
[Slice]
BlockIOReadBandwidth=1000000
On the cgroup side, the following file appears:
$ cd /sys/fs/cgroup/blkio/user.slice/user-1001.slice
$ cat blkio.throttle.read_bps_device
8:0 1000000
In this blkio.throttle.read_bps_device file, the 8:0 represents the major and minor numbers of the /dev/sda device:
vissol@debian:/dev$ ls -l sda
brw-rw---- 1 root disk 8, 0 Oct 5 08:01 sda
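Besides ls -l, stat can print a device node's major and minor numbers directly (its %t and %T format specifiers output them in hexadecimal). The sketch below uses /dev/null only because it exists on every Linux system (it is always major 1, minor 3); the same command on /dev/sda would print 8:0.

```shell
# Print a device node's major:minor numbers (in hexadecimal).
stat -c '%t:%T' /dev/null
# prints: 1:3
```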
Controlling IO usage for a service
As for a user, you can control IO for a service using the BlockIOReadBandwidth parameter. Here is an example with Apache2, on the command line:
sudo systemctl set-property apache2.service BlockIOReadBandwidth="/dev/sda 1M"
Note that on the command line you have to surround the /dev/sda 1M part with a pair of double quotes, but when you set this parameter in a service file, you do not surround /dev/sda 1M with double quotes.
Edit the Apache2 service file and, in the [Service] section, add the new parameter without double quotes:
[Service]
. . .
BlockIOReadBandwidth=/dev/sda 1M
Next, run sudo systemctl daemon-reload
.
This modification adds the following file and values in the cgroup file system:
$ cd /sys/fs/cgroup/blkio/system.slice/apache2.service
$ cat blkio.throttle.read_bps_device
8:0 1000000
In this blkio.throttle.read_bps_device file, the 8:0 represents the major and minor numbers of the /dev/sda device, as mentioned earlier:
vissol@debian:/dev$ ls -l sda
brw-rw---- 1 root disk 8, 0 Oct 5 08:01 sda
More about systemd resource control
If you want more parameters for controlling resources with systemd via cgroup, run:
man systemd.resource-control
You will find all the parameters you can set to control user and service programs.
Just be careful about the cgroup version supported by your system…