Linux CPU cgroups primer — by example

M Castelino · Published in Kubehells · Apr 11, 2019

cgroups and CPU constraints

From the Linux Kernel: Documentation/cgroups/cpu.txt

- cpu.shares: The weight of each group living in the same hierarchy, which translates into the amount of CPU it is expected to get. Upon cgroup creation, each group gets assigned a default of 1024. The percentage of CPU assigned to the cgroup is the value of its shares divided by the sum of all shares of all cgroups at the same level.
- cpu.cfs_period_us: The duration in microseconds of each scheduler period, for bandwidth decisions. This defaults to 100000 us or 100 ms. Larger periods will improve throughput at the expense of latency, since the scheduler will be able to sustain a cpu-bound workload for longer. The opposite is true for smaller periods. Note that this only affects non-RT tasks that are scheduled by the CFS scheduler.
- cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us for which the current group will be allowed to run. For instance, if it is set to half of cfs_period_us, the cgroup will only be able to run at peak for 50% of the time. Note that this represents aggregate time over all CPUs in the system. Therefore, in order to allow full usage of two CPUs, for instance, one should set this value to twice the value of cfs_period_us.
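
For example, with the default 100000 us (100 ms) period, a quota of 150000 us caps a group at 1.5 CPUs, and a quota of -1 removes the cap. A minimal sketch (the cgroup name democg is just an example; the paths assume the cgroup v1 cpu controller is mounted at /sys/fs/cgroup/cpu):

#mkdir /sys/fs/cgroup/cpu/democg
#echo 150000 > /sys/fs/cgroup/cpu/democg/cpu.cfs_quota_us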

Let’s get our hands dirty

Let us start two jobs

#bash -c "exec -a jobmore stress-ng --cpu 3 --timeout 120m"stress-ng --cpu 8 --timeout 120m" &
#bash -c "exec -a jobless stress-ng --cpu 3 --timeout 120m"stress-ng --cpu 8 --timeout 120m" &

and then examine the resource usage

#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 68.6 0.1 3:43.63 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 68.0 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 67.3 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.10 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:07.73 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.31 jobmore --cpu 3 --timeout 120m

At this point the jobs are using all of the CPUs on the system as best they can.

Constrain the quota (upper bound)

#mkdir /sys/fs/cgroup/cpu/testcg
#mkdir /sys/fs/cgroup/cpu/testcg/jobless
#mkdir /sys/fs/cgroup/cpu/testcg/jobmore
#echo "24036" > /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24037" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24038" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24039" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24137" > /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24138" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24139" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24140" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks

Now let us set an upper bound at the parent cgroup level and split that time amongst the children

#echo 300000 > /sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
#echo 200000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
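
To read all three limits back in one shot (the period is left at its 100000 us default, so quota divided by period gives the number of CPUs for each group):

#grep . /sys/fs/cgroup/cpu/testcg/{,jobless/,jobmore/}cpu.cfs_quota_us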

and look at the resource utilization

#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 32.4 0.1 5:53.28 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.68 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.53 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.54 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.39 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 68.0 0.1 6:50.38 jobmore --cpu 3 --timeout 120m

So we see that all the jobs together fit within 3 CPUs (a 300000 us quota against the default 100000 us period). Furthermore, jobless gets only 1 CPU (100000/100000) and jobmore gets 2 CPUs (200000/100000).

Assured quota (lower bound)

Now let us give them the same upper bound

/sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
200000
/sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
100000
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
100000
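
These values can be written the same way as before:

#echo 200000 > /sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us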

We see the CPU time get split evenly

24036 mrcastel   20   0 53800  5712  5392 S  0.0  0.1  0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m

But let us say jobmore is more important, so let us set up the shares accordingly

/sys/fs/cgroup/cpu/testcg/cpu.shares
1024
/sys/fs/cgroup/cpu/testcg/jobless/cpu.shares
24
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.shares
1000
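
As before, the shares are plain files under the same directories; the parent is left at its default of 1024:

#echo 24 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.shares
#echo 1000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.shares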

We do not quite see what we expected. That is because there are enough free CPUs available that the two groups never actually contend (cpu.shares only matters under contention), or top is not reporting the numbers correctly.

24036 mrcastel   20   0 53800  5712  5392 S  0.0  0.1  0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m

So let us force all the tasks onto the same CPU (taskset treats 10 as the hexadecimal mask 0x10, i.e. CPU 4)

$ taskset -p 10 24036
$ taskset -p 10 24037
$ taskset -p 10 24038
$ taskset -p 10 24039
$ taskset -p 10 24137
$ taskset -p 10 24138
$ taskset -p 10 24139
$ taskset -p 10 24140
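
The new affinity can be confirmed per task, for example:

$ taskset -cp 24138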

Now you see jobmore get the assured lower bound: with shares of 1000 versus 24, it receives roughly 1000/1024 ≈ 98% of the contended CPU, even though both jobs have the same upper bound.

24036 mrcastel   20   0 53800  5712  5392 S  0.0  0.1  0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:34.61 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:35.11 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 1.3 0.1 15:38.99 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 32.9 0.1 18:15.75 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:14.36 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:15.22 jobmore --cpu 3 --timeout 120m
