Linux CPU cgroups primer — by example
cgroups and CPU constraints
From the Linux Kernel: Documentation/cgroups/cpu.txt
- cpu.shares: The weight of each group living in the same hierarchy, which translates into the amount of CPU it is expected to get. Upon cgroup creation, each group gets assigned a default of 1024. The percentage of CPU assigned to the cgroup is the value of shares divided by the sum of all shares in all cgroups at the same level.
- cpu.cfs_period_us: The duration in microseconds of each scheduler period, for bandwidth decisions. This defaults to 100000us or 100ms. Larger periods will improve throughput at the expense of latency, since the scheduler will be able to sustain a cpu-bound workload for longer. The opposite is true for smaller periods. Note that this only affects non-RT tasks that are scheduled by the CFS scheduler.
- cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us for which the current group will be allowed to run. For instance, if it is set to half of cfs_period_us, the cgroup will only be able to run for at most 50% of the time. One should note that this represents aggregate time over all CPUs in the system. Therefore, in order to allow full usage of two CPUs, for instance, one should set this value to twice the value of cfs_period_us.
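To make the quota/period relationship concrete, here is a quick sketch of the arithmetic (the values are the kernel defaults quoted above, not anything the kernel computes for you):

```shell
# Effective CPU entitlement = cfs_quota_us / cfs_period_us.
period_us=100000   # default period: 100 ms
quota_us=50000     # half the period

# Percentage of one CPU this cgroup may consume per period.
echo "$((100 * quota_us / period_us))%"   # 50%

# To allow full use of two CPUs, set the quota to twice the period.
two_cpu_quota=$((2 * period_us))
echo "$two_cpu_quota"                     # 200000
```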
Let’s get our hands dirty
Let us start two jobs
#bash -c "exec -a jobmore stress-ng --cpu 3 --timeout 120m" &
#bash -c "exec -a jobless stress-ng --cpu 3 --timeout 120m" &
and then examine the resource usage
#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 68.6 0.1 3:43.63 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 68.0 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 67.3 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.10 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:07.73 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.31 jobmore --cpu 3 --timeout 120m
At this point the jobs are using all of the CPUs on the system as best they can.
Constrain the quota (upper bound)
#mkdir /sys/fs/cgroup/cpu/testcg
#mkdir /sys/fs/cgroup/cpu/testcg/jobless
#mkdir /sys/fs/cgroup/cpu/testcg/jobmore
#echo "24036" > /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24037" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24038" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24039" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24137" > /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24138" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24139" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24140" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
Now let us set an upper bound at the parent cgroup level and split the time amongst the children
#echo 300000 > /sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
#echo 200000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
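Under these settings the expected entitlements work out as follows (a sanity check of the numbers above, using the default 100ms period):

```shell
period_us=100000
parent_quota=300000
jobless_quota=100000
jobmore_quota=200000

# CPUs' worth of time each level may consume per period.
echo "parent:  $((parent_quota / period_us)) CPUs"   # 3
echo "jobless: $((jobless_quota / period_us)) CPU"   # 1
echo "jobmore: $((jobmore_quota / period_us)) CPUs"  # 2
```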
and look at the resource utilization
#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 32.4 0.1 5:53.28 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.68 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.53 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.54 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.39 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 68.0 0.1 6:50.38 jobmore --cpu 3 --timeout 120m
So we see that all the jobs together fit within 3 CPUs' worth of time; furthermore, jobless gets only 1 CPU and jobmore gets 2 CPUs.
Assured quota (lower bound)
Now let us give both children the same upper bound
/sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
200000
/sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
100000
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
100000
We see the CPU time get split evenly
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m
But let us say jobmore is more important, so let us set up the shares accordingly
/sys/fs/cgroup/cpu/testcg/cpu.shares
1024
/sys/fs/cgroup/cpu/testcg/jobless/cpu.shares
24
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.shares
1000
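With these shares, the expected split between the two siblings when they contend for the same CPU is proportional, per the formula from the kernel doc quoted earlier:

```shell
jobless_shares=24
jobmore_shares=1000
total=$((jobless_shares + jobmore_shares))   # 1024

# Percentage of contended CPU time each sibling should receive.
echo "jobless: $((100 * jobless_shares / total))%"   # 2%
echo "jobmore: $((100 * jobmore_shares / total))%"   # 97%
```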
We do not quite see what we expected. That is because there are still enough free CPUs that the two groups are not actually contending (shares only take effect under contention), or because htop is not reporting the numbers correctly.
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m
So let us force all the tasks onto the same CPU
$ taskset -p 10 24036
$ taskset -p 10 24037
$ taskset -p 10 24038
$ taskset -p 10 24039
$ taskset -p 10 24137
$ taskset -p 10 24138
$ taskset -p 10 24139
$ taskset -p 10 24140
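Note that `taskset -p` interprets the mask as hexadecimal, so `10` here is 0x10, i.e. only CPU 4 is allowed. A quick check of that decoding in plain shell arithmetic (this does not touch taskset itself):

```shell
mask=0x10
echo $((mask))        # 16 in decimal, i.e. binary 10000

# Find which CPU bit is set: bit 4 -> CPU 4.
cpu=0
v=$((mask))
while [ $((v & 1)) -eq 0 ]; do
  v=$((v >> 1))
  cpu=$((cpu + 1))
done
echo "CPU $cpu"       # CPU 4
```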
Now we see jobmore getting its share-proportional lower bound, even though both jobs have the same upper bound.
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:34.61 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:35.11 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 1.3 0.1 15:38.99 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 32.9 0.1 18:15.75 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:14.36 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:15.22 jobmore --cpu 3 --timeout 120m