Python, Kubernetes & CPU pinning

Sergei
3 min read · Sep 27, 2019

While solving a performance issue in an AI project running in a k8s cluster, I came across an interesting workaround related to Python and k8s/Docker architecture specifics, which I would like to share.

The story of the issue. If you work closely with Python, you have probably heard about the GIL. Without going into details, here is a short overview of how it can affect code that uses Python threads. For I/O tasks, like downloading files, threads really help and reduce the overall time compared to downloading files one by one in a single thread. But for mathematical calculations, threads won’t reduce the time; it rather grows because of the overhead of switching between threads. NB! This doesn’t apply to libraries like numpy, which are implemented in native C code and are not affected by the GIL.
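To make this concrete, here is a minimal illustration (names and the workload size are my own, for demonstration only): a pure-Python CPU-bound function run twice sequentially and then in two threads. On a standard CPython build the threaded version doesn’t come out ahead, because the GIL serialises the bytecode anyway.

```python
import threading
import time

def cpu_bound(n):
    # Pure-Python arithmetic: holds the GIL, so threads can't run it in parallel
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Sequential: two calls, one after another
t0 = time.perf_counter()
for _ in range(2):
    cpu_bound(N)
seq = time.perf_counter() - t0

# Threaded: two threads, but the GIL lets only one run Python bytecode at a time
t0 = time.perf_counter()
threads = [threading.Thread(target=cpu_bound, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
par = time.perf_counter() - t0

print(f"sequential: {seq:.2f}s, threaded: {par:.2f}s")
```

Swap the arithmetic for a `time.sleep()` or a network call and the threaded version wins easily, which is exactly why threads made sense for the file-downloading part of our pipeline.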

In our project we also use threads, because we need to download files and then process them with ML libraries. And we were unpleasantly surprised to see that the container, deployed to the testing k8s cluster, performed more than 1.5 times worse than expected. The nodes we use in k8s are quite decent 4-core machines. Moreover, when we deployed to another k8s cluster with 8-core nodes, performance degraded nearly 3 times.

Root cause. When you specify a CPU limit in k8s (we set it to “2”), it doesn’t mean these cores belong to the container entirely. This number is rather the share of time during which the host node lets the container use all of its cores. That, by the way, is why the CPU limit in a k8s manifest can be a fractional number. So the container doesn’t see 2 cores in my case; it sees all the node’s cores. For example, multiprocessing.cpu_count() returns 4 on the testing cluster, not 2. A “2 cores” limit just means that on a node with 4 cores the container owns 0.5 core × 4, and on a node with 8 cores, 0.25 core × 8. But the Python GIL allows only one thread to execute at any given moment. Altogether this means our container wasn’t using 2 cores, but roughly 0.5 core, or a bit more thanks to I/O, which doesn’t depend on the GIL. I can’t say this is a well-established fact; rather, it should be seen as an assumption that explains the performance degradation quite well.
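Since cpu_count() reports the host’s cores, the container’s actual CPU budget has to be read from the cgroup CFS quota files instead. Below is a hypothetical helper sketching that (the function name is mine; the file paths are the standard Linux cgroup v1/v2 locations, and the fallback covers machines where neither exists):

```python
import os

def effective_cpu_count():
    """Best-effort CPU budget of the current container (assumes Linux cgroups).

    multiprocessing.cpu_count() reports the *host's* cores; the CFS quota
    files expose what the k8s CPU limit actually grants.
    """
    try:
        # cgroup v2: a single file containing "<quota> <period>" or "max <period>"
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
            if quota != "max":
                return int(quota) / int(period)
    except OSError:
        pass
    try:
        # cgroup v1: quota is -1 when unlimited
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0:
            return quota / period
    except OSError:
        pass
    return float(os.cpu_count())  # fallback: the host's view

print(effective_cpu_count())
```

In a container limited to “2” on our 4-core nodes this would report 2.0 while cpu_count() still says 4, which is exactly the mismatch described above.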

Solution. The idea was to let the container see and use only 2 cores, so that Python doesn’t get spread across all of them. K8s supports CPU affinity, but it was still beta and not available in the version we run. Linux, however, supports it with taskset, which we decided to use, changing the Docker command from:

CMD ["python", "start.py"]

to:

CMD ["taskset", "-a", "-c", "0,1", "python", "start.py"]

Checking it on our testing k8s cluster, we immediately got the result. But when we rolled it out with several dozen containers, we found that all containers launched on the same node mostly used the 1st and 2nd CPU cores, while the other cores sat idle. As a fix we decided to pick the 2 pinned cores randomly in each container, which more or less balanced the CPU load. Getting a pair of distinct random numbers in bash is a bit awkward, so it was done inside the start.py script:

import os
import random

import psutil

# Seed from the OS entropy pool so that containers starting at the same
# moment don't pick identical core pairs (a single byte would give only
# 256 distinct seeds, which collides easily across dozens of containers)
random.seed(os.urandom(16))

cpu_nums = list(range(psutil.cpu_count()))
random.shuffle(cpu_nums)
proc = psutil.Process(os.getpid())
proc.cpu_affinity(cpu_nums[:2])  # pin this process to 2 random cores

We initialised the random module from the OS entropy source to avoid a potential problem: containers starting at the same moment could otherwise end up with identical pseudorandom numbers (a rare case, of course, but it’s insurance).
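After the cpu_affinity() call above, it’s easy to sanity-check inside the container that the pinning actually applied. A small sketch (assumes Linux, where os.sched_getaffinity is available; the guard covers other platforms):

```python
import os

# Confirm which cores the scheduler may actually use for this process (PID 0 = self)
if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)
    print(f"allowed cores: {sorted(allowed)}")
else:
    print("sched_getaffinity not available on this platform")
```

After the pinning, `allowed` should contain exactly the two randomly chosen cores; the same thing can be checked from outside the container with `taskset -p <pid>`.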

Summary. After deploying the changes, we found we could reduce the number of containers by more than 15%, and on top of that we got nearly 30% performance growth under peak load. Although the growth isn’t linear, taskset looks like quite a good option for CPU affinity; but, of course, trying native support at the k8s level should be even better.


Sergei

Software Engineer. Senior Backend Developer at Pipedrive. PhD in Engineering. My interests are IT, High-Tech, coding, debugging, sport, active lifestyle.