Raising the shared memory limit of a Kubernetes container
While using PyTorch's (v1.4.0) DataLoader with multiple workers (num_workers > 0), I encountered the following error:
Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
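Before changing anything, it helps to confirm how much shared memory the container actually has. Container runtimes often allocate only a small /dev/shm by default (Docker, for example, uses 64 MB when no size is given). The snippet below is a minimal sketch using only the Python standard library; the function name shared_memory_report is my own and not part of PyTorch.

```python
import os
import shutil


def shared_memory_report(path="/dev/shm"):
    """Return total, used and free space of the given mount point in MiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    mib = 1024 * 1024
    return {
        "path": path,
        "total_mib": usage.total / mib,
        "used_mib": usage.used / mib,
        "free_mib": usage.free / mib,
    }


if __name__ == "__main__":
    # /dev/shm exists on typical Linux containers; guard for other systems.
    if os.path.isdir("/dev/shm"):
        report = shared_memory_report()
        print(
            f"{report['path']}: total={report['total_mib']:.0f} MiB, "
            f"used={report['used_mib']:.0f} MiB, free={report['free_mib']:.0f} MiB"
        )
    else:
        print("/dev/shm is not available on this system")
```

Run inside the container, this prints the size of /dev/shm. Running it again after raising the limit shows whether the new value took effect.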
That error kicked off a couple of hours of struggling to increase the shared memory size. If you are running a Docker container with the docker run command, the issue can be handled by adding the following command-line argument:
--shm-size=desired_memory_size
However, to run the job on a Kubernetes cluster, the relevant setting has to go into the corresponding *.yaml file. An internet search turned up suggestions (link, link, link) to add shm_size tags at various locations in the file, but none of them seemed to help.
Finally, I came across a solution that had worked for some users: mount an emptyDir volume at /dev/shm and set its medium to Memory.
spec:
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
  containers:
  - image: image-name  # specify your image name here
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
Here,
volumes - declares the available volume(s).
volumeMounts - points to the volume declared in volumes and specifies the location for mounting that volume within the container.
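For anyone who generates manifests from Python, the same idea can be written as a small helper that builds a complete Pod manifest. This is only a sketch under a few assumptions of mine: the function name build_pod_manifest, the pod name shm-demo, the container name trainer and the 2Gi value are all illustrative. The optional sizeLimit field on emptyDir caps the size of the volume. As far as I understand, data written to a memory-backed emptyDir counts against the container's memory limit, so the cap should be chosen with that in mind.

```python
import json


def build_pod_manifest(pod_name, image, shm_size_limit=None):
    """Build a Pod manifest that mounts a memory-backed emptyDir at /dev/shm."""
    empty_dir = {"medium": "Memory"}
    if shm_size_limit is not None:
        # Optional cap on the volume, e.g. "2Gi" (illustrative value).
        empty_dir["sizeLimit"] = shm_size_limit

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            # Declares the volume ...
            "volumes": [{"name": "dshm", "emptyDir": empty_dir}],
            "containers": [
                {
                    "name": "trainer",  # hypothetical container name
                    "image": image,
                    # ... and mounts it by name inside the container.
                    "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_pod_manifest("shm-demo", "image-name", shm_size_limit="2Gi")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to kubectl apply -f, which accepts JSON as well as YAML.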