Python “multiprocessing” “Can’t pickle…”

陳信宏 Ted Chen
Published in DevOps’ hole
3 min read, May 26, 2022

Python raises a ‘…Can’t pickle…’ error when using ‘multiprocessing’.
It’s more than a ‘pickle’ issue; it’s an OS-related issue.

Background

  • In a previous article, I ran into an ‘lxml’ issue, and the solution was to run the function that uses ‘lxml’ in a separate process
  • I implemented a decorator that wraps a function so that it runs in a separate process
  • The ‘run_on_subprocess’ decorator uses the Python ‘multiprocessing’ package
  • I develop the Python service on a MacBook
  • In production, the service is hosted on a Python Docker service

‘…Can’t pickle…’ Issue

Here are parts of the ‘run_on_subprocess’ decorator.

import functools
import multiprocessing

def run_on_subprocess(func):
    @functools.wraps(func)
    def _wrapper(*args, **kw):
        # Local function: it only exists inside this call of _wrapper
        def subprocess_function():
            ...
            results_queue.put(func(*args, **kw))
            ...
        sub_p = multiprocessing.Process(target=subprocess_function)
        sub_p.daemon = True
        sub_p.start()
        # Block until the child process sends the result back
        result = results_queue.get()

        sub_p.join(timeout=3)
        sub_p.close()
        ...
        return result
    return _wrapper

I can then use it to make a function run in a separate process.

@run_on_subprocess
def some_lxml_parser():
    ...

After finishing the ‘run_on_subprocess’ decorator, I tested it on a local Python Docker service without any error and deployed it. Everything was fine until, a few days later, I ran the service in my development Python virtualenv on the MacBook and got this error.

‘Can’t pickle local object’ error

It’s easy to see that ‘subprocess_function’ can’t be pickled: it’s a local object defined inside the decorator’s wrapper function, and ‘pickle’ serializes a function by reference to its importable, module-level name.
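The failure can be reproduced without ‘multiprocessing’ at all. The following is a minimal sketch, not code from the service: make_local_function and try_pickle are hypothetical helper names introduced only for this illustration. It tries to pickle a function defined inside another function and captures whatever error ‘pickle’ raises.

```python
import pickle


def make_local_function():
    # Hypothetical stand-in for the decorator's wrapper: it defines a
    # function locally and hands it back to the caller.
    def subprocess_function():
        return "never pickled"

    return subprocess_function


def try_pickle(obj):
    """Return None if obj can be pickled, otherwise the error raised."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        # The exception type for local objects differs between Python
        # versions, so both are caught here.
        return exc
    return None


error = try_pickle(make_local_function())
print(type(error).__name__, error)
```

A plain string or a module-level function passes through try_pickle without an error; only the locally defined function fails.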

But why was there no error on the Docker service for the same code? I checked that the virtualenv was using Python 3.10.2, the same version as the Docker service.

Here are the relevant shell commands for starting the service.

# Got pickle error
(venv) $ python app.py
# Got no error
(venv) $ docker run --rm -v $(pwd):/wd -w /wd python:3.10.2 bash -c 'python app.py'

First, I tried to fix the unpicklable local object. I moved ‘subprocess_function’ to module level and passed the wrapped function to it as an argument. Then I got another error.

‘Can’t pickle … it’s not the same object as …’ error

It’s also easy to see why: the decorated function is not the same object as the original one. ‘pickle’ looks a function up by its module-level name, and that name now refers to the decorator’s wrapper. Passing the original function to the child process therefore makes ‘pickle’ complain.
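This second error can also be reproduced in isolation. The sketch below uses a hypothetical pass-through decorator named passthrough and a hypothetical function named parse, neither of which comes from the service. It relies on functools.wraps storing the undecorated function in the __wrapped__ attribute, and then tries to pickle that original function.

```python
import functools
import pickle


def passthrough(func):
    # Hypothetical decorator that only forwards the call.
    @functools.wraps(func)
    def wrapper(*args, **kw):
        return func(*args, **kw)

    return wrapper


@passthrough
def parse():
    return "parsed"


# functools.wraps keeps a reference to the undecorated function here.
original = parse.__wrapped__

try:
    pickle.dumps(original)
    error = None
except (AttributeError, pickle.PicklingError) as exc:
    # pickle resolves the name "parse" in the module, finds the wrapper
    # instead of the original function, and refuses to continue.
    error = exc

print(type(error).__name__, error)
```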

Focus on: why was there no error on the Docker service?

I decided to first figure out why there was no error on the Docker service. Those ‘pickle’ errors might be symptoms rather than the root cause.

The root cause and solution: multiprocessing.set_start_method('fork')

I found this thread, in which caleb said that ‘multiprocessing’ may behave differently between the ‘spawn’ and ‘fork’ start methods. So I added multiprocessing.set_start_method('fork') before creating the multiprocessing.Process. Surprisingly, all of the errors, including the ‘pickle’ ones, were gone when running the service.
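To make the fix concrete, here is a minimal sketch of how the decorator might look with the ‘fork’ start method. It is not the service’s actual code: the elided parts of the original are not reconstructed, and two things are assumptions of this sketch. First, it uses multiprocessing.get_context('fork') per call instead of the global multiprocessing.set_start_method('fork'), which can only be set once per program. Second, the queue is created inside the wrapper. The add function is a hypothetical example, and ‘fork’ is only available on Unix.

```python
import functools
import multiprocessing


def run_on_subprocess(func):
    """Run func in a forked child process and return its result (sketch)."""

    @functools.wraps(func)
    def _wrapper(*args, **kw):
        # Assumption: a per-call "fork" context rather than the global
        # set_start_method('fork'). "fork" exists on Unix only.
        ctx = multiprocessing.get_context("fork")
        results_queue = ctx.Queue()

        def subprocess_function():
            # A local function is fine under fork: the child inherits it
            # from the parent's memory instead of receiving a pickled copy.
            results_queue.put(func(*args, **kw))

        sub_p = ctx.Process(target=subprocess_function)
        sub_p.daemon = True
        sub_p.start()
        # The return value still travels through the queue, so it must
        # itself be picklable.
        result = results_queue.get()
        sub_p.join(timeout=3)
        return result

    return _wrapper


@run_on_subprocess
def add(a, b):
    # Hypothetical example function.
    return a + b


if __name__ == "__main__":
    print(add(2, 3))
```

The sketch has no error handling: if the wrapped function raises in the child, nothing is put on the queue and the parent blocks on results_queue.get().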

What are the ‘fork’ and ‘spawn’ start methods? Here are the details:

  • spawn: The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary.
  • fork: The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process.
  • forkserver: …

And here’s why I got errors on my MacBook but not on the Docker service.

  • spawn: Available on Unix and Windows. The default on Windows and macOS.
  • fork: Available on Unix only. The default on Unix.
  • forkserver: …
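A quick way to check which start method a given environment uses is to ask ‘multiprocessing’ directly. The short sketch below prints the platform, the start methods available on it, and the current default. Running it both in the virtualenv and inside the Docker container should show the difference described above.

```python
import multiprocessing
import sys

# Start methods supported on this platform.
available = multiprocessing.get_all_start_methods()
# Start method that multiprocessing.Process uses by default here.
default = multiprocessing.get_start_method()

print("platform:", sys.platform)
print("available start methods:", available)
print("default start method:", default)
```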

My service’s Docker image is python:3.10.2, which is based on Linux, a Unix-like OS, so ‘fork’ is the default there.

Python 3.10.2 OS

With ‘fork’, the parent’s objects do not need to be copied (pickled) over to the child, because the child inherits them directly. That is how it evaded the issues.

Summary

The ‘pickle’ issue in ‘multiprocessing’ comes from passing objects between processes: under ‘spawn’, the process target and its arguments must be pickled.

There are three methods to start a process in ‘multiprocessing’, and the default method differs between macOS and other Unix systems such as Linux.

The ‘fork’ start method inherits all of the parent’s resources, so it does not need ‘pickle’ to pass the process target and its arguments. But note that safely forking a multithreaded process is problematic (from this).
