Bug Bytes II: Solving Python Multiprocessing locks

Rajat Kanti Bhattacharjee
csmadeeasy
5 min read · Mar 16, 2020

This one is less a bug and more a testament to how human stupidity has no limits. Before we begin: the bug was introduced by untested code from a contributor, and resolved by me only moments ago. Resolving it took a lunch with a friend, two bus trips, and a lot of pondering about what exactly Python was not happy about passing. 🤣🤣 !!!
OK, let's start by dissecting the matter then….


CONTEXT

So the bug was in the project Conference-Notify, an open source project currently in the works. Without getting into much detail, I will give a high-level overview of the project architecture. The project has a Scrapper-Service, a Notifier-Service, a Search-Service and a user application. The bug was in the Scrapper-Service. So let's look into a code section of app.py, the starting point of the project.

The code responsible for all of this lives in app.py.

When I originally wrote the scrapper, the long-term plan was to have each Scrapper (there are multiple scrapper modules, one dedicated to each site being aggregated) run in an independent process, for better CPU utilization plus parallel execution (on the cloud it can also mean better network utilization, since bandwidth is higher). So all of my code is put into these Scrapper classes, which are loaded using Python's importlib as plugins into the main application code, which handles logging, the database and, now, multiprocessing them. Well, the first obvious mistake I made was to assume that Python multiprocessing would behave like C multiprocessing, i.e. with forks, so that in effect the whole memory image would be copied. Apparently, on Windows that is not the case. But since I had this assumption, I got the idea of creating a context manager for multiprocessing. There was already Pool, but Pools have their own issues of process scheduling and memory allocation which I did not want. I wanted a full-blown separate Python interpreter process handling each piece of plugin code.
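As a rough sketch of that plugin mechanism (the module path and class name here are my assumptions, not the project's exact layout), loading a scrapper boils down to something like this:

import importlib

def load_plugin(module_name, class_name):
    # Import a scrapper module by its dotted path and return its class;
    # "scrappers.wikicfp" and "Scrapper" below are illustrative names
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

ScrapperClass = load_plugin("scrappers.wikicfp", "Scrapper")
scrapper = ScrapperClass()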

This is the monster I came up with. I still find the idea neat. Alas!! Here is what sits in multiprocessing.py.
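Neither original snippet survives here, so below is a minimal sketch of both pieces under my assumptions (names and structure are illustrative, not the project's exact code): an execute helper that gives each runnable its own interpreter process, and the context manager in app.py that drives every plugin through it.

from contextlib import contextmanager
from multiprocessing import Process

# The project's own multiprocessing module: one full interpreter
# process per runnable, no Pool scheduling in the way
def execute(runnable, *args, **kwargs):
    process = Process(target=runnable, args=args, kwargs=kwargs)
    process.start()
    return process

# app.py side, the "monster": spawn every plugin, join them all on exit
@contextmanager
def run_scrappers(scrappers, **config):
    processes = [execute(scrapper.run, **config) for scrapper in scrappers]
    try:
        yield processes
    finally:
        for process in processes:
            process.join()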

Well, if you look into the code you will realize that runnable is actually a function. And if you look at my previous code snippet you will notice that the run function was passed as the argument. So far, all good. Then the wretched problems started.

Problem list

  1. Non-picklable data
  2. Passing an object's methods from the parent context to a child is risky
  3. Deadlocks due to unknown lock variables used by libraries

The experts among you might have already figured out the problems by now. But for those who didn't, let's look at what went wrong.

BUG

The data that could not be pickled was easy to tackle. Going through a few Stack Overflow threads led me to the solution.

The dill package happens to serialize a lot better. You can look at the code given by this answer to see how you can circumvent the issues of passing all of Python's data types to process functions.
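The gist of that trick, adapted to the execute helper sketched earlier (a minimal sketch, with my assumptions about the surrounding code): serialize the callable and its arguments with dill into plain bytes, and let a small top-level function rebuild and call them inside the child.

import dill
from multiprocessing import Process

def _run_payload(payload):
    # Runs in the child: rebuild the callable and its arguments with dill
    runnable, args, kwargs = dill.loads(payload)
    runnable(*args, **kwargs)

def execute(runnable, *args, **kwargs):
    # dill copes with bound methods, closures and lambdas that pickle
    # rejects; the payload itself is plain bytes, which pickle ships fine
    payload = dill.dumps((runnable, args, kwargs))
    process = Process(target=_run_payload, args=(payload,))
    process.start()
    return process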

They also suggest Pathos, but using that would mean an extra non-standard library that we understand less about, plus extra reading. Not good for the project and its contributors. The problem was not fully resolved, though: issues surrounding sockets still remained. But those can be mitigated by structuring the multiprocessing code in a different way. We just have to make sure that anything dill cannot handle does not need to be passed.

Again, sending in an object will lead to something like this.

Observe the sequence and then look at the code; the logs shown here were generated by one of the plugins.

Observe the code sequence given below and think about what just happened.

Scrapper.py and wikicfp.py code

If we consider how the executable was passed in the code given above, we realize that __init__ and __del__ were invoked because the invocation came from an object living in the context of the app.py process, not in the subprocesses. This is one important breadcrumb here; realizing it was pivotal. I had been thinking I was passing the object along, because I was still in the "fork" mindset. Once I realized that was not the case, and that I was making a function call on an object lying in a separate process, things became much clearer. So I refactored the code.
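A tiny self-contained repro of that mechanism (the class and names are illustrative): a bound method carries its whole instance, so the object built in the parent is what gets serialized for the child.

from multiprocessing import Process

class Plugin:
    def __init__(self):
        print("__init__")   # fires in the parent, where the object is built

    def run(self):
        print("run")        # fires in the child, on a deserialized copy

if __name__ == "__main__":
    plugin = Plugin()       # constructed in the parent (app.py) process
    # plugin.run is a bound method: serializing it serializes the whole
    # object, along with any locks or sockets the object holds
    p = Process(target=plugin.run)
    p.start()
    p.join()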

RESOLUTION

Fixed the issue of object passing
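The fixed gist is not reproduced here either, so here is the shape of the refactor as a sketch (bootstrap is my illustrative name, not the project's exact helper): pass the class plus plain configuration, and construct the object inside the child.

from multiprocessing import Process

def bootstrap(scrapper_class, **config):
    # Runs inside the child: the scrapper is constructed here, so its
    # __init__, its logger and its database client all belong to this process
    scrapper = scrapper_class(**config)
    scrapper.run()

def execute(scrapper_class, **config):
    process = Process(target=bootstrap, args=(scrapper_class,), kwargs=config)
    process.start()
    return process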

Well, the serialization and object passing were resolved by now. So was the deadlock. But what changed, exactly? We never addressed the deadlock directly. Apparently, it was all about what we were passing to the scrapper class.

If you observe here:

execute(scrapper,
        log_level=log_level,
        log_stream=log_stream,
        log_folder=log_folder,
        database_module=Database_module,
        db_configuration=db_configuration)

The log_stream in the code was not a string or a simple object; it was this:

logging.FileHandler("{}/{}.log".format(log_folder, context_name))

That is the logger's stream handler for files. Now here is what happened. This logging.FileHandler holds some sort of lock on the file it is given, and that lock object lived in app.py, i.e. the parent process of all the child plugin processes. Earlier, pickle's error had already been telling me that there was a lock object lying around somewhere. I assumed it must be an issue with the MongoClient object getting created, but I was wrong: the lock came from this stream handler used by Python's logging. Now I knew what went wrong. We were passing a lock of the parent process to its child processes without telling the OS. Our dill module serialized it successfully, so it slipped right under our noses. Locks like this should not be transferred; they are provided with respect to the OS, and its way of mapping processes to resources can be messed up by something like this. This led to the child process not having write rights to the file, since it was locked to its parent process. In short: do not pass locks. But again, in my defense, I never realized it was holding a lock. After I moved the code that creates the stream somewhere it can be imported by the scrapper class code, things were fine.
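The fix in sketch form (build_logger is an illustrative name, not the project's exact helper): pass plain strings across the process boundary and create the handler, lock and all, inside the child.

import logging

def build_logger(context_name, log_folder, log_level):
    # Called inside the child process: the FileHandler (and the lock it
    # holds) is created here, so no lock ever crosses a process boundary
    logger = logging.getLogger(context_name)
    logger.setLevel(log_level)
    handler = logging.FileHandler("{}/{}.log".format(log_folder, context_name))
    logger.addHandler(handler)
    return logger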

Lesson learned: always be more careful with libraries dealing with files from now on. One bad argument passed when doing multiprocessing, and we are left wondering where we f**ked up.

😁 Thanx for reading! If you found the article insightful, do leave a clap or your input.
