(Semi-)correct handling of log rotation in multiprocess Python applications
Take a deep breath, and prepare to embark on a journey of discovery and self-improvement…
Having multiple processes open the same file for writing/appending is permitted in *NIX systems, with the caveat that if these processes write concurrently, then the file’s contents may end up interleaved or worse, overwriting each other. Overwriting content does not occur if all processes open the file in “append” mode, however their writes will possibly get interleaved and make it harder/impossible to read.
For production deployments of Django web apps using either gunicorn of uwsgi, typically multiple worker processes are spawned which log their messages into the same set of files using the
RotatingFileHandler class. This handler rotates the log file once its size exceeds a predefined threshold by renaming each backup to increase its backup number (e.g. foo.log.1 → foo.log.2) and finally renaming the original log file (foo.log → foo.log.1). The process that performs this rotation then creates a new file named foo.log and carries on logging its messages to this new file. However, as the previous file was simply renamed, any open file descriptors will continue pointing to the same file (since the inode hasn’t changed) and happily perform their I/O operations. This means that other gunicorn/uwsgi worker processes will continue logging messages into the now renamed foo.log.1 file.
The problem gets worse when the next HTTP request is received and taken by a different worker. This worker will attempt to log a message and the
RotatingFileHandler will once again look at the size of the file it is currently writing to and realize that a rotation is due. Therefore, foo.log.2 gets renamed to foo.log.3, foo.log.1 → foo.log.2 and foo.log → foo.log.1.
So now the situation is:
- the current worker is writing to the correct foo.log file;
- the worker that took the previous request is writing to foo.log.1;
- and all other workers are writing to foo.log.2.
Furthermore, when the maximum number of backups is reached, the final backup is deleted (*NIX systems allow deleting files that are open in other processes) and any workers still referring to that file will now write to nowhere (probably /dev/null).
And this is how an arbitrary amount of log messages gets lost into oblivion…
After looking at the source code for
RotatingFileHandler, I came to the conclusion that this handler cannot be used in this scenario because it provides no way to keep writing to the same file (in Python 2.7 at least). Despair not! There is hope!
The very old and wise logrotate program (nicely picked name) can help us here by performing the rotation in a slightly different manner. The logrotate configuration file shown below can be copied (and adapted) into /etc/logrotate.d/ and it will get executed once a day by the cron.daily task.
The crucial bit in this configuration (the copytruncate line) is that we are instructing logrotate to rotate the original file by making a copy of its contents to (`cp foo.log foo.log.1`) and then trucating the original file to zero bytes. Since the file’s name and inode remain the same, any workers logging to that file will continue to do so in blissful ignorance :)
In addition to this, we must now modify the logging configuration in our app to have all loggers use a regular
FileHandler instead of
RotatingFileHandler, since we can’t have Python rotating the logs for the reasons explained above in excruciating detail.
And that concludes our wonderful journey. Lost logs begone!