Gunicorn worker configuration for memory

Satendra Pratap
May 11, 2018

Suppose we love a sport. What is the best way to improve at it? We need to understand the sport's nitty-gritty well, then refine or tune our technique to get better results.

Let's start with the nitty-gritty of server concepts (limited by my own research and understanding).

There are many types of servers in web technology, depending on the kind of task they perform, but here we will talk about the web server and the application server. A web server is the interface to the users and is responsible for responding to their requests. It doesn't know how to talk to your backend application and will most probably serve users static content. An application server, on the other hand, can talk to the backend application and generate dynamic results for user requests, but it does not interact directly with users and needs a web server in front of it. Some server software acts as both a web server and an application server (perhaps to cater to small applications).

Nginx (engine-x) is a web server and Gunicorn is an application server.

Apache is one of the most popular web servers. It was originally designed to spawn a copy of itself (a new process) to serve each new connection. This kind of architecture does not scale well as the number of concurrent connections grows, and it results in high memory and CPU usage with lower throughput.

Nginx was designed differently and is better suited to scaling under load. It can handle a huge number of simultaneous connections and requests per second efficiently. It is event-based and does not create a new process or thread for each request, which keeps memory and CPU usage manageable even as load increases.

Nginx has a modular, event-driven, asynchronous, single-threaded, non-blocking architecture.

Nginx processes connections in a highly efficient run-loop inside single-threaded, pre-forked processes called workers. It runs a single master process with several worker processes (four in the example below; the count is set by the worker_processes directive).

$ ps ufax | grep nginx

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 17104 0.0 0.1 85892 1128 ? Ss May08 0:00 nginx: master process /usr/sbin/nginx

www-data 17107 0.0 0.2 86516 2076 ? S May08 0:07 \_ nginx: worker process

www-data 17108 0.0 0.1 86172 1428 ? S May08 0:08 \_ nginx: worker process

www-data 17109 0.0 0.1 86516 1548 ? S May08 0:09 \_ nginx: worker process

www-data 17110 0.0 0.2 86516 2112 ? S May08 0:06 \_ nginx: worker process

As you can see, each nginx worker takes very little memory, and its usage does not grow much even under heavy load.
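To put a number on that, here is a small sketch that sums the RSS column of the ps output shown above. The function name rss_by_role is my own, and the input is simply the sample output pasted in as a string; on a live machine you would feed it the real output of ps instead.

```python
# Sample copied from the ps output above. RSS is the 6th column, in KiB.
PS_OUTPUT = r"""USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 17104 0.0 0.1 85892 1128 ? Ss May08 0:00 nginx: master process /usr/sbin/nginx
www-data 17107 0.0 0.2 86516 2076 ? S May08 0:07 \_ nginx: worker process
www-data 17108 0.0 0.1 86172 1428 ? S May08 0:08 \_ nginx: worker process
www-data 17109 0.0 0.1 86516 1548 ? S May08 0:09 \_ nginx: worker process
www-data 17110 0.0 0.2 86516 2112 ? S May08 0:06 \_ nginx: worker process
"""


def rss_by_role(ps_text):
    """Sum RSS (KiB) separately for master and worker lines of `ps u` output."""
    totals = {"master": 0, "worker": 0}
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its RSS field is not a number).
        if len(fields) < 11 or not fields[5].isdigit():
            continue
        command = " ".join(fields[10:])
        if "master process" in command:
            totals["master"] += int(fields[5])
        elif "worker process" in command:
            totals["worker"] += int(fields[5])
    return totals


totals = rss_by_role(PS_OUTPUT)
print(totals)
```

For the sample above, the four workers together hold roughly 7 MB of resident memory, which is tiny compared with a typical application worker.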

Gunicorn is also based on the pre-fork worker model, which means a master process manages several pre-forked worker processes. Each worker can be configured to handle a set number of requests before it is restarted.

The worker model is better than a threading model in the sense that if a thread causes an issue, it can affect the master process, which in turn may bring down the whole system; with workers, only the particular worker handling the request is affected.

Gunicorn supports different types of workers. By default, workers are synchronous and handle a single request at a time; sync workers do not support persistent connections, and each connection is closed after the response is sent.

Usually 4–12 Gunicorn workers are capable of handling thousands of requests per second, but what matters more is the memory used and the max-requests parameter (the maximum number of requests a worker handles before it restarts, usually used to compensate for memory leaks, if any).

We need to configure workers and max-requests very carefully; otherwise, issues become tough to debug.

The Gunicorn documentation gives a general rule: set the number of workers to (2*N + 1), where N is the number of cores. I think this is only a rough guideline, because you need to consider memory as well.
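As a starting point, the rule and the max-requests setting can be written down in a Gunicorn config file. This is only a sketch: the values for max_requests and max_requests_jitter are illustrative assumptions, not recommendations, and should be tuned for your own application.

```python
# gunicorn.conf.py: a minimal sketch, values are illustrative
import multiprocessing

# Rule of thumb from the Gunicorn docs: (2 x number of cores) + 1.
# This ignores memory, so treat it as an upper bound to be checked.
workers = multiprocessing.cpu_count() * 2 + 1

# The default worker type: synchronous, one request at a time.
worker_class = "sync"

# Restart a worker after it has handled this many requests
# (the default of 0 disables automatic restarts).
max_requests = 1000

# Add a random offset per worker so they do not all restart at once.
max_requests_jitter = 50
```

It would be loaded with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app is a placeholder for your own WSGI module and callable.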

Suppose each Gunicorn worker takes ‘W’ memory, the total system memory is ‘T’, and the machine has N=1 core.

So, as per the suggestion, the number of workers = 2*1 + 1 = 3

Now suppose your application takes ‘A’ memory.

So the total memory required when only one worker is handling all the requests is R = W*3 + A

As long as T is comfortably larger than R, everything is fine. The problem comes when the operating system schedules the other workers to serve requests too: each worker then consumes at least W+A memory. So the system memory actually required for Gunicorn with 3 workers should be more than (W+A)*3 to avoid random hangs, random missing responses, or random error responses. For example, if nginx is used as a reverse proxy, it will not get any response when a Gunicorn worker crashes for lack of memory, and nginx will in turn respond with a 502 Bad Gateway error.

So we need to be careful while configuring Gunicorn and consider both the number of cores and the available memory.
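The reasoning above can be sketched as a small helper that caps the (2*N + 1) rule by what memory allows. The function name, the reserved_mb parameter (memory set aside for the OS and other processes), and the numbers in the example are my own assumptions for illustration; W and A have to be measured on your own system.

```python
def max_workers(total_mb, worker_mb, app_mb, cores, reserved_mb=0):
    """Return a worker count bounded by both CPU and memory.

    total_mb    -- total system memory T
    worker_mb   -- base memory of one Gunicorn worker W
    app_mb      -- memory the application takes inside a worker A
    cores       -- number of CPU cores N
    reserved_mb -- memory kept free for the OS and other processes (assumption)
    """
    per_worker = worker_mb + app_mb
    if per_worker <= 0:
        raise ValueError("worker_mb + app_mb must be positive")

    by_cpu = 2 * cores + 1                               # rule from the docs
    by_memory = (total_mb - reserved_mb) // per_worker   # (W + A) * n must fit in T
    return max(1, min(by_cpu, by_memory))


# 1 core, 1 GB of RAM, W = 30 MB, A = 300 MB, 256 MB reserved:
# the CPU rule says 3 workers, but memory only allows 2.
small_box = max_workers(1024, 30, 300, cores=1, reserved_mb=256)

# Same application on a 4 GB machine: memory is no longer the limit.
bigger_box = max_workers(4096, 30, 300, cores=1, reserved_mb=256)

print(small_box, bigger_box)
```

The result is never less than one worker, so on a box that is too small the function still returns 1; in that case the real fix is more memory or a lighter application, not a different worker count.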

I believe the Gunicorn documentation at http://docs.gunicorn.org/en/stable/settings.html should mention memory consumption as well when it discusses configuring workers.
