Vertically scaling a Python application running on mod_wsgi

Shubham Arora
Oct 23, 2017


Curiosity starts with finding ways to reduce the time a request spends waiting for its turn; in other terms, easing the congestion at the entrance. If you are using New Relic as a monitoring tool in your infrastructure, it is very easy to find the average time requests spend in the queue.
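Under the hood, New Relic derives that queue-time metric from a timestamp the frontend server stamps onto each request. With Apache this is typically done via mod_headers; a minimal sketch, assuming mod_headers is enabled:

# Record the request's arrival time so the APM agent can
# measure how long it queued before reaching the application
RequestHeader set X-Request-Start "%t"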

[Image: New Relic web transaction time graph, showing request queuing specifically]

How to write optimized database queries, how to leverage caching, common practices for writing performant code: these topics get most of the attention.

But are you also making the maximum use of your server resources?

Infrastructure cost optimization is a big task in almost every organization. Writing performant code is the best way to get there, and squeezing maximum utilization out of your server resources is another approach towards it.

You can run a Python application on Apache with mod_wsgi. This is surely one of the ways, not the only one; it is the one I am discussing here, and maybe a separate post will cover the rest in the future.

With mod_wsgi, you can run multiple processes, each running an isolated instance of your application. Each process can handle multiple requests concurrently, depending on the thread pool size defined in your Apache configuration.
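For orientation, a minimal virtual-host sketch wiring an application into a daemon process group might look like this; the name myapp, the domain, and the /srv/myapp paths are hypothetical placeholders:

<VirtualHost *:80>
    ServerName example.com

    # Two isolated daemon processes, 30 threads each
    WSGIDaemonProcess myapp processes=2 threads=30 display-name=%{GROUP}
    WSGIProcessGroup myapp

    # Route all URLs to the application's WSGI entry point
    WSGIScriptAlias / /srv/myapp/app.wsgi

    <Directory /srv/myapp>
        Require all granted
    </Directory>
</VirtualHost>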

You can use the WSGIDaemonProcess directive of Apache to configure the number of processes, the thread pool size, and more. The directive accepts many options that can be tuned for optimum performance. Some of them are:

processes
threads
memory-limit
request-timeout

WSGIDaemonProcess <NAME1> processes=2 threads=30 display-name=<NAME2> request-timeout=120

*NAME1: name of the daemon process group
*NAME2: name to show for the daemon process in ps output
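Putting the listed options together, an illustrative (not prescriptive) variant could look like this; note that memory-limit takes bytes and, per the mod_wsgi documentation, is not honoured on every platform:

WSGIDaemonProcess myapp processes=2 threads=30 display-name=%{GROUP} request-timeout=120 memory-limit=536870912

Here display-name=%{GROUP} is a mod_wsgi shorthand that names the process after its daemon group, which makes it easy to spot in ps output.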

The most common problem, and question, around this configuration is: what is the ideal process-to-thread count or ratio?

Just start with 2–2–30

There is no firm set of rules defining the best CPU-core-to-process-to-thread ratio. You can initially start with one machine → 2 cores → 2 processes → 30 threads, then play around with the numbers to find the best fit for you.

Keeping the process count at 2 and the thread count at 30 doesn't mean your server can serve only 30 requests/sec. If your API's response time is even 200ms, a single thread can serve 5 requests a second. That means a single process can push about 9,000 requests per minute, and a single server with two CPU cores and two such processes has a theoretical ceiling near 18,000 rpm, as worked out below.
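Spelled out (a theoretical ceiling that assumes every thread stays busy and the CPU never saturates):

1000 ms / 200 ms          = 5 requests/sec per thread
30 threads × 5 req/sec    = 150 requests/sec per process
2 processes × 150 req/sec = 300 requests/sec ≈ 18,000 requests/min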

The value of request-timeout should be chosen very wisely, as a bad value can surface as 5XX responses. Whenever a request exceeds the value defined in request-timeout, Apache restarts the daemon process, and the restart also disrupts the requests currently being served by the other threads in the same process.

When you hit this state, you have to find which API call, among all of them, is taking longer than request-timeout.

Then comes a rule of thumb: logs are the best torchbearer and can take you to the root cause. Whenever the WSGI daemon process restarts, mod_wsgi dumps a beautiful, elegant stack trace for you. It looks like this:

[TIMESTAMP] [wsgi:info] [pid 1234:tid 123456789987654] mod_wsgi (pid=1234): Shutdown requested 'NAME1'.
[TIMESTAMP] [wsgi:info] [pid 1234:tid 123456789987654] mod_wsgi (pid=1234): Dumping stack trace for active Python threads.

… then goes the actual stack trace …

*NAME1 is the same name you defined as an option to the WSGIDaemonProcess directive
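One caveat: these messages are logged at info level, which the default LogLevel of warn suppresses. On Apache 2.4 you can raise the verbosity for mod_wsgi alone:

# Keep the global log level at warn, but log mod_wsgi at info
LogLevel warn wsgi:info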

All you need to do is grep -i 'Dumping stack trace for active Python threads' in your Apache error log files.
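For example, assuming the default Debian/Ubuntu log location (adjust the path for your setup):

grep -i 'Dumping stack trace for active Python threads' /var/log/apache2/error.log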

You can use New Relic to monitor the restart count of these processes in New Relic Reports (Reports → Capacity → App instance restart by host).

You can also find out how busy your application instances are from the same Capacity report.

Keep exploring! Make the maximum out of what you have ..
.. and don’t forget 2–2–30
