Increasing GoldenLine’s e-mail throughput by 110% with Blackfire

GoldenLine is built on top of the Symfony full-stack framework, so our e-mail delivery is handled by the built-in Swiftmailer integration.

We are sending nearly 6M e-mails per month (mostly transactional) with spikes during announcements to our 2M registered users.

We’ve built a custom DatabaseSpool which acts as a queue and receives all e-mails together with their sending priorities. Priorities are critical to a good user experience: high-priority e-mails (like forgotten password reminders) have to go out ASAP, while other e-mails (like job alerts) can be handled afterwards.
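For illustration, a stripped-down version of such a spool could look roughly like this. It is a sketch only, assuming Swiftmailer 5’s Swift_Spool interface and a Doctrine DBAL connection; the email_spool table and its columns are made up for the example:

<?php

// A priority-aware database spool (sketch). Assumes Swiftmailer 5's
// Swift_Spool interface and a Doctrine DBAL connection; the "email_spool"
// table and its columns are illustrative only.
class DatabaseSpool implements \Swift_Spool
{
    private $db;

    public function __construct(\Doctrine\DBAL\Connection $db)
    {
        $this->db = $db;
    }

    public function start() {}
    public function stop() {}
    public function isStarted() { return true; }

    public function queueMessage(\Swift_Mime_Message $message)
    {
        // Persist the serialized message together with its priority
        // (1 = highest, 5 = lowest in Swiftmailer).
        $this->db->insert('email_spool', array(
            'body'     => serialize($message),
            'priority' => $message->getPriority(),
        ));

        return true;
    }

    public function flushQueue(\Swift_Transport $transport, &$failedRecipients = null)
    {
        if (!$transport->isStarted()) {
            $transport->start();
        }

        // Highest priority first, FIFO within the same priority.
        $rows = $this->db->fetchAll(
            'SELECT id, body FROM email_spool ORDER BY priority ASC, id ASC LIMIT 100'
        );

        $sent = 0;
        foreach ($rows as $row) {
            $sent += $transport->send(unserialize($row['body']), $failedRecipients);
            $this->db->delete('email_spool', array('id' => $row['id']));
        }

        return $sent;
    }
}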

Our queue is quite simple and based on MySQL. Yes, MySQL. We made that decision for a couple of reasons. First of all, we have years of experience with MySQL and it’s not only a mature project but also one we trust. Second, we already had a MySQL database with master-slave replication and backups running on professional dedicated hardware. Last but not least, MySQL gave us the flexibility to implement priorities, a single-delivery guarantee and FIFO ordering, a combination that is almost impossible to get out of “real queues”.
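When several workers flush the queue in parallel, a plain SELECT like the one in the spool sketch above is not enough for the single-delivery guarantee; rows have to be claimed inside a transaction so that no two workers ever pick up the same message. A sketch of that claim step, again using the illustrative email_spool table and a DBAL connection:

<?php

// Claim a batch of messages atomically (sketch). SELECT ... FOR UPDATE locks
// the rows, and deleting them inside the same transaction guarantees that
// each message is handed out at most once, in priority + FIFO order.
$db->beginTransaction();

$rows = $db->fetchAll(
    'SELECT id, body FROM email_spool
      ORDER BY priority ASC, id ASC
      LIMIT 100
        FOR UPDATE'
);

if ($rows) {
    $ids = array_map(function ($row) { return $row['id']; }, $rows);

    $db->executeUpdate(
        'DELETE FROM email_spool WHERE id IN (?)',
        array($ids),
        array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY)
    );
}

$db->commit();

// $rows can now be handed over to the transport outside of the lock.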

Workers are pulling messages (which are raw e-mails) out of the queue and handing them over to Amazon SES (via SMTP) for delivery.
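In its simplest form, one worker iteration just flushes the spool through an SMTP transport pointed at SES. The following is a sketch: the hostname is SES’s eu-west-1 SMTP endpoint, and the credential variables are placeholders.

<?php

// One worker iteration (sketch): flush the database spool through an SMTP
// transport pointed at the Amazon SES endpoint in eu-west-1.
$transport = \Swift_SmtpTransport::newInstance('email-smtp.eu-west-1.amazonaws.com', 587, 'tls')
    ->setUsername($sesSmtpUsername)   // SES SMTP credentials, not shown here
    ->setPassword($sesSmtpPassword);

$spool = new DatabaseSpool($db);

printf("Sent %d e-mails in this batch\n", $spool->flushQueue($transport));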

We recently noticed that our queue was performing quite slowly and that sending an e-mail took minutes or even hours to complete. The more e-mails were waiting, the more data piled up in our database. We came to the conclusion that the database queue could be the problem and came up with the idea of storing the data in Redis instead. As we all know, Redis is super fast, keeps all data in memory and has O(1) time complexity for most operations. Switching to Redis should solve our problem. Sounds reasonable, right?

Wrong! Fortunately, we agreed to first benchmark our current setup with Blackfire to confirm that the database really was our bottleneck. Here is the result:

As you can see, SQL queries took only 815 ms (0.27%) to complete; the real bottleneck is I/O bound to the fgets() function, which accounts for 94.9% of the overall time. Why fgets()? Well, as mentioned before, we talk to Amazon SES over SMTP. The problem is that SMTP is designed as what we nerdy types like to call a chatty protocol: to send a single e-mail you have to issue multiple separate commands to the SMTP server and wait for a response to each one. The farther you are from the SMTP server, the longer you wait for every response. Our servers are located in Poland and we use the closest (and only) Amazon SES region in Europe, Ireland, which sits at roughly 50 ms of latency from us, so every round trip adds up.

The official Amazon SES documentation has a guide on increasing throughput with Amazon SES, and two of its tips were the most relevant for us:

  1. “Measure your current performance to identify bottlenecks” — that’s what we did with Blackfire, and we were able to process 700 e-mails with a single thread in a fixed 5-minute time window (see the back-of-the-envelope math right after this list).
  2. “Consider using the Amazon SES query API instead of the SMTP endpoint” — that’s what we wanted to try out.
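A quick back-of-the-envelope calculation ties both tips together: 700 e-mails in a 300-second window is roughly 430 ms per e-mail, and at ~50 ms per round trip that is on the order of eight or nine round trips per message, roughly what a full SMTP session (connect, EHLO, authentication, MAIL FROM, RCPT TO, DATA and the message body itself) costs. The latency, not the database, was eating our throughput.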

Switching from SMTP to an API was quite easy, as it came down to implementing a custom transport and replacing it in the configuration file:

swiftmailer:
    default_mailer: default
    mailers:
        default:
            transport: aws.ses.transport
            spool: { type: db }
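
The transport itself is not much code either. A minimal sketch, assuming the AWS SDK for PHP’s SesClient and Swiftmailer 5’s Swift_Transport interface (error handling omitted; the real aws.ses.transport service is not shown here):

<?php

// A minimal API-based transport (sketch): one sendRawEmail() HTTP call per
// message instead of a whole SMTP conversation. Assumes the AWS SDK for PHP.
class SesApiTransport implements \Swift_Transport
{
    private $ses;

    public function __construct(\Aws\Ses\SesClient $ses)
    {
        $this->ses = $ses;
    }

    public function isStarted() { return true; }
    public function start() {}
    public function stop() {}

    public function send(\Swift_Mime_Message $message, &$failedRecipients = null)
    {
        // Hand the raw MIME message over to the SES API. Depending on the
        // SDK version, the raw message may need to be base64-encoded first.
        $this->ses->sendRawEmail(array(
            'RawMessage' => array('Data' => $message->toString()),
        ));

        return count((array) $message->getTo());
    }

    public function registerPlugin(\Swift_Events_EventListener $plugin) {}
}

Registered as the aws.ses.transport service, Swiftmailer picks it up through the configuration above without any change to the code that actually sends e-mails.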

We ran the benchmark again with the new transport, but on the same internet connection, with the same dummy e-mails and in the same 5-minute time window. Our goal was to increase the throughput and push out more than 700 e-mails:

With this setup we were able to push 1480 e-mails, just by switching the transport from SMTP to the HTTP API. Our process is still I/O bound, but it now makes a single request per e-mail, which is even more visible in the awesome network toolbar:

Thanks Blackfire and happy profiling folks!