Improving Rails Performance with Better Background Jobs
It is not unusual to think about scalability as an easy thing to achieve. Hosting services such as Heroku are able to provide more resources to our applications, such as RAM or CPU, with just a few clicks, right? However, while upgrading infrastructure is indeed a valid option, I believe most apps out there can scale just as well with code alone.
Tag along to check a few tips on how to use available resources more efficiently, and maybe save your project some money.
Keep an eye on your background jobs
By now you are probably already using some sort of tool to handle background processing, and that's great. Really! This means your application is able to deliver better response times by delegating computationally intensive tasks to a background job.
Lower response times mean you get higher throughput, as each instance of your server can handle more requests per minute. Yay! Your application's web server is already scaling! But what about your worker instances? How are they responding to all this load?
Without the right approach, the answer is most likely straightforward: not so well. With that in mind, I’m going to show how my team refactored a non-scalable background job architecture, which at some point would just stop working altogether, into one that has proven to be much more reliable and keeps our resource usage at a minimum.
First things first: in order to do any optimization on your application you will need to measure, measure and measure again! Metrics are the most important and useful weapon in your arsenal when optimizing. Remember that. I often rely on New Relic for metrics, but there are plenty of options.
What is going on in production?!
The feature request seemed dead simple: send emails to a list of subscribed users with content they might be interested in. Easy-peasy! Let's query all subscribed users, gather custom content for each one and deliver that email right away! In a hurry to get this simple task done, one could easily write the following code and call it a day.
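A naive job along those lines might look roughly like the sketch below. It is an illustration rather than our actual code: User.all_subscribed, suggestions_for and deliver are hypothetical stand-ins written in plain Ruby so the snippet runs without Rails or Sidekiq. What matters is the shape: every subscribed user is loaded up front and the whole list is handled in one long run.

```ruby
# Hypothetical stand-in for an ActiveRecord model, so the sketch runs without Rails.
User = Struct.new(:id, :email, :subscribed) do
  # Mimics `User.where(subscribed: true).to_a`: every matching record is
  # instantiated and held in memory at the same time.
  def self.all_subscribed
    (1..1_000).map { |i| new(i, "user#{i}@example.com", true) }
  end
end

# Hypothetical naive job: one long-running unit of work for the whole list.
class NaiveContentSuggestionJob
  attr_reader :delivered

  def initialize
    @delivered = []
  end

  def perform
    users = User.all_subscribed        # whole list in memory up front
    users.each do |user|
      content = suggestions_for(user)  # placeholder for the real content query
      deliver(user, content)           # placeholder for the real mailer call
    end
    users.size
  end

  private

  def suggestions_for(user)
    "Suggested reading for #{user.email}"
  end

  def deliver(user, content)
    @delivered << [user.id, content]
  end
end

job = NaiveContentSuggestionJob.new
puts "Emailed #{job.perform} users in a single job run"
```

If a job shaped like this dies halfway through and gets retried, it starts again from the first user.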
We tested it in development and it performed nicely. Then we shipped it to production, only to discover something was wrong as soon as the job started running. Users were receiving the same email time and time again, Heroku was crying with R14 errors (memory quota exceeded), and from that point on all sorts of errors were showing up. What could possibly have gone wrong?!
Taking a closer look at Heroku's metrics, our biggest problem seemed to be memory consumption. R14 errors are bad news because the app starts using swap memory once there is no RAM available. Swapping on Heroku is surprisingly slow and will likely lead to obscure failures.
OK, let's clean up this mess
In order to properly tackle this issue we tried to set up a development environment as close as possible to production. Put simply: run your servers using the production flag, set the same environment variables wherever possible and don't forget to import a database dump.
Most of the time, while experimenting in development, you can get useful information using the Unix top command, which displays a lot of metrics for a given process, but for brevity's sake let's focus on memory.
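Besides watching top, the same number can be logged from inside the Ruby process itself. The helper below is a minimal sketch, not something from our codebase: rss_mb is a hypothetical name, and it assumes either a Linux-style /proc filesystem or a ps command that accepts -o rss=.

```ruby
# Hypothetical helper: resident set size (RSS) of the current process in MB.
def rss_mb
  status = "/proc/self/status"
  kb =
    if File.readable?(status)
      # Linux: the VmRSS line looks like "VmRSS:     123456 kB".
      File.read(status)[/^VmRSS:\s+(\d+)\s+kB/, 1].to_i
    else
      # Other Unix systems: ps reports RSS in kilobytes.
      `ps -o rss= -p #{Process.pid}`.to_i
    end
  (kb / 1024.0).round(1)
end

puts "RSS: #{rss_mb} MB"
```

Calling something like this every hundred users or so gives a rough memory curve without leaving the job.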
So we fired up our application and started monitoring it. At the beginning memory consumption rose fast, eating about 520 MB of RAM with just 200 users processed. It was still going up after a thousand users had been processed, albeit much more slowly.
So, no wonder we had trouble on Heroku. Processing about 200 users was enough to use all available RAM and start swapping. I bet it managed to process fewer than 800 users before the worker failed and restarted, which was causing users to receive the same email over and over again.
Taming object instantiation
Yes, I'm looking at you, ActiveRecord! Querying all subscribed users at once is clearly not a good idea, since Rails will try to instantiate every one of them. But there is an easy way out: ActiveRecord's find_in_batches method. Better yet, you can set the batch size through its batch_size option and experiment with different sizes.
In summary, batch processing lets you work through the records one batch at a time, so fewer objects are instantiated at once and memory consumption drops considerably. With that handy method, we were able to rewrite our background job and get better results. Using a batch size of 100 allowed us to stay just under the 512 MB of RAM available on our Heroku worker.
New code and metrics below:
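The batched job can be sketched as follows. This is an approximation, not our production code: in a Rails app the call would be something like User.where(subscribed: true).find_in_batches(batch_size: 100), where the subscribed column is an assumption. Here a small plain-Ruby User stand-in mimics find_in_batches so the snippet runs on its own.

```ruby
# Hypothetical stand-in for an ActiveRecord model so the sketch runs without Rails.
User = Struct.new(:id, :email) do
  # Mimics find_in_batches: builds and yields one array of records at a time,
  # so only `batch_size` records exist at once. `total` is a made-up row count.
  def self.find_in_batches(batch_size: 1000, total: 1_000)
    (1..total).each_slice(batch_size) do |ids|
      yield ids.map { |id| new(id, "user#{id}@example.com") }
    end
  end
end

class BatchedContentSuggestionJob
  BATCH_SIZE = 100

  attr_reader :delivered, :largest_batch

  def initialize
    @delivered = 0
    @largest_batch = 0
  end

  def perform
    User.find_in_batches(batch_size: BATCH_SIZE) do |batch|
      @largest_batch = [@largest_batch, batch.size].max
      batch.each { |user| deliver(user) }
    end
    @delivered
  end

  private

  def deliver(_user)
    @delivered += 1 # placeholder for building content and sending the email
  end
end

job = BatchedContentSuggestionJob.new
puts "Emailed #{job.perform} users, at most #{job.largest_batch} loaded at a time"
```

BATCH_SIZE is the knob to experiment with: smaller batches hold fewer records in memory at the cost of more queries.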
OK, that's already an improvement and would probably run fine in production. Memory consumption was stable and didn't seem to go beyond 512 MB. But we knew we could do better! Staying just under the limit is neither good enough nor safe.
Background Jobs Done Right!
You might have noticed our background job was doing too much. It retrieves all subscribed users and handles emailing all of them! Like any other class, workers should be highly specialized in what they do. With that in mind, we went for a new architecture: instead of having one gigantic worker doing it all, we chose to have hundreds of small ones doing exactly one thing.
What it really means is that we now have one worker per user to be emailed. This new approach has several benefits for scalability. For instance, chances are your application has multiple kinds of background jobs which need to run at unpredictable times as users interact with the app. When a giant background job runs for a long time, depending on your resources, there may be no free slots for other jobs to run, leading to obscure errors on your web server.
We can also improve object instantiation with this approach by using another great ActiveRecord method: pluck. This allows us to get an array of user ids, instead of instantiated users, and pass each one as an argument to our new ContentSuggestionWorker. So we wrote another entity to enqueue our new workers.
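The fan-out can be sketched like this. ContentSuggestionEnqueuer and the in-memory QUEUE are hypothetical stand-ins so the snippet runs without Rails or Sidekiq. In a real app the worker would include Sidekiq::Worker, and the ids would come from something like User.where(subscribed: true).pluck(:id), where the subscribed column is an assumption.

```ruby
# Hypothetical in-memory queue standing in for Sidekiq, so the sketch runs on its own.
QUEUE = []

# One small job per user. In a Sidekiq app this class would `include Sidekiq::Worker`
# and `perform_async` would push the job to Redis instead of an array.
class ContentSuggestionWorker
  def self.perform_async(user_id)
    QUEUE << [self, user_id]
  end

  def perform(user_id)
    # In Rails: load this single user here, build the content and send the email.
    "emailed user #{user_id}"
  end
end

# Hypothetical enqueuer. It only ever sees plain integer ids, which is what
# `pluck(:id)` returns, rather than instantiated models.
class ContentSuggestionEnqueuer
  def perform(user_ids)
    user_ids.each { |id| ContentSuggestionWorker.perform_async(id) }
    user_ids.size
  end
end

enqueued = ContentSuggestionEnqueuer.new.perform((1..5).to_a)
results = QUEUE.map { |klass, id| klass.new.perform(id) }
puts "Enqueued #{enqueued} jobs; first result: #{results.first}"
```

The enqueuer stays cheap because it never touches a full user record. Each small job loads only the one user it needs.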
Another cool thing about having this specialized worker is that we can finally play with the retry option. Since each worker handles just one user, setting this to false will affect only this single user, which isn't really a problem for us.
Setting Sidekiq to not retry upon failure would be bad for us on our previous worker because unprocessed users would not receive any suggestions should any error occur, and this is not something we wanted. On the other hand, having retry set to true caused processed users to receive repeated emails in case of errors.
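In a Sidekiq worker the option itself is set with sidekiq_options retry: false. The trade-off described above can be simulated in plain Ruby. Everything in the sketch below (deliver, FAILING_ID and the two methods) is hypothetical and only mimics how a queue would re-run or drop a failed job.

```ruby
USER_IDS = (1..5).to_a
FAILING_ID = 3 # hypothetical user whose email raises an error on the first attempt

# Sends one email by recording the id in `sent`. Fails once for FAILING_ID.
def deliver(user_id, sent, failed_once)
  if user_id == FAILING_ID && !failed_once[:done]
    failed_once[:done] = true
    raise "delivery failed for user #{user_id}"
  end
  sent << user_id
end

# One big job with retries: a retry restarts from the first user.
def monolithic_with_retry
  sent = []
  failed_once = { done: false }
  begin
    USER_IDS.each { |id| deliver(id, sent, failed_once) }
  rescue RuntimeError
    retry # mimics the queue re-running the whole job
  end
  sent
end

# One job per user, no retries: a failure only drops that user's email.
def per_user_without_retry
  sent = []
  failed_once = { done: false }
  USER_IDS.each do |id|
    begin
      deliver(id, sent, failed_once)
    rescue RuntimeError
      next # with retry disabled the failed job is not run again
    end
  end
  sent
end

puts "monolithic + retry: #{monolithic_with_retry.inspect}"
puts "per user, no retry: #{per_user_without_retry.inspect}"
```

In the first case users 1 and 2 are emailed twice. In the second, only user 3 misses out and nobody gets a duplicate.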
These are the results we got with this change:
A maximum of around 372 MB is way better than what we had. We are no longer getting near the 512 MB mark, yay! But we are not done yet: those results gave us clues about where to go next.
Collect that garbage!
Spreading the workload across multiple workers allows the Garbage Collector to do a better job, freeing up unused allocated space on the memory heap. The reason is that objects allocated by a worker become eligible for collection as soon as it finishes executing. That's likely what reduced memory consumption in our last solution.
In light of this, we decided to take this Garbage Collection thing to the limit and see what we got. In a nutshell, we forced a Major GC sweep at the end of each worker. Let's see how it performs.
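Forcing a major collection at the end of each job can be sketched with Ruby's built-in GC module. The worker body below is a hypothetical placeholder written in plain Ruby so it runs without Sidekiq. GC.start and GC.stat are standard Ruby, and the keyword arguments assume Ruby 2.1 or newer.

```ruby
# Hypothetical worker body; plain Ruby so it runs without Sidekiq.
class ContentSuggestionWorker
  def perform(user_id)
    # Placeholder for the real work: allocate some short-lived objects.
    content = Array.new(10_000) { |i| "suggestion #{i} for user #{user_id}" }
    content.size
  ensure
    # Force a full (major) collection once the job's objects are no longer needed.
    GC.start(full_mark: true, immediate_sweep: true)
  end
end

before = GC.stat[:major_gc_count]
3.times { |id| ContentSuggestionWorker.new.perform(id) }
after = GC.stat[:major_gc_count]
puts "Major GC runs during 3 jobs: #{after - before}"
```

Putting the call in ensure means the collection also runs when the job raises. Whether a full collection per job is worth its CPU cost is something only measurement can tell.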
Wow, now that's what I call improvement! Our job now manages to stay under 300 MB of memory consumption, leaving enough room for other routines to be processed.
I know someone might say that forcing so many major GC sweeps doesn't sound like a good idea. While I am inclined to agree with that, this is just an example to show the concepts behind our solution. You are free to try and see how things go with fewer GC sweeps. Just don't forget to measure it!
On Ruby 2.1+, you are able to experiment even further as there are a lot of environment variables exposed to help tuning Garbage Collection. This post by Thorsten Ball and this one by Sam Saffron are great resources on that matter. There is also this gem called TuneMyGC to help you determine which are the best GC values for your application needs, give it a try!
Sidekiq's concurrency plays a big role in memory usage too. The higher your concurrency setting, the hungrier your app gets! This happens because there will be more threads running jobs and instantiating objects at the same time, so watch out for this. Sometimes just lowering concurrency will be enough to save resources.
Finally, it could be a good idea to pass only the user_id to the ContentMailer, and let it handle user instantiation. This way we avoid passing complex arguments, which is usually a good practice, especially inside Sidekiq workers.
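One reason simple arguments are preferred is that Sidekiq serializes job arguments to JSON before storing them in Redis. The sketch below mimics that round trip with Ruby's standard json library. User here is a hypothetical Struct rather than an ActiveRecord model, and round_trip is a made-up helper.

```ruby
require "json"

# Hypothetical stand-in for a model object.
User = Struct.new(:id, :email)
user = User.new(42, "reader@example.com")

# Mimics what happens to job arguments on their way through the queue.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

id_args   = round_trip([user.id]) # a plain integer survives unchanged
user_args = round_trip([user])    # the object does not come back as a User

puts "id argument:   #{id_args.first.inspect} (#{id_args.first.class})"
puts "user argument: #{user_args.first.inspect} (#{user_args.first.class})"
```

Passing the id and loading the record on the other side avoids this loss, and it also means the job works with fresh data when it finally runs.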
While this might be a contrived example, don't let it mislead you! With nothing but simple changes in our code, we were able to improve both performance and memory consumption! Take a look at this graph comparing our different solutions.
Our journey for scalability improvements isn't over just yet! We still have some tricks up our sleeves, but they will be brought to light in other posts, as they involve improving other areas of our application. So, stay tuned for the next episodes! :)