A simple fix to improve partial rendering speed by 30% in a large Rails application
When developing a Rails application, performance might not always be our main focus, and things might degrade little by little until we decide to say: “That’s enough, let’s do something about it”. At that point, we might have hundreds of models, views, partials, controllers, etc., and as many places where we can look for improvements. Here we will focus on one specific case of partial rendering.
As mentioned in the official doc, partials are a good tool to make views more readable and reduce duplication. But this comes at the (usually small) cost of rendering that partial separately.
Let’s assume we have one of these large Rails applications, with a layout for our pages defined in
app/views/layouts/application.html.haml, that would look like this:
= render :top_bar
This partial would be defined in
app/views/application/_top_bar.html.haml and would just be an empty file as its content is quite irrelevant.
When this gets rendered within the UsersController, it might be quite slow, and the reason is not the inner rendering of that partial, since the file is empty. So we can deduce that it must be related to fetching the partial itself. If we dig into ActionView to see what is happening, we can go down to its ActionView::PathSet. And there we can take a peek at its code:
def _find_all(path, prefixes, args, outside_app)
  prefixes = [prefixes] if String === prefixes
  prefixes.each do |prefix|
    paths.each do |resolver|
      if outside_app
        templates = resolver.find_all_anywhere(path, prefix, *args)
      else
        templates = resolver.find_all(path, prefix, *args)
      end
      return templates unless templates.empty?
    end
  end
  []
end
We can ignore the outside_app part as it is only used when rendering a file. So for each prefix we go through paths and try to find all matching templates on that path (from which the find method will afterwards take the first result). But what are the values for prefixes and paths?
prefixes will look like this: ["users", "application", "base"], because that is how it gets initialized by default (with the controller_path) in the ViewPaths module:
# Override this method in your controller if you want to change paths prefixes for finding views.
# Prefixes defined here will still be added to parents' <tt>._prefixes</tt>.
def local_prefixes
  [controller_path]
end
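To illustrate how such a prefix list builds up along the controller hierarchy, here is a minimal sketch in plain Ruby. It does not use Rails: BaseController, the prefixes method and this simplified controller_path are hypothetical stand-ins written for this example, and the real implementation has more rules (for instance around abstract controllers). It only shows the idea of each class contributing its own path and then appending the prefixes of its parent.

```ruby
# Hypothetical mini controllers: only the prefix logic is modelled here,
# this is NOT the Rails implementation.
class BaseController
  # Simplified stand-in for controller_path: "UsersController" -> "users".
  def self.controller_path
    name.split("::").last.sub(/Controller\z/, "").downcase
  end

  # Each class contributes its own path...
  def self.local_prefixes
    [controller_path]
  end

  # ...followed by the prefixes of its parents, up to the root of the hierarchy.
  def self.prefixes
    return local_prefixes if superclass == Object

    local_prefixes + superclass.prefixes
  end
end

class ApplicationController < BaseController; end
class UsersController < ApplicationController; end

p UsersController.prefixes
# prints ["users", "application", "base"]
```

The order matters: the most specific prefix comes first, so the lookup only reaches "application" after "users" has been tried everywhere.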
paths, on the other hand, will look like
[#Resolver:0x0000...], starting with a resolver for our app, but with potentially a lot of entries, depending on the number of gems we use.
What is happening is:
- in the first loop it will not match our partial, because users does not match anything in the path of our partial,
- in the second loop it will “magically” match (because the “application” coming from the hierarchy of our controller will match the “application” of the path we chose for the template).
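And the problem is that when we have a big Rails application (views, partials, gems that add to the view_paths, etc.), going through that prefix loop even once without a match can be “very costly”. The fix is to spell out the prefix when rendering the partial, so that only that one prefix has to be searched: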
= render "application/top_bar"
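To make the difference between the two calls concrete, here is a minimal sketch in plain Ruby that counts resolver lookups. FakeResolver, lookup_templates and build_resolvers are hypothetical names made up for this illustration, not ActionView APIs. The sketch assumes 18 resolvers where only the first one (the app itself) knows the partial, and it reuses the nested loop shape of the _find_all method quoted above.

```ruby
# Hypothetical stand-in for an ActionView resolver: it only knows a set of
# "prefix/name" keys and counts how many times it gets queried.
class FakeResolver
  attr_reader :lookups

  def initialize(templates)
    @templates = templates
    @lookups = 0
  end

  def find_all(name, prefix)
    @lookups += 1
    key = "#{prefix}/#{name}"
    @templates.include?(key) ? [key] : []
  end
end

# Same nested loop shape as _find_all: every prefix is tried against every
# resolver until one of them returns something.
def lookup_templates(name, prefixes, resolvers)
  prefixes.each do |prefix|
    resolvers.each do |resolver|
      templates = resolver.find_all(name, prefix)
      return templates unless templates.empty?
    end
  end
  []
end

# Assumption: 1 resolver for the app plus 17 coming from gems.
def build_resolvers
  app = FakeResolver.new(["application/_top_bar"])
  gems = Array.new(17) { FakeResolver.new([]) }
  [app, *gems]
end

# Unprefixed render: the whole controller hierarchy is searched.
resolvers = build_resolvers
lookup_templates("_top_bar", %w[users application base], resolvers)
unprefixed_lookups = resolvers.sum(&:lookups)

# Prefixed render: only the "application" prefix is searched.
resolvers = build_resolvers
lookup_templates("_top_bar", %w[application], resolvers)
prefixed_lookups = resolvers.sum(&:lookups)

puts "unprefixed: #{unprefixed_lookups} resolver lookups" # 18 misses + 1 hit = 19
puts "prefixed:   #{prefixed_lookups} resolver lookup"    # 1 hit
```

In this toy model a lookup is a cheap set membership test. In a real application each of those failed lookups is a resolver searching its own view path, which is where the time goes.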
While not prefixing our partials can be very practical and limits oversized lines and repetition, getting used to the “Rails magic” underneath might expose us to that “semantic error” and performance problem. So the possible solutions are:
- always prefixing partials,
- banish “base” and “application” from the view path (use some other folder name),
- be careful to always prefix our partial name when they are in these special paths (“application” and “base”).
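If we go for the first option, a small script can help find the render calls that still rely on the prefix lookup. The following is only a heuristic sketch: UNPREFIXED_RENDER and unprefixed_partials are hypothetical names for this example, the regular expression only covers the common forms (render "name", render :name, render partial: "name"), it will misreport other styles such as the hash-rocket syntax, and it requires Ruby 2.7 or later for filter_map.

```ruby
# Heuristic: a render call whose partial name contains no slash.
UNPREFIXED_RENDER = %r{\brender\s*\(?\s*(?:partial:\s*)?(?::(\w+)|["']([^"'/]+)["'])}

# Returns [line_number, partial_name] pairs for every unprefixed render call
# found in the given view source.
def unprefixed_partials(source)
  source.each_line.with_index(1).filter_map do |line, number|
    match = UNPREFIXED_RENDER.match(line)
    [number, match[1] || match[2]] if match
  end
end

sample = <<~HAML
  = render :top_bar
  = render "application/top_bar"
  = render partial: "footer", locals: { year: 2020 }
  = render 'users/form'
HAML

p unprefixed_partials(sample)
# prints [[1, "top_bar"], [3, "footer"]]
```

To check a whole application, the same method could be applied to the content of each file returned by Dir.glob("app/views/**/*.haml"). It is meant for view files only: in controllers, calls like render :edit are actions, not partials.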
In a “real life application”
So of course, all this does not come out of nowhere, but from a performance optimization run on a big Rails 5 application. This Rails app has more than 3000 files in app/views (about 1/3 of these being partials), and 18 “path resolvers” at run-time.
We could observe a drop from
6ms on each concerned
We could also notice that there was no caching involved in the partial rendering (the main reason being that there is no need for any in most cases).
Because we had this problem mainly in our layout templates, the overall impact was quite big, and most requests benefited from the fix, as for example our
As these graphs show, with an average request previously taking around 500ms, nearly a third of that was spent finding partials.
Finding a partial (going through each path of an application and running the resolver for each of them) is a CPU-intensive process (there is I/O involved, but it seems to be cached at some point inside the path resolver). So we could even see a small dent in the average CPU load of our servers.