Introducing PowerStation

Have you ever visited a website that takes forever to load and wondered why? You are not alone. Even well-developed, popular web applications have performance issues. Some of these apps have pages that take well over 2 seconds to load, and this can easily “make or break” your website. As you might have guessed, performance is one of the most frequently mentioned issues in many web applications’ bug tracking systems.

Architecture of Web Applications

To understand the causes of these performance issues, we need to understand how web apps are constructed on the server side. By far the most common design is the model-view-controller architecture implemented by many object-relational mapping frameworks (ORMs) (e.g., Rails, Django). The idea of ORMs is to free developers from needing to know how to talk to the database where their application data is stored (often using a specialized language such as SQL), and instead allow them to write their applications in general-purpose languages like Ruby or Python using the ORM APIs. At runtime, the ORM translates the API calls into SQL queries that manipulate persistent data stored in the backend databases. The retrieved data is then rendered and sent back to the client, such as a browser.

Abstracting database operations into APIs written in the same language as the rest of the application is a good idea. However, this abstraction can also lead to problems: retrieving data from the database often takes 10 to even 10 million times longer than going to the main memory on the web server, and because of the ORM abstraction, developers are no longer aware of which function calls can turn into long-running SQL queries. There are many blog posts discussing various issues, including the infamous “N+1” problem ([1][2][3], to name a few). However, many of them seem anecdotal, and we are unaware of any systematic study of which performance issues are common in the wild and how prevalent they are. That is what we set out to do.
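To make the “N+1” problem concrete, here is a minimal sketch in plain Ruby. It does not use Rails: FakeDB, its methods, and the posts/comments data are all hypothetical stand-ins that only count how many "queries" would be sent. In a real Rails app, the batched version usually corresponds to eager loading with something like includes.

```ruby
# Hypothetical in-memory stand-in for a database; it only counts "queries".
class FakeDB
  attr_reader :query_count

  def initialize(posts, comments)
    @posts = posts        # array of hashes: { id:, title: }
    @comments = comments  # array of hashes: { post_id:, body: }
    @query_count = 0
  end

  # Stand-in for: SELECT * FROM posts
  def all_posts
    @query_count += 1
    @posts
  end

  # Stand-in for: SELECT * FROM comments WHERE post_id = ?
  def comments_for(post_id)
    @query_count += 1
    @comments.select { |c| c[:post_id] == post_id }
  end

  # Stand-in for: SELECT * FROM comments WHERE post_id IN (...)
  def comments_for_all(post_ids)
    @query_count += 1
    @comments.select { |c| post_ids.include?(c[:post_id]) }
             .group_by { |c| c[:post_id] }
  end
end

# Builds a fake database with n posts and one comment per post.
def build_db(n)
  posts = (1..n).map { |i| { id: i, title: "Post #{i}" } }
  comments = posts.map { |p| { post_id: p[:id], body: "First!" } }
  FakeDB.new(posts, comments)
end

# N+1 shape: one query for the posts, then one more query per post.
def n_plus_one_queries(db)
  db.all_posts.each { |post| db.comments_for(post[:id]) }
  db.query_count
end

# Batched shape: one query for the posts, one for all of their comments.
def batched_queries(db)
  posts = db.all_posts
  db.comments_for_all(posts.map { |p| p[:id] })
  db.query_count
end

puts "N+1:     #{n_plus_one_queries(build_db(100))} queries"
puts "Batched: #{batched_queries(build_db(100))} queries"
```

With 100 posts, the first version issues 101 queries and the second issues 2, and the gap grows linearly with the number of posts.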

Studying Web Applications “in the Wild”

To study this problem, we wanted to pick a single framework with a lot of apps that we could easily get access to. In the end, we selected the Ruby on Rails ORM due to its popularity and the number of open-source web apps that are available. We took 12 popular Rails web apps hosted on GitHub and analyzed their source code. For these apps, we looked at the issues reported on their trackers, profiled their performance using synthetic data, and tried to fix any performance issues that we identified.

Spoiler alert: We found many performance issues that generalize across the applications. Many of them can be easily fixed and lead to substantial performance improvement.

Identifying the issues

The applications we chose cover many domains, from online forums, collaboration platforms, and e-commerce to task management, social networking, and map services. We picked the most-starred apps from each category. Many of them are well known, such as GitLab, OpenStreetMap, Diaspora, and Discourse.

First, we manually sampled over 200 performance-related issues from the 12 applications and summarized the common issues that users have reported. Next, we profiled the applications with synthetic data (using the same data distribution as reported by the app developers if available, and scaling the data up). We measured the page load time for each application using its most recent version, and tried to identify the reasons for the slow pages. We have also released the code that we used for our study.

After examining these slow pages and the code that generates them, we summarized the causes into performance anti-patterns: common code patterns that lead to bad performance. Altogether we found nine anti-patterns that are prevalent across the applications we studied.

First, some general observations.

Observation 1: Most (11 out of 12) applications have at least one page that takes more than 2 seconds to load, and half of them have pages that take over 3 seconds. Many pages have both efficiency and scalability issues.

Observation 2: Most of the time is spent on the server side. For the slow-loading pages in particular, on average over 80% of the time is spent on the server. This suggests that the slowdown is likely caused by server-side issues (rather than client-side rendering, for instance).

After we identified these anti-patterns, we manually fixed them and re-evaluated the time taken to load pages. Interestingly enough, as shown in Figure 1, most fixes (over 80%) require fewer than 5 lines of code change, but they result in significant speedups in page load time (60% of the fixes have over 2x speedup)!

Fig 1: Average speedup and average lines of code change for fixing one anti-pattern. Each bar represents one type of anti-pattern that is described in detail in our ICSE paper.

Our “top 3 list” of anti-patterns

  • API Misuses

Like many ORMs, Rails comes with many APIs. Some of them share similar functionality but have different performance characteristics. For example:

@user.role_ids = Role.all.map{|r| r.id}
@user.role_ids = Role.all.pluck(:id)

The above two lines do the same thing: retrieve the ids of every role. However, Rails translates them into different SQL queries:

SELECT * FROM roles
SELECT id FROM roles

The second query is more efficient than the first since it retrieves only one of the attributes rather than everything (i.e., *). This is especially important when each role object contains many attributes. But ORM users are unlikely to know this unless they look “underneath the covers.” Unfortunately, many other types of API misuse are widespread, and we have summarized them for both the Rails and Django ORMs.
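As a rough illustration of why the narrower query is cheaper, the plain-Ruby sketch below counts how many values leave a pretend database in each case. The ROLES table, its columns, and the helpers select_all, select_column, and values_transferred are all hypothetical; real savings depend on the database, the network, and the cost of building full model objects.

```ruby
# Hypothetical roles table: each row has an id plus several other attributes.
ROLES = (1..1_000).map do |i|
  { id: i, name: "role#{i}", description: "x" * 200, created_at: "2018-01-01" }
end

# Stand-in for SELECT * FROM roles: every column of every row is copied out.
def select_all(table)
  table.map(&:dup)
end

# Stand-in for SELECT id FROM roles: only one column is copied out.
def select_column(table, column)
  table.map { |row| row[column] }
end

# Rough count of the values that cross the database boundary.
def values_transferred(result)
  result.sum { |item| item.is_a?(Hash) ? item.size : 1 }
end

# Same ids either way, analogous to Role.all.map{|r| r.id} and Role.all.pluck(:id).
ids_via_map   = select_all(ROLES).map { |row| row[:id] }
ids_via_pluck = select_column(ROLES, :id)

puts "Same result: #{ids_via_map == ids_via_pluck}"
puts "SELECT *  transfers #{values_transferred(select_all(ROLES))} values"
puts "SELECT id transfers #{values_transferred(select_column(ROLES, :id))} values"
```

Both approaches produce the same list of ids, but with four columns per row the first moves four times as many values, and the ratio grows with every attribute added to the table.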

  • Unnecessary Computation

Due to abstraction, developers often are unaware of their code issuing multiple queries unnecessarily. For instance:

values.each do |value|
  read_only_attribute_names(user).include?(value)
end

The code above issues one query per iteration to check if a user includes a value in their attributes. However, as user is unmodified, the entire loop can be replaced with a single query. This pattern is hard to spot, especially when the query is not issued directly in the loop but in functions that are called from the loop.
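One way to remove the repeated queries is to hoist the lookup out of the loop. The sketch below is plain Ruby under stated assumptions: this read_only_attribute_names is a hypothetical stand-in that logs one fake SQL string per call, and the table name, attribute names, and user hash are made up for illustration. It uses map instead of each only so the two versions can be compared.

```ruby
require "set"

# Hypothetical query log standing in for the database.
QUERY_LOG = []

# Hypothetical helper: pretend that each call issues one SQL query.
def read_only_attribute_names(user)
  QUERY_LOG << "SELECT name FROM read_only_attributes WHERE user_id = #{user[:id]}"
  %w[email login created_at]
end

# Original shape: the lookup runs once per iteration.
def check_each_time(user, values)
  values.map { |value| read_only_attribute_names(user).include?(value) }
end

# Hoisted shape: user is unmodified in the loop, so look up once and reuse.
def check_hoisted(user, values)
  names = read_only_attribute_names(user).to_set
  values.map { |value| names.include?(value) }
end

user = { id: 42 }
values = %w[email nickname login]

QUERY_LOG.clear
slow = check_each_time(user, values)
puts "Per-iteration: #{QUERY_LOG.size} queries"

QUERY_LOG.clear
fast = check_hoisted(user, values)
puts "Hoisted:       #{QUERY_LOG.size} queries"
puts "Same answers:  #{slow == fast}"
```

The hoisted version gives the same answers with one query instead of one per value. It is only safe when nothing in the loop changes the user or the underlying data.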

  • Design Tradeoffs

Often, a simple change to a webpage's design or layout can drastically impact performance. For instance, pagination is a common trick to accelerate webpage loading, i.e., rendering a long list of contents over multiple pages. However, we found that “rendering the entire laundry list” is still a common practice in the applications we studied, and using pagination can reduce loading time by over 90%.
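The idea behind pagination can be sketched in a few lines of plain Ruby. The PER_PAGE value and the paginate and total_pages helpers are hypothetical; they mimic what a LIMIT/OFFSET query does. Rails apps typically get the same effect with limit and offset on a query, or with a pagination gem such as will_paginate or kaminari.

```ruby
# Hypothetical page size for illustration.
PER_PAGE = 25

# Returns only the slice of records for the given 1-based page,
# mimicking LIMIT per_page OFFSET (page - 1) * per_page.
def paginate(records, page, per_page = PER_PAGE)
  offset = (page - 1) * per_page
  records[offset, per_page] || []  # Array#[] returns nil past the end
end

# Number of pages needed to show every record.
def total_pages(records, per_page = PER_PAGE)
  (records.size.to_f / per_page).ceil
end

items = (1..1_000).map { |i| "Item #{i}" }

first_page = paginate(items, 1)
puts "Rendering #{first_page.size} of #{items.size} items on page 1"
puts "Total pages: #{total_pages(items)}"
```

Each request then renders 25 rows instead of 1,000, so the work per page stays roughly constant as the list grows.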

Other tradeoffs we found include expensive but trivial content displayed on webpages, which turns out to have performance implications. For example, one social network application used to show a daily summary on a webpage. Developers later found this feature to be too slow as the number of daily emails increased, and other summaries on the same page already provided similar information, so they removed it. Similar design decisions exist across different applications.

You can find out about the other patterns we have identified in our ICSE paper.

What’s Next?

Now that we have manually identified these anti-patterns, we have built a new tool called PowerStation that can detect them in Rails applications automatically. PowerStation is part of our Hyperloop project for studying web apps and techniques for alleviating performance issues.

PowerStation is a static analyzer that finds potentially problematic code in your application. Many of the above anti-patterns can be identified statically, and our tool will point them out in the code. It can also automatically fix some of the issues mentioned above. PowerStation has been integrated into RubyMine, a handy IDE for Rails developers. To use it, all you need to do is load your app code and click a button. Our project page also includes a demo video of its use.

Using PowerStation

Here’s a sample usage of PowerStation in fixing an API-misuse anti-pattern, as we mentioned in the previous example. PowerStation analyzes the code and highlights all inefficient API calls as shown below:

Figure 2: PowerStation highlights the code that can cause performance issues

Then, when the user clicks the fix button, our tool suggests replacing the highlighted code with a better API:

Figure 3: PowerStation fixes the anti-pattern by replacing the inefficient API call with one that only returns one of the attributes from the objects.

This example actually comes from an application called ror-ecommerce, and PowerStation’s fix can result in a speedup of 8x for page loading (for 10K records in the roles table). We have more usage examples listed on our website.

Learning more

If you are worried about your Rails app running slowly, definitely try out PowerStation! We are still in the process of adding more features. If you find any common performance issues, feel free to contact us or submit feature requests on our GitHub page.