Slimming Down MindTouch’s Page Response Time

James Andrew Vaughn
Oct 30, 2014

Yesterday, MindTouch’s cloud infrastructure handled 6,048,192 HelpRequests across all MindTouch sites, each request creating knowledge or providing it to your customers. MindTouch has a continuing commitment to speed and performance, and delivering all this knowledge presents significant engineering challenges that other customer success software, knowledgebases, and content management systems do not face. MindTouch pages are much more than static HTML and text: they are also rich with DekiScript code, identity and context-specific content, templated content, global variables, and transcluded pages. Each page that is served to your customer is run through our content engine, which parses, compiles, and renders the finished MindTouch page, all in real time! The more dynamic content a page contains, the more work and load it places on the content engine, and more work and load mean longer page response times.

Back in July, MindTouch Engineering and MindTouch DevOps rolled out two big changes that greatly decreased the median server-side page load times, resulting in faster page loads for customers. The following chart was generated from actual production data for a particularly heavy page, one requiring considerable DekiScript and content parsing, on a very highly trafficked MindTouch site. Across the X-axis are the days of July 2014. The median and 95th percentile server-side page load times are on the Y-axis. The median page response time, also known as the 50th percentile, gives us the typical response time of the day: half of the page responses for that day were faster and half were slower. The 95th percentile gives us the upper bound for nearly all response times in a day, with the slowest 5 percent removed because they usually represent uncommon spikes or other outliers. Let’s see if you can spot when the optimizations were released:
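To make the two statistics concrete, here is a minimal sketch of how a median and a 95th percentile could be computed from one day of response times. It uses the nearest-rank method, which is an assumption on our part (the chart's tooling may interpolate differently), and the sample values are made up for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it. pct is an integer 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 19 ordinary requests
# and a single 5-second spike.
response_times_ms = [100] * 19 + [5000]

median_ms = percentile(response_times_ms, 50)
p95_ms = percentile(response_times_ms, 95)
```

With these made-up samples both values come out at 100: the lone spike sits in the slowest 5 percent, so it does not move the 95th percentile.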

The first drop in load time (from 727ms to 476ms) was on the 11th of July, and was the result of work by MindTouch DevOps and our friends at Amazon Web Services. On that release day, we switched our application servers to Amazon’s newer Compute Optimized c3.2xlarge EC2 instances. These EC2 instances provide each application server with 8 vCPUs and 15 GB of memory, giving considerably more iron to our content engine, and dropping the median processing time by about 250ms! In the world of high-performance web applications, shaving 250ms off every page load is incredible.

On the 22nd of July, all MindTouch sites received Anonymous Page Caching, in this example dropping the median response time to 87ms. Under ideal circumstances, Anonymous Page Caching allows us to bypass the content engine completely. If the customer is unauthenticated (they haven’t signed into the MindTouch site), and the page they are visiting has already been visited within the last hour, the page is fetched from a cache of data that the content engine has already rendered, requiring no page content processing. Cached page data is stored for one hour before the page is re-processed and re-rendered by the content engine. We can use the MindTouch Developer Google Chrome Extension to observe what is happening under the hood.
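The rules above can be sketched in a few lines. This is only an illustration of the idea, not MindTouch's implementation: the class name, the `render` callback standing in for the content engine, and the in-memory dictionary are all hypothetical. The one-hour lifetime and the bypass for signed-in users come from the description above.

```python
import time

CACHE_TTL_SECONDS = 60 * 60  # cached pages live for one hour


class AnonymousPageCache:
    """Illustrative cache of rendered pages for unauthenticated visitors."""

    def __init__(self, render, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._render = render    # hypothetical stand-in for the content engine
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # page_id -> (expires_at, rendered_html)

    def get_page(self, page_id, authenticated):
        # Signed-in users always get a freshly rendered page.
        if authenticated:
            return self._render(page_id)

        now = self._clock()
        entry = self._entries.get(page_id)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: the content engine is skipped

        # Cache miss or expired entry: render once and keep the result.
        html = self._render(page_id)
        self._entries[page_id] = (now + self._ttl, html)
        return html
```

A first anonymous call to `get_page("home", authenticated=False)` renders and stores the page; a second call within the hour returns the stored copy without calling `render` again.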

During this first request, the visited page content is not available in the cache, so it must be fetched from the MindTouch API. In addition, the current site configuration is looked up in the cache, and it too needs to be fetched from the API. The section in purple shows the cache misses, and the section in blue shows the subsequent API requests made to fetch (and cache) the page content and site configuration. If you look closely, you’ll see one of the columns in the API request table is called cache. This measures the cache hit ratio for another cache even deeper in the system, between the content engine and the database, which alleviates pressure on our databases. The total server-side response time, shown in the blue section, is 369.42ms for this request.

Visiting the page again is a considerably different experience. This time both the page content and the site configuration are in the cache, requiring no communication with the API. The total response time of 57.81ms represents the time required to fetch data from the cache, apply a skin, stylesheets, and other presentation steps. Your site visitors will get pages served quicker, their questions answered faster, turning them into promoters of your brand! For improved search engine optimization (SEO), a MindTouch site’s time to first byte (TTFB) measurement should be 500ms or less for anonymous site visitors. Assuming a reasonable DNS resolution and compressed page download speed, ~58ms adds a trivial amount of time to the TTFB measurement — great for SEO!

So far we’ve seen the positive effect on a single page on a single site. While this is certainly a consistent way to show improvement, it doesn’t give us the whole story. So let’s look at a collection of sites and all their page views before the infrastructure and application performance optimizations, afterwards, and today! You can thank our friends at Splunk for the pretty graphs.

Here are 5,716,351 page load events across a collection of sites during the week of June 20th to 28th, 2014. Each event is grouped by the time it took us to load the page. These graphs also take into account authenticated requests, such as those for pro members and community members (who do not receive cached pages). It looks like the majority of pages were loaded between 256ms and 512ms.
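Grouping events by load time like this amounts to building a histogram. The sketch below assumes buckets that double in width (64ms to 128ms, 128ms to 256ms, and so on), which matches the ranges quoted in this post but is a guess at how the graphs were actually configured. The function names are ours.

```python
from collections import Counter


def bucket_label(ms):
    """Return the power-of-two bucket [lower, 2 * lower) that holds ms."""
    if ms < 1:
        return "0-1ms"
    # Largest power of two that is <= the whole-millisecond part.
    lower = 1 << (int(ms).bit_length() - 1)
    return f"{lower}-{lower * 2}ms"


def histogram(load_times_ms):
    """Count page load events per response-time bucket."""
    return Counter(bucket_label(ms) for ms in load_times_ms)


# The two response times observed earlier in this post, plus the two
# median values from the chart discussion.
counts = histogram([369.42, 57.81, 476, 87])
```

Here the 369.42ms and 476ms loads both land in the 256-512ms bucket, the 87ms load lands in 64-128ms, and the 57.81ms load lands in 32-64ms.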

Here is the week after our performance optimizations, July 11th to 19th, 2014. We have 7,380,852 page load events from the same collection of sites. It appears the majority of pages are now loaded somewhere between 128ms and 256ms! Notice the increase in response times between 64ms and 128ms: these are fully cached pages.

How are we doing today? Between October 3rd and 11th, 2014, these sites served 10,108,118 pages. Here are the response times. Wow, that’s a lot more fully cached pages!

What’s next? We’d love to extend this caching optimization to ALL site users, including community members, authors, and editors. Due to the sophisticated relationship between MindTouch page content and MindTouch access controls, it will be a complex engineering problem to solve, but we have the best minds at MindTouch working on it!

Originally published at http://mindtouch.com on October 30, 2014.
