Back in 2006 I was working on the Gmail team and we were undertaking a complete rewrite of the frontend code. Gmail’s original web client had strokes of genius in it but was getting really hard to maintain and was limiting new feature development.

The new version had a bunch of enhancements: a robust request service that used both XHRs and iframes (for incremental responses); an event-driven store that abstracted the UI’s data access; templates that compiled to JS functions; and many other techniques that are now considered best practice but at the time were in their infancy.

Things were going great, but a few months into development we started to notice that the app was performing really badly in Internet Explorer 6. Given that one of the primary goals of the new client was speed, and that a large percentage of our users were on IE, this was a bit of a blow.

This was before the advent of modern web development tools. The first version of Firebug had come out a few months earlier and for IE you could use the Visual Studio debugger, but they were really limited and there were no good profiling tools. So when we started noticing slowness we resorted to liberally annotating the code with tracers that logged to our own debug window—no console in IE remember—and started manually searching for hotspots.

It started to look like a lot of the time was being spent doing string manipulation: escaping text and concatenating templates. Homing in on a particular hotspot, we saw 200ms spent doing a simple string concatenation. Very strange.

It became clear that the slowness was inside native code, not our JS, but why was IE taking so long over seemingly simple operations? It was at this point that Jon Perlow—Gmail Frontend’s tech lead for many years—broke out windbg. After some digging he found that the pauses we were experiencing corresponded with JScript garbage collections…

Doing some research, we stumbled across Eric Lippert’s 2003 blog post explaining how the JScript garbage collector worked. The key piece of information was from a comment added in 2005:

The heuristics are we do a GC on the next statement after any one of the following limits are passed since the previous GC: 0x100 variables/temps/etc allocated; 0x1000 array slots allocated; 0x10000 bytes of strings allocated

In other words, the GC would run every 256 allocations. Because the runtime of the garbage collection routine scaled linearly with the size of the working set, the overall effect was quadratic for complex operations that allocated lots of objects. The problem was further compounded by our large code base and all the data we were now storing on the client.
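To make the quadratic claim concrete, here is a toy model of that heuristic. It is only a sketch, not how JScript was actually implemented: it assumes a collection fires on every 256th allocation, that each collection visits every live object once, and that nothing is ever freed. The function name and the "mark work" unit are made up for illustration.

```javascript
// Toy model of the heuristic quoted above, NOT JScript's real implementation.
// Assumption: a collection fires on every 256th allocation (0x100).
const GC_ALLOCATION_LIMIT = 0x100;

// Allocates `newObjects` objects on top of `liveObjects` that are already
// reachable, and returns how many collections ran plus the total number of
// objects the collector had to visit ("mark work").
// Assumptions: nothing is ever freed, and each collection visits every
// live object exactly once.
function simulateMarkWork(liveObjects, newObjects) {
  let live = liveObjects;
  let sinceLastGc = 0;
  let collections = 0;
  let markWork = 0;

  for (let i = 0; i < newObjects; i++) {
    live++;
    sinceLastGc++;
    if (sinceLastGc >= GC_ALLOCATION_LIMIT) {
      markWork += live; // cost of one collection grows with the working set
      collections++;
      sinceLastGc = 0;
    }
  }
  return { collections, markWork };
}
```

In this model, doubling the number of allocations from 1024 to 2048 raises the mark work from 2560 to 9216, and the ratio approaches 4x as the counts grow. Starting with a large working set makes every one of those collections expensive, which is the compounding effect described above.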

Using windbg, Jon identified the JScript call where objects were being marked, allowing us to figure out exactly what constituted an allocation. Creating objects obviously counted as an allocation, but the nasty surprise was that each string literal counted as an allocation. So x += 'foo' was two allocations, one for 'foo' and one for the new string assigned to x. This is why best practice dictated for so long that you should use arrays to build strings instead of string concatenation: the string builder pattern triggered fewer GCs and was therefore faster.
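For readers who never had to write it, the two styles look roughly like this. The function names and the list-item markup are illustrative, not taken from Gmail's code. Under the allocation model described above, the literals still count in both versions, but the array version avoids creating a new intermediate string on every step.

```javascript
// Naive concatenation: each += produces a new intermediate string.
function buildWithConcat(items) {
  let html = '';
  for (let i = 0; i < items.length; i++) {
    html += '<li>';
    html += items[i];
    html += '</li>';
  }
  return html;
}

// String builder pattern: collect the pieces in an array and join once,
// so only the final join produces a new string.
function buildWithJoin(items) {
  const parts = [];
  for (let i = 0; i < items.length; i++) {
    parts.push('<li>', items[i], '</li>');
  }
  return parts.join('');
}
```

Both functions return the same string; the difference was only in how much garbage they produced along the way. Modern engines optimize string concatenation heavily, so this advice is of historical interest rather than a current recommendation.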

From the comments on Lippert’s blog, it was obvious that Microsoft was aware of the issue. They had even released a JScript patch that tweaked the GC heuristics, alleviating the problem. But it would be a long time before the patch was made available as a critical update and it was infeasible to expect Gmail’s millions of IE users to install the update. We needed to work around the problem.

What came next was one of the most brilliant hacks I’ve witnessed. Using the Detours library from Microsoft Research, Jon wrote an ActiveX control that instrumented JScript, allowing us to observe allocations and garbage collections from JavaScript. We then tied this into our tracer library and were able to see how many objects specific function calls allocated and how many GCs occurred during critical paths.
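The tracer integration might have looked something like the sketch below. Everything in it is hypothetical: the real control's interface isn't described here, so `probe` stands in for whatever it exposed, and is assumed to return running totals as `{ allocations, collections }`.

```javascript
// Hypothetical sketch: `probe` stands in for the instrumentation control.
// It is assumed to return running totals: { allocations, collections }.
function createTracer(probe) {
  // Runs `fn` and reports how many allocations and collections
  // happened while it was executing.
  return function trace(label, fn) {
    const before = probe();
    const result = fn();
    const after = probe();
    return {
      label: label,
      result: result,
      allocations: after.allocations - before.allocations,
      collections: after.collections - before.collections
    };
  };
}
```

Wrapping a critical path in `trace('render thread list', renderFn)` would then show, per call, how much allocation pressure that path generated.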

With this information, we could more easily update our core libraries to avoid object allocations through smarter string concatenation and object pooling. We also wrote passes for the Closure Compiler to help reduce the size of the working set and to reduce allocations.
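Object pooling, for anyone unfamiliar, means reusing objects instead of allocating fresh ones. A minimal sketch follows; the `ObjectPool` name and its `create`/`reset` callbacks are illustrative and not the API of our libraries, and it is written in modern syntax for brevity rather than what ran in IE6.

```javascript
// Minimal object pool: reuse released objects instead of allocating new ones.
// `create` makes a fresh object; `reset` clears one before it is reused.
class ObjectPool {
  constructor(create, reset) {
    this.create = create;
    this.reset = reset;
    this.free = [];
  }

  // Hands out a previously released object if one is available,
  // otherwise falls back to allocating a new one.
  acquire() {
    return this.free.length > 0 ? this.free.pop() : this.create();
  }

  // Clears the object and returns it to the pool for later reuse.
  release(obj) {
    this.reset(obj);
    this.free.push(obj);
  }
}
```

The trade-off is that callers must remember to release objects and must not hold on to them afterwards, which is part of the added complexity mentioned in the epilogue.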

Even though we made a lot of progress, we didn’t initially launch the renovated web client to IE6 users. It wasn’t until mid-2008, after Microsoft had made the JScript patch available as part of Windows Update, that we opened up access.

The optimizations weren’t completely in vain, though: they still provided performance improvements in patched IE6 and IE7.

Epilogue

Perhaps ironically, though, the changes had negative side effects elsewhere. The changes we made to the libraries caused an increase in code size and complexity that ended up hurting further down the road, when we no longer cared about IE6. And some of the compiler changes ended up hurting GZIP compression and have since been backed out.