TL;DR: IE11 leaks RAM and contains a hack that transparently creates a new process when it gets to around 1.7GB. SPAs do not give IE the chance to apply the hack, which requires the app to reload or navigate. Programmatically reloading the SPA at regular intervals gives IE the opportunity to apply its hack and recycle the process.
I have been working on a large Ember.js Single Page Application (SPA) for the last 2 and a half years. I would guess that about 10 man years of effort has gone into the HTML5 browser app, not counting the backend and other project activities. This makes it the largest SPA that I have worked on and one of the largest that I have seen.
The primary reason for choosing to write an SPA is that the application is not your typical internet application and has to work offline for weeks at a time, collecting and storing data in the browser ready to sync back to the servers when a connection can be made.
The previous version of this application was a traditional desktop application with a local relational database that would sync with a mainframe backend.
My background is in writing web apps with frameworks like Ember. When we examined the requirements for this project it seemed that all the necessary building blocks were already available in HTML5 & Ember: local databases, offline support, data synchronisation (via Ember Data), etc. Plus you get the simplicity of deployment that comes with all web applications.
The only issue was that the organisation was using Internet Explorer 9 as a standard which doesn’t support all the HTML5 features we needed, but we were assured that by the time the project shipped all computers would be upgraded to IE11, so we went ahead.
Using Google Chrome as a Development Platform
It doesn’t seem very long ago that IE was the browser of choice. It had more features and everything just worked. After you completed your testing in IE you’d spend the same amount of time again trying to get the same feature to work in Netscape. Then came Firefox, and that changed everything. All of a sudden standards mattered and IE didn’t follow standards. There followed an awkward few years when websites didn’t follow standards and only worked in IE, but after we got over that hump everything was better. Then came Chrome, with its 6-week update cycle and automatic updates. Developers no longer had to worry about which version of Chrome users had, because it was always the latest. Firefox later adopted the same approach.
Fast forward a few more years to the present and IE still has not caught up with the competition. Its release cycle is measured in years not weeks, and even then if your XP machine shipped with IE6, you might still be using it 10 years later! Fortunately there aren’t many XP machines still around, but there are a lot of Windows 7 computers running IE 9.
Generally, web developers no longer use IE as a development tool. It’s slow and the inbuilt debugging tools are awkward to use and have a habit of locking or even crashing the browser.
Chrome, on the other hand, is fast and has excellent development and debugging tools included. Whilst it is always the responsibility of the developer to ensure that their features work in IE, we have a dedicated QA team checking that everything works as expected in IE. What could possibly go wrong?
Finding Memory Leaks
Somewhere near the end of the initial development phase it became apparent that we had a memory problem. This was exacerbated by our UI design. Instead of displaying each section of a large document on individual screens (as was done in the previous application that we were replacing), it was decided to render all sections on one screen and use anchors to quickly jump between them. Whilst rendering some larger documents IE would allocate 300MB or more just for the necessary DOM elements. Even worse, we noticed that IE would not free up all of that memory when navigating away. When IE starts to approach 2GB RAM it runs out of memory.
If you are lucky you can catch the exception and ask the user to restart IE. If you are unlucky the browser exits without warning.
We immediately assumed that this was a problem with IE, since we didn’t experience the same issue in Chrome. When we tried to run the memory profiler in IE the browser crashed. The amount of RAM needed by IE in order to profile the RAM used puts the process too close to the 2GB limit and it dies, every single time.
However, after spending some time in Chrome profiler we began to think that it was our problem. Since Chrome was using much less RAM to render the same documents the problem just didn’t become evident until much later.
So the bug hunt began.
Fixing by Trial and Error
We started breaking down our components and testing them in isolation, paying particular attention to event subscriptions and DOM references, and ensuring that each component had a working teardown operation that releases all DOM references. A surprising number of problems were found, especially since many 3rd party UI components do not come with working teardown functions (even some Bootstrap components). The assumption is that resources will get recycled on the next page load, which for an SPA is not necessarily the case.
Side Note: Ember has two component teardown methods, willDestroyElement and willClearRender. These can be a little confusing and we were not always using them correctly. Be sure to read the documentation for your framework of choice closely.
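To make the teardown idea concrete, here is a minimal TypeScript sketch, not our actual code. The names Listenable and ScrollTracker are hypothetical and invented for illustration; in Ember something like teardown() would be called from the component's teardown hook.

```typescript
// Hypothetical minimal shape for anything we subscribe to, modelled on the
// DOM's addEventListener/removeEventListener pair.
interface Listenable {
  addEventListener(type: string, handler: () => void): void;
  removeEventListener(type: string, handler: () => void): void;
}

// Illustrative component: it subscribes on creation and must release
// both the subscription and its reference to the target on teardown.
class ScrollTracker {
  private target: Listenable | null;
  public scrollCount = 0;

  // Keep one stable function reference so the same handler that was
  // added can be removed later.
  private readonly onScroll = (): void => {
    this.scrollCount += 1;
  };

  constructor(target: Listenable) {
    this.target = target;
    target.addEventListener("scroll", this.onScroll);
  }

  // Safe to call more than once.
  teardown(): void {
    if (this.target === null) {
      return;
    }
    this.target.removeEventListener("scroll", this.onScroll);
    // Drop the reference so the target can be garbage collected.
    this.target = null;
  }
}
```

The important details are that the handler is stored once, so the removal matches the subscription, and that the reference to the target is cleared rather than left for the next page load to clean up.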
The Chrome profiler was extremely useful in helping to find memory leaks in our app and 3rd party components. It is especially useful if you take 2 or more heap snapshots, use the compare tool and pay particular attention to any “Detached DOM Tree” items found.
Eventually we thought that we’d fixed all leaks and started another round of system testing. Things looked good in Chrome. Rendering, clearing and re-rendering the same document over and over produced a nice flat line. No unexpected detached DOM tree elements. We could leave the app running for days without issue. However, the problem was still not fixed in IE.
Observing the Black Box
Internet Explorer is a Black Box. Microsoft is one of the last browser vendors that ships a web browser based on closed source technologies. Other vendors such as Mozilla, Google, Opera and Apple make use of open source code for their browsers. If there is a problem when using a product based on open source technology you can take a look at the internals and, if you are motivated enough, try to fix the problem yourself. More importantly, someone else (who’s probably cleverer than me) has probably already troubleshot and fixed the problem and I just need to wait for the next release, or jump onto the canary/dev channel until the fix is released to stable. If a fix is not already in the pipeline, then I can submit a detailed bug report and someone will probably pick it up.
With a closed source product, you have no code to look at so you are left to guess what is going on internally by observing the external behaviours.
My first assumption was that this can’t be a memory leak in IE. If it were then there would be a lot of users complaining about it. However, during our testing we made some interesting observations and drew some conclusions:
- Other major websites had the same problems in IE (it wasn’t just our app).
- An IE process functions normally until about 1.6GB RAM.
- Somewhere between 1.6GB and 1.8GB the browser will recycle the process, which is almost transparent to the user, dropping RAM back down to 200MB.
- The process can only be transparently recycled when the user navigates from one page to another or reloads the page (which may never happen in an SPA).
- With an SPA the browser will throw an out of memory exception somewhere between 1.7GB and 1.8GB.
- If the web app causes IE to consume RAM in a single operation so that it starts at < 1.6GB and completes at > 1.8GB the process will terminate without warning.
Since the initial memory leak bug hunt began, we had brought the increase in RAM consumed per render down from 300MB to less than 100MB. This means that we shouldn’t hit the worst case scenario where IE terminates without warning, since a single render can no longer take the process from below 1.6GB to above 1.8GB in one operation. We can catch the out of memory exception, but this exception is only occurring because our SPA is not giving IE the opportunity to apply its hack and transparently recycle the process.
To allow IE to apply its hack, we simply need to get the app to “reload” itself at key points. IE then detects that it’s getting close to the memory limit and transparently recycles the process.
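As a rough illustration of reloading at key points, here is a minimal TypeScript sketch rather than the code we shipped. ReloadScheduler, noteRender and reloadIfDue are hypothetical names, and the render budget is an assumed tuning value, not a figure from our app. The reload action is passed in so that the logic can be exercised outside a browser.

```typescript
// Hypothetical helper: counts heavy renders and requests a reload once an
// assumed budget has been used up.
class ReloadScheduler {
  private rendersSinceReload = 0;

  constructor(
    // Assumed tuning value: how many large renders to allow between reloads.
    private readonly maxRenders: number,
    // Injected reload action, so the browser is not needed to run this code.
    private readonly reload: () => void,
  ) {}

  // Call each time a large document has been rendered.
  noteRender(): void {
    this.rendersSinceReload += 1;
  }

  // Call at a key point where reloading is acceptable.
  // Returns true if a reload was requested.
  reloadIfDue(): boolean {
    if (this.rendersSinceReload < this.maxRenders) {
      return false;
    }
    this.rendersSinceReload = 0;
    this.reload();
    return true;
  }
}
```

In the browser the reload callback would typically be `() => window.location.reload()`, and reloadIfDue() would only be called at points where the user's data has already been stored locally. Counting renders is a stand-in for memory use here, on the assumption that page script cannot reliably read the IE process size.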
The user experience is a little disappointing. It was always a little slow in IE, and now it’s a little slower when the app reloads. Not as bad as you might think, since as an offline app all resources are already local. But at least the users can now work without fear of data loss.
Most SPAs will probably never experience this IE memory problem. I can’t imagine working on another app with views of the same size again, and if I did I would look into how to break up the views into smaller parts (which we are already implementing for the next version of our app).
I’m still not 100% convinced that we did everything we could to help IE GC more RAM, but without working dev tools it’s hard to tell. I’m sure I will revisit the issue at some point.
Enterprise customers still insist on using IE. That is always hard for developers to swallow, but it is an inconvenient truth. Maybe Microsoft will fix the problems with Microsoft Edge, which is coming with Windows 10, but the fact that it will be tied to Windows 10 means that it probably won’t be keeping up with the latest web standards and Enterprise users probably won’t get it for many years yet. If you’re developing apps for enterprise customers, IE11 is here to stay, so remember to test early using representative data.