What’s in a web browser
Part I — Hardware Acceleration
This article describes the old Android Browser graphics architecture, superseded in 2012 by Chrome on Android (in JellyBean, Android 4.2). While it’s a few years old, its graphics architecture was ultimately pretty advanced, and took some interesting detours. If you ever wondered how the graphics pipeline of a modern mobile web browser is implemented under the hood, or how software architectures evolve, this article might give you some idea of the challenges involved. Part I mostly sets the stage and describes the move to hardware acceleration; Part II will expand on some of the advanced features.
Once upon a time, Google had two web browsers: Google Chrome, of course, but also the Android Browser, which was a completely separate codebase and team. Confusingly, both Chrome and the Android Browser were based on WebKit, the HTML rendering engine that Apple used for Safari, itself originally based on KHTML, an earlier engine that came from the KDE project.
The reasons Google had two browsers were both historical and technical — Chrome was focusing on desktop usage and web apps, had a complex security model and a multi-process architecture, while the Android Browser had to work well on mobile, a much less powerful platform, and was pretty much just using WebKit as intended. Priorities differed.
From Android 1.0 to Android 4.2 (2012) the Android web browser was thus simply an Android application built directly around the platform’s webview widget. In fact, the webview implemented (it had to!) a full-blown HTML rendering engine that any application could embed, with the browser being its first client.
As devices got more powerful, and with Chrome starting to focus on mobile, the old Android Browser was ultimately replaced by Chrome-on-Android in 2012, while the platform’s webview ended up being replaced by a brand new version (updatable to boot!) based on Chrome’s codebase around 2013, with Android 4.4. Sharing the same internal codebase makes it a lot easier to stay up to date with WebKit and to support new features across platforms, a welcome improvement for our users and developers.
While Chrome’s codebase is a marked improvement over the old webview, there were quite a few interesting ideas in the old webview’s graphics architecture. I gave a presentation about some of this at Google I/O in 2012, although it was targeted more towards Android developers. This article, on the other hand, has the advantage over the video of that talk that you will not have to suffer through my accented English. I will describe the evolution of the webview rendering architecture as well as some of the reasons behind it.
The Old World
A typical Rendering loop
Before continuing, let’s review how things “should” work in a typical browser. The basics are relatively simple:
- Receive a request to paint an area of the webview (say, after scrolling uncovers a non-painted area, or because an animated GIF needs a new frame)
- Go to webkit, which will traverse the html tree to figure out which elements need to be painted, then use them to paint…
- Update the screen with the newly painted content
Of course, things get a little bit more involved in reality. In fact, the Android webview was doing things quite differently. Remember, we were running on hardware that wasn’t that powerful, and beyond memory usage, battery life was also a critical resource.
The Android Way
The browser (note that I will freely use the terms browser and webview interchangeably in the rest of this article) needed something that would provide a much faster repaint loop than going back to webkit and traversing the document all the time — it was important in order to provide a fast interaction experience to the user (scrolling, zooming…), and the normal repaint loop was not going to cut it. Webkit being written in C++ and the browser’s UI in Java, the constant JNI calls from java to native would not have helped much either.
So, instead of directly asking WebKit to paint the screen, the team had a better idea. They decided to modify WebKit to record the list of painting instructions for the full page without painting directly, basically using a “display list” model. The painting commands were, for example, “draw this text here” or “draw this image there”.
When the user would need to zoom in for example, it was not necessary to ask Webkit to repaint the screen — the webview instead could just replay the already present display list, only changing the zoom factor. This also gave a rather nice side-effect that the screen was always sharp — no text blurriness when zooming in, no missing content, as the content was always being redrawn “the right way”.
You could think of this display list as a way to have a vector representation of the entire page, that you could scale on demand and redraw — the difference between illustrator and photoshop.
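To make the record-then-replay idea concrete, here is a minimal sketch. It is not the actual webview code (which recorded Skia calls into an SkPicture); the `DisplayList` class, its method names, and the tuple command format are all hypothetical, chosen only to show that zooming needs a replay at a new scale rather than a new recording.

```python
from dataclasses import dataclass, field


@dataclass
class DisplayList:
    """Hypothetical recorder: stores drawing commands in page coordinates."""
    commands: list = field(default_factory=list)

    def draw_text(self, text, x, y, size):
        self.commands.append(("text", text, x, y, size))

    def draw_image(self, name, x, y, width, height):
        self.commands.append(("image", name, x, y, width, height))

    def replay(self, zoom=1.0):
        """Return device-space commands: geometry is scaled, content is untouched."""
        out = []
        for op, content, *geometry in self.commands:
            out.append((op, content, *(value * zoom for value in geometry)))
        return out


# Recording happens once; this is the expensive step that involves the engine.
page = DisplayList()
page.draw_text("Hello", 10, 20, 12)
page.draw_image("logo.png", 0, 40, 100, 50)

# Zooming only replays the existing recording with another scale factor.
zoomed = page.replay(zoom=2.0)
```

Because the text is re-emitted at the new size instead of being stretched as pixels, the result stays sharp at any zoom level, which is the vector-versus-bitmap point made above.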
The Display List
The repaint loop performance was thus limited only by the time taken to paint the display list on screen, something which was reasonably fast as screen resolutions were relatively low, and the painting library used in the webview happened to be the one used for the entire android platform, Skia, and quite optimized. In fact, the display list itself was simply a recording of the native skia graphics instructions that webkit would have used had it been painting the screen directly, and stored in a SkPicture data structure.
Another particularly nice side-effect of working with such a vector representation of the page was that the display list contained a bunch of useful information that could be exploited by the UI. For example, it was used to implement:
- keypad navigation
- find text on page
- text selection
- tapping on an address on the page to be redirected to Google Maps, etc.
The display list generated was usually able to contain the entire page to draw — so if the content was not changing, everything could be completely handled on the UI side, without having to go back to Webkit.
In the (rather common) case where the web page wanted to update, and parts of the page would need to be redrawn, we needed to go back to webkit to regenerate the display list — a costly operation to do for the entire page.
To avoid regenerating the entire display list, we only regenerated (“repainted”) the parts of the display list corresponding to the updated area. Thus, a display list typically contained one large SkPicture followed by smaller ones with their associated bounds. In the example above, the area marked 1 changed, so we had to go back to WebKit, regenerating only a SkPicture covering area 1. SkPicture 2 and any following SkPictures would be added to the overall PictureSet and used instead of the old content of area 1 when painting the screen.
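The partial-update scheme can be sketched roughly as follows. This is an illustration under assumptions, not the real PictureSet: the `Picture` and `PictureSet` classes here are hypothetical stand-ins, with a text label in place of recorded Skia commands, and the merging or pruning a real implementation would need is left out.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    bounds: tuple  # (left, top, right, bottom) in page coordinates
    label: str     # stands in for the recorded drawing commands


def intersects(a, b):
    """True when two (left, top, right, bottom) rectangles overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


class PictureSet:
    """Hypothetical container: one full-page picture plus later, smaller patches."""

    def __init__(self, full_page):
        self.pictures = [full_page]

    def invalidate(self, patch):
        # Only the changed area was re-recorded; append it instead of
        # regenerating the full-page picture.
        self.pictures.append(patch)

    def pictures_to_draw(self, clip):
        # Drawn in recording order, so newer patches paint over stale content.
        return [p.label for p in self.pictures if intersects(p.bounds, clip)]


page = PictureSet(Picture((0, 0, 800, 4000), "base"))
page.invalidate(Picture((100, 200, 300, 260), "patch-1"))
```

With this setup, painting the top of the page draws the base picture and then the patch, while painting a region further down only touches the base picture.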
The architecture of the webview stayed roughly as described from Android 1.0 up to Gingerbread (Android 2.3). Then, we started to work on the Honeycomb release, which ended up targeting tablet devices (all our previous releases only targeted phones).
Things didn’t look good.
Not good, at all.
The tablet’s resolution was a lot higher than that of our previous phones. The performance of the repaint loop was dreadful: single-digit frames per second. Something needed to change. Fast.
One of the things that I was working on at the time was adding support for CSS 3D animations.
This was a recently introduced feature in webkit that allowed you to specify a 3D transform and apply it to a <div> element in your HTML page. Presumably, you could use it to build much more fancy websites, with animations and elements flying around you left and right.
It looked cool.
It also meant that, to make it work, I had to move the painting of the elements onto separate surfaces so that I could apply a 3D transform.
This had a few interesting consequences. First, if you need to change the position of content that has been placed onto a layer, you do not need to update the base layer, i.e., there is no need to go back to WebKit to regenerate the display list for the page. Second, CSS animations, being declarative, were pretty easy to move fully to the UI side (i.e. evaluating them in the UI-side render loop), without needing to communicate with WebKit while the animation is playing. Finally, nothing prevented you from applying a 2D transform to receive the same benefits, which means that you could make web UIs that were much more responsive and smooth.
Adding support for composited layers as well as UI-side evaluated CSS animations resulted in a massive speedup in specific cases, even when rendering those layers in software, as we were saving continuous round-trips to WebKit (see the falling leaves demo for an example of an animation that benefited from the concept of layers, even in 2D). Google Maps on mobile was another candidate for this feature.
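As a rough sketch of what evaluating a declarative animation on the UI side means, the snippet below interpolates a layer position per frame without repainting any layer content. Everything in it is assumed for illustration: the `TranslateAnimation` class, the linear timing, and the `(name, x)` layer tuples are hypothetical, and real CSS animations also involve timing functions and full transform matrices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslateAnimation:
    """Hypothetical declarative animation: move along x over a fixed duration."""
    start_x: float
    end_x: float
    duration_ms: float

    def value_at(self, elapsed_ms):
        # Linear interpolation, clamped to the start and end of the animation.
        progress = min(max(elapsed_ms / self.duration_ms, 0.0), 1.0)
        return self.start_x + (self.end_x - self.start_x) * progress


def composite_frame(layers, animations, elapsed_ms):
    """Return where each layer is drawn this frame; no layer is repainted."""
    placed = []
    for name, base_x in layers:
        animation = animations.get(name)
        x = animation.value_at(elapsed_ms) if animation else base_x
        placed.append((name, x))
    return placed


layers = [("root", 0.0), ("leaf", 0.0)]
animations = {"leaf": TranslateAnimation(0.0, 100.0, 1000.0)}
frame = composite_frame(layers, animations, 250.0)
```

The point of the design is that the render loop only needs the animation description and a clock; the engine is not consulted again until the content of a layer actually changes.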
With the introduction of CSS3D, the architecture of the Android webview changed from a single display list to many display lists (one for the root layer, plus one per CSS3D layer).
I was beginning to idly think about moving to a fully hardware accelerated architecture — as the current architecture was starting to show some scalability issues.
The New World
The existing architecture, based on display lists, had some definite advantages, as explained previously. But things were also starting to look… worrying, on the performance front.
The biggest apparent problem was that, while the display list approach was faster than having webkit directly paint, and was saving us countless round-trips to webkit, the speed at which we could paint the screen was still entirely dependent on the complexity of the content we wanted to display. It makes sense — more content meant more drawing instructions, more stuff to paint. The round-trip to webkit was happening asynchronously, but the painting itself was still synchronous.
The result was that the speed of the user interactions, particularly zooming and scrolling, would change depending on the website that was displayed. On a given website everything would be smooth, on another… not so much.
When we started to work on Honeycomb and tried the browser on a tablet, the higher resolution made it instantly clear that the existing architecture would simply not scale. The time spent painting the screen was just too much.
Suddenly, moving the browser toward hardware acceleration by leveraging the CSS3D architecture was no longer an idle thought about something that would be nice, but something we had to do if we wanted to ship.
So, Hardware Acceleration it was.
Hardware Acceleration can mean many different things, from a web browser perspective. The idea is to take advantage of specialized hardware (i.e. GPU) to accelerate an operation, generally drawing in our case. There were broadly two things that we could do:
- Rasterize (paint) the entire display list directly on the graphics card
- Use a tile approach: segment your content into multiple textures organized in tiles
Option #1 was appealing on paper, but it also meant a lot more refactoring and development work, and that was time we did not necessarily have. Ensuring accurate rendering (similar to the software rendering we had) would also be tricky. It could also (at the time, on the hardware we had) result in performance issues similar to the ones we were trying to move away from, i.e. performance would still depend on the content, although the hope was that rasterizing the content via the GPU would be fast enough for this not to be a practical problem. Finally, we were not dealing with the browser in isolation here: we were also the framework’s webview, a widget that could be used by any application. The webview was going to have to share the GPU with the rest of the system. In fact, Romain Guy was busy at the time adding hardware acceleration to the Android UI framework, for similar reasons. It would be rather nice if the webview did not hog the entire GPU…
Option #2 was what we ended up picking. Not only was it simpler to integrate (we could reuse most of the existing graphics pipeline of the webview), it was also not going to be too taxing for the GPU. The main difference from the previous software rendering pipeline was that instead of directly rasterizing the display list on screen, we would have an intermediate set of tiles covering the screen that we would paint on. The GPU would then simply draw the tiles on screen.
More critically, by having those intermediate tiles, we could completely decouple the scrolling and zooming behavior of the browser from the painting performance, resulting in perfect 60 FPS scrolling and zooming. At least, that was the idea!
For a given viewport — a window onto our page content — we would generate a set of tiles covering it. We would then paint the tiles using the display list coming from Webkit. Rendering the tiles on screen happened on the GPU — each tile is a texture — which allowed us to draw them pretty much instantly on screen, and we could then move the tiles at any position really quickly.
As we could move the tiles instantly, we could move them in lockstep with the user scrolling the page, adding new tiles covering new content as needed, and voilà! 60 FPS scrolling.
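The tile bookkeeping described above boils down to simple grid arithmetic. The sketch below assumes square tiles of 256 pixels and integer page coordinates; the tile size and both function names are illustrative assumptions, not values taken from the webview.

```python
TILE_SIZE = 256  # assumed tile edge in pixels, for illustration only


def tiles_for_viewport(left, top, width, height, tile_size=TILE_SIZE):
    """Return the set of (col, row) tile keys that intersect the viewport."""
    first_col = left // tile_size
    first_row = top // tile_size
    last_col = (left + width - 1) // tile_size
    last_row = (top + height - 1) // tile_size
    return {
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    }


def tile_screen_position(col, row, scroll_x, scroll_y, tile_size=TILE_SIZE):
    """Where an already painted tile is drawn for the current scroll offset."""
    return (col * tile_size - scroll_x, row * tile_size - scroll_y)


before = tiles_for_viewport(0, 0, 512, 512)
after = tiles_for_viewport(0, 100, 512, 512)  # the user scrolled down by 100px
newly_needed = after - before
```

Scrolling changes only the result of `tile_screen_position` for tiles that are already painted, which is cheap; painting is needed only for the keys in `newly_needed`.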
We also kept a set of tiles that we constantly reused and repainted in a background thread (as seen on the previous diagram, once a tile is outside of the viewport, it can be put back into our pool of tiles and reused to paint a tile that will soon be shown to the user), keeping memory usage reasonable.
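A fixed pool of reusable tiles can be sketched like this. The `TilePool` class and its string "textures" are hypothetical placeholders for GPU textures, and the background painting thread is left out; the sketch only shows how recycling keeps the number of allocated tiles constant.

```python
class TilePool:
    """Hypothetical fixed-size pool: textures are recycled, never reallocated."""

    def __init__(self, capacity):
        self.free = [f"texture-{i}" for i in range(capacity)]
        self.in_use = {}  # tile key (col, row) -> texture

    def update(self, visible_keys):
        """Recycle tiles that left the viewport; return keys that need painting."""
        for key in list(self.in_use):
            if key not in visible_keys:
                self.free.append(self.in_use.pop(key))
        needs_paint = []
        for key in sorted(visible_keys):
            if key not in self.in_use and self.free:
                self.in_use[key] = self.free.pop()
                needs_paint.append(key)
        return needs_paint


pool = TilePool(capacity=4)
first_paint = pool.update({(0, 0), (1, 0), (0, 1), (1, 1)})
# Scrolling down one row: the top row is recycled to back the new bottom row.
second_paint = pool.update({(0, 1), (1, 1), (0, 2), (1, 2)})
```

Memory use is bounded by the pool capacity regardless of page length, at the cost of having to repaint a tile whenever its texture is reassigned.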
Zooming is also much faster with the tiles being drawn by the GPU. Zooming out as shown on the diagram above is as easy as scaling down the current set of tiles (tile set A). In the background, we then kick off the rendering of a new set of tiles that will cover the new viewport (tile set B). When tile set B is ready to be displayed, we can switch between the two with a fade-in transition.
Issues with Tiling
The caveat with this tiling approach was that we would lose our nice rendering that was perfectly sharp at all times. Indeed, while drawing the tiles on screen is really fast with the GPU, repainting the content of the tiles themselves can still be slow; and if the tiles are repainted too slowly, the user will see missing tiles (while scrolling) or blurry content (while zooming).
Still, it was by far preferable to the agonizingly slow pure software rendering experience on high-resolution displays; and the advantage of this approach was that we would still be able to leverage the existence of the display list to paint things faster than purely via webkit.
Bottom line: We could ship!
Now that we had managed to wrestle good performance from the tablet, it was time to look back at the composited layers integration. After all, we were now running in an OpenGL context all the time, so fully supporting the CSS 3D layers was a lot more straightforward, and we might be able to finish the implementation in time for the Honeycomb release.
An important architectural pivot in our webview happened around that time. It started small.
Composited layers, as implemented in WebKit at the time, only worked in a few cases:
- if you applied a CSS transform to an element
- if you had a video, it would be moved to a separate layer (helping with compositing)
Why couldn’t we extend layers support to other cases?
Remember, compared to the “normal” webkit rendering mechanism, we had a few advantages and disadvantages:
- We were faster to redraw content we already had, as we could simply replay the display list
- …But we were slower at painting new content, as we had the intermediate generation of the display list plus the painting, and now, the upload to GL textures.
We had one particularly painful example of an HTML behavior that was dreadfully slow with our existing, display-list based, architecture: fixed-position elements (CSS position: fixed).
In essence, fixed elements are positioned not relative to the document, but relative to the viewport — to the window. It means that they will stay where they are even if you are scrolling. Great feature, used in many, many websites. Particularly mobile websites.
This was the absolute worst possible scenario for our rendering architecture. It meant that every time the user would scroll a web page with such an element, we had to go back to webkit, ask for a new display list for the entire page, come back and regenerate all the tiles covering the screen. Rinse and repeat.
But… we now had composited layers in our toolbox. We decided to move such fixed elements onto their own composited layer. This completely solved the slowness issue: scrolling past fixed elements would no longer require the entire page to be repainted. All they would need is to be positioned directly from the UI thread, and we would have perfectly smooth, buttery 60 FPS scrolling behavior, even with those pesky fixed elements in the page.
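In its simplest form, positioning a fixed layer from the UI side looks like the sketch below. The dictionary layout and the `layer_screen_position` function are hypothetical, and the sketch ignores zoom and the other corner cases a real implementation has to handle.

```python
def layer_screen_position(layer, scroll_x, scroll_y):
    """Return the on-screen position of a layer for the current scroll offset.

    `layer` is a hypothetical dict with "x", "y" and a "fixed" flag.
    """
    if layer["fixed"]:
        # Fixed layers are anchored to the viewport: scrolling does not move them.
        return (layer["x"], layer["y"])
    # Regular layers are anchored to the document and move with the scroll.
    return (layer["x"] - scroll_x, layer["y"] - scroll_y)


header = {"x": 0, "y": 0, "fixed": True}
article = {"x": 0, "y": 60, "fixed": False}
```

Neither case requires a new display list: the scroll offset is the only input that changes from frame to frame.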
…Predictably, the positioning turned out to be not as easy as it initially seemed — it took a couple releases to iron out some corner cases, if I recall. Still, extending composited layers to other HTML use cases was overall a success.
Framework integration: GLFunctor
Early on during the development of the hardware acceleration support in the webview, there was a need to integrate it with the rest of the Android framework acceleration work that Romain Guy was doing for the Honeycomb release.
We were trying hard to avoid unnecessary copies of textures, as bandwidth was limited. We could have created a completely separate GL context for the webview to work in, but that would not have been optimal. So… after a quick discussion, we introduced a GLFunctor: a function that simply called the webview OpenGL renderer directly from the framework, sharing the same GL context as the rest of the application.
As the framework and the webview knew what they were modifying, GL-wise, the framework could save and restore the GL state before and after calling the webview. The neat thing with this trick was that the GLFunctor was in essence a direct function call to the webview, plus the set of necessary parameters that go with it (the matrix applied to the view, etc.). The UI framework could then store the GLFunctor straight into its own display lists.
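The functor idea can be illustrated with a toy model. None of the names below come from the Android framework: `FrameworkRenderer`, `webview_functor`, and the dictionary standing in for GL state are assumptions made purely to show a callable stored in a display list and invoked between a state save and restore.

```python
import copy


class FrameworkRenderer:
    """Toy stand-in for a UI framework renderer with its own display list."""

    def __init__(self):
        self.gl_state = {"program": "framework", "blend": True}
        self.display_list = []

    def record_draw(self, name):
        self.display_list.append(("draw", name))

    def record_functor(self, functor, matrix):
        # The functor is stored like any other drawing command.
        self.display_list.append(("functor", functor, matrix))

    def replay(self):
        log = []
        for op, *args in self.display_list:
            if op == "draw":
                log.append(f"framework draws {args[0]}")
            else:
                functor, matrix = args
                saved = copy.deepcopy(self.gl_state)      # save state
                log.append(functor(self.gl_state, matrix))  # direct call
                self.gl_state = saved                     # restore state
        return log


def webview_functor(gl_state, matrix):
    """Hypothetical webview renderer: free to change state while it draws."""
    gl_state["program"] = "webview-tiles"
    gl_state["blend"] = False
    return f"webview draws tiles with matrix {matrix}"


renderer = FrameworkRenderer()
renderer.record_draw("toolbar")
renderer.record_functor(webview_functor, "identity")
renderer.record_draw("button")
log = renderer.replay()
```

Both renderers share one context in this model, so no texture has to be copied between them; the save and restore around the call is what keeps each side from corrupting the other.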
Hard to be more efficient.
As described previously, tiling has some drawbacks: mostly, the painting of the tiles can sometimes be too slow to get all the tiles ready for the current viewport, which results in missing tiles. Not an optimal user experience.
Barring improvements to the painting performance, it seems it’s just something you have to live with. Or do you?
Remember, we had a display list at our disposal. One thing that we can use the display list for is to repaint the same content, much, much faster than if we had to go back to webkit. How could we leverage this?
Well, we could generate a second set of tiles, at a lower resolution, thereby covering more area of the document for the same cost. And display those tiles when the tiles at the correct resolution are not available.
Why would we ever want to do this? Aren’t we supposed to not want blurry areas? We definitely don’t want blurry areas, that’s true. However, when scrolling, one of two things will happen:
- you are not scrolling very fast. Painting the tiles can catch up, everything is good.
- you are scrolling very fast — flinging. Painting the tiles will be hopelessly behind.
For the second case, having a second set of blurry tiles is a perfect solution: they will cover a lot more area, and your eye will much prefer blurry content passing by quickly to empty areas or checkerboard patterns where a tile should be. In fact, in many cases it was hard to see that the content scrolling fast was not rendered at the correct resolution. Fake it ’til you make it!
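The fallback logic can be sketched as a per-tile choice. The sketch assumes that one low-resolution tile covers a 2x2 block of full-resolution tiles; that factor, the `pick_tiles` function, and the key scheme are illustrative assumptions rather than the webview's actual parameters.

```python
def pick_tiles(visible_keys, high_res_ready, low_res_ready, low_res_factor=2):
    """For each visible tile key, choose which painted tile to draw.

    Returns a dict mapping key -> ("high", key), ("low", low_key)
    or ("missing", None).
    """
    chosen = {}
    for col, row in visible_keys:
        if (col, row) in high_res_ready:
            chosen[(col, row)] = ("high", (col, row))
            continue
        # Assumed layout: one low-res tile covers a factor x factor block.
        low_key = (col // low_res_factor, row // low_res_factor)
        if low_key in low_res_ready:
            chosen[(col, row)] = ("low", low_key)
        else:
            chosen[(col, row)] = ("missing", None)
    return chosen


visible = {(0, 0), (1, 0), (2, 0)}
result = pick_tiles(visible, high_res_ready={(0, 0)}, low_res_ready={(0, 0)})
```

Here tile (1, 0) is not painted at full resolution yet, so the blurry tile that covers it is drawn scaled up, while (2, 0) has no coverage at either resolution and would show as empty.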
Another trick we used to improve the scrolling experience was to generate more tiles than strictly necessary to cover the viewport, pre-rendering the tiles that lay in the direction of the scroll.
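One simple way to express pre-rendering in the scroll direction is to grow the rectangle used for tile selection on the side the user is heading towards. The `prefetch_rect` function and the 256-pixel margin below are assumptions for illustration; the real heuristics are not described here.

```python
def prefetch_rect(left, top, width, height, velocity_x, velocity_y, margin=256):
    """Expand the viewport by `margin` pixels in the direction of the scroll.

    Returns (left, top, width, height) of the area to cover with tiles.
    """
    right = left + width
    bottom = top + height
    if velocity_x > 0:
        right += margin
    elif velocity_x < 0:
        left -= margin
    if velocity_y > 0:
        bottom += margin
    elif velocity_y < 0:
        top -= margin
    return (left, top, right - left, bottom - top)


scrolling_down = prefetch_rect(0, 1000, 800, 600, 0, 50)
scrolling_up = prefetch_rect(0, 1000, 800, 600, 0, -50)
```

The expanded rectangle can then be fed to whatever computes the tile set, so tiles just ahead of the viewport are painted before they become visible.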
After a few grueling months working on it, we shipped this version of the browser in Honeycomb. We had hardware-acceleration working, scrolling and zooming working well. We had some measure of composited layers support (I believe the first version did not have tiled layers). A lot more to do in front of us. Reviews were pretty good, the UI improvements adding a lot as well (particularly tabs, incognito mode (ah!) and pie-menus).
We were not at this point drastically more innovative than other browser rendering pipelines — tiling had become the obvious solution for many platforms. We were starting to be quite fast though, thanks to the display list approach. We got faster. Way faster :)
Part II will describe more advanced parts of the rendering pipeline and display list magic.