Introducing Acrobat on the Web, Powered by WebAssembly

Tapan Anand
Feb 21, 2020 · 6 min read

PDF documents are a major part of our digital lives and, in an era where we spend most of our time working inside a web browser, enhancing the PDF experience on the web is crucial for providing a seamless, multi-device experience. As the creators of PDF, we at Adobe envisioned Acrobat Web, and we began that journey last year with the introduction of the Document Cloud PDF Embed API.

The PDF Embed API offers Adobe’s pixel-perfect PDF viewing on the web, with the promise of performance and ease of integration on all major browsers. It also offers UI customization and integration with Adobe Analytics. You can see the Embed API in action here.

PDF rendering and viewing in the Embed API is done purely on the client’s browser. All the client-side PDF heavy lifting is performed by the core component of the Embed API called Acrobat JS.

What is Acrobat JS?

Acrobat JS is a web-based PDF library powered by WebAssembly. WebAssembly (WASM) is an open standard that enables reusing native C/C++/Rust applications in a web browser at near-native performance. Acrobat JS leverages WebAssembly to run Adobe’s Mobile PDF library on the web. This is the same library that powers Adobe’s Acrobat mobile apps. The library’s C++ code has been compiled to WebAssembly to bring Adobe’s high-fidelity PDF rendering to the web.

The Acrobat JS project started in 2016, in the very early phases of WebAssembly, which meant the documentation and support were still very new. The browser implementations and the standard were continuously changing. Debugging support was also not very convenient, especially for large codebases like ours. But the fun of working with such an amazing technology made it all worth it.

Acrobat JS rendering technology

Ever since we started working on Acrobat JS, we have had two major goals: good performance and high fidelity. To ensure high fidelity, our rendering technology is based on pixel-perfect rendering of the PDF instead of translating the PDF content streams to Canvas or HTML. The output is a high-quality bitmap, which is shown to the user using an <img> tag. The bitmap is compressed to PNG to save memory.
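To get a feel for why compressing the bitmap matters, here is a back-of-the-envelope sketch. It is not Acrobat JS code: the helper name `rawBitmapBytes` and the assumption of a 32-bit RGBA bitmap (4 bytes per pixel) are ours, chosen only to illustrate the arithmetic.

```typescript
// Hypothetical helper: memory needed for one uncompressed page bitmap.
// Assumes 4 bytes per pixel (RGBA); the article does not state the pixel format.
interface PageSize {
  widthPt: number;  // page width in PDF points (1/72 inch)
  heightPt: number; // page height in PDF points
}

function rawBitmapBytes(
  page: PageSize,
  scale: number
): { width: number; height: number; bytes: number } {
  const width = Math.ceil(page.widthPt * scale);
  const height = Math.ceil(page.heightPt * scale);
  return { width, height, bytes: width * height * 4 };
}

// A US Letter page (612 x 792 pt) rendered at 2 pixels per point.
const letterAt2x = rawBitmapBytes({ widthPt: 612, heightPt: 792 }, 2);
// 1224 x 1584 pixels x 4 bytes = 7,755,264 bytes, roughly 7.4 MiB per page.
```

Under these assumptions a single page costs several megabytes uncompressed, so keeping many pages around as raw bitmaps adds up quickly.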

Similarly, to ensure that text selection is accurate, we pass the user interaction events to the PDF library, which returns the quads to be drawn based on the page content. This is in contrast to doing selection on a hidden text layer on top of the bitmap, which may lead to alignment issues when there are fonts embedded in the PDF or when there is a font mismatch or font substitution.
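As an illustration of the drawing side of this, the sketch below turns a quad into a rectangle that a viewer could paint as a selection highlight. The `Quad` shape, the `quadToRect` name, and the assumption that quads arrive in page coordinates with the origin at the top left are all hypothetical; the article does not describe the library’s actual interface.

```typescript
// Hypothetical types: a quad is four corner points in page coordinates.
interface Point { x: number; y: number; }
type Quad = [Point, Point, Point, Point];
interface Rect { left: number; top: number; width: number; height: number; }

// Scale a quad from page units to on-screen pixels and take its bounding box.
function quadToRect(quad: Quad, scale: number): Rect {
  const xs = quad.map((p) => p.x * scale);
  const ys = quad.map((p) => p.y * scale);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    left,
    top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// One selected word, 100 units wide and 12 units tall, shown at 2x zoom.
const wordQuad: Quad = [
  { x: 10, y: 20 }, { x: 110, y: 20 }, { x: 110, y: 32 }, { x: 10, y: 32 },
];
const highlight = quadToRect(wordQuad, 2);
```

A bounding box is enough for horizontal text; rotated text would need the quad drawn as a polygon instead.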

Acrobat JS: The journey

In this section, I will talk about the challenging problems we solved to meet both of our goals for Acrobat JS: performance and high fidelity.

A major performance metric for us was the time it takes to load and initialize the PDF library as a WebAssembly (WASM) module and show the first page to the user. We call this metric timeTillFirstRender. Currently, in Acrobat JS, this time is under 900ms for 75 percent of files in our benchmarking set (whose composition is derived from real-world analytics data). We reached these numbers after improving timeTillFirstRender by about 300 percent, using the strategies discussed here.
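Saying the time is under 900ms for 75 percent of files is the same as saying the 75th percentile of timeTillFirstRender is below 900ms. A minimal sketch of that check, using the nearest-rank percentile method and made-up timings (the numbers below are illustrative, not benchmark data):

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile of an empty sample set is undefined");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical timeTillFirstRender measurements in milliseconds.
const timingsMs = [400, 650, 700, 850, 1200, 500, 880, 950];
const p75 = percentile(timingsMs, 75); // 880
const meetsTarget = p75 < 900;         // true
```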

A major bottleneck for timeTillFirstRender was the time it takes for the WASM module to be compiled and made ready for use. This was especially difficult before tiered compilation became part of JS engines. A lot of the work went into keeping the WASM size as low as possible. We used several strategies to accomplish this:

  1. WASM Swapping
  2. Dynamic linking
  3. Moving out font data from the binary: Separating out static data like fonts from the binary and, instead, having it load dynamically when needed helped us in reducing WASM file size by about 1MB.

Let’s talk about these in more detail.

WebAssembly currently doesn’t have built-in exception handling support and thus, in the web environment, it is emulated with the help of JavaScript. This translates to size and performance penalties if exceptions are enabled and used in the code.

To overcome this performance penalty on our critical rendering path of showing the first page to the user, we devised an approach where, by default, we load a thinner WASM file with exceptions disabled and only load the exception-enabled binary when an actual exception is encountered. That’s why we call this strategy WASM Swapping.
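The control flow can be sketched roughly as follows. This is our own simplification, not the Acrobat JS implementation: `PdfEngine`, `SwappingEngine`, and `loadFull` are hypothetical names, any error from the thin build is treated as the signal to swap, and loading is shown as a synchronous factory to keep the sketch short (in a browser, fetching and instantiating a module is asynchronous).

```typescript
// Hypothetical interface over a compiled PDF module.
interface PdfEngine {
  renderPage(pageIndex: number): string;
}

class SwappingEngine implements PdfEngine {
  private active: PdfEngine;
  private swapped = false;

  // `thin` is the small build without exception support;
  // `loadFull` produces the larger exception-enabled build on demand.
  constructor(thin: PdfEngine, private loadFull: () => PdfEngine) {
    this.active = thin;
  }

  renderPage(pageIndex: number): string {
    try {
      return this.active.renderPage(pageIndex);
    } catch (err) {
      // If the full build fails too, this is a genuine error.
      if (this.swapped) throw err;
      // Otherwise swap to the full build once and retry the same call.
      this.active = this.loadFull();
      this.swapped = true;
      return this.active.renderPage(pageIndex);
    }
  }

  isSwapped(): boolean {
    return this.swapped;
  }
}
```

The point of the design is that documents which never hit an exception path only ever pay for the thin binary.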

We further extended this to implement something similar to profile-guided optimization (PGO): we identified the hot and cold areas of our code on a large test set and then removed all the ‘colder’ areas from the thinner binary.

The graph below shows the gains we observed on our benchmarking set at the time we implemented this strategy:

Initial improvements provided by the WASM swapping approach

This gave us improvements of about 40 percent in both size and performance, which was huge. Since then, we have changed our Mobile PDF Library code to reduce the use of exceptions, and advancements in WebAssembly and JS engines have made exceptions less costly than they were in the relatively early days. We are still sticking with this strategy, as it continues to offer gains of about 20 percent in both size and load-time performance.

Along our journey of bringing more and more PDF goodness to the web, we encountered situations where adding more PDF features to the library meant adding to the WASM size, thus affecting overall performance. We could have extended our WASM Swapping approach to get around this, but we needed a more long-term solution. That’s where dynamic linking comes into the picture.

As part of dynamic linking, we divided the WASM module into a main module and several side modules. All the core PDF viewing features reside in the main module and additional features are present in side modules. We load only the main module for the critical rendering path while the side modules are loaded lazily on demand.

This allowed us to keep adding new features without severely affecting the critical rendering path performance by adding incremental features in side modules. The current main module size is 865kB gzipped.
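The lazy, load-once behavior of side modules can be sketched as a small registry. The names here (`SideModuleRegistry`, the feature names used in the usage note) are hypothetical, and the loader is passed in as a plain synchronous function so the sketch stays independent of any particular toolchain; real module loading in a browser would be asynchronous.

```typescript
// Hypothetical handle for a loaded side module.
interface SideModule {
  name: string;
}

class SideModuleRegistry {
  private cache: { [name: string]: SideModule } = {};

  // `load` does the actual fetch and link; it is injected so that the
  // registry itself only decides when loading happens.
  constructor(private load: (name: string) => SideModule) {}

  // Return the side module for a feature, loading it on first use only.
  get(name: string): SideModule {
    if (!Object.prototype.hasOwnProperty.call(this.cache, name)) {
      this.cache[name] = this.load(name);
    }
    return this.cache[name];
  }

  // Names of the side modules loaded so far.
  loadedNames(): string[] {
    return Object.keys(this.cache);
  }
}
```

With this shape, opening a document touches only the main module, and a call such as `registry.get("forms")` would pull in that side module the first time the user needs the feature.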

Fonts required to render text in a PDF are usually embedded as part of the PDF itself, but with documents in the wild, it is not uncommon to find PDFs where the font is not actually embedded. In such a scenario, we had to fall back to font substitution, which meant we could not meet our goal of ensuring high-fidelity rendering.

For most native implementations, the font can be used if it is present on the user’s machine, or else a close substitute can be used. But for us this was not possible, since we were not using HTML rendering and the JavaScript and WASM sandbox can’t access native fonts directly. We devised a solution around the Canvas API and our Mobile PDF library to render such PDFs with the correct font when it is available on the user’s machine.
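The article does not describe how this solution works. As one related building block, a commonly used way to tell whether a font is installed is to measure a sample string with the candidate font plus a generic fallback and compare it with the fallback alone: if the widths differ, the candidate was actually used. The sketch below shows only that detection step, with hypothetical names throughout. The measuring function is injected; in a browser it could be backed by a 2D canvas context (set `ctx.font`, then read `ctx.measureText(text).width`).

```typescript
// Measures the rendered width of `text` for a CSS font shorthand string.
type MeasureWidth = (cssFont: string, text: string) => number;

// Returns true if `family` appears to be installed. Compares against three
// generic fallbacks so that a font whose metrics happen to match one
// fallback is still detected through another.
function isFontAvailable(family: string, measure: MeasureWidth): boolean {
  const sample = "mmmmmmmmmmlli";
  const fallbacks = ["monospace", "serif", "sans-serif"];
  return fallbacks.some((fallback) => {
    const withCandidate = measure(`72px "${family}", ${fallback}`, sample);
    const fallbackOnly = measure(`72px ${fallback}`, sample);
    return withCandidate !== fallbackOnly;
  });
}
```

Detection alone does not render anything; it would only tell a viewer whether attempting to use the local font is worthwhile.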

The following images show the results of this:

Differences are especially visible in lower-case y and upper-case I.

This is an excerpt from a PDF that requires the font Tahoma for rendering, but the font is not embedded in the PDF. The difference between the two images above is especially noticeable in characters like lower-case “y” and upper-case “I”.

The Future is bright

WebAssembly has opened up so many opportunities on the Web. Performing high-octane tasks efficiently in the browser is no longer a far-fetched dream. Our journey of running high-fidelity rendering in the browser in a performant way demonstrates this.

The WebAssembly working group, the browsers, and everyone else involved are doing an awesome job of ensuring WebAssembly keeps getting better every day. Some of the recent interesting developments have been around threads and SIMD support, and we are working towards using them in Acrobat JS. We keep a close watch on upcoming WebAssembly advancements and are especially excited about features like Exception Handling and SIMD. We look forward to advancing the Web with the help of WebAssembly.

Adobe Tech Blog
