Introducing Acrobat on the Web, Powered by WebAssembly

Tapan Anand
Feb 21 · 6 min read

PDF documents are a major part of our digital lives and, in an era where we spend most of our time working inside a web browser, enhancing the PDF experience on the web is crucial for providing a seamless, multi-device experience. This led Adobe, as the creator of PDF, to envision Acrobat Web; we embarked on that journey with the introduction of the Document Cloud View SDK last year.

View SDK offers Adobe’s pixel-perfect PDF viewing on the web with the promise of performance and ease of integration on all major browsers. It also offers UI customization and integration with Adobe Analytics. You can see View SDK in action here.

PDF rendering and viewing in the View SDK is done purely on the client’s browser. All the client-side PDF heavy lifting is performed by the core component of the View SDK called Acrobat JS.

What is Acrobat JS?

Acrobat JS is a web-based PDF library powered by WebAssembly. WebAssembly (WASM) is an open standard that enables reusing native C/C++/Rust applications in a web browser at near-native performance. Acrobat JS leverages WebAssembly by using Adobe’s Mobile PDF library on the web; this is the same library that powers Adobe’s Acrobat Mobile Apps. The library’s C++ code has been compiled to WebAssembly to bring Adobe’s high-fidelity PDF rendering to the Web.

The Acrobat JS project started in the very early phases of WebAssembly in 2016, which meant the documentation and support were still very new. The browser implementations and the standard were continuously changing. Debugging support was also not very convenient, especially for large codebases like ours. But the fun of working with such an amazing technology made it all worth it.

Acrobat JS rendering technology

Ever since we started working on Acrobat JS, we have had two major goals: good performance and high fidelity. To ensure high fidelity, our rendering technology is based on pixel-perfect rendering of the PDF instead of translating the PDF content streams to Canvas or HTML. The output is a high-quality bitmap, which is shown to the user using an <img> tag. The bitmap is compressed to PNG to save memory.

Similarly, to ensure that text selection is accurate, we pass the user interaction events to the PDF library, which gives us the quads to draw based on the page content. This contrasts with performing selection on a hidden text layer on top of the bitmap, which can lead to alignment issues when fonts are embedded in the PDF or when there is a font mismatch or font substitution.
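To make the quad-based approach concrete, here is a minimal sketch of how selection quads returned by a PDF library might be turned into highlight rectangles for drawing over the page bitmap. The `Point`, `Quad` and `Rect` types and both function names are hypothetical; the actual Acrobat JS data format is not described here.

```typescript
// Hypothetical types for illustration; not the Acrobat JS API.
interface Point { x: number; y: number; }
type Quad = [Point, Point, Point, Point];
interface Rect { left: number; top: number; width: number; height: number; }

// Axis-aligned bounding box of one quad, scaled from page units to CSS pixels.
function quadToRect(quad: Quad, scale: number): Rect {
  const xs = quad.map((p) => p.x * scale);
  const ys = quad.map((p) => p.y * scale);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    left,
    top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// One highlight rectangle per quad reported for the current selection.
function selectionRects(quads: Quad[], scale: number): Rect[] {
  return quads.map((quad) => quadToRect(quad, scale));
}
```

A bounding box is only exact for unrotated text; for rotated or skewed text, the four corner points would need to be drawn as a polygon instead.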

Acrobat JS: The journey

In this section, I will talk about some of the challenging problems we solved to meet both our goals for Acrobat JS: performance and high fidelity.

Performance

A major performance metric for us was the time it takes to load and initialize the PDF library as a WebAssembly (WASM) module and show the first page to the user. We call this metric timeTillFirstRender. Currently, in the View SDK, this time is under 900 ms for 75 percent of files from our benchmarking set (whose composition is derived from real-world analytics data). We reached these numbers after improving timeTillFirstRender by about 300 percent using the strategies discussed here.
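A statement such as "under 900 ms for 75 percent of files" is a 75th-percentile claim. The sketch below shows one common way to compute it, the nearest-rank method; the sample timings are made up for illustration and are not Adobe's benchmark data, and the method used for the figures above is not specified.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical timeTillFirstRender samples in milliseconds.
const timeTillFirstRenderMs = [420, 610, 700, 880, 1500, 560, 830, 2100];
const p75 = percentile(timeTillFirstRenderMs, 75);
console.log(`p75 timeTillFirstRender: ${p75} ms`);
```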

A major bottleneck for timeTillFirstRender was the time it takes for the WASM module to be compiled and made ready for use. This was especially difficult before tiered compilation became part of JS engines. A lot of the work went into keeping the WASM size as low as possible. We used various strategies to accomplish this:

  1. WASM Swapping
  2. Dynamic linking

Let’s talk about these in more detail.

WASM Swapping

WebAssembly currently doesn’t have built-in exception handling support and thus, in the web environment, it is emulated with the help of JavaScript. This translates to size and performance penalties if exceptions are enabled and used in the code.

To overcome this performance penalty on our critical rendering path of showing the first page to the user, we devised an approach where, by default, we load the thinner WASM file with exceptions disabled and only load the exception-enabled binary when an actual exception is encountered. That’s why we call this strategy WASM Swapping.
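The control flow of such a swap can be sketched as follows. This is a simplified illustration under stated assumptions, not the Acrobat JS implementation: `PdfEngine`, `EngineLoader` and `SwappingEngine` are hypothetical names, and a failure in the thin build is modeled as a thrown JavaScript error.

```typescript
// Hypothetical engine interface for illustration; not the Acrobat JS API.
interface PdfEngine {
  renderPage(pageIndex: number): string;
}

type EngineLoader = () => Promise<PdfEngine>;

class SwappingEngine {
  private active: Promise<PdfEngine>;
  private full: Promise<PdfEngine> | undefined;
  private readonly loadFull: EngineLoader;

  constructor(loadThin: EngineLoader, loadFull: EngineLoader) {
    // Only the thin, exceptions-disabled build is loaded up front.
    this.active = loadThin();
    this.full = undefined;
    this.loadFull = loadFull;
  }

  async renderPage(pageIndex: number): Promise<string> {
    const engine = this.active;
    try {
      return (await engine).renderPage(pageIndex);
    } catch (err) {
      // If the full build itself failed, this is a genuine error.
      if (engine === this.full) throw err;
      // Otherwise load the full build once, make it active, and retry.
      const full = this.full ?? this.loadFull();
      this.full = full;
      this.active = full;
      return (await full).renderPage(pageIndex);
    }
  }
}
```

In this sketch, once the swap happens all later calls go to the full build, so the cost of the second download is paid at most once per session.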

We further extended this to implement something similar to profile-guided optimization (PGO), where we identified the hot and cold areas of our code on a big test set and then removed all the ‘colder’ areas from the thinner binary.

The graph below shows the gains we observed on our benchmarking set at the time of implementing this strategy:

Initial improvements provided by the WASM swapping approach

As you can see, this gave us about a 40 percent improvement in both size and performance, which was huge. We have since changed our Mobile PDF Library code to reduce the use of exceptions, and with advancements in WebAssembly and JS engines, the penalty of exceptions is now lower than it was in those relatively early days. We still use this strategy because it continues to offer about 20 percent gains in both size and load-time performance.

Dynamic linking

Along our journey of bringing more and more PDF goodness to the web, we encountered situations where adding more PDF features to the library meant adding to the WASM size, thus affecting the overall performance. We could have extended our WASM Swapping approach to get around this, but we needed a more long-term solution. That’s where dynamic linking comes into the picture.

As part of dynamic linking, we divided the WASM module into a main module and several side modules. All the core PDF viewing features reside in the main module and additional features are present in side modules. We load only the main module for the critical rendering path while the side modules are loaded lazily on demand.
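The lazy, on-demand part of this design can be sketched as a small registry that starts loading a side module the first time a feature is requested and reuses the result afterwards. `SideModuleRegistry`, `SideModuleLoader` and the feature names are hypothetical and only illustrate the pattern; they are not the View SDK's API.

```typescript
// Hypothetical loader type: resolves to whatever a side module exposes.
type SideModuleLoader<T> = () => Promise<T>;

class SideModuleRegistry<T> {
  private readonly loaders = new Map<string, SideModuleLoader<T>>();
  private readonly loaded = new Map<string, Promise<T>>();

  // Registering a feature does not load anything yet.
  register(feature: string, loader: SideModuleLoader<T>): void {
    this.loaders.set(feature, loader);
  }

  // Starts loading on the first request and reuses the same promise afterwards.
  get(feature: string): Promise<T> {
    const cached = this.loaded.get(feature);
    if (cached !== undefined) return cached;
    const loader = this.loaders.get(feature);
    if (loader === undefined) {
      return Promise.reject(new Error(`unknown side module: ${feature}`));
    }
    const pending = loader();
    this.loaded.set(feature, pending);
    // Forget failed loads so that a later request can retry.
    pending.catch(() => {
      this.loaded.delete(feature);
    });
    return pending;
  }
}
```

Caching the promise rather than the resolved module means that two requests arriving while a download is still in flight share that one download.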

Placing incremental features in side modules allowed us to keep adding new features without severely affecting the performance of the critical rendering path. The current main module size is 865 kB gzipped.

Rendering non-embedded fonts with high fidelity

Fonts required to render text in a PDF are usually embedded as part of the PDF itself, but with documents in the wild, it is not uncommon to find PDFs where the font is not actually embedded. In such a scenario, we had to fall back to font substitution, which meant we could not meet our goal of ensuring high-fidelity rendering.

For most native implementations, the font can be used directly if it is present on the user’s machine; otherwise, a close substitute can be used. For us this was not possible, since we were not using HTML rendering and the JavaScript and WASM sandbox can’t access native fonts directly. We devised a solution around the Canvas API and our Mobile PDF library to render such PDFs with the correct font when it is available on the user’s machine.
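The details of that solution are not covered here. As a related illustration only, the sketch below shows a well-known way to check from inside the sandbox whether a font is installed: measure the same text with the candidate font listed ahead of a generic fallback and compare it with the fallback alone. `isFontAvailable` and `MeasureWidth` are hypothetical names, and this is an assumption about one possible building block, not a description of what Acrobat JS does.

```typescript
// Measures the rendered width of `text` for a CSS font shorthand.
// In a browser, this could wrap CanvasRenderingContext2D.measureText:
//   const ctx = document.createElement("canvas").getContext("2d")!;
//   const measure: MeasureWidth = (font, text) => {
//     ctx.font = font;
//     return ctx.measureText(text).width;
//   };
type MeasureWidth = (cssFont: string, text: string) => number;

const FALLBACKS = ["monospace", "serif", "sans-serif"];
const SAMPLE = "mmmmmmmmmmlli";

// If the candidate font is missing, the browser renders with the fallback,
// so the width matches the fallback-only measurement for every fallback.
function isFontAvailable(family: string, measure: MeasureWidth): boolean {
  return FALLBACKS.some((fallback) => {
    const baseline = measure(`72px ${fallback}`, SAMPLE);
    const candidate = measure(`72px "${family}", ${fallback}`, SAMPLE);
    return candidate !== baseline;
  });
}
```

Several fallbacks are checked because a candidate font could happen to have the same width as one generic family for the sample text.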

The following images show the results of this:

Differences are especially visible in lower-case y and upper-case I.

This is an excerpt from a PDF that requires the font Tahoma for rendering, but the font is not embedded in the PDF. You can see the differences in fonts between the two images above, especially in characters like lower-case “y” and upper-case “I”.

The Future is bright

WebAssembly has opened up so many opportunities on the Web. Performing high-octane tasks efficiently in the browser is no longer a far-fetched dream. Our journey with running high-fidelity rendering in the browser in a performant way demonstrates this.

The WebAssembly working group, the browsers, and everyone involved are doing an awesome job of ensuring WebAssembly keeps getting better every day. Some of the recent interesting developments have been around threads and SIMD support, and we are working towards using them in the View SDK. We keep a close watch on upcoming WebAssembly advancements and are really excited about features like Exception Handling and SIMD. We look forward to advancing the Web with the help of WebAssembly.

Adobe Tech Blog

News, updates, and thoughts related to Adobe, developers…

Tapan Anand

Written by

Web Application Developer and Web Security Enthusiast. Software Development Engineer @ Adobe
