Fast Source Map Generation for React Native

I’m David and I work on the JavaScript Tools team in London where we are building React Native’s Packager.

React Native never had accurate source maps because generating them was extremely slow. To provide better source maps, we had to completely rewrite our implementation and speed it up significantly. This note describes how we improved source map generation for 1,758,618 individual mappings from over 3 seconds to 550 milliseconds by optimizing the code for execution in v8. This will be available in React Native 0.42, due to be released on 1st March.

A bit of history

We have been using approximated line-by-line source maps since the beginning, because the generation of proper source maps added too much overhead to the bundling process.

The approximated source maps have become a problem over time — not all babel transforms can produce code on the same line as the source, leading to off-by-some-lines errors that accumulate over the length of a full bundle. One example is the class properties transform, which moves properties from the class body into the constructor. Using only one mapping per line was relatively fast, but didn’t offer the necessary precision. Mappings off by a few lines might not be too much of a problem for crash reports, but they definitely are for tools like debuggers or profilers, which become useless if source maps are not correct.

We knew that we could only really fix this issue by dropping babel’s retainLines setting and using real source maps instead of approximations. Babel is able to produce these maps per module, and the source map proposal document describes an alternative format for post-processing – Indexed Map – that can be used to efficiently combine multiple source maps into a single one. Problem solved? Not quite: Chrome’s DevTools crashed when we served index maps for our big React Native apps at Facebook. The bundle had simply become too big.
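For reference, an index map wraps the per-module maps in a `sections` array, each entry offset by its module’s position in the combined bundle. The sketch below follows the shape described in the Source Map revision 3 proposal; the file names, offsets, and `mappings` values are made up for illustration.

```javascript
// Shape of an "Indexed Map" per the Source Map revision 3 proposal.
// Module names, offsets, and mappings here are illustrative only.
const indexMap = {
  version: 3,
  file: 'bundle.js',
  sections: [
    {
      // where this module's generated code starts in the bundle
      offset: { line: 0, column: 0 },
      map: { version: 3, sources: ['moduleA.js'], names: [], mappings: 'AAAA' },
    },
    {
      offset: { line: 120, column: 0 },
      map: { version: 3, sources: ['moduleB.js'], names: [], mappings: 'AAAA' },
    },
  ],
};
```

This is why the format is cheap to produce: combining maps is just collecting them into `sections` with offsets, with no re-encoding of the `mappings` strings.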

Finding a solution…

Given that the least expensive option wasn’t available, we had to find another fast solution. Our first approach: we let babel generate a single source map for every module and “concatenated” them. That meant concatenating the sources, sourcesContent, and names arrays, and decoding and re-encoding the mappings string while carrying over offsets for array indexes and source locations. This traded generation speed for a smaller source map, but never got under the magic one-second mark for any of our bundles.
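The offset-carrying step can be sketched on already-decoded mappings. The helper below is hypothetical, not the packager’s real code: it assumes each decoded mapping is a five-tuple of `[generatedLine, generatedColumn, sourceIndex, originalLine, originalColumn]`, as in the standard segment layout.

```javascript
// Hypothetical sketch: merge two decoded per-module maps. The second
// map's mappings are shifted by the first module's line count, and
// source indexes are re-based into the concatenated sources array.
function concatRawMaps(a, b, bLineOffset) {
  return {
    sources: a.sources.concat(b.sources),
    names: a.names.concat(b.names),
    mappings: a.mappings.concat(
      b.mappings.map(([genLine, genCol, srcIdx, origLine, origCol]) => [
        genLine + bLineOffset,      // shift into position in the bundle
        genCol,
        srcIdx + a.sources.length,  // re-index into the combined sources
        origLine,
        origCol,
      ])
    ),
  };
}
```

The expensive part in practice is not this arithmetic but decoding each module’s base64 VLQ string first and re-encoding the result afterwards.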

Encoding the mappings for each module just to decode them again seemed wasteful. To avoid this cycle, we added support to retrieve “raw mappings” to babel-generator. “Raw mappings” just means that the numeric data is exposed, rather than the base64 VLQ string containing the encoded mappings.
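Conceptually, a raw mapping is just the decoded data as plain numbers. The field names below follow the source-map library’s data model and are illustrative; babel’s actual output may be shaped slightly differently.

```javascript
// One "raw mapping": plain numbers and strings instead of a base64
// VLQ segment inside the `mappings` string. Values are made up.
const rawMapping = {
  generated: { line: 12, column: 4 },
  original: { line: 3, column: 2 },
  source: 'Counter.js',
  name: 'increment',
};
```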

Disappointingly, the first attempt to build a single source map using the source-map library for one of our bundles (which had 1,758,618 single mappings at that time) took over three seconds.

…and making it fast

Being slower despite doing one fewer encode/decode pass was a disappointment, but we were convinced that our approach was right, so we started poking around to see how we could make the code faster.

A faster source map builder

The source map generator in the source-map library has overhead that isn’t strictly necessary for our scenario. It requires passing in argument objects and does argument validation.

We replaced this general-purpose source map builder with a more specialized version that assumes certain constraints: mappings are passed in in the order of the generated source, and we don’t check every mapping for an individual source file — it either belongs to the current source file, or it has no source file at all. We also assume that every source file becomes “current” only once. These restrictions neatly fit the data we retrieve from babel.

For our specific case, the “current file” optimization allowed us to replace ~1,700,000 map lookups and ~5,000 insertions with ~5,000 array pushes. That’s a ratio of roughly 350:1, and an array push is presumably cheaper than a map lookup.

Without parameter objects, we avoided the allocation of ~7.8 million objects compared to the implementation in source-map.
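Under those constraints, the core of such a builder can be sketched like this. The names and structure are hypothetical, not the packager’s actual code.

```javascript
// Hypothetical specialized builder: mappings arrive in generated-source
// order, each source file becomes "current" exactly once, and arguments
// are flat numbers instead of allocated parameter objects.
class SourceMapBuilder {
  constructor() {
    this.sources = [];
    this.segments = [];     // decoded segments, VLQ-encoded at the end
    this.currentSource = -1;
  }

  // Called once per file, in bundle order: an array push instead of a
  // per-mapping map lookup.
  startFile(path) {
    this.currentSource = this.sources.push(path) - 1;
  }

  // A mapping that belongs to the current source file. No argument
  // objects, no per-call validation.
  addSourceMapping(genLine, genCol, origLine, origCol) {
    this.segments.push([genLine, genCol, this.currentSource, origLine, origCol]);
  }

  // A mapping with no original source at all.
  addSimpleMapping(genLine, genCol) {
    this.segments.push([genLine, genCol]);
  }
}
```

Compare this with the source-map library’s `addMapping({generated, original, source, name})`, which allocates an object per call and validates it.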

invariant and process.env

We had encoded our restrictions as invariants, using the implementation in fbjs/lib/invariant. As it turned out, each call checked process.env.NODE_ENV, which is slow to access. Paul O’Shannessy helped us out and delivered a patch in no time that shaved off over a hundred milliseconds.
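The general fix can be sketched like this — a simplified stand-in for fbjs/lib/invariant, not the actual patch: hoist the process.env read out of the hot path so it happens once at module load rather than on every call.

```javascript
// Read process.env.NODE_ENV once at module load instead of on every
// invariant() call; accessing process.env is surprisingly slow.
const __DEV__ = process.env.NODE_ENV !== 'production';

function invariant(condition, message) {
  if (!condition) {
    // In production builds, fbjs throws a generic message; this sketch
    // just illustrates the hoisted environment check.
    throw new Error(__DEV__ ? message : 'Invariant Violation');
  }
}
```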

Allocations

Writing code in JavaScript gives us all the amenities of a managed scripting language: automatic memory management, and strings that behave like primitive values. When producing 11 MB of mappings (the size for our set of mappings) from really short strings (1–7 characters), the allocation overhead becomes noticeable. We were still using the base64 VLQ implementation from source-map, which produces segments from one to five VLQ encodings (between 1 and 7 characters each). All these segments have to be concatenated again, leading to an estimated 8–10 million string concatenations. Although v8 has various optimizations around string concatenation, the creation of all these small strings does not come for free.

Since mappings can only contain 66 ASCII characters (the 64 base64 digits plus the “,” and “;” separators), we were able to replace JavaScript strings with a pre-allocated buffer. If we run out of space, we simply reallocate to double the size of the buffer. This leads to just four reallocations and memory copies for 11 MB of mappings. As a consequence, our port of the encode function only operates on numeric values and writes character values to the buffer without creating any strings. v8 does a great job optimizing this code, and it runs really fast. We also replaced the original arithmetic to get the base64 character value with a simple table lookup.
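The idea can be sketched as follows. The names are hypothetical and the real implementation lives in the packager, but the VLQ encoding itself (sign bit in the least significant position, 5-bit groups, continuation bit, base64 digits) follows the source map specification.

```javascript
// Table lookup: base64 digit index -> ASCII character code.
const CHAR_MAP = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
  .split('').map(c => c.charCodeAt(0));

class MappingsBuffer {
  constructor() {
    this.buf = Buffer.alloc(16); // deliberately tiny, to show doubling
    this.pos = 0;
  }

  // Grow by doubling; for 11 MB of mappings this only happens a
  // handful of times.
  _ensure(n) {
    if (this.pos + n > this.buf.length) {
      const next = Buffer.alloc(this.buf.length * 2);
      this.buf.copy(next);
      this.buf = next;
    }
  }

  // Encode one number as base64 VLQ, writing character codes straight
  // into the buffer -- no intermediate strings are created.
  encode(value) {
    this._ensure(7); // a 32-bit value needs at most 7 base64 digits
    let vlq = value < 0 ? ((-value) << 1) | 1 : value << 1; // sign in LSB
    do {
      let digit = vlq & 0x1f;
      vlq >>>= 5;
      if (vlq > 0) digit |= 0x20; // continuation bit
      this.buf[this.pos++] = CHAR_MAP[digit];
    } while (vlq > 0);
  }

  toString() {
    return this.buf.toString('ascii', 0, this.pos);
  }
}
```

Keeping everything in small integers and a flat buffer is exactly the kind of code v8 optimizes well.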

All these changes finally brought us down to one second, and we decided this was shippable.

Engine internals

Profiling the code in isolation showed the expected results: most CPU time was spent in the encode function and in indexFor. The latter seemed to leave little room for optimization, as we had already replaced a lookup object with a faster native Map. We wondered whether encode could be made any faster.

“Optimization killers” on the bluebird library’s wiki is a really good read. It turned out that we were giving away a lot of performance by having ported the encode function to ES2015. More specifically, replacing var with let led to function deoptimizations in v8 when the code used compound-assignment operators. Separating the assignment from the operator in only two locations led to a speedup of over 20%!
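For illustration, the problematic pattern and its fix looked roughly like this (simplified; modern v8 no longer penalizes the first form, so measure before applying such micro-optimizations today):

```javascript
// The pattern that deoptimized in v8 at the time: a compound
// assignment on a `let` binding.
function withCompoundAssignment(value) {
  let digit = value;
  digit >>= 5;            // compound assignment
  return digit;
}

// The fix: separate the operator from the assignment.
function withSeparateAssignment(value) {
  let digit = value;
  digit = digit >> 5;     // plain assignment
  return digit;
}
```

Both functions compute the same result; only the generated machine code differed.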

Other ES2015 constructs such as classes or arrow functions turned out not to be problematic. The takeaway here is to read a lot about engine internals, and to measure every change, even if it looks innocent.

Invariant, second act

The final speedup came from removing all calls to invariant from the generation code and replacing them with simple if … throw statements. We didn’t check the details of the code v8 generated, but we had been calling invariant twice for every mapping. Removing these ~3.5 million calls eventually brought us down to an incredible 550 ms, including serialization to JSON.
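The shape of that change, sketched on a hypothetical validation (names are illustrative):

```javascript
// Instead of two invariant() calls per mapping, inline the checks as
// plain statements: no function call, no message formatting on the
// happy path.
function pushMapping(generatedLine, generatedColumn) {
  if (generatedLine < 1) {
    throw new Error('generatedLine must be >= 1, got ' + generatedLine);
  }
  if (generatedColumn < 0) {
    throw new Error('generatedColumn must be >= 0, got ' + generatedColumn);
  }
  return [generatedLine, generatedColumn]; // stand-in for the real push
}
```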

Native performance

We had considered moving the inner loop to C++ to make our first solution fast. After improving performance so much, we were curious how it compared to a native solution, as a throughput of 20 MB/s didn’t seem too great.

A naïve port of the algorithm to C++, using folly::dynamic as the data structure for the raw mappings (loaded from a JSON file), turned out to be ~1.4× slower. The point is not that a C++ version couldn’t be made faster than the JavaScript implementation (it certainly could), but rather that it is possible to get really good performance out of v8 by writing code that is easy to optimize (e.g. by triggering the small-integer optimization) and by avoiding allocations in hot code that is called millions of times.