SPA source code recovery by un-Webpacking source maps

rarecoil
Jun 13

No matter what one thinks, it seems that the JS-heavy single page application paradigm is here to stay. Front-end developers rave about React and coo at Vue, keeping the user experience smooth and uninterrupted as fetch, XHR and service workers do all the dirty work interfacing with backend API services behind the scenes. SPA detractors talk about extreme code bloat, the insane size and shape of node_modules directories, and how all of this JS-heavy front-end code makes applications harder to audit, understand, and debug.

Adding to the front-end complications, it seems that almost nobody writes JavaScript directly for the browser anymore. As early as 2013, many called JavaScript “the assembly language for the Web”. While I do not find this the correct analogy either, there are certainly elements of truth to the statement. I think it is more appropriate to state that browser-based JavaScript is the Web’s intermediate representation, allowing developers to leverage high-level abstractions that enable the use of other languages within the JavaScript VM.

When a developer chooses to write in pure JavaScript, they must constantly worry about experimental feature support and cross-browser issues, drastically lowering the overall “speed” of development. While a somewhat exaggerated example, nobody writes CIL or LLVM IR from scratch; it is usually more efficient to write in something else. Leading into the current trends, countless languages transpile to JS and the number grows by the day. Leaving future ECMAScript aside, TypeScript is likely the most popular; Stack Overflow’s annual developer survey put it at third most loved and fourth most wanted language for 2019. On the framework front, React and Vue win both the most loved and most wanted categories.

Complexity begets complexity

A common web application front end these days is written without any .js files at all. In the case of TypeScript+React, our folder structures are all .ts and .tsx files that are then transpiled and packed. Because of this, developers ship a big application bundle as compiled JavaScript to their services, and then try to take it back apart for debugging.

One helpful solution for debugging massive JavaScript applications in the browser is the JavaScript source map, which maps minified JS back to the original JS powering your application. This allowed for better JavaScript code organization while still getting the performance gains of packing your JS into single files. Source maps originally came out of the minds of Google and Mozilla in 2011 or so. At the time in Silicon Valley, Backbone.js and CoffeeScript-to-JS compilation had started to become the next big front-end deal, although most JS developers were still happily writing away with jQuery, MooTools, and a series of other libraries from the mid-to-late-00s. The SPA model was still mostly in its infancy. Now, source maps have become invaluable tools to map the intermediate-layer JavaScript back to the code we actually wrote in TS/TSX. Instead of writing the JS, we write TS/TSX, Webpack it all, turn on source maps, and see what’s going on in the developer tools.

However, what most developers do not realize is that these source maps actually contain the entirety of their front-end source code. Yes, they know that something is able to do the mapping, but the source map abstraction is handled for them by an often byzantine automation process that is hard enough for deadline-compressed, crunch-mode web developers to learn how to use, let alone understand. Because of this misunderstanding, it is relatively common to find that development teams have left JavaScript source maps on in production in their Webpack configurations; Rails 6 even originally did the same by default but moved to a more hardened preset after some debate.
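To make the "entirety of their source code" point concrete, here is a minimal Python sketch of what sits inside a source map. The sample map is hand-written and heavily trimmed, and the file name and contents are invented for illustration; only the field names (version, sources, sourcesContent, mappings) come from the source map format itself.

```python
import json

# A hand-written, trimmed stand-in for a Webpack-generated source map.
# The field names are real; the values are invented for this example.
raw_map = json.dumps({
    "version": 3,
    "file": "bundle.js",
    "sources": ["webpack:///./src/App.tsx"],
    "sourcesContent": [
        "// TODO: remove before launch\nexport const App = () => null;\n"
    ],
    "names": [],
    "mappings": "",
})


def embedded_sources(source_map_text):
    """Return {source path: original source text} for entries that embed content."""
    data = json.loads(source_map_text)
    paths = data.get("sources", [])
    contents = data.get("sourcesContent") or []
    # zip() stops at the shorter list, so a map without sourcesContent yields {}.
    return {path: text for path, text in zip(paths, contents) if text is not None}


recovered = embedded_sources(raw_map)
for path, text in recovered.items():
    print(path)
    print(text)
```

The mappings string is left empty here because recovering the source does not need it: when sourcesContent is present, the original files are simply sitting in the JSON as strings.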

For an application security engineer, these source maps are invaluable for quickly understanding a single-page application that you may be black boxing. While you clearly have the compressed, minified, prettify-able application JavaScript bundle you can use to reverse-engineer the client, the source map contains the source used to generate that bundle, effectively turning your black box into a gray box without a bunch of thinking.

Source maps are information disclosure

If we agree to the concept that the Webpacked bundle is the intermediate representation of the application code, we can make the analogy that the bundle is the closest thing to a binary that we commonly ship to a browser. Just as we can decompile a binary and read the assembly code in IDA Pro or another disassembler, we can do the same to the application bundle; it just requires more skill and time to figure out what is going on in the packed/transpiled variant than it would if we started with the source code in the first place. Holding these assumptions true, source maps for transpiled SPAs can logically be treated as a source code leak (CWE-540: Information Exposure Through Source Code).

For an application security engineer, source maps contain:

  • Information about the developer’s source code paths. The sources field of the source map from Webpack contains the relative paths used in the build process. Knowing where the source code lies gives us a better idea of which paths to request when we can exfiltrate files some other way, such as through a local file inclusion vulnerability.
  • Developer comments and doc strings. Commented-out functions, developer names, email addresses, documentation strings, and sometimes even expected API server responses exist in the uncompressed source code. The default Webpack minification process strips these out of the bundle, but the source map preserves them.
  • Your web application’s raw front end source. This code contains all original function names, variable names, and strings, in their original filenames and languages. This makes the code significantly easier to read and turns reverse engineering into an easy code review, where the developers are guiding you through the process of understanding the problems in their codebase. Go ahead and easily dig around for template injection or dangerouslySetInnerHTML and have a better sense of what’s going on. The source map is effectively giving away, in documented and organized form, the entire front-end half of your application to everyone who visits your production website, trivially forked and hacked on by competitors or anyone in jurisdictions where intellectual property protection is lax.
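As a rough illustration of how quickly the items in this list can be triaged once the sources are recovered, the sketch below greps a dictionary of path-to-source text for a few interesting patterns. The pattern set, the triage function, and the sample file are all hypothetical and would need tuning for a real engagement.

```python
import re

# Hypothetical starting patterns; adjust these per target.
PATTERNS = {
    "dangerous_html": re.compile(r"dangerouslySetInnerHTML"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "todo_comment": re.compile(r"//\s*(?:TODO|FIXME|HACK)\b.*"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),
}


def triage(sources):
    """Yield (path, line number, label, matched text) for every pattern hit."""
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                match = pattern.search(line)
                if match:
                    yield path, lineno, label, match.group(0)


# Invented sample standing in for a file recovered from a source map.
sample = {
    "src/Profile.tsx": (
        "// TODO: ask jane@example.com about sanitizing bio\n"
        "export const Bio = (p: { html: string }) =>\n"
        "  <div dangerouslySetInnerHTML={{ __html: p.html }} />;\n"
    ),
}

hits = list(triage(sample))
for path, lineno, label, text in hits:
    print(f"{path}:{lineno} [{label}] {text}")
```

None of these hits would be visible in the minified bundle in this form: the comment and the email address are dropped by minification, and the file name only exists in the source map.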

Note that some popular web developers such as Chris Coyier did not seem to care about this information disclosure. After all, what’s on the client is on the client anyway in minified/compressed form, so why does it matter? As stated in his article:

The benefits [of source maps in production] boil down to these two things:

1. It might help you track down bugs in production more easily
2. It helps other people learn from your website more easily

The third point omitted above is that production source maps also help malicious users track down those same bugs in production. Furthermore, it is tough to explain to the non-technical why publishing them would be acceptable. Your company’s executives, middle managers, and legal teams would want to know why the “proprietary and confidential” TypeScript/React source code is sitting somewhere on a public web server, comments and all. I do not know of many corporate clients that I could convince of this being a good idea, for a host of reasons.

Remember that application security is about raising the cost for an adversary, and mitigating business risk therein. Publishing production source maps to the public solves neither problem; instead, it exacerbates both. All of the information listed above is useful on a security assessment and is thus useful to penetration testers outside of your organization as well.

If only there was an easy way to get this into a manageable form…

Enter unwebpack_sourcemap.py

After having run into these source maps a few times, both on bug bounties and actual engagements, I wrote a tool that allows me to quickly grab them from Webpack-generated bundles and sources. Alongside this post I have open-sourced the tool under the MIT license for other appsec engineers to use on their own assessments. It is now available on GitHub.

When using this tool, I personally fetch the source maps by looking at JavaScript files included in the browser and extracting the sourceMappingURL that exists at the bottom of the bundled JavaScript files. However, to make the tool easier to use, I have recently added a feature to point it at a (known) web application URI and have it try to auto-detect the source maps from JavaScript files that exist on the page, then write the original files out to a directory structure that closely resembles one the developer would see when writing the application. While I have taken some precautions, you are acting on someone else’s input that contains directory paths, so use this feature at your own risk if you are not sure of what the source maps for your target application contain.
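The manual step described above, pulling the sourceMappingURL out of the bottom of a bundle, can be sketched in a few lines of Python. This is an illustration only and not necessarily how unwebpack_sourcemap.py implements it; the function name, the sample bundle, and the localhost URL are made up for the example.

```python
import re
from urllib.parse import urljoin

# Matches a "//# sourceMappingURL=..." comment on its own line.
# Older tooling used "//@" instead of "//#", so both are accepted.
SOURCE_MAPPING_RE = re.compile(
    r"^//[#@]\s*sourceMappingURL=(\S+)\s*$", re.MULTILINE
)


def find_source_map_url(bundle_url, bundle_text):
    """Return the absolute source map URL referenced by a bundle, or None."""
    matches = SOURCE_MAPPING_RE.findall(bundle_text)
    if not matches:
        return None
    # Assumption: take the last match, since the comment sits at the end of the file.
    reference = matches[-1]
    if reference.startswith("data:"):
        # Inline source map; the payload would need to be decoded separately.
        return reference
    # Relative references are resolved against the bundle's own URL.
    return urljoin(bundle_url, reference)


# Invented bundle contents for demonstration.
bundle = "!function(e){/* minified code */}();\n//# sourceMappingURL=bundle.js.map\n"
map_url = find_source_map_url("http://localhost:3000/static/bundle.js", bundle)
print(map_url)
```

Resolving the reference against the bundle URL matters because the comment usually holds only a file name, so the map normally lives next to the bundle rather than at the site root.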

Un-Webpacking a victim application

In order to understand the impact of source maps, an example TypeScript+React application is shipped alongside the script at example-react-ts-app/. For this application, I simply took a random TypeScript+Webpack+React boilerplate from GitHub and made some quick changes to demonstrate the potential impact.

To run the application, run yarn install inside the directory and then npm run start-prod, which serves the application on port 3000. This is what our very ugly application looks like in Chromium on Linux:

Now we can open the Chromium developer tools and see that we have a bundle JavaScript file that is being read. This is the core of our TypeScript/React application, which has been transpiled and minified by the Webpack process.

If we look at this in the browser, there is not much to see. Many names have been mangled by the process and we have lost much sense of the SPA’s context:

Usually, security engineers and bug hunters will use a tool such as js-beautify to get a better sense of what all these things are and what is going on. However, why not just use the source map?

Now, after running the tool against the application, the output directory holds a file structure similar to what the developers would see. We can open it in Visual Studio Code and recover the actual TSX file for the application from the source map.
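Rebuilding that directory structure means turning attacker-influenced path strings from the source map into files on your own disk, so the mapping deserves some care. Below is a minimal sketch of one way to do it, assuming Webpack-style names such as webpack:///./src/... and POSIX-style separators. The safe_output_path helper is hypothetical and is not taken from the tool.

```python
import posixpath
import re
from pathlib import Path


def safe_output_path(output_dir, source_name):
    """Map a source map 'sources' entry to a path that stays inside output_dir."""
    # Drop a "webpack:///"-style scheme prefix and any trailing query string.
    name = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://+", "", source_name)
    name = name.split("?", 1)[0]
    # Collapse "." and ".." segments, then discard any that still point upward.
    parts = [
        part
        for part in posixpath.normpath(name).split("/")
        if part not in ("", ".", "..")
    ]
    if not parts:
        raise ValueError(f"unusable source path: {source_name!r}")
    base = Path(output_dir).resolve()
    target = base.joinpath(*parts).resolve()
    # Final guard: refuse anything that resolved outside the output directory.
    if base not in target.parents:
        raise ValueError(f"path escapes output directory: {source_name!r}")
    return target


base_dir = Path("output").resolve()
normal = safe_output_path("output", "webpack:///./src/lib/LibraryCode.ts")
hostile = safe_output_path("output", "webpack:///../../../etc/passwd")
print(normal.relative_to(base_dir))
print(hostile.relative_to(base_dir))
```

The design choice here is to strip upward traversal rather than reject it outright, so a hostile entry still lands somewhere inside the output directory where you can inspect it.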

Also, note that we have a reference to fakeLibrary, something in lib/LibraryCode.ts. This is easy for us to find as we have a semi-working directory structure reversed from the source map content. What’s in there?

We have the code as it exists to the developers, with the maintainers, comments and all. This makes it more obvious that in this example, the HEADER constant is used as a hard-coded bearer token. Because we have comments, we also have dead code:

It is important to note that we are not looking at JavaScript anymore. This is not the intermediate representation of the SPA. We are looking directly at the front-end source code as it would exist to a front-end developer. This is an actual TypeScript file, with the type declarations and everything.

This means your black-box test is actually a little closer to a gray-box test if you can find a source map. You now are extremely close to the representation of the SPA you would get in a code drop from a client or development team.

Remediation

A good security post isn’t complete without some remediation advice, and if you have better advice, please share it in the comments below. While Webpack’s documentation talks about some best practices for production source maps, there are a couple of ways of dealing with this risk:

  • Don’t turn on source maps in production. This may frustrate debugging but the fastest way to mitigate the risk is to eliminate it.
  • Put source maps on a private server, accessible internally to your dev team. Source maps do not need to be published alongside the code; they can be put on a different server that is appropriately inaccessible to others.
  • ACL the sourcemaps in your existing server configuration. I don’t really like this idea, but some people suggested it. If you don’t have to publish raw source to a production server, don’t do it. Web server misconfiguration has historically been a problem for server-side source code leaks.
  • Use a more restrictive source map. This mitigates some source leakage but not the entirety of the disclosure. Rails’s webpacker moved to nosources-source-map as the production default.
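Whichever option you pick, it helps to verify the result rather than trust the build configuration. The sketch below is a hypothetical pre-deploy check in Python that walks a build directory and flags bundles that reference a source map, and maps that still embed original source. The audit_build_dir name and the demo files are invented; it relies on nosources-source-map producing maps without a sourcesContent field.

```python
import json
import tempfile
from pathlib import Path


def audit_build_dir(build_dir):
    """Return (file name, problem) findings for a directory about to be deployed."""
    findings = []
    for path in sorted(Path(build_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".map":
            try:
                data = json.loads(path.read_text(encoding="utf-8"))
            except ValueError:
                continue  # not valid JSON, so not a source map we can inspect
            if isinstance(data, dict) and data.get("sourcesContent"):
                findings.append((path.name, "source map embeds original source"))
            else:
                findings.append((path.name, "source map present, no embedded source"))
        elif path.suffix == ".js":
            text = path.read_text(encoding="utf-8", errors="replace")
            if "sourceMappingURL=" in text:
                findings.append((path.name, "bundle references a source map"))
    return findings


# Demo against a throwaway directory with invented build artifacts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bundle.js").write_text(
        "console.log(1);\n//# sourceMappingURL=bundle.js.map\n", encoding="utf-8"
    )
    (root / "bundle.js.map").write_text(
        json.dumps({"version": 3, "sources": ["a.ts"],
                    "sourcesContent": ["const a = 1;"], "mappings": ""}),
        encoding="utf-8",
    )
    (root / "vendor.js.map").write_text(
        json.dumps({"version": 3, "sources": ["v.ts"], "mappings": ""}),
        encoding="utf-8",
    )
    findings = audit_build_dir(root)

for name, problem in findings:
    print(f"{name}: {problem}")
```

A check like this could run in CI and fail the build on any "embeds original source" finding, which catches the case where someone re-enables full source maps without realizing what ships.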

If you are a web developer, hopefully you will take some of this under advisement and restrict your source maps in production. Note that no matter what you do, client code is client code, and a determined adversary is going to reverse it. What you do by restricting source maps is a) raise the effort level required to dig through your SPA, and b) avoid unintentional leakage of developer names, usernames, comments, and keys that you may think aren’t making it into your “compiled” application.

If you are an application security engineer, happy bug hunting. If you are using this script on any of your engagements, please consider a pull request to contribute to making the script better. While it is simple, it has been useful for cutting out a lot of the boring JavaScript reversing and getting to the fun part of many security reviews.

If you would like to offer better constructive advice for this article, please leave a comment. I am always happy to see feedback from others that can help better secure web development practices.