Yet another explanation of sourcemaps

May 20, 2018 · 8 min read

While writing this article, I feel like I’m about to explain how to translate from Chinese back to English even though I don’t understand much Chinese. You know what I mean: it’s all about a magic dictionary, the sourcemap, that helps us trace back to the language we understood better in the first place.

This tool is so magical that most developers use it without ever looking at what’s inside. It’s like a dictionary: we don’t know how it was put together, but it just works when we use it.

In this tutorial, we’ll explore in more detail how it works from the inside out. I’ve read other articles on the subject, and especially the official specification, but I found that they weren’t clear enough about the internal mechanism. So I’ll try to explain sourcemaps in my own words in this article.

As usual, the following examples are written in JavaScript, and we’ll use Babel as the transpilation tool.

For the reason mentioned above, most of us don’t need a deep understanding of sourcemaps to use them. Feel free to skip this rather long article if you’re not interested in computing history and complex mathematical concepts.

Before diving deep into how sourcemaps work, let’s go back to two mathematical concepts: Base64 encoding and VLQ (Variable Length Quantity). I think we need to take the time to fully understand what they are and how they work before digging into sourcemaps.

1. Base64 Encoding

A little bit of history

Base64 encoding, or simply Base64, is a way to encode binary data as American Standard Code for Information Interchange (ASCII) text, and to decode that text back to binary data.

What? Can you repeat that please?

Let’s go back to computing history. We all know that our computers process data in binary form (series of 0s and 1s), so all data transferred between computers has to be encoded into that binary form before being sent and decoded when received. ASCII is a standard that makes all computers agree on the same encoding and decoding method for text, so that they can understand the data they exchange.

ASCII was originally conceived as a 7-bit code (8-bit extensions came later). You can think of the famous ASCII table as a sort of dictionary between ASCII codes and American English characters. Every value from 0000000 to 1111111 corresponds to a control character or a printable character, and together they form the so-called ASCII character set.

But sending data over the wire doesn’t mean it can be streamed in raw binary form, because traditional network media were made for transmitting textual data, more specifically ASCII-compliant text. This caused two main problems at the time:

  • If people sent emails in a language other than English, which might contain characters that are not part of the ASCII set, these characters would not be encoded or decoded correctly. The same problem arose when attaching a file to an email: how should this file be encoded?
  • Some systems interpret special characters differently (for example the line-ending character, code 10).

Base64 was originally invented to solve these problems. It takes a stream of bytes and converts it to Base64 characters, which are a subset of the ASCII character set, so that the data is transported correctly over the wire by the email protocols. So if we need to attach a file (stored in binary form on the computer) to an email, we can convert it to Base64 first and send the email without problems.

How Base64 works

Let’s say we want to send Bonjour as the content of an email. It will be encoded like this:

Encode text to Base64

More details on how to convert a text to Base64:

How Base64 encoding works
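The conversion can also be sketched in a few lines of JavaScript. This is only an illustration that handles plain ASCII input; the names BASE64_ALPHABET and encodeBase64 are made up for this article and are not part of any library.

```javascript
// The 64 characters of the standard Base64 alphabet, indexed from 0 to 63.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode an ASCII-only string to Base64 by hand.
function encodeBase64(text) {
  // 1. Write every character as its 8-bit binary code.
  let bits = '';
  for (const char of text) {
    bits += char.charCodeAt(0).toString(2).padStart(8, '0');
  }

  // 2. Pad with zeros so the bit string splits evenly into 6-bit groups.
  while (bits.length % 6 !== 0) {
    bits += '0';
  }

  // 3. Map every 6-bit group to one character of the alphabet.
  let output = '';
  for (let i = 0; i < bits.length; i += 6) {
    output += BASE64_ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // 4. Pad with "=" so the output length is a multiple of 4.
  while (output.length % 4 !== 0) {
    output += '=';
  }
  return output;
}

console.log(encodeBase64('Bonjour')); // Qm9uam91cg==
```

In real code you would rather rely on what the platform already offers (for example Buffer in Node.js) than on a hand-written encoder like this one.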

2. VLQ (Variable Length Quantity)

From Wikipedia:

A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. It was defined for use in the standard MIDI file format[1] to save additional space…

This definition can be confusing at first. Let’s make it simpler. Given an integer of any size, we can represent it with blocks of X bits. The idea is to break the binary representation of the integer into a sequence of smaller X-bit blocks.

There are many variants of VLQ encoding. In this article we’ll consider the following one:

  • A block consists of 6 bits.
  • In each of these blocks, we’ll have one reserved bit at the beginning (called the continuation bit) to indicate whether another block follows. This bit is 0 if the block is the last one, 1 if it’s not.
  • In the first block, we’ll have another reserved bit at the end to indicate the sign of the integer. This bit is 0 if the integer is positive, 1 if it’s negative. So the first block carries just 4 bits of the value and every other block carries 5 bits.
  • We build the blocks one by one, starting from the least significant bits of the integer.

For example, let’s say we want to represent 188 in 6-bit blocks in VLQ format. 188 is 10111100 in binary, which cannot fit in a single 6-bit block. So it has to be represented with at least two 6-bit blocks.

Let’s take the four least significant bits, 1100, to build the first block. The first block contains these four bits, preceded by 1 as the continuation bit (because we know another block will follow to hold the remaining bits) and followed by 0 as the sign bit (because 188 is a positive integer). So the first block is 111000.

To make a second 6-bit block from the four remaining bits, 1011, we simply prepend 0 as a padding bit to get 5 bits, then prepend another 0 as the continuation bit to indicate that this is the last block, since there is nothing left to represent. So the second block is 001011.

So 188 becomes 111000 001011 when encoded to VLQ with 6 bits per block.

188 in Base64 VLQ format

Of course, we could repeat this exercise with blocks of any size.

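VLQ with 6 bits per block is called Base64 VLQ encoding because each 6-bit block corresponds to a Base64 character. In our example, 111000 (56) is the character 4 and 001011 (11) is the character L, so 188 is written 4L.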
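The steps above can be sketched in JavaScript as follows. This is a minimal illustration of the variant described in this section, not the code of any particular library; encodeVlq is a made-up name, and the sketch only handles integers that fit comfortably in 32 bits.

```javascript
// The 64 characters of the standard Base64 alphabet, indexed from 0 to 63.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode one signed integer to Base64 VLQ.
function encodeVlq(value) {
  // Put the sign in the least significant bit: 188 becomes 376 (101111000).
  let remaining = value < 0 ? (-value << 1) | 1 : value << 1;
  let output = '';

  do {
    // Take the 5 least significant bits as the payload of this block.
    let block = remaining & 0b11111;
    remaining >>>= 5;

    // Set the continuation bit when more bits are left to encode.
    if (remaining > 0) {
      block |= 0b100000;
    }
    output += BASE64_ALPHABET[block];
  } while (remaining > 0);

  return output;
}

console.log(encodeVlq(188)); // 4L
console.log(encodeVlq(-1)); // D
```

Shifting the value left by one bit first is just a compact way to reserve the sign bit of the first block, which then leaves that block with 4 bits of the actual value, as described above.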

3. How sourcemap works

Let’s say we have a simple ES2015 function

Simple ES2015 arrow function in script.js

We can use Babel (with babel-preset-es2015) to transpile this ES2015 code to JavaScript that can be executed in all browsers:

npx babel script.js --out-file script-transpiled.js --source-maps --presets=es2015

In the output we’ll have

  • Output file:
  • Sourcemap file:

We all know that this sourcemap file is important for tracing back to the original file, which is useful when debugging.

According to the Sourcemap Proposal, mappings is the information that connects the generated file script-transpiled.js to the original source script.js. It can be split into groups by ;, and each group corresponds to a line in the generated file.

So we can lay out this mapping information next to the generated file in a way that is easier to read:

Each line in generated file

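Each of these groups can be split into segments by ,. Each segment describes one mapping location between the generated file and the original file. A segment usually holds four fields: the column in the generated file, the index of the original file in the sources list, the line in the original file and the column in the original file. An optional fifth field is an index into the names list.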
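As a small sketch, the two levels of splitting look like this in JavaScript. The mappings string below is made up for illustration and is not the one produced by the Babel command above.

```javascript
// Hypothetical "mappings" value, invented for this illustration.
const mappings = 'AAAA,IAAM,GAAG;;AACA';

// One group per line of the generated file.
const groups = mappings.split(';');

// One list of segments per group; an empty group means the generated
// line has no mapping at all.
const segmentsPerLine = groups.map((group) =>
  group === '' ? [] : group.split(',')
);

segmentsPerLine.forEach((segments, index) => {
  console.log(`generated line ${index + 1}:`, segments);
});
```

With this made-up value, the first generated line has three segments, the second has none and the third has one.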

The secret is that every character of a segment is a Base64 VLQ character. If we reverse the process of encoding an integer to Base64 VLQ described above, each segment turns into a few integers that tell us how a position in the generated file maps to a position in the original file, like the following:

1st segment of 1st group

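Notice that the numerical values decoded from a segment are relative to the ones obtained from the previous segment. Once decoded, each value must be added to the corresponding value of the previous segment to get the actual position. The only exception is the column in the generated file, which starts again from 0 at the beginning of every group.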

2nd segment of 1st group
3rd segment of 1st group
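The manual decoding above can be sketched in JavaScript. This is only an illustration: decodeSegment and resolveGroup are made-up names, the group 'AAAA,IAAM,GAAG' is an invented example rather than the output of the Babel command above, and the sketch resolves a single group only.

```javascript
// The 64 characters of the standard Base64 alphabet, indexed from 0 to 63.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: decode one segment into its list of signed integers.
function decodeSegment(segment) {
  const values = [];
  let value = 0;
  let shift = 0;

  for (const char of segment) {
    const block = BASE64_ALPHABET.indexOf(char);
    // The low 5 bits carry data; the least significant block comes first.
    value += (block & 0b11111) << shift;

    if (block & 0b100000) {
      // Continuation bit set: the next character belongs to the same number.
      shift += 5;
    } else {
      // Last block: the lowest bit of the assembled value is the sign.
      const magnitude = value >>> 1;
      values.push(value & 1 ? -magnitude : magnitude);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// Hypothetical helper: turn the relative segments of one group (one
// generated line) into absolute positions by accumulating the deltas.
function resolveGroup(group) {
  // Five running totals; the optional fifth one (name index) is not returned.
  const position = [0, 0, 0, 0, 0];

  return group.split(',').map((segment) => {
    decodeSegment(segment).forEach((delta, i) => {
      position[i] += delta;
    });
    const [generatedColumn, sourceIndex, originalLine, originalColumn] = position;
    return { generatedColumn, sourceIndex, originalLine, originalColumn };
  });
}

console.log(decodeSegment('IAAM')); // [ 4, 0, 0, 6 ]
console.log(resolveGroup('AAAA,IAAM,GAAG'));
```

For this invented group, the three segments resolve to generated columns 0, 4 and 7 and original columns 0, 6 and 9. Lines and columns are zero-based here, so an editor would show them shifted by one.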

The vlq library lets us decode segments more quickly:

Using vlq library

From the results above, we can map positions in the generated file to positions in the original one:

Mapping information demystified

So a question might arise: why use VLQ to encode the mapping information? The more sophisticated parts of the algorithm described in the 2nd section, such as values spread over several blocks, aren’t even used in this example. The answer is that we usually transpile our JavaScript code and minify it before going to production. The minified content is often just one giant line of text, and positions in it can become very large numbers. VLQ keeps small values as short as a single character while still being able to represent these large numbers with only a few more characters.
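To get a feel for this, here is a rough sketch that counts how many Base64 characters a value needs under the 6-bit scheme from the 2nd section (4 value bits in the first block, 5 in each following block). The helper name base64VlqLength is made up for this article.

```javascript
// Hypothetical helper: number of Base64 VLQ characters needed for a value.
function base64VlqLength(value) {
  // Number of bits in the magnitude (0 is written "0", so it counts as 1 bit).
  const magnitudeBits = Math.abs(value).toString(2).length;

  // The first block holds 4 bits of the value (plus sign and continuation).
  if (magnitudeBits <= 4) {
    return 1;
  }
  // Every extra block holds 5 more bits.
  return 1 + Math.ceil((magnitudeBits - 4) / 5);
}

for (const value of [3, 15, 16, 188, 100000]) {
  console.log(value, '->', base64VlqLength(value), 'character(s)');
}
```

Values up to 15 fit in one character, 188 takes two, and even 100000 needs only four. Since most values in a sourcemap are small offsets relative to the previous segment, most of them end up being one or two characters long.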


Written by

Passionate Software Engineer, Docker
