Writing this article, I feel like I’m about to explain how to translate from Chinese back to English even though I don’t understand much Chinese. You know what I mean: it’s all about a magic dictionary, the sourcemap, that helps us trace back to the language we understand better.
This tool is so magical that few developers look at what’s inside, yet they are all able to use it. It’s like a dictionary: we don’t know how it works, but it… just works when we use it.
In this tutorial, we’ll explore in more detail how it works from the inside out. I’ve read other articles on the subject, and especially the official specification, but I found that they weren’t clear enough about its internal mechanism. So I will try to explain the sourcemap in my own words.
For the reason mentioned above, some of us don’t really need a deep understanding of the sourcemap to use it. Feel free to skip this rather long article if you’re not interested in computing history and complex mathematical concepts.
Before diving deep into how the sourcemap works, let’s get back to some mathematical concepts: Base64 encoding and VLQ (Variable Length Quantity). I think we need to take the time to fully understand what they are and how they work before digging into the sourcemap.
1. Base64 Encoding
A little bit of history
Base64 encoding, or simply Base64, is a way to encode binary data into an American Standard Code for Information Interchange (ASCII) text format, and to decode it back.
What? Can you repeat that please?
Let’s get back to computing history. We all know that our computers process data in binary form (series of 0s and 1s), so all data transferred between computers has to be encoded into that binary form before being sent and decoded back when received. ASCII is a standard that makes all computers agree on the same encoding and decoding method, allowing them to understand the transmitted data.
For some reason, ASCII was originally conceived as a 7-bit code (though it was later extended to 8 bits). You can refer to the famous ASCII table as a sort of dictionary between ASCII codes and American English characters. Every value from 0000000 to 1111111 corresponds to a control character or a printable character, grouped in the so-called ASCII character set.
But sending data over the wire doesn’t mean that it has to be streamed in that raw binary form, because traditional network media were made for transmitting textual data, more specifically ASCII-compliant text. There were two main problems with ASCII at that time:
- If people tried to send emails in a language other than English, which might contain characters outside the ASCII set, those characters would not be encoded or decoded correctly. The same problem arose when attaching a file to an email: how would that file be encoded?
- Some systems interpret special characters differently (for example, the line-ending character).
Base64 was originally invented to solve these problems. It takes a stream of binary data and converts it to Base64 characters, which are a subset of the ASCII character set, so that the data is transported correctly over the wire using the email protocols. So if we need to attach a file (uploaded in binary form by the computer) to an email, we can convert it to Base64 before sending the email without any problem.
How Base64 works
Let’s say we want to send Bonjour as the content of an email. It will be interpreted like this:
More details on how to convert a text to Base64:
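To make the conversion concrete, here is a minimal sketch of a hand-written Base64 encoder. It is not how you would do it in production (runtimes ship built-in encoders); the helper name toBase64 is made up for this illustration, and the sketch assumes the input contains ASCII characters only.

```javascript
// The 64 characters of the standard Base64 alphabet.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encodes an ASCII-only string to Base64 by hand.
function toBase64(text) {
  // 1. Turn each character into its 8-bit binary code.
  let bits = '';
  for (const char of text) {
    bits += char.charCodeAt(0).toString(2).padStart(8, '0');
  }

  // 2. Pad with zeros so the bit string splits evenly into 6-bit groups.
  while (bits.length % 6 !== 0) {
    bits += '0';
  }

  // 3. Map each 6-bit group to one character of the alphabet.
  let encoded = '';
  for (let i = 0; i < bits.length; i += 6) {
    encoded += BASE64_ALPHABET[parseInt(bits.slice(i, i + 6), 2)];
  }

  // 4. Pad the output with "=" up to a multiple of 4 characters.
  while (encoded.length % 4 !== 0) {
    encoded += '=';
  }
  return encoded;
}

console.log(toBase64('Bonjour')); // Qm9uam91cg==
```

The 7 characters of Bonjour give 56 bits, which are padded to 60 bits (ten 6-bit groups), and the ten resulting characters are padded with == to reach twelve.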
2. VLQ (Variable Length Quantity)
A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an arbitrarily large integer. It was defined for use in the standard MIDI file format to save additional space…
It can be confusing at first to figure out what VLQ is from this definition. Let’s make it simpler. Given an integer of any size, we can use blocks of X bits to represent it. The idea is to break the binary representation of the integer into a group of smaller blocks of X bits.
There are many variants of VLQ encoding. In this article we’ll consider the following implementation:
- A block consists of 6 bits.
- Each block has one reserved bit at the beginning (called the continuation bit) that indicates whether another block follows. This bit is 0 if the block is the last one and 1 if it is not.
- The first block has another reserved bit at the end that indicates the sign of the integer. This bit is 0 if the integer is positive and 1 if it is negative. So the first block carries just 4 bits of the value, and each of the other blocks carries 5.
- We build the encoding block by block, starting with the least significant bits.
For example, let’s say we want to represent 188 in 6-bit blocks in VLQ format. 188 is 10111100 in binary, which cannot fit in a 6-bit block, so it has to be represented in at least two 6-bit blocks.
Let’s take the four least significant bits, 1100, to build the first block. The first block contains these four bits, preceded by 1 as the continuation bit (because we know another block follows to hold the remaining bits) and followed by 0 as the sign bit (because 188 is a positive integer). So the first block is 111000.
To make another 6-bit block from the four remaining bits, 1011, we simply prepend 0 as a padding bit to get a block of length 5, then prepend another 0 as the continuation bit to indicate that this is the last block, because there are no more bits to represent.
So 188 is encoded as 111000 001011 in VLQ with 6 bits per block.
Of course, we could repeat this exercise with blocks of any size.
VLQ with 6 bits per block is called Base64 VLQ encoding because each 6-bit block corresponds to a Base64 character.
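The walkthrough for 188 can be sketched in a few lines of code. This is only an illustration of the rules listed above, not a reference implementation: the helper names toVlqBlocks and toBase64Vlq are made up here, and the sketch relies on bitwise operators, so it assumes integers that fit in 32 bits.

```javascript
// The 64 characters of the standard Base64 alphabet.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: splits an integer into 6-bit VLQ blocks,
// returned as binary strings, least significant bits first.
function toVlqBlocks(value) {
  const signBit = value < 0 ? 1 : 0;
  let remaining = Math.abs(value);

  // First block payload: 4 bits of the value followed by the sign bit.
  let payload = ((remaining & 0b1111) << 1) | signBit;
  remaining >>>= 4;

  const blocks = [];
  for (;;) {
    // The continuation bit is 1 while there are still bits left to encode.
    const continuationBit = remaining > 0 ? 1 : 0;
    const block = (continuationBit << 5) | payload;
    blocks.push(block.toString(2).padStart(6, '0'));
    if (continuationBit === 0) break;

    // Every following block carries 5 bits of the value.
    payload = remaining & 0b11111;
    remaining >>>= 5;
  }
  return blocks;
}

// Hypothetical helper: maps each 6-bit block to its Base64 character.
function toBase64Vlq(value) {
  return toVlqBlocks(value)
    .map((block) => BASE64_ALPHABET[parseInt(block, 2)])
    .join('');
}

console.log(toVlqBlocks(188).join(' ')); // 111000 001011
console.log(toBase64Vlq(188)); // 4L
```

The block 111000 is 56, which is the character 4 in the Base64 alphabet, and 001011 is 11, which is L. So 188 becomes the two characters 4L.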
3. How sourcemap works
Let’s say we have a simple ES2015 function in script.js, and we transpile it with Babel:
npx babel script.js --out-file script-transpiled.js --source-maps --presets=es2015
In the output we’ll have
- Output file:
- Sourcemap file:
We all know that this sourcemap file is important for tracing back to the original file, which is useful when debugging.
According to the Sourcemap Proposal, mappings is the information that allows us to connect the generated file script-transpiled.js to the original source script.js. It can be split into groups separated by semicolons (;), and each group corresponds to a line in the generated file.
So, we can represent this mapping information in the generated file in a more easy-to-understand manner:
Each of these groups can be split into segments separated by commas (,), and each segment contains mapping information between a location in the generated file and a location in the original file.
The secret is that if we treat every character of each segment as a Base64 VLQ encoded character, we end up with some interesting mapping information between positions in the generated file and the original file. We just need to reverse the process of encoding an integer to Base64 VLQ described above, like the following:
Notice that all the numerical values decoded from the segment are relative to the ones obtained from the previous segment. Once decoded, we must take into account the previous corresponding values.
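Here is a rough sketch of that decoding, done by hand. The mappings string used at the end is a made-up example, not the one produced by the Babel command above, and the helper names decodeSegment and decodeMappings are hypothetical. The sketch assumes each segment holds at least four fields (generated column, source index, original line, original column) and ignores the optional fifth field, the index into names.

```javascript
// The 64 characters of the standard Base64 alphabet.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: decodes one segment (e.g. "IAAI") into signed integers.
function decodeSegment(segment) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const block = BASE64_ALPHABET.indexOf(char);
    if (block === -1) throw new Error(`Invalid Base64 character: ${char}`);

    // The 5 low bits of each block carry data, least significant first.
    value += (block & 0b11111) << shift;
    shift += 5;

    // Continuation bit cleared: this was the last block of the integer.
    if ((block & 0b100000) === 0) {
      const isNegative = (value & 1) === 1; // lowest bit is the sign
      const magnitude = value >>> 1;
      values.push(isNegative ? -magnitude : magnitude);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// Hypothetical helper: turns a mappings string into absolute positions.
function decodeMappings(mappings) {
  // These three values carry over from one generated line to the next.
  let sourceIndex = 0;
  let originalLine = 0;
  let originalColumn = 0;

  return mappings.split(';').map((line) => {
    let generatedColumn = 0; // restarts at 0 on every generated line
    if (line === '') return [];
    return line.split(',').map((segment) => {
      const [columnDelta, sourceDelta, lineDelta, originalColumnDelta] =
        decodeSegment(segment);
      // Each decoded value is added to the previous corresponding value.
      generatedColumn += columnDelta;
      sourceIndex += sourceDelta;
      originalLine += lineDelta;
      originalColumn += originalColumnDelta;
      return { generatedColumn, sourceIndex, originalLine, originalColumn };
    });
  });
}

console.log(JSON.stringify(decodeSegment('IAAI'))); // [4,0,0,4]

// Made-up mappings string covering two generated lines.
const lines = decodeMappings('AAAA;AAAA,IAAI,GAAG');
console.log(JSON.stringify(lines[1][2]));
// {"generatedColumn":7,"sourceIndex":0,"originalLine":0,"originalColumn":7}
```

In the last segment, GAAG decodes to 3, 0, 0, 3, but the absolute generated column is 7 because it is added to the 4 obtained from the previous segment.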
The library vlq helps us decode more quickly:
From the first results above, we can derive some position mappings from the generated file to the original one:
I hope that this long adventure has given you a clear idea of our magical dictionary, the sourcemap. Below I list some excellent resources on sourcemaps, without which I could not have written this article. Thanks for your attention, and please let me know if any points aren’t clear enough.