(Some of) the math behind Bech32 addresses
By now, native SegWit, bech32, or BIP-173 addresses have increased significantly in adoption. I’m talking about addresses like bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4. Articles such as the one by bitcoin op-tech have explained excellently how these addresses are used in practice to construct bitcoin transactions. But other than the introduction given by Pieter himself on YouTube, I could find very little layman-accessible explanation for what is actually going on during the encoding. There are some excellent comments in the code itself, but these still require a fair amount of prior knowledge to understand — code isn’t really the place for the full theory to be written out.
There exists a nice step-by-step explanation for how “legacy” addresses are constructed from a public key on the bitcoin wiki. But because Bech32 is designed from some cool (but potentially confusing) mathematics rather than just chucking a hash on the end, this article will hopefully make some of the underlying technical details a bit more transparent for all those interested. A lot of these details are from the BIP, Pieter’s talk, and the code itself, but by bringing them together here, it may help make things a little more accessible. Most of this article will be about how things work, not why exactly they are the way that they are, because its already getting quite long, and I myself am no expect on coding theory.
Apologies in advance for any mistakes or typos. Thanks to Murch and Pieter for helpful feedback. And thanks as always to John Pfeffer for his support of my work. Enjoy :)
Mathematical background
We can’t start at the complete basics here, but hopefully with at least an idea of some of the terms used in this article, even those without formal math education will be able to follow. You can skip this section if you’re comfortable with the concepts.
Finite fields
The first important idea is that of finite fields. Put simply, a field in math is just a set (of e.g. real numbers like 1.5 and π, or complex numbers like a+b*i), where we can do addition, subtraction, multiplication and division (except by 0), and everything kinda works how we would expect. So we have things like x+0 = x and x*1 = x, and if a*b = 0 then either a=0 or b=0. So far so good? As well as the real and complex numbers which are both infinite, though, we have finite fields — fields with a finite number of elements in them. It is a mathematical fact that for every prime p, there is a unique finite field with exactly p elements in it. We can write the p elements as the integers from 0 up to p-1, and remember that things loop around, so p is equivalent to 0 and p+1≡1 (we will use the special equals sign to show they’re equivalent, because otherwise things can get confusing). Now if we have a prime p and raise it to the power of a positive integer, e.g. pᵐ, then there is also always a unique finite field with that many elements. But the little difference is that we can’t just use the numbers from 0 to pᵐ-1 to represent these elements, its a little bit more like vectors of m elements from 0 up to p-1 again. The details don’t matter too much.
Polynomials
A (single variable) polynomial is a function which looks something like this: f(x) = 5x³ + 8x + 1. It is made up of terms which are all positive integer powers of one variable x (including x⁰=1), and each power of x has a coefficient. In this article we will be using polynomials where all the coefficients are elements in a finite field. We can abbreviate a polynomial using sum notation like this:
So all the aₖ’s are elements in the finite field we will be using. The biggest k for which the xᵏ term is not 0 is known as the degree of the polynomial.
Now polynomials are actually more useful than just for evaluating. It might seem weird, but we don’t actually ever bother with evaluating the polynomials in this article. As you’ll see, all we care about is the polynomials themselves, and whether they are multiples of other polynomials. It might seem weird to multiply polynomials together, but its perfectly possible — we can multiply them, find their lowest common multiple (LCM), and even divide and find remainders. Pieter explains the multiplication in the comments of his code, and its explained elsewhere like Wikipedia, but we won’t worry about exactly how they’re multiplied here, let’s just assume we can.
Base 32
Firstly a word on Base32. Legacy address encoding uses Base58. But a 32 character set means we don’t need to deal with uppercase/lowercase — all characters are either upper or lower (mixed case is explicitly invalid), which makes encoding in QR codes more efficient and means typos are less likely when typing addresses in (for example, if you read an address out to someone else, there is no confusion over capitalisation as they type it in).
The fact that 32 is a power of 2 is also nice, because it means that every 5 bits becomes one base 32 character. This is much nicer than Base58, where things have to be treated all at once in a kind of messy and inefficient way. The fact it is a power of a prime also means that there exists a finite field with exactly 32 (2⁵) elements. We will refer to this as GF(32). This means we can use cool finite field math to calculate and verify the checksum, unlike the legacy address formats which just put a hash on the end (this can detect errors but cannot correct them nor even identify where in the address the error is). This will be discussed more below.
Remember before when we said that in a finite field with pᵐ elements, the elements were like vectors of numbers up to p, not just integers from 0 to pᵐ-1? Well in this case, p=2, so this is exactly like a binary string of length 5. So an element in GF(32) could look like 11011 or 01101. So we CAN actually represent these more compactly as integers (27 and 13 in this case). But the catch is that addition is NOT the same as normal. To add these, we have to add each bit individually mod 2, which you might recognise as XOR! So lets say we have elements 27 and 13, to “add” them in GF(32), we XOR them bitwise, and get 10110 which we can represent as 22. It is NOT 27+13 mod 32, which would be 8. Keep this in mind, as it could get confusing, but I’ll try and remind you throughout this article. And actually, addition and subtraction here are exactly the same — XOR goes both ways because this is all bitwise so x-y = x+y = x XOR y.
Converting bits to characters and back
In the Bech32 code using Base32, each of the 32 symbols represents 5 bits of data. But how do we actually assign characters to bit combinations and encode and decode? The following table shows how this is done
So for example to encode 10010 (which is 18 in decimal), we would use the character j (16+2). And to decode character 9, we would get 00101 (5 in decimal). This table is specially crafted so that commonly mistaken characters (e.g. 5 vs S, 2 vs Z, p vs q vs g, etc.) are always one bit different. For example, 5 is 10100 and S is 10000, so they only differ in the third bit. This is because in choosing the specific BCH code to use, Pieter and Greg chose one which also performs well on single-bit errors. In fact, the code has distance 6 for these single-bit errors. So by making these the most common type of errors for user-typed addresses, the number of errors which can be caught is maximised. Taking this into account, an average of up to 4.85 “likely errors” per address can be caught with probability greater than that of using the legacy 4-byte-hash-checksum.
The characters “1”, “b”, “i”, and “o” are not used in the character set. But “1” is still used, as the separator between the human readable part and the data part of the BIP-173 addresses.
BCH codes
First some coding-theory terminology. A codeword is a term referring to a valid message+checksum combo. Distance between two messages is the number of characters that must be changed to get to the other message. The minimum distance d is the smallest distance between any two valid codeword pairs. That means we can detect at least d-1 errors (if we only changed d-1, we would always end up with an invalid message+checksum no matter what). We can also correct (d-1)/2 errors by just going to the “closest” valid codeword. This will probably end up being the wrong one if more than (d-1)/2 errors were made, so its dangerous in the bitcoin use-case. It is better to just make the user go back, check, and fix it themselves.
Now, BIP-173 uses a specific type of code known as a BCH code. BCH stands for Bose–Chaudhuri–Hocquenghem after the original designers of this type of code (it has nothing to do with certain altcoins). These are nice because they deal well with incorrectly typed characters (which are the most likely errors when users are copying down addresses) rather than just random bit-flips. Reed-Solomon codes are a very popular type of code which can be constructed as a BCH code, but have a bit of a limitation in the bitcoin use-case: the block length or total number of characters in a message+checksum codeword is typically one less than the number of characters. In Bitcoin with base 32, we don’t want to limit to only length 31. So instead we use more general BCH codes which don’t have this restriction. We have already decided to use the finite field with 32 elements. Now we also have to choose the following parameters:
- n, the size of the codeword block (the length of the data+checksum). For BIP-173 this was chosen to be 1023 = 32²-1. So the order of 32 modulo n is m=2.
- d, the minimum distance between codewords. This was chosen to be 4 for the block size of 1023 in BIP-173 (meaning that up to 3 errors was guaranteed to be detected), but the exact generator polynomial (discussed below) was then picked to actually guarantee error detection of 4 errors in up to 89 characters.
- c, which determines exactly which minimal polynomials (defined below) to use to determine the generator polynomial.
Now we need to choose a number α which is known as a primitive element in GF(32²). Primitive means that every non-zero element in GF(32²) can be written as a power of α (or “α generates the multiplicative group of the field”), meaning α has order 1023. Remember that if you don’t quite understand all the details of what’s going on here, thats fine, in practice all of this stuff has already been done by Pieter and Greg and you don’t need to worry about it if you don’t want to. A couple of technical details: In Bech32, GF(32²) is defined using the polynomial x² + 9x + 23. The α used is a root of this polynomial.
Now we have α, we are going to use it to calculate the generator polynomial for our code. The use of this generator polynomial will be made clear later. First, let’s take αⁱ for some positive integer i. The minimal polynomial of αⁱ, which we will call mᵢ(x), is the lowest degree polynomial with coefficients in GF(32) such that mᵢ(αⁱ) = 0. That is, out of all the polynomials which have αⁱ as a root, we pick the one with the lowest degree. We do this for all the integers i. Then for d-1 consecutive mᵢ(x)’s, starting at i=c up to i=c+d-2, we find the lowest common multiple (the smallest degree polynomial such that all d-1 of the mᵢ(x)’s divide it with no remainder). In Bech32, we specifically use α⁹⁹⁷, α⁹⁹⁸, and α⁹⁹⁹, because we want d = 4 as mentioned above. The generator polynomial g(x) is the least common multiple (lcm) of these chosen minimal polynomials. That is, g(x) = lcm(m₉₉₇(x), m₉₉₈(x), m₉₉₉(x)).
For BIP-173, if you follow the above steps, you get g(x) = x⁶ + 29x⁵ + 22x⁴ + 20x³ + 21x² + 29x + 18. It has degree 6 as you can see. Remember that the coefficients are not just integers! They are elements in GF(32), so you have to remember that 29 is actually just a shorthand way of writing 11101, and “adding” two of these numbers is actually done by XORing them together. This specific g(x) was chosen to behave as well as it could or the bitcoin use case, which Pieter explains in his SF Bitcoin Developer’s talk.
Now we have g(x), we are free to start generating and verifying checksums. Let’s take a list of 5-bit data values which we want to create a checksum for. We treat the whole list as a list of coefficients for a big polynomial which we shall call p(x). For example, if our list of data values was [4, 11, 5, 7, 22], we would make a polynomial p(x) = x⁵ + 4x⁴ + 11x³ + 5x² + 7x + 22. We always have an extra term at the front (we can imagine this as prepending a 1 to our array of coefficients). If we didn’t have the extra x⁵ there, then the polynomial for [4, 11, 5, 7, 22] and [0, 4, 11, 5, 7, 22] would be the same, which we don’t want. And remember again, all these coefficients are 5-bit elements in GF(32), not integers.
Now the way the code works, is that all valid codewords are multiples of the generator polynomial g(x). If we get a message which isn’t a multiple of g(x), we know it must have an error in it, and the math involved let’s us find the errors and usually correct them too. But if we just use the data array as it is to make p(x), it usually won’t be a multiple of g(x) (that’s a good thing — if we randomly generate a message and checksum, intuitively they should hardly ever be valid). There are typically two ways we can do this. The first is by just multiplying our polynomial by g(x). Then by definition it will be a multiple of g(x) and thus a valid codeword! But this has a caveat— the message itself doesn’t actually appear in the codeword. We have to undo the multiplication to get back to the message even if no errors were introduced.
Instead, we use systematic encoding. This means that the first coefficients in the codeword will actually be the message itself, and just the last 6 will be the checksum values. To do this, we just put six 0’s on the end (the same as multiplying p(x) by x⁶), and then work out what we need to change these last 6 coefficients by to make it divisible by g(x). For polynomials, there is a quotient-remainder theorem just like for integers, which says that we can uniquely write things like this using Euclidean division: p(x)*x⁶ = g(x)*q(x) + r(x) for some q(x) and r(x). You should be able to see that r(x) is the remainder of p(x)*x⁶ divided by g(x), or p(x)*x⁶ modulo g(x). If we know what r(x) is, we can just subtract it (Important: subtraction here is bitwise XOR of the coefficients as we mentioned above) away from p(x)*x⁶ to get g(x)*q(x) which is a valid codeword. Calculating this modulus is exactly what the PolyMod function does in the Bech32 code (the name is literally modulus of polynomials). You can read the section below if you want to know the nitty-gritty of how that function works, but it safe to skip it if you take its behaviour at face value.
How PolyMod works
PolyMod iterates over all the values in the list it is given, which correspond to the coefficients of the polynomial v(x) = p(x)*x⁶ as described above. The polynomial c(x) will hold the current state as we loop over all the coefficients. We start with c(x) = 1.
Starting with the first data value v0, which corresponds to the coefficient of the largest power of x, we calculate c’(x) = c(x)*x + v0 = 1*x + v0. Then we move on to the second value, v1. So c’’(x) = c’(x)*x + v1 = 1*x² + v0*x + v1. Continuing this trend will slowly build up c(x). This is done in a single line of code for each v_i: c = ((c & 0x1ffffff) << 5) ^ v_i; (you can see that every iteration, c is shifted to the right by 5 bits which corresponds with moving the coefficients one to the left or multiplying by x, then v_i is “added” or XORed).
This is fine by itself until we start getting to polynomials bigger than g(x), because we haven’t actually started working out any mods. So once c(x) = c0*x⁵ + c1*x⁴ + c2*x³ + c3*x² + c4*x + c5, the next step in the loop would lead to a c0*x⁶ term. Instead we should reduce that term mod g(x): replacing it with c0*(x⁶ mod g(x)). Let’s call k(x) = x⁶ mod g(x) = 29x⁵ + 22x⁴ + 20x³ + 21x² + 29x + 18 (notice that these are the 6 terms of g(x) without the first x⁶ term). So we calculate c0*k(x) so that we never end up with a term in c(x) which has a power greater than x⁵. Then c’(x) = (c1*x⁵ + c2*x⁴ + c3*x³ + c4*x² + c5*x + v_i) + c0*k(x). We just keep iterating through the data value list, working out what c0 is on the current iteration, shifting the other coefficients to the left by one power of x, and then adding in c0*k(x). What we are left with at the end is the final c(x) = v(x) mod g(x) which is exactly what we needed for the checksum.
You’ll see these five weird-looking constants in the code: [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]. These are just compact ways of writing the coefficients of k(x) and multiples of k(x) by powers of x. Let’s look at the first one. 0x3b6a57b2, if split into groups of 5 bits, gives us [29, 22, 20, 21, 29, 18], which are the coefficients of k(x). The other four constants are just 2,4,8, and 16 times k(x) respectively. All the code is doing is calculating c0*k(x), but by doing the multiplication by c0 bitwise, it speeds things up and ensures that the “addition” is all done as XOR as it must be in our finite field.
Bech32
6 character checksum gives a chance of less than 1 in a billion that an invalid address is deemed correct, even if more than 4 errors are made. It has much better chance of detecting errors than using the first 4 bytes of a hash like the legacy address encoding does assuming we expect an average of 3.5 characters per address or less are typed wrong. Because BIP-141 defines a witness program as a data push of up to 40 bytes (320 bits), the data length in the address will be 71 characters: first is the witness version which is one byte, and then is the (up to) 40 byte witness program, and finally 6 characters for the checksum.
As Pieter describes in his SF Bitcoin Dev talk, there are 159,605 possible BCH codes to choose between even after fixing the parameters like the 32 field size and everything. So they did a lot of computation to pick one that would be the best possible in the bitcoin use case, especially by comparing how the different codes compare with even more errors than just 4. I won’t repeat that discussion here, because you can watch the video if you’re interested :)
As pointed out in the BIP, actually correcting errors is probably a dangerous thing to do, because even if we correct the errors to a valid address, it might not be the one the user intended (if too many errors were made for the code to handle)! This is worse than telling the user it is wrong and making them fix it themselves.
Creating a checksum
Assume we have a HRP string, and an array of data values, and we want to create a checksum.
First we need to convert the HRP into data values too, so that we can do the math on it as well. This is done by splitting each character bitwise into its high and low bits, and creating a list of each, separated by a “0”: [upper 3 bits of each character] + [0] + [lower 5 bits of each character]. This process is known as “expanding” the HRP. The assumption is that most typos will only modify the lower 5 bits of a character (because in ASCII, if you change a letter to a different letter of the same case, or a number to a number, only the bottom 5 bits will ever change). So separating them out makes the checksum more powerful.
Now we add the data list on the end of that. And the checksum will be 6 characters so we add six 0’s to the end as well (a bit like a placeholder). So in total we have [high bits of HRP] + [0] + [low bits of HRP] + [data] + [0,0,0,0,0,0]. Remember from above, that a valid codeword is one which is divisible by the generator polynomial g. So to make the checksum valid, we have to replace these six 0’s with something that will make the whole thing divisible by g. As described above, we calculate the modulus (remainder) of the polynomial when divided by g using PolyMod. Then we subtract it from the 0’s, which is the same as XORing bitwise with the 0’s, which in turn is the identity operation — we just end up replacing the 0’s with the coefficients from PolyMod.
Technical note: We actually do one more step here, and XOR the last coefficient with 1. Earlier we said valid codewords were only multiples of g(x). But if we do this, appending a 0 to a valid list of values (shifting the data+checksum coefficients all to the left by one power of x) would result in a new valid list. For that reason, Bech32 actually requires the codeword to be a multiple of g(x), plus 1. So when we calculate the PolyMod on a valid codeword, where we would previously have got 0 (an exact multiple of g(x)), now we get 1. Now adding a 0 to the end of the list will no longer be valid, because the modulus by g(x) won’t be 1.
Technical note 2: If you look at the code, you’ll see that PolyMod returns an integer, and then that integer is XOR’ed with 1. The resulting integer is then split up into six 5-bit blocks just like the rest of the data list (ret[i] = (mod >> (5 * (5-i))) & 31), so that each 5 bits can be encoded in a Base32 character as well.
Verifying a checksum
Verification is pretty simple. If you followed the checksum creation part, you might have already guessed by now that all we have to do is check that the codeword we received is one more than a multiple of g(x). So we just use PolyMod on the codeword, treating it as coefficients of a polynomial again, and check that the remainder when divided by g(x) is equal to 1.
Encoding and decoding
Now we have all the parts we need. Given a human readable part (HRP), and the data to encode, we simply create a checksum over the HRP and data, and create a string HRP + “1” + BinaryToBase32(data+checksum).
If we receive a BIP-173 address, we can decode by ensuring only upper or lowercase is used (not both) and that it is at most 90 characters long. Then split it at the position of the “1” character to get the HRP and data separately. Convert each character from the data part back into a number using the Base32 conversion table above. Once we have HRP and a list of data values, we can expand the HRP as before, obtaining [upper 3 bits of each character] + [0] + [lower 5 bits of each character] + [data] + [checksum], and then verify the checksum as above.
Cool/useful links
So that ends this (hopefully enlightening) explanation of the math behind BIP-173 and Bech32 bitcoin addresses. The following are just a few useful resources/links where you can find more information or test things out.