Ethereum Under The Hood Part 3 (RLP Decoding)

Derao
Published in
5 min readFeb 25, 2021

--

Update: Grammer fix

Part-3 of the Recursive Link Prefix(RLP Decoding), if you have not gone through Part-1, Part-2, please do. In this section, we address decoding Ethereum data using the RLP decoding specification.

Let’s do a mini recap of what we learned so far:

Ethereum is world computer (global State +a virtual machine(VM) ). VM can add a new state to the global state machine. State is a collection blocks, blocks contains a set of transactions. Ethereum uses RLP to serialize/deserialize data. RLP specification only understands two data primitives(item, list of items). World state is stored as a Ethereum Patricia Merkel Trie(note: if this does not sound familiar please read Part 1 again.)

Now let’s jump right into it, RLP decoding is straightforward, “decoding” is a fancy word of saying that I am going convert this information to its original format.

RLP decoding rules:

  1. Look at the first byte,The first byte should fall in one of the following ranges : [ [0x00 .. 0x7f] , [0x80.. 0xb7] , [0xb8 .. 0xbf ] , [0xc0 .. 0xf7], [0xf8 .. 0xff] ] and Decipher Data type using the following rule if the byte falls within:

[0x00 .. 0x7f] : Data is of type String and should be decoded as it is.

[0x80 .. 0xb7] : String and its a short string

[0xb8 .. 0xbf ] :String and its a long string

[0xc0 .. 0xf7] : List and short list

[0xf8 .. 0xff] : List and it’s a long list

2. Get the length of the byte array :

[ First byte — First Byte from the Byte Range] = length of the data, e.g.: 0x83–0x80 = 3 , 3 is the length of the byte array

3. Perform Step one and two all over until the end of the byte array.

Let’s take a simple example of a string “dog” encoded into RLP as:

"dog" = [0x83, 0x64, 0x6f, 0x67]
Ascii Chart for handy reference

Let’s decode the input as per our rule, so given a byte array as an input

0x83 'd' 'o' 'g'

  1. Retrieve the first byte: 0x83
  2. Check If the range falls within the specified range set, in this case, it falls between [0x80 .. 0xb7] since the first byte is 0x83.
  3. Calculate Length: First byte- First byte in Byte range, 0x830x80 = 3
  4. Data is of type string, and its length is 3
  5. With these facts, parse until end of string : 0x64, 0x6f, 0x67

Let’s take another example which deals with List’s; the previous example was a simple data structure String “dog”. Decoding Lists is a little more involved; we need to figure the total length of the list and decipher the elements with the List. We also need to keep decoding the list until the end by using recursion techniques for repeatable tasks.

[“cat”, “dog” ] -> C8 83 636174 83 646F67 , pause here for a second and let’s decipher by looking at the structure. I see a couple of items here, this is a List, and it has two elements in it, and the data type of those elements is a String, Decoding the List we getC8 83 63 61 74 83 64 6F 67 (tip: refer to the ascii chart above)

  1. Retrieve the first byte : 0xc8
  2. Find the closest Byte Range which the first byte falls within; In this case, the First byte falls within the range [0xc0 .. 0xf7] hence the data is a List.
  3. Create an empty List [] to accumulate
  4. Length of the list, [0xc8 -0xc0] = 8
  5. Start deciphering after c8 to get the encoded data from the List until end of List.
  6. C8 83 63 61 74 83 64 6F 67
  7. We know we have a list [ and it has eight elements], now let’s look at what’s inside the List.
  8. Figure out length which is: first byte-range, 0x83–0x80 = 3
  9. The data type is String, and its length is 3
  10. Parse and decipher the next 3 bytes 0x63, 0x61, 0x74
  11. Repeat Steps 8 through 11 until end of the list for the next set of bytes resulting in 0x83, 0x64, 0x6f, 0x67
  12. Calculate the Length and interpret the data, in this case, the data is 0x64,0x6f,0x67.

Well, the steps involved are repeatable, and recursion would be a helpful pattern to decode the entire Byte Array. Have a look at the code skeleton which implements the RLP decoding spec. Note the Byte Ranges in the code below converted to decimals.

RLP Decoding Skeleton Code in Elixir
Encoding/Decoding RLP for a Small String.

I choose Elixir due to various reasons, including its list manipulation and primitive data types. There are many examples covered in other languages which I would highly recommend to go through, in addition to this check out the references below.

Before we wrap this section up, there is one more encoding and decoding specification which we need to look into for a brief moment. We learned about RLP, which deals with Values. To retrieve the data, we need to have a Key which provides a path on where to find the Value. So let’s quickly dip into Hex Prefix(Appendix C) from the yellow paper.

Hex Prefix:

Think of Hex Prefix as an encoding/decoding mechanism of the “Path” to a given value in the Ethereum network, in Plain English; it means that I know a way to get to this house and here is the way of storing the map to the house. I think of this as the “Key” in {Key, Value} tuple and I envision the whole data structure as:

[{key1, value1}, {key2, value2}, {key2, value2} ]

[ {hp1, rlp1}, {hp2, rlp2}, {hp3, rlp3} ]

That’s about it; I would highly recommend Phan Sơn Tự medium post about Hex Prefix for further reference. In Part-4, we connect some of the pieces while talking about Trie and relate to what we learned so far. See you around for Part-4Patricia Merkle Trie, till then.

References:

--

--