RLP ENCODING and ZkSync Era library review PART I

0xWeiss
Mar 11, 2023

I am 0xWeiss, a security researcher, auditor, and Co-Founder. Let’s break RLP encoding!!!

RLP stands for Recursive Length Prefix. It is an encoding algorithm that standardizes the transfer of data between nodes in a space-efficient format. Because it is recursive, it can efficiently encode and decode data of arbitrary size and arbitrarily deep nesting.

It is used extensively in Ethereum for encoding transactions, blocks, and state data. Its simplicity and efficiency make it a popular choice for serializing data in decentralized systems.

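An RLP encoding function takes in items.

An item is either a string (i.e. a byte array) or a list of items. Empty strings and empty lists are valid items, and lists can be nested to any depth. Other data structures (integers, for example) must first be converted into one of these two forms before they can be encoded.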

Example: ["weiss", ["I", "love"], "eth", [[]], "pig", [""], "weiss2"]
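
To make the idea of “items” concrete, here is a minimal Python sketch that converts plain Python values into RLP items, i.e. byte strings or nested lists of them. The helper `to_rlp_item` is hypothetical (not from any library), and it assumes text is ASCII and integers are non-negative. Following ethereum.org, an integer becomes its big-endian bytes with no leading zeros, so 0 becomes the empty string.

```python
from typing import Union

# An RLP item is a byte string or a (possibly nested) list of items.
RLPItem = Union[bytes, list]


def to_rlp_item(value) -> RLPItem:
    """Convert a Python value into an RLP item (hypothetical helper).

    Assumes str values are ASCII text and int values are non-negative.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("ascii")
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not RLP items")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("RLP only defines non-negative integers")
        # Big-endian with no leading zeros; 0 becomes b"" (empty string).
        return value.to_bytes((value.bit_length() + 7) // 8, "big")
    if isinstance(value, list):
        # Lists stay lists; every element is converted recursively.
        return [to_rlp_item(v) for v in value]
    raise TypeError(f"cannot convert {type(value).__name__} to an RLP item")


example = ["weiss", ["I", "love"], "eth", [[]], "pig", [""], "weiss2"]
print(to_rlp_item(example))
# [b'weiss', [b'I', b'love'], b'eth', [[]], b'pig', [b''], b'weiss2']
print(to_rlp_item(1024))  # b'\x04\x00'
```

Note that `[[]]` (a list containing an empty list) and `[""]` (a list containing an empty string) are different items, and they will get different encodings.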

RULES and EXPLANATION of RLP encoding:

  • For a single byte whose value is in the [0x00, 0x7f] (decimal [0, 127]) range, that byte is its own RLP encoding.
  • Otherwise, if a string is 0–55 bytes long, the RLP encoding consists of a single byte with value 0x80 (dec. 128) plus the length of the string followed by the string. The range of the first byte is thus [0x80, 0xb7] (dec. [128, 183]).
  • If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xb7 (dec. 183) plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. For example, a 1024-byte-long string would be encoded as \xb9\x04\x00 (dec. 185, 4, 0) followed by the string. Here, 0xb9 (183 + 2 = 185) is the first byte, followed by the 2 bytes 0x0400 (dec. 1024) that denote the length of the actual string. The range of the first byte is thus [0xb8, 0xbf] (dec. [184, 191]).
  • If the total payload of a list (i.e. the combined length of all its RLP-encoded items) is 0–55 bytes long, the RLP encoding consists of a single byte with value 0xc0 (dec. 192) plus the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xc0, 0xf7] (dec. [192, 247]).
  • If the total payload of a list is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xf7 plus the length in bytes of the length of the payload in binary form, followed by the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xf8, 0xff] (dec. [248, 255]).
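
The five rules above translate almost line by line into code. Below is a minimal Python sketch of an encoder, assuming the input is already a Python `bytes` value or a nested `list` of them. The names `rlp_encode` and `encode_length` are my own, not a library API, and the sketch does not enforce the spec’s upper bound on lengths.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Build the RLP prefix. offset is 0x80 for strings, 0xc0 for lists."""
    if length <= 55:
        # Short form: one byte, offset + length.
        return bytes([offset + length])
    # Long form: (offset + 55) plus the number of bytes needed for the
    # length, followed by the length itself in big-endian binary.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """RLP-encode a byte string or an arbitrarily nested list of them."""
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] <= 0x7F:
            # Rule 1: a single byte in [0x00, 0x7f] is its own encoding.
            return bytes(item)
        # Rules 2 and 3: string prefix based at 0x80 (long form at 0xb7).
        return encode_length(len(item), 0x80) + bytes(item)
    if isinstance(item, list):
        # Rules 4 and 5: the payload is the concatenation of encoded items,
        # encoded recursively, with a prefix based at 0xc0 (long form at 0xf7).
        payload = b"".join(rlp_encode(x) for x in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("RLP items must be bytes or lists")


print(rlp_encode(b"dog").hex(" "))            # 83 64 6f 67
print(rlp_encode([b"cat", b"dog"]).hex(" "))  # c8 83 63 61 74 83 64 6f 67
print(rlp_encode(b"a" * 1024)[:3].hex(" "))   # b9 04 00
```

The last call reproduces the 1024-byte example: the header is b9 04 00. Sharing one `encode_length` helper between strings and lists mirrors the symmetry of the rules, where the only difference is the base offset (0x80 versus 0xc0).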

Examples provided by ethereum.org explained:

  • the string “dog” = [ 0x83, ‘d’, ‘o’, ‘g’ ] ← — EXAMPLE 1

“dog” is a 3-byte string, which is within the 0–55 byte range. According to the rules stated above, it is encoded by prepending a single byte with value 0x80 plus the length of the string, here 0x80 + 3 = 0x83, followed by the string itself.
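
The prefix arithmetic for this example fits in a few lines of Python. This sketch only applies the short-string rule, so it assumes the string is 0 to 55 bytes long and is not a single byte below 0x80.

```python
s = b"dog"
# Short-string rule: first byte = 0x80 + len(s) = 0x80 + 3 = 0x83.
encoded = bytes([0x80 + len(s)]) + s

print(hex(encoded[0]))   # 0x83
print(encoded.hex(" "))  # 83 64 6f 67
```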

  • the list [ “cat”, “dog” ] = [ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ] || 0xc8 0x83 63 61 74 0x83 64 6f 67 ←—EXAMPLE 2

The total payload is at most 55 bytes, and because it is a list, we have to prepend 0xc0 plus the length of the payload (the combined length of the encoded items).

FULL DETAILED MEGA ALPHA EXPLANATION

First, we RLP-encode each of the strings “cat” and “dog”:

  • “cat” = 0x83 63 61 74
  • “dog” = 0x83 64 6f 67

You will ask yourself, mmmm…, hey 0xWeiss, what are those strange numbers there? 63, 61, 74 etc.

Well, the numbers 63, 61, and 74 are the ASCII codes of the characters ‘c’, ‘a’, and ‘t’, respectively, written in hexadecimal. In ASCII encoding, each character is represented by a unique number between 0 and 127, and hexadecimal is the usual way to write these byte values in computer systems.

To break it down further, the ASCII code for ‘c’ is 99 in decimal or 0x63 in hexadecimal. Similarly, the ASCII code for ‘a’ is 97 in decimal or 0x61 in hexadecimal, and the ASCII code for ‘t’ is 116 in decimal or 0x74 in hexadecimal. Prepending the prefix 0x83 (0x80 + 3, since “cat” is 3 bytes long) gives the RLP encoding of the string “cat”: 0x83 63 61 74.
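
You can check these ASCII codes yourself with Python’s built-in `ord()`, which returns a character’s code point (equal to its ASCII code for ASCII characters). The sketch then applies the short-string rule to rebuild 0x83 63 61 74.

```python
word = "cat"
for ch in word:
    # ord() gives the code point; hex() writes it in base 16.
    print(ch, ord(ch), hex(ord(ch)))
# c 99 0x63
# a 97 0x61
# t 116 0x74

raw = word.encode("ascii")                # bytes 0x63 0x61 0x74
encoded = bytes([0x80 + len(raw)]) + raw  # prefix 0x80 + 3 = 0x83
print(encoded.hex(" "))                   # 83 63 61 74
```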

Ok, now that we know where the numbers come from, let’s continue:

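As said previously, we have to add the length of the payload to the 0xc0 value. The payload is the concatenation of the encoded items, so we count the encoded lengths, prefix bytes included:

len(0x83 63 61 74) + len(0x83 64 6f 67) = 4 + 4 = 8

Put differently, the 6 bytes of characters (3 + 3) plus the 2 prefix bytes (one 0x83 per item) give a payload of 8 bytes, or 0x08 in hex. Adding 0xc0 to 0x08 gives 0xc8.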
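
The same list computation as a short Python sketch. The key point it shows is that the list prefix counts the bytes of the *encoded* items, prefixes included, not just the raw characters.

```python
cat = bytes([0x80 + 3]) + b"cat"  # 83 63 61 74 (4 bytes)
dog = bytes([0x80 + 3]) + b"dog"  # 83 64 6f 67 (4 bytes)

payload = cat + dog                    # concatenation of the encoded items
prefix = bytes([0xC0 + len(payload)])  # 0xc0 + 0x08 = 0xc8

print(len(payload))                 # 8
print((prefix + payload).hex(" "))  # c8 83 63 61 74 83 64 6f 67
```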

  • the empty string (‘null’) = [ 0x80 ] ←—EXAMPLE 3

The empty string is 0 bytes long, which is within the 0–55 range, so the prefix is 0x80 + 0 = 0x80 and no string bytes follow it.

  • the empty list = [ 0xc0 ] ←—EXAMPLE 4

The empty list has no items, so its payload is 0 bytes and the prefix is 0xc0 + 0 = 0xc0, with nothing following it.
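
A small Python sketch of both empty cases, plus what happens when each one is wrapped in a list, which matches the `[""]` and `[[]]` items from the example at the start of the article.

```python
empty_string = b""
empty_list_payload = b""  # an empty list has no encoded items to concatenate

enc_empty_string = bytes([0x80 + len(empty_string)]) + empty_string          # 0x80 + 0
enc_empty_list = bytes([0xC0 + len(empty_list_payload)]) + empty_list_payload  # 0xc0 + 0

# Wrapping each in a list: the 1-byte encoding becomes a 1-byte payload.
enc_list_of_empty_string = bytes([0xC0 + len(enc_empty_string)]) + enc_empty_string  # [""]
enc_list_of_empty_list = bytes([0xC0 + len(enc_empty_list)]) + enc_empty_list        # [[]]

print(enc_empty_string.hex(" "))          # 80
print(enc_empty_list.hex(" "))            # c0
print(enc_list_of_empty_string.hex(" "))  # c1 80
print(enc_list_of_empty_list.hex(" "))    # c1 c0
```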

  • the encoded integer 0 (‘\x00’) = [ 0x00 ] ←—EXAMPLE 5

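0x00 is a single byte in the [0x00, 0x7f] range, so it is its own RLP encoding: 0x00. Note that this is the byte \x00, not the integer 0: ethereum.org encodes the integer 0 as [ 0x80 ], because integers are first converted to big-endian bytes with no leading zeros, which turns 0 into the empty string.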

  • the encoded integer 15 (‘\x0f’) = [ 0x0f ] ←—EXAMPLE 6

0x0f (dec. 15) is also a single byte in the [0x00, 0x7f] range, so it is its own RLP encoding: 0x0f.
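
Because every rule owns its own range of first bytes, the first byte alone tells you which rule produced an encoding. Here is a small Python sketch that classifies the first byte of each example above; `classify_first_byte` is a hypothetical helper, not a library function, and a full decoder would still need to read the lengths and payload that follow.

```python
def classify_first_byte(b: int) -> str:
    """Map the first byte of an RLP encoding to the rule that produced it."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    if b <= 0x7F:
        return "single byte (its own encoding)"
    if b <= 0xB7:
        return f"short string, length {b - 0x80}"
    if b <= 0xBF:
        return f"long string, length-of-length {b - 0xB7}"
    if b <= 0xF7:
        return f"short list, payload length {b - 0xC0}"
    return f"long list, length-of-length {b - 0xF7}"


examples = [
    ("EXAMPLE 1", 0x83), ("EXAMPLE 2", 0xC8), ("EXAMPLE 3", 0x80),
    ("EXAMPLE 4", 0xC0), ("EXAMPLE 5", 0x00), ("EXAMPLE 6", 0x0F),
]
for name, first in examples:
    print(f"{name}: {first:#04x} -> {classify_first_byte(first)}")
```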

We will leave more complex encodings for Part II, which will be released in 3–4 days. Part III will be the breakdown of the ZkSync Era library. I decided to split the series because of the currently ongoing audit.

HOPE YOU LIKED THE ARTICLE, IF YOU FELT LIKE IT WAS HELPFUL, I WOULD APPRECIATE ANY TYPE OF SUPPORT.

Twitter: https://twitter.com/0xWeisss

ETH-Wallet: 0xEa1b93cdB67a92Fd961FFE82d15eFE414AD08F92
