Ethereum Transaction Structure — Part 1

Santony Choi
Published in Santony’s Blog
4 min read, May 22, 2018

Most Ethereum users connect to the Ethereum network through wallet programs such as MEW (My Ether Wallet) and Metamask, or through a special browser like Mist. For more advanced purposes, there are client programs such as Geth and Parity. These clients include most of the functionality that wallets have and offer finer control through a CLI and a JSON-RPC interface. Mining Ethereum is also done with them.

However, the tools above need to import private keys. Although they protect keys appropriately, importing a key still carries some risk of exposure. So, in extremely sensitive cases, you may need to build signed binary transactions yourself in an offline environment, without the help of those convenient tools.

This series is written for that special case. We are going to analyze the Ethereum transaction and follow the process of creating a RawTransaction.

RLP Encoding

The first topic we cover is RLP, the way Ethereum serializes data. RLP, short for Recursive Length Prefix, is a scheme that writes each item as a length prefix followed by its data, and applies that rule recursively. Let’s start with an example.

0x8773616e746f6e79

That is the encoded value of the string santony. You can easily spot the ASCII string santony starting at the second byte, 0x73. Then what is the 0x87 in the first position?

String Encoding Rules of RLP

1. A single byte whose value is in the range [0x00, 0x7f] is written just as it is.

2. Otherwise, if the string is 0 to 55 bytes long, write 0x80 + the length of the string at the head and then write the content of the string. The value of the first byte is thus in the range [0x80, 0xb7].

3. If the string is longer than 55 bytes, write 0xb7 + the number of bytes needed to represent the length, then the length itself, and finally the content of the string. For example, a 1024-byte string is encoded as 0xb90400 + the content of the string: 1024 is 0x0400, which takes 2 bytes, so the first byte is 0xb7 + 2 = 0xb9. The value of the first byte is thus in the range [0xb8, 0xbf].
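
The three string rules can be sketched in a few lines of Python. This is only an illustration of the rules above, not code from any Ethereum client; the function name rlp_encode_string is made up for this example.

```python
def rlp_encode_string(data: bytes) -> bytes:
    """Encode a byte string following the three string rules (illustrative sketch)."""
    length = len(data)
    # Rule 1: a single byte in [0x00, 0x7f] is its own encoding.
    if length == 1 and data[0] <= 0x7F:
        return data
    # Rule 2: 0 to 55 bytes, prefix is 0x80 + length.
    if length <= 55:
        return bytes([0x80 + length]) + data
    # Rule 3: longer than 55 bytes, prefix is 0xb7 + (size of the length),
    # followed by the length itself in big-endian, then the content.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0xB7 + len(length_bytes)]) + length_bytes + data


print(rlp_encode_string(b"santony").hex())           # 8773616e746f6e79
print(rlp_encode_string(b"\x00" * 1024)[:3].hex())   # b90400
```

The first call reproduces the example at the top of this section, and the second shows the 0xb90400 header of a 1024-byte string.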

The rules are simpler than you might have feared. By looking at the prefix byte, we can easily tell whether it is an item by itself (Rule 1), is followed by the content of a string (Rule 2), or is followed by the length of a string (Rule 3). Even if several strings are placed one after another, we can encode and decode them in turn, because the byte that comes right after a decoded string is the prefix of the next item.

Let’s look at a new example.

0xd08773616e746f6e7982697384636f6f6c

Can you guess what the 0xd0 at the head means?

List Encoding Rules of RLP

1. If the total encoded length of all elements in a list is between 0 and 55 bytes, write 0xc0 + that length and then write the encoded elements of the list. The value of the first byte is thus in the range [0xc0, 0xf7].

2. If the total encoded length of all elements in a list is over 55 bytes, proceed as in Rule 3 of String Encoding. Write 0xf7 + the number of bytes needed to represent the length, then the length itself, and finally the encoded elements of the list. That means the value of the first byte is in the range [0xf8, 0xff].
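
As a rough sketch, the list rules can be added on top of the string rules with one recursive function. The names rlp_encode and _prefix are invented for this illustration, and the input is assumed to contain only bytes objects and (nested) Python lists.

```python
def _prefix(length: int, offset: int) -> bytes:
    """Build the prefix for a payload of `length` bytes.

    offset is 0x80 for strings and 0xc0 for lists.
    """
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode bytes or a (nested) list of bytes (illustrative sketch)."""
    if isinstance(item, bytes):
        # String rules: a single low byte is its own encoding.
        if len(item) == 1 and item[0] <= 0x7F:
            return item
        return _prefix(len(item), 0x80) + item
    # List rules: encode every element first, then prefix the total length.
    payload = b"".join(rlp_encode(element) for element in item)
    return _prefix(len(payload), 0xC0) + payload


print(rlp_encode([b"santony", b"is", b"cool"]).hex())
# d08773616e746f6e7982697384636f6f6c
```

Note that the list length is taken over the already encoded elements, so the prefix bytes of the inner strings are counted too.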

RLP has only two types of items, String and List, and there are rules only for these two. That is, we’ve learned all the rules of RLP! Let me lay out the example in a more readable way.

0x d0 87 73616e746f6e79 82 6973 84 636f6f6c

You can tell from the 0xd0 prefix that it is a list whose elements occupy 16 bytes (0xd0 - 0xc0 = 0x10). Note that the prefix doesn’t tell the number of items; we only learn that after decoding. Also, the prefix bytes of the items inside a list are included when counting the length. It might look unfriendly or intricate, but given Ethereum’s need to save bytes, it looks like a brilliant choice.

So, back to the example: the second byte, 0x87, tells you that the next 7 bytes are an ASCII string. The following prefixes, 0x82 and 0x84, likewise show that the bytes after them are 2-byte and 4-byte strings. In summary, the example can be decoded as 1 byte of list prefix + 1 byte of string prefix + 7 bytes of ASCII + 1 byte of string prefix + 2 bytes of ASCII + 1 byte of string prefix + 4 bytes of ASCII, and the decoded value is below.

["santony", "is", "cool"]
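
The decoding walk-through above can also be written as a small recursive decoder. This is a minimal sketch under simplifying assumptions: the names rlp_decode and _decode_item are made up here, and the code does not validate truncated or non-canonical input the way a real implementation should.

```python
def _decode_item(data: bytes, pos: int):
    """Decode one item starting at `pos`; return (item, position after it)."""
    prefix = data[pos]
    if prefix <= 0x7F:                      # String Rule 1: the byte is the item
        return data[pos:pos + 1], pos + 1
    if prefix <= 0xBF:                      # String Rules 2 and 3
        if prefix <= 0xB7:
            length, start = prefix - 0x80, pos + 1
        else:
            size = prefix - 0xB7            # how many bytes hold the length
            length = int.from_bytes(data[pos + 1:pos + 1 + size], "big")
            start = pos + 1 + size
        return data[start:start + length], start + length
    if prefix <= 0xF7:                      # List Rule 1
        length, start = prefix - 0xC0, pos + 1
    else:                                   # List Rule 2
        size = prefix - 0xF7
        length = int.from_bytes(data[pos + 1:pos + 1 + size], "big")
        start = pos + 1 + size
    end = start + length
    items = []
    while start < end:                      # the item count is only known after decoding
        item, start = _decode_item(data, start)
        items.append(item)
    return items, end


def rlp_decode(data: bytes):
    item, _ = _decode_item(data, 0)
    return item


decoded = rlp_decode(bytes.fromhex("d08773616e746f6e7982697384636f6f6c"))
print([s.decode("ascii") for s in decoded])  # ['santony', 'is', 'cool']
```

The decoder returns raw bytes; turning them into ASCII text, as in the last line, is a separate step because RLP itself does not know what the bytes mean.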

RLP, which we covered in this article, is essential to understanding Ethereum because it is the final form of the Ethereum transaction we will create later.

0xf86f128609184e72a00082271094566f1850d6aad76b3d219c7782d90ba25b5846ac8a31373438373645383030802aa08c53bce1bae2949275685e9a7dcea2b0e0c35bcbfc46f6c8abf48829f46c736ca0434124904285519e387b925a20431cf26dfc37441ec001968130c96987ed7911
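
As a quick sanity check, List Rule 2 can already be applied to the first two bytes of this transaction. The snippet below is only a sketch that reads the header; the variable names are arbitrary, and the individual fields are left undecoded.

```python
raw = bytes.fromhex(
    "f86f128609184e72a00082271094566f1850d6aad76b3d219c7782d90ba25b5846ac"
    "8a31373438373645383030802aa08c53bce1bae2949275685e9a7dcea2b0e0c35bcb"
    "fc46f6c8abf48829f46c736ca0434124904285519e387b925a20431cf26dfc37441e"
    "c001968130c96987ed7911"
)

prefix = raw[0]                       # 0xf8 falls in [0xf8, 0xff]: List Rule 2
size_of_length = prefix - 0xF7        # number of bytes that hold the length
payload_length = int.from_bytes(raw[1:1 + size_of_length], "big")

print(hex(prefix), size_of_length, payload_length)   # 0xf8 1 111
# The header plus the payload should account for every byte of the transaction.
print(1 + size_of_length + payload_length == len(raw))  # True
```

So the whole transaction is one list whose elements take 111 bytes (0x6f), which is more than 55 and therefore needs the long-list form.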

See you in the next article, where we will try decoding a real RLP-encoded transaction like the one above.
