Image source: pxhere.com

Data structure in Ethereum | Episode 4: Diving into examples.

It has been a long journey from the first episodes to here. We now have everything we need, and this episode is the last step to make clear how Ethereum data is organized in practice.

In my opinion, practising with examples is the best way to approach a problem and get deep into it. I truly hope this example helps you understand the data structures in Ethereum clearly.

If you don’t want to get your hands dirty, you can clone my git repo (develop branch) for convenience.

Some specs:

  1. Full node: Geth
  2. Network: Ropsten testnet
  3. Programming language: JavaScript/Node.js
  4. Subject of the study: state trie (stateRoot)

What we have to do.

Geth

Follow this link to learn how to set up geth as a full node on your computer.

To start a full sync of the Ropsten testnet and open the RPC endpoints, you can use this command:

geth --testnet --datadir "~/Library/Ethereum/ropsten" --rpc --rpcapi "eth,net,personal,web3" --rpcaddr "0.0.0.0" --rpccorsdomain "*" --ws --wsapi "eth,net,personal,web3" --wsorigins "0.0.0.0"

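Remember to change the path after the --datadir flag to a location that is available on your machine. Depending on your geth version, the default sync mode may be fast rather than full, so add --syncmode "full" if you want to be explicit.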

Because we run geth in full sync mode, geth needs time to sync the entire blockchain data (it took pretty long 😂 in my case, around 3 days). When you see logs like the ones below, I’m pretty sure it is done.

Full sync.

Web3 — Testing geth

Refer to this link for web3:

First of all, we need to create a Node.js project and then install the web3 package.

Try running the getStateRoot function with a blockNumber close to the newest block number on Ropsten. We avoid the very newest block because our node may still lag behind it, so a nearly newest block number is a wise choice. My choice is 2596315, and it may be different at the moment you read this article. Be careful.
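A minimal sketch of such a function, assuming the web3 1.x package and geth's HTTP RPC on http://localhost:8545 (geth's default port, an assumption about your setup). Passing the web3 instance as a second argument is my own choice for this sketch, so the function can be tried without a running node.

```javascript
// Sketch only. Assumes the web3 1.x package (`npm install web3`) and a geth
// HTTP RPC endpoint; http://localhost:8545 is an assumption about your setup.

/**
 * Read the stateRoot of a block through a web3-like client.
 * `web3` is passed in explicitly, so the function does not care how it was built.
 */
async function getStateRoot(blockNumber, web3) {
  const block = await web3.eth.getBlock(blockNumber);
  if (!block) {
    // The node may not have this block (yet), for example while still syncing.
    throw new Error(`Block ${blockNumber} is not available on this node`);
  }
  return block.stateRoot;
}

// Example wiring, to be run against your own synced node:
//   const Web3 = require('web3');
//   const web3 = new Web3('http://localhost:8545');
//   getStateRoot(2596315, web3).then((root) => console.log('stateRoot:', root));
```

Throwing on a missing block makes a lagging sync visible right away instead of failing later on an undefined stateRoot.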

My result of running getStateRoot(2596315):

stateRoot: 0x1a63facb2a82966504a643f7c6cce28ddb47ea056b02009975c665bdada64c81

At this point, we can be sure that our full node works properly.

levelDB

Something we need to be careful about with levelDB is that it allows only one process to open the database at a time. Thus, we need to stop geth after the full sync before moving on to the next steps.

To create a connection from Node.js, we will use two packages: levelup and leveldown. So please install levelup and leveldown. The path module is built into Node.js and needs no installation.

Create a connection:

Create a connection to levelDB.

Here, I connected to my levelDB using the specific path that points to my chaindata folder (this path depends on your config when you start geth). Then I made the connection global so it can be used afterward.
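A minimal sketch of that connection, assuming the levelup and leveldown packages are installed and geth is stopped. The helper names chaindataPath and openChaindata are hypothetical, and the geth/chaindata sub-folder is an assumption about how geth lays out an explicit --datadir, so please check your own folder first.

```javascript
// Sketch only. Assumes `npm install levelup leveldown` and that geth is stopped.
const path = require('path');

// Assumption: with an explicit --datadir, geth keeps its LevelDB under
// <datadir>/geth/chaindata. Verify this against your own folder layout.
function chaindataPath(dataDir) {
  return path.join(dataDir, 'geth', 'chaindata');
}

// Open the database. The packages are required lazily, so the path helper
// above can be used (and checked) without them being installed.
function openChaindata(dataDir) {
  const levelup = require('levelup');
  const leveldown = require('leveldown');
  return levelup(leveldown(chaindataPath(dataDir)));
}

// Example wiring (the data dir below is hypothetical):
//   global.db = openChaindata('/Users/me/Library/Ethereum/ropsten');
```

Keeping the path logic in its own small function makes it easy to print and double-check the folder before anything tries to lock the database.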

First diving into database

In the web3 testing part above, I got the stateRoot of block number 2596315. Because it came from the node itself through web3, we can treat that result as the reference value.

Now, we warm up by getting the stateRoot from the block header corresponding to a specific block number, and then we compare it to the previous result from web3.

Please install the ethereumjs-block module first; we need it to parse block data.

Warm-up steps.

Source code:

** For the utils library, please take a look at my repo to get the source code. The path is ./libs/utils.

As a first step, we convert 2596315 to hex (279ddb) and pad it with zeros on the left so that the total length is 16 hex characters (8 bytes). Note that everything we do is in hex.

hexBlockNumber = 00 00 00 00 00 27 9d db

Geth uses h as the prefix and n as the suffix of this key. Their ASCII codes in hex are 68 and 6e.

prefix = 68
suffix = 6e

Then we concatenate them in sequence.

keyString = prefix + hexBlockNumber + suffix = 68 00 00 00 00 00 27 9d db 6e
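The key construction above can be sketched with Node's Buffer alone. The function names are my own. As far as I understand geth's database schema, the h + number + n key returns the hash of the canonical block, and the header itself is stored under h + number + hash; treat the headerKey helper as an assumption based on that reading.

```javascript
// Sketch of the key layout described above, using only Node's Buffer.
const HEADER_PREFIX = Buffer.from('h'); // 0x68
const HASH_SUFFIX = Buffer.from('n');   // 0x6e

// Encode a block number as 8 bytes, big-endian (16 hex characters).
function encodeBlockNumber(blockNumber) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64BE(BigInt(blockNumber));
  return buf;
}

// prefix + hexBlockNumber + suffix, exactly as in the walkthrough.
function canonicalHashKey(blockNumber) {
  return Buffer.concat([HEADER_PREFIX, encodeBlockNumber(blockNumber), HASH_SUFFIX]);
}

// Assumption: the RLP-encoded header lives under prefix + number + block hash.
function headerKey(blockNumber, blockHash) {
  return Buffer.concat([HEADER_PREFIX, encodeBlockNumber(blockNumber), blockHash]);
}

console.log(canonicalHashKey(2596315).toString('hex'));
// → 680000000000279ddb6e
```

Working with Buffers instead of hex strings avoids padding mistakes, and the resulting Buffer can be passed to the levelDB connection as the key.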

Here is the result of the warm-up:

Warm-up results.

As we can see, the final result is the same as the result in the web3 part.

Congratulations!!! We have made our first dive into the real 💩

Get deeper

We will use the merkle-patricia-tree and rlp modules, so let’s install them.

Now, we start creating a trie library that uses an Ethereum address to read all the info saved for it in the state trie.

Focusing on the getInfoByAddress function: we use merkle-patricia-tree to create a trie from the given root, then we get the data of an address from this trie. Remember that all data was RLP-encoded before being saved, so in order to read it out, we need to decode it.

This is the complete example:

The result:

The final result.

The data of an address contains 4 fields. In sequence, they are nonce, balance, storageRoot and codeHash.
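To make the four fields concrete, here is a minimal sketch of the decoding step only. It uses a small hand-written RLP decoder instead of the rlp module, and the sample bytes are a made-up account (nonce 1, balance 1 ether, empty storage, no code), not data read from Ropsten. The trie lookup itself is not shown.

```javascript
// Minimal RLP decoder, for illustration only; the rlp package does this for real.
function decodeItem(buf, offset) {
  const first = buf[offset];
  if (first < 0x80) {
    // A single byte below 0x80 is its own encoding.
    return { value: buf.subarray(offset, offset + 1), next: offset + 1 };
  }
  if (first <= 0xbf) {
    // Byte string: short form (length in the prefix) or long form.
    let start = offset + 1;
    let len = first - 0x80;
    if (first > 0xb7) {
      const lenOfLen = first - 0xb7;
      len = buf.readUIntBE(offset + 1, lenOfLen);
      start += lenOfLen;
    }
    return { value: buf.subarray(start, start + len), next: start + len };
  }
  // List: short form (length in the prefix) or long form.
  let start = offset + 1;
  let len = first - 0xc0;
  if (first > 0xf7) {
    const lenOfLen = first - 0xf7;
    len = buf.readUIntBE(offset + 1, lenOfLen);
    start += lenOfLen;
  }
  const items = [];
  let pos = start;
  while (pos < start + len) {
    const item = decodeItem(buf, pos);
    items.push(item.value);
    pos = item.next;
  }
  return { value: items, next: start + len };
}

// RLP integers are big-endian byte strings; the empty string means zero.
function toBigInt(bytes) {
  return bytes.length === 0 ? 0n : BigInt('0x' + bytes.toString('hex'));
}

// Map the decoded list onto the four account fields, in their stored order.
function decodeAccount(raw) {
  const [nonce, balance, storageRoot, codeHash] = decodeItem(raw, 0).value;
  return {
    nonce: toBigInt(nonce),
    balance: toBigInt(balance),
    storageRoot: '0x' + storageRoot.toString('hex'),
    codeHash: '0x' + codeHash.toString('hex'),
  };
}

// Hypothetical account, encoded by hand for this example.
const sampleAccount = Buffer.from(
  'f84c' +                // list with a 76-byte payload
  '01' +                  // nonce = 1
  '880de0b6b3a7640000' +  // balance = 10^18 wei
  'a056e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421' + // empty trie root
  'a0c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470',  // hash of empty code
  'hex'
);

console.log(decodeAccount(sampleAccount));
```

The storageRoot and codeHash in this sample are the well-known values for an empty storage trie and empty code, which is what a plain account without a contract carries.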

Conclusion

This is not the end of this series; we will have something about the Pruning Tree. But it will probably be shared in the future, because my knowledge about it is still limited.

😱

Maybe you are sad to hear that :)))

Phan Sơn Tự
Coinmonks

A lucky guy was born in the Age of Cryptocurrency Boom