Data structure in Ethereum | Episode 4: Diving into examples.
It has been a long journey from the first episodes to here. We now have everything we need, and this final pass should make clear how Ethereum data is organized in practice.
In my opinion, working through examples is the best way to get deep into any problem. I hope this one helps you understand the data structures in Ethereum clearly.
If you don’t want to get your hands dirty, you can clone my git repository (develop branch) for convenience.
Some specs:
- Full node: Geth
- Network: Ropsten testnet
- Programming language: Javascript/NodeJS
- Subject of the study: state trie (stateRoot)
Geth
Following this link, you will learn how to set up geth as a full node on your computer.
To start a full sync of the Ropsten testnet and open the RPC interface, you can use this command:
geth --testnet --datadir "~/Library/Ethereum/ropsten" --rpc --rpcapi "eth,net,personal,web3" --rpcaddr "0.0.0.0" --rpccorsdomain "*" --ws --wsapi "eth,net,personal,web3" --wsorigins "0.0.0.0"
Remember to change the path after the --datadir flag to a location that exists on your machine.
Because we run geth in full sync mode, geth needs time to sync the entire blockchain (quite a long time 😂, around 3 days in my case). When you see logs like that, I’m pretty sure it is done.
Web3 — Testing geth
Refer to this link for web3:
First of all, we need to create a NodeJS project and then install the web3 package.
Try running the getStateRoot function with a blockNumber close to the newest block on Ropsten. We avoid the very newest block because the node may not have synced it yet, so a slightly older block is the safer choice. I chose 2596315; the latest block will be different by the time you read this, so pick accordingly.
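The getStateRoot function itself is not reproduced here, so the following is only a minimal sketch of what such a helper might look like. It assumes web3 1.x, where `web3.eth.getBlock` returns a promise for a block object that includes a `stateRoot` field. The endpoint `http://localhost:8545` in the comment is geth's default RPC address and is an assumption about your setup.

```javascript
// Sketch only: `web3` is an instance you create yourself, for example:
//   const Web3 = require('web3');
//   const web3 = new Web3('http://localhost:8545'); // assumed RPC endpoint
async function getStateRoot(web3, blockNumber) {
  const block = await web3.eth.getBlock(blockNumber);
  if (!block) {
    // No block object came back, so the node probably has not synced this far.
    throw new Error(`Block ${blockNumber} not found; is the node fully synced?`);
  }
  return block.stateRoot;
}

// Usage:
//   getStateRoot(web3, 2596315).then((root) => console.log('stateRoot:', root));
```

Passing the web3 instance in as a parameter keeps the helper independent of how the connection is configured.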
My result of running getStateRoot(2596315):
stateRoot: 0x1a63facb2a82966504a643f7c6cce28ddb47ea056b02009975c665bdada64c81
At this point, we can be sure that our full node works perfectly.
levelDB
One thing to be careful about with levelDB is that it allows only one connection at a time. Thus, we need to stop geth after the full sync before moving on to the next steps.
To create a connection from NodeJS, we will use two packages, levelup and leveldown. So please install levelup and leveldown; the path module we also use ships with NodeJS and does not need installing.
Create a connection:
Here, I connect to my levelDB using the path to my chaindata folder (this path depends on the --datadir you passed when starting geth). I then make the connection global so it can be used afterward.
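As a rough sketch of such a connection, assuming levelup 2.x or later (where the store is passed in as `levelup(leveldown(location))`) and assuming geth keeps its database under `<datadir>/geth/chaindata`. The helper names `chainDataPath` and `openChainData` are made up for illustration.

```javascript
const os = require('os');
const path = require('path');

// Resolve the chaindata folder for a given geth --datadir.
// Node does not expand "~", so a leading "~" is replaced with the home directory.
function chainDataPath(dataDir) {
  const expanded = dataDir === '~' || dataDir.startsWith('~/')
    ? path.join(os.homedir(), dataDir.slice(1))
    : dataDir;
  // Assumption: geth keeps its LevelDB under <datadir>/geth/chaindata.
  return path.resolve(expanded, 'geth', 'chaindata');
}

// Open the database. The packages are required lazily so that the path
// helper above can be used without them installed.
function openChainData(dataDir) {
  const levelup = require('levelup');
  const leveldown = require('leveldown');
  return levelup(leveldown(chainDataPath(dataDir)));
}

// Usage (geth must be stopped first):
//   global.db = openChainData('~/Library/Ethereum/ropsten');
```

If opening fails with a lock error, the most likely cause is that geth is still running against the same folder.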
First diving into database
In the web3 section above, I got the stateRoot of block number 2596315. Because it came from the node through web3, we can treat that result as correct.
Now, we warm up by reading the stateRoot from the block header of a specific block number directly from the database, and then compare it to that earlier result.
Please install the ethereumjs-block module first; we need it to parse block data.
Source code:
** About the utils library: please take a look at my repo for the source code. The path is ./libs/utils.
First, we left-pad the hex form of 2596315 with 0 until the total length is 16 hex digits (8 bytes). Notice that everything we do here is in hex.
hexBlockNumber = 00 00 00 00 00 27 9d db
Geth uses h as the prefix and n as the suffix.
prefix = 68
suffix = 6e
And then, we concatenate all of them in sequence.
keyString = prefix + hexBlockNumber + suffix = 68 00 00 00 00 00 27 9d db 6e
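The key construction above can be sketched with plain NodeJS Buffers. The function names are hypothetical. The second helper, `headerKey`, reflects my understanding of geth's database schema: the prefix + number + suffix key maps to the canonical block hash, and the header itself is stored under prefix + number + hash. Treat that as an assumption to check against your geth version.

```javascript
// Prefix and suffix bytes as described in the text: 'h' = 0x68, 'n' = 0x6e.
const HEADER_PREFIX = Buffer.from('h');
const NUM_SUFFIX = Buffer.from('n');

// Encode a block number as 8 big-endian bytes (16 hex digits, left-padded with zeros).
function encodeBlockNumber(blockNumber) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64BE(BigInt(blockNumber));
  return buf;
}

// prefix + number + suffix: the key built in the text above.
function headerHashKey(blockNumber) {
  return Buffer.concat([HEADER_PREFIX, encodeBlockNumber(blockNumber), NUM_SUFFIX]);
}

// prefix + number + hash: assumed key of the RLP-encoded block header.
function headerKey(blockNumber, blockHash) {
  return Buffer.concat([HEADER_PREFIX, encodeBlockNumber(blockNumber), blockHash]);
}

console.log(headerHashKey(2596315).toString('hex'));
// → 680000000000279ddb6e
```

With a levelup handle `db`, the idea would be to call `db.get(headerHashKey(n))` to obtain the 32-byte block hash, then `db.get(headerKey(n, hash))` to obtain the encoded header, whose fourth field is the stateRoot.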
Here is the result:
As we can see, the final result is the same as the result in the web3 part.
Congratulations!!! We have taken our first dive into the real 💩
Get deeper
We will use the merkle-patricia-tree and rlp modules, so let’s install them.
Now, we create a trie library that takes an Ethereum address and reads all of the info saved for it in the state trie.
Focusing on the getInfoByAddress function: we use merkle-patricia-tree to create a trie from the given root, then look up the data for an address in this trie. Remember that all data was encoded with rlp before being saved, so we need to decode it to read it.
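A lookup along these lines might be sketched as follows. This is not the original getInfoByAddress. It assumes the callback-style `trie.get(key, cb)` of merkle-patricia-tree 2.x and its secure variant (`merkle-patricia-tree/secure`), which hashes the address before the lookup. The trie and the decode function are passed in as parameters, so the setup lines in the comment are assumptions about your installed versions.

```javascript
// Sketch only. `trie` is assumed to be a secure trie built from the state root:
//   const SecureTrie = require('merkle-patricia-tree/secure');
//   const rlp = require('rlp');
//   const trie = new SecureTrie(db, Buffer.from(stateRootHex, 'hex'));
// `decode` is assumed to be rlp.decode.
function getInfoByAddress(trie, decode, address) {
  // Accept "0x..." strings as well as raw 20-byte Buffers.
  const key = Buffer.isBuffer(address)
    ? address
    : Buffer.from(address.replace(/^0x/, ''), 'hex');

  return new Promise((resolve, reject) => {
    trie.get(key, (err, raw) => {
      if (err) return reject(err);
      if (!raw) return resolve(null); // nothing stored for this address
      resolve(decode(raw)); // list of byte strings, one per account field
    });
  });
}

// Usage:
//   getInfoByAddress(trie, rlp.decode, '0x...').then(console.log);
```

With newer, promise-based releases of merkle-patricia-tree the callback wrapper would not be needed, so check which API your version exposes.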
This is the complete example:
The result:
The data for an address contains 4 fields. In order, they are nonce, balance, storageRoot and codeHash.
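To make the field order concrete, here is a small sketch that names the four decoded fields. `parseAccount` is a hypothetical helper, and its input is assumed to be the array of four Buffers that an RLP decoder returns for an account. The sample values are placeholders, not real chain data.

```javascript
// Map the four RLP-decoded account fields, in order, to named properties.
function parseAccount(fields) {
  if (!Array.isArray(fields) || fields.length !== 4) {
    throw new Error('Expected 4 account fields');
  }
  const [nonce, balance, storageRoot, codeHash] = fields;
  // RLP stores integers as big-endian bytes without leading zeros; zero is the empty string.
  const toBigInt = (buf) => (buf.length === 0 ? 0n : BigInt('0x' + buf.toString('hex')));
  return {
    nonce: toBigInt(nonce),
    balance: toBigInt(balance), // in wei
    storageRoot: '0x' + storageRoot.toString('hex'),
    codeHash: '0x' + codeHash.toString('hex'),
  };
}

const example = parseAccount([
  Buffer.from('05', 'hex'),
  Buffer.from('0de0b6b3a7640000', 'hex'), // 10^18 wei = 1 ether
  Buffer.alloc(32, 0xaa), // placeholder hashes, not real data
  Buffer.alloc(32, 0xbb),
]);
console.log(example.nonce, example.balance);
// → 5n 1000000000000000000n
```

BigInt is used because balances in wei routinely exceed the range that a JavaScript number can represent exactly.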
Conclusion
This is not the end of the series; there is still the topic of tree pruning to cover. But that may come later, because I don’t know enough about it yet.
Maybe you are sad to hear that :)))