Making Virtual Machine Integrity Checks Efficient
Is the latest Aergo Virtual Machine OK?
Smart contracts on the Aergo blockchain are written in Lua. The Aergo Virtual Machine (AVM), a modified version of LuaJIT, reads and translates smart contract code written in Lua once it is compiled and deployed. This means that Lua smart contracts can run on any computer that has the AVM.
The AVM is continuously iterated and improved upon by our VM developers. From time to time, the team checks whether any new modification to the VM has implicitly introduced bugs into the system. Until recently, the only way to do this was to fully synchronize a node with the mainnet.
As the Aergo mainnet has been live for over two months, its block height has already surpassed 4.5 million blocks. It now takes about 5 hours on average to get a new node fully synchronized from block 0 (the genesis block). This means it takes about 5 hours to conduct a simple VM integrity check! That’s like using a sledgehammer to crack a nut. Five hours is far too long.
After discussion and testing, the team figured out a way in which we can optimize the process of assessing the integrity of the VM with some simple code tweaking. This article describes what we’ve done.
First, let’s review what’s happening during block synchronization and what slows down the process. Roughly speaking, block synchronization proceeds like this:
- A block is sent to the Chain Service¹ (CS) via the P2P Service² from the network.
- The CS verifies it along with its constituent transactions and then, if valid, saves it to a storage device (disk).
- The transactions are executed.
- The results are checked against the corresponding fields of the block header.
- All the execution results are committed to a storage device.
A trie records all the changes made by every block execution. Once the changes are applied to a trie, they are never deleted. Every modification on a trie inevitably affects the content of the root node, which in turn mutates the state root hash. Additionally, the execution state of every transaction in a block is also saved as a receipt. From the receipts of each block, a Merkle tree is built and its corresponding hash (Merkle root hash) is subsequently put into the block header.
So the results mentioned in step 4 above are these two kinds of hashes:
- State Root Hash (SRH)
- Receipts Merkle Root Hash (RMRH)
Both are included in the block header when a block is generated.
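To make the receipts Merkle root more concrete, here is a minimal sketch in Python. It is not Aergo's actual implementation: the hash function (SHA-256), the receipt encoding, the rule of duplicating the last node on odd levels, and the names `sha256` and `merkle_root` are all assumptions chosen for illustration. It only shows the property that matters here: changing a single receipt changes the root hash.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    """Hash helper (assumption: the real chain may use a different hash)."""
    return hashlib.sha256(data).digest()


def merkle_root(receipts: List[bytes]) -> bytes:
    """Build a binary Merkle tree over serialized receipts and return its root."""
    if not receipts:
        return sha256(b"")
    level = [sha256(r) for r in receipts]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        # Hash each adjacent pair to form the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical receipts: one per transaction in a block.
receipts = [b"tx1:SUCCESS", b"tx2:SUCCESS", b"tx3:ERROR"]
root = merkle_root(receipts)

# A VM that produces a different result for tx2 ends up with a different root.
tampered = [b"tx1:SUCCESS", b"tx2:ERROR", b"tx3:ERROR"]
assert merkle_root(tampered) != root
```

The state root hash behaves the same way with respect to the state trie: any difference in the stored state shows up as a different root.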
Every VM must produce the same SRH and RMRH as the results of execution for a block. If two VMs generate different SRHs (or RMRHs), it indicates that one of them did not follow the protocol. Thus, only the 3rd and 4th steps are indispensable for a VM integrity check. In practice, however, most of the time is consumed by the other steps.
Why? Because they include slow tasks like disk I/O, requests and replies over the network, IPC between various modules, etc. For example, synchronizing one block triggers at least 1 disk read (on a remote node) and 2 disk writes (on the current node), even when only the I/O for the block itself is considered.³ Among those I/O operations, the disk writes are completely unnecessary for a VM integrity check.
So what should we do to get a fast VM integrity checker? We must do only what we really need to do: execution (the 3rd step) and verification (the 4th step).
For an Efficient VM Integrity Checker
To check a virtual machine’s integrity, we need a blockchain that is fully synchronized with the mainnet. Hence, the I/O operations are inevitable. But that doesn’t mean we have to perform them during a VM integrity check.
If we keep a full node synchronized ahead of time and check VM integrity only when necessary, we can skip those long, dragging operations. The former part (maintaining a full node) does not require any coding. The remaining part is processed as follows:
- Read block number 1 from pre-synchronized blockchain data.
- Execute its transactions.
- Check the resulting SRH and RMRH against those in the block header without disk writes.
- Repeat the above steps 1 ~ 3 for all the subsequent blocks of the blockchain.
In other words, we need to develop a VM integrity checker that operates on pre-synchronized blockchain data. By using it, we can completely eliminate the slow procedures, including network communication and disk writes, from the VM integrity check itself.
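The replay loop described above can be sketched roughly as follows. This is a toy model in Python, not the Aergo code: the `Block` layout, the transaction format (account, amount), the stand-in hash functions `state_root` and `receipts_root`, and the idea of passing the VM in as a plain function are all assumptions made for illustration. The point is the shape of the loop: read a stored block, execute it in memory, compare the two hashes with the header, and never write to disk.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

VM = Callable[[int, int], int]  # hypothetical VM: (balance, amount) -> new balance


@dataclass(frozen=True)
class Block:
    number: int
    txs: Tuple[Tuple[str, int], ...]  # hypothetical tx: (account, amount)
    state_root: bytes                 # SRH recorded in the block header
    receipts_root: bytes              # RMRH recorded in the block header


def state_root(state: Dict[str, int]) -> bytes:
    # Stand-in for a trie root: hash of the sorted state entries.
    return hashlib.sha256(repr(sorted(state.items())).encode()).digest()


def receipts_root(receipts: List[str]) -> bytes:
    # Stand-in for a Merkle root over the receipts.
    return hashlib.sha256("|".join(receipts).encode()).digest()


def execute(state: Dict[str, int], txs, vm: VM) -> Tuple[bytes, bytes]:
    """Run the transactions against the in-memory state; return (SRH, RMRH)."""
    receipts = []
    for account, amount in txs:
        state[account] = vm(state.get(account, 0), amount)
        receipts.append(f"{account}:{state[account]}")
    return state_root(state), receipts_root(receipts)


def check_vm(blocks: Iterable[Block], vm: VM) -> Optional[int]:
    """Replay blocks in order; return the first mismatching block number, or None."""
    state: Dict[str, int] = {}  # kept in memory only, nothing is written to disk
    for block in blocks:
        srh, rmrh = execute(state, block.txs, vm)
        if srh != block.state_root or rmrh != block.receipts_root:
            return block.number
    return None


def build_chain(tx_batches, vm: VM) -> List[Block]:
    """Produce a toy 'pre-synchronized' chain using a reference VM."""
    state: Dict[str, int] = {}
    chain = []
    for number, txs in enumerate(tx_batches, start=1):
        srh, rmrh = execute(state, txs, vm)
        chain.append(Block(number, tuple(txs), srh, rmrh))
    return chain


def reference_vm(balance: int, amount: int) -> int:
    return balance + amount


def buggy_vm(balance: int, amount: int) -> int:
    # A modified VM with a bug that only shows up for one specific input.
    return balance + amount + (1 if amount == 7 else 0)


chain = build_chain([[("alice", 5)], [("bob", 7)], [("alice", 2)]], reference_vm)
print(check_vm(chain, reference_vm))  # no mismatch
print(check_vm(chain, buggy_vm))      # first block where the hashes diverge
```

In this toy run the reference VM passes every block, while the buggy VM is caught at the first block whose execution result differs from the header. The same loop also flags a stored block whose transactions no longer match its header hashes.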
So, what happened? The code modification itself was completed in under 6 hours. After this change, the whole VM integrity check on pre-synchronized blockchain data takes just about 8 minutes. That’s only 2.7% of the full synchronization time (8 minutes out of roughly 300). So now we can work much more efficiently. :-) Furthermore, since this feature is essentially a fast replay and verification of a blockchain, it can even be used to detect data counterfeiting.
¹ This is a software layer responsible for blockchain management.
² This is a software layer responsible for P2P communication with the other nodes.
³ The actual number of I/O operations is higher, since the I/O for every block is accompanied by I/O for some associated meta information.