MetaMask Labs presents Mustekala — the Light Client that seeds data

Last week, MetaMask Labs proudly announced its work on a Light Client solution! We presented it at Devcon4 in Prague, Czech Republic, as well as at Hi-Ether Con in Tokyo, Japan!

The MetaMask Labs team with the organizers of Hi-Ether
The MetaMask Labs team presenting at Devcon4

Light Client vs Full Node

A node is a device that connects to a blockchain network; any computer connected to the Ethereum network can be called a node. A Full Node verifies and enforces all of the rules of the blockchain, while a Light Client relies on a trusted Full Node's copy of the blockchain. This matters because a Light Client lets you interact with the blockchain without downloading an entire copy of it.

Running a Full Node is hard

Interacting with the Ethereum blockchain requires access to a node to retrieve the latest data. The blockchain is huge, and running a Full Node is very resource intensive. It already requires a lot of memory and storage, and the blockchain keeps growing by the minute. The average user does not have gigabytes of memory and storage to spare.

A Light Client performs as little verification as possible while remaining secure and fully capable of interacting with DApps and using the Ethereum network. Light Clients do not store an entire copy of the blockchain, only the specific elements they need to operate, such as the data required to send and receive funds. Consequently, they must connect to and query Full Nodes for everything else.

Verification

When you sync a Full Node to the blockchain, you verify the proof-of-work consensus: the solutions to difficult puzzles, one per block, whose difficulty is adjusted up or down to keep the block time consistent. Syncing takes a long time because you have to start at the genesis block and verify every block after it, one by one, millions of times over. Each subsequent block carries the accumulated state changes and transitions up to that point. In our experience, syncing a Full Node can take anywhere from a couple of hours to a few days. It requires a significant amount of disk I/O and, at the very least, a solid-state drive. Even if your implementation starts from a checkpoint block, it is still difficult to discover useful p2p nodes that you can fetch data from. Syncing state takes a long time because of network issues and because every block must be processed to reconstruct the state. For these and other reasons, users are forced to rely on centralized infrastructure.
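To make the "one block after another" point concrete, here is a toy sketch of sequential header verification. It assumes a simplified proof of work: SHA-256 instead of Ethereum's Ethash, a fixed difficulty expressed as a count of leading zero hex digits (Ethereum adjusts difficulty, this sketch does not), and hypothetical names such as `Header`, `mine`, and `verify_chain`. It only illustrates why each block can be checked solely against its already-verified parent.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: str
    nonce: int
    difficulty: int  # toy rule: required count of leading zero hex digits

    def block_hash(self) -> str:
        # SHA-256 stands in for Ethereum's real proof-of-work hash (Ethash).
        data = f"{self.number}:{self.parent_hash}:{self.nonce}:{self.difficulty}"
        return hashlib.sha256(data.encode()).hexdigest()


def meets_difficulty(header: Header) -> bool:
    return header.block_hash().startswith("0" * header.difficulty)


def mine(number: int, parent_hash: str, difficulty: int) -> Header:
    # Try nonces until the header hash satisfies the difficulty target.
    nonce = 0
    while True:
        header = Header(number, parent_hash, nonce, difficulty)
        if meets_difficulty(header):
            return header
        nonce += 1


def build_chain(length: int, difficulty: int = 2) -> list:
    chain = [Header(0, "0" * 64, 0, 0)]  # hypothetical genesis header
    for number in range(1, length):
        chain.append(mine(number, chain[-1].block_hash(), difficulty))
    return chain


def verify_chain(chain: list) -> bool:
    # Walk forward from genesis: every block is checked against its parent,
    # so no block can be trusted before all the blocks below it are verified.
    for parent, child in zip(chain, chain[1:]):
        if child.number != parent.number + 1:
            return False
        if child.parent_hash != parent.block_hash():
            return False
        if not meets_difficulty(child):
            return False
    return True


chain = build_chain(5)
print(verify_chain(chain))  # True
```

Changing any earlier header changes its hash, which breaks the `parent_hash` link of the block after it; that dependency chain is what forces a fresh node to replay history from the start.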

Full Nodes and Light Clients both sync blocks. However, Full Nodes hold the whole recent state (everyone's account balances and storage), whereas Light Clients hold only the portions of the state for the accounts and transactions they care about. Full Nodes validate state transitions using the state they currently hold: when a new block arrives, its transactions are applied to that state to produce a new state.
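A minimal sketch of that state transition, assuming a hypothetical flat dictionary of balances instead of Ethereum's real state trie, and a SHA-256 digest over the whole sorted state as a stand-in for the state root. The names `apply_block` and `state_digest` are illustrative only.

```python
import hashlib
import json


def apply_block(state: dict, txs: list) -> dict:
    """Apply (sender, recipient, amount) transfers to a copy of the state."""
    new_state = dict(state)
    for sender, recipient, amount in txs:
        balance = new_state.get(sender, 0)
        if balance < amount:
            raise ValueError(f"{sender} cannot send {amount}; balance is {balance}")
        new_state[sender] = balance - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def state_digest(state: dict) -> str:
    # Stand-in for a state root: one hash committing to the entire state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


full_state = {"alice": 100, "bob": 50, "carol": 7}
block = [("alice", "bob", 30), ("carol", "dave", 7)]
next_state = apply_block(full_state, block)
print(next_state)  # {'alice': 70, 'bob': 80, 'carol': 0, 'dave': 7}

# A Light Client that only tracks "alice" holds a fragment of the state,
# so it cannot recompute the digest a Full Node derives from the whole state.
light_view = {"alice": next_state["alice"]}
print(state_digest(light_view) == state_digest(next_state))  # False
```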

Simply put, a Full Node makes sure everything matches up and is valid. A Light Client, by contrast, cannot perform the same validation because it lacks the recent state. However, assuming it has a reliable source for the latest block header, it can ask Full Nodes for elements of the state, such as an account balance or a smart contract, and check each answer against the state root contained in that header.
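Here is a simplified sketch of how a Light Client can check an answer against a trusted root. Ethereum actually uses a Modified Merkle Patricia Trie hashed with Keccak-256; this sketch swaps in a plain binary Merkle tree with SHA-256, and the leaf encoding (`b"bob:50"`) and function names are hypothetical. The Full Node supplies the value plus the sibling hashes along its path; the Light Client rehashes up to the root and compares.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list) -> list:
    """Return every level of a binary Merkle tree, from hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # the final level holds only the root
    return levels


def prove(levels: list, index: int) -> list:
    """Full Node side: collect the sibling hashes on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, trusted_root: bytes) -> bool:
    """Light Client side: recompute the root from the leaf and compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == trusted_root


leaves = [b"alice:100", b"bob:50", b"carol:7"]
levels = build_levels(leaves)
root = levels[-1][0]      # the Light Client takes this from a trusted header
proof = prove(levels, 1)  # the Full Node answers a query for "bob"
print(verify(b"bob:50", proof, root))    # True
print(verify(b"bob:5000", proof, root))  # False
```

The proof grows with the depth of the tree rather than its size, which is why a Light Client can verify individual state elements without holding the rest of the state.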

We need options for access

Most applications access blockchain data through a small handful of providers, creating silos of information where data is redundant but not highly available. Left unchecked, this trend will keep pushing toward centralization. It requires a great deal of trust, and it is not the peer-to-peer experience that Ethereum embraces. On top of that, these providers bear the cost of maintaining this distribution of information.

Light clients sound great — what’s the problem?

There are too many Light Clients per Full Node. Let's say there are 10,000 discoverable, online, active Ethereum Full Nodes. If we turned all MetaMask users (about 1.5 million) into Light Clients, they would overwhelm those Full Nodes very quickly. The Merkle tree that holds all accounts and token balances on Ethereum has over 200 million tree nodes at any given block, and every new block adds thousands of new ones that have to be reconstructed and shared. The volume would be like halftime at a World Cup concession stand: the Full Nodes are the people serving, and the Light Clients are the sea of people eager to request and get what they want. It would be madness, and the Full Nodes would not be able to keep up with demand.
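The back-of-the-envelope ratio behind that claim, using the figures in the paragraph above (about 1.5 million users and 10,000 Full Nodes). The function name is hypothetical, and the calculation assumes a perfectly even spread, which is the best case.

```python
def light_clients_per_full_node(light_clients: int, full_nodes: int) -> float:
    # Average load assuming clients are spread evenly; any imbalance
    # leaves the busiest Full Nodes serving more than this.
    return light_clients / full_nodes


print(light_clients_per_full_node(1_500_000, 10_000))  # 150.0
```

Even in that best case, each Full Node would have to keep roughly 150 Light Clients up to date with every new block, on top of its own work.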

Enter Mustekala — the Light Client that seeds data!

As it stands today, Light Clients are leechers of data. But what if Light Clients could also share the data they have? They don't have all the data a Full Node has, but what if they could share what they do have with other Light Clients? This would take load off Full Nodes while also making information readily available to everyone.

MetaMask's solution is to take and share slices of the Merkle tree. A slice is essentially a neighborhood around your account that also includes other accounts. The same slicing principle applies to smart contract data: token holders care about their own key-value entries in a token contract's storage, and those entries can be referenced, and therefore shared, using the same slice taxonomy. This would reduce what has to be shared from roughly 200 million individual tree nodes to about 65 thousand slices. Each slice is about the size of a picture (128KB), which is manageable compared to the multiple GBs of storage a Full Node requires.
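One way to picture slicing, as a hypothetical sketch: treat a slice as the subtree whose trie paths share the same first few hex digits (nibbles). With a depth of 4 there are 16^4 = 65,536 slices, which lines up with the "about 65 thousand" figure above, though the exact depth and naming Mustekala uses are assumptions here. Ethereum keys accounts by `keccak256(address)`; Python's `hashlib.sha3_256` uses different padding and is only a stand-in. `SLICE_DEPTH`, `slice_id`, and `group_by_slice` are illustrative names.

```python
import hashlib
from collections import defaultdict

SLICE_DEPTH = 4  # hypothetical: number of leading nibbles that name a slice


def trie_path(address: bytes) -> str:
    # Stand-in only: Ethereum uses keccak256(address), and sha3_256 is NOT keccak256.
    return hashlib.sha3_256(address).hexdigest()


def slice_id(address: bytes, depth: int = SLICE_DEPTH) -> str:
    # Accounts whose paths share the first `depth` nibbles land in the same slice.
    return trie_path(address)[:depth]


def slice_count(depth: int = SLICE_DEPTH) -> int:
    return 16 ** depth


def group_by_slice(addresses: list, depth: int = SLICE_DEPTH) -> dict:
    """Bucket accounts into the neighborhoods (slices) they belong to."""
    groups = defaultdict(list)
    for address in addresses:
        groups[slice_id(address, depth)].append(address)
    return dict(groups)


print(slice_count())  # 65536
```

A client interested in one account would fetch and track the whole slice containing it, so everyone interested in nearby accounts ends up holding, and able to share, the same unit of data.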

So now we have the idea that Light Clients are going to serve information to each other. What does that look like? Each dot represents a computer, and the lines represent connections to other computers. Eventually you end up with a large, well-connected graph. In a p2p network, a great way to distribute information is gossiping: taking information that is shared with you and passing it along to your neighbors. This works especially well when everyone is interested in the data being shared. For instance, there is high demand for information on the latest block.
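A minimal sketch of gossiping in its simplest form, flooding: every peer that hears a message for the first time passes it to all of its neighbors once and drops duplicates. Production protocols such as libp2p's gossipsub forward more selectively; the graph and the `gossip` function here are hypothetical.

```python
from collections import deque


def gossip(graph: dict, origin: str):
    """Flood a message from `origin`: every peer that receives it for the first
    time passes it along to all of its neighbors exactly once."""
    received = {origin}
    queue = deque([origin])
    messages_sent = 0
    while queue:
        peer = queue.popleft()
        for neighbor in graph[peer]:
            messages_sent += 1
            if neighbor not in received:  # duplicates are dropped, not re-forwarded
                received.add(neighbor)
                queue.append(neighbor)
    return received, messages_sent


# A ring of six peers plus one peer ("g") with no connections.
ring = {
    "a": ["b", "f"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e", "a"], "g": [],
}
reached, sent = gossip(ring, "a")
print(sorted(reached), sent)  # ['a', 'b', 'c', 'd', 'e', 'f'] 12
```

Every connected peer receives the message without anyone contacting a central server, at the cost of some redundant sends; the unconnected peer "g" never hears it, which is why a well-connected graph matters.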

A p2p network

So what would it look like if we gossiped slices of the blockchain instead? To look up an account or token balance, you need the slice associated with it, so you have to find others in the network who are interested in that same slice. Together you can form a p2p neighborhood, and the latest version of the slice can be passed around within it.
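A hypothetical sketch of how such a slice neighborhood could behave: peers subscribe to the slice IDs they care about, each update carries the block number it belongs to, and a peer accepts and re-forwards an update only if it is subscribed to that slice and the update is newer than what it already holds. The function name, the subscription map, and using a block number as the slice version are assumptions for illustration, not Kitsunet's actual protocol.

```python
from collections import deque


def gossip_slice_update(graph, subscriptions, stores, origin, slice_id, block_number):
    """Propagate a slice update among peers subscribed to `slice_id`.
    `stores[peer][slice_id]` records the newest block number a peer holds."""

    def accept(peer):
        held = stores[peer].get(slice_id, -1)
        if block_number > held:  # only newer versions replace older ones
            stores[peer][slice_id] = block_number
            return True
        return False

    if slice_id not in subscriptions[origin] or not accept(origin):
        return set()
    updated = {origin}
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        for neighbor in graph[peer]:
            # Uninterested peers are skipped; stale or repeated updates stop spreading.
            if slice_id in subscriptions[neighbor] and accept(neighbor):
                updated.add(neighbor)
                queue.append(neighbor)
    return updated


graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
subscriptions = {"a": {"0a1f"}, "b": {"0a1f"}, "c": {"beef"}, "d": {"0a1f"}}
stores = {peer: {} for peer in graph}
print(sorted(gossip_slice_update(graph, subscriptions, stores, "a", "0a1f", 100)))
# ['a', 'b', 'd']
```

Peer "c" is connected but not interested in slice `0a1f`, so it neither stores nor relays it; an older update (say, for block 99) would be rejected at the first peer that already holds block 100.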

Gossiping slices

How do we plan to do this?

Mustekala takes an alternative approach to the Light Client by using the libp2p network stack, the same technology that powers IPFS (InterPlanetary File System), to make Ethereum data available as ultra-light elements that can be retrieved based on their content, not their location. This is made possible by a p2p overlay network we created called "Kitsunet". Your MetaMask client can be one of many participants in a massive mesh that helps make blockchain data available to all types of devices: phones, browsers, IoT hardware, and even miners. The possibilities have expanded in exciting ways!
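A minimal sketch of what "retrieved based on content, not location" means in practice: data is addressed by its own hash, so a client can ask any peer for it and verify the answer itself. The plain SHA-256 hex digest is a simplification (real IPFS CIDs also encode the hash function and data format), and `address_of` and `fetch` are hypothetical helpers, not libp2p or Kitsunet APIs.

```python
import hashlib
from typing import Optional


def address_of(data: bytes) -> str:
    # Simplified content address: a plain SHA-256 hex digest.
    return hashlib.sha256(data).hexdigest()


def fetch(address: str, peers: list) -> Optional[bytes]:
    """Ask peers in turn and accept the first response whose hash matches its
    address. Who answers does not matter, because the content proves itself."""
    for peer_store in peers:
        data = peer_store.get(address)
        if data is not None and address_of(data) == address:
            return data
    return None


slice_data = b"hypothetical slice bytes"
addr = address_of(slice_data)
honest_peer = {addr: slice_data}
lying_peer = {addr: b"forged bytes"}
print(fetch(addr, [lying_peer, honest_peer]) == slice_data)  # True
print(fetch(addr, [lying_peer]))                              # None
```

Because verification is built into the address, any Light Client, not just a trusted Full Node, can safely serve the data.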


Conclusion

Users depend on MetaMask to interact with the Ethereum blockchain, from DApps to tokens to transferring ETH directly. But MetaMask and other tools currently rely on centralized infrastructure. This was instrumental in bootstrapping the Ethereum ecosystem, but it breaks the premise of decentralization, and we can now do better.

The goals of the Mustekala Light Client are to:

  • Have native browser support for ease of use.
  • Be fast: ready to use in seconds.
  • Seed its data to other users for a better-distributed network.

Learn more about Mustekala on GitHub

Stay up to date on Twitter

Thanks for reading and stay tuned!
 @JSONLEE3