By now probably everyone has heard about sharding. In very simple terms sharding is a mandatory approach for scalability of blockchains which want to be more than just settlement layers.
Sharding was not invented for blockchain, it has been around for a very long time mostly for databases or data storage. It’s simply a way to store data across multiple machines. Basically every machine will have a “shard” (basically a piece) of the data.
This is all very easy to implement when you yourself control all the machines. If you have a data center and you must store a very large database (think Google or Facebook) the way to do this is pretty straightforward.
When you work with decentralized blockchain technology however you always have to assume that you will have some malicious nodes. This requires the development of a consensus algorithm. The issue is that it’s more tricky to develop a consensus algorithm for a sharding based blockchain than for a simple blockchain where every node stores all the transactions.
At the moment ethereum is trying to build a fully complete sharding mechanism. The problem with that however is that when you split the data you basically do it in a way that is almost random. This means that most of those shards will need to communicate extremely frequently with each other.
Let’s take an example, nowadays people don’t need to cross the border with other countries very frequently. If they did it would create disastrous queues and delays at the customs. But imagine now that for example the border between US and Canada is changed so that the line between the two countries is in the middle of Manhattan in New York. How many people will need to cross that line every single day? Potentially millions.
This metaphor seems crazy but it’s the major issue with sharding. If you split the data in a way that is not based on a specific logic (which would be almost impossible as every smart contract has its own logic) then basically you will have an extremely high amount of intercommunication between those different nodes. To make things worse those nodes are all over the world and connected by Internet and not in the same room like it would be for at the data center storing a large database. In some way the current model of sharding has to make a trade-off and in order to reduce how much data a single note has to store it has to increase the amount of communication between nodes very significantly.
Zilliqa is however going to use a different approach. They are making a very clear distinction between the state and the transaction history. Let me try to explain this concept in a simple way with a metaphor:
The state is what is now, and it is created by the history of all the transactions:
So basically how much you have in your bank account is the current state. If you have a lot of money that is your current state. If you don’t have anything in your bank account that is your current state. The transactions are what have made you reach your current state. If historically you have been earning more than what you have been spending then your current state is that you have money, if historically you have been spending more than what you have been earning then your current state is that you don’t have money.
The principle is that it takes a lot more space to store the history of the transactions than the space it takes to store your current balance (state).
The history of the transactions however is very important in case someone needs to check if the balance is correct. But as long as you know your balance you know how much you can spend. From time to time you may want to verify if your balance is correct by checking against all your transaction history, but that’s not something you will need to do every day.
The approach that Zilliqa has chosen with sharding is that every single node will have a copy of the current state (the bank balances in our example) but then the transactions history will be split in pieces so that not everyone will have to have a full copy of it.
This clever solution allows to fix the main issue with sharding that is the very high amount of intercommunication with notes that would be required.