Goodbye Blockchain, hello Datachain?

As you might have read, a few weeks ago I was surprised by Maidsafe and their current state of development. Maidsafe (a company led by David Irvine) has been working on a decentralized internet (the SAFE network) for over 10 years now. Although their “Alpha version” runs on dozens of nodes in a data center (droplets), it’s actually more decentralized than I expected it to be. I joined their TEST 12 network and dove into the inner workings of SAFE.

Since my last post I have read quite a few dev updates and blog posts with intriguing names like “Structuring Networks with XOR” and “Data chains: what? why? how?”. I also dove into the RFCs (requests for comments), which can be written by anyone. The idea is to let developers (both internal and external) propose software features and let the lead devs decide whether to implement them. When they don’t, everybody can still see the proposed idea and even fork the project to implement it themselves. It was David Irvine (CEO of Maidsafe) who came up with the idea of a “datachain”.

Blockchains and their limitations
As most people already know, every blockchain comes with its limitations. The idea is that all nodes check all transactions added to a chain, which adds security but limits speed. Most blockchains top out at only 7 to 15 transactions per second, so adding more features to them, like voting or requests for data chunks between nodes, will slow down the system immediately. Adding a few more blockchains is probably possible, but syncing 200 or 300 of them is quite unthinkable. The only solution I see is the addition of “lightning networks”, but these (although a great addition) are largely limited to transactions.

So what are these Datachains? It took me several reads of this piece by David Irvine to understand what he has come up with. In short: on the SAFE network, a group of nodes (let’s call them “Group 1”) is responsible for a range of addresses in the network. Let’s say Group 1 is responsible for address-range AAA to CCC. Someone uploads a chunk to the network with a hash that starts with the letters Ab. This means that the network will route this chunk to Group 1 and they will store it on several nodes. If someone else requests this chunk again, Group 1 will route it to the user.
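To make the routing idea concrete, here is a minimal sketch in Python. It is not Maidsafe's code: the real network structures its address space with XOR distance, while this toy version uses the plain address ranges from the example above. The names Group, chunk_address, route, put and get are all made up for illustration, and SHA-256 is an assumed hash function.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Group:
    """A group of nodes responsible for a range of 3-hex-digit address prefixes."""
    name: str
    low: str    # first prefix covered (inclusive), e.g. "AAA"
    high: str   # last prefix covered (inclusive), e.g. "CCC"
    chunks: dict = field(default_factory=dict)  # address -> chunk bytes

    def covers(self, address: str) -> bool:
        # Upper-case hex strings of equal length sort in numeric order.
        return self.low <= address[:3] <= self.high


def chunk_address(data: bytes) -> str:
    """Assumption: a chunk's address is the upper-case SHA-256 hex digest of its content."""
    return hashlib.sha256(data).hexdigest().upper()


def route(groups, address: str) -> Group:
    """Find the group whose address range contains the given address."""
    for group in groups:
        if group.covers(address):
            return group
    raise LookupError(f"no group covers address {address[:3]}...")


def put(groups, data: bytes) -> str:
    """Store a chunk with the responsible group and return its address."""
    address = chunk_address(data)
    route(groups, address).chunks[address] = data
    return address


def get(groups, address: str) -> bytes:
    """Fetch a chunk back from whichever group is responsible for it."""
    return route(groups, address).chunks[address]


# Three hypothetical groups that together cover the whole prefix space.
groups = [
    Group("Group 0", "000", "AA9"),
    Group("Group 1", "AAA", "CCC"),
    Group("Group 2", "CCD", "FFF"),
]
```

With these ranges, any address starting with AB falls between AAA and CCC, so route(groups, ...) hands it to Group 1, and a later get() for the same address ends up at the same group.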

This idea is quite easy to follow but comes with a trade-off. What if there’s an error in the network? A disconnection between millions of nodes, with all chunks gone? It’s like a torrent losing all its seeders at once. This is one aspect where datachains come in. Whenever Group 1 stores something, it’s registered in a datachain. A group could have several of them, and they can be updated as well. So Group 1 chains the hashes of all its chunks together and shares this “hash list” with other groups. Whenever there’s an update to an address (say a .doc file stored at that address got edited), they simply update the corresponding data hash in the datachain and agree on this update as a group (example: 66% needs to sign). Whenever there’s a big error in the network or a global shutdown of the internet, group members could re-join later and prove they are part of Group 1, as this proof is hashed in the datachain as well. And as all nodes of Group 1 share the same datachain, they could reconstruct all data in their group and bring it back up as it was before the crash. Other groups will do the same, as the Vaults make money when they “farm” (mine Safecoin). So there’s an incentive to secure data and always serve it to the network.
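Here is how I picture the “hash list” part, as a rough Python sketch. It rests on several assumptions: the classes Block and DataChain and their methods are invented for illustration, “signing” is reduced to listing member names instead of real cryptographic signatures, and the two-thirds threshold is only the example figure mentioned above. The actual design in the RFC is more involved.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Block:
    address: str         # network address the data lives at
    data_hash: str       # hash of the data currently stored there
    prev_hash: str       # hash of the previous block, which links the chain
    signers: frozenset   # names of the members that approved this block

    def block_hash(self) -> str:
        payload = "|".join(
            [self.prev_hash, self.address, self.data_hash, ",".join(sorted(self.signers))]
        )
        return sha256_hex(payload.encode())


class DataChain:
    GENESIS = "genesis"

    def __init__(self, members):
        self.members = set(members)
        self.blocks = []

    def append(self, address: str, data: bytes, signers) -> bool:
        """Record new or updated data if at least two thirds of the group signed."""
        valid = self.members & set(signers)       # ignore signers outside the group
        if len(valid) * 3 < len(self.members) * 2:
            return False                          # quorum not reached, nothing recorded
        prev = self.blocks[-1].block_hash() if self.blocks else self.GENESIS
        self.blocks.append(Block(address, sha256_hex(data), prev, frozenset(valid)))
        return True

    def current_hash(self, address: str):
        """Latest agreed hash for an address, or None if it was never stored."""
        for block in reversed(self.blocks):
            if block.address == address:
                return block.data_hash
        return None

    def verify(self) -> bool:
        """Check that every block still links to the hash of the one before it."""
        prev = self.GENESIS
        for block in self.blocks:
            if block.prev_hash != prev:
                return False
            prev = block.block_hash()
        return True

    def matches(self, address: str, data: bytes) -> bool:
        """After a crash: is this recovered chunk the version the group agreed on?"""
        return self.current_hash(address) == sha256_hex(data)
```

The point of the sketch is the recovery step: a node that comes back online with a chunk on disk can check it against the chain with matches(), and tampering with an old block makes verify() fail because the link to the next block no longer lines up.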

Group security
But wait a second: thousands of nodes protect blocks and transactions on a blockchain, so how secure are the groups in Maidsafe’s network? What if 12 out of 20 nodes corrupt a group?

This is the most important thing for the SAFE network IMO. And there’s an RFC on that. It’s called “Node Ageing” and it makes sure you can’t target a group and get away with it. In short: you can’t just join the network and pick your address. When you join the network, you aren’t trusted yet, and a group of nodes picks you up and gives you an address. Let’s say you get SAFE address ABC, so you are accepted by Group 1. This group won’t let you vote on anything, but they expect you to do a little proof of work (POW) on some data they provide. They also check your bandwidth. So maybe after 6 or 7 minutes you are part of Group 1, but you can’t sign anything with this group. After some time you are relocated to another group chosen by Group 1. So you may now be accepted by Group 323025, as they got the cryptographic signatures from Group 1 stating that you did your little POW and behaved as wanted. Now your “age” goes from 1 to 2, and it might take an hour before you are relocated again. Maybe you are asked to do some more POW. And this is where the group security comes from. It might take 3 relocations before you’re able to secure the group’s data, and even then they can kick you out if 66% (example) of the group doesn’t think you are doing a good job. So there is an incentive as well: want to farm Safecoin? Behave, work hard and do a very good job.
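The ageing mechanism can be sketched in a few lines of Python. This is only my reading of the idea, not the RFC's algorithm: Node, Network, can_vote and should_evict are hypothetical names, the voting age of 4 simply encodes “3 relocations starting from age 1”, the new group is picked at random here rather than chosen by the old group, and the two-thirds eviction threshold is the example figure from above.

```python
import random
from dataclasses import dataclass
from typing import Optional

VOTING_AGE = 4  # assumption: age 1 on joining, plus 3 relocations


@dataclass
class Node:
    name: str
    age: int = 1
    group: Optional[int] = None


class Network:
    def __init__(self, group_count: int, seed=None):
        if group_count < 2:
            raise ValueError("relocation needs at least two groups")
        self.group_count = group_count
        self.rng = random.Random(seed)

    def join(self, node: Node) -> None:
        """The network, not the node, decides which group a newcomer lands in."""
        node.group = self.rng.randrange(self.group_count)

    def relocate(self, node: Node, work_ok: bool) -> bool:
        """Move a node to a different group and age it, but only if its work checked out."""
        if not work_ok:
            return False
        others = [g for g in range(self.group_count) if g != node.group]
        node.group = self.rng.choice(others)
        node.age += 1
        return True


def can_vote(node: Node) -> bool:
    """Only nodes that survived enough relocations may sign for their group."""
    return node.age >= VOTING_AGE


def should_evict(votes_against: int, voters: int) -> bool:
    """Evict a node when at least two thirds of the voters are unhappy with it."""
    return votes_against * 3 >= voters * 2
```

Because a node never chooses where it ends up and has to keep working through several relocations before it can sign, stacking one particular group with your own nodes becomes expensive. Under the two-thirds figure used in this sketch, should_evict(12, 20) is False, so 12 out of 20 would fall just short of that threshold.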

All data secured by datachains
After reading about “Node Ageing” I really got convinced of the idea of Datachains. Just think about it: all data, from pictures to video but also mail, chats and wallets, is stored by groups in the SAFE network. Even your own personal file with your private keys is stored in there. You are the only one who can locally remove the encryption and get to your wallets, data etc. But all of it is data stored on the network. If you want this data to be secure, you need to make all groups as secure as you can. Maybe even have several groups checking each other by the use of datachain hashes. You could also choose to have certain transactions signed by several groups and store the transaction proof in a public part of the datachain. A non-profit could choose to have a “public” wallet where everybody can check their accounts, while you might choose to send some Safecoin to a friend without anyone noticing. Overall, such datachains could provide a lot of extra features to a decentralized network, including making things more secure, and public when needed.

Here it is in David Irvine’s words:

Data chains would appear to be something that is a natural progression for decentralised systems. They allow data of any type, size or format to be looked after and maintained in a secure and decentralised manner. Not only the physical data itself (very important), but the validity of such data on the network.

If Maidsafe can come up with these secure groups and the right implementation of datachains, they might have something quite amazing. Let’s see how this evolves in the coming period.