The hard DiSC of the world computer
Swarm’s decentralized storage model and its inception
It was December 2014, right after we returned from devcon0 in Berlin, where some 30-odd people constituted the extended halo of the whole Ethereum dev community. Dani and I were already working for Jeff as part of the go-ethereum team. We had spent countless hours in Berlin debating the devp2p architecture layer. I remember back in Budapest, sitting with Dani (Daniel A. Nagy, the father of Swarm) and Fefe (Zsolt Felfoldi, of Ethereum light client fame) in the Arena shopping mall, discussing how we could kickstart the development of the network protocol layer of Swarm. We knew the received wisdom that Kademlia routing needs UDP transport to be efficient. However, the devp2p stack available to the Ethereum client worked with keep-alive TCP connections. I suggested to the guys that I could cobble together a peer-to-peer protocol using this framework, with the live connections arranged in a Kademlia table and used directly to route requests via a chain of relaying nodes: and forwarding Kademlia was born. Later I found that such attempts had been published in the literature under the name of recursive Kademlia, as opposed to iterative/zooming Kademlia.
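A minimal sketch of the forwarding idea, assuming toy one-byte addresses and not taken from any actual codebase: in recursive (forwarding) Kademlia, each node relays the request onward to its connected peer closest in XOR distance to the target, rather than handing candidate peers back to the requester as iterative Kademlia does.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """XOR metric between two addresses, as an integer."""
    return int.from_bytes(bytes(x ^ y for x, y in zip(a, b)), "big")

def next_hop(peers, target: bytes) -> bytes:
    """Forwarding Kademlia: pick the connected peer closest to the target."""
    return min(peers, key=lambda p: xor_distance(p, target))

def forward(route_tables: dict, start: bytes, target: bytes, max_hops: int = 10):
    """Follow the chain of relaying nodes; each hop must get strictly
    closer to the target, and the route ends when it no longer can."""
    node, path = start, [start]
    for _ in range(max_hops):
        peers = route_tables.get(node, [])
        closer = [p for p in peers
                  if xor_distance(p, target) < xor_distance(node, target)]
        if not closer:
            break
        node = next_hop(closer, target)
        path.append(node)
    return path
```

The hypothetical `route_tables` mapping stands in for each node's live peer connections; in the real network every hop is a node making this decision locally.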
It was Dani’s insight from the beginning that whatever routing we used, the resulting DHT should itself already be a storage model. That is, as opposed to BitTorrent, IPFS, etc., where the DHT tracks the seeders of content, our narrower interpretation tracks the content itself. This implies a new paradigm for distributed storage: in contrast to the “traditional” file-sharing type of system, this solution qualifies as shared storage of files.
In a file-sharing system you open up your hard drive to the network, which means you register yourself as a seeder of the content found on your local hard drive. In Swarm, if you open up your hard disk, you essentially become responsible for a “shard”.
A shard is the part of the network’s storage covering an area of the keyspace determined by your node’s overlay address. Stored units are assigned to shards based on their content address.
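A hedged sketch of shard membership: closeness in the keyspace can be measured as proximity order, the number of leading bits an address shares with the node's overlay address, and a chunk belongs to a node's shard when that proximity reaches the node's storage depth. The `depth` parameter here is an illustrative simplification of how a real node sizes its neighbourhood.

```python
def proximity(a: bytes, b: bytes) -> int:
    """Proximity order: number of leading bits two addresses share
    (the higher, the closer in the XOR keyspace)."""
    for i, (x, y) in enumerate(zip(a, b)):
        d = x ^ y
        if d:
            return i * 8 + (8 - d.bit_length())
    return len(a) * 8

def in_shard(chunk_addr: bytes, overlay: bytes, depth: int) -> bool:
    """A node stores a chunk when the chunk's address falls within the
    node's neighbourhood: proximity to the overlay >= storage depth."""
    return proximity(chunk_addr, overlay) >= depth
```

The key point is that chunk addresses and node addresses live in the same keyspace, so "which node stores what" falls out of address arithmetic alone.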
Letting the system decide what you store locally has many implications and puts further constraints on the system. To get natural load balancing of storage, you need to store content in fixed-size data blocks called chunks. There needs to be a protocol, called syncing, responsible for transporting chunks from wherever a Swarm node uploaded them towards the neighbourhood of their address, where downloaders will request them. Note that the uploader may not even take part in the storage itself, so Swarm qualifies as a genuine cloud service allowing users to just “upload and disappear”, an often-quoted motto on our banner. This by itself already indicates that Swarm is not just a decentralized storage and content-distribution solution but a platform for permissionless and censorship-resistant publishing.
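The chunking step can be sketched as follows, with sha256 standing in for the hash Swarm actually uses to derive content addresses; this is an illustration of the fixed-size-block idea, not the production splitter.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size data blocks give natural load balancing

def chunk(data: bytes):
    """Split content into fixed-size chunks; each chunk's address is
    the hash of its payload (sha256 here as a stand-in), so any node
    can verify integrity by rehashing."""
    out = []
    for i in range(0, len(data), CHUNK_SIZE):
        payload = data[i:i + CHUNK_SIZE]
        out.append((hashlib.sha256(payload).digest(), payload))
    return out
```

Because the address is derived from the content, identical chunks get identical addresses wherever they are uploaded from, which is what lets syncing route them to the right neighbourhood.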
So this model, if complemented by proper bandwidth and storage incentives, can implement maximally utilized distributed storage where the resources are optimally allocated. We originally called this model DPA, for distributed preimage archive, but since then we have introduced single-owner chunks, which are not content-addressed. The old acronym was no longer compliant with the 3 rules of Swarm acronyms, so we coined a new one, aptly called DiSC: a Distributed Immutable Store for Chunks, with data integrity guaranteed by signature or content address.
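A hedged sketch of the two integrity models mentioned above: a content-addressed chunk is verified by rehashing its payload, while a single-owner chunk takes its address from an identifier and its owner, with integrity instead resting on the owner's signature. sha256 stands in for the hash used in practice, and the exact address derivation here is illustrative, not the Swarm specification.

```python
import hashlib

def content_address(payload: bytes) -> bytes:
    """Content-addressed chunk: the address IS the hash of the data,
    so integrity is checked by rehashing the payload."""
    return hashlib.sha256(payload).digest()

def single_owner_address(identifier: bytes, owner: bytes) -> bytes:
    """Single-owner chunk: the address is derived from an identifier
    and the owner's account, not from the payload; integrity comes
    from the owner's signature over the payload instead (signature
    checking omitted in this sketch)."""
    return hashlib.sha256(identifier + owner).digest()
```

The practical difference: a single-owner chunk's address stays fixed while its owner can decide what payload lives there, which is exactly why "distributed preimage archive" no longer described the store.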
In addition to permissionless publishing on the uploader side and plausible deniability on the storer side, forwarding Kademlia also provides another important privacy feature for Swarm: private browsing. Relayed messaging enables nodes to stay anonymous, or to deny being the originator or destination of a request or piece of content.
Forwarding Kademlia proved an excellent design choice, which is best indicated by the fact that several surprisingly elegant solutions to new problems have fallen out of this architectural feature. For example, it solves the opportunistic caching problem that takes care of autoscaling popular content. With the quasi-permanent peer set that it implies, it offers a scalable peer-to-peer accounting and settlement scheme for the incentivisation of routing, i.e., to compensate nodes for sharing their bandwidth. In particular, for final settlement on the blockchain, it allows a node to maintain a balance with only a small number of peers and yet retain the ability to send messages to any node or neighbourhood in the network. Similarly to a payment channel network, this means that nodes can engage in implicit transactions with exponentially more nodes than their local peer set. But this is a topic for another blog post …
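A simplified sketch of the per-peer accounting idea, under illustrative assumptions: each relayed request shifts a local balance with the immediate peer, and only when a balance crosses a threshold does an on-chain settlement become necessary. The class name, threshold value, and settlement shape here are hypothetical, not Swarm's actual incentive implementation.

```python
class PeerAccounting:
    """Per-peer bandwidth accounting: services rendered in both
    directions offset each other, so on-chain settlement is rare
    and only ever involves a node's small set of direct peers."""

    def __init__(self, payment_threshold: int = 10_000):
        self.balances = {}      # peer -> what we currently owe them
        self.threshold = payment_threshold
        self.settlements = []   # record of (peer, amount) settled

    def credit(self, peer: str, amount: int) -> None:
        """Peer served us (e.g. relayed a chunk): we owe them more.
        Crossing the threshold triggers a settlement and resets."""
        self.balances[peer] = self.balances.get(peer, 0) + amount
        if self.balances[peer] >= self.threshold:
            self.settlements.append((peer, self.balances[peer]))
            self.balances[peer] = 0

    def debit(self, peer: str, amount: int) -> None:
        """We served the peer: this offsets what we owe them."""
        self.balances[peer] = self.balances.get(peer, 0) - amount
```

Because every node only ever settles with its direct peers while requests traverse chains of such pairs, value effectively flows to arbitrary nodes in the network, much like routing a payment through a payment channel network.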
Let’s stay in touch!
- The Swarm team is reachable on Discord. All tech-support and other channels moved there. Please join us on Discord!
- Discussions about Swarm on /r/ethswarm and /r/ethereum subreddits.
- Please feel free to reach out via info@ethswarm.org
- Swarm up your inbox with our monthly newsletter! Subscribe here.