The hard DiSC of the world computer

Swarm’s decentralized storage model and its inception

Viktor Tron
Ethereum Swarm
4 min read · May 12, 2020


It was December 2014, right after we returned from devcon0 in Berlin, where some 30-odd people constituted the extended halo of the whole Ethereum dev community. Dani and I were already working for Jeff as part of the go-ethereum team. We had spent countless hours in Berlin debating the devp2p architecture layer. I remember sitting back in Budapest with Dani (Daniel A. Nagy, the father of Swarm) and Fefe (Zsolt Felfoldi, of Ethereum light client fame) in the Arena shopping mall, discussing how we could kickstart the development of Swarm’s network protocol layer. The received wisdom was that Kademlia routing needed UDP transport to be efficient enough. However, the devp2p framework available to the Ethereum client worked with keep-alive TCP connections. I suggested to the guys that I could cobble together a peer-to-peer protocol on top of this framework, with the live connections arranged in a Kademlia table and used directly to route requests via a chain of relaying nodes: and so forwarding Kademlia was born. Later I found that such attempts had already been published in the literature under the name of recursive Kademlia, as opposed to iterative (or zooming) Kademlia.

Iterative (right) and forwarding (left) Kademlia routing: a requestor node, shown with a cross in the circle at address …0000…, wants to route to a destination address …1111…, to which the closest peer online is the blue circle at …1110… The initial ellipses represent the prefix shared by the requestor and destination addresses, which is n bits long. Left: in the forwarding flavour, the requestor forwards the message to the connected peer it knows that is nearest to the destination (green). The recipient peer does the same. Applying this strategy recursively relays the message via a chain of peers (green, orange, blue), each at least one PO closer to the destination. Right: in the iterative flavour, the requestor contacts the peers it knows are nearest to the destination address (step 1, dotted black arrows). Peers that are online (green) respond with information about nodes that are even closer (orange arrow, step 2), so the requestor can repeat the query using these closer peers (orange, step 3). On each successive iteration the peers (green, orange and blue) are at least one PO closer to the destination, until eventually the requestor is in direct contact with the node nearest to the destination address.

It was Dani’s insight from the beginning that, whatever routing we used, the resulting DHT should itself already be a storage model. In other words, unlike BitTorrent, IPFS and the like, where the DHT tracks the seeders of content, in our narrower interpretation the DHT tracks the content itself. This implies a new paradigm for distributed storage: in contrast to “traditional” file-sharing systems, this solution qualifies as shared storage of files.

Distributed hash tables (DHTs) used for storage: node D (the downloader) uses Kademlia routing in step 1 to query nodes in the neighbourhood of the chunk address and retrieves seeder info in step 2. The seeder info is used to contact node S (the seeder) directly, requesting the chunk and having it delivered in steps 3 and 4. || Swarm DiSC: Distributed Immutable Store for Chunks. In step 1, downloader node D uses forwarding Kademlia routing to request the chunk from a storer node S in the neighbourhood of the chunk address. In step 2, the chunk is delivered along the same route using a pass-back response.

In a file-sharing system you open up your hard drive to the network, which means you register yourself as a seeder of the content found on your local hard drive. In Swarm, if you open up your hard disk, you basically become responsible for a “shard”.

A shard is a part of the network’s storage, designated as an area of the keyspace based on your node’s overlay address. The stored units are assigned to shards based on their content address.

Letting the system decide what you store locally has many implications and puts further constraints on the system. To get natural load balancing of storage, content must be stored in fixed-size data blocks called chunks. There needs to be a protocol, called syncing, responsible for transporting chunks from wherever a Swarm node uploaded them towards the neighbourhood of their address, where downloaders will request them. Note that the uploader need not take part in the storage at all, so Swarm qualifies as a genuine cloud service allowing users to just “upload and disappear”, an often quoted motto on our banner. This by itself already shows that Swarm is not just a decentralized storage and content-distribution solution but a platform for permissionless and censorship-resistant publishing.

So this model, if complemented by proper bandwidth and storage incentives, can implement a maximum-utilization distributed storage in which resources are optimally allocated. Earlier we called this model DPA, for distributed preimage archive, but since then we have introduced single-owner chunks, which are not content addressed. The old acronym was no longer accurate (nor compliant with the 3 rules of Swarm acronyms), so we found a new one, aptly called DiSC: a Distributed Immutable Store for Chunks, with data integrity guaranteed by signature or content address.

On top of permissionless publishing on the uploader side and plausible deniability on the storer side, forwarding Kademlia provides another important privacy feature for Swarm: private browsing. Relayed messaging enables nodes to stay anonymous, or to deny being the originator or destination of a request or a piece of content.

Forwarding Kademlia proved an excellent design choice, best evidenced by the fact that several surprisingly elegant solutions to new problems have fallen out of this architectural feature. For example, it solves the opportunistic caching problem, which takes care of auto-scaling popular content. With the quasi-permanent peer set it implies, it offers a scalable peer-to-peer accounting and settlement scheme for incentivising routing, i.e., for compensating nodes for sharing their bandwidth. In particular, for final settlement on the blockchain, it allows nodes to maintain a balance with only a small number of peers and yet retain the ability to send messages to any node or neighbourhood in the network. Similarly to a payment channel network, this means nodes can engage in implicit transactions with exponentially more nodes than their local peer set. But that is a topic for another blog post…

Let’s stay in touch!
