Early file sharing networks such as Napster relied on centralized search systems. These networks were fast and efficient but not censorship resistant. Later file sharing networks such as Gnutella used decentralized search systems. These networks were censorship resistant but not fast and efficient. Modern peer-to-peer networks such as Ethereum Classic (ETC), IPFS (InterPlanetary File System) and BitTorrent use the Kademlia system. With Kademlia, search is fast, efficient and censorship resistant!
The Kademlia system was developed by Petar Maymounkov and David Mazières of New York University. Kademlia requires network nodes and content to have unique numerical identifiers. Kademlia defines the “exclusive or distance” between network nodes as the exclusive or of their identifiers. Network nodes focus on storing information for “close” network nodes. They also focus on storing content with “close” identifiers. This is why Kademlia is fast, efficient and censorship resistant. Note that exclusive or distance is unrelated to geographical distance.
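To make the exclusive or distance concrete, here is a minimal sketch in Python. The eight bit identifiers are made up for illustration, and the function name xor_distance is an assumption rather than part of any Kademlia library.

```python
def xor_distance(a: int, b: int) -> int:
    """Exclusive or distance between two numerical identifiers."""
    return a ^ b


# Three made-up eight bit identifiers.
node_a = 0b10110100
node_b = 0b10110001  # differs from node_a only in low bits
node_c = 0b00110100  # differs from node_a in the highest bit

print(xor_distance(node_a, node_b))  # 5   (0b00000101)
print(xor_distance(node_a, node_c))  # 128 (0b10000000)
```

Identifiers that share a long common prefix are "close". The distance from an identifier to itself is zero, and the distance is the same in both directions.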
Network nodes maintain lists for various exclusive or distance ranges defined using powers of two. For example, a network with eight bit identifiers would have eight lists for the following ranges:
2⁷ ≤ exclusive or distance < 2⁸
2⁶ ≤ exclusive or distance < 2⁷
2⁵ ≤ exclusive or distance < 2⁶
2⁴ ≤ exclusive or distance < 2⁵
2³ ≤ exclusive or distance < 2⁴
2² ≤ exclusive or distance < 2³
2¹ ≤ exclusive or distance < 2²
2⁰ ≤ exclusive or distance < 2¹
Lists only contain information on network nodes whose exclusive or distances fall within their ranges. For scaling reasons, lists only contain a limited number of entries. Note that lists associated with smaller exclusive or distances cover fewer possible identifiers and so can be complete, unlike those associated with larger exclusive or distances.
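The choice of list for a given network node can be sketched as follows. The position of the highest set bit of the exclusive or distance selects the range. The names list_index and ids_in_range, and the entry limit K = 4, are assumptions chosen for this eight bit example.

```python
ID_BITS = 8  # identifier size used in the example above
K = 4        # hypothetical limit on entries per list


def list_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= exclusive or distance < 2**(i + 1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a network node does not list itself")
    return distance.bit_length() - 1


def ids_in_range(index: int) -> int:
    """Number of possible identifiers in the range for a given list."""
    return 2 ** index


print(list_index(0b10110100, 0b10110001))  # 2, since 2**2 <= 5 < 2**3
print(list_index(0b10110100, 0b00110100))  # 7, since 2**7 <= 128 < 2**8

# Lists whose whole range fits within the entry limit can be complete.
print([i for i in range(ID_BITS) if ids_in_range(i) <= K])  # [0, 1, 2]
```

Half of all identifiers fall into the list for the largest range, so that list can only ever hold a small sample of them.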
If a network node does not contain information for some identifier, it queries the network nodes closest to that identifier it knows about. The queried network nodes return information on the closest network nodes to the identifier they know about. This process continues recursively and can quickly locate any desired information.
Network nodes that leave the network may have their information removed from these lists. When a list is full and cannot accept a new network node entry, the least recently seen network node in the list is queried. If that network node does not reply, its entry is replaced with the new network node entry. Otherwise, the new network node entry is discarded. Note that long running network nodes are favored and that the lists are constantly updating.
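The replacement rule can be sketched as a small class. This assumes, as in the original Kademlia design, that the entry queried is the least recently seen one. The class name NodeList and the ping callback are hypothetical. A real implementation would send a PING message over the network.

```python
class NodeList:
    """One list of network node identifiers, least recently seen first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = []

    def update(self, node_id, ping):
        """Record contact with node_id; ping(id) returns True on a reply."""
        if node_id in self.nodes:
            # Already known: move to the most recently seen end.
            self.nodes.remove(node_id)
            self.nodes.append(node_id)
        elif len(self.nodes) < self.capacity:
            self.nodes.append(node_id)
        else:
            oldest = self.nodes.pop(0)
            if ping(oldest):
                self.nodes.append(oldest)   # still alive: keep it, drop newcomer
            else:
                self.nodes.append(node_id)  # no reply: replace with newcomer


entries = NodeList(capacity=2)
entries.update(1, ping=lambda n: True)
entries.update(2, ping=lambda n: True)
entries.update(3, ping=lambda n: True)   # list full, node 1 replies
print(entries.nodes)                     # [2, 1]
entries.update(3, ping=lambda n: False)  # list full, node 2 does not reply
print(entries.nodes)                     # [1, 3]
```

Since a responsive old entry is never displaced by a newcomer, network nodes that have been running for a long time tend to stay in the lists.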
Kademlia defines four message types:
PING — confirms the existence of network nodes
STORE — stores information on network nodes
FIND NODE — returns the network nodes closest to an identifier
FIND VALUE — returns the information associated with an identifier if known, or, the network nodes closest to that identifier
Multiple FIND NODE messages are typically sent simultaneously in case the receiving network nodes have left the network. To avoid losing information, information can be periodically copied to nearby network nodes with STORE messages. Popular information can be copied to larger numbers of network nodes for greater speed and efficiency.
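The four message types can be sketched as methods on a toy network node. Everything here is an assumption made for illustration: the class name, the method names, the in-memory peer set and the parameter k. Real messages travel over the network and carry more fields.

```python
class ToyNode:
    def __init__(self, node_id, k=2):
        self.node_id = node_id
        self.k = k          # how many close network nodes to return
        self.peers = set()  # identifiers of known network nodes
        self.values = {}    # identifier -> stored information

    def _closest(self, target):
        return sorted(self.peers, key=lambda p: p ^ target)[: self.k]

    def ping(self):
        """PING: confirm this network node exists."""
        return True

    def store(self, key, value):
        """STORE: keep information on this network node."""
        self.values[key] = value

    def find_node(self, target):
        """FIND NODE: return the known network nodes closest to target."""
        return self._closest(target)

    def find_value(self, key):
        """FIND VALUE: return the information if known, else closest nodes."""
        if key in self.values:
            return ("value", self.values[key])
        return ("nodes", self._closest(key))


node = ToyNode(node_id=0b0110)
node.peers = {0b0001, 0b0100, 0b1000}
print(node.find_value(0b0101))  # ('nodes', [4, 1])
node.store(0b0101, "hello")
print(node.find_value(0b0101))  # ('value', 'hello')
```

FIND VALUE behaves like FIND NODE until some network node holding the information is reached.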
Joining a Kademlia network requires knowledge of at least one network node. Reliable network nodes commonly used to assist new network nodes are referred to as bootstrap nodes. New network nodes first send FIND NODE requests for their own identifiers. This reveals nearby network nodes that can also be queried.
ETC uses the Kademlia system to find network nodes but not for file sharing. PING and FIND NODE messages are used but not STORE and FIND VALUE messages. ETC network node identifiers are the Keccak 256 hashes of their public keys.
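Deriving such an identifier can be sketched in pure Python. Note that Keccak 256 is not the same function as hashlib.sha3_256 in the Python standard library: the two differ only in one padding byte, which the sketch uses as a self check. The 64 byte public key below is a made-up placeholder, not a real key, and the function names are assumptions. This is meant for illustration rather than production use.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE = 136  # bytes absorbed per block for a 256 bit digest


def rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def keccak_f1600(state):
    """Apply the 24 round Keccak permutation to a 200 byte state in place."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    r = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by a small shift register)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    for x in range(5):
        for y in range(5):
            state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")


def sponge256(data, suffix):
    """Absorb data, pad with the given suffix byte, return 32 bytes."""
    state = bytearray(200)
    offset = 0
    while len(data) - offset >= RATE:
        for i in range(RATE):
            state[i] ^= data[offset + i]
        keccak_f1600(state)
        offset += RATE
    tail = data[offset:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix
    state[RATE - 1] ^= 0x80
    keccak_f1600(state)
    return bytes(state[:32])


def keccak256(data):
    return sponge256(data, 0x01)  # the standardized SHA3-256 uses 0x06 instead


# Self check: with the SHA-3 suffix the sponge must match the standard library.
assert sponge256(b"abc", 0x06) == hashlib.sha3_256(b"abc").digest()

public_key = bytes(range(64))  # placeholder bytes, NOT a real public key
node_id = int.from_bytes(keccak256(public_key), "big")
print(hex(node_id))
```

Two identifiers derived this way can be compared with the exclusive or distance like any other pair of numbers, only with 256 bits instead of eight.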
Kademlia provides swift, scalable, decentralized and censorship resistant search services on modern networks. ETC uses a subset of its features for network node discovery. Kademlia is just one of the many brilliant innovations that make ETC possible.
I would like to thank IOHK (Input Output Hong Kong) for funding this effort.
This work is licensed under the Creative Commons Attribution ShareAlike 4.0 International License.