Implementing Kademlia in Go

Emily Tang
Princeton Systems Course
Feb 9, 2019

I. Introduction

For our project, we decided to reimplement Kademlia, a distributed hash table described in this paper by Petar Maymounkov and David Mazieres published in 2002. The paper describes a distributed hash table that uses an XOR metric to determine the distance between two keys. Although the paper gives a thorough explanation of the algorithm and a proof sketch of its O(log N) lookup, there are no results evaluating an actual implementation. We were interested in how the routing algorithm performs and how effective caching is at improving performance. We reimplement the lookup algorithm in an iterative fashion as described in the paper and evaluate our implementation on varying distributions of keys as well as with and without caching. The code for our project can be found here: https://github.com/peterdelong/Kademlia

II. Kademlia background

The Kademlia paper emphasizes two features of the algorithm:

  1. XOR distance metric: Kademlia defines the distance between two IDs as their XOR interpreted as an integer. The paper states that XOR is important because it is symmetric, and it leads to the address space being laid out as a binary tree (a short Go sketch of the metric follows this list).
  2. Lookup based on binary tree: In the Kademlia lookup algorithm, nodes are treated as leaves of a binary tree. To find the node ID closest to a key, a node repeatedly queries the nodes it knows about in the subtree closest to that key (see the explanation and diagram further below).
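To make the metric concrete, here is a minimal Go sketch of the XOR distance over 160-bit IDs. The type and function names are ours for illustration (they are not taken from our repository), and IDs are assumed to be held as 20-byte big-endian arrays.

```go
package kademlia

import "math/big"

// ID is a 160-bit Kademlia identifier stored as 20 big-endian bytes.
type ID [20]byte

// Distance returns the XOR of two IDs interpreted as an unsigned integer,
// which is Kademlia's notion of distance between them.
func Distance(a, b ID) *big.Int {
	var x [20]byte
	for i := range x {
		x[i] = a[i] ^ b[i]
	}
	return new(big.Int).SetBytes(x[:])
}
```

Because XOR is symmetric, Distance(a, b) always equals Distance(b, a), which is what lets a node learn useful routing information from the queries it receives.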

Kademlia defines a notion of distance between a node's ID and the key it is trying to store, and a key-value pair is stored on the node whose ID is closest to the key. The paper defines node IDs and keys as 160-bit identifiers. There are also several parameters, such as k, a replication parameter chosen so that it is unlikely for all k nodes holding a value to fail within an hour, and α, the amount of parallelism in a query.

Each node maintains identifying triples (IP address, Port, Node ID) for other nodes in the system using k-buckets. k-bucket number i holds up to k triples for nodes whose distance from the node lies between 2^i and 2^(i+1). The k-buckets are updated during node lookups and upon receiving RPCs. In this way, a node has exponentially more information about the nodes close to it and less about nodes further away. This is similar to the finger table in Chord, where a node can move in a fine-grained way when it is close to the target value but takes much larger jumps when it is far away.
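As a rough sketch of how a node maps a contact to a k-bucket: the bucket index is simply the position of the highest bit in which the two IDs differ. The Contact struct and bucketIndex function below are illustrative names that reuse the ID type from the earlier sketch.

```go
import (
	"math/bits"
	"net"
)

// Contact is the identifying triple Kademlia keeps for each known node.
type Contact struct {
	IP   net.IP
	Port int
	ID   ID
}

// bucketIndex returns i such that 2^i <= Distance(self, other) < 2^(i+1),
// i.e. the position of the highest differing bit. It assumes the IDs differ.
func bucketIndex(self, other ID) int {
	for i := 0; i < len(self); i++ {
		if x := self[i] ^ other[i]; x != 0 {
			// Byte i is the i-th most significant byte, so its bits start at
			// position (len(self)-1-i)*8; bits.Len8 locates the top set bit.
			return (len(self)-1-i)*8 + bits.Len8(x) - 1
		}
	}
	return 0 // identical IDs; a node never stores itself in a bucket
}
```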

The paper defines four RPCs to be used in its protocol (a Go sketch of possible message types follows the list):

  • PING: Tests node liveness.
  • STORE: Takes a key and value as arguments. The receiving node stores the key-value pair.
  • FIND_NODE: Takes a node ID as its argument. Returns the k nodes closest to that ID that the receiving node knows about.
  • FIND_VALUE: Takes a key ID as its argument. If the receiving node has the key-value pair stored, it returns it. Otherwise, it returns the k nodes closest to the ID that it knows about (just like FIND_NODE).
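Hypothetically, the request and reply payloads for these RPCs could look like the Go structs below. The names are ours rather than the paper's or our repository's, and they reuse the Contact and ID types from the earlier sketches. Carrying the sender's triple in every request is what lets the receiver refresh its k-buckets on each RPC, as mentioned above.

```go
// PING: a bare liveness check.
type PingArgs struct{ Sender Contact }
type PingReply struct{ OK bool }

// STORE: the receiver records the key-value pair.
type StoreArgs struct {
	Sender Contact
	Key    ID
	Value  []byte
}
type StoreReply struct{}

// FIND_NODE: the receiver returns up to k of the closest contacts it knows about.
type FindNodeArgs struct {
	Sender Contact
	Target ID
}
type FindNodeReply struct{ Contacts []Contact }

// FIND_VALUE: either the value itself, or the closest contacts, as with FIND_NODE.
type FindValueArgs struct {
	Sender Contact
	Key    ID
}
type FindValueReply struct {
	Value    []byte // non-nil if the receiver had the key stored
	Contacts []Contact
}
```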

Figure 1: Example of a node with prefix 0011 querying for the node with the prefix 1110. Source: Maymounkov et al

The core of Kademlia is its lookup algorithm, which locates the nodes whose IDs are closest to a given ID (either a node ID or a key). A node u begins a lookup for an ID i by using its k-buckets to find the k closest nodes to i among all the nodes u knows about. u then sends RPCs to α of those nodes concurrently, asking each of them for the nodes it knows about that are closest to i. As replies arrive, u updates its list of the closest nodes it knows about. This process repeats until u has sent messages to, and received responses from, the k closest nodes it knows about, which means the list of closest nodes has stopped changing. When Kademlia is trying to find another node, it sends FIND_NODE RPCs. For locating a key-value pair, u sends FIND_VALUE RPCs instead, and the lookup also terminates as soon as a node returns a value to u rather than a list of nodes.
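Below is a simplified, sequential Go sketch of this loop for FIND_NODE. It reuses the ID, Contact, and Distance types from the earlier sketches, injects the FIND_NODE RPC as a function, and omits the α-way parallelism, timeouts, and the FIND_VALUE early exit; the real algorithm issues the α RPCs of each round concurrently.

```go
import "sort"

const (
	k     = 4 // replication parameter used in our evaluation
	alpha = 3 // query parallelism used in our evaluation
)

// findNodeFn abstracts the FIND_NODE RPC: ask a contact for the closest
// contacts it knows about to target.
type findNodeFn func(c Contact, target ID) []Contact

// iterativeFindNode repeatedly queries unvisited contacts among the k closest
// known so far, and stops once all of them have been queried.
func iterativeFindNode(seed []Contact, target ID, findNode findNodeFn) []Contact {
	seen := make(map[ID]bool)
	queried := make(map[ID]bool)
	var shortlist []Contact
	add := func(cs []Contact) {
		for _, c := range cs {
			if !seen[c.ID] {
				seen[c.ID] = true
				shortlist = append(shortlist, c)
			}
		}
	}
	add(seed)

	for {
		// Keep only the k closest contacts we currently know about.
		sort.Slice(shortlist, func(i, j int) bool {
			return Distance(shortlist[i].ID, target).Cmp(Distance(shortlist[j].ID, target)) < 0
		})
		if len(shortlist) > k {
			shortlist = shortlist[:k]
		}

		// Pick up to alpha of them that we have not queried yet.
		var batch []Contact
		for _, c := range shortlist {
			if !queried[c.ID] && len(batch) < alpha {
				batch = append(batch, c)
			}
		}
		if len(batch) == 0 {
			return shortlist // the k closest contacts have all been queried
		}
		for _, c := range batch {
			queried[c.ID] = true
			add(findNode(c, target)) // learn about contacts closer to target
		}
	}
}
```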

An example of the lookup process is shown in Figure 1 where the node sends 4 rounds of RPCs. In the first round, it queries the closest node to the target it knows about. In subsequent rounds, it queries the closest nodes it learned about from the previous rounds of RPCs.

To store a value in the system, a node performs a node lookup (FIND_NODE), using the key as the target ID. It then sends a STORE RPC with the key-value pair to each of the k nodes returned by that lookup.
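A sketch of that publish path, reusing the lookup sketch above; storeRPC stands in for the STORE call and is not a name from our repository.

```go
// storeValue locates the k closest nodes to the key and sends each a STORE.
func storeValue(seed []Contact, key ID, value []byte,
	findNode findNodeFn, storeRPC func(Contact, ID, []byte)) {
	for _, c := range iterativeFindNode(seed, key, findNode) {
		storeRPC(c, key, value)
	}
}
```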

Lookups for a given key converge onto the same path as they approach the node responsible for it, so values can be cached along that path. With caching, after a value is successfully located, the requester sends a STORE RPC so that the value is cached on the closest node it observed during the lookup that did not return the value.

The Kademlia lookup algorithm, or a modified form of it, has been implemented and used in several distributed applications, including BitTorrent and Ethereum.

III. Implementation

We reimplement Kademlia's basic lookup algorithm and caching in Go to replicate its distributed hash table functionality, following the paper's description of both. As in the paper, we use 160-bit IDs, generated by taking the SHA-1 hash of the key (for key-value pairs) or of the node's address:port (for nodes).
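SHA-1 conveniently produces exactly 160 bits, so ID generation is essentially a one-liner; the newID name is ours for illustration.

```go
import "crypto/sha1"

// newID derives a 160-bit ID by hashing its input with SHA-1: a key string
// for values, or an "address:port" string for nodes.
func newID(s string) ID {
	return sha1.Sum([]byte(s)) // sha1.Sum returns [20]byte, i.e. 160 bits
}
```

For example, newID("10.0.0.1:4000") would give a node its ID, while newID("somekey") gives the ID under which a value is stored.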

To make requests to the system, we also gave each node a REST API. We built a Python client that makes requests against this API so we could query the system easily. The same client also made it easy to test the system as a whole, since it let us query keys drawn from a given distribution.
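The actual routes in our repository may differ; the sketch below only shows the general shape of such an HTTP front end, with a hypothetical /get endpoint and an injected lookup function.

```go
import "net/http"

// serveHTTP exposes a minimal read endpoint so an external client (such as
// our Python client) can query the DHT without speaking the RPC protocol.
func serveHTTP(addr string, findValue func(ID) ([]byte, bool)) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/get", func(w http.ResponseWriter, r *http.Request) {
		value, ok := findValue(newID(r.URL.Query().Get("key")))
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Write(value)
	})
	return http.ListenAndServe(addr, mux)
}
```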

The paper details methods to add fault tolerance to the system such as republishing key-value pairs every 24 hours, but we did not implement these.

IV. Evaluation

We evaluated the performance of our implementation using FindValue requests for a variety of distributions of keys. We store all of the keys before querying to make the testing more straightforward.

We evaluate on Professor Freedman's cluster. A DHT would normally run over a geographically distributed area, so evaluating within a single cluster is not the most realistic environment. For our caching evaluations, we assume an infinite cache is available. We use a query-parallelism α of 3, a k of 4, and a key space of size 1024 throughout.

We evaluate using two classes of distribution: uniform and Zipfian. We vary the Zipf distribution's alpha parameter (its exponent, distinct from Kademlia's query-parallelism α) to generate distributions with lighter or heavier tails.
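Our query client was written in Python, but the shape of the workload can be sketched with Go's built-in Zipf generator. Note that math/rand's NewZipf requires an exponent strictly greater than 1, so the value below is illustrative rather than the exact parameter used for Figure 2.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	r := rand.New(rand.NewSource(1))
	// Key indices in [0, 1023], matching our key space of 1024; the second
	// argument is the Zipf exponent ("alpha") and must be > 1.
	zipf := rand.NewZipf(r, 1.1, 1, 1023)

	counts := make(map[uint64]int)
	for i := 0; i < 10000; i++ {
		counts[zipf.Uint64()]++
	}
	fmt.Printf("most popular key requested %d times out of 10000\n", counts[0])
}
```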

Figure 2: CDF of latency for Zipf and uniform key distributions with and without caching. The evaluated system had 75 nodes and the Zipf distribution had a parameter of 1.

Figure 2 plots the CDF of latency for Zipf and uniform key distributions with and without caching. Caching along the path improves latency for both distributions. Because we assume an infinite software cache, the benefit of caching under the uniform distribution is likely somewhat exaggerated. Even so, Zipf with caching has better latency than uniform with caching, despite both having an infinite cache available. This is most likely due to physical cache effects: under a Zipf distribution, the relevant entries in a node's hash table are more likely to already be in the CPU's cache than under a uniform distribution.

Figure 3: CDF for Zipf distributions with different values of alpha.

Figure 3 plots the CDF of latency where keys are queried from Zipf distributions with different values of alpha. As expected, caching performs better for Zipf distributions with a higher value of alpha since the tails are lighter.

For the remaining results, we evaluate using a Zipf distribution, since it is more representative of a realistic key distribution.

Figure 4: Latency versus number of requests made against the system for a Zipf distribution with a parameter of 1 with and without caching.

Figure 4 shows how latency changes as more requests are made against the system with caching on and off. Each point is the average of a window of 100 requests. Although the graph is rather noisy, we can still see some general trends. The cached and non-cached systems start off with similar latency. As more requests are made, latency in the cached system drops because more requests are served from a cache early in the lookup. Latency in the uncached system also drops a bit as nodes fully populate their k-buckets, but it converges to a higher value than with caching.

Figure 5: Latency versus number of nodes. The x-axis scale is exponential in powers of two and the error bars are at the 5th and 95th percentiles

Figure 5 shows how latency scales with the number of nodes. The node counts are plotted on a log2 scale, so the x-axis is exponential in absolute terms. Latency increases roughly linearly on this scale, meaning it scales logarithmically with the number of nodes, which is expected given Kademlia's O(log n) routing complexity.

V. Implementation Notes

We used Go's RPC library (net/rpc), which operates over TCP and unfortunately doesn't have a UDP option, even though the paper defines the protocol as operating over UDP. This increases latency because of TCP connection setup. It also constrained the throughput of the system, because Go would only allow us to keep so many TCP connections open at once. As stated before, we don't implement the fault-tolerance features, so we don't handle nodes leaving the system, key republishing, or cache pruning. Although we obtained our results without running into any issues, subtle concurrency corner cases that were not tested may exist in the code.
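For reference, the standard net/rpc serving pattern is listener-based, which is why TCP is the natural transport here. The service and method below are illustrative (they reuse the PingArgs/PingReply sketch from earlier), not our repository's actual types.

```go
import (
	"net"
	"net/rpc"
)

// KademliaService is a stand-in receiver; net/rpc exposes its exported methods.
type KademliaService struct{}

// Ping answers the PING RPC. net/rpc requires this shape: exported method,
// args value, reply pointer, error result.
func (s *KademliaService) Ping(args PingArgs, reply *PingReply) error {
	reply.OK = true
	return nil
}

// serveRPC registers the service and serves it over a TCP listener.
func serveRPC(addr string) error {
	srv := rpc.NewServer()
	if err := srv.Register(&KademliaService{}); err != nil {
		return err
	}
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	go srv.Accept(l)
	return nil
}
```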

VI. Conclusion

From the implementation and the analysis above, we see that Kademlia scales quite well as nodes are added to the system. We were able to quantify the performance of the system (something the paper never did) and show that caching can be effective at reducing latency. If we were to continue this work, we would test over an external network to get results under more realistic conditions, and we would implement the fault-tolerance features so that we could try to reproduce the results of other Kademlia-related papers that specifically examine networks in which nodes continuously go on and offline.
