Structuring Network with XOR
*Originally posted on the MaidSafe blog 27th May 2016*
A prerequisite to understanding the SAFE Network on a technical level, including the consensus process, requires knowledge of the underlying structure which powers it as a decentralized, autonomous system. Peer-to-peer networks can be categorised into two general types: structured and unstructured. In unstructured networks, there is no explicit organization of nodes and the process of joining and forming connections to others is random. In structured networks, like those which use a distributed hash table (DHT) such as Bittorrent or the SAFE Network, the overlay protocol includes a structure for organizing the network and makes purposeful connections to specific nodes more efficient.
One of the most widely adopted DHT’s, named Kademlia, was originally popularised through its implementation in Bittorrent’s Mainline DHT which removed dependence on central trackers for finding the locations of nodes and data stored on the network. Kademlia employs a rather simple operation called “exclusive or” (XOR) to establish a mathematical relationship between node identifiers. While SAFE uses a modified version of Kademlia, the XOR operation is consistent across implementations and understanding this equation will give insight to all networks based from Kademlia.
To best understand how XOR facilitates a structured, p2p network, let’s start from the very basics of the operation which at its foundation, compares two inputs and then outputs their difference. The input numbers used are binary, meaning they are made of only 0’s and 1’s. The mathematical symbol for a XOR operation is ⊕.
To show the simplicity of calculating the XOR output of two binary numbers, let’s first look at an example with fairly large numbers as inputs:
Input A: 11001010011010100100010101011110
Input B: 01001000101110011010011111000101
Now, to find the XOR output, simply compare the bits (a bit is a single digit in a binary number) individually and in order. Where the bits are the same, place a zero (0) and where the bits differ, place a one (1).
The table below shows the calculation of the 32-bit inputs we chose where input A is the first row in yellow, input B the second row in blue and the XOR output last in green.
Since the first bit in input A is 1 and the first bit in input B is 0, the XOR output for that digit is 1. Meanwhile, the second bit in both numbers is 1 so the XOR output for that digit is 0 and the third bit in each number is 0 so the XOR output for that digit is also 0. By comparing each digit down the line as the same or different, we arrive at an XOR output of 10000010110100111110001010011011. The decimal conversion of that value is 2194924187 which is not such a straightforward calculation, however, it can be helpful to know how the pattern of binary counting works:
And so on...
Properties of XOR
Now, to get a grasp on the usefulness of XOR calculations, let’s take a step back and focus on 1-bit numbers as our inputs.
XOR OPERATIONS ON 1-BIT NUMBERS (0, 1)Input AInput BOutput COperation A⊕B==C0000⊕0==00110⊕1==11011⊕0==11101⊕1==0
Using the table above (which shows every possible combination of those values), we can see that regardless of the input values, if they are equal to each other, the output is zero. Alternatively, if input values are not equal, the output is a non-zero value (1 in the case in 1-bit values). The last characteristic we can gather from this table is that if we swap A for B then C stays the same which in mathematics is called commutative and can be expressed as:
if A ⊕ B == C therefore B ⊕ A == C
Furthermore (but a bit more difficult to tell in this simple example), if we swap A or B for C, the new output will be the value which C replaced and can be expressed as:
if A ⊕ B == C therefore C ⊕ B == A and A ⊕ C == B
We can now observe how these properties hold true with slightly larger binary values.
XOR OPERATIONS ON 2-BIT NUMBERS (00, 01, 10, 11)Input AInput BOutput COperation A⊕B==C00010100⊕01==0100111100⊕11==1101010001⊕01==0001101101⊕10==1110110110⊕11==0111011011⊕01==1011110011⊕11==00
Using the table above (which only shows a sample of possible combinations) we can still see that equal inputs give an output of zero (00), unequal inputs give a non-zero output and the property where swapping any of A, B or C for each other in the operation holds valid (highlighted in coloured rows). As the binary numbers grow larger, these characteristics of XOR operations will continue to hold. Additionally, we can deduce that the XOR output of two values (also called XOR distance) A and B will always be unique for those inputs. In other words, there is only one value B at a given distance C from given value A which can be expressed as
if A ⊕ B == C then never A ⊕ B == D and never A ⊕ D == C
XOR Relationships in Networks
With basic understanding of XOR characteristics, let’s now explore how it maps onto a peer-to-peer network using binary tree graphs.
The two graphs above illustrate the simple tables we used to explain XOR properties with the left side being a 1-bit network (two nodes) and the right side, a 2-bit network (four nodes). Within each graph, a step to the left at a vertex point adds a zero (0) bit to the end of the number and a step to the right adds a one (1).
For better understanding of these properties in larger networks, the graph above shows a 5-bit XOR network consisting of 32 nodes (00000 to 11111 in binary, 0–31 in decimal) and follows the same vertex stepping rule. The two blue coloured nodes, 12 (01100) and 15 (01111), have an XOR distance of 00011 (3 in decimal) while the two orange nodes 16 (10000) and 31 (11111) have an XOR distance of 01111 (15). However, even though the blue node 15 and the orange node 16 are next to each other in the line, their XOR distance is even larger at 11111 (31) and shows that XOR distance does not follow the same distance assumptions as we are used to. Since every node has a unique distance to every other node in the network, it becomes quite simple to reliably relate them to each other using this property and therefore finding nodes that are closest to each other is straightforward by doing an XOR calculation on a node ID and the smallest distances (think back on the property of swapping values in XOR calculations).
Say we want to find the closest 4 values to the input value of 01010 (10) then we can XOR the input value with the 4 closest non-zero distances 00001 (1), 00010 (2), 00011 (3) and 00100 (4).
⊕00100Output(11) 01011(8) 01000(9) 01001(14) 01110
Now, if we take one of those closest values, say, 01110 (14) and again find the closest 4 non-zero values to it, we get a unique set of outputs.
⊕00100Output(15) 01111(12) 01100(13) 01101(10) 01010
Since the XOR distance between a particular value and every other value is unique, the closest value set will also be unique. With that in mind, imagine a network using XOR calculations to relate node IDs where each node establishes connections with and stores data about their closest nodes. By communicating with the group of nodes closest and asking what each of their closest nodes are, any single node can eventually locate any other node in the network creating a distributed database.
It is extremely important in XOR networks that each node has a unique ID. In decentralized networks where there is no central party to issue and enforce unique IDs, this task requires a large enough ID range (ie. with 128-bit or 256-bit numbers) to reduce chance of overlap and a secure hashing algorithm to prevent predetermination of a node’s ID and therefore a node’s placement in the network. Due to the necessarily high ID space and random placement, decentralized networks will not have nodes occupying every value in the ID range and therefore, the closest nodes are most likely not going to be one of the closest 3 nodes like the example above. Regardless of what the actual closest nodes are, however, the relationships between them allows for each node to maintain a narrow perspective of the network and use their established connections for scoping beyond that when needed. This type of relay network makes it easier to discover data and route messages in a targeted but decentralized manner.
This concept is the foundation for Kademlia-based distributed hash tables (understand the name better, now?), including those used in Bittorrent and SAFE. In Bittorrent, this discovery pattern allows nodes to discover which other nodes are storing a particular file. In SAFE, the data on the network is identified with IDs in the same range as nodes so that the data itself can be mathematically related to nodes for storage purposes in the same way nodes are related to each other. XOR closeness is the basis for the SAFE Network’s consensus processes to ensure data reliability and security. This will be covered in more detail in a future post now that we have established an understanding of XOR properties in structured, p2p networks.