v0.7 Release — Distributed Networking
v0.7 is here. This is a great opportunity to thank the community for their support and being here for this milestone as its certainly one of the more important milestones that we’ve reached.
As you are most likely aware; v0.7 is all about Tangram’s network, and logically describing the topology. It comes with an added level of anonymity that was originally planned for the long-term, but brought it forward as we have come to the conclusion that this is something necessary to implement sooner rather than later.
The aim of this article is to explain the updates in v0.7.
- Introduction to Tangram’s network architecture through logic and outlining the properties that make up gossip protocols.
- Modifications to both the networking layer, and also the storage and look-up system
- Variation in a basic approach to achieve a more robust and efficient strategy handling members in a network and how it is applied to Tangram.
- Particulars of the specific storage and look-up system (Kadence) implemented in Tangram
- Features currently integrated that Kadence offers
- Security model that is currently being operated under
- Information on the number of nodes in v0.7 and their specifications, along with a brief overview of future performance evaluation considerations.
Last but not least, we are announcing that an added level of anonymity has been brought forward: Tangram now runs in Tors free and open network!
An Introduction to Tangram Network Topology
Tangram uses a gossip protocol to manage memberships and events of nodes organized in an arbitrary way. The organization of the nodes is referred to as the network topology. Our topology of choice will be detailed in another paragraph. In this section, we discuss the gossip protocol used by Tangram, which is based on “SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol”. — We strongly advise people who want to know more to read the paper in its entirety. The below will be Tangram’s attempt to describe and reformulate the approach simply.
Briefly, a membership protocol provides each process (“member”) of the group with a locally-maintained list of other non-faulty processes in the group. The protocol ensures that the membership list is updated with changes resulting from new members joining the group, or dropping out (either voluntarily or through a failure). [SWIM]
Basic concepts of a gossip protocol
There are two important components in the SWIM protocol:
- Failure Detection Component, that detects failures of members
- Dissemination Component, that disseminates information about members that have recently either joined, left the group, or failed.
Both components work together sequentially in order to ensure four key characteristics meet the composition of failure detection protocols.
- Strong Completeness: crash-failure of any group member is detected by all non-faulty members
- Speed of failure detection: the time interval between a member failure and its detection by some non-faulty group member
- Accuracy: the rate of false positives of failure detection;
- Network Message Load, in bytes per second generated by the protocol. [Unreliable Failure Detectors for Reliable Distributed Systems]
Failure Detection Component
With failure detection in SWIM the following procedure takes place. A ‘ping’ from an arbitrary member’s (Alice) membership list to a random member (Bob) within that membership list. Alice then waits for a reply ‘ack’ from Bob within a predefined time-out (set to 5 seconds).
If Alice does not receive a reply (‘ack’), she indirectly probes Bob by selecting a failure detection sub-group at random (Carol, Dave and Eve etc…) and sends each a ping-req message. Each of these members then probes Bob with the same ping and forwards (if received from Bob) the ack to Alice.
Alice checks if any acks have been received from Bob or any from Carol, Dave and Eve (failure detection sub-group), if none have been received from Bob and her membership list then Alice declares Bob as ‘failed’ from her membership list and gives this update to the Dissemination Component which we describe down below.
There are a couple of scenarios in the above described failure detection component which may cause false positives and this is where there needs to be a mechanism to reduce the number of false positives in the network. Scenarios:
- Network packet loss
- Bob is asleep
- Alice has a slow process
Now one or more of these determine that Bob should and would be removed from his cluster within the network.
Once Alice has detected that Bob is failed, she notifies her membership list. The members within this group delete Bob from their membership list.
The same procedure takes place when newly members join or members who voluntarily leaves.
When a member wishes to join the Tangram network, join messages are broadcasted and group members within a cluster who are listening for join messages probabilistically (tossing a coin) to reply.
There are several other ways for members to join a cluster (membership list).
- Well known server or IP multicast address, all joins could be directed to the associated address
- Static coordinator could be maintained within the group for the purpose of handling group join requests
- Multiple coordinators are maintained within a group
Efficient and robust SWIM is introduced with the following:
We briefly described the basic approach to a SWIM protocol. We will now discuss the approach used in Tangram.
- Infection-style dissemination component
- Suspicion mechanism
- Round-robin probe target selection (for time-bounded strong completeness)
Infection-style dissemination component
The premise of the infection style dissemination component is to piggyback membership updates on the ping, ping-req and ack messages sent by the failure detector protocol. This approach benefits the protocol by giving better qualities to packet losses and of low latency from the failure detector protocol to the dissemination component by eliminating extra packets IE — (failed, join and leave).
What this means is failed, join and leave messages are put into the same packet / transfer as ping, ping-req and ack, fewer message types = lower rate of failure and higher consistency of message throughput.
We described the basic approach for the SWIM protocol to declare a member (Bob) failed. We also described the alternate reasons for which a member could be falsely classified as failed (due to false positive). To address the probabilistic nature of the classification, a Suspicion subprotocol is introduced.
So, whenever the failure detector component detects a failed packet from Bob or from Carol, Eve or any other members (as described above), instead of Alice declaring Bob as failed and forcing Bob to leave the group Alice declares Bob as suspected in her membership list which then propagates this message through the Dissemination component and all members mark Bob with the same message. This is done using an infection style procedure as described above rather than the basic SWIM approach. This means that Bob is now suspicious and is not kicked from the network and rather is treated as a normal member of the network where pings would still be sent to Bob to re-address whether a failed is true.
There are two possible scenarios:
- Bob was a victim of a false-positive
- Bob is indeed a failed member
Bob is a victim of a false-positive
If Bob is still marked as suspected in any of the member’s (Alice) membership list when he successfully responds with ack, he is unmarked as suspected and Alice broadcasts that Bob is alive through the group (which is handled by the Dissemination component). Naturally, Bob isn’t suspected anymore. It is important to note that whenever Bob is marked as suspicious, he broadcasts the alive message immediately to all members for propagation.
Bob is indeed a failed member
If Bob is still marked as suspected and does not reply with an alive message before a pre-specified time-out (5 seconds), members that already have Bob as suspected in their membership list then broadcast a message that confirms Bob is indeed a failed member and removes them from their membership list.
Round-robin probe target selection
Above we briefly described how the failure detection component in SWIM selects a member (Alice) randomly from their membership list (Bob), this satisfies the Strong Completeness characteristic in failure detection protocols where the detection of failed would be eventual. This may however lead to a large delay in the process across members.
A solution to this would be to modify the basic SWIM approach so that Alice does not select a member from her membership list by random (Bob) but rather in a round-robin fashion.
Let’s take an example of a new member joining a cluster. Bob gets selected by Alice to be a part of the the same cluster and ping Bob exactly once during each update in Alice’s membership list which lowers the rate of detection time for failures of any member for Alice. This satisfies the Time Bounded Completeness property for members being detected as faulty or non-faulty within the Alice’s membership list.
We described the process of a group membership protocol (SWIM) above. Herewith we describe a peer-to-peer storage and lookup system utilising Kademlia.
Kademlia has a number of desirable features not simultaneously offered by any previous peer-to-peer system. It minimizes the number of configuration messages nodes must send to learn about each other. Configuration information spreads automatically as a side-effect of key lookups. Nodes have enough knowledge and flexibility to route queries through low-latency paths. Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes. The algorithm with which nodes record each other’s existence resists certain basic denial of service attacks. [Kademlia]
To implement a distributed peer-to-peer storage and lookup system, Tangram uses the extensible, hardened, and secure distributed systems framework library of Kadence with some modifications that incorporate SWIM and Kadence to be fully supported within TOR’s hidden network and a lockstep operation for sending and receiving which we will describe in the next section.
Feature highlight for Kadence, currently in use for Tangram:
DDoS & Spam Protection
Kadence can automatically generate SSL certificates and supports full end-to-end encryption via TLS using it’s built in HTTPS transport adapter to prevent eavesdropping and man in the middle attacks.
Kadence extends Kademlia’s node identity selection with the same cryptography bitcoin uses for securing funds. Node identities are derived from the hash of the public portion of an ECDSA key pair and each message is signed to ensure it hasn’t been tampered with in transit.
Sybil & Eclipse Mitigation
Kadence employs a proof of work system using Scrypt for generating valid node identities and subsequent acceptance into the overlay network. This forces nodes into sufficiently random sectors of the key space and makes Sybil and Eclipse attacks computationally very difficult and ultimately ineffective.
Automatic NAT Traversal
Kadence supports multiple strategies for punching through network address translation. This enables peers behind even the strictest of firewalls to become addressable and join the network. Fallback to secure reverse tunnels is supported through the use of Diglet servers.
Multiple Network Transports
Kadence supports the use of multiple transport adapters and is agnostic to the underlying network protocol. Support for UDP and HTTP/HTTPS ship by default. Plugin your own custom transport layer using using a simple interface.
Persistent Routing Tables
Kadence remembers peers between restarts so after you’ve joined the network once subsequent joins are fast and automatically select the best initial peers for bootstrapping.
Sender & Destination Anonymity
Kadence ships with full support for Tor Hidden Services out of the box with no additional software installation or configuration required. This enables fully anonymized structured networks and leverages the latest version 3 hidden services protocol.
Let’s explain what a lockstep operation is before detailing how it works in Tangram:
Say Alice sends Bob a transaction, and includes:
- Value of transaction
Alice ‘hits’ send and a lockstep operation will first send the properties and notify Bob that a transaction is coming. Bob is now aware and receives the notification and awaits for the transaction to take place, he verifies the transaction based on the lockstep operation.
Lockstep process and operation works as follows where a key/value pair is created and passed to the message entities hash function.
Then it is added to the lockstep entity.
Key is used as a lookup and the hash is the combination of the key and value.
This is passed on as a notification to the other nodes (Bob). When it’s received by Bob, Bob will then store the lockstep message (no check is processes if a key exists.). Key will get overridden with the new value i.e creates a new entry.
The lockstep is used for validation when extra data is received or data that has the same key arrives. The key is used as a lookup against the store and can only do this if the lockstep has been seen IE - with the key to retrieve the data that matches the hash sent, with the stored hash and therefore it only pass as valid message.
Security models that we are operating under:
- 3rd party actors (non-members of a cluster within the network) getting access to events
- Member(s) in a cluster manipulation
- Invalid messages / data transfer
- Man in the middle attacks
- Denial of service attacks to the network or any member(s) of the network
n Nodes and specifications
There will be 6 fully validating nodes. Following specifications for initial release.
Disk Storage: 500GB SSD-Boosted
Bandwidth: 100 Mbit/s port UNLIMITED Traffic
OS: Linux — CentOS7
Following the initial release 3 further nodes will be brought up for further testing which are as follows:
Disk Storage: 1000GB SSD-Boosted
Bandwidth: 100 Mbit/s port UNLIMITED Traffic
OS: Linux — CentOS7
Following applies for release 0.7 and will be tested with a membership cluster of 1 member each, which make up 3 clusters with 6 to be fully operational in the coming week.
All messages and transfers are made through TCP packets as this is limited by the TOR architecture. Maximum payload size is 2kb.
Will release further performance tests within the coming 2 weeks of release date.
- Bandwidth Metering
- Configurable Trust Policies
- and more…
Tangram is growing steadily and is still in its infancy. By introducing v0.7 we have cemented a solid foundation to strengthen the software and move on to further enhancements and optimisation.
The key to this release was to combine several features for node to node propagation, as well as the data integrity of node communication within the network and much more.
We expect questions, suggestions and discussion. Again, a big thank you to the community for your support.
We’ll be updating this article based on community feedback.
[Unreliable Failure Detectors for Reliable Distributed Systems] tdchandra, stoueg.
[Kademlia] sgrumbach, rriemann