MaidSafe part III — Joining & anonymity

David Irvine
MetaQuestions
Published in
5 min readSep 3, 2014

This is a large area of the network, partially covered in a small post. I hope it makes sense. My intent is lots of posts and all small, I think it works better.

A lot of chat about, is the system really anonymous is based on tracking IP addresses etc. I was surprised recently when I heard that Tor/i2p exposed users ip address at compromised web sites. Not because there was a flaw, that happens, but because the source IP address was available at all. I thought further and it made sense. There is a significant difference and this issue will always exist in those systems as they are currently designed. This is not an anti Tor or i2p post in any way, but a foray into a different approach. I would love the TOR/i2p folks to be part of this journey, so poking a stick in their eyes is not an intent of mine.

A different approach

As we covered in earlier parts of this series, MaidSafe uses a secured DHT implementation, based on XOR networking. This is hugely important, the DHT has the ability to hold information. Some of that is encrypted and secured (in our case all of it bar, public data and public keys). So lets not dive too deeply in that for now. Keep in mind though this is not an encryption mechanism to traverse todays Internet (web services), its very different.

This is peoples data we are talking about, it is very important!

It may be a debate to some, but it should be restricted to pure logic where possible.

Types of nodes on MaidSafe

There are two main node types for now (to keep it simple), a client and a vault. So lets recap

Client -> Unique and private ID that has no link to any public ID or name. This ID is used to GET and PUT data. It is not linked to a person or a name, it is a unique 512 bit id. Even owners will probably not know what it is. A client does not perform routing tasks, that means it is like a passive producer/consumer of information.

Vault-> Unique and private ID that has no link to any public ID or name. This ID is used to store data. These ID’s are the routing infrastructure and what we all connect to, essentially.

Connection to network

Both node types join the network in fundamentally the same way. They read from either a locally cached list of previously known nodes, or they fall back to hard coded nodes in the source code. The nodes they list have IP:PORT and Public keys. The node will encrypt a message to one of these nodes requesting login (or connect). The bootstrap node gathers this info and returns it to the joining node (encrypted).

The joining node then connects (encrypted) to the closest nodes as returned by the bootstrap node.

So there you have it, 100% encrypted communications from the very first message. [note the node connects to multiple bootstrap nodes to confirm the answers match and a user can be emailed a connection list from a friend if they wish. This gets away from arguments about centralised root nodes etc, which is pretty silly, but folk do argue that point].

Operate on the network

As a client you will join / login (explained in a later post) as above and be connected. Many people ask, can a client store data (for the network) etc. or be part of routing and we say no. This seems harsh, but important. I will explain.

A client connects to its close nodes (with an anonymous ID as explained) and then requests data etc. The request goes to the close nodes (who do know the IP:PORT) and is then relayed across the network (with the IP:PORT stripped). At the end or location where the data exists, it’s sent back to the anonymous ID. Each node sends this closer to the ID and the close nodes eventually deliver it.

So on the network there is no notion of IP:PORT for messages unless you are close to the node (XOR close).

So then people say, I will get close and then get the IP address. Well this is not simple, close to where? You cannot connect or ask for connection info to a client, it cannot answer, it is not part of routing. Where is it located geographically for you to snoop on? Can you become a close node vault? That’s going to take a huge amount of work now. It will mean getting a vault ID close to the node. You do not choose your vault ID the network does and this is random across the address range. Even then you are not part of routing either until you have enough rank. This is not an easy feat at all, very likely millions of computers running correctly for a while with a random ID and to potentially get close to a single node, which may change its ID anyway. A very hard task indeed. At the end of such an attack, who are you monitoring, you will have zero clue as any public name info is further encrypted inside the messages encrypted by the anonymous ID. We could get into reams and reams of what ifs here, I intentionally don’t as it makes the post too long and boring for most. The system docs go into great detail and are continually updated for those answers. TL;DR a targeted attack like this is not easy. Non targeted attacks are covered later in the consensus chain mechanism, this is a good precursor to that.

We could go further and create a random client ID every time, or even easier create a data-getter/putter id every time. I do not think it necessary, but very simple if required. So you cannot see an IP address or tie a data ID to a person, so already it feels a bit secure. As every single message is encrypted, who cares if you compromise the routers etc. We don’t.

Vaults are persistent on the network and connect to more than just their close nodes, but each node decides to respond to a request for connection or not. It will only connect if the request looks good, i.e. improves the routing table. To put this in perspective after a node starts the first such accepted connection will be 1 in 1 million, after that its circa 1 in several million and increases quickly. What this means is that to improve a nodes routing table you will need a huge amount (100’s of millions) of connection requests and these have to be from valid well ranked nodes.

I hope this small post helps people put the network into perspective. The fact the IP address does not traverse the network is important. This is all down to the fact everything lives on a decentralised addressable network where all communications and exchanges of messages are encrypted by mechanism that ensure in transit encryption and end to end identification of every network hop. If people ‘get’ that part then it makes this journey so much simpler, but also shows this path has not been trodden, so there is a huge amount to consider. A simplification is to say, this is a fully encrypted system that includes the data within it, as data is included there is no leak out to any third parties at any point (no server or machines that know or choose what they store).

So there you go another 6am post, I am off for a sleep now.

[edit fixed typo i2c/i2p too much arduino recently :-) Thanks @anonymouscoin (Kristov Atlas)

--

--

David Irvine
MetaQuestions

Great spirits have always encountered violent opposition from mediocre minds.