We have often been asked how, given that the MaidSafe network is decentralized and has no data-centre backup, data integrity and data access can be maintained when storage locations (vaults) continually join and leave the network.
In this blog post, we answer this very valid question. If you read our previous blog post (Multiple Store/Delete), you will already be familiar with the concept of personas; if you haven’t, it is worth reading it now.
Having read our previous article, you will also be familiar with the PmidNode, so we now introduce the PmidManager. As its name indicates, the PmidManager manages the status of a PmidNode: it retains information on whether the PmidNode is online or offline and which chunks have been stored on that node. The vaults running in this persona (role) are those whose addresses fall within the group closest to the PmidNode’s address.
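To make the PmidManager's bookkeeping concrete, here is a minimal sketch of the record it might keep for one PmidNode: an online/offline flag plus the set of chunk names stored on that node. The class and method names (`PmidNodeRecord`, `record_store`, `record_delete`) are hypothetical and are not taken from MaidSafe's code.

```python
from dataclasses import dataclass, field


@dataclass
class PmidNodeRecord:
    """Hypothetical record a PmidManager keeps for one PmidNode (names are illustrative)."""
    pmid_node: str                                 # address of the managed PmidNode
    online: bool = True                            # last known status of the node
    chunks: set[str] = field(default_factory=set)  # names of chunks stored on the node

    def record_store(self, chunk_name: str) -> None:
        # Called when the PmidNode accepts a chunk.
        self.chunks.add(chunk_name)

    def record_delete(self, chunk_name: str) -> None:
        # discard() silently ignores chunk names that were never recorded.
        self.chunks.discard(chunk_name)


record = PmidNodeRecord("pmid-node-A")
record.record_store("chunk-1")
record.record_store("chunk-2")
record.record_delete("chunk-1")
print(record.online, sorted(record.chunks))  # True ['chunk-2']
```

Keeping the chunk list alongside the status is what later lets the PmidManager work out which DataManagers to notify when the node's status changes.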
PmidNode forced to leave the network
Before starting this demonstration, we stored one chunk on the network. This is shown in the figure below, where a couple of PmidNodes (in blue) receive the request and start holding the data.
We then manually remove one of the nodes by killing its process directly. Ending the process this way means the node has no opportunity to send out a leave notification, as would normally be the case in a real-world scenario. As can be seen in the following figure, the node is now gone and is shown as a light grey ellipse.
Leave notification and data spawning
The leaving node does not send out a message to say it is leaving; instead, the node’s neighbours notice that it has left once the connection times out. This timeout is currently set to 10 seconds and can be tuned to suit real-world network conditions.
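Timeout-based departure detection can be sketched as follows, assuming each neighbour records when it last heard from a node and treats the node as gone once the 10-second timeout has elapsed. The `ConnectionMonitor` class and its methods are hypothetical; the real network detects the loss at the connection level, and the injectable clock is only there so the example is deterministic.

```python
import time

LEAVE_TIMEOUT_SECONDS = 10.0  # value quoted in the post; tunable


class ConnectionMonitor:
    """Hypothetical sketch: a node counts as gone after `timeout` seconds of silence."""

    def __init__(self, timeout=LEAVE_TIMEOUT_SECONDS, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock                     # injectable so the example is deterministic
        self.last_seen: dict[str, float] = {}  # node address -> time we last heard from it

    def heard_from(self, node: str) -> None:
        # Any traffic from the node refreshes its timestamp.
        self.last_seen[node] = self.clock()

    def dropped_nodes(self) -> list[str]:
        # Report (and forget) every node that has been silent longer than the timeout.
        now = self.clock()
        gone = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for n in gone:
            del self.last_seen[n]
        return gone


fake_now = [0.0]
monitor = ConnectionMonitor(clock=lambda: fake_now[0])
monitor.heard_from("pmid-node-A")
monitor.heard_from("pmid-node-B")
fake_now[0] = 5.0
monitor.heard_from("pmid-node-B")  # B keeps talking, A goes silent
fake_now[0] = 11.0
dropped = monitor.dropped_nodes()
print(dropped)  # ['pmid-node-A']
```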
As shown in the following figure, the PmidManager (shown as a green square) close to the PmidNode notices the node leaving the network, stating “dropping PmidNode …”. The PmidManager then checks its record for that PmidNode and determines which chunks were stored on the node (stating “holding chunk …”), before sending out a leave notification to the corresponding DataManager.
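The PmidManager's reaction to a dropped node can be sketched as a function that marks the node offline and produces one leave notification per chunk it held, each addressed to the chunk's name so that the message reaches that chunk's DataManagers. `LeaveNotification` and `on_pmid_node_dropped` are hypothetical names, and the chunk list is deliberately kept because it is needed again if the node rejoins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaveNotification:
    data_name: str  # group message address: the chunk's name, i.e. its DataManagers
    pmid_node: str  # the PmidNode that has left


def on_pmid_node_dropped(pmid_node, chunks_by_node, online):
    """Hypothetical PmidManager handler for a PmidNode that has timed out."""
    online[pmid_node] = False  # keep the chunk list: it is reused if the node rejoins
    held = chunks_by_node.get(pmid_node, set())
    # sorted() only makes the output order deterministic.
    return [LeaveNotification(name, pmid_node) for name in sorted(held)]


chunks_by_node = {"pmid-node-A": {"chunk-2", "chunk-1"}}
online = {"pmid-node-A": True}
notes = on_pmid_node_dropped("pmid-node-A", chunks_by_node, online)
print([n.data_name for n in notes])  # ['chunk-1', 'chunk-2']
```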
Once the DataManager (shown as a yellow square) receives the message, it marks that node down (i.e. takes it out of the alive PmidNode list). If there are not enough alive PmidNodes, it picks a new PmidNode and re-puts the data to that node (stating “re-put chunk … to PmidNode …”).
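A single DataManager's handling of a leave notification might look like the sketch below: remove the node from the chunk's alive list and, if too few copies remain, pick a replacement PmidNode uniformly at random. The class, its parameters and the exact trigger condition are assumptions: here spawning fires once the alive count has fallen to `threshold`, which is the reading that matches the copy-count arithmetic in the post. The re-put is assumed to succeed.

```python
import random


class DataManager:
    """Hypothetical per-chunk state held by one DataManager (not MaidSafe's actual code)."""

    def __init__(self, data_name, holders, threshold, rng=None):
        self.data_name = data_name
        self.alive = set(holders)         # PmidNodes currently holding a live copy
        self.threshold = threshold        # assumed: spawn once alive count <= threshold
        self.rng = rng or random.Random()

    def mark_down(self, pmid_node, candidates):
        """Handle a leave notification; return the PmidNode chosen for a re-put, or None."""
        self.alive.discard(pmid_node)
        if len(self.alive) > self.threshold:
            return None                   # still enough copies, no spawning
        choices = sorted(set(candidates) - self.alive - {pmid_node})
        if not choices:
            return None
        new_node = self.rng.choice(choices)  # chosen completely at random
        self.alive.add(new_node)             # assume the re-put succeeds
        return new_node


dm = DataManager("chunk-1", ["A", "B", "C", "D"], threshold=2, rng=random.Random(0))
online_nodes = ["C", "D", "E", "F"]
first = dm.mark_down("A", online_nodes)   # 3 copies left: nothing happens
second = dm.mark_down("B", online_nodes)  # 2 copies left: re-put to E or F
print(first, second in {"E", "F"})        # None True
```

Passing in a seeded `random.Random` keeps the example reproducible; each DataManager in the group would use its own source of randomness, so their choices are independent.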
Notice the statement “PmidNode storing chunk …” in the green square. This is the new PmidNode storing the data (bear in mind that a Vault can run in many different roles). It is also worth noting that, when spawning data, each DataManager chooses a PmidNode completely at random, which enhances the security and robustness of the network. With a group size of 4, at most 4 new PmidNodes may be chosen to store the data. If we set the spawning threshold to half the group size, the total number of copies of the data on the network will be 1.5 * GroupSize (6 copies if GroupSize is 4): the copies still alive when spawning is triggered, plus up to GroupSize new ones.
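Writing G for GroupSize and T for the spawning threshold, read here as the number of copies still alive when spawning is triggered (with each of the G DataManagers adding at most one new copy), the arithmetic above works out as:

```latex
N_{\text{copies}} = T + G, \qquad
T = \tfrac{G}{2} \;\Rightarrow\; N_{\text{copies}} = \tfrac{3}{2}\,G, \qquad
G = 4:\; N_{\text{copies}} = 2 + 4 = 6
```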
Now we describe the process of the node re-joining the network.
The PmidManager notices the node rejoining immediately. It then checks the node’s record and sends a notification to the corresponding DataManager (stating “PmidManager joining PmidNode … holding chunk …”). Once the DataManager receives the notification, it marks that node up, i.e. puts it back into the alive PmidNode list (stating “DataManager marking node … up for chunk …”).
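The rejoin path mirrors the leave path. In this sketch, the PmidManager marks the node online and emits one join notification per chunk in its retained record, and the DataManager side adds the node back to that chunk's alive list. `JoinNotification`, `on_pmid_node_rejoined` and `mark_up` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinNotification:
    data_name: str  # addressed to the chunk's name, i.e. its DataManagers
    pmid_node: str  # the PmidNode that has come back


def on_pmid_node_rejoined(pmid_node, chunks_by_node, online):
    """Hypothetical PmidManager handler: mark the node online and notify each chunk's DataManagers."""
    online[pmid_node] = True
    held = chunks_by_node.get(pmid_node, set())
    return [JoinNotification(name, pmid_node) for name in sorted(held)]


def mark_up(alive_by_chunk, note):
    """Hypothetical DataManager side: put the node back into the chunk's alive list."""
    alive_by_chunk.setdefault(note.data_name, set()).add(note.pmid_node)


chunks_by_node = {"pmid-node-A": {"chunk-1"}}
online = {"pmid-node-A": False}
alive_by_chunk = {"chunk-1": {"pmid-node-B", "pmid-node-C"}}
for note in on_pmid_node_rejoined("pmid-node-A", chunks_by_node, online):
    mark_up(alive_by_chunk, note)
print(sorted(alive_by_chunk["chunk-1"]))  # ['pmid-node-A', 'pmid-node-B', 'pmid-node-C']
```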
In conclusion, a PmidNode holds the data, and only the DataManager knows where the data is stored (it maintains the list of alive PmidNodes). The PmidManager monitors the status of PmidNodes and notifies the relevant DataManager whenever a PmidNode’s status changes, either by leaving or by joining the network. The PmidManager is able to do this because it retains the list of data stored on the PmidNode, and the DataManagers are guaranteed to be the group closest to the DataName. By sending a group message to the DataName’s address, the PmidManager ensures that the appropriate DataManager is notified.
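Why a group message addressed to a DataName reaches the right DataManagers can be illustrated with a closest-group lookup. This sketch assumes an XOR distance metric in the style of Kademlia, and tiny integer IDs stand in for real network addresses; `closest_group` is a hypothetical helper, not a MaidSafe API. The same lookup, applied to a PmidNode's address instead of a chunk name, would give that node's PmidManagers.

```python
def closest_group(target: int, node_ids: list[int], group_size: int) -> list[int]:
    """Return the group_size node IDs closest to target under the XOR metric (assumed)."""
    # For distinct IDs, XOR distances to a fixed target are all distinct, so there are no ties.
    return sorted(node_ids, key=lambda n: n ^ target)[:group_size]


node_ids = [0b0001, 0b0100, 0b0111, 0b1000, 0b1100, 0b1111]
data_name = 0b0101  # a chunk's name, used as the group message address
group = closest_group(data_name, node_ids, group_size=4)
print(group)  # [4, 7, 1, 12]
```

Because every node computes the same ordering from the same addresses, any sender can route a message to "the group closest to DataName" without knowing in advance which vaults are currently acting as its DataManagers.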
Spawning of data is only triggered when the DataManager notices that there are not enough alive PmidNodes in its record. The total number of copies of the data on the network will then be group_size + threshold.
The following is the full video of the demo.