Syncing Devices Offline at SALIDO
At SALIDO, we understand how important it is for restaurants to keep operating when their network connection is disrupted.
How Do Restaurants Use SALIDO?
SALIDO provides a point-of-sale system for restaurants to handle all aspects of their operations. The point-of-sale system consists of mobile devices running the SALIDO software.
Communication Between Devices
Syncing orders between devices is a good example of the need for peer-to-peer communication. To edit an order, a device must know which device owns the order and ask that device for ownership. This ownership information depends on the devices updating the database on the server. However, what happens when access to the server is cut off because the internet or Wi-Fi connection goes down?
Introducing the Raft Consensus Algorithm
Raft is used for obtaining consensus between servers in a distributed system. In our usage, the servers can be thought of as the mobile devices and the distributed system is the entire point-of-sale system that handles all the necessary operations in a restaurant. Consensus occurs when a majority of the servers in the system agree on a value or action.
Raft uses an elected leader to handle all operations. All servers in the system start off as followers. Each server has an election timer, which starts an election when it times out. The timeout is randomized and at least an order of magnitude larger than the time it takes for devices to communicate with each other. Randomizing the timeout makes it unlikely that several servers start an election at the same moment.
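As a rough illustration of the randomized election timer, here is a minimal Python sketch. The concrete numbers and the names (MESSAGE_ROUND_TRIP_MS, next_election_timeout_ms) are assumptions made up for the example, not values from our implementation.

```python
import random

# Assumed figures for illustration only; the real values depend on the network.
MESSAGE_ROUND_TRIP_MS = 15                    # hypothetical device-to-device message time
MIN_TIMEOUT_MS = 10 * MESSAGE_ROUND_TRIP_MS   # at least an order of magnitude larger
MAX_TIMEOUT_MS = 2 * MIN_TIMEOUT_MS


def next_election_timeout_ms(rng: random.Random) -> float:
    """Pick a fresh random timeout each time a server resets its election timer."""
    return rng.uniform(MIN_TIMEOUT_MS, MAX_TIMEOUT_MS)


rng = random.Random()
timeouts = {name: next_election_timeout_ms(rng) for name in ("A", "B", "C")}

# The server whose timer expires first is the first to become a candidate.
first_candidate = min(timeouts, key=timeouts.get)
print(first_candidate, timeouts)
```

Because each server draws its own timeout, one of them usually times out well before the others and can win the election before anyone else starts one.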
When a server's election timer times out, it becomes a candidate. When a server becomes a candidate, it increments its term and votes for itself. A term is used to track election cycles. After incrementing its term, the candidate sends a request for the other servers in the cluster to vote for it. The candidate will continue sending this request until it becomes the leader, another server becomes the leader, or it times out and starts another election.
A candidate becomes leader when a majority of the servers in the cluster vote for it. A server grants its vote to a candidate if the candidate's term is at least as large as its own term and it has not already voted for another server in that term.
Another server can become the leader in the same term by receiving a majority of the votes and then sending the candidate a message announcing its leadership. A candidate accepts another server as leader if the leader's term is greater than or equal to the candidate's term.
In the case of a split vote, the candidate times out and starts another election.
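The election rules above can be sketched in a few lines of Python. This is a simplified illustration rather than our production code: the Server class and run_election function are hypothetical names, messages are plain method calls instead of network requests, and the check Raft also makes on how up to date the candidate's log is has been left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    term: int = 0
    voted_for: Optional[str] = None

    def handle_vote_request(self, candidate: str, candidate_term: int) -> bool:
        """Decide whether to vote for `candidate` in `candidate_term`."""
        if candidate_term < self.term:
            return False                  # request from an older term
        if candidate_term > self.term:
            self.term = candidate_term    # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate    # at most one vote per term
            return True
        return False


def run_election(candidate: Server, peers: List[Server]) -> bool:
    """Return True if `candidate` collects votes from a majority of the cluster."""
    candidate.term += 1
    candidate.voted_for = candidate.name  # a candidate votes for itself
    votes = 1
    for peer in peers:
        if peer.handle_vote_request(candidate.name, candidate.term):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2


a, b, c = Server("A"), Server("B"), Server("C")
won = run_election(a, [b, c])
print("Terminal A elected:", won)
```

After this election, Terminal B has already voted in term 1, so a vote request from Terminal C for term 1 would be refused, while a request for term 2 would be granted.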
Raft uses a replicated log for determining consensus between servers. The replicated log can be thought of as an array.
Let's consider an example: We have Terminal A, Terminal B, and Terminal C.
For simplicity we’ll choose Terminal A to be the leader.
If Terminal A receives an action, “Add extra noodles to the order”, then it will store this action in its log. After storing the action, Terminal A will send out a request for its followers, Terminal B and C, to replicate this action. If Terminal B and C replicate this action, they’ll return a success to the leader.
The leader can check whether a majority of the servers in the cluster, counting itself, have replicated this entry, and if so, it can commit the entry by updating its commit index. A commit index is just an integer that tells a server which entries it can apply safely. For example, if the commit index is 2, entries 0, 1, and 2 can be applied safely. For a leader to update its commit index, the term of the entry at the desired index must match the leader's current term. In our current example, after the followers respond with a success, the leader can update its commit index and add extra noodles to the order.
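To make the commit rule concrete, here is a small Python sketch. It assumes the leader tracks, for each follower, the highest log index that follower is known to hold; the name match_index and the convention of using -1 for "nothing yet" are choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    term: int
    action: str


def advance_commit_index(
    log: List[Entry],
    current_term: int,
    commit_index: int,
    match_index: Dict[str, int],
    cluster_size: int,
) -> int:
    """Return the highest index the leader can safely commit (-1 means none)."""
    for index in range(len(log) - 1, commit_index, -1):
        if log[index].term != current_term:
            break  # only entries from the leader's current term are committed directly
        # The leader always holds its own entry, so it counts as one replica.
        replicas = 1 + sum(1 for m in match_index.values() if m >= index)
        if replicas > cluster_size // 2:
            return index
    return commit_index


log = [Entry(term=1, action="Add extra noodles to the order")]

# Terminal B has replicated entry 0, Terminal C has not responded yet.
committed = advance_commit_index(
    log, current_term=1, commit_index=-1,
    match_index={"B": 0, "C": -1}, cluster_size=3,
)
print("commit index:", committed)
```

With three terminals, the leader plus one follower already form a majority, so the entry can be committed even if the third terminal is slow or unreachable.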
When a leader sends an entry for its followers to replicate, it also sends other data that the followers use to determine whether they can successfully replicate the leader's log. The leader sends the index and term of the log entry preceding the entry that is to be replicated, its current term, the message of the log entry, and its IP.
If the follower's term is less than the leader's term, it will update its term and reset its record of which server it voted for. If the follower's term is greater than the leader's term, it will send back a failure to append the entry, along with its own term. Once the two terms are equal, the follower is able to proceed.
After proceeding, the follower attempts to find an entry in its own log whose index and term match the index and term of the entry preceding the one to be replicated. If it finds this entry, the follower truncates all entries following it and appends the new entry after it. If it cannot find this entry, it responds with a failure.
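The follower's side of replication can be sketched as follows. This is an illustrative Python version, not our actual code: the class and field names are hypothetical, a prev_index of -1 is used here to mean "there is no preceding entry", and the sketch assumes each entry carries its own term alongside its message.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    term: int
    message: str


@dataclass
class AppendRequest:
    leader_term: int
    leader_ip: str
    prev_index: int   # index of the entry preceding the new one, -1 if none
    prev_term: int    # term of that preceding entry (ignored when prev_index is -1)
    entry: Entry      # the entry to replicate


@dataclass
class Follower:
    term: int = 0
    voted_for: Optional[str] = None
    log: List[Entry] = field(default_factory=list)

    def handle_append(self, req: AppendRequest) -> Tuple[bool, int]:
        """Return (success, follower's term) for one replication request."""
        if req.leader_term < self.term:
            return False, self.term       # stale leader: reject and report our term
        if req.leader_term > self.term:
            self.term = req.leader_term   # catch up and forget the old vote
            self.voted_for = None
        if req.prev_index >= 0:
            missing = req.prev_index >= len(self.log)
            if missing or self.log[req.prev_index].term != req.prev_term:
                return False, self.term   # preceding entry not found
        del self.log[req.prev_index + 1:]  # truncate everything after the match
        self.log.append(req.entry)
        return True, self.term


follower = Follower()
first = AppendRequest(1, "10.0.0.1", -1, 0, Entry(1, "Add extra noodles to the order"))
print(follower.handle_append(first))
```

The IP address and the message contents in the usage lines are placeholders chosen for the example.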
The follower responds to the leader with either success or failure. If the replication was successful, the leader increments the next index for that particular follower. The next index is just a variable that tells the leader which entry to send to the follower next, and it can be different for each follower. After incrementing the next index, the leader checks whether it can update its commit index, then checks whether there are any other entries to replicate to the follower. If the follower responds with failure, the leader decrements the next index for that particular follower and attempts to replicate the entry at that index.
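The back-and-forth between the leader's next index and a follower's responses can be simulated with plain lists. The following Python sketch is a simplified model under stated assumptions: logs are lists of (term, message) tuples, the network is replaced by a direct function call, and the function names and order messages are invented for the example.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, message)


def follower_append(log: List[Entry], prev_index: int, prev_term: int, entry: Entry) -> bool:
    """Follower side: append `entry` only if the preceding entry matches."""
    if prev_index >= 0 and (prev_index >= len(log) or log[prev_index][0] != prev_term):
        return False
    del log[prev_index + 1:]
    log.append(entry)
    return True


def sync_follower(leader_log: List[Entry], follower_log: List[Entry], next_index: int) -> int:
    """Leader side: retry until the follower holds every leader entry."""
    while next_index < len(leader_log):
        prev_index = next_index - 1
        prev_term = leader_log[prev_index][0] if prev_index >= 0 else 0
        if follower_append(follower_log, prev_index, prev_term, leader_log[next_index]):
            next_index += 1   # success: move on to the following entry
        else:
            next_index -= 1   # failure: back up one entry and try again
    return next_index


leader_log = [(1, "Open order"), (3, "Add extra noodles to the order"), (3, "Add egg")]
follower_log = [(1, "Open order"), (2, "Remove noodles")]  # diverged at index 1

final_next_index = sync_follower(leader_log, follower_log, next_index=2)
print(final_next_index, follower_log == leader_log)
```

In this run the first attempt fails because the follower's entry at index 1 has a different term, so the leader backs up to index 1, overwrites the diverged entry, and then sends the remaining entry.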
The check for the preceding entry and term is an induction step that is used to guarantee consistency between servers.
Devices can communicate over either UDP or TCP. We found UDP to work better in this case because it allows us to use a lower election timeout range.
We used delegation to modularize Raft into testable components.
Using Raft For Syncing Devices Offline
Going back to the example where access to the server is cut off, we can use Raft to handle the editing of orders. The leader device in the cluster is the owner of the order, and any client request to edit the order on a follower device is redirected to the leader device. The leader device then stores this action in its log as an entry, replicates the log entry to its followers, commits the entry once a majority of the devices in the cluster hold it, and finally performs the update to the order if the entry is committed.
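The flow just described (redirect to the leader, log, replicate, commit, apply) can be sketched end to end in Python. This is a deliberately stripped-down model: the Device class and its fields are hypothetical, terms and the preceding-entry check are omitted, and an unreachable device is modeled with a simple online flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    leader: Optional["Device"] = None   # None means this device is the leader
    followers: List["Device"] = field(default_factory=list)
    log: List[str] = field(default_factory=list)
    order: List[str] = field(default_factory=list)   # actions applied to the order
    online: bool = True

    def edit_order(self, action: str) -> bool:
        """Return True once the edit is committed and applied on the leader."""
        if self.leader is not None:
            return self.leader.edit_order(action)   # followers redirect to the leader
        self.log.append(action)                     # 1. store the action in the log
        replicas = 1                                # the leader's own copy
        for follower in self.followers:             # 2. replicate to followers
            if follower.online:
                follower.log.append(action)
                replicas += 1
        cluster_size = len(self.followers) + 1
        if replicas > cluster_size // 2:            # 3. commit on a majority
            self.order.append(action)               # 4. apply the update
            return True
        return False


terminal_a = Device("Terminal A")
terminal_b = Device("Terminal B", leader=terminal_a)
terminal_c = Device("Terminal C", leader=terminal_a)
terminal_a.followers = [terminal_b, terminal_c]

# The edit is made on a follower and ends up applied by the leader.
applied = terminal_b.edit_order("Add extra noodles to the order")
print(applied, terminal_a.order)
```

If only one follower is unreachable, edits still go through because two of three devices form a majority; if both followers are unreachable, the leader keeps the entry in its log but does not apply it.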
A quick demo of our implementation can be found below.
We are looking forward to enhancing the offline syncing capabilities and providing the data consistency that restaurants deserve through fully integrating Raft into our system.
If this sounds interesting to you, SALIDO is hiring!