BAO Box: a Custom Solution for Fault Tolerance

Dasha Korotkykh
Published in Hivecell
3 min read · Sep 15, 2020

--

Rewind to a few days ago: at one of a company's remote locations, a bare-metal server is set up and connected to the admin dashboard. The app is up and running, endpoint data starts to fuel the business process, and that remote branch is no longer a blind spot for the company. That is, until the hardware fails at the most inconvenient time, with no replacement and no technical crew onsite to fix it immediately.

A fast food restaurant, a tanker ship, a rural hospital, a telco tower, an airport security room: it doesn't matter where it happens. The practical value of data produced at the edge relies on its integrity and on uninterrupted computing.

The BAO Box from Hivecell is built exactly for this scenario, mirroring software container architecture in hardware. The design is atomic: separate blocks that work together in a cluster yet remain autonomous as needed, managed by the primary unit and directly accessible by the user.

The BAO Box from Hivecell brings to the table hardware fault tolerance, database integrity, and virtual infrastructure scalability.

Here’s how it is organized:

The BAO cluster consists of units stacked on top of each other: a primary unit and any number of replica units. The primary actively manages the cluster while the replicas run in standby mode, sharing the load. Primary and replicas are identically configured, and all of their databases are reachable through a single IP address.

The primary node database is in active mode and provides read/write access to the DHIS2 application. The replica node database continuously replicates the primary node database as a backup, and also serves as an additional read-only data provider for user requests.
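The read/write split described above can be sketched in a few lines. This is a minimal illustration, not Hivecell's actual code: the `Node` class and `route_query` helper are assumptions, standing in for whatever routing layer sits in front of the databases.

```python
# Sketch: routing queries between a primary (read/write) and a
# read-only replica, as in the BAO cluster's database layout.
# Node names and the route_query helper are illustrative only.

class Node:
    def __init__(self, name, role):
        self.name = name
        self.role = role  # "primary" or "replica"

def route_query(nodes, is_write):
    """Writes must go to the primary; reads prefer a replica to offload it."""
    if is_write:
        return next(n for n in nodes if n.role == "primary")
    replicas = [n for n in nodes if n.role == "replica"]
    if replicas:
        return replicas[0]
    return next(n for n in nodes if n.role == "primary")

cluster = [Node("box-1", "primary"), Node("box-2", "replica")]
print(route_query(cluster, is_write=True).name)   # box-1
print(route_query(cluster, is_write=False).name)  # box-2
```

The point of the split is that reporting-style read traffic never competes with writes on the primary, while every write still lands on the single authoritative copy.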

The primary load balancer distributes traffic to the DHIS2 applications running on the cluster of BAO Boxes (each replica node runs another, idle load balancer that will take over as needed). These applications likewise connect to the database through a single IP address.
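A toy version of that distribution step might look like the following. Round-robin is an assumption here; the article does not say which balancing algorithm the BAO Box actually uses.

```python
from itertools import cycle

# Sketch: a primary load balancer spreading requests round-robin
# across DHIS2 application instances on the stacked boxes.
# The real balancer and its algorithm are not documented here.

class LoadBalancer:
    def __init__(self, backends):
        self._backends = cycle(backends)

    def pick(self):
        """Return the next backend in rotation."""
        return next(self._backends)

lb = LoadBalancer(["box-1", "box-2", "box-3"])
print([lb.pick() for _ in range(4)])  # ['box-1', 'box-2', 'box-3', 'box-1']
```

Because the idle balancers on the replica nodes are configured identically, any of them can resume this rotation if the primary balancer disappears.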

Should the primary node become unavailable for any reason, whether a software, hardware, or network failure, manual removal, or even physical damage, the next replica box is automatically promoted to primary and takes over cluster management.

For the administrator, neither the cluster IP address nor the database IP address has changed; the failover resolves seamlessly. The system has already sent a notification, and upon user confirmation a replacement node is shipped out to restore the cluster's processing capacity.

This can happen any number of times: multi-box clusters stay ready because the data backup and the automated standby infrastructure are always in place.

The BAO Box is designed primarily for medical institutions that depend on the integrity of gathered records. It also accounts for legal regulations that require data to be kept within the country where it was produced. The Hivecell design addresses all of these data security, storage, and processing efficiency concerns.

Our next step is to develop the capability to deliver updates to a remote clinic’s cluster by loading the updates onto a BAO box, transporting it to the clinic, and simply placing it atop the existing stack. After a security handshake authenticates the new node, the whole cluster is updated with the latest code. The entire update process is designed to require no expert knowledge to complete.
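Since this update mechanism is still being developed, the following is purely a sketch of the idea: a freshly stacked box proves its identity with an HMAC challenge-response before the cluster accepts its code version. The key handling, the protocol, and every name below are our assumptions; the article does not describe the actual security handshake.

```python
import hashlib
import hmac

# Illustrative only: a new node answers a challenge with an HMAC over a
# shared provisioning key; only then is its code version rolled out.

SHARED_KEY = b"cluster-provisioning-key"  # placeholder, not a real key

def respond(challenge: bytes, key: bytes = SHARED_KEY) -> str:
    """The new node's answer to the cluster's challenge."""
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()

def authenticate(challenge: bytes, response: str, key: bytes = SHARED_KEY) -> bool:
    """The cluster verifies the answer in constant time."""
    return hmac.compare_digest(respond(challenge, key), response)

def apply_update(cluster_versions, new_version, authenticated):
    """Only an authenticated node may roll its version out to the cluster."""
    if not authenticated:
        return cluster_versions
    return [new_version for _ in cluster_versions]

challenge = b"nonce-123"
ok = authenticate(challenge, respond(challenge))
print(apply_update(["2.1", "2.1"], "2.2", ok))  # ['2.2', '2.2']
```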

With this automation in place, the solution allows even the most remote locations to operate independently of network stability, with no downtime and without on-site IT support.
