An automated node failover solution is surprisingly easy to set up on a Web 2.0 infrastructure. Such a setup has two nodes with identical content. An external server monitors the primary node and, if it detects a problem, automatically updates your site’s internal DNS so that traffic is redirected to the secondary node. Once the primary node is working again, traffic is routed back to it.
Depending on the final configuration, this solution can provide high redundancy. However, it isn’t cheap: it requires three servers that continuously consume resources even when they are not doing much work.
One of the main objectives of this tutorial is to keep costs down. Cloud providers offer many tools for managing infrastructure, which gives us more control over our budget.
In a Web 3.0 infrastructure, a similar approach requires a more sophisticated solution, because managing peer-to-peer connections and redirecting traffic is far more complex.
The alternative we will explore in this tutorial is to start the secondary validator node only when our primary node goes down. We can think of it as an “emergency power system.”
To achieve this, we will use an AWS EC2 instance to replace the primary node in case of failure. The primary node could be at your office, home or any cloud provider location. This node will be connected to the AWS cloud through the AWS IoT gateway by using a simple pub/sub setup.
What is AWS IoT?
AWS IoT is a platform for efficiently managing “Internet of Things” devices, also called “things.” AWS IoT can connect to other applications in the AWS cloud, such as Lambda functions, which are a handy tool for a serverless infrastructure.
Things transfer data, called messages, through the IoT gateway, where the messages either update a thing shadow or reach the rules engine. A thing shadow is a JSON object that holds information about a device and is updated whenever the state of the thing/device changes. It allows us to set, store and synchronize the thing’s state and its cloud representation.
The rules engine, as the name implies, holds a set of rules that use SQL-like statements to select messages arriving at the IoT gateway and route them to a handful of downstream AWS services.
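To make the shadow idea concrete, here is a minimal sketch of a shadow-like document and of how a difference between the requested and the reported state could be computed. The `desired` and `reported` sections follow the general layout of AWS IoT shadow documents; the fields inside them (`validator`, `peers`) are illustrative assumptions and do not come from this tutorial's template.

```python
import json

# Hypothetical shadow document for a validator node. The keys under
# "desired" and "reported" are assumptions made up for this example.
shadow = {
    "state": {
        "desired": {"validator": "running"},
        "reported": {"validator": "stopped", "peers": 12},
    }
}


def shadow_delta(document):
    """Return the desired values that differ from what the device reported."""
    state = document.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    return {key: value for key, value in desired.items() if reported.get(key) != value}


delta = shadow_delta(shadow)
print(json.dumps(delta))
```

We will not rely on shadows in the rest of this tutorial, but the sketch shows why they are useful: the cloud side can always see which parts of the device state are out of sync.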
We will focus on the rules engine as we want to connect our thing/device to Lambda functions.
Message Queue Telemetry Transport (MQTT) is a Client-Server publish/subscribe messaging transport protocol. It is lightweight, open, simple, and designed to be easy to implement. These characteristics make it ideal for use in many situations, including constrained environments such as for communication in Machine to Machine (M2M) and Internet of Things (IoT) contexts where a small code footprint is required and/or network bandwidth is at a premium. It enables a mechanism to notify interested parties when an abnormal disconnection occurs.
The protocol runs over TCP/IP, or over other network protocols that provide ordered, lossless, bi-directional connections. — http://mqtt.org/
The IoT gateway works as the MQTT broker to allow device-cloud communication.
- The subscriber (thing/service) connects to the broker. It can subscribe to any message “topic” in the broker.
- The publisher (client/thing) publishes messages under a topic by sending the message and topic to the broker. The connection to the AWS IoT broker is encrypted with TLS.
- The broker (IoT MQTT broker) then forwards the message to all things/services that subscribe to that topic.
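The three roles above can be sketched with a toy in-memory broker. This is not MQTT and not the AWS IoT gateway; it only illustrates the publish/subscribe flow and leaves out everything a real broker handles (network transport, wildcards, QoS levels, authentication). The class name and the topic names are assumptions made up for the example.

```python
from collections import defaultdict


class Broker:
    """Toy broker: forwards each message to the subscribers of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback that receives (topic, message) for this topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of exactly this topic."""
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, message)


received = []
broker = Broker()
# Two independent subscribers on the same (hypothetical) topic.
broker.subscribe("validator/status", lambda t, m: received.append(("lambda", t, m)))
broker.subscribe("validator/status", lambda t, m: received.append(("email", t, m)))

broker.publish("validator/status", "stopped")  # reaches both subscribers
broker.publish("validator/other", "ignored")   # nobody subscribed, so it is dropped
```

Note that the publisher never learns who the subscribers are. That decoupling is what lets us attach a Lambda function and an email notification to the same topic later on.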
What are Lambda Functions?
AWS Lambda is an event-driven computing platform that executes code on-demand. This feature enables serverless architectures for any application.
A common way of triggering an AWS Lambda function from device messages is a rule attached to a topic. We previously mentioned the AWS IoT Rules Engine; we will use it to trigger rules from the MQTT broker. It’s a compelling way of decoupling the publishers and subscribers of messages, and it removes the need to poll for new messages.
We’ll set a rule and subscribe a Lambda function that starts or stops an EC2 instance every time a topic receives a message. This rule will also publish to an SNS topic to notify us by email.
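As a rough illustration of what such a rule contains, the sketch below builds a dictionary shaped like the rule payload accepted by the AWS IoT `CreateTopicRule` API: an SQL statement that selects messages from a topic, plus one Lambda action and one SNS action. The helper name, the topic and all ARNs are placeholders invented for this example; the rules actually used in this tutorial are defined in the CloudFormation template and may differ.

```python
def build_topic_rule_payload(topic, lambda_arn, sns_topic_arn, sns_role_arn):
    """Build a dict shaped like the AWS IoT topicRulePayload structure."""
    return {
        # Select every message published on the given topic.
        "sql": f"SELECT * FROM '{topic}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            # Invoke the Lambda function with the selected message.
            {"lambda": {"functionArn": lambda_arn}},
            # Publish the same message to an SNS topic (email notification).
            {
                "sns": {
                    "targetArn": sns_topic_arn,
                    "roleArn": sns_role_arn,
                    "messageFormat": "RAW",
                }
            },
        ],
    }


# Placeholder values, for illustration only.
payload = build_topic_rule_payload(
    topic="validator/node/start",
    lambda_arn="arn:aws:lambda:eu-west-1:123456789012:function:example",
    sns_topic_arn="arn:aws:sns:eu-west-1:123456789012:example",
    sns_role_arn="arn:aws:iam::123456789012:role/example",
)
```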
CloudFormation is an AWS tool that helps us set up a cloud infrastructure effortlessly. We define all the resources that AWS should build into a template document, click a button, and AWS magically creates everything.
We are going to use a template that will create our IoT Thing, configure the rules engine, create the Python Lambda functions and finally, set the related events, roles and policies.
The serverless architecture will handle the start and stop of an EC2 instance. This tutorial requires that you already have a working EC2 validator node instance that runs the Polkadot client as a systemd service. It remains a separate component from this stack.
We need an X.509 certificate and private key to establish a secure AWS IoT connection. For security reasons, and following best practices, we will not create them in CloudFormation.
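To give an idea of what a start/stop function can look like, here is a minimal sketch of a Python Lambda handler. It is not the code from the template: the `action` field in the event, the `INSTANCE_ID` environment variable and the optional `ec2` argument (added only so the logic can be tried without AWS credentials) are all assumptions. The `start_instances` and `stop_instances` calls are standard boto3 EC2 client methods.

```python
import os


def get_ec2_client():
    """Create the EC2 client lazily; boto3 ships with the Lambda Python runtime."""
    import boto3

    return boto3.client("ec2")


def lambda_handler(event, context, ec2=None):
    """Start or stop the backup instance based on a hypothetical 'action' field."""
    if ec2 is None:
        ec2 = get_ec2_client()
    # Assumption: the instance ID is passed to the function as an environment variable.
    instance_id = os.environ["INSTANCE_ID"]
    action = event.get("action")
    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    elif action == "stop":
        ec2.stop_instances(InstanceIds=[instance_id])
    else:
        raise ValueError(f"unsupported action: {action!r}")
    return {"instance_id": instance_id, "action": action}
```

Lambda itself calls the handler with only `event` and `context`, so the extra `ec2` parameter keeps its default in production and is useful only for local experiments with a stand-in client.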
On the IoT Core console, go to Secure -> Certificates. Click on Create and select the recommended one-click option to create the certificate.
Once done, download all of the certificates into your local workspace folder and click Activate.
Don’t forget to download the root CA certificate as well (the “A root CA for AWS IoT” download link). It is used to authenticate the AWS servers when the connection is established.
Download or copy the following template into your local workspace folder. We will upload it to the CloudFormation console.
Before creating our CloudFormation stack, we need to get the IoT certificate ID and the EC2 instance ID.
On the IoT Core console, go to Secure -> Certificates. Click on the certificate and copy the ID.
From the EC2 instance console, copy your validator’s instance ID.
It’s time to build our serverless infrastructure. Go to the CloudFormation console and click on Create stack.
We will be asked to choose a template or upload one. Select the upload template option and use the BasicServerlessValidator.yml file that we previously stored on our local workspace.
Fill in all the required fields (Stack Name, Certificate ID, Email, Instance ID).
Click Next twice to skip the stack customization options until you get to the create stack option. Tick the box to acknowledge the creation of the IAM resources required for the roles and policies used by our serverless stack.
Click on Create stack and wait for the creation to complete.
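The same console steps can in principle be scripted. The sketch below only assembles the arguments for boto3's CloudFormation `create_stack` call; the parameter keys (`CertificateID`, `Email`, `InstanceID`) are guesses based on the form fields above and must be replaced with the names actually declared in the template. The `Capabilities` entry is the scripted equivalent of ticking the IAM acknowledgement box.

```python
def build_stack_request(stack_name, template_body, certificate_id, email, instance_id):
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            # Hypothetical parameter keys: check the template for the real names.
            {"ParameterKey": "CertificateID", "ParameterValue": certificate_id},
            {"ParameterKey": "Email", "ParameterValue": email},
            {"ParameterKey": "InstanceID", "ParameterValue": instance_id},
        ],
        # Acknowledges that the stack creates IAM roles and policies. Templates
        # that give IAM resources custom names need CAPABILITY_NAMED_IAM instead.
        "Capabilities": ["CAPABILITY_IAM"],
    }


request = build_stack_request(
    stack_name="serverless-validator",          # placeholder
    template_body="...template contents...",    # contents of BasicServerlessValidator.yml
    certificate_id="<certificate-id>",
    email="you@example.com",
    instance_id="<instance-id>",
)

# With boto3 installed and credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("cloudformation").create_stack(**request)
```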
We have created our serverless cloud infrastructure. 🎉
IMPORTANT: You need to verify that the triggers got registered in the Lambda function. It sometimes happens that due to a race condition on CloudFormation, they don’t appear on the Lambda function.
To fix this, you only need to update the trigger event without changing anything. It will then appear on the Lambda function triggers. Repeat this step for the 3 IoT rules (ValidatorNodeDisconnected, ValidatorNodeStart, ValidatorNodeStop).
After updating, you can verify that the trigger got registered on the Lambda function.
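If you prefer to check this from a script, one possible approach is to inspect the Lambda function's resource-based policy. This rests on an assumption: that each registered IoT trigger corresponds to a policy statement allowing `iot.amazonaws.com` to invoke the function, with the rule's ARN as the source. The helper below parses such a policy document; the sample policy is made up for illustration.

```python
import json


def iot_rules_allowed_to_invoke(policy_json):
    """Return the names of IoT rules that a Lambda resource policy lets invoke it."""
    rules = []
    for statement in json.loads(policy_json).get("Statement", []):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. a wildcard principal given as a plain string
        if principal.get("Service") != "iot.amazonaws.com":
            continue
        condition = statement.get("Condition", {}).get("ArnLike", {})
        source_arn = condition.get("AWS:SourceArn", "")
        if ":rule/" in source_arn:
            rules.append(source_arn.split(":rule/", 1)[1])
    return sorted(rules)


# Made-up policy document containing a single IoT rule permission.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "iot.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:example",
        "Condition": {"ArnLike": {
            "AWS:SourceArn": "arn:aws:iot:eu-west-1:123456789012:rule/ValidatorNodeStart"
        }},
    }],
})
registered = iot_rules_allowed_to_invoke(sample_policy)

# The real policy could be fetched with boto3 (function name is a placeholder):
#   policy = boto3.client("lambda").get_policy(FunctionName="<function-name>")["Policy"]
```

If one of the three rule names is missing from the result, that rule is a candidate for the update step described above.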
Primary Validator Node
Now that our serverless cloud infrastructure is ready, we need to establish the data connection between our validator node (Thing) and the AWS IoT gateway.
Download or copy the following Python script into your primary validator node.
Install the required dependencies to use the Python script.
$ python3 -m pip install AWSIoTPythonSDK pystemd
We need to replace the configuration parameters on the Python script with our certificate and private key files.
In the IoT Core console, click on the Thing that we created “ValidatorNodeThing”. Select the Interact tab and copy the HTTPS IoT Rest API Endpoint address. Replace the host_name value with this address.
E.g. 'host_name': "a1yfesihws3xl1-ats.iot.eu-west-1.amazonaws.com"
For security reasons, the certificate and private key files should not be left with open permissions. Change the file permissions to allow read-only access.
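For orientation, this is roughly how such a configuration could be used with the AWS IoT Python SDK. It is a sketch, not the tutorial's script: only the `host_name` key appears in the text above, so every other key, the file names and the client ID are assumptions. The SDK calls themselves (`AWSIoTMQTTClient`, `configureEndpoint`, `configureCredentials`, `connect`) are part of `AWSIoTPythonSDK`.

```python
import os

# Hypothetical configuration. Only 'host_name' is named in the tutorial;
# the other keys and the file names are placeholders for this example.
config = {
    "host_name": "a1yfesihws3xl1-ats.iot.eu-west-1.amazonaws.com",
    "port": 8883,  # MQTT over TLS with certificate authentication
    "client_id": "ValidatorNodeThing",
    "root_ca": "/path-to-downloaded-files/root-ca.pem",
    "private_key": "/path-to-downloaded-files/private.pem.key",
    "certificate": "/path-to-downloaded-files/certificate.pem.crt",
}


def missing_files(cfg):
    """Return the credential paths from cfg that do not exist on disk."""
    keys = ("root_ca", "private_key", "certificate")
    return [cfg[key] for key in keys if not os.path.isfile(cfg[key])]


def connect(cfg):
    """Open an MQTT connection to the AWS IoT endpoint described by cfg."""
    # Imported here so the file check above works without the SDK installed.
    from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

    client = AWSIoTMQTTClient(cfg["client_id"])
    client.configureEndpoint(cfg["host_name"], cfg["port"])
    client.configureCredentials(cfg["root_ca"], cfg["private_key"], cfg["certificate"])
    client.connect()
    return client


problems = missing_files(config)
if problems:
    print("Missing credential files:", problems)
```

Checking the paths before connecting gives a clearer error than a failed TLS handshake when one of the downloaded files is in the wrong place.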
$ chmod 440 /path-to-downloaded-files/*
Register Python script as a Linux systemd service
Once you have your Python script ready, the next step is to create a configuration file that tells systemd what we want it to do.
$ sudo vi /lib/systemd/system/polkadot-node-iot.service
The file needs to have the following text (replace the path-to-script):
Description=Python script for the ValidatorNodeThing connection.
Update file permissions to 644:
$ sudo chmod 644 /lib/systemd/system/polkadot-node-iot.service
$ sudo systemctl daemon-reload
Enable service autostart on boot:
$ sudo systemctl enable polkadot-node-iot.service
Our serverless failover solution is complete!! 🎊 🎉
Every time our primary node gets disconnected or the polkadot-validator.service goes down, our serverless infrastructure will take care of managing a secondary validator.
Follow the GitHub repo for the latest work on more advanced solutions.
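The node-side half of this behaviour can be sketched as follows. The tutorial's script uses pystemd; to stay dependency-free this sketch swaps in the `systemctl is-active` command instead, and the topic names are hypothetical placeholders rather than the ones defined in the template.

```python
import subprocess


def service_state(unit="polkadot-validator.service"):
    """Ask systemd for the unit's state, e.g. 'active', 'inactive' or 'failed'."""
    result = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return result.stdout.strip()


def topic_for_transition(previous, current):
    """Map a state change to a hypothetical topic; None means nothing to publish."""
    was_up = previous == "active"
    is_up = current == "active"
    if was_up and not is_up:
        return "validator/node/start"  # primary went down: start the secondary
    if is_up and not was_up:
        return "validator/node/stop"   # primary is (back) up: stop the secondary
    return None


# Example decision without touching systemd: the service just failed.
decision = topic_for_transition("active", "failed")
```

A monitoring loop would call `service_state()` periodically, pass the previous and current values to `topic_for_transition()`, and publish a message whenever a topic is returned. Publishing only on transitions avoids sending a start or stop request on every poll.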