Staking KEEP on a Raspberry Pi cluster with Ansible, Kubernetes (k3s), GlusterFS and more

Steven Yuan
Aug 31 · 11 min read

Before we begin: there are already plenty of public guides that cover setting up a standalone Keep node (even one for Kubernetes). Instead of reinventing the wheel, this guide is an opinionated take on how you can run a bare-metal Raspberry Pi cluster at home using an Ansible setup script. By the end of the guide, you should be able to get up and running with a single command and two relatively short config files.

For less than $300 you can stand up your very own in-house Kubernetes cluster that self-heals and has redundant replicated storage. Let’s get started!

Side note: At the time of this article, there are no official keep-network Docker images for the arm64 architecture. The images used in this guide are compiled and uploaded to my own Docker Hub account in an automated fashion with the latest release tags (keep-core, keep-ecdsa).

The first step is to finish all the pre-requisite steps of obtaining testnet KEEP, delegating tokens, authorizing contracts, and setting up an Infura account for Ethereum endpoint access. If you haven’t done so yet, follow this guide by Ben Longstaff and come back here after you finish Step 8.

Hardware

  • At least four Raspberry Pi 4B (2 GB) are recommended. You could probably get by with three units, but four will provide better stability. I personally use 4 GB Pis to run other applications.
  • 16 GB microSD card for the operating system. I use 32 GB to be safe.
  • 32 GB USB 3.0 external storage for replicated data. 1 TB SSD is highly recommended for mainnet if you decide to run geth with your deployment. I use 64 GB drives due to my other applications.
  • 5-port unmanaged switch. You’ll need more ports if you have more than 4 Raspberry Pis (one port is the uplink).
  • Check out the repository for more specific equipment recommendations including power supplies and cables.

Read the Quick Setup below to get started immediately. If you need more details, jump down to the section that starts with “Full end-to-end setup instructions”.


Quick setup

Install Ansible.

Install jmespath:

$ pip install jmespath

Check out the repo:

$ git clone https://github.com/syuan100/keep-pi-cluster
$ cd keep-pi-cluster

Make a copy of hosts.ini.example named hosts.ini:

$ cp hosts.ini.example hosts.ini

Edit hosts.ini and add the IP addresses for your Pis, replacing the xxx.xxx.xxx.xxx's:

[master]
; Put the IP address for your main Kubernetes server
xxx.xxx.xxx.xxx

[nodes]
; Put the IP addresses for the rest of the nodes
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
...

Copy your keystore file from Step 1 of Ben Longstaff’s guide into the keystore directory.

Make a copy of group_vars/all.yml.example without the .example:

$ cp group_vars/all.yml.example group_vars/all.yml

Most of group_vars/all.yml can be left at its defaults. Below is a list of the variables you need to change:

# Use the endpoints provided from your Infura account
eth1_endpoint: https://ropsten.infura.io/v3/XXXXXXXXXXXXXXXXXXXXXXXX
ethereum_url: wss://ropsten.infura.io/ws/v3/XXXXXXXXXXXXXXXXXXXXXXXX
# storage_path will depend on how you plugged in your storage device. Use the command lsblk while logged into your Raspberry Pi to see what to change this value to.
storage_path: /dev/sda1
# Change storage_size to be LESS than the amount of external storage you have plugged into one Pi
storage_size: 30
# Change to your account address authorized for the random beacon
beacon_account_address: "0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
# The exact name for your keystore file
beacon_keystore_filename: UTC--2020-08-29T17-45-22.740497000Z--AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
# The password used to secure your keystore file
beacon_account_password: yourpasswordhere
# Your public IP address for P2P discovery in multiaddr format
beacon_announced_addresses: '"/ip4/xxx.xxx.xxx.xxx/tcp/3919"'

You can use the same information above for the ECDSA node if you wish, or you can generate a new address and keystore file.

Important! Make sure you have SSH key access to all your Pis that doesn’t require a password prompt.

You’re now ready to run the Ansible script! In the root of the repository, run this command:

$ ansible-playbook -i hosts.ini initial_setup_playbook.yml

Full end-to-end setup instructions

Download Ubuntu 18.04 64-bit for the Raspberry Pi here.


Download Balena Etcher to flash Ubuntu to your microSD cards.


We’ll be accessing the Raspberry Pis over SSH, so there is no need for an extra monitor and keyboard! All you need to do is connect your Raspberry Pis to the network using ethernet cables.

After inserting all the SD cards, attaching all external drives, and connecting all the Raspberry Pis to the switch (which is then connected to your home network), you’re ready to power everything on.

When all the Raspberry Pis are done booting and lights are blinking, take note of the IP address for each Pi on your local network. You’ll need it in the next steps. Check out this article if you’re not sure how to do that.

SSH Setup

(This portion is taken directly from the repository readme)

From your host machine, open a terminal and type the command below. Replace the x’s with the IP address of the Pi.

$ ssh ubuntu@xxx.xxx.x.x

The default password is ubuntu. Upon successfully logging in you will be prompted to update the password.

Note: You will only be using password login during this step. For the rest of the guide, we will set up SSH keys and disable password login to increase security.

Go back to your host machine and type this command to see if you have a set of SSH keys already:

$ ls -al ~/.ssh

If you see something like id_rsa.pub you can skip the next step. If you don’t have an SSH key already for your host machine, generate an SSH key:

$ ssh-keygen

Follow the instructions. Note: This process will be faster if you don’t use a passphrase. (If you DO use a passphrase on your key, follow the instructions here for how to use ssh-agent to store your passphrase).

Use ssh-copy-id to copy the SSH keys to your Raspberry Pis:

$ ssh-copy-id ubuntu@xxx.xxx.xxx.xxx

Once your keys are successfully copied, you should be able to log in without using the password:

$ ssh ubuntu@xxx.xxx.xxx.xxx

When you are SURE you can log in to each Pi using SSH without a password, move on to the next section. Ansible requires SSH access in order to work smoothly.
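If you want to check every Pi in one pass, here is a minimal sketch. It is not part of the repository, and the function name check_ssh_keys is made up for illustration. It assumes the default ubuntu user from this guide and that you have already logged in to each Pi once, so the host keys are known. The BatchMode=yes option makes ssh fail instead of asking for a password, which is roughly what Ansible will run into if key login isn't working.

```shell
#!/bin/sh
# check_ssh_keys USER HOST...
# Tries a non-interactive login to each host and reports the result.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_keys() {
  user=$1
  shift
  failed=0
  for host in "$@"; do
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true 2>/dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  # Non-zero exit status if any host could not be reached with a key.
  return "$failed"
}

# Example (replace with the IP addresses of your own Pis):
#   check_ssh_keys ubuntu 192.168.1.101 192.168.1.102 192.168.1.103
```

Any line that starts with FAIL means that Pi still needs ssh-copy-id (or a running ssh-agent if your key has a passphrase).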

Running Ansible

On your host machine, you’ll need to install Ansible and jmespath to run the scripts. Follow the instructions here to download and install Ansible for your machine.

For jmespath, install using pip:

$ pip install jmespath

Next, clone the repository and navigate into the folder:

$ git clone https://github.com/syuan100/keep-pi-cluster && cd keep-pi-cluster

Make a copy of hosts.ini.example named hosts.ini:

$ cp hosts.ini.example hosts.ini

Edit the hosts.ini file and add the IP addresses for your Pis from the initial setup, replacing the xxx.xxx.xxx.xxx's:

[master]
; Put the IP address for your main Kubernetes server
xxx.xxx.xxx.xxx

[nodes]
; Put the IP addresses for the rest of the nodes
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx
...

Copy your keystore file from Step 1 of Ben Longstaff’s guide into the keystore directory. The folder structure should look like this:

keep-pi-cluster/
└ keystore/
  └ UTC--2020-08-29T17-yourkeystorefile

Next, make a copy of group_vars/all.yml.example without the .example:

$ cp group_vars/all.yml.example group_vars/all.yml

Most of group_vars/all.yml can be left at its defaults. Below is a list of the variables you need to change:

# Use the endpoints provided from your Infura account
eth1_endpoint: https://ropsten.infura.io/v3/XXXXXXXXXXXXXXXXXXXXXXXX
ethereum_url: wss://ropsten.infura.io/ws/v3/XXXXXXXXXXXXXXXXXXXXXXXX
# storage_path will depend on how you plugged in your storage device. Use the command lsblk while logged into your Raspberry Pi to see what to change this value to.
storage_path: /dev/sda1
# Change storage_size to be LESS than the amount of external storage you have plugged into one Pi
storage_size: 30
# Change to your account address authorized for the random beacon
beacon_account_address: "0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
# The exact name for your keystore file
beacon_keystore_filename: UTC--2020-08-29T17-45-22.740497000Z--AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
# The password used to secure your keystore file
beacon_account_password: yourpasswordhere
# Your public IP address for P2P discovery in multiaddr format
beacon_announced_addresses: '"/ip4/xxx.xxx.xxx.xxx/tcp/3919"'

You can use the same information above for the ECDSA node if you wish, or you can generate a new address and keystore file.
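The multiaddr value is easy to mistype, so here is a small sketch of how it can be assembled from an IP address and a port. The helper name make_multiaddr is hypothetical and not part of the repository. The check only confirms that the input looks like a dotted-quad IPv4 address; it does not verify that the address is actually yours or reachable.

```shell
#!/bin/sh
# make_multiaddr IP PORT
# Prints /ip4/IP/tcp/PORT after checking that IP has four dot-separated
# numeric fields (it does not check that each field is 255 or less).
make_multiaddr() {
  ip=$1
  port=$2
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  printf '/ip4/%s/tcp/%s\n' "$ip" "$port"
}

# Example with a documentation-only address (203.0.113.0/24 is reserved
# for examples, so substitute your real public IP):
#   make_multiaddr 203.0.113.7 3919
#   prints /ip4/203.0.113.7/tcp/3919
```

The printed string is what goes between the inner double quotes of beacon_announced_addresses.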

You’re now ready to run the Ansible script! In the root of the repository, run this command:

$ ansible-playbook -i hosts.ini initial_setup_playbook.yml

It’ll take 20 minutes or so for all the steps to finish. When they are done, run this command on your master node to check that everything is working:

$ sudo kubectl get all -o wide

You should see output like the following:

[Screenshot: output of sudo kubectl get all -o wide]

Monitoring

Netdata is the default monitoring solution at the moment. You can use your browser to navigate to port 19999 of your Raspberry Pis to see server metrics like CPU usage, memory usage, and much much more. You can also sign up for a free Netdata account to aggregate your cluster into one dashboard.
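To confirm from your host machine that Netdata is answering on every Pi, something like the sketch below can help. The function name netdata_check is made up for illustration. It assumes curl is installed and that your Netdata version serves the /api/v1/info endpoint; if yours does not, requesting the dashboard root on port 19999 works the same way.

```shell
#!/bin/sh
# netdata_check HOST...
# Requests Netdata's info endpoint on port 19999 of each host and
# reports whether the request succeeded.
netdata_check() {
  for host in "$@"; do
    url="http://$host:19999/api/v1/info"
    # -f: treat HTTP errors as failures, -sS: quiet but still show errors
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "UP   $url"
    else
      echo "DOWN $url"
    fi
  done
}

# Example (replace with the IP addresses of your own Pis):
#   netdata_check 192.168.1.101 192.168.1.102
```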


Port forwarding

To be a good peer and to improve performance, it is wise to forward our p2p ports to the public internet. However, our load balancer uses node ports, so it’s not as straightforward as mapping port-to-port one-for-one:

keep-random-beacon: Router port 3919 → Raspberry Pi port 30011

keep-ecdsa: Router port 3920 → Raspberry Pi port 30012

For example, for the keep-random-beacon our load balancer service port is 30011. So when we open up and forward ports on our router, we open port 3919 and forward traffic to port 30011 on any one of our Raspberry Pis. The beauty of a load balancer is that any traffic that hits port 30011 on any node in our cluster will be directed to our pod running the random beacon. Neat, huh?

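The abstraction may not be necessary, so feel free to edit the service files. However, to use ports outside the default NodePort range (30000-32767) you might need to change the cluster's node port range, which is an API server option (--service-node-port-range, also accepted by k3s server) rather than a kubelet setting.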
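Once the router rules are in place, you can test them with a quick probe like the sketch below. The function name check_p2p_ports is hypothetical. It assumes nc (netcat) is installed and uses the router ports from the mapping above. Run it from a machine outside your home network, since many home routers will not loop traffic back to their own public address.

```shell
#!/bin/sh
# check_p2p_ports PUBLIC_IP
# Probes the two forwarded router ports with a plain TCP connect.
check_p2p_ports() {
  public_ip=$1
  # Router ports from this guide: 3919 (keep-random-beacon), 3920 (keep-ecdsa).
  for entry in keep-random-beacon:3919 keep-ecdsa:3920; do
    name=${entry%%:*}
    port=${entry##*:}
    # -z: connect without sending data, -w 5: give up after 5 seconds
    if nc -z -w 5 "$public_ip" "$port"; then
      echo "open   $name $public_ip:$port"
    else
      echo "closed $name $public_ip:$port"
    fi
  done
}

# Example with a documentation-only address (use your real public IP):
#   check_p2p_ports 203.0.113.7
```

A port reported as closed can mean a missing router rule, a pod that is not running, or simply that the probe was run from inside the network.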

Ok, but Will it Blend™?


Or rather, “Will it self-heal?”

You can conduct a simple test! First find out which node your workloads are running on. The command below on your master node will tell you which node the random beacon is running on:

$ sudo kubectl get pod -l app=keep-random-beacon -o wide

Next, SSH into the node running the workload (in my case, rpi-03). Run the shutdown command and wait a minute for the node to shut off:

$ sudo shutdown

Back on your master node, check to see that the node is unresponsive:

$ sudo kubectl get nodes

After about a minute, run the first command again to see the original pod terminating and a new pod starting up on an available node:

$ sudo kubectl get pod -l app=keep-random-beacon -o wide

Success! And don’t forget to go and turn the node back on.
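If you would rather not re-run the command by hand, the test can be scripted roughly as follows. This is only a sketch: the function names beacon_node and wait_for_move are made up, and it assumes you run it on the master node where sudo kubectl works, as in the commands above. It reports when a replacement pod has been scheduled on a different node, which is not the same as the pod being fully ready.

```shell
#!/bin/sh
# beacon_node
# Prints the node name of every pod labelled app=keep-random-beacon,
# one per line (an unscheduled pod yields an empty line).
beacon_node() {
  sudo kubectl get pod -l app=keep-random-beacon \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}'
}

# wait_for_move OLD_NODE [TRIES]
# Polls every 10 seconds, up to TRIES times (default 60), until a beacon
# pod is reported on a node other than OLD_NODE.
wait_for_move() {
  old=$1
  tries=${2:-60}
  while [ "$tries" -gt 0 ]; do
    new=$(beacon_node | awk -v old="$old" 'NF && $1 != old { print $1; exit }')
    if [ -n "$new" ]; then
      echo "beacon pod scheduled on $new"
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  echo "beacon pod still only on $old" >&2
  return 1
}

# Example: after shutting down rpi-03, run on the master node:
#   wait_for_move rpi-03
```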

Troubleshooting playbooks

If at any point during the initial setup a particular step fails, you can either fix the error and replay the initial_setup_playbook.yml or you can use one of the playbooks under the playbooks directory to only replay a specific segment:

1_ubuntu_playbook.yml: Initial setup for the Raspberry Pi, including disabling password SSH login, updating host names, and updating apt.

2_k3s_playbook.yml: Set up Kubernetes, including Pi-specific settings, downloading and installing master and worker nodes, and setting hard and soft pod eviction limits.

3_glusterfs_playbook.yml: Set up replicated storage, including formatting the specified drive, creating an LVM group, and installing and starting a GlusterFS volume.

4_services_playbook.yml: Run containers and set up services, including setting up GlusterFS persistent volumes, the MetalLB load balancer, copying application-specific config files, and deploying KEEP staking nodes.

5_netdata_monitoring_playbook.yml: Deploy Netdata to monitor your servers.
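Replaying a single segment uses the same inventory as the full run. The sketch below wraps that in two helper functions; segment_playbook and replay_segment are hypothetical names, and the paths assume the numbered files sit directly under the playbooks directory and that you run this from the root of the repository.

```shell
#!/bin/sh
# segment_playbook N
# Maps a segment number (1-5) to the playbook file listed above.
segment_playbook() {
  case "$1" in
    1) echo playbooks/1_ubuntu_playbook.yml ;;
    2) echo playbooks/2_k3s_playbook.yml ;;
    3) echo playbooks/3_glusterfs_playbook.yml ;;
    4) echo playbooks/4_services_playbook.yml ;;
    5) echo playbooks/5_netdata_monitoring_playbook.yml ;;
    *) echo "unknown segment: $1" >&2; return 1 ;;
  esac
}

# replay_segment N
# Re-runs one segment against the same hosts.ini used for the full setup.
replay_segment() {
  playbook=$(segment_playbook "$1") || return 1
  ansible-playbook -i hosts.ini "$playbook"
}

# Example: re-run only the GlusterFS segment after fixing storage_path:
#   replay_segment 3
```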

In addition, there are utility playbooks under playbooks/utils:

remove_glusterfs.yml: Removes GlusterFS and deletes all content on drives. Useful if you are having issues with replicated storage and need to start fresh.

update_deployment.yml: Similar to the services playbook, except it removes all relevant services before deploying new configuration.

update_kubelet_config.yml: If you wish to tweak kubelet parameters, you can run this script afterwards to apply the changes.

Common issues

“Ansible isn’t able to SSH into my Pis since it needs a passphrase.”

Please ensure that your SSH key doesn’t require a passphrase to log in to your Raspberry Pis. If it does, you can use ssh-agent and ssh-add to help move things along. Or, generate a new SSH key that doesn’t require a passphrase.

“For GlusterFS the playbook says that /dev/sda1 is too small or cannot be found”

Sometimes the external drive has two partitions on it: a boot partition and a storage partition. For optimal performance, repartition and reformat the drive so that it has only one partition. Otherwise, run lsblk while logged into your Raspberry Pi to see which device path your storage partition is on, then set the storage_path variable in group_vars/all.yml to that path.
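To narrow the lsblk output down to likely candidates for storage_path, a filter like the sketch below can help. The function name usb_partitions is made up for illustration. It assumes the microSD card shows up as an mmcblk device, which is typical on a Raspberry Pi but worth confirming on your own hardware.

```shell
#!/bin/sh
# usb_partitions
# Reads `lsblk -rno NAME,TYPE,SIZE` output on stdin and prints the device
# path and size of every partition that is not on an mmcblk (SD card) device.
usb_partitions() {
  awk '$2 == "part" && $1 !~ /^mmcblk/ { print "/dev/" $1, $3 }'
}

# Example, run on a Pi:
#   lsblk -rno NAME,TYPE,SIZE | usb_partitions
```

If more than one line comes back for the same drive, pick the larger partition for storage_path, or repartition the drive so only one remains.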

Final considerations and future improvements

While this creates quite a usable cluster on its own, there are a few improvements that I would like to make down the line. I also welcome anyone to open a PR themselves on the repo!

  • Cluster monitoring with Prometheus and Grafana: A cluster should be monitored and instrumented with alerting no matter how redundant it may be! Netdata as server monitoring is fine, but cluster level logs and metrics would be better.
  • k3sup integration: k3sup massively simplifies how k3s is deployed and also unlocks the ability to add nodes to a cluster easily and seamlessly (including multi-master deployments).
  • Hybrid cloud automation: While I don’t like seeing cloud services monopolizing blockchain infrastructure, the uptime they provide can’t be beat. Adding cloud support as a fail-safe can make our deployments more dependable. However, when deploying to the cloud, network security must be taken much more seriously due to the lack of a router acting as a natural firewall.

Thank you for reading! If you like what you see, let me know in the comments below. Or, you can reach me in the Keep Discord with the handle @alphamethod.
