Distributed database across multiple clouds with WireGuard and Netmaker

A CockroachDB setup across the three major cloud providers.

Rui Grafino
Marionete
May 18, 2021


Image by author

Multi-Cloud vs Across Cloud

The concept of multi-cloud refers to using more than one provider, but not necessarily connecting them all to each other.

Using more than one cloud provider can bring real business benefits. If one provider goes offline, we can simply use another, keep the business running, and process our workloads with minimal effort or downtime. We can also escape some vendor lock-in. But there are potential downsides too, such as difficult integrations due to provider-specific configurations and implementations.

But what if, on top of that, we want to go beyond those benefits and configure our own platform or a distributed system seamlessly across multiple cloud providers, with a single global network? Is this possible? Yes.

There are many difficult challenges to overcome, starting with separate networks, complex configurations, latencies, name resolution, and security- and privacy-related issues.

Within a single cloud provider, these issues are largely solved for us through VPCs, subnets, regions, and zones. This is great, but can we abstract away the network complexity and create a single secure network that spans different providers, our own datacenter, or even hybrid deployments, in an easy way?

Yes, there is WireGuard, a free and open-source communication protocol that implements encrypted virtual private networks (VPNs). It was designed with the goals of ease of use, high-speed performance, and low attack surface.

WireGuard has been merged into the Linux kernel since version 5.6, but even earlier kernels can use it, simply by installing the tools as native packages.
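If you are unsure whether your kernel already ships the module, a quick check looks like this (a sketch; the exact output varies by distribution):

$ uname -r
$ modinfo wireguard | head -n 3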

Using a database in a Multi-Cloud scenario

We will demonstrate a distributed SQL database setup across multiple cloud providers. CockroachDB was chosen because it is recent, distributed by nature, and global in scope. Many other tools with a distributed architecture could run on top of such a private network: storage, orchestration, message queuing, or processing.

Tools:

  • 4 Linux virtual machines, one from each cloud provider: Digital Ocean, AWS, GCP, and Azure
  • Docker and Docker Compose
  • Netmaker, a tool to simplify WireGuard configurations.
  • CockroachDB setup in cluster mode.

It’s relevant to mention that this setup, and even the WireGuard protocol itself, has some potential security and privacy issues.

Configuring Netmaker

Netmaker is a tool that uses WireGuard under the hood and takes the complexity away from the user when creating secure network tunnels across nodes.

“If you have servers spread across multiple locations, data centers, or clouds, they all live on separate networks. This can make life very difficult. Netmaker takes all those machines and puts them on a single, flat network so that they can talk to each other easily and securely”

For this demonstration, we will use Ubuntu 20.04 LTS virtual machines with 2 CPUs and 8 GB of memory, more than enough for the purpose. We will use one of these on each of three cloud providers (AWS, GCP, Azure), each with a public IP address.

We will use a controller machine on Digital Ocean, so we are in fact using 4 different cloud providers in the whole process. We could also add the Digital Ocean VM as a WireGuard network client node, but I chose not to, as a best practice, because Netmaker can manage many different networks, act as a gateway, and configure DNS.

Installing WireGuard

WireGuard should be installed on all machines (AWS, GCP, Azure).

$ sudo apt update && sudo apt install wireguard -y

The remaining tools only need to be installed on the Netmaker controller machine. We will use the Netmaker docker-compose file to spin up the containers with the tools and the Netmaker UI, so please install docker and docker-compose.

But for it to start successfully, we need to reconfigure the systemd-resolved daemon to prevent it from binding to port 53:

$ sudo systemctl stop systemd-resolved
$ sudo systemctl disable systemd-resolved
$ sudo vim /etc/systemd/resolved.conf
  • uncomment DNS= and add 8.8.8.8 or whatever is your preference
  • uncomment DNSStubListener= and set it to "no"
$ sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
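After these edits, the relevant lines of /etc/systemd/resolved.conf should look roughly like this (8.8.8.8 is just the example resolver mentioned above):

[Resolve]
DNS=8.8.8.8
DNSStubListener=no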

Installing Netmaker

$ git clone https://github.com/gravitl/netmaker.git
$ cd netmaker
$ vim docker-compose.yml

Edit the file and replace the value on the BACKEND_URL line with your public IP: BACKEND_URL: "http://your-pub-ip:8081".
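If you prefer not to open an editor, a one-line substitution does the same (a sketch; 203.0.113.10 is a placeholder for your actual public IP):

$ sed -i 's|BACKEND_URL:.*|BACKEND_URL: "http://203.0.113.10:8081"|' docker-compose.yml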

$ docker-compose up -d

If all was properly set up, you just need to open your browser at your public IP and the port stated in BACKEND_URL, then configure a username and a password for the tool.

Now let’s create a network:

Simply choose a name and a valid private CIDR addressing range; I used 10.10.10.0/24.

A new network is created:

The next step is creating access keys:

Install the agent on the intended network client nodes, following the instructions shown after the key creation form is submitted. In this demonstration, we will use a VM from each major cloud provider (AWS, GCP, Azure).

All nodes should be visible on our network now:

Let’s look under the hood

On Digital Ocean, on the controller machine, we have the main WireGuard interface nm-default:

# wg show
interface: nm-default
public key: DUNEc/++fqgZ1Qof+uRDUQfKp48G+/nXUGvtKhhlaDE=
private key: (hidden)
listening port: 51821

On every other cloud provider, Netmaker added a network interface called nm-across-cloud, and we can also see its peers:

# wg show
interface: nm-across-cloud
public key: k4RpNYpjT/bRCwHkKlakTCQwwT9karOnmR7O12l7wFs=
private key: (hidden)
listening port: 51821
peer: 5Y0UPiX2UfNWy6M2vE0m9TfN7yQ3thPUx067G5sEmC0=
endpoint: 54.219.150.147:51821
allowed ips: 10.10.10.3/32
latest handshake: 53 seconds ago
transfer: 156 B received, 392 B sent
persistent keepalive: every 20 seconds
peer: cRxIgl+QGPiVzjnon3ez1syBISUxM1Nn33y3xcIAL1k=
endpoint: 51.104.197.249:51821
allowed ips: 10.10.10.2/32
latest handshake: 53 seconds ago
transfer: 248 B received, 360 B sent
persistent keepalive: every 20 seconds

Now let’s ping and test connectivity inside the WireGuard network

The setup was not carefully planned, and the shared nature of the VMs, the network tier, and the region selection clearly impact the latency between the WireGuard clients. None of this matters for our purpose, though.

# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=153 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=153 ms
64 bytes from 10.10.10.1: icmp_seq=3 ttl=64 time=153 ms
...
# ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=64 time=143 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=64 time=143 ms
64 bytes from 10.10.10.2: icmp_seq=3 ttl=64 time=143 ms
...
# ping 10.10.10.3
PING 10.10.10.3 (10.10.10.3) 56(84) bytes of data.
64 bytes from 10.10.10.3: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 10.10.10.3: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 10.10.10.3: icmp_seq=3 ttl=64 time=0.033 ms
...

Configuring CockroachDB

For the purpose of this demonstration, we will use CockroachDB in insecure mode, without certificates. This setup is not recommended for production, but it's easier for a simple test while still giving us a fully distributed cluster.

Download and install CockroachDB on all servers except the Netmaker controller.

$ wget -qO- https://binaries.cockroachdb.com/cockroach-v20.2.9.linux-amd64.tgz | tar xvz
$ sudo cp -i cockroach-v20.2.9.linux-amd64/cockroach /usr/local/bin/

Your network configuration on the provider side must allow TCP communication on the following ports:

  • 26257, for intra-cluster and client-cluster communication
  • 8080, to expose your DB Console
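How to open these ports depends on each provider (security groups on AWS, firewall rules on GCP, network security groups on Azure). If your VM also runs a host firewall such as ufw, a minimal sketch would be:

$ sudo ufw allow 26257/tcp
$ sudo ufw allow 8080/tcp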

To avoid the risk of consistency anomalies, the best practice is to run NTP on each node so that clocks do not drift too far apart; a node whose clock is out of sync spontaneously shuts down its service. This is a usual requirement in distributed systems, but it is not explained here.
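A minimal sketch, assuming Ubuntu and chrony as the NTP implementation:

$ sudo apt update && sudo apt install chrony -y
$ chronyc tracking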

On each node:

Start the database with the node's own WireGuard IP address in --advertise-addr, and add the remaining nodes to the --join argument.

Do the following on all 3 nodes with the required changes.

cockroach start --insecure \
--advertise-addr=10.10.10.3 \
--join=10.10.10.2,10.10.10.1 \
--locality=provider=azure \
--background
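For example, assuming the GCP node received 10.10.10.2 (the address-to-provider mapping here is illustrative), the same command on that node would look like this:

cockroach start --insecure \
--advertise-addr=10.10.10.2 \
--join=10.10.10.1,10.10.10.3 \
--locality=provider=gcp \
--background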

After they are all started, we must initialise the cluster from a single node:

cockroach init --insecure
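Once initialised, you can confirm that all three nodes joined the cluster by running the following from any node:

cockroach node status --insecure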

Use the Web UI:

CockroachDB has a UI with plenty of metrics and information, but since it listens inside our private network we can't reach it. Let's forward requests from our public IP to the private network.

The destination IP should be the one of the machine you are logged in to. You only need one machine to expose it, but for redundancy and resiliency you can do it on all nodes.

# echo 1 > /proc/sys/net/ipv4/ip_forward
# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 -j DNAT --to-destination 10.10.10.2:8080
# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
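Note that the echo above only enables forwarding until the next reboot. A sketch for making it persistent (the file name is arbitrary):

# echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ipforward.conf
# sysctl --system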

Pointing your browser at the public IP on port 8080, you should be able to reach the CockroachDB UI.

Testing the cluster

For testing purposes, we are going to create a simple table after opening the CLI on any node:

cockroach sql --insecure 

Let’s create a database and table:

CREATE DATABASE cloud_database;
SHOW DATABASES;
USE cloud_database;

CREATE TABLE cloud_database.cloud_table (
column1 int PRIMARY KEY,
column2 varchar,
column3 int
);
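To verify that writes replicate across clouds, we can insert a row on one node and read it back from another (a minimal check; the values are arbitrary):

$ cockroach sql --insecure -e "INSERT INTO cloud_database.cloud_table VALUES (1, 'hello', 42);"

And on a different node:

$ cockroach sql --insecure -e "SELECT * FROM cloud_database.cloud_table;"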

We can see plenty of information on the user interface of any of the CockroachDB instances. It is irrelevant which one you connect to, because the database has a peer-to-peer architecture.

Conclusions

This was a simple demonstration for educational purposes and does not fit a production scenario. It shows the simple concept of a VPN extended across multiple cloud providers, creating a secure tunnel network on which we can deploy and use a distributed system.
