Deploy Apigee hybrid on-prem with VMware and pfSense

Arnaud Dezandee
Oct 11, 2022

In this article, I will guide you through installing Apigee hybrid on-premises with a minimal set of hardware. The hybrid runtime needs a Kubernetes cluster, specifically an Anthos cluster on VMware (GKE on-prem).

Note: You can find more details and instructions, alongside configuration file examples, in this GitHub repository.

Prerequisites

You will need admin access to a GCP account with the Anthos/Apigee Hybrid Eval enabled. If that is not the case, contact the Google Cloud support team.

This guide assumes prior knowledge of:

  • pfSense & network — routing, firewall, VLANs, VPN
  • Hypervisors and VM management
  • Kubernetes deployment and administration on bare metal

Before jumping into the installation, make sure that you are familiar with all the key points listed on the Anthos clusters on VMware installation overview.

Hardware

The CPU, RAM, and storage requirements for Anthos clusters on VMware and Apigee Hybrid are relatively high; you will need a pretty beefy workstation or multiple ESXi hosts.

I will be using a single Dell 7820 Workstation with the following specs:

  • 2x Intel Xeon Gold 6150 CPU — 36 Cores, 72 Threads
  • 128GB DDR4 ECC RAM
  • 1.5 TB of SSD storage (1x NVMe and 2x SATA)

Local Network

Here is the network layout I will use:

The physical network is composed of four devices, interconnected through trunked Ethernet links:

  • pfSense: Router, Firewall, Internet Gateway, Load balancer
  • ESXi host: vCenter, Google DNS, Anthos WS & Clusters (Admin + User)
  • managed switches: VLAN trunking
  • workstation: your computer

Why do we need three subnets?

The Main subnet is freely NATed to the internet, and we want to isolate it from the Anthos clusters we will deploy for Apigee.

The Admin & User subnets will be used for the Anthos admin and Apigee user clusters. Note that these two subnets will not be NATed to the internet. We will set up a site-to-site VPN tunnel between the pfSense router and the GCP account that will host the Apigee org; only HTTPS traffic to the Google private API IPs is allowed.
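To make this concrete, here is the addressing plan assumed throughout this article (the /24 masks are my choice; pfSense takes the first address of each subnet, and the host IPs below all reappear in later sections):

Main  (LAN, untagged): 10.0.0.0/24   Google DNS VM 10.0.0.20
Admin (VLAN 25):       10.25.0.0/24  Anthos admin workstation 10.25.0.2
User  (VLAN 26):       10.26.0.0/24  Apigee ingress VIP 10.26.0.51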

VPN Tunnel

The subnets hosting all VMs inside the VMware environment must be able to reach the Google APIs of the Anthos and Apigee management planes. Traffic will flow through a site-to-site IPsec VPN between Google Cloud and the on-prem cluster.

To achieve this, we need a VPN gateway and tunnel on the GCP side, a matching IPsec configuration on pfSense, firewall rules restricting the two cluster subnets, and a DNS resolver that answers Google API domains with the private access VIPs. Each piece is covered in the sections below.

Local network setup

Now that our network layout is set, let's start the deployment. We will begin by defining the three subnets on the pfSense router.

VLANs

The LAN subnet should already be configured if you have a pfSense router deployed on your local network. Add the two new subnets with VLAN IDs 25 & 26; you can follow the Netgate docs about VLAN configuration.

Important: set the tagged state for VLAN IDs 25 & 26 on your managed switch's Ethernet ports. Check the documentation of your network switch to learn how to do that.

IPsec Tunnel

You can find the Terraform code in the repository to deploy the required GCP resources. Generate a strong PSK for your tunnel. I chose 10.30.0.0/24 CIDR for the GCP VPC subnet.
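If you would rather not use Terraform, the equivalent GCP resources can be sketched with gcloud (a Classic VPN with static routing; the gateway name, network, and region below are illustrative):

# reserve a static public IP for the Classic VPN gateway
gcloud compute addresses create vpn-ip --region europe-west1
gcloud compute target-vpn-gateways create on-prem-gw \
  --network default --region europe-west1
# ESP + IKE forwarding rules pointing at the gateway
gcloud compute forwarding-rules create fr-esp --ip-protocol ESP \
  --address vpn-ip --target-vpn-gateway on-prem-gw --region europe-west1
gcloud compute forwarding-rules create fr-udp500 --ip-protocol UDP --ports 500 \
  --address vpn-ip --target-vpn-gateway on-prem-gw --region europe-west1
gcloud compute forwarding-rules create fr-udp4500 --ip-protocol UDP --ports 4500 \
  --address vpn-ip --target-vpn-gateway on-prem-gw --region europe-west1
# one tunnel to the pfSense WAN address, selecting the two Anthos subnets
gcloud compute vpn-tunnels create pfsense-tunnel \
  --peer-address PFSENSE_WAN_IP --shared-secret "$PSK" --ike-version 2 \
  --local-traffic-selector 10.30.0.0/24 \
  --remote-traffic-selector 10.25.0.0/24,10.26.0.0/24 \
  --target-vpn-gateway on-prem-gw --region europe-west1
# route the on-prem subnets into the tunnel
gcloud compute routes create to-anthos-admin --network default \
  --destination-range 10.25.0.0/24 \
  --next-hop-vpn-tunnel pfsense-tunnel --next-hop-vpn-tunnel-region europe-west1
gcloud compute routes create to-anthos-user --network default \
  --destination-range 10.26.0.0/24 \
  --next-hop-vpn-tunnel pfsense-tunnel --next-hop-vpn-tunnel-region europe-west1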

On the pfSense side, you can follow the Netgate docs for IPsec Site-to-Site VPN Example with Pre-Shared Keys.

Firewall

To set up all the firewall rules, we need to define some static IPs and VIPs.

Next, declare the VIPs on pfSense:
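For example (10.26.0.51 is the Apigee ingress VIP used in the final curl test of this article; the control-plane VIPs are placeholders, so pick free addresses that match your cluster config files):

10.25.0.3   admin cluster control-plane VIP (placeholder)
10.26.0.3   user cluster control-plane VIP (placeholder)
10.26.0.51  Apigee ingress VIP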

Configure the firewall rules for the Admin subnet (VLAN 25) and User subnet (VLAN 26) as shown below:

Admin subnet rules
User subnet rules

Note that on the admin subnet, the Anthos admin workstation needs some additional rules for the installation. This VM will have the second IP of the subnet, 10.25.0.2 (pfSense uses the first).
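Summed up, the rule set amounts to roughly the following (a sketch; adapt aliases and interfaces to your own pfSense. 199.36.153.8/30 is the private.googleapis.com range configured in the Google DNS section below):

Admin subnet (VLAN 25):
  pass   TCP 443     any        -> 199.36.153.8/30   # Google APIs via the VPN tunnel
  pass   TCP/UDP 53  any        -> 10.0.0.20         # custom DNS resolver on the Main subnet
  pass   TCP 443     10.25.0.2  -> vCenter & ESXi    # admin workstation only, for the install
  block  everything else

User subnet (VLAN 26): same as above, minus the vCenter & ESXi rule.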

Load Balancing

Anthos Kubernetes clusters require some form of load balancing between nodes for the control planes and the ingress service. Several load balancing implementations are available for Anthos clusters.

Here I will use ManualLB for cluster load balancing with the HAProxy pfSense package.

Manual load balancing specs

Here is the expected resulting frontend on the pfSense side:

pfSense HAProxy frontends
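As a rough plain-haproxy.cfg equivalent of one of those frontends (node IPs and the node port are placeholders; they must match the manual load balancing section of the user cluster config file):

frontend user-cluster-ingress-https
    mode tcp
    bind 10.26.0.51:443
    default_backend user-nodes-ingress-https

backend user-nodes-ingress-https
    mode tcp
    # ingressHTTPSNodePort exposed on every user cluster worker node (placeholder)
    server node1 10.26.0.10:30556 check
    server node2 10.26.0.11:30556 check

The control-plane VIPs get the same treatment, pointing at the Kubernetes API node ports instead.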

ESXi and vCenter

If you haven't already done so, install VMware ESXi 7.0+ on your host(s). ESXi should then be reachable from your workstation via its web interface on the Main subnet.

Next, we will deploy an embedded vCenter Server Appliance on the ESXi.

You can find the deployment template here:

mkdir -p VMware
sudo mount -o loop VMware-VCSA-all-7.0.3-20150588.iso VMware
cd ./VMware/vcsa-cli-installer
cat ./templates/install/embedded_vCSA_on_ESXi.json

Copy the file and fill in the relevant configuration & credentials:

cd ./VMware/vcsa-cli-installer/lin64
./vcsa-deploy install embedded_vCSA_on_ESXi.json --accept-eula

At this point, you need to set up the following (see the govc sketch below):

  • ESXi: port groups — add the two subnet VLAN IDs 25 & 26
  • vCenter: add your ESXi hosts
  • vCenter: create a Datacenter & Cluster & Resource Pool
vSphere Client web interface
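These steps can be clicked through in the vSphere Client, or scripted with VMware's govc CLI (a sketch; inventory names and the ESXi host address are illustrative):

export GOVC_URL='https://vCenter.local' GOVC_INSECURE=1
export GOVC_USERNAME='administrator@vsphere.local' GOVC_PASSWORD='...'
# datacenter, cluster, and resource pool for the Anthos VMs
govc datacenter.create dc-1
govc cluster.create -dc dc-1 cluster-1
govc cluster.add -dc dc-1 -cluster cluster-1 \
  -hostname 10.0.0.10 -username root -password '...' -noverify
govc pool.create '/dc-1/host/cluster-1/Resources/anthos'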

Google DNS

pfSense cannot serve different DNS Resolver configurations per interface/subnet, so we have to set up a custom resolver elsewhere.

I deployed a simple Ubuntu VM with unbound on the main subnet; it will be assigned IP 10.0.0.20 (as per our firewall rules).

Here is the minimal config:

server:
  # 199.36.153.8/30: the private.googleapis.com VIPs, reachable through the VPN tunnel
  local-zone: "googleapis.com" redirect
  local-data: "googleapis.com A 199.36.153.8"
  local-data: "googleapis.com A 199.36.153.9"
  local-data: "googleapis.com A 199.36.153.10"
  local-data: "googleapis.com A 199.36.153.11"
  local-zone: "gcr.io" redirect
  local-data: "gcr.io A 199.36.153.8"
  local-data: "gcr.io A 199.36.153.9"
  local-data: "gcr.io A 199.36.153.10"
  local-data: "gcr.io A 199.36.153.11"
  local-zone: "accounts.google.com" redirect
  local-data: "accounts.google.com A 199.36.153.8"
  local-data: "accounts.google.com A 199.36.153.9"
  local-data: "accounts.google.com A 199.36.153.10"
  local-data: "accounts.google.com A 199.36.153.11"

To check that your DNS server configuration is valid, here is the expected dig result from your workstation:

$ dig @10.0.0.20 +short googleapis.com
199.36.153.8
199.36.153.9
199.36.153.11
199.36.153.10

Anthos

At this point, your local network and VMware environment should be ready to begin the Anthos installation.

Admin Workstation

Before creating the admin workstation, enable the required APIs on your GCP account and prepare the service accounts and IAM roles for Anthos. You can use the Terraform code and configuration files in the repository.
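The Terraform in the repository takes care of this step; the manual equivalent looks roughly like the following (the authoritative API and IAM role lists are in the Anthos docs):

gcloud services enable --project _YOUR_GCP_PROJECT_ID_ \
  anthos.googleapis.com anthosgke.googleapis.com \
  cloudresourcemanager.googleapis.com gkeconnect.googleapis.com \
  gkehub.googleapis.com serviceusage.googleapis.com
# component access service account, used by gkeadm/gkectl to pull images
gcloud iam service-accounts create component-access-sa \
  --project _YOUR_GCP_PROJECT_ID_
gcloud projects add-iam-policy-binding _YOUR_GCP_PROJECT_ID_ \
  --member serviceAccount:component-access-sa@_YOUR_GCP_PROJECT_ID_.iam.gserviceaccount.com \
  --role roles/serviceusage.serviceUsageViewer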

Your workstation needs direct access to the following:

  • VMware vCenter Server: vCenter.local:443
  • VMware ESXi host
  • Anthos admin WS subnet (VM Net 25)
  • Google APIs

Then create the VM:

./gkeadm create admin-workstation
Using config file "admin-ws-config.yaml"...
Running preflight validations...
- Validation Category: Tools
- [SUCCESS] gcloud
- [SUCCESS] ssh
- [SUCCESS] ssh-keygen
- [SUCCESS] scp
- Validation Category: Config Check
- [SUCCESS] Config
- Validation Category: SSH Key
- [SUCCESS] SSH key path
- Validation Category: Internet Access
- [SUCCESS] Internet access to required domains
- Validation Category: GCP Access
- [SUCCESS] Read access to GKE on-prem GCS bucket
- Validation Category: vCenter
- [SUCCESS] Credentials
- [SUCCESS] vCenter Version
- [SUCCESS] ESXi Version
- [SUCCESS] Datacenter
- [SUCCESS] Datastore
- [SUCCESS] Resource Pool
- [SUCCESS] Folder
- [SUCCESS] Network
- [SUCCESS] Datadisk
All validation results were SUCCESS.
Reusing VM template "gke-on-prem-admin-appliance-vsphere-1.10.3-gke.49" that already exists in vSphere.
Creating admin workstation VM "gke-admin-workstation"... DONE
Waiting for admin workstation VM "gke-admin-workstation" to be assigned an IP....
DONE
******************************************
Admin workstation VM successfully created:
- Name: gke-admin-workstation
- IP: 10.25.0.2
******************************************

You should now be able to SSH onto the Anthos admin workstation and proceed with the cluster setup.
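gkeadm generated a dedicated SSH key during creation and prints the exact command to use when it finishes; it is typically something like:

ssh -i ~/.ssh/gke-admin-workstation ubuntu@10.25.0.2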

Anthos Clusters

We are now at the Create an admin cluster step of the Anthos documentation. You can use the configuration files available in the repository.

Do not forget to update the config files with your settings before using them; the versions in the repository use the IPs and ports shown in this article.

Admin cluster:

gkectl check-config --config admin-cluster.yaml
gkectl prepare --config admin-cluster.yaml
gkectl create admin --config admin-cluster.yaml

User cluster:

gkectl check-config --kubeconfig kubeconfig --config user-cluster.yaml
gkectl create cluster --kubeconfig kubeconfig --config user-cluster.yaml

After that, you can register the two Anthos clusters in the Google Cloud console:

Anthos cluster available in the GCP console
vSphere Anthos final deployment

You can use the kubeconfigs generated on the Anthos admin workstation during cluster creation to troubleshoot the clusters:

admin cluster kubectl
user cluster kubectl
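For example, a quick health check of both clusters (kubeconfig file names as generated in this setup):

kubectl --kubeconfig kubeconfig get nodes
kubectl --kubeconfig gke-apigee-user-cluster1-kubeconfig get nodes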

Apigee Anthos Service Mesh (ASM)

The next step before installing the Apigee runtime is ASM. Since the Anthos admin workstation only has limited access to Google APIs through the VPN tunnel, we will build an offline bundle.

Again, you can find instructions and files in the repository.

On a Linux computer with internet access, install asmcli following the instructions in Install required tools, then create the offline package:

./asmcli build-offline-package -D asm-files -v
tar -czvf asm-offline.tar.gz asm-files

Upload the archive to the Anthos admin workstation:

scp asm-offline.tar.gz ubuntu@10.25.0.2:

Back on the admin workstation, authenticate the gcloud CLI with a privileged service account:

gcloud config configurations create install
gcloud auth activate-service-account vmware-install@_YOUR_GCP_PROJECT_ID_.iam.gserviceaccount.com \
--key-file _YOUR_GCP_PROJECT_SA_KEY_.json \
--project=_YOUR_GCP_PROJECT_ID_

Unarchive the offline bundle and install ASM with the provided file:

curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.13 > asmcli
chmod +x asmcli
tar -xzvf asm-offline.tar.gz
./asmcli install \
--kubeconfig /home/ubuntu/gke-apigee-user-cluster1-kubeconfig \
--fleet_id apigee-hybrid-vmware \
--output_dir asm-files \
--platform multicloud \
--enable_all --ca mesh_ca \
--custom_overlay overlay.yaml \
--option legacy-default-ingressgateway \
--offline -v
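Once asmcli completes, the mesh control plane should be up in the istio-system namespace of the user cluster:

kubectl --kubeconfig /home/ubuntu/gke-apigee-user-cluster1-kubeconfig \
  -n istio-system get pods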

Apigee runtime

At this point, you need to set up the Apigee management plane inside your GCP project. Follow the Part 1: Project and org setup guide.

Let’s now jump to the Hybrid runtime setup.

Note: Cert-manager should already be installed as part of Anthos bootstrap.

You can use the repository as a boilerplate and follow all the steps in the GCP docs.

If everything goes well, you will have a working Apigee runtime inside your Anthos user cluster:
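A quick way to verify is to list the pods of the apigee namespace; the runtime components (runtime, synchronizer, Cassandra, MART, and so on) should all reach Running state:

kubectl --kubeconfig /home/ubuntu/gke-apigee-user-cluster1-kubeconfig \
  -n apigee get pods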

Time to test!

Deploy a hello world API proxy in the Apigee cloud console:

Helloworld proxy

Now, from your workstation or any other device on your network, you can request the Apigee ingress on the VIP 10.26.0.51:

$ curl -k https://YOUR_DNS_DOMAIN/helloworld --resolve "YOUR_DNS_DOMAIN:443:10.26.0.51"
Hello world

If you followed this article up to this point and have a working setup, congratulations! If you have any questions, feel free to reach out to me.

As a next step, you could expose the Apigee ingress on the public WAN IP of your pfSense, using some more HAProxy configuration.
