Deploy Apigee hybrid on-prem with VMware and pfSense
In this article, I will guide you through installing Apigee hybrid on-premises with a minimal set of hardware. The hybrid runtime needs a Kubernetes cluster, specifically an Anthos cluster on VMware (GKE on-prem).
Note: You can find more details and instructions, alongside configuration file examples, in this GitHub repository.
Prerequisites
You will need admin access to a GCP account with Anthos/Apigee Hybrid Eval enabled. If that is not the case, contact the Google Cloud support team.
This guide assumes prior knowledge of:
- pfSense & network — routing, firewall, VLANs, VPN
- Virtualization — hypervisor and VM management
- Kubernetes deployment and administration on bare metal
Before jumping into the installation, make sure that you are familiar with all the key points listed on the Anthos clusters on VMware installation overview.
Hardware
The CPU, RAM, and storage requirements for Anthos clusters on VMware and Apigee Hybrid are relatively high; you will need a pretty beefy workstation or multiple ESXi hosts.
I will be using a single Dell 7820 Workstation with the following specs:
- 2x Intel Xeon Gold 6150 CPU — 36 Cores, 72 Threads
- 128GB DDR4 ECC RAM
- 1.5 TB of SSD storage (1x NVMe and 2x SATA)
Local Network
Here is the network layout I will use:
The physical network is composed of four devices, interconnected with Ethernet cables (VLAN trunks):
- pfSense: Router, Firewall, Internet Gateway, Load balancer
- ESXi host: vCenter, Google DNS, Anthos WS & Clusters (Admin + User)
- Managed switches: VLAN trunking
- Workstation: your computer
Why do we need three subnets?
The Main subnet is freely NATed to the internet, and we want to isolate it from the Anthos clusters we will deploy for Apigee.
The Admin & User subnets will be used for the Anthos admin and Apigee user clusters. Note that these two subnets will not be NATed to the internet. Instead, we will set up a site-to-site VPN tunnel between the pfSense router and the GCP account that will host the Apigee org; only HTTPS traffic to Google's private API IPs is allowed.
VPN Tunnel
The subnets hosting all the VMs inside the VMware environment must be able to reach the Google APIs for the Anthos and Apigee management planes. Traffic will flow through a site-to-site IPsec VPN between Google Cloud and the on-prem cluster.
To achieve this, we need the following:
- GCP subnet with Private Google Access
- IPsec tunnel routing
- Google APIs DNS override
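As a sketch, the core of the GCP side can be expressed in Terraform as follows. The resource names and region here are assumptions for illustration (the CIDR matches the one used later in this article); the repository contains the complete code, including the VPN gateway and tunnel:

```hcl
# VPC without auto-created subnets, so we control the address plan
resource "google_compute_network" "apigee_hybrid" {
  name                    = "apigee-hybrid-vpc"
  auto_create_subnetworks = false
}

# Subnet with Private Google Access enabled, so VMs and on-prem hosts
# routed through the VPN can reach Google APIs without public internet
resource "google_compute_subnetwork" "apigee_hybrid" {
  name                     = "apigee-hybrid-subnet"
  network                  = google_compute_network.apigee_hybrid.id
  region                   = "us-central1"
  ip_cidr_range            = "10.30.0.0/24"
  private_ip_google_access = true
}
```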
Setup local network
Now that our network layout is set, let's start the deployment. We will begin by defining the three subnets on the pfSense router.
VLANs
The LAN subnet should already be configured if you have a pfSense router deployed on your local network, so add the two new subnets with VLAN IDs 25 & 26. You can follow the Netgate docs on VLAN configuration.
Important: set the tagged state for VLAN IDs 25 & 26 on your managed switch's Ethernet ports. Check your switch's documentation for how to do that.
IPsec Tunnel
You can find the Terraform code in the repository to deploy the required GCP resources. Generate a strong PSK for your tunnel. I chose 10.30.0.0/24 CIDR for the GCP VPC subnet.
On the pfSense side, you can follow the Netgate docs for IPsec Site-to-Site VPN Example with Pre-Shared Keys.
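One simple way to generate the PSK is with OpenSSL; paste the same value into the pfSense phase 1 configuration and the GCP VPN tunnel resource:

```shell
# Generate a random 32-byte pre-shared key, base64-encoded
openssl rand -base64 32
```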
Firewall
To set up all the firewall rules, we need to define some static IPs and VIPs.
Next, declare the VIPs on pfSense:
Configure the firewall rules for the Admin subnet (VLAN 25) and User subnet (VLAN 26) as shown below:
Note that on the admin subnet, the Anthos admin workstation needs some additional rules for the installation. This VM will have the second IP of the subnet, 10.25.0.2 (pfSense uses the first).
Load Balancing
Anthos Kubernetes clusters require some form of load balancing across nodes for the control planes and the ingress service. Several load-balancing implementations are available for Anthos clusters.
Here I will use ManualLB for cluster load balancing with the HAProxy pfSense package.
Here is the expected resulting frontend on the pfSense side:
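In raw haproxy.cfg terms, one of these frontend/backend pairs might look like the sketch below. The VIP, node IPs, and nodePort are illustrative assumptions matching this article's address plan; the pfSense HAProxy package generates the equivalent from its GUI:

```
# User cluster control-plane VIP (illustrative)
frontend user-cluster-cp
    bind 10.26.0.50:443
    mode tcp
    default_backend user-cluster-cp-nodes

backend user-cluster-cp-nodes
    mode tcp
    balance roundrobin
    # With ManualLB, the control plane is exposed on a nodePort
    server admin-node-1 10.25.0.10:30562 check
    server admin-node-2 10.25.0.11:30562 check
```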
ESXi and vCenter
If you have not already done so, install VMware ESXi 7.0+ on your host(s). The ESXi web interface should now be reachable on the main subnet from your workstation.
Next, we will deploy an embedded vCenter Server Appliance on the ESXi.
You can find the deployment template here:
mkdir -p ./VMware
sudo mount -o loop VMware-VCSA-all-7.0.3-20150588.iso ./VMware
cd ./VMware/vcsa-cli-installer
cat ./templates/install/embedded_vCSA_on_ESXi.json
Copy the file and fill in the relevant configuration & credentials:
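As an illustration, a minimal set of fields to fill might look like the following. All hostnames, IPs, and password placeholders here are assumptions based on this article's layout; check the template shipped with your VCSA ISO for the authoritative schema:

```json
{
  "new_vcsa": {
    "esxi": {
      "hostname": "10.0.0.10",
      "username": "root",
      "password": "ESXI_PASSWORD",
      "deployment_network": "VM Network",
      "datastore": "datastore1"
    },
    "appliance": {
      "thin_disk_mode": true,
      "deployment_option": "tiny",
      "name": "vCenter"
    },
    "network": {
      "ip_family": "ipv4",
      "mode": "static",
      "system_name": "vCenter.local",
      "ip": "10.0.0.11",
      "prefix": "24",
      "gateway": "10.0.0.1",
      "dns_servers": ["10.0.0.20"]
    },
    "os": {
      "password": "ROOT_PASSWORD",
      "ssh_enable": true
    },
    "sso": {
      "password": "SSO_PASSWORD",
      "domain_name": "vsphere.local"
    }
  },
  "ceip": {
    "settings": {
      "ceip_enabled": false
    }
  }
}
```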
cd ./VMware/vcsa-cli-installer/lin64
./vcsa-deploy install embedded_vCSA_on_ESXi.json --accept-eula
At this point, you need to set up:
- ESXi: port groups — add the two subnet VLAN IDs 25 & 26
- vCenter: add your ESXi hosts
- vCenter: create a Datacenter & Cluster & Resource Pool
Google DNS
pfSense cannot serve different DNS Resolver configurations per interface/subnet, so we have to set up a custom resolver elsewhere.
I deployed a simple Ubuntu VM with unbound on the main subnet; it will be assigned IP 10.0.0.20 (as per our firewall rules).
Here is the minimal config:
local-zone: "googleapis.com" redirect
local-data: "googleapis.com A 199.36.153.8"
local-data: "googleapis.com A 199.36.153.9"
local-data: "googleapis.com A 199.36.153.10"
local-data: "googleapis.com A 199.36.153.11"

local-zone: "gcr.io" redirect
local-data: "gcr.io A 199.36.153.8"
local-data: "gcr.io A 199.36.153.9"
local-data: "gcr.io A 199.36.153.10"
local-data: "gcr.io A 199.36.153.11"

local-zone: "accounts.google.com" redirect
local-data: "accounts.google.com A 199.36.153.8"
local-data: "accounts.google.com A 199.36.153.9"
local-data: "accounts.google.com A 199.36.153.10"
local-data: "accounts.google.com A 199.36.153.11"
To check that your DNS server configuration is valid, here is the expected dig result from your workstation (the order of the returned IPs may vary):
$ dig @10.0.0.20 +short googleapis.com
199.36.153.8
199.36.153.9
199.36.153.11
199.36.153.10
Anthos
At this point, your local network and VMware environment should be ready to begin the Anthos installation.
Admin Workstation
Before creating the admin workstation, enable the required APIs on your GCP account and prepare the service accounts and IAM roles for Anthos. You can use the Terraform code and configuration files in the repository.
Your workstation needs direct access to the following:
- VMware vCenter Server:
vCenter.local:443
- VMware ESXi host
- Anthos admin WS subnet (VM Net 25)
- Google APIs
Then create the VM:
./gkeadm create admin-workstation
Using config file "admin-ws-config.yaml"...
Running preflight validations...
- Validation Category: Tools
    - [SUCCESS] gcloud
    - [SUCCESS] ssh
    - [SUCCESS] ssh-keygen
    - [SUCCESS] scp
- Validation Category: Config Check
    - [SUCCESS] Config
- Validation Category: SSH Key
    - [SUCCESS] SSH key path
- Validation Category: Internet Access
    - [SUCCESS] Internet access to required domains
- Validation Category: GCP Access
    - [SUCCESS] Read access to GKE on-prem GCS bucket
- Validation Category: vCenter
    - [SUCCESS] Credentials
    - [SUCCESS] vCenter Version
    - [SUCCESS] ESXi Version
    - [SUCCESS] Datacenter
    - [SUCCESS] Datastore
    - [SUCCESS] Resource Pool
    - [SUCCESS] Folder
    - [SUCCESS] Network
    - [SUCCESS] Datadisk
All validation results were SUCCESS.
Reusing VM template "gke-on-prem-admin-appliance-vsphere-1.10.3-gke.49" that already exists in vSphere.
Creating admin workstation VM "gke-admin-workstation"... DONE
Waiting for admin workstation VM "gke-admin-workstation" to be assigned an IP.... DONE

******************************************
Admin workstation VM successfully created:
- Name: gke-admin-workstation
- IP: 10.25.0.2
******************************************
You should now be able to SSH onto the Anthos admin workstation and proceed with the cluster setup.
Anthos Clusters
We are now at the step Create an admin cluster of the Anthos documentation. You can use the configuration files available in the repository.
Do not forget to update the config files with your settings before using them; they are provided in the repository with IPs and ports as shown in this article.
Admin cluster:
gkectl check-config --config admin-cluster.yaml
gkectl prepare --config admin-cluster.yaml
gkectl create admin --config admin-cluster.yaml
User cluster:
gkectl check-config --kubeconfig kubeconfig --config user-cluster.yaml
gkectl create cluster --kubeconfig kubeconfig --config user-cluster.yaml
After that, you can register the two Anthos clusters in the Google Cloud console:
You can use the kubeconfigs generated on the Anthos admin workstation during cluster creation to troubleshoot the clusters.
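For example, a quick health check from the admin workstation might look like this (the user cluster kubeconfig file name is an assumption matching the cluster name used later in this article):

```shell
# Admin cluster: all nodes should report Ready
kubectl --kubeconfig kubeconfig get nodes

# User cluster: nodes and system pods
kubectl --kubeconfig gke-apigee-user-cluster1-kubeconfig get nodes
kubectl --kubeconfig gke-apigee-user-cluster1-kubeconfig get pods --all-namespaces
```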
Apigee Anthos Service Mesh (ASM)
The next step before installing the Apigee runtime is ASM. Since the Anthos admin workstation only has limited access to Google APIs through the VPN tunnel, we will build an offline bundle.
Again, you can find the instructions and files in the repository.
On a computer with internet access (Linux), install asmcli following the instructions in Install required tools, then create the offline package:
./asmcli build-offline-package -D asm-files -v
tar -czvf asm-offline.tar.gz asm-files
Upload the archive to the Anthos admin workstation:
scp asm-offline.tar.gz ubuntu@10.25.0.2:~/
Back on the admin workstation, authenticate with a privileged service account on the gcloud CLI:
gcloud config configurations create install
gcloud auth activate-service-account vmware-install@_YOUR_GCP_PROJECT_ID_.iam.gserviceaccount.com \
--key-file _YOUR_GCP_PROJECT_SA_KEY_.json \
--project=_YOUR_GCP_PROJECT_ID_
Unarchive the offline bundle and install ASM with the provided file:
curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.13 > asmcli
chmod +x asmcli
tar -xzvf asm-offline.tar.gz
./asmcli install \
--kubeconfig /home/ubuntu/gke-apigee-user-cluster1-kubeconfig \
--fleet_id apigee-hybrid-vmware \
--output_dir asm-files \
--platform multicloud \
--enable_all --ca mesh_ca \
--custom_overlay overlay.yaml \
--option legacy-default-ingressgateway \
--offline -v
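Once the install completes, you can verify that the mesh came up by inspecting the istio-system namespace; all pods there should reach Running state:

```shell
# Check the ASM control plane and ingress pods in the user cluster
kubectl --kubeconfig /home/ubuntu/gke-apigee-user-cluster1-kubeconfig \
  get pods -n istio-system
```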
Apigee runtime
At this point, you need to set up the Apigee management plane inside your GCP project. Follow Part 1: Project and org setup guide.
Let’s now jump to the Hybrid runtime setup.
Note: Cert-manager should already be installed as part of Anthos bootstrap.
You can use the repository as a boilerplate and follow all the steps in the GCP docs.
If everything goes well, you will have a working Apigee runtime inside your Anthos user cluster:
Time to test!
Deploy a hello world API proxy in the Apigee cloud console.
Now from your workstation or any other device on your network, you can request the Apigee ingress on the VIP 10.26.0.51:
$ curl -k https://YOUR_DNS_DOMAIN/helloworld --resolve "YOUR_DNS_DOMAIN:443:10.26.0.51"
Hello world
If you followed this article up to this point and have a working setup, congratulations! If you have any questions, feel free to reach out to me.
As a next step, you could expose the Apigee ingress on the public WAN IP of your pfSense, using some more HAProxy configuration.
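In raw haproxy.cfg terms, that extra configuration could be sketched as a simple TCP passthrough from the WAN to the Apigee ingress VIP (the frontend/backend names and WAN_IP placeholder are illustrative; the pfSense package builds the equivalent from its GUI):

```
frontend apigee-public
    bind WAN_IP:443
    mode tcp
    default_backend apigee-ingress

backend apigee-ingress
    mode tcp
    # Forward TLS traffic as-is to the Apigee ingress VIP
    server apigee-vip 10.26.0.51:443 check
```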