Completely Private GKE Clusters with No Internet Connectivity

Andrey
Google Cloud - Community
May 9, 2019

There are several reasons to isolate your Google Kubernetes Engine (GKE) clusters from internet access, the primary one being security. For many financial, government, and similar institutions this is a must to run their workloads on Google Cloud Platform (GCP).

This post will walk through the steps needed to stand up an isolated GKE cluster in a network that blocks any access to the internet.

TL;DR

  1. Create a VPC and subnets for the GKE cluster with private Google access enabled
  2. Lock down the VPC with firewall rules: block egress to 0.0.0.0/0, allow ingress from the Google health check ranges, and allow egress to the Google health check ranges, the restricted Google APIs range, and the GKE private master range
  3. Remove the default route automatically created in the VPC (0.0.0.0/0 with the default internet gateway as next hop), then create a route to the Google restricted APIs range (199.36.153.4/30) through the default internet gateway
  4. Make the following Cloud DNS changes and attach the zones to the VPC:
  • Create a private DNS zone for googleapis.com with a CNAME record pointing *.googleapis.com to restricted.googleapis.com and A records pointing restricted.googleapis.com to the 199.36.153.4/30 addresses
  • Create a private DNS zone for gcr.io with a CNAME record pointing *.gcr.io to gcr.io and A records pointing the bare gcr.io name to the 199.36.153.4/30 addresses
  5. Create the GKE cluster with private nodes and a private master

The Environment

Step by Step Deployment

(Terraform code to deploy this entire environment is available at the end of the post.)

1. Create the VPC and Subnets

Before we create any clusters, there needs to be a VPC and subnets in the environment to use for the cluster. Three IP ranges go into each cluster:

  • Node Range: The GKE worker nodes live on this subnet
  • Cluster Range: GKE takes this range and divides it among the nodes. By default each node gets a /24 from this range, but that can be customized with flexible Pod CIDR.
  • Services Range: This is used for cluster IP services running within the cluster

The cluster and services ranges are configured as secondary ranges on the node subnet. The secondary ranges are used to configure alias IPs for the cluster, which gives a tighter SDN integration. For these reasons, alias IP (VPC-native) clusters are recommended over the other deployment methods.

gcloud compute networks create gke-no-internet-network \
--subnet-mode custom \
--bgp-routing-mode global
gcloud compute networks subnets create priv-cluster-01 \
--network gke-no-internet-network \
--range 10.10.10.0/24 \
--region us-central1 --enable-flow-logs \
--enable-private-ip-google-access \
--secondary-range services=10.10.11.0/24,pods=10.1.0.0/16
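
As an optional sanity check, you can confirm that private Google access took effect on the subnet; privateIpGoogleAccess is the field name on the subnetwork resource, and this should print True:

gcloud compute networks subnets describe priv-cluster-01 \
--region us-central1 \
--format "value(privateIpGoogleAccess)"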

2. Create Firewall Rules

With the VPC created, let's lock it down to prevent any access to the internet. Since all egress traffic is allowed by default, we need to create a rule blocking it.

gcloud compute firewall-rules create deny-egress \
--action DENY \
--rules all \
--destination-ranges 0.0.0.0/0 \
--direction EGRESS \
--priority 1100 \
--network gke-no-internet-network

Next, let's allow ingress and egress to the Google IP ranges that perform health checks. These need to be allowed for GCP to recognize that the nodes have spun up successfully, and for any future load balancer or ingress services.

gcloud compute firewall-rules create allow-healthcheck-ingress \
--action ALLOW \
--rules tcp:80,tcp:443 \
--source-ranges 130.211.0.0/22,35.191.0.0/16 \
--direction INGRESS \
--network gke-no-internet-network
gcloud compute firewall-rules create allow-healthcheck-egress \
--action ALLOW \
--rules tcp:80,tcp:443 \
--destination-ranges 130.211.0.0/22,35.191.0.0/16 \
--direction EGRESS \
--network gke-no-internet-network

We’ll also need to allow traffic to the restricted Google APIs VIP. This is used to communicate with Google services and is required to reach the GKE API.

gcloud compute firewall-rules create allow-google-apis-egress \
--action ALLOW \
--rules all \
--destination-ranges 199.36.153.4/30 \
--direction EGRESS \
--network gke-no-internet-network

This next egress rule allows traffic from the GKE worker nodes to the master that will be created in step #5. We have to create this rule before creating the cluster; otherwise the worker nodes will never be able to reach the master. Only an egress rule is needed here: the ingress rule is created automatically by GCP during cluster creation.

gcloud compute firewall-rules create allow-master-node-egress \
--action ALLOW \
--rules tcp:443,tcp:10250 \
--destination-ranges 172.16.0.0/28 \
--direction EGRESS \
--network gke-no-internet-network

Please Note: This VPC currently blocks all egress to 0.0.0.0/0. Any traffic going to internal IP space (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) will need an additional egress rule, like the sketch below, to allow the cluster to reach those IPs.
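
As a sketch, a rule like the following would open egress to 10.0.0.0/8; adjust the destination ranges to match your internal networks, and keep the priority numerically lower than the deny rule's 1100 so it takes precedence:

gcloud compute firewall-rules create allow-internal-egress \
--action ALLOW \
--rules all \
--destination-ranges 10.0.0.0/8 \
--direction EGRESS \
--priority 1000 \
--network gke-no-internet-network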

3. Remove Default Route and Create Route to Google APIs

When we created the VPC, GCP automatically created a default route to the internet (0.0.0.0/0) with default-internet-gateway as the next hop. We need to remove this route to clear any path to the internet. This is easiest through the GUI, but it can also be done with gcloud, as shown below.
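
If you'd rather stay in the CLI, you can look up the auto-generated route name and delete it; the filter below is a sketch, and <route-name> is whatever name appears in the list output:

gcloud compute routes list \
--filter "network:gke-no-internet-network AND destRange=0.0.0.0/0"
gcloud compute routes delete <route-name>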

Next let’s create a route in the VPC to send traffic to the restricted Google APIs subnet. We have to set the next hop as default-internet-gateway, as per the requirement for accessing private Google APIs.

Google doesn’t actually send the traffic to the internet, even though it says “internet-gateway”. The traffic is routed to take a private internal path to the Google APIs thanks to the private Google access feature that we enabled on the subnet.

gcloud compute routes create google-apis \
--destination-range 199.36.153.4/30 \
--next-hop-gateway default-internet-gateway \
--network gke-no-internet-network

4. Create DNS zone for Google APIs and GCR.io

The next step is to customize DNS in our VPC for googleapis.com. Since googleapis.com resolves to public IPs, we have to use restricted.googleapis.com instead. This hostname resolves to a specific range, 199.36.153.4/30, that is reachable from within the GCP network (the same range we created a route to in the previous step). Our GKE worker nodes will need to use restricted.googleapis.com instead of googleapis.com to launch successfully.

We can enforce this with the private Cloud DNS feature in GCP. First we create a private zone for googleapis.com in Cloud DNS. After the zone is created, add a CNAME record for *.googleapis.com that points to restricted.googleapis.com. One more record is needed to make this work: an A record for restricted.googleapis.com pointing to the restricted VIP addresses.

gcloud dns managed-zones create google-apis \
--description "private zone for Google APIs" \
--dns-name googleapis.com \
--visibility private \
--networks gke-no-internet-network
gcloud dns record-sets transaction start --zone google-apis
gcloud dns record-sets transaction add restricted.googleapis.com. \
--name "*.googleapis.com." \
--ttl 300 \
--type CNAME \
--zone google-apis
gcloud dns record-sets transaction add "199.36.153.4" \
"199.36.153.5" "199.36.153.6" "199.36.153.7" \
--name restricted.googleapis.com. \
--ttl 300 \
--type A \
--zone google-apis
gcloud dns record-sets transaction execute --zone google-apis

We need to make similar changes for the gcr.io domain, sending its traffic to the restricted Google APIs range so the cluster can reach Container Registry. Without these changes, the GKE cluster won't stand up successfully, since it won't be able to pull down the necessary containers.

gcloud dns managed-zones create gcr-io \
--description "private zone for GCR.io" \
--dns-name gcr.io \
--visibility private \
--networks gke-no-internet-network
gcloud dns record-sets transaction start --zone gcr-io
gcloud dns record-sets transaction add gcr.io. \
--name "*.gcr.io." \
--ttl 300 \
--type CNAME \
--zone gcr-io
gcloud dns record-sets transaction add "199.36.153.4" \
"199.36.153.5" "199.36.153.6" "199.36.153.7" \
--name gcr.io. \
--ttl 300 \
--type A \
--zone gcr-io
gcloud dns record-sets transaction execute --zone gcr-io
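
To verify both private zones, resolve the names from a VM inside the VPC; each should return addresses from 199.36.153.4/30 rather than public IPs (this assumes dig is available on the VM, and storage.googleapis.com is just an example name covered by the wildcard):

dig +short restricted.googleapis.com
dig +short storage.googleapis.com   # follows the CNAME to restricted.googleapis.com
dig +short gcr.io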

5. Create GKE Cluster

Now that we have a network and subnets, let's create the private GKE cluster. There are many knobs to tweak when creating a cluster, but only a select few matter for giving the worker and master nodes private IPs only. Ensure the cluster is created with the following settings:

  • Enable VPC-native
  • Enable private cluster nodes
  • Disable accessing the master using its external IP address

gcloud beta container clusters create private-gke-cluster \
--zone "us-central1-a" \
--enable-private-nodes \
--enable-private-endpoint \
--master-ipv4-cidr "172.16.0.0/28" \
--enable-ip-alias \
--network "projects/<project_id>/global/networks/gke-no-internet-network" \
--subnetwork "projects/<project_id>/regions/us-central1/subnetworks/priv-cluster-01" \
--cluster-secondary-range-name "pods" \
--services-secondary-range-name "services" \
--enable-master-authorized-networks
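
Because the master has no public endpoint, kubectl must run from a host that can reach the private endpoint, such as a VM inside this VPC (or a network listed in master authorized networks). From such a host, a quick check might look like:

gcloud container clusters get-credentials private-gke-cluster --zone us-central1-a
kubectl get nodes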

Private Cluster Deployed!

If everything goes as planned, the cluster should spin up with all the workloads and services in a healthy state. If you have any issues with this setup, please reach out by posting a comment below or via Twitter and I'll try to help! @AndreyK_

For more of my readings, check out my blog at https://andreynocap.com

Terraform

If you'd like to skip the steps above and get everything spun up right away, there are Terraform config files available in the following GitHub repo:

https://github.com/andreyk-code/no-inet-gke-cluster.git
