IP address management strategy — a crucial aspect of running GKE

Ashmita Kapoor
Google Cloud - Community
5 min read · Jul 12, 2022

Is your organisation spinning up a new GKE cluster for every application or environment?
Is your application team asking you for /18, /16, or /14 subnet CIDRs for Pods?
Do the Pods communicate with your on-premises applications?

Then you will soon run into IP exhaustion issues. Don’t think so? Let me tell you how!

Any Kubernetes networking model relies heavily on IP addresses.
Services, Pods, Containers, and Nodes communicate using IP addresses and ports.

When you spin up a GKE cluster in a VPC, the following IP ranges are needed:

  • Primary Subnet Range for your Worker Nodes
  • Secondary Range (Alias IP range) for Pods and Services
  • Control Plane CIDR range — for the control plane (master) IP addresses in a private cluster
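The primary and secondary ranges live on the same VPC subnet. As a minimal sketch, such a subnet can be created with gcloud in one step (the VPC name, subnet name, range names, and CIDRs below are purely illustrative):

gcloud compute networks subnets create gke-subnet \
    --network=my-vpc \
    --region=asia-southeast1 \
    --range=10.10.0.0/24 \
    --secondary-range=pods=10.64.0.0/18,services=10.68.0.0/20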

Let us take an example to understand this better:

Application requirement:

  • Number of microservices : 50
  • Each microservice requires 0.5 CPU and 1 GB RAM per Pod, and runs a minimum of 3 to a maximum of 10 Pods

How do we size the GKE cluster?

We will go ahead with a regional Standard GKE cluster, as it provides high availability and protection from zonal failures. Why is a regional cluster better? Read more in the GKE documentation.

The region we want to deploy to is ‘asia-southeast1’.

Total Resources Required:

  • Min CPU: 0.5 x 50 x 3 = 75 ; Min Memory: 1 x 50 x 3 = 150 GB
  • Max CPU: 0.5 x 50 x 10 = 250 ; Max Memory: 1 x 50 x 10 = 500 GB

The minimum cluster size will be:

  • Each Node Size : 8 vCPU 16 GB RAM
  • Number of nodes : 12
  • Node Type : N2
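One way to arrive at 12 nodes of this size (a rough sizing sketch that ignores per-node allocatable reservations and kube-system overhead): 75 vCPU ÷ 8 vCPU per node ≈ 10 nodes, and 150 GB ÷ 16 GB per node ≈ 10 nodes; adding headroom for system pods and rolling upgrades brings the starting point to 12 nodes, i.e. 4 per zone across the 3 zones of a regional cluster.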

We will enable node auto-provisioning (NAP) to optimise node autoscaling as load increases (for more details on NAP, refer to the documentation).
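As a rough sketch, NAP can be enabled on an existing cluster with cluster-wide resource limits that approximate the maximum totals above (the cluster name is hypothetical and the limits are illustrative):

gcloud container clusters update my-regional-cluster \
    --region=asia-southeast1 \
    --enable-autoprovisioning \
    --min-cpu=75 --max-cpu=250 \
    --min-memory=150 --max-memory=500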

The approx. maximum cluster size might be:

  • Each Node Size: 6 vCPU 12 GB RAM
  • Number of nodes: 24
  • Node Type: N2

Therefore, the IP ranges required for the GKE cluster will be as follows:

  • Primary Subnet Range for Worker Nodes: <RFC IP address range>/24
  • Secondary Range (Alias IP range) for Pods: <RFC IP address range>/18
  • Secondary Range (Alias IP range) for Services: <RFC IP address range>/20
  • Control Plane CIDR range: <RFC IP address range>/28

(I will soon write a detailed blog explaining why the above IP ranges were chosen.)
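For completeness, a minimal sketch of how these ranges might be wired into a private regional cluster with gcloud (the cluster, VPC, subnet, and secondary-range names are the illustrative ones from the subnet example earlier; substitute your own control plane CIDR):

gcloud container clusters create my-regional-cluster \
    --region=asia-southeast1 \
    --network=my-vpc \
    --subnetwork=gke-subnet \
    --enable-ip-alias \
    --cluster-secondary-range-name=pods \
    --services-secondary-range-name=services \
    --enable-private-nodes \
    --master-ipv4-cidr=<RFC IP address range>/28 \
    --machine-type=n2-custom-8-16384 \
    --num-nodes=4

Note that --num-nodes is per zone, so 4 nodes in each of the 3 zones gives the 12-node starting size, and n2-custom-8-16384 is an 8 vCPU / 16 GB N2 custom shape.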

The problem:

In Google Cloud, the Alias IP range is routable, i.e. you can reach a Pod directly using its IP address from anywhere within the VPC (IP addresses allocated to Kubernetes Services of type ClusterIP remain routable only inside the cluster). So if Pods must communicate with an on-premises network, the Pod ranges must not overlap with the on-premises ranges.
Therefore, the network team needs to plan the CIDRs before allocation and must keep track of used and unused CIDR blocks across the organisation.

This is where the problem arises: reserving a /14, /16, or /18 CIDR range out of your datacenter address plan for every cluster is not optimal.

The solution:

To solve the problem of running out of IP ranges we recommend the following:

RFC 1918 vs non-RFC 1918:

  • Choose your primary subnet range for the worker nodes and the control plane CIDR range from the RFC 1918 space
  • Choose your secondary ranges for Pods and Services from non-RFC 1918 space

Since both ranges (RFC 1918 and non-RFC 1918) can still be advertised by Cloud Router to your on-premises datacenter, we use the IP masquerade agent to SNAT the Pod IP range behind the Node IPs. The agent rewrites the source IP address of outbound packets sent from Pods to the Node IP address (SNAT), so on-premises systems only ever see the RFC 1918 Node addresses.

To enable this in your GKE cluster perform the following steps:

Check if the ip-masq-agent is already installed in the cluster:

kubectl get daemonsets/ip-masq-agent -n kube-system

If the ip-masq-agent DaemonSet exists, then the output is similar to the following:

NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ip-masq-agent   3         3         3       3            3           <none>          13d

If the ip-masq-agent DaemonSet does not exist, then the output is similar to the following:

Error from server (NotFound): daemonsets.apps "ip-masq-agent" not found

Create a file ipmasq-cm.yaml with the below content:

apiVersion: v1
data:
  config: |
    nonMasqueradeCIDRs:
    - CIDR_1
    - CIDR_2
    masqLinkLocal: false
    resyncInterval: SYNC_INTERVAL
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system

CIDR_1 and CIDR_2 are the destination CIDRs for which you do not want SNAT to happen. For every other destination, the Pod IP is replaced with the Node IP as the source on egress.

SYNC_INTERVAL is the interval at which the agent re-reads the configuration, for example 60s.
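For example, assuming the Pod and Services secondary ranges were taken from the non-RFC 1918 100.64.0.0/10 space as recommended above (the specific CIDRs are illustrative) and a 60-second resync, the ConfigMap could look like:

apiVersion: v1
data:
  config: |
    nonMasqueradeCIDRs:
    - 100.64.0.0/18
    - 100.64.64.0/20
    masqLinkLocal: false
    resyncInterval: 60s
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system

With this configuration, Pod-to-Pod and Pod-to-Service traffic keeps the Pod IP as the source, while traffic to every other destination (including on-premises RFC 1918 ranges) is SNATed to the Node IP.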

Deploy the manifest in your cluster:

kubectl apply -f ipmasq-cm.yaml
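You can confirm the ConfigMap is in place with:

kubectl get configmap ip-masq-agent -n kube-system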

Create the ip-masq-agent DaemonSet manifest ‘ipmasq-agent.yaml’ and deploy it in the cluster:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ip-masq-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: ip-masq-agent
  template:
    metadata:
      labels:
        k8s-app: ip-masq-agent
    spec:
      hostNetwork: true
      containers:
      - name: ip-masq-agent
        image: k8s.gcr.io/networking/ip-masq-agent:v2.7.0
        args:
          # The masq-chain must be IP-MASQ
          - --masq-chain=IP-MASQ
          # To non-masquerade reserved IP ranges by default,
          # uncomment the following line.
          # - --nomasq-all-reserved-ranges
        securityContext:
          privileged: true
        volumeMounts:
          - name: config-volume
            mountPath: /etc/config
      volumes:
        - name: config-volume
          configMap:
            name: ip-masq-agent
            optional: true
            items:
              - key: config
                path: ip-masq-agent
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      - key: "CriticalAddonsOnly"
        operator: "Exists"

For more information on configuring the IP masquerade agent, refer to the GKE public documentation.

Things to keep in mind:

  1. This feature is not supported with Windows Server node pools.
  2. It does not apply to containers in Pods running with spec.hostNetwork: true.
  3. The cluster’s Pod IP address range should not match or fall within 10.0.0.0/8.
  4. For Autopilot clusters, an Egress NAT policy needs to be deployed instead (see the sketch below).
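As a rough sketch of the Autopilot equivalent (the field names are as I recall them from the GKE Egress NAT policy documentation, so treat the schema as an assumption; the policy name and CIDR are illustrative), a policy that preserves the Pod IP towards a given destination range might look like:

apiVersion: network.gke.io/v1
kind: EgressNATPolicy
metadata:
  name: no-snat-to-pod-ranges
spec:
  action: NoSNAT
  destinations:
  - cidr: 100.64.0.0/18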

Read more about GKE IP management strategies for choosing alternative network models in GKE, or feel free to reach out to me on the subject!

Please follow the Google Cloud Community for more such insightful blogs. Happy hacking!
