All About GKE – Part One | Building Block | Discovery | Planning

Omkar Nadkarni
Niveus Solutions
6 min read · May 16, 2024


GKE (Google Kubernetes Engine) is one of the most popular services in GCP. It is feature-rich, with a large ecosystem of plugins and external integrations.

Kubernetes in general is a container orchestration platform for microservices and has been widely used for many years. GKE is GCP’s Kubernetes platform offering, available in two modes:

GKE Standard :- you deploy and manage the worker nodes yourself.

GKE Autopilot :- you deploy and manage only the pods; the nodes are managed by GCP.

In both modes, network policies and Anthos Service Mesh can be used to ensure security policies are defined and enforced.

Building Blocks of GKE

Reference for image :- https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture

Control Plane

The control plane is managed by GCP. We have the following options:

  • The GKE control plane (API server) can be exposed via a public or a private endpoint; a private endpoint is recommended.
  • High availability can be enabled for the control plane.
  • Application-layer secret encryption is recommended, so that the etcd database is encrypted with a key from our KMS project.
  • Maintenance windows for the control plane can be scheduled.

Data Plane

The data plane is deployed in the customer’s VPC. It consists of worker nodes, which are created when we define and create node pools.

Node pool :- a node pool is a group of GKE worker nodes with the same VM size. A pool can span one or more zones in a region, and its nodes can carry labels for scheduling pods onto specific nodes.

Cluster autoscaling :- the cluster autoscaler increases or reduces the number of nodes in a node pool based on pod resource consumption.

Node auto-provisioning :- node auto-provisioning creates new node pools automatically based on CPU and memory requirements.

Pod autoscaling :- with HPA (Horizontal Pod Autoscaler) and VPA (Vertical Pod Autoscaler), pods can scale horizontally or vertically depending on resource requirements.
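As a quick illustration of the horizontal case, the HPA replica count follows the standard Kubernetes scaling rule (a minimal sketch; the replica and CPU numbers are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Kubernetes HPA rule: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 replicas averaging 90% CPU against a 60% utilization target -> 6 replicas
print(hpa_desired_replicas(4, 90, 60))
```

This is why the HPA maximum replica count matters for capacity planning: the peak replica count, not the steady-state count, drives how many pod IPs and nodes you need.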

Microservices

  • Applications can be stateful or stateless.
  • Applications should be containerized.

Networking CIDR Planning :-

GKE uses VPC-native networking, which means its IP ranges must not overlap with those of other resources in the same VPC or in any connected network.

For networking, we need to plan four separate subnets/ranges.

Node subnet :- this is the primary subnet, which provides IP addresses to the nodes and to internal load balancers.

Pod subnet :- this secondary range provides IP addresses to the pods running inside the GKE cluster. It is divided into equal per-node blocks, from which the pods on each node get their IPs.

Service subnet :- this secondary range provides IP addresses to the Services running on the GKE cluster.

Control plane subnet :- this range is required for peering with the GKE control plane.

One savior while planning CIDR ranges for a GKE cluster is the GKE calculator.
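To illustrate why the pod range dominates the sizing, here is a minimal sketch using Python's ipaddress module (the /14 range is an example, and the default maximum of 110 pods per node is assumed, for which GKE reserves a /24 per node):

```python
import ipaddress

# Assumed example range -- replace with your own planned CIDR.
pod_range = ipaddress.ip_network("10.4.0.0/14")  # secondary range for pods

# With the default max of 110 pods per node, GKE reserves a /24
# (256 addresses) per node, so the pod range supports as many
# nodes as it contains /24 blocks.
per_node_blocks = list(pod_range.subnets(new_prefix=24))
print(len(per_node_blocks), per_node_blocks[0])
```

A /14 pod range therefore supports up to 1024 nodes, with the first node drawing its pod IPs from 10.4.0.0/24.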

While working with numerous clients, we have found network planning to be crucial. We normally ask the questions below; a few critical ones decide the design :-

  • We need a super-range CIDR to carve the subnets from.
  • We need to know how many pods should be running on each node.
  • This count must include the system pods that Kubernetes itself runs, such as CNI pods, Fluentd, etc.
  • It also depends on the node size and resource requirements: the larger the GKE node, the more pods it can run.
  • The HPA maximum replica counts needed to absorb peak traffic.
  • Add 25% to 35% headroom for future growth of microservices.
  • The number of Services to be created.

Then we use the GKE calculator and decide the IP ranges.
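The questions above can be turned into a quick back-of-the-envelope estimate before reaching for the calculator (a sketch; all input numbers are illustrative assumptions, and the default 110 pods per node implies one /24 pod block per node):

```python
import math

# Illustrative discovery inputs -- replace with real numbers.
peak_app_pods = 900       # sum of HPA maximum replicas across microservices
growth_factor = 1.30      # 30% headroom for future growth
max_pods_per_node = 110   # GKE default; each node then gets a /24 pod block

required_pods = math.ceil(peak_app_pods * growth_factor)
nodes_needed = math.ceil(required_pods / max_pods_per_node)
pod_ips_needed = nodes_needed * 256  # one /24 (256 IPs) per node

# Smallest CIDR prefix whose block covers the required pod IPs.
prefix = 32 - math.ceil(math.log2(pod_ips_needed))
print(required_pods, nodes_needed, f"/{prefix}")
```

With these inputs, 900 peak pods plus 30% growth require 1170 pod IPs, which fit on 11 nodes and call for at least a /20 pod range.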

Networking Ingress and Egress

For ingress,

  1. GKE offers container-native load balancing through GKE Ingress, which exposes Services via an application load balancer (L7). This is done through network endpoint groups (NEGs).
  2. A GKE Ingress has to be defined with its routing paths; additional features such as TLS certificates, backend timeouts, Cloud CDN, etc. can be enabled.
  3. GKE can also work with NGINX or other ingress controllers.

For egress,

  • With a VPC-native cluster, pods can reach out of the cluster directly using their pod IPs. If traffic needs to leave via node IPs instead, this can be achieved with the SNAT (IP masquerade) feature.
  • For internet access, the cluster can leverage an existing internet route, either via Cloud NAT or via an NGFW.

Cloud DNS integration :- GKE can use kube-dns, or it can integrate with Cloud DNS. Integration with Cloud DNS helps with scaling and removes the overhead of running cluster-hosted DNS, since it is managed by Google.

Storage :- GKE supports several CSI storage drivers. Depending on application requirements, we can use :-

  • Persistent Disk.
  • NFS (Filestore).

Monitoring and logging :-

For logging, all logs written to stdout by the containers are picked up by the logging agent (Fluentd/Fluent Bit) and sent to Cloud Logging.

For monitoring, GCP has extensive dashboards that cover GKE node health, pod health, uptime, etc.

Google Cloud Managed Service for Prometheus can be used to collect Prometheus metrics, which can then be consumed via Grafana.

Security :-

  • Application-layer secret encryption encrypts the etcd database with customer-managed KMS keys.
  • GKE private nodes ensure the cluster is only reachable via a jump host, after IAM access is granted.
  • The GKE control plane supports authorized networks, which ensure only allowed IP ranges can connect to it.
  • GKE nodes can have the Shielded VM feature enabled.
  • GKE nodes can be enabled as Confidential VMs on AMD processors.
  • Kubernetes RBAC helps implement role-based access control within the cluster.
  • Workload Identity maps a Kubernetes service account to an IAM service account, enabling applications running inside pods to access Google services using that IAM service account.
  • Binary Authorization enables operators to ensure only authorized images are deployed.
  • GKE security posture helps find common misconfigurations and workload vulnerabilities.

Backup and Maintenance :-

Architecture Inputs :-

As architects, we try to ensure that GKE as a platform provides all the necessary framework and foundations for running microservices in the most reliable, scalable, and highly available manner, with security and SRE best practices baked in.

In the endeavor to provide the suite that best fits the client’s needs, we deep-dive into the items below with the client. We also try to understand the organization’s maturity in DevOps and SRE practices and build a custom solution for them.

Discovery Sheet :-

Below is a typical discovery sheet to understand the client’s requirements and map them to GKE.

1. GKE Type

  • GKE Standard or Autopilot
  • Compliance requirements

2. Microservices

  • Microservices app code
  • Logging sent to stdout or to a file
  • Stateful or stateless pods
  • ConfigMap
  • GKE Secret
  • Routing URL and path
  • Certificate management
  • Health check URL

3. Capacity Planning for Microservices

  • Number of microservices
  • Resource requests and limits
  • HPA

4. Node Capacity (This will depend on the microservices planning as well)

  • Number of node pools
  • Sizing – machine types
  • Min and max nodes
  • Spot instances
  • Node taints and tolerations

5. Network

  • CIDR planning
  • Ingress: GKE Ingress, NGINX, or another controller
  • Routing paths and port numbers
  • Egress
  • CNI driver: Calico, Dataplane V2 (Cilium), or others

6. Security

  • GKE App Secret Encryption
  • CMEK or Google-managed keys
  • Network Policy
  • RBAC

7. Storage Requirement

  • Persistent disk or Filestore
  • Storage class

8. Workload Planning

  • Namespaces
  • Replicas
  • Health check URLs
  • Single or multiple containers per pod
  • ConfigMap
  • GKE Secrets or GSM (Google Secret Manager) integration
  • Image repository: Artifact Registry, or the client’s preferred choice
  • DaemonSets or agents
  • Deployment via Kubernetes manifests, Helm charts, or Kustomize

9. Monitoring and Backup

  • Monitoring with Prometheus, Elasticsearch, Dynatrace, or others?
  • Backup with Backup for GKE, Velero, or other open-source software

10. Alerting

  • Alerting with Google Cloud Monitoring, Alertmanager, or the ELK stack for major events and incidents.

Conclusion

GKE has a lot of features; we have tried to cover the basics of GKE, the discovery and planning required, and the best practices that ensure it works well.

Hope this was helpful.

Credits :- Thank you so much Ajay Anand for all the insight and review of the blog.



Omkar Nadkarni is a principal cloud architect passionate about technology and its impact on business. He has a skill set spanning GCP, Azure, AWS, DevOps, and infrastructure.