How We Manage Google Kubernetes Engine

Oguzhan Oyan
Published in Trendyol Tech · 4 min read · Aug 24, 2022

Hello, this is Oguzhan from Trendyol; I work as a Site Reliability Engineer. Today we will talk about how we create a private Google Kubernetes Engine cluster and connect to it from on-premise.

Motivation

If you missed the first part of the Trendyol GCP Network Transformation article, it is worth reading first, because I will mention the network setup frequently. Our main goals:

  1. Manage the Google Kubernetes Engine network from the host project (known as Shared VPC).
  2. Create a private Google Kubernetes Engine cluster.
  3. Manage Google Kubernetes Engine access permissions.
  4. Access the private cluster through Cloud VPN (from on-premise).

I am following the editorial guidelines from Google, which you can access here. Following these guidelines is essential because there will be more articles about Google Cloud in Trendyol Tech on Medium, and consistent terminology makes all of them easier to understand. To give an example:

More details can be found here.

1. Subnet for Google Kubernetes Engine From Shared VPC

As you may remember, our infrastructure so far looks like this. We need a new subnet for the private Google Kubernetes Engine cluster.

Some of our data centers do not support HA VPN, so we had to create a Classic VPN.

But this raises a few questions:

  1. If I want to scale up the total number of nodes, what is the maximum node count? Is there any limitation?
  2. What are the limits on the number of pods and services in the cluster?

For the first question, there is a table that shows how the maximum number of nodes is calculated from the subnet range:

https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips

There are also tables for the pod range and the service range on the same page, for example:

https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#cluster_sizing_secondary_range_pods

In summary, we need the following (a sketch of the subnet definition follows this list):

  • A primary IP address range for nodes [1].
  • Secondary IP address ranges for pods and services [1].
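
As an illustration, a subnet with the required secondary ranges could be created in the host project with something like the command below; the subnet name, region, and CIDR ranges are placeholders, not our production values:

gcloud compute networks subnets create gke-subnet \
    --project HOST_PROJECT_ID \
    --network shared-vpc-network \
    --region europe-west1 \
    --range 10.10.0.0/22 \
    --secondary-range pods=10.12.0.0/16,services=10.13.0.0/20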

After adjusting the subnet, we need to attach our service project to the host project (make sure you have sufficient permissions when attaching) [2].
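
A rough sketch of the attach step with gcloud; the project IDs are placeholders, and depending on your setup this can also be done from the console:

gcloud compute shared-vpc enable HOST_PROJECT_ID
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT_ID \
    --host-project HOST_PROJECT_ID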

2. Create Private Cluster

Our subnet is ready, so I can reach the nodes’ internal IPs from on-premise, and we can now create a private Google Kubernetes Engine cluster [3]. The essential points about a private cluster (a sketch of the create command follows this list):

  • There is no public master endpoint.
  • There are no external IP addresses for nodes.
  • Internet access must go through Cloud NAT.
  • The master IP range must not overlap with our on-premise ranges.
  • The cluster network comes from the host project through Shared VPC.
  • Master authorized networks must be enabled on the cluster, with the on-premise range allowed (so we can access it from on-premise).
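
A hedged sketch of what such a create command could look like with gcloud; the cluster name, region, ranges, and network/subnet paths are placeholders, and the exact flags should be checked against the private cluster documentation [3]:

gcloud container clusters create private-gke \
    --project SERVICE_PROJECT_ID \
    --region europe-west1 \
    --enable-ip-alias \
    --enable-private-nodes \
    --enable-private-endpoint \
    --master-ipv4-cidr 172.16.0.0/28 \
    --network projects/HOST_PROJECT_ID/global/networks/shared-vpc-network \
    --subnetwork projects/HOST_PROJECT_ID/regions/europe-west1/subnetworks/gke-subnet \
    --cluster-secondary-range-name pods \
    --services-secondary-range-name services \
    --enable-master-authorized-networks \
    --master-authorized-networks 192.168.0.0/16

Here --enable-private-endpoint removes the public master endpoint and --master-authorized-networks allows the on-premise range, matching the points above.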

The control plane runs in a VPC network managed by Google, separate from our own, so Google automatically creates a VPC Network Peering between the managed environment and ours. You can check this peering from the host project or with gcloud:

gcloud container clusters describe CLUSTER_NAME

On this peering, we must enable the import and export of custom routes so the control plane can be reached from on-premise [4].
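
A small sketch of how this could be done with gcloud; the peering and network names below are placeholders looked up from the describe output:

# Find the name of the automatically created peering
gcloud container clusters describe CLUSTER_NAME \
    --format="value(privateClusterConfig.peeringName)"

# Enable custom route exchange on that peering (run against the host project's VPC)
gcloud compute networks peerings update PEERING_NAME \
    --network shared-vpc-network \
    --import-custom-routes \
    --export-custom-routes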

3. Permissions

One of the critical points is granting cluster permissions to e-mail groups. We manage permissions through e-mail groups: if we granted permissions to personal e-mails instead, we would have to add a binding every time a new co-worker needed access to the cluster, and remove it from the project whenever someone left.
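
As a minimal illustration, a role binding for a group rather than an individual might look like this; the group address and role are placeholders, not the exact roles we use:

gcloud projects add-iam-policy-binding SERVICE_PROJECT_ID \
    --member="group:gke-users@example.com" \
    --role="roles/container.developer"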

4. Access to Cluster From On-premise

Now we would like to access the cluster. But not every environment has gcloud, and installing gcloud each time is a repetitive task we want to avoid. For that reason, we followed the best practice [5]: create a service account that has cluster access permission, then build a kubeconfig so that we can easily access the cluster without gcloud.
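
A hedged sketch of assembling such a kubeconfig with kubectl; the endpoint, CA file, and token handling below are placeholders, and the actual token flow follows the documentation [5]:

# Cluster endpoint and CA certificate, obtained once (e.g. from a cluster describe)
kubectl config set-cluster private-gke \
    --server=https://CONTROL_PLANE_IP \
    --certificate-authority=/path/to/cluster-ca.crt \
    --embed-certs=true

# Use the service account's access token as the bearer token
kubectl config set-credentials gke-sa \
    --token="$(cat /path/to/access-token)"

kubectl config set-context private-gke --cluster=private-gke --user=gke-sa
kubectl config use-context private-gke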

Bonus:

  1. Sometimes we may need to SSH into nodes and check network connectivity or other problems. By default, Google Kubernetes Engine uses Container-Optimized OS. On this OS you cannot execute commands the way you would on other Linux distributions; for example, running the ping command returns an error. For debugging node issues you can use toolbox, which comes pre-installed on the nodes and provides a shell in a Debian chroot-like environment [6] (see the sketch after this list).
  2. When the test team wanted to run Locust on Google Kubernetes Engine, they noticed that network throughput was not the same as on standalone VMs. Let’s clarify this problem in another article, so stay tuned!
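
A rough illustration of the toolbox flow from item 1; the node name, zone, and installed package are placeholders:

# SSH to a node over its internal IP (private nodes have no external IPs)
gcloud compute ssh NODE_NAME --zone ZONE --internal-ip

# On the node, start toolbox to get a Debian chroot-like shell
toolbox

# Inside toolbox, regular Debian tooling is available
apt-get update && apt-get install -y iputils-ping
ping 10.10.0.5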

Thanks for reading.

[1]: https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips#user-managed

[2]: https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc

[3]: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#private_cp

[4]: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#cp-on-prem-routing

[5]: https://cloud.google.com/kubernetes-engine/docs/how-to/api-server-authentication#environments-without-gcloud

[6]: https://cloud.google.com/container-optimized-os/docs/how-to/toolbox
