Trendyol GCP Network Transformation

Oguzhan Oyan
Published in Trendyol Tech
May 24, 2022

Hello, this is Oguzhan from Trendyol; I work as a Site Reliability Engineer. In this article, we will talk about our GCP network transformation.

Motivation

Here we compare our network usage in Google Cloud Platform (GCP) before and after the transformation and explain how we fixed some of the problems along the way. Our main goals were:

  1. Teams were using networks independently: each project had its own default VPC network, and we wanted to manage all VPC networks centrally from a single project.
  2. We did not want traffic to go over the public internet when teams connected to Trendyol data centers.
  3. Each VPC had its own firewall rules; as with the VPCs, we wanted to manage firewall rules centrally.
  4. Extend the GCP network as an extension of our data centers.
  5. Prepare for future serverless requirements.
  6. Lastly and most importantly, connect all teams over one network.

I will explain our solution and approach through our use cases and requirements. Before that, I would like to list the GCP products we used.

Products and Approach We Used in GCP

  • Shared VPC
  • VPC Network Peering
  • Security Command Center
  • Organization Policy
  • Cloud VPN

1. Planning For Shared VPC

Our infrastructure includes three data centers, three different providers, four fabrics, seven regions, 300 cabinets, 4,000 servers, 569 TB of memory and 219K CPUs in use, 18,058 VMs, 2,150 clusters, 3,596 microservices, 1,449 members, and one infrastructure. You can find more details here. We started working with the network team because we wanted to manage our network from our data centers, so the network team reserved subnets for GCP from the on-prem address space. But the most important step in network management is creating a dedicated project for all connections. Here is what I mean:

We need a project that manages the network infrastructure; I will call it the host project. Service projects get their subnets from the host project.

So, the essential requirement for using Shared VPC well is a single project that manages the network of the other projects in the organization. But this network infrastructure alone is not sufficient: Trendyol's footprint in Google Cloud Platform grows every day, and we needed to take further action.
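To make the host/service project split concrete, here is a minimal Python sketch that only builds the `gcloud` commands for enabling a host project and attaching service projects to it. The project IDs are hypothetical placeholders, the commands are printed rather than executed, and running them for real requires the Shared VPC Admin role at the organization or folder level.

```python
from typing import List

# Hypothetical project IDs, used for illustration only.
HOST_PROJECT = "network-host-project"
SERVICE_PROJECTS = ["team-a-project", "team-b-project"]


def shared_vpc_commands(host: str, services: List[str]) -> List[List[str]]:
    """Build (without running) the gcloud commands for a basic Shared VPC setup."""
    # Step 1: designate the host project that owns the shared network.
    commands = [["gcloud", "compute", "shared-vpc", "enable", host]]
    # Step 2: attach every service project so it can use the host's subnets.
    for service in services:
        commands.append(
            [
                "gcloud", "compute", "shared-vpc", "associated-projects", "add",
                service, f"--host-project={host}",
            ]
        )
    return commands


if __name__ == "__main__":
    for argv in shared_vpc_commands(HOST_PROJECT, SERVICE_PROJECTS):
        print(" ".join(argv))
```

Generating the commands as data keeps the list of attached service projects reviewable in one place, which matches the idea of managing every project's network from the host project.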

2. Organization Policy

We need a policy because, as I said, Trendyol grows every day, and each new project gets its own default VPC network. So we decided to enforce an organization policy that creates every new project in the organization without a default VPC network. This forces teams to use our shared network infrastructure.

But this is not enough. Why? Suppose we manage the network infrastructure, and a team member creates a VM through the host project with an external IP. This is where the problem starts: the external IP can lead to security vulnerabilities, and, most importantly, our data moves over the public internet. We can solve the first issue by disabling external IP creation through an organization policy, and we did.

Here is the link to the list of all organization policy constraints for Google Cloud Platform; you can find the how-to guides at the end of that page.

PS: To use organization policies, you need an organization in GCP, which means you need Google Workspace alongside Google Cloud Platform. This way you can manage your organization not only for GCP but also for other Google products, which is the benefit I liked most.
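The two restrictions from this section can be written as policy documents. The sketch below assumes the Organization Policy v2 JSON format accepted by `gcloud org-policies set-policy` and the built-in constraints `compute.skipDefaultNetworkCreation` (boolean) and `compute.vmExternalIpAccess` (list). The organization ID is a placeholder, and this is an illustration rather than our exact configuration.

```python
import json
from typing import Any, Dict

# Hypothetical organization ID; replace with your own.
ORG_ID = "123456789012"


def enforce_boolean_constraint(org_id: str, constraint: str) -> Dict[str, Any]:
    """Org Policy (v2) body that turns on a boolean constraint."""
    return {
        "name": f"organizations/{org_id}/policies/{constraint}",
        "spec": {"rules": [{"enforce": True}]},
    }


def deny_all_values(org_id: str, constraint: str) -> Dict[str, Any]:
    """Org Policy (v2) body that denies every value of a list constraint."""
    return {
        "name": f"organizations/{org_id}/policies/{constraint}",
        "spec": {"rules": [{"denyAll": True}]},
    }


policies = [
    # New projects are created without the auto-mode "default" VPC network.
    enforce_boolean_constraint(ORG_ID, "compute.skipDefaultNetworkCreation"),
    # No VM in the organization may be given an external IP address.
    deny_all_values(ORG_ID, "compute.vmExternalIpAccess"),
]

if __name__ == "__main__":
    for policy in policies:
        # Each document could be saved to a file and applied with
        # `gcloud org-policies set-policy FILE`.
        print(json.dumps(policy, indent=2))
```

Setting both policies at the organization level means every current and future project inherits them, so individual teams cannot opt out.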

3. Cloud VPN

In section 2, I mentioned two issues: the first was external IPs, and the second was the public internet connection. We solved the first problem, but for the second we had to set up Cloud VPN. Because we have three different data centers, this came with several challenges:

  • Team infrastructure is distributed: for example, one team can run workloads in all three data centers, so we have to connect all of them.
  • How do we avoid overlapping IP ranges?
  • How can we divide the subnets for both existing and new projects?

For these reasons, we decided to set up a separate VPN for each data center and, to avoid overlap, defined a different subnet for each VPN in the same region.
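Before creating the tunnels, it helps to verify that the per-data-center ranges really are disjoint. Here is a minimal sketch using Python's standard `ipaddress` module; the VPN names and CIDR ranges are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical ranges: one VPN per data center, all in the same GCP region.
VPN_SUBNETS = {
    "dc1-vpn": "10.0.0.0/12",
    "dc2-vpn": "10.16.0.0/12",
    "dc3-vpn": "10.32.0.0/12",
}


def find_overlaps(subnets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named ranges whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    print(find_overlaps(VPN_SUBNETS))  # an empty list means the plan is safe
```

The same check can be rerun whenever a new team subnet is requested, which covers both existing and new projects.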

So, what are the benefits of this infrastructure?

  • The first and biggest benefit is managing both on-prem and GCP networks from the host project.
  • We can divide teams by data center. For example, if a team works only in the Earth data center, it will not interfere with the other VPN networks. We can also manage each network at a fine-grained level: we don't have to track every network, change anything, or route between networks.
  • We can use VPC Network Peering.
  • Serverless VPC Access: for example, we can create a project for a serverless application and connect its Serverless VPC Access network through the Cloud VPN network.

4. Shared VPC

Shared VPC is the hero of this article. With this network infrastructure in place, we can easily divide the large ranges of the host project into subnets and share them with teams' projects. For example, assume I have a 10.0.0.0/10 range for a VPN. I can then divide it into multiple subnets like this:

Ref: https://www.davidc.net/sites/default/subnets/subnets.html
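The same split can be computed with Python's standard `ipaddress` module instead of an online calculator. This is a minimal sketch; the two-level plan (four /12 blocks, then /16 subnets per team project) is a hypothetical example, not our actual allocation.

```python
import ipaddress
from typing import List


def split(cidr: str, new_prefix: int) -> List[str]:
    """Split a CIDR range into equal subnets with the given prefix length."""
    return [str(n) for n in ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix)]


# Hypothetical plan: break the /10 into four /12 blocks...
blocks = split("10.0.0.0/10", 12)
# ...then carve the first block into /16 subnets, one per team project.
team_subnets = split(blocks[0], 16)

if __name__ == "__main__":
    print(blocks)  # ['10.0.0.0/12', '10.16.0.0/12', '10.32.0.0/12', '10.48.0.0/12']
    print(len(team_subnets), team_subnets[:2])  # 16 ['10.0.0.0/16', '10.1.0.0/16']
```

Because every subnet is derived from the same parent range, none of them can overlap with each other, and none can overlap with ranges carved from a different parent.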

This way, no Cloud VPN subnet overlaps with any team network subnet. Let's check our diagram now:

5. VPC Peering

Last but not least, we had to consider communication between team networks. We could solve this problem quickly with VPC Network Peering: teams that do not work in the same VPC network can easily be connected through peering:

Red dashed lines show VPC peering.
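Two properties of VPC Network Peering shape this design: peered networks cannot have overlapping subnet ranges, and peering is not transitive. The toy model below is a minimal Python sketch of just those two rules; the network names and ranges are hypothetical, and it does not model routes, firewalls, or export settings.

```python
import ipaddress
from typing import Dict, FrozenSet, List, Set


class PeeringModel:
    """Toy model of VPC Network Peering between named VPC networks."""

    def __init__(self) -> None:
        self.ranges: Dict[str, List[ipaddress.IPv4Network]] = {}
        self.peerings: Set[FrozenSet[str]] = set()

    def add_network(self, name: str, cidrs: List[str]) -> None:
        self.ranges[name] = [ipaddress.IPv4Network(c) for c in cidrs]

    def peer(self, a: str, b: str) -> bool:
        """Create a peering unless the two networks' subnet ranges overlap."""
        for x in self.ranges[a]:
            for y in self.ranges[b]:
                if x.overlaps(y):
                    return False  # peered VPCs cannot have overlapping subnets
        self.peerings.add(frozenset((a, b)))
        return True

    def can_reach(self, a: str, b: str) -> bool:
        # Peering is not transitive: only directly peered networks can talk.
        return a == b or frozenset((a, b)) in self.peerings


if __name__ == "__main__":
    model = PeeringModel()
    # Hypothetical VPC names and ranges.
    model.add_network("vpc-dc1", ["10.0.0.0/12"])
    model.add_network("vpc-dc2", ["10.16.0.0/12"])
    model.add_network("vpc-dc3", ["10.32.0.0/12"])
    model.peer("vpc-dc1", "vpc-dc2")
    model.peer("vpc-dc2", "vpc-dc3")
    print(model.can_reach("vpc-dc1", "vpc-dc3"))  # False: dc1 needs its own peering to dc3
```

The non-overlapping subnet plan from the previous sections is what makes every pair of networks eligible for peering in the first place.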

6. Firewall Rules

Finally, we have just three VPC networks, and any firewall rule is easily inherited by the other projects:
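With Shared VPC, firewall rules are defined once on the host project's network and apply to every service-project VM attached to it. The sketch below is a simplified Python model of how GCP evaluates ingress rules: the lowest priority number wins, deny beats allow at equal priority, and an implied deny-ingress rule applies when nothing matches. The rule names and ranges are hypothetical, and the model ignores target tags, service accounts, protocols, and egress.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IngressRule:
    name: str
    priority: int          # 0-65535; a lower number means higher precedence
    action: str            # "allow" or "deny"
    source_range: str      # CIDR the packet's source must fall in
    port: Optional[int]    # None matches any port (simplification)


def evaluate(rules: List[IngressRule], src_ip: str, port: int) -> str:
    """Return "allow" or "deny" for one ingress packet."""
    ip = ipaddress.ip_address(src_ip)
    matching = [
        r for r in rules
        if ip in ipaddress.ip_network(r.source_range)
        and (r.port is None or r.port == port)
    ]
    if not matching:
        return "deny"  # implied deny-ingress rule at the lowest priority
    # Lowest priority number wins; on a tie, deny takes precedence over allow.
    best = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return best.action


# Hypothetical rules defined once in the host project's shared network.
RULES = [
    IngressRule("allow-ssh-from-vpn", 1000, "allow", "10.0.0.0/10", 22),
    IngressRule("deny-ssh-from-dc2", 900, "deny", "10.16.0.0/12", 22),
]

if __name__ == "__main__":
    print(evaluate(RULES, "10.1.2.3", 22))    # allow
    print(evaluate(RULES, "10.16.0.5", 22))   # deny (higher-precedence rule)
    print(evaluate(RULES, "192.0.2.10", 22))  # deny (implied rule)
```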

7. Routes

GCP can manage not only firewall rules but also routes centrally. Why does this matter to us? Some clouds may not support HA VPN, so we set up our tunnels with one of two options:

Option 1: Configure two Classic VPNs and set different priorities on their static routes for HA.
Option 2: Configure HA VPN with BGP, which is the option GCP currently supports for HA.

So, with Option 1, you create two Classic VPNs and then add a static route for each tunnel to build a legacy HA setup.
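Option 1 relies on route selection: among routes to the same destination, the most specific prefix wins, and between equally specific routes the lower priority value wins. Here is a minimal Python sketch of that active/backup behavior with two hypothetical Classic VPN tunnels. As a simplifying assumption, routes whose tunnel is down are treated as unusable, and equal-priority (ECMP) routing is not modeled.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class StaticRoute:
    dest: str             # destination CIDR (e.g. an on-prem range)
    priority: int         # lower value wins between equally specific routes
    next_hop_tunnel: str  # Classic VPN tunnel used as the next hop


def select_route(
    routes: List[StaticRoute], dst_ip: str, tunnels_up: Set[str]
) -> Optional[StaticRoute]:
    ip = ipaddress.ip_address(dst_ip)
    # Assumption: routes whose tunnel is down are not eligible.
    candidates = [
        r for r in routes
        if r.next_hop_tunnel in tunnels_up and ip in ipaddress.ip_network(r.dest)
    ]
    if not candidates:
        return None
    # Most specific prefix first, then the lowest priority value.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.dest).prefixlen, r.priority),
    )


# Hypothetical on-prem range reached through two Classic VPN tunnels.
ROUTES = [
    StaticRoute("192.168.0.0/16", 100, "tunnel-primary"),
    StaticRoute("192.168.0.0/16", 200, "tunnel-backup"),
]

if __name__ == "__main__":
    both = {"tunnel-primary", "tunnel-backup"}
    print(select_route(ROUTES, "192.168.10.5", both).next_hop_tunnel)  # tunnel-primary
    print(select_route(ROUTES, "192.168.10.5", {"tunnel-backup"}).next_hop_tunnel)  # tunnel-backup
```

Option 2 avoids maintaining these priorities by hand, since BGP on HA VPN exchanges routes dynamically.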

8. Serverless VPC Access

Some teams use Cloud Functions, and we wanted to manage the networks of these serverless functions as well. With this setup, we can easily manage Serverless VPC Access through Shared VPC. Special thanks to Gökçe Sürenkök and Orkun Tasdan for their help in making it easy!
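Serverless VPC Access connectors use a dedicated /28 subnet, and with Shared VPC that subnet is created in the host project's network. A minimal Python sketch for picking the next free /28 that does not collide with subnets already in use; the parent range and used subnets are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def first_free_slash28(parent: str, used: Iterable[str]) -> Optional[str]:
    """Return the first /28 inside `parent` that overlaps none of `used`."""
    used_nets = [ipaddress.ip_network(c) for c in used]
    for candidate in ipaddress.ip_network(parent).subnets(new_prefix=28):
        if not any(candidate.overlaps(u) for u in used_nets):
            return str(candidate)
    return None  # parent range is exhausted


if __name__ == "__main__":
    # Hypothetical host-project range and subnets already in use.
    print(first_free_slash28("10.48.0.0/24", ["10.48.0.0/28", "10.48.0.16/28"]))
    # -> 10.48.0.32/28
```

Keeping connector subnets inside a range reserved in the host project means they are covered by the same overlap checks as every other team subnet.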

Thanks for reading; stay tuned!
