Cloud Architecture Framework:

Jan 9, 2023

Optimize cost: Compute, containers, and serverless

Service-specific cost-optimization controls for Compute Engine, Google Kubernetes Engine, Cloud Run, Cloud Functions, and App Engine.

Many people know that if you compare on-premises and cloud environments as is, on-premises will be 30–60% cheaper, so moving to the cloud while keeping your on-premises setup unchanged is a really bad practice. You can do it as the first step of a migration if you need to move as soon as possible, but afterwards you have to move your cloud resources to more cloud-native technologies; that is what gives you the best price and performance. Cloud services are offered at several abstraction layers: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service). GCP offers the following types of compute resources:

Compute Engine: the most standard and most flexible compute resource type

Pluses: you can use it like a standard physical or virtual server, and it usually requires minimal changes to your application and infrastructure when you move it from one platform to another.

Spot VMs (preemptible): if your application runs short jobs and can tolerate unexpected VM stops, Spot VMs can give you a discount of up to 91% on the VM price.
Autoscaling managed instance groups (MIGs) or scheduled VM stop/start: very effective when your load varies significantly over time.
CUDs and SUDs (committed use and sustained use discounts): covered in a previous part of this series.
Custom machine types: GCP offers many VM families (N1, N2, C2, M2, T2, and others), and in the general-purpose families you can also customize the vCPU count and memory size to fit your needs. You can change the machine type later; this requires stopping the VM, but restarts in GCP are really short. Cloud Operations (Logging, Monitoring, Trace) and Recommendations help you find underutilized (or overutilized) instances and resize them properly.
BYOL (bring your own license): you might be able to reduce costs by bringing your own licenses. The OS license can cost more than 60% of the GCP instance price, so moving your existing licenses to GCP can be cost-effective.
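The savings levers in the list above come down to simple arithmetic on hourly rates. The Python sketch below compares an always-on VM, a business-hours start/stop schedule, and a Spot VM. The $0.10/hour rate and the 60% Spot discount are hypothetical placeholders, not GCP prices, and the 730-hour month is a common billing approximation.

```python
"""Rough monthly cost comparison for a single Compute Engine VM.

Every rate and discount here is a hypothetical input, not a published
GCP price; substitute the numbers from your own billing data.
"""

HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12
WEEKS_PER_MONTH = HOURS_PER_MONTH / (24 * 7)


def monthly_cost(hourly_rate: float, hours_running: float, discount: float = 0.0) -> float:
    """Cost of `hours_running` VM hours at `hourly_rate` after a fractional
    discount (0.0 = on-demand, 0.91 = 91% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * hours_running * (1.0 - discount)


def scheduled_hours(hours_per_day: float, days_per_week: int) -> float:
    """Approximate monthly running hours for a fixed start/stop schedule."""
    return hours_per_day * days_per_week * WEEKS_PER_MONTH


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand price, $/hour
    scenarios = {
        "always on": monthly_cost(rate, HOURS_PER_MONTH),
        "10 h x 5 days": monthly_cost(rate, scheduled_hours(10, 5)),
        "Spot, 60% off": monthly_cost(rate, HOURS_PER_MONTH, discount=0.60),
    }
    for label, cost in scenarios.items():
        print(f"{label:>14}: ${cost:,.2f}")
```

A 10-hours-a-day, 5-days-a-week schedule keeps the VM running for 50 of 168 weekly hours, so it costs roughly 30% of the always-on price before any discount is applied.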

Minuses: you need a strong DevOps team to use this technology effectively. You have to implement high availability (HA), disaster recovery (DR), and part of the security yourself. Autoscaling is usually less effective than with container-based technologies.

Google Kubernetes Engine (GKE): Kubernetes (K8s) is one of the most popular technologies today

Pluses: GKE is simple to use, and your workloads can be moved from cloud to cloud and even to on-premises. Anthos can help you do this more effectively.

Use node auto-provisioning to extend the GKE cluster autoscaler (CA) so that it efficiently creates and deletes node pools based on the specifications of pending pods, without over-provisioning.
Use the Vertical Pod Autoscaler (VPA) together with CA to right-size the resources your pods request, and therefore the nodes you pay for.
Use Spot VMs for Kubernetes node pools when your pods are fault-tolerant and can terminate gracefully in less than 25 seconds.
Choose cost-efficient machine types (for example, E2, N2D, or T2D), which provide 20–40% higher performance-to-price.
Use GKE Autopilot to let GKE maximize the efficiency of your cluster's infrastructure and minimize your DevOps overhead.
Use GKE usage metering to analyze your clusters' usage profiles by namespace and label. This lets you track spending per team or customer below the VM level, on shared resources.
SUDs and CUDs also apply to GKE nodes.
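GKE usage metering provides per-namespace and per-label usage data; how you turn that into a bill per team is up to you. The sketch below shows one possible allocation rule (an assumption, not how GKE itself reports cost): split the CPU and memory parts of the node bill in proportion to each namespace's resource requests. `PodRequest` and `allocate_cost` are hypothetical names.

```python
"""Split a GKE cluster's node cost across namespaces by resource requests.

Hypothetical, simplified model: GKE usage metering itself exports
per-namespace and per-label usage records to BigQuery; this sketch only
illustrates the allocation arithmetic you might run on such data.
"""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PodRequest:
    namespace: str
    cpu_cores: float
    memory_gib: float


def allocate_cost(pods: Iterable[PodRequest], cpu_cost: float, memory_cost: float) -> Dict[str, float]:
    """Return {namespace: cost}. The CPU and memory parts of the node bill are
    split in proportion to each namespace's share of the total requests, so
    unrequested (idle) capacity is spread across namespaces too."""
    cpu_by_ns: Dict[str, float] = defaultdict(float)
    mem_by_ns: Dict[str, float] = defaultdict(float)
    for pod in pods:
        cpu_by_ns[pod.namespace] += pod.cpu_cores
        mem_by_ns[pod.namespace] += pod.memory_gib
    total_cpu = sum(cpu_by_ns.values())
    total_mem = sum(mem_by_ns.values())
    costs = {}
    for ns in cpu_by_ns:  # both dicts share the same keys
        cpu_share = cpu_by_ns[ns] / total_cpu if total_cpu else 0.0
        mem_share = mem_by_ns[ns] / total_mem if total_mem else 0.0
        costs[ns] = cpu_share * cpu_cost + mem_share * memory_cost
    return costs


if __name__ == "__main__":
    pods = [
        PodRequest("team-a", cpu_cores=2, memory_gib=4),
        PodRequest("team-a", cpu_cores=1, memory_gib=2),
        PodRequest("team-b", cpu_cores=1, memory_gib=10),
    ]
    # Hypothetical monthly node bill: $80 for CPU, $20 for memory.
    print(allocate_cost(pods, cpu_cost=80.0, memory_cost=20.0))
```

With the sample data, team-a requests 75% of the CPU and 37.5% of the memory, so it is charged $67.50 of the $100 bill and team-b $32.50. Allocating by requests rather than actual usage rewards teams that set tight requests, which pairs well with VPA.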

Minuses: you have to move your application to a microservices architecture. This is usually a good direction, but it needs additional knowledge and time. GKE is also less flexible and has some soft and hard limits, so you need a thorough architecture review before starting a GKE implementation. GKE clusters can span multiple zones, but a single cluster cannot span multiple regions.

Cloud Run: Build and deploy scalable containerized apps on a fully managed platform.

Pluses: the application can be written in any language (including Go, Python, Java, Node.js, .NET, and Ruby).

Fast startup and shutdown reduce billable idle time.
You can set the size of each instance (CPU and memory).
Consider using Cloud CDN or Firebase Hosting for serving static assets.
For Cloud Run apps that handle requests globally, consider deploying the app to multiple regions, because cross-continent egress traffic can be expensive. This design is recommended if you use a load balancer and CDN.
Set a limit for the number of instances that can be deployed.
Purchase CUDs to save up to 17% off the on-demand price with a one-year commitment.
Free tier: the first 180,000 vCPU-seconds, the first 360,000 GiB-seconds, and 2 million requests per month are free.
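To see when the Cloud Run free tier stops covering a workload, you can subtract each allowance from the month's usage and price only the remainder. This is a rough sketch: the free-tier amounts are the ones listed above, while the unit prices are hypothetical placeholders and details such as request rounding and CPU allocation mode are ignored.

```python
"""Estimate a monthly Cloud Run bill after the free tier.

The free-tier allowances come from the list above. The unit prices are
placeholder parameters, not current GCP list prices, and the model ignores
billing details such as rounding, CPU allocation mode, and networking.
"""
from dataclasses import dataclass

FREE_VCPU_SECONDS = 180_000
FREE_GIB_SECONDS = 360_000
FREE_REQUESTS = 2_000_000


@dataclass
class UnitPrices:
    per_vcpu_second: float
    per_gib_second: float
    per_million_requests: float


def monthly_bill(vcpu_seconds: float, gib_seconds: float, requests: int, prices: UnitPrices) -> float:
    """Charge only the usage above each free-tier allowance."""
    billable_cpu = max(0.0, vcpu_seconds - FREE_VCPU_SECONDS)
    billable_mem = max(0.0, gib_seconds - FREE_GIB_SECONDS)
    billable_requests = max(0, requests - FREE_REQUESTS)
    return (
        billable_cpu * prices.per_vcpu_second
        + billable_mem * prices.per_gib_second
        + billable_requests / 1_000_000 * prices.per_million_requests
    )


if __name__ == "__main__":
    # Hypothetical unit prices.
    prices = UnitPrices(per_vcpu_second=0.00002, per_gib_second=0.000002, per_million_requests=0.40)
    # One 1 vCPU / 0.5 GiB instance busy for 100 hours, serving 3 million requests.
    busy_seconds = 100 * 3600
    print(round(monthly_bill(busy_seconds * 1.0, busy_seconds * 0.5, 3_000_000, prices), 2))
```

In this example, memory stays inside the free tier, so only the extra vCPU-seconds and the third million requests are billed.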

Minuses: a cold start can add seconds to the startup time, and that time is also billable. You can configure a minimum number of instances so that the application responds quickly; when these instances are idle, they are billed at a tenth of the price.

Instance time is more expensive than on Compute Engine VMs or GKE nodes.
Less effective than GKE for large applications with intensive internal communication and logic.
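The trade-off behind minimum instances can be estimated the same way. The sketch below assumes, as stated above, that idle minimum instances are billed at about a tenth of the active rate, and treats only the idle part of the month as the extra cost of avoiding cold starts (serving time would be billed roughly the same either way). The active rate and the `idle_factor` default are assumptions to check against current pricing.

```python
"""Rough monthly premium for keeping Cloud Run minimum instances warm.

Assumption taken from the text above: an idle minimum instance is billed at
about a tenth of the active rate. The rate itself is a hypothetical input.
"""

SECONDS_PER_MONTH = 30 * 24 * 3600


def warm_instance_premium(min_instances: int, busy_fraction: float,
                          active_rate_per_second: float, idle_factor: float = 0.1) -> float:
    """Extra monthly cost of `min_instances` warm instances.

    Time when the instances are serving requests would be billed roughly
    the same anyway, so only the idle share of the month counts here."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be in [0, 1]")
    idle_seconds = SECONDS_PER_MONTH * (1.0 - busy_fraction)
    return min_instances * idle_seconds * active_rate_per_second * idle_factor


if __name__ == "__main__":
    # Two warm instances, busy 20% of the month, hypothetical $0.0001/s active rate.
    print(f"${warm_instance_premium(2, 0.2, 0.0001):,.2f} per month")
```

If that premium is small compared with the business cost of slow first responses, minimum instances are worth it; if not, accept cold starts.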

Cloud Functions: Run your code in the cloud with no servers or containers to manage, using Google's scalable, pay-as-you-go functions as a service (FaaS) product.

Cloud Functions is very effective for short operations and needs no DevOps support. However, if your workloads run constantly, consider using GKE or Compute Engine to handle them instead.
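The advice to move constantly running workloads off Cloud Functions comes down to a break-even point: per-invocation pricing grows linearly with traffic, while a VM costs roughly the same every month. The Python sketch below computes that point under hypothetical prices; it ignores the free tier, memory tiers, and networking, and assumes a single VM can handle the load.

```python
"""Break-even between Cloud Functions and an always-on VM.

All prices are hypothetical inputs; the model ignores the free tier,
networking, and memory-size tiers to keep the arithmetic visible.
"""


def per_invocation_cost(avg_duration_s: float, price_per_invocation: float,
                        price_per_compute_second: float) -> float:
    """Cost of one invocation: a flat request charge plus billed compute time."""
    return price_per_invocation + avg_duration_s * price_per_compute_second


def break_even_invocations(vm_monthly_cost: float, avg_duration_s: float,
                           price_per_invocation: float, price_per_compute_second: float) -> float:
    """Monthly invocation count at which Cloud Functions costs as much as the VM."""
    per_call = per_invocation_cost(avg_duration_s, price_per_invocation, price_per_compute_second)
    return vm_monthly_cost / per_call


if __name__ == "__main__":
    n = break_even_invocations(vm_monthly_cost=50.0, avg_duration_s=0.2,
                               price_per_invocation=0.0000004, price_per_compute_second=0.00001)
    print(f"Above ~{n:,.0f} invocations/month, the always-on VM is cheaper")
```

Below the break-even count, paying per invocation wins; well above it, a VM or GKE node that is busy most of the time is the cheaper option.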

App Engine: Build monolithic server-side rendered websites. App Engine supports popular development languages with a range of developer tools.

App Engine has two environments, each with its own pluses and minuses. Compare them before you start development on the platform:

Standard environment: Application instances run in a sandbox, using the runtime environment of a supported language

Languages - specific versions only of Python, Java, Node.js, Go, Ruby, PHP
Startup time - Seconds (good for rapid scaling)
Scale to zero - Yes
Background processes - No
SSH debugging - No
Size limit (CPU/RAM) - from 600 MHz/128 MB to 4.8 GHz/1024 MB
Pricing - Based on instance hours

Flexible environment: Application instances run within Docker containers on Compute Engine virtual machines

Languages - any version of Python, Java, Node.js, Go, Ruby, PHP, or .NET, or a custom runtime (any software that can serve HTTP requests)
Startup time - Minutes (suited to consistent traffic or regular traffic fluctuations)
Scale to zero - No, minimum 1 instance
Background processes - Yes
SSH debugging - Yes
Size limit - CPU: the number of cores must be 1, an even number between 2 and 32, or a multiple of 4 between 32 and 80. RAM: memory_gb = cpu * [1.0 to 6.5] - 0.4 (each core needs 1.0 to 6.5 GB, minus about 0.4 GB reserved for process overhead)
Pricing - Based on usage of vCPU, memory, and persistent disks
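The sizing rules above can be checked before deployment, for example against the `cpu` and `memory_gb` values in the `resources` section of `app.yaml`. This sketch encodes only the rules quoted in this article; treat it as an illustration and confirm the limits in the current App Engine documentation.

```python
"""Check App Engine flexible `resources` values against the sizing rules above.

A sketch based only on the rules quoted in this article (CPU counts and
memory_gb = cpu * [1.0 to 6.5] - 0.4); verify them against current docs.
"""
from typing import Tuple

PROCESS_OVERHEAD_GB = 0.4


def valid_cpu(cpu: int) -> bool:
    """1, an even number from 2 to 32, or a multiple of 4 from 32 to 80."""
    if cpu == 1:
        return True
    if 2 <= cpu <= 32 and cpu % 2 == 0:
        return True
    return 32 <= cpu <= 80 and cpu % 4 == 0


def memory_range_gb(cpu: int) -> Tuple[float, float]:
    """Allowed memory_gb range for a given core count."""
    if not valid_cpu(cpu):
        raise ValueError(f"invalid cpu count: {cpu}")
    return cpu * 1.0 - PROCESS_OVERHEAD_GB, cpu * 6.5 - PROCESS_OVERHEAD_GB


def valid_resources(cpu: int, memory_gb: float) -> bool:
    """True if the cpu/memory_gb pair satisfies both rules."""
    if not valid_cpu(cpu):
        return False
    low, high = memory_range_gb(cpu)
    return low <= memory_gb <= high


if __name__ == "__main__":
    for cpu, mem in [(2, 4.0), (3, 4.0), (4, 30.0)]:
        print(cpu, mem, valid_resources(cpu, mem))
```

Running a check like this in CI catches an oversized or invalid configuration before it reaches the billing report.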

Set the maximum number of instances based on your traffic and request latency. App Engine usually scales capacity based on the traffic the application receives, so you can control costs by limiting the number of instances that can be created.
Benchmark your App Engine workload in multiple programming languages. Some languages need fewer instances, and therefore cost less, to complete the same tasks.
To balance performance and cost, run an A/B test by splitting traffic between two versions, each with a different configuration, and monitor the performance and cost of each version.
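An A/B split only helps if you compare the versions on the same terms. The sketch below, with hypothetical names and numbers, computes the cost per million requests for each version from monitoring data and picks the cheapest one that meets a p95 latency target. `price_per_instance_hour` is an assumed blended rate; for the flexible environment you would derive it from vCPU, memory, and disk usage instead.

```python
"""Compare two App Engine versions from an A/B traffic split.

Hypothetical sketch: the metrics would come from your own monitoring and
billing data, and price_per_instance_hour is an assumed blended rate.
"""
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class VersionStats:
    name: str
    requests: int
    instance_hours: float
    price_per_instance_hour: float
    p95_latency_ms: float

    def cost_per_million_requests(self) -> float:
        if self.requests <= 0:
            raise ValueError("requests must be positive")
        return self.instance_hours * self.price_per_instance_hour / self.requests * 1_000_000


def cheapest_within_slo(versions: Iterable[VersionStats], latency_slo_ms: float) -> Optional[VersionStats]:
    """Cheapest version per million requests whose p95 latency meets the target."""
    eligible = [v for v in versions if v.p95_latency_ms <= latency_slo_ms]
    return min(eligible, key=lambda v: v.cost_per_million_requests(), default=None)


if __name__ == "__main__":
    a = VersionStats("v1-small", 1_000_000, 100.0, 0.05, 120.0)
    b = VersionStats("v2-large", 1_000_000, 60.0, 0.10, 90.0)
    for slo in (150.0, 100.0):
        best = cheapest_within_slo([a, b], slo)
        print(slo, best.name if best else "none")
```

Normalizing by requests matters because the two versions rarely receive exactly the same traffic during a split.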

App Engine flexible applications can easily be moved to Cloud Run or GKE.

Also adopt and implement FinOps, and monitor and control your costs.

To be continued:

In the next parts:

Optimize cost: Storage

Optimize cost: Databases and smart analytics

Optimize cost: Networking

Optimize cost: Cloud operations