Understand and automate GCP Network Cost Intelligence

Evgeni Dimitrov
Published in Costimize
7 min read · Oct 19, 2021

Some things are absolutely necessary if you want to “survive” in a specific domain. You cannot live on Earth without air and water. You cannot play basketball without being able to dribble with the ball. You get the idea.

In GCP, one of those essentials is the VPC network. VPC stands for Virtual Private Cloud: a private, isolated, virtual network. A VPC is very similar to a classic physical network; the only difference is that it’s virtualized. You can set up private communication between your virtual machines in the cloud, firewall rules, and routing, reserve IP addresses, and so on.

Every project created in GCP starts with a single VPC called “default” and all resources you create in this project are, by default, assigned to that network. You can delete or edit the default VPC and/or create other VPCs.

What happens in my network

VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as Google Kubernetes Engine nodes. These logs can be used for network monitoring, forensics, real-time security analysis, and expense optimization.

In other words, VPC Flow Logs captures information about the IP traffic going to and from the network interfaces in a VPC.

Configure VPC Flow logs

Go to VPC Networks and choose the VPC you want to configure.

The Flow logs column shows whether logging is already enabled for each subnetwork. Flow logs are disabled for all subnetworks by default.

If you’re a command line person you can use the following gcloud command to list the subnetworks in the current project:

gcloud compute networks subnets list --project <my_project> \
  --network="<my_vpc_network>" \
  --format="csv(name,region,logConfig.enable)"
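The CSV output of the command above can be post-processed to find the subnetworks that still have flow logs turned off. The following is a minimal sketch, not an official tool: the function name and the sample data are hypothetical, and it assumes the output has a header row followed by three columns (name, region, flow-logs flag), with the flag left empty when logging was never configured.

```python
import csv
import io


def subnets_without_flow_logs(csv_text):
    """Return (name, region) pairs for subnets whose flow-logs flag is not true.

    Assumes a header row followed by rows of: name, region, flag.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    disabled = []
    for row in rows[1:]:  # skip the header row
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        name, region, enabled = row[0], row[1], row[2]
        # An empty flag is treated the same as "False".
        if enabled.strip().lower() != "true":
            disabled.append((name, region))
    return disabled


# Hypothetical sample output, for illustration only.
sample = """name,region,enable
default,europe-west1,True
default,us-central1,
gke-subnet,us-central1,False
"""

print(subnets_without_flow_logs(sample))
```

In practice you would feed the function the captured output of the gcloud command instead of the sample string.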

Click on any subnetwork that you need logging for, click Edit, and set Flow logs to On.

Or, if you prefer the terminal:

# <subnet_region> is the region of the subnetwork, e.g. europe-west1
gcloud compute networks subnets update <my_subnetwork> \
  --region <subnet_region> \
  --enable-flow-logs

Keep in mind that enabling flow logs increases the volume of logs you generate and store, which adds to your bill at the end of the month.

When you expand CONFIGURE LOGS in the console, GCP shows an estimate of the log volume that will be generated once the logs are enabled.

Analyse the Flow logs

You can view flow logs in Cloud Logging, and you can export logs to any destination that Cloud Logging Router supports.

Flow logs are aggregated by connection from Compute Engine VMs and exported in real time.

Flow logs queries

The Logs Explorer (https://console.cloud.google.com/logs/query) can be used to run queries against the logs stored in your projects.

All VPC flow logs are stored in a log named “projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows”.

log_name="projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows"
timestamp>="2021-08-18T04:39:00.045Z"
timestamp<="2021-08-19T04:41:00.045Z"

A specific time interval can be selected using the timestamp field (mind the RFC 3339 timestamp format).
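Because the timestamp format is easy to get wrong, it can help to generate the filter instead of typing it. The sketch below is one possible way to do that; the function name and the project ID are hypothetical, and it only builds the filter text shown above without calling any Google API.

```python
from datetime import datetime, timedelta, timezone


def vpc_flow_filter(project_id, start, end):
    """Build a Logs Explorer filter for VPC flow logs in a time window."""

    def rfc3339(ts):
        if ts.tzinfo is None:
            raise ValueError("use timezone-aware datetimes")
        # Convert to UTC and keep millisecond precision, e.g. 2021-08-18T04:39:00.045Z
        utc = ts.astimezone(timezone.utc)
        return utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

    return "\n".join([
        f'log_name="projects/{project_id}/logs/compute.googleapis.com%2Fvpc_flows"',
        f'timestamp>="{rfc3339(start)}"',
        f'timestamp<="{rfc3339(end)}"',
    ])


# Hypothetical project ID and time window, for illustration only.
start = datetime(2021, 8, 18, 4, 39, 0, 45000, tzinfo=timezone.utc)
end = start + timedelta(days=1, minutes=2)
print(vpc_flow_filter("my-project", start, end))
```

The printed text can be pasted into the Logs Explorer query box as is.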

Anatomy of VPC flow logs

The most interesting parts of the log are the jsonPayload and the resource.

{
  insertId: "1ehyxczfjs64ah"
  jsonPayload: {...}
  logName: "projects/costimize-test/logs/compute.googleapis.com%2Fvpc_flows"
  receiveTimestamp: "2021-08-19T04:40:56.830292697Z"
  resource: {...}
  timestamp: "2021-08-19T04:40:56.830292697Z"
}

The resource part identifies the subnetwork that manages the connection.

resource: {
  labels: {
    location: "us-central1-a"
    project_id: "my_project_id"
    subnetwork_id: "187394017666466****"
    subnetwork_name: "my_subnetwork"
  }
  type: "gce_subnetwork"
}

The jsonPayload holds the data relevant to the particular traffic pattern (see the GCP traffic patterns).

jsonPayload: {
  bytes_sent: "144"
  connection: {
    dest_ip: "**.**.15.232"
    dest_port: 22
    protocol: 6
    src_ip: "**.**.178.89"
    src_port: 37374
  }
  dest_instance: {
    project_id: "my_project_id"
    region: "us-central1"
    vm_name: "gke-cluster-c1e7c5b3-kkmr"
    zone: "us-central1-a"
  }
  dest_vpc: {
    project_id: "my_project_id"
    subnetwork_name: "my_subnetwork"
    vpc_name: "my_network"
  }
  end_time: "2021-08-19T04:40:50.517441169Z"
  packets_sent: "4"
  reporter: "DEST"
  rtt_msec: "0"
  src_location: {
    asn: 15169
    continent: "America"
    country: "usa"
  }
  start_time: "2021-08-19T04:40:50.512970350Z"
}
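Note that counters such as bytes_sent and packets_sent arrive as strings, and the protocol is an IANA protocol number (6 is TCP). The following sketch shows one way to turn a payload into friendlier values; the function name and the dictionary layout are assumptions for illustration, with a shortened version of the record above as input.

```python
# IANA protocol numbers for the most common cases.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def summarize_flow(payload):
    """Convert string counters to integers and name the protocol."""
    conn = payload["connection"]
    return {
        "protocol": PROTOCOLS.get(conn["protocol"], str(conn["protocol"])),
        "dest_port": conn["dest_port"],
        "bytes_sent": int(payload["bytes_sent"]),
        "packets_sent": int(payload["packets_sent"]),
        "reporter": payload["reporter"],
    }


# A shortened version of the record shown above (IP addresses omitted).
sample_payload = {
    "bytes_sent": "144",
    "packets_sent": "4",
    "reporter": "DEST",
    "connection": {"dest_port": 22, "protocol": 6, "src_port": 37374},
}

print(summarize_flow(sample_payload))
```

Unknown protocol numbers are kept as their numeric string rather than guessed.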

Flow logs traffic patterns

Several types of network activity are reported in the flow logs.

You can check all traffic patterns in the VPC Flow Logs documentation. We will take a look at the two most common:

1. VM to VM

Internal traffic between two VMs in a single VPC is logged with the following fields in the JSON payload:

src_instance.*
dest_instance.*
src_vpc.*
dest_vpc.*

You can filter only the VM to VM logs with:

logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)
jsonPayload.bytes_sent > 0
jsonPayload.src_instance.vm_name:*
jsonPayload.src_vpc.vpc_name:*
jsonPayload.dest_instance.vm_name:*
jsonPayload.dest_vpc.vpc_name:*

or specify any of the parameters to narrow the results:

logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)
jsonPayload.bytes_sent > 0
jsonPayload.src_instance.vm_name:*
jsonPayload.src_vpc.vpc_name:*
jsonPayload.dest_instance.vm_name: <my_vm_name>
jsonPayload.dest_vpc.vpc_name:*

2. VM to external location

For flows between a VM and an external entity, flow logs are reported from the VM only, with the following fields in the JSON payload:

src_instance.*
src_vpc.*
dest_location.*

Get the logs with:

logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)
jsonPayload.bytes_sent > 0
jsonPayload.src_instance.vm_name:*
jsonPayload.src_vpc.vpc_name:*
jsonPayload.dest_location.region:*

or filter by a specific field:

logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)
jsonPayload.bytes_sent > 0
jsonPayload.src_instance.vm_name:*
jsonPayload.src_vpc.vpc_name:*
jsonPayload.dest_location.region: Virginia
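The same distinction can be made in code once the log entries have been downloaded: the traffic pattern follows from which groups of fields are present in the payload. This is a minimal sketch under that assumption; the function name and the returned labels are hypothetical, and it covers only the patterns discussed here plus the inbound case seen in the sample record earlier.

```python
def classify_flow(payload):
    """Guess the traffic pattern from the field groups present in jsonPayload."""
    has_src_vm = "src_instance" in payload
    has_dest_vm = "dest_instance" in payload
    if has_src_vm and has_dest_vm:
        return "vm_to_vm"
    if has_src_vm and "dest_location" in payload:
        return "vm_to_external"
    if has_dest_vm and "src_location" in payload:
        return "external_to_vm"
    return "other"


# Hypothetical, heavily shortened payloads for illustration only.
internal = {"src_instance": {"vm_name": "vm-a"}, "dest_instance": {"vm_name": "vm-b"}}
egress = {"src_instance": {"vm_name": "vm-a"}, "dest_location": {"country": "usa"}}
ingress = {"dest_instance": {"vm_name": "vm-b"}, "src_location": {"country": "usa"}}

for payload in (internal, egress, ingress):
    print(classify_flow(payload))
```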

VPC logs analytics

The format of the VPC logs is not the most convenient for analytics. For example, if you want to determine the regions that are responsible for the most traffic to your GKE clusters, this requires additional grouping of the logs.

You can export the logs to BigQuery and run queries on the exported datasets.

In the Logs Explorer, click Actions -> Create sink, or just go to Logs Router in the Logging menu and click CREATE SINK.

In step 1 (Sink details), type a name and an optional description.

In step 2 (Sink destination), choose BigQuery dataset and select an existing dataset.

In step 3 (Choose logs to include in sink), you can provide a filter that specifies which logs to export. For example:

logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)

Add other restrictions to the filter if necessary.

Or use the command-line alternative:

gcloud logging sinks create my-vpc-bq-sink \
  bigquery.googleapis.com/projects/<my_project_id>/datasets/<my_dataset> \
  --log-filter='logName:(projects/<my_project_id>/logs/compute.googleapis.com%2Fvpc_flows)'

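After saving the sink, check the specified dataset. Once matching log entries arrive, there should be a new table containing the exported logs. If you created the sink with gcloud, also grant the sink’s writer identity (printed by the command) write access to the dataset, for example the BigQuery Data Editor role. The schema of the table is very similar to the structure of the JSON from the Logs Explorer.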

For example, if you want to track the VM instances that generate the most internal traffic, you can use the following query:

SELECT
  CONCAT(
    jsonPayload.src_vpc.vpc_name, ".", jsonPayload.src_instance.zone, ".", jsonPayload.src_instance.vm_name,
    " -> ",
    jsonPayload.dest_vpc.vpc_name, ".", jsonPayload.dest_instance.zone, ".", jsonPayload.dest_instance.vm_name
  ) AS route,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `my_project_id.my_dataset.compute_googleapis_com_vpc_flows`
WHERE jsonPayload.src_instance.vm_name != ""
  AND jsonPayload.dest_instance.vm_name != ""
  AND jsonPayload.src_location IS NULL
  AND jsonPayload.dest_location IS NULL
GROUP BY route
ORDER BY bytes_sent DESC

To find the VMs that generate the most egress traffic:

SELECT
  CONCAT(
    jsonPayload.src_instance.project_id, ".", jsonPayload.src_instance.zone, ".", jsonPayload.src_instance.vm_name,
    " -> ",
    -- asn is numeric, so cast it to a string before concatenating
    IFNULL(CAST(jsonPayload.dest_location.asn AS STRING), "-"), ".",
    IFNULL(jsonPayload.dest_location.continent, "-"), ".",
    IFNULL(jsonPayload.dest_location.country, "-"), ".",
    IFNULL(jsonPayload.dest_location.region, "-")
  ) AS route,
  SUM(CAST(jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `my_project_id.my_dataset.compute_googleapis_com_vpc_flows`
WHERE jsonPayload.src_instance IS NOT NULL
  AND jsonPayload.src_instance.vm_name != ""
  AND jsonPayload.dest_instance IS NULL
  AND jsonPayload.src_location IS NULL
  AND jsonPayload.dest_location IS NOT NULL
  AND jsonPayload.dest_location.country != ""
GROUP BY route
HAVING bytes_sent > 0
ORDER BY bytes_sent DESC
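To sanity-check the grouping logic on a handful of exported rows before running it in BigQuery, the egress query can be mirrored in plain Python. This is only an illustrative sketch: the function names and the sample rows are hypothetical, each row is assumed to be a jsonPayload already loaded as a dictionary, and missing location fields are replaced with "-" as in the query.

```python
from collections import defaultdict


def route_of(row):
    """Build the same 'source -> destination' label as the SQL CONCAT."""
    src = row["src_instance"]
    dst = row["dest_location"]
    dest = ".".join(str(dst.get(k) or "-") for k in ("asn", "continent", "country", "region"))
    return f'{src["project_id"]}.{src["zone"]}.{src["vm_name"]} -> {dest}'


def top_egress_routes(rows):
    """Sum bytes_sent per route, drop empty totals, sort descending."""
    totals = defaultdict(int)
    for row in rows:
        totals[route_of(row)] += int(row["bytes_sent"])  # counters are strings
    kept = [(route, total) for route, total in totals.items() if total > 0]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def vm(name):
    # Hypothetical helper that fills in a source instance for the sample rows.
    return {"project_id": "my_project_id", "zone": "us-central1-a", "vm_name": name}


# Hypothetical sample rows, for illustration only.
rows = [
    {"src_instance": vm("vm-a"), "dest_location": {"continent": "Europe", "country": "deu"}, "bytes_sent": "100"},
    {"src_instance": vm("vm-a"), "dest_location": {"continent": "Europe", "country": "deu"}, "bytes_sent": "50"},
    {"src_instance": vm("vm-b"), "dest_location": {"asn": 15169, "continent": "America", "country": "usa"}, "bytes_sent": "400"},
    {"src_instance": vm("vm-c"), "dest_location": {"continent": "America", "country": "usa"}, "bytes_sent": "0"},
]

for route, total in top_egress_routes(rows):
    print(route, total)
```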

Costimize Network Cost Intelligence

Costimize is a Google Cloud cost-optimization and governance platform. It provides tools to actively monitor, analyze and optimize your cloud.
All of the above methods and techniques are implemented in Costimize’s Network Cost Intelligence tools, which, combined with the other governance instruments, give you a complete toolbox for network investigation and optimization.
Costimize helps you automate your network intelligence processes in three simple steps:
- its Automation module allows you to turn VPC flow logs on or off
- the AI anomaly detection engine monitors your network spend in real time and notifies you about strange activity
- the Network Cost Intelligence tools visualize your network spend and topography and let you dive deep into them

Costimize provides you with the convenient option to see the flow logs status for all your subnetworks.

You can enable them permanently or temporarily.
Temporary enabling lets you collect sample data while ensuring that the flow logs are disabled again after a certain amount of time.
On top of that, you can automate the process using Costimize’s Automation module and create a schedule, for example enabling flow logs for one day every Friday.

Costimize’s AI anomaly detection engine works by constantly analyzing your activity and cloud spend. It works for all GCP services, and it proves especially effective for VPC networks, whose traffic is unpredictable by nature. It can detect anomalies in your network traffic and spend and report them to you in real time.

Using these insights, you can jump into the Network Cost Intelligence tool and visualize your traffic and spend.
You can easily see and filter VM to VM and VM to external location traffic along with its cost. You can quickly determine the top spenders and identify network breaches or unwanted traffic to external locations caused by viruses, miners, or unwanted applications running inside your cloud.
