Using a private network in Google Cloud VPC

This post is part of a series covering security on Google Cloud for data engineers.

Why use a private network?

It is standard practice not to give public IP addresses to servers that don’t need to be exposed to the public internet. At all of the large customers I work with, enterprise security policies require the use of private networks. This is part of a defense-in-depth policy and is absolutely necessary when sensitive data is involved. Private networks also enjoy better performance when accessing Google Cloud Storage and other Google Cloud services, whereas traffic over a public IP address can incur unnecessary egress charges.

There is a per-region external IP address quota that defaults to 8. While it’s possible to request an increase, IPv4 addresses are a finite resource, so any serious deployment with a large number of instances will use private addresses only.

How to use private networking

Google Compute Engine, Cloud Dataproc and Cloud Dataflow all offer command-line flags to disable public addresses. You always need to specify a network and usually a subnet. Cloud SQL instances run in a project managed by Google, and private networking support is implemented by peering the Google-managed network with your own. In the future, when other services need private addresses, they will use the same peered network.
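For Cloud SQL, the peering described above is set up through what Google calls private services access. The following is a minimal sketch, assuming a custom network already exists in your project. The function name, the range name google-managed-services and the /16 prefix length are illustrative assumptions, not values from this post.

```shell
# Hypothetical helper: peer a VPC network with the Google-managed network
# that hosts services such as Cloud SQL (private services access).
peer_google_services() {
  local project="$1" network="$2"
  # Reserve an internal range for Google-managed services.
  # The range name and the /16 size are assumptions for illustration.
  gcloud --project="$project" compute addresses create google-managed-services \
    --global --purpose=VPC_PEERING --prefix-length=16 --network="$network"
  # Create the peering connection that uses the reserved range.
  gcloud --project="$project" services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=google-managed-services --network="$network"
}
```

Calling peer_google_services myproject mynetwork would run both steps against a network named mynetwork. The Service Networking API has to be enabled in the project first.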

Cloud Bigtable, Google Cloud Storage and Google BigQuery are all cloud-scale services served from many IP addresses. Google’s software-defined networking stack automatically routes traffic to these services if you’ve enabled Private Google Access on your subnet. This works even when there is no route to the public internet.

Replacing the default network

Google Cloud projects come with a default network, which is typically not used in production. The default network comes with default subnet IP ranges, so the default networks of two projects can never be peered together: their subnets overlap.

The first thing I do after creating a new project is delete the default network and create a new network with private access to Google services configured.
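Deleting the default network takes two steps, because its pre-populated firewall rules have to be removed before the network itself can go. Here is a rough sketch, assuming the project still has the four standard default-allow-* rules and nothing else attached to the network. The function name is hypothetical.

```shell
# Hypothetical helper: remove the default network from a project.
# Assumes only the four pre-populated firewall rules reference the network.
delete_default_network() {
  local project="$1"
  # Firewall rules must be deleted before the network they belong to.
  gcloud --project="$project" compute firewall-rules delete \
    default-allow-icmp default-allow-internal \
    default-allow-rdp default-allow-ssh --quiet
  # With no rules left, the auto-mode default network can be deleted.
  gcloud --project="$project" compute networks delete default --quiet
}
```

Running delete_default_network myproject would clear the way for the custom network created next. If any of the rules has already been removed, the first command will report an error and the list needs adjusting.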

gcloud --project=myproject compute networks create data --subnet-mode custom

After that, it’s necessary to create a subnet:

gcloud --project=myproject compute networks subnets create data --network data --range 10.1.0.0/16 --enable-private-ip-google-access --region us-east1

This subnet has Private Google Access enabled. It is not enabled on the default network, and without it requests to Google Cloud Storage and other Google services are routed through the default gateway and a public IP address, which incurs egress charges. The subnet has 65532 usable IP addresses, which should be plenty to support several large Dataproc clusters.
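The 65532 figure follows from the prefix length: a /16 holds 2^16 = 65536 addresses, and Google Cloud reserves four addresses in each subnet’s primary range (network, default gateway, second-to-last and broadcast). A small sketch of the arithmetic, with a hypothetical function name:

```shell
# Usable addresses in a subnet's primary IPv4 range, given its prefix length.
# Google Cloud reserves four addresses per primary range.
usable_addresses() {
  local prefix="$1"
  echo $(( (1 << (32 - prefix)) - 4 ))
}

usable_addresses 16   # prints 65532
```

The same function gives 252 for a /24, which is a quick way to check whether a smaller range would still fit your clusters.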

Organization policy preventing public IP addresses

Most organizations apply an organization policy, typically the compute.vmExternalIpAccess constraint, that prevents users from assigning public IP addresses to VM instances.
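If you administer the organization yourself, a policy of this kind can be applied with the Resource Manager commands in gcloud. This is only a sketch: it assumes you hold the Organization Policy Administrator role, the function name is hypothetical, and denying all values is the strictest option rather than the one every organization picks.

```shell
# Hypothetical helper: deny external IP addresses on all VM instances
# in an organization by setting the compute.vmExternalIpAccess constraint.
deny_external_ips() {
  local org_id="$1"
  local policy_file
  policy_file=$(mktemp)
  # A list policy that denies every value means no instance may have an external IP.
  cat > "$policy_file" <<'EOF'
constraint: constraints/compute.vmExternalIpAccess
listPolicy:
  allValues: DENY
EOF
  gcloud resource-manager org-policies set-policy "$policy_file" \
    --organization="$org_id"
  rm -f "$policy_file"
}
```

The organization ID is passed as the only argument, for example deny_external_ips 123456789012, where the number is a placeholder.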

Launching GCE VM Instances with the private network

Each time you launch a GCE instance or Dataproc cluster, you’ll need to specify the correct network and subnet. Deleting the default network makes this a lot easier.
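As a sketch of what that looks like against the network and subnet created earlier, the two helpers below launch an instance and a Dataproc cluster without public addresses. The function names and the us-east1-b zone are assumptions for illustration. The network and subnet are both named data, as in the commands above.

```shell
# Hypothetical helper: launch a GCE instance with no external IP
# on the "data" network and subnet.
launch_private_instance() {
  local name="$1"
  gcloud --project=myproject compute instances create "$name" \
    --zone us-east1-b --network data --subnet data --no-address
}

# Hypothetical helper: launch a Dataproc cluster whose nodes have
# internal addresses only. Dataproc takes the subnet, not the network.
launch_private_cluster() {
  local name="$1"
  gcloud --project=myproject dataproc clusters create "$name" \
    --region us-east1 --subnet data --no-address
}
```

For example, launch_private_instance worker-1 followed by launch_private_cluster etl-cluster, where both names are placeholders.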

Large organizations always use Shared VPC

If you’re in a large organization, it’s virtually guaranteed that a network admin team has created what’s called a Shared VPC, meaning there is a GCP project that contains the network you’ll need to use when launching anything. The VPN connecting your corporate network to Google Cloud is only connected to the Shared VPC network, so you won’t be able to reach any instances in other networks. This is yet another reason to delete the default network.
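With a Shared VPC, the subnet lives in the host project, so it has to be referenced by its full resource path rather than by name alone. A minimal sketch, assuming your project is already attached to the host project and you have been granted use of the subnet. All project, subnet and function names here are hypothetical.

```shell
# Build the fully qualified path of a subnet that lives in a Shared VPC host project.
shared_subnet_path() {
  local host_project="$1" region="$2" subnet="$3"
  echo "projects/${host_project}/regions/${region}/subnetworks/${subnet}"
}

# Hypothetical helper: launch an instance in a service project on a shared subnet.
launch_on_shared_vpc() {
  local service_project="$1" name="$2" subnet_path="$3"
  gcloud --project="$service_project" compute instances create "$name" \
    --zone us-east1-b --subnet "$subnet_path" --no-address
}
```

A call might look like launch_on_shared_vpc myproject worker-1 "$(shared_subnet_path host-project us-east1 shared-subnet)".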

Other Posts in this Series

  1. Creating an SSH Bastion host in Google Cloud VPC
  2. Options for managing SSH Access on Google Compute Engine
  3. Using Cloud Dataproc with private networking
  4. Using a web portal or Google Cloud SDK wrapper library to launch Cloud Dataproc clusters and submit jobs
  5. Enabling Two-Factor Authentication for SSH on Google Compute Engine
  6. Enabling Kerberos on a Google Cloud Dataproc Cluster