Troubleshooting Cloud SQL/AlloyDB Creation Error: “Failed to create a subnetwork”
The Issue
This post addresses an error encountered when creating a Cloud SQL or AlloyDB instance, which is caused by the exhaustion of the allocated address space for Private Service Access (PSA).
The error message for Cloud SQL is:
Failed to create a subnetwork. Couldn’t find free blocks in allocated IP ranges. Please allocate new ranges for this service provider.
You encounter this error when creating a Cloud SQL instance with a private IP in a VPC network (shared or otherwise) that uses Private Service Access. There are a few common scenarios to consider:
The size of the allocated IP range for the private connection is smaller than /24
Cloud SQL needs to allocate a /24 subnet on the private connection, but a private connection accepts an allocated range as small as /30. The allocated CIDR range must therefore be at least /24, even though a range as small as /30 is permitted.
Creating instances in different regions on the same private connection when the PSA allocation is only /24
Each region requires its own subnet on the private connection of size /24.
For example:
- You configured PSA with IP allocation size /24.
- You then create an instance in region A, which succeeds.
- Next, you create its replica or another instance in region B. Creation in the new region fails.
- This is because each region reserves a /24 CIDR range, and your PSA allocation is also only /24, so there is no space left to create a subnet for another region.
So, if you want to create instances in different regions, make sure the IP allocation for Cloud SQL/AlloyDB PSA is at least /23.
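The sizing rule above is plain CIDR arithmetic, and a minimal sketch makes it easy to check for any prefix length. The helper name `psa_blocks_of_24` is hypothetical (it is not a gcloud feature), and the sketch assumes one /24 subnet per region, which is the minimum; a busy region may need more than one.

```shell
# Hypothetical helper: how many /24 blocks fit in an allocation of the
# given prefix length. Assumes one /24 subnet per region (the minimum).
psa_blocks_of_24() {
  if [ "$1" -gt 24 ]; then
    echo 0   # smaller than /24: no room for even one regional subnet
  else
    echo $(( 1 << (24 - $1) ))
  fi
}

for prefix in 30 24 23 20; do
  echo "/$prefix allocation: $(psa_blocks_of_24 "$prefix") regional /24 block(s)"
done
```

A /24 yields exactly one block, which is why the second region in the example fails, while a /23 yields two.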
To preserve internal flexibility, Cloud SQL currently doesn’t allow IP usage to exceed 60% in any subnet. As a result, Cloud SQL automatically requests a new subnet once that threshold is reached in all existing usable subnets, and that request needs free space in the allocation.
You can fix the scenarios discussed above in either of these ways:
- You can add an IP address range allocation to the service connection.
- You can expand an existing IP address range allocation.
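As a rough sketch of the first option, the function below reserves an additional range and attaches it to the existing service connection. The function name `add_psa_range` and all argument values are placeholders for illustration; the two gcloud commands are the standard ones for reserving a `VPC_PEERING` range and updating a private connection. Note that `--ranges` must list the existing allocation as well as the new one.

```shell
# Hypothetical wrapper; arguments are placeholders for your own values.
# usage: add_psa_range NEW_RANGE_NAME PREFIX_LENGTH EXISTING_RANGE_NAME NETWORK PROJECT
add_psa_range() {
  local new_range="$1" prefix_len="$2" existing_range="$3" network="$4" project="$5"

  # Reserve a new allocated range; Google picks the addresses automatically.
  gcloud compute addresses create "$new_range" \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length="$prefix_len" \
    --network="projects/${project}/global/networks/${network}" \
    --project="$project" &&
  # Attach it to the private connection, keeping the existing range listed.
  gcloud services vpc-peerings update \
    --service=servicenetworking.googleapis.com \
    --ranges="${existing_range},${new_range}" \
    --network="$network" \
    --project="$project"
}
```

For example, `add_psa_range psa-extra 20 google-managed-services-my-vpc my-vpc my-project` would add a /20 alongside an existing allocation with that name.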
Recommendations
GCP recommends a /20 allocation as the starting point (default value)
- It provides more than enough resources to cover different combinations of instance types and regions.
- You need a separate /24 block for each region where you create instances. For example, if you have one instance in us-west2 and another in us-central1, you’ll need two /24 blocks within your private service allocation.
- Upon instance creation, if you opt for automatic IP range allocation, Google will typically provision a /20 block from the 10.0.0.0/8 address space. This doesn’t apply to AlloyDB.
- Ensure the allocated PSA IP range does not conflict with any existing subnets in your Cloud SQL VPC network.
The concepts discussed in this post are equally applicable to AlloyDB, which also utilizes Private Service Access. However, the error message for AlloyDB differs in this scenario. If AlloyDB creation fails due to this issue, you will encounter a less informative message stating “an internal error has occurred.”
AlloyDB Caveat
As of August 2024, AlloyDB allocates at most one /24 subnet per region from a single PSA IP allocation, regardless of the total size of that allocation. This means that AlloyDB does not currently create additional subnets within a region automatically, even if there’s enough space to do so.
Therefore:
- If you need AlloyDB in just one region, allocate a /24 PSA range.
- If you need AlloyDB in two regions (e.g., one primary and one secondary), allocate a /23 PSA range.
- And so on: for each additional region you need AlloyDB in, grow the allocated PSA range so that it contains one more /24 block.
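Because CIDR ranges only come in power-of-two sizes, a single contiguous allocation for three regions has to be a /22, not “a /23 plus a /24”. The sketch below computes the smallest single range for a given number of regions. The helper name `min_psa_prefix` is hypothetical, and the one-/24-per-region assumption comes from the AlloyDB behavior described above.

```shell
# Hypothetical helper: smallest single CIDR prefix that contains at least
# one /24 block per region (assumes exactly one /24 per region).
min_psa_prefix() {
  local regions="$1" prefix=24 blocks=1
  while [ "$blocks" -lt "$regions" ]; do
    blocks=$(( blocks * 2 ))   # each step left doubles the number of /24 blocks
    prefix=$(( prefix - 1 ))
  done
  echo "$prefix"
}

for regions in 1 2 3 4 5; do
  echo "$regions region(s): allocate at least a /$(min_psa_prefix "$regions")"
done
```

Alternatively, several smaller allocations can be attached to the same service connection instead of one larger range.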
Ensure that IP address range allocations for PSA don’t overlap with dynamic routes
In addition to the usual causes discussed above, there’s a less common scenario that can also result in this error message. I recently helped a customer who was facing this error:
Failed to create a subnetwork. Couldn’t find free blocks in allocated IP ranges. Please allocate new ranges for this service provider.
As an initial investigation, I wanted to determine if this was one of the previously discussed scenarios.
- Upon checking the PSA (Private Service Access) IP allocation, I found it to be 172.16.0.0/16. This ruled out the first two cases, as this allocation is larger than a /23 block.
- The customer had approximately 60 Highly Available (HA) instances across three different regions. This implies three separate /24 subnets were already reserved for these regions.
- Based on a public document, each HA instance can use up to 5 IPs. Therefore, the maximum number of IPs they could be using is around 300, a tiny fraction of the 65,536 addresses in a /16, so address exhaustion was not the cause.
I then found a public document about allocating IP address ranges. It emphasizes that these allocations must not overlap with dynamic routes, because Google Cloud doesn’t automatically check for such conflicts. Dynamic routes are routes learned through BGP by Cloud Router.
This issue typically arises due to improper IP and network management.
Google Cloud doesn’t check whether an IP allocation overlaps with dynamic routes (routes learned through BGP, such as through Cloud Router). For example, if your IP allocation is 172.16.0.0/16 and a dynamic route learned through BGP has the destination range 172.16.0.0/16, the set intersection (∩) of the two ranges is not usable and can’t be assigned.
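Since Google Cloud does not perform this check, it can help to do it yourself before reserving a range. The following is a minimal sketch of an overlap test between two CIDR ranges; the function names `ip_to_int` and `cidr_overlap` and the sample routes are illustrative assumptions, not part of any Google tool. Two CIDR blocks overlap exactly when they agree on the bits of the shorter prefix.

```shell
# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
  local IFS=.
  set -- $1   # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeeds (exit status 0) when the two CIDR ranges overlap.
cidr_overlap() {
  local net1="${1%/*}" len1="${1#*/}" net2="${2%/*}" len2="${2#*/}"
  local min a b mask
  if [ "$len1" -lt "$len2" ]; then min="$len1"; else min="$len2"; fi
  # Mask covering only the bits of the shorter prefix.
  mask=$(( (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$net1")
  b=$(ip_to_int "$net2")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

allocation="172.16.0.0/16"
for route in 172.16.0.0/16 172.16.8.0/24 10.0.0.0/8; do
  if cidr_overlap "$allocation" "$route"; then
    echo "$route overlaps $allocation"
  else
    echo "$route does not overlap $allocation"
  fi
done
```

The destination ranges to test against are the routes your Cloud Router has learned, which `gcloud compute routers get-status` reports for a given router and region.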
This concept can be confusing at first, so let’s walk through a specific scenario that leads to this issue.
You may skip to the “Solution” section if you already understand the issue and its cause.
Real World Example
Keep in mind that there are numerous possible situations that could result in a similar outcome, and this is just one example.
Use this strictly for learning purposes.
- We will connect two Google Cloud VPC networks in two different projects, A and B.
- One project and VPC network (A) will be used for deploying Cloud SQL.
- The other project and VPC network (B) is where we will create a static route.
- We will create a site-to-site VPN (with BGP) to connect VPC networks A and B.
- Then we will create a static route in the client project’s VPC network (B) with the same destination CIDR range as the PSA IP allocation range in the Cloud SQL VPC network (A). In this example the route serves no purpose other than to reproduce the issue.
- Then we try to create a Cloud SQL instance with a private IP and observe the error “Failed to create a subnetwork. Couldn’t find free blocks in allocated IP ranges. Please allocate new ranges for this service provider.”
Network B could also be an on-premises network or a network in another public cloud.
Assumptions
- You have created two GCP projects
- You have created two VPC networks, one in each GCP project
- You have privileges to create VPN configuration on both projects
- You have privileges to create Cloud SQL and PSA configuration on Cloud SQL project
Read the Inputs
read -p "region : " REGION
read -p "projectid_csql : " CSQL_USER_PROJECT
read -p "vpcnet_csql : " CSQL_USER_VPC
read -p "projectid_other : " OTHER_USER_PROJECT
read -p "vpcnet_other : " OTHER_USER_VPC
read -p "SECRET : " SECRET
# PASSWORD is used later as the Cloud SQL root password; -s hides the typed value
read -s -p "PASSWORD : " PASSWORD
REGION: region for the VPN gateways
CSQL_USER_PROJECT: Cloud SQL project ID
CSQL_USER_VPC: Cloud SQL VPC network
OTHER_USER_PROJECT: other project ID
OTHER_USER_VPC: other VPC network
SECRET: shared secret used for the VPN tunnels
PASSWORD: root password for the Cloud SQL instance
Create the HA VPN gateways
- Place one HA VPN gateway in each VPC network.
- Place both HA VPN gateways in the same Google Cloud region.
### Create VPN Gateway
gcloud compute vpn-gateways create gw-${CSQL_USER_PROJECT} \
--network=${CSQL_USER_VPC} \
--region=${REGION} \
--stack-type=IPV4_ONLY --project ${CSQL_USER_PROJECT}
gcloud compute vpn-gateways create gw-${OTHER_USER_PROJECT} \
--network=${OTHER_USER_VPC} \
--region=${REGION} \
--stack-type=IPV4_ONLY --project ${OTHER_USER_PROJECT}
- Since we are connecting two VPC networks together by using a single tunnel between HA VPN gateways, this type of configuration is not considered to have high availability and does not meet the HA SLA of 99.99% availability.
Create Cloud routers
- Create a Cloud Router in each VPC network in REGION.
- Choose any private ASN (64512 through 65534, 4200000000 through 4294967294) that you are not already using.
gcloud compute routers create router-${CSQL_USER_PROJECT} \
--region=${REGION} \
--network=${CSQL_USER_VPC} \
--advertisement-mode=CUSTOM --set-advertisement-groups=ALL_SUBNETS \
--asn=64512 --project ${CSQL_USER_PROJECT}
gcloud compute routers create router-${OTHER_USER_PROJECT} \
--region=${REGION} \
--network=${OTHER_USER_VPC} \
--advertisement-mode=CUSTOM --set-advertisement-groups=ALL_SUBNETS \
--asn=64513 --project ${OTHER_USER_PROJECT}
Create VPN Tunnels
- Configure a tunnel on each gateway. This example uses only interface 0 of each gateway.
- Match the gateway interfaces so that the tunnel on interface 0 of the first gateway connects to interface 0 on the second gateway.
gcloud compute vpn-tunnels create tunnel-${CSQL_USER_PROJECT} \
--peer-gcp-gateway projects/${OTHER_USER_PROJECT}/regions/${REGION}/vpnGateways/gw-${OTHER_USER_PROJECT} \
--region=${REGION} \
--ike-version=2 \
--shared-secret=${SECRET} \
--router=router-${CSQL_USER_PROJECT} \
--vpn-gateway=gw-${CSQL_USER_PROJECT} \
--interface=0 \
--project ${CSQL_USER_PROJECT}
gcloud compute vpn-tunnels create tunnel-${OTHER_USER_PROJECT} \
--peer-gcp-gateway projects/${CSQL_USER_PROJECT}/regions/${REGION}/vpnGateways/gw-${CSQL_USER_PROJECT} \
--region=${REGION} \
--ike-version=2 \
--shared-secret=${SECRET} \
--router=router-${OTHER_USER_PROJECT} \
--vpn-gateway=gw-${OTHER_USER_PROJECT} \
--interface=0 --project ${OTHER_USER_PROJECT}
Create BGP sessions
- For each HA VPN tunnel, create an IPv4 BGP session
- Configure Cloud Router interfaces and BGP peers
gcloud compute routers add-interface router-${CSQL_USER_PROJECT} \
--interface-name=r-if-${CSQL_USER_PROJECT} \
--ip-address=169.254.0.1 \
--mask-length=30 \
--vpn-tunnel=tunnel-${CSQL_USER_PROJECT} \
--region=${REGION} --project ${CSQL_USER_PROJECT}
gcloud compute routers add-bgp-peer router-${CSQL_USER_PROJECT} \
--peer-name=bgp-if-${CSQL_USER_PROJECT} \
--interface=r-if-${CSQL_USER_PROJECT} \
--peer-ip-address=169.254.0.2 \
--peer-asn=64513 \
--region=${REGION} --project ${CSQL_USER_PROJECT}
gcloud compute routers add-interface router-${OTHER_USER_PROJECT} \
--interface-name=r-if-${OTHER_USER_PROJECT} \
--ip-address=169.254.0.2 \
--mask-length=30 \
--vpn-tunnel=tunnel-${OTHER_USER_PROJECT} \
--region=${REGION} --project ${OTHER_USER_PROJECT}
gcloud compute routers add-bgp-peer router-${OTHER_USER_PROJECT} \
--peer-name=bgp-if-${OTHER_USER_PROJECT} \
--interface=r-if-${OTHER_USER_PROJECT} \
--peer-ip-address=169.254.0.1 \
--peer-asn=64512 \
--region=${REGION} --project ${OTHER_USER_PROJECT}
Set advertised route priority
gcloud compute routers update-bgp-peer router-${CSQL_USER_PROJECT} \
--peer-name=bgp-if-${CSQL_USER_PROJECT} \
--advertised-route-priority=0 \
--project ${CSQL_USER_PROJECT} --region $REGION
gcloud compute routers update-bgp-peer router-${OTHER_USER_PROJECT} \
--peer-name=bgp-if-${OTHER_USER_PROJECT} \
--advertised-route-priority=0 \
--project ${OTHER_USER_PROJECT} --region $REGION
Add a Static route in other VPC network and advertise this route over BGP
- Create a Static route for VPC network B such that the destination CIDR is 172.16.0.0/16
- Advertise this CIDR range for the BGP session
gcloud compute routes create custome-route-${OTHER_USER_PROJECT}1 \
--destination-range=172.16.0.0/16 --network=${OTHER_USER_VPC} \
--next-hop-gateway=default-internet-gateway --project ${OTHER_USER_PROJECT}
gcloud compute routers update router-${OTHER_USER_PROJECT} \
--add-advertisement-ranges=172.16.0.0/16=custom2 \
--project ${OTHER_USER_PROJECT} --region $REGION
Configure private service access
- Configure Private Service Access for the Cloud SQL project
- Choose 172.16.0.0/16 as the IP allocation range
gcloud compute addresses create google-managed-services-${CSQL_USER_VPC} \
--global \
--purpose=VPC_PEERING \
--addresses=172.16.0.0 \
--prefix-length=16 \
--network=projects/${CSQL_USER_PROJECT}/global/networks/${CSQL_USER_VPC} \
--project=${CSQL_USER_PROJECT}
gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--ranges=google-managed-services-${CSQL_USER_VPC} \
--network=${CSQL_USER_VPC} \
--project=${CSQL_USER_PROJECT}
Create Cloud SQL
- Run the following gcloud command to create a Cloud SQL instance with a private IP
gcloud sql instances create test-csql-instance-$(date +%d%m%Y) \
--project ${CSQL_USER_PROJECT} --network=$CSQL_USER_VPC \
--database-version=POSTGRES_16 --cpu=2 --memory=6GiB \
--zone=${REGION}-a --root-password=$PASSWORD \
--availability-type=ZONAL \
--database-flags=cloudsql.enable_auto_explain=on,max_prepared_transactions=3000,cloudsql.enable_pg_hint_plan=on,cloudsql.enable_pg_wait_sampling=on,cloudsql.pg_authid_select_role=postgres,pg_stat_statements.track=all,effective_io_concurrency=200,random_page_cost=1.1,autovacuum_naptime=1,autovacuum_vacuum_cost_delay=0,autovacuum_vacuum_cost_limit=1000,log_statement=ddl,track_io_timing=on,log_lock_waits=on,cloudsql.iam_authentication=on \
--enable-point-in-time-recovery --no-deletion-protection \
--edition=enterprise --storage-size=50GiB --no-assign-ip
- It will fail with “Failed to create a subnetwork. Couldn’t find free blocks in allocated IP ranges. Please allocate new ranges for this service provider.”
Solution
We can solve this issue in the following ways:
- Delete the route from the source VPC network (B) if it is unnecessary.
- If the route is required on the other VPC network (B) and cannot be deleted, make sure it is removed from the BGP session. In this case you can use the command below.
gcloud compute routers update router-${OTHER_USER_PROJECT} \
--remove-advertisement-ranges 172.16.0.0/16 \
--project ${OTHER_USER_PROJECT} --region $REGION
After that, delete the failed instance and create the Cloud SQL instance again.
- While adding a new IP address range allocation to the service connection or expanding an existing allocation are also possible solutions, they may lead to inefficient use of IP space.
Important Note
Consider a scenario where you’ve already set up a PSA allocation and have a Cloud SQL instance running in your Cloud SQL VPC network. If you then create the problematic route in VPC network B, would you encounter the same error when adding a new Cloud SQL instance?
Not necessarily. If the new Cloud SQL instance is in a region where you already have a Cloud SQL instance, the existing PSA peering in the Cloud SQL VPC would have a corresponding route. This route would override/suppress any conflicting dynamic routes, preventing the issue.
However, if you try to create a Cloud SQL instance in a new region, it would fail. This is because the new subnet needed for the instance can’t be created if there’s a dynamic route that overlaps with the IP allocation for PSA.
If you are not able to resolve a Cloud SQL or AlloyDB creation issue with the solutions discussed here, please create a support case with Google Cloud Support for further investigation.