Network Visibility: Understanding ‘GCP to Internet’ Latency

Gauravmadan
Google Cloud - Community
Jun 29, 2022

One of the most crucial aspects of running and maintaining a service in a cloud environment is the network. Especially now, when many companies' revenue depends more than ever on reliable networks, network latency has become a common talking point. In simple terms, latency is the time it takes for data to travel from one point on a network to another. Most often, latency is measured between a user's device (the “client” device) and a hosting server (typically in a data center or public cloud). This measurement helps application developers understand how quickly a web page or application will load for users. Technically, the latency between point A and point B is determined by the propagation time of the light signal over the fiber link plus additional delays introduced by communication protocols, routing, encryption, data transformation, and so on.
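
To make the propagation component concrete, here is a minimal sketch that estimates the lower bound on RTT imposed by light travelling through fiber alone. It assumes a refractive index of about 1.47 for the fiber and a hypothetical straight 1,000 km fiber run; neither value comes from the dashboard. Real paths are longer than the straight-line distance and add the protocol, routing, and processing delays listed above, so measured RTT is always higher than this floor.

```python
# Minimal sketch: lower bound on latency from fiber propagation alone.
# Assumptions: refractive index ~1.47 (typical for silica fiber); the
# 1,000 km distance below is a hypothetical example, not a measured path.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47


def one_way_propagation_ms(fiber_km: float, n: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Time for light to traverse `fiber_km` of fiber, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_PER_S / n  # roughly 204,000 km/s
    return fiber_km / speed_in_fiber * 1000.0


def min_rtt_ms(fiber_km: float) -> float:
    """Round trip: there and back over the same fiber length."""
    return 2 * one_way_propagation_ms(fiber_km)


if __name__ == "__main__":
    # Hypothetical 1,000 km run: ~4.9 ms one way, ~9.8 ms round trip.
    print(f"{min_rtt_ms(1000):.1f} ms")
```

Anything a dashboard reports above this floor comes from longer real-world paths, queuing, and the protocol and processing overheads mentioned above.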

If you are wondering how Internet latency can affect your users' experience when accessing applications, if you are a Network Operations engineer looking for a tool or dashboard that reports Internet latency numbers in the cloud, or if you are an application developer unsure where to host an application so that users get the best experience over the Internet, then this blog is a good read for you.

When it comes to public clouds like GCP, a workload can be accessed either over a private network from on-premises (using Interconnect) or over the Internet. Latency from a source location to a workload over the open Internet is a constant topic of discussion because the Internet behaves differently across geographies. Google Cloud Platform has recently introduced an excellent way to report ‘Google Cloud to Internet endpoint’ round-trip time (RTT) for workloads hosted in GCP. This feature sits under Network Intelligence Center → Performance Dashboard and is explained in this blog. Public documentation can be found at this link. Let's take a deep dive. If you missed my last blog on “GCP Performance Dashboard”, please feel free to have a look at this link.

The Google Cloud to Internet endpoint RTT dashboard is located under Network Intelligence Center → Performance Dashboard → Latency. The steps to browse to it are shown below:

Image 1: Browse to ‘Google Cloud to Internet endpoint RTT’ on the GCP dashboard

There are two variations of the ‘Internet to Google Cloud’ latency numbers:

  1. Statistics specific to the customer's project
  2. Statistics for all of Google Cloud

Both dashboards (project-specific and all of Google Cloud) allow users to select:

  • Up to 5 regions where customer workloads are located
  • Network tier: Standard or Premium
  • Duration of statistics (up to 6 weeks of history can be selected)

In the project-specific view, the Regions drop-down menu shows only the regions where the customer has active workloads deployed. For example:

Image 2: GCP locations of customer workloads

Now let's say I need to see the Internet latency from the United States and India to my workloads hosted in asia-east1. I would take the following steps:

Image 3: Filtering GCP to Internet RTT for USA
Image 4: Filtering GCP to Internet RTT for USA

Let's take a deeper look at the data shown for Country = India.

Image 5: Details of GCP to Internet RTT for India

The above snapshot shows the following:

  1. In the last 7 days, the workload in asia-east1 has received active TCP user traffic from India. If this were not the case, the customer would not see any row with Country = India.
  2. The workload is accessed from 8 cities in India.

Let's look at the detailed chart shown under “View chart”:

Image 6: RTT between GCP asia-east1 and India over the Internet

The chart above shows the latency over the last 30 days. We can see that there were a few periods when the workload received no user traffic (and hence no latency statistics were recorded). We can also see that, if we exclude the exceptional spikes, the approximate latency is ~100 ms. However, an exceptionally high round-trip time was reported on one particular day.
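
Reading “~100 ms once the spikes are removed” off a chart is essentially taking a robust central value. A minimal sketch of that idea, assuming you have noted down daily RTT values yourself; the sample numbers below are made up for illustration and are not taken from the dashboard. Days with no user traffic are represented as `None` and skipped, mirroring the gaps in the chart. The median barely moves because of one extreme day, while the mean is pulled up noticeably.

```python
from statistics import mean, median
from typing import Optional, Sequence


def summarize_rtt(daily_rtt_ms: Sequence[Optional[float]]) -> dict:
    """Summarize daily RTT values, skipping days with no recorded traffic."""
    observed = [v for v in daily_rtt_ms if v is not None]
    if not observed:
        return {"days_with_data": 0, "mean_ms": None, "median_ms": None}
    return {
        "days_with_data": len(observed),
        "mean_ms": mean(observed),      # sensitive to the one-day spike
        "median_ms": median(observed),  # close to the "typical" latency
    }


# Hypothetical readings: one spike (450 ms) and one day without user traffic.
week = [98, 101, None, 99, 103, 100, 450, 97]
print(summarize_rtt(week))
```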

As an application owner, if I need to find out which geographies have accessed my workload in the last few hours and what the approximate round-trip time was, I can quickly use this dashboard as follows:

Image 7: Countries that accessed the application in europe-west1

Methodology followed

For readers wondering about the methodology GCP uses to report these statistics, the following points should answer your questions:

  • GCP does not actively probe external latency; instead, it monitors real customer traffic (passive sampling). Therefore, the measured locations around the globe aren't fixed and always depend on existing connections. For example, if an application hosted in asia-south2 is accessed from 10 locations across the globe, the statistics will report latency numbers from all 10 of these locations to the asia-south2 region of GCP.
  • Since the reported numbers come from real user traffic, the number of rows in the ‘Internet to GCP’ latency view for a selected destination GCP region reflects the number of source countries accessing your application.
  • For Internet to GCP latency, GCP passively monitors TCP traffic; that is, a measurement requires TCP traffic between the two endpoints. Hence, if a tester sends a lot of ping packets to a web server in GCP, those flows will not be reported in the ‘Internet to GCP’ latency statistics.
  • Since RTT measurements are based on TCP traffic, a client has to send a reasonable number of TCP packets per minute for latency to be reported. Sending one or two TCP packets won't populate the list.
  • The numbers reported are round-trip times (RTT) from VM → Internet → VM.
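
The points above can be summarized as a toy model of passive sampling: only TCP flows from real clients count, flows with too few packets are ignored, and each source country that remains becomes one row. This is a minimal sketch under stated assumptions, not Google's implementation. The `FlowSample` record, the `MIN_TCP_PACKETS_PER_MINUTE` threshold, the choice of the median as the per-country summary, and the sample values are all hypothetical, since the actual threshold and aggregation are not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

# Hypothetical threshold: the real minimum packet rate is not published here.
MIN_TCP_PACKETS_PER_MINUTE = 10


@dataclass
class FlowSample:
    country: str
    protocol: str             # e.g. "TCP" or "ICMP"
    packets_per_minute: int
    rtt_ms: float             # VM -> Internet -> VM round trip


def rtt_rows(samples: list[FlowSample]) -> dict[str, float]:
    """Return one row per source country: the median RTT of qualifying TCP flows."""
    by_country: dict[str, list[float]] = defaultdict(list)
    for s in samples:
        if s.protocol != "TCP":
            continue  # ping (ICMP) traffic is not measured
        if s.packets_per_minute < MIN_TCP_PACKETS_PER_MINUTE:
            continue  # one or two packets are not enough for a measurement
        by_country[s.country].append(s.rtt_ms)
    return {country: median(rtts) for country, rtts in by_country.items()}


samples = [
    FlowSample("India", "TCP", 120, 101.0),
    FlowSample("India", "TCP", 80, 97.0),
    FlowSample("United States", "TCP", 2, 180.0),  # too few packets
    FlowSample("Germany", "ICMP", 500, 250.0),     # not TCP
]
rows = rtt_rows(samples)
print(rows)  # only India qualifies; len(rows) is the number of source countries
```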

Let's look at the main customer use cases for the Internet latency statistics reported by the GCP Performance Dashboard.

Use Case #1 (Day 0): Choose between Standard and Premium networking for workloads

Most of you might be aware that GCP offers customers the flexibility to choose between the Premium networking tier and the Standard networking tier. Network Service Tiers let you optimize connectivity between systems on the Internet and your Google Cloud instances. Premium Tier delivers traffic over Google's premium backbone, while Standard Tier uses regular ISP networks.

The network tier can be set as the default for a GCP project, or it can be applied at the resource level (for example, to a VM or a load balancer). The network tier that you specify for a resource always takes precedence over the default network tier that you define for your project. For example, if your project's default network tier is Premium, you can still create an instance or a load balancer in Standard Tier.
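
The precedence rule can be expressed in a few lines. A minimal sketch, where `effective_network_tier` is a hypothetical helper written for illustration (not a Google Cloud API) that resolves which of the two tiers discussed here a resource ends up on:

```python
from typing import Optional

VALID_TIERS = {"PREMIUM", "STANDARD"}


def effective_network_tier(resource_tier: Optional[str], project_default: str) -> str:
    """A tier set on the resource wins; otherwise the project default applies."""
    tier = resource_tier if resource_tier is not None else project_default
    tier = tier.upper()
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown network tier: {tier}")
    return tier


# Project default is Premium, but this VM was explicitly created in Standard Tier.
print(effective_network_tier("STANDARD", project_default="PREMIUM"))  # STANDARD
# No tier specified on the resource: it inherits the project default.
print(effective_network_tier(None, project_default="PREMIUM"))        # PREMIUM
```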

More details about GCP Network Tiers can be found at https://cloud.google.com/network-tiers/docs/overview

Figure 8: GCP Standard routing path
Figure 9: GCP Premium routing path

One factor that helps in deciding between Premium and Standard is how much better the response (in terms of latency) will be if a user spends extra on Premium Tier. Let's see how the Performance Dashboard's Internet to Google Cloud latency numbers help with this decision.

Let's take an example. Assume my customer's web-service workloads will be hosted in europe-west2 (London). The customer is trying to balance cost against performance and hence wants latency numbers for the Premium networking tier vs. the Standard networking tier. To unfold this story, let's look at the performance numbers for “all of Google Cloud”. Execute the following steps:

  1. Select Performance Dashboard and navigate to Latency
  2. Change the scope to “All of Google Cloud”
  3. Set Traffic type = “Google Cloud to Internet”
  4. Set Regions = “europe-west2”
  5. Select Network tier = “Premium”
  6. Filter the city as “New Delhi” (example)

This shows that the latency between Delhi and London on the Premium networking tier was ~180 ms over the last 7 days.

Image 10: GCP Premium routing path statistics

Let's do the same for the Standard tier. The statistics are as follows:

Image 11: GCP Standard routing path statistics

This shows that the latency between Delhi and London on the Standard networking tier was ~200+ ms over the last 7 days.
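
A quick way to weigh the two tiers against their cost difference is to express the latency gap as both an absolute and a relative number. The sketch below plugs in the approximate New Delhi to europe-west2 readings from the two screenshots (~180 ms Premium, ~200 ms Standard); because the Standard value was “~200+ ms”, the computed gap is a lower bound. The helper name `tier_gap` is hypothetical, and other cities would be added from your own dashboard readings.

```python
def tier_gap(premium_ms: float, standard_ms: float) -> tuple[float, float]:
    """Return (absolute saving in ms, saving as a fraction of the Standard RTT)."""
    saving_ms = standard_ms - premium_ms
    return saving_ms, saving_ms / standard_ms


# Approximate 7-day readings for New Delhi <-> europe-west2 from the dashboard.
readings = {"New Delhi": {"premium": 180.0, "standard": 200.0}}

for city, r in readings.items():
    saving_ms, fraction = tier_gap(r["premium"], r["standard"])
    print(f"{city}: Premium saves ~{saving_ms:.0f} ms ({fraction:.0%}) per round trip")
    # prints: New Delhi: Premium saves ~20 ms (10%) per round trip
```

Since a page load usually involves several round trips, the per-round-trip saving is multiplied accordingly, which is worth keeping in mind when comparing it to the price difference.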

Similarly, the customer can check the results from the cities where they expect most of their end clients. These statistics can therefore help decide which tier is appropriate for hosting the web-service VMs.

Use Case #2 (Day 2): Monitor and troubleshoot latency issues between Internet users and GCP workloads

The Internet to GCP latency dashboard offers two useful selections:

  1. Users can select either their own project or all of Google Cloud
  2. Users can go back in history and check whether these statistics deviated significantly at a particular time of day

The sample below shows the trend of Internet to GCP us-central1 latency over the last hour from 2 countries, for all of Google Cloud:

Image 12: Stats for ‘whole of Google Cloud’

Project-specific stats for access to the us-central1 workload from the same 2 countries:

Image 13: ‘Project-specific’ stats

Comparing the project-specific data with the all-of-Google-Cloud data can also help determine whether the reported latency is specific to the customer's project or whether all Google Cloud projects showed similar behavior. The two dashboards above for us-central1 show that there is nothing abnormal in the global GCP to Internet RTT numbers, and the customer's project reports the same numbers. Hence, if a network administrator or Network Operations Center is troubleshooting reports of high latency for Internet users, similar graphs can help determine whether there was a significant deviation at the reported time and day of the incident.
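
The comparison described above can be turned into a simple rule of thumb. A minimal sketch, assuming you have noted, for the same region, country, and time window, the project-specific RTT, the all-of-Google-Cloud RTT, and a typical all-of-Google-Cloud value from earlier history. The 1.5× ratio threshold, the function name `classify_latency`, and the sample readings are all hypothetical and should be tuned to your own baseline.

```python
# Hypothetical threshold: treat a reading as "high" when it exceeds the
# reference value by more than 50%. Tune this to your own baseline.
DEVIATION_RATIO = 1.5


def classify_latency(project_ms: float, global_ms: float, global_baseline_ms: float) -> str:
    """Classify one region/country/time-window reading.

    project_ms: RTT from the project-specific dashboard.
    global_ms: RTT from the all-of-Google-Cloud dashboard for the same window.
    global_baseline_ms: typical all-of-Google-Cloud RTT from earlier history.
    """
    if global_ms > DEVIATION_RATIO * global_baseline_ms:
        return "widespread"        # all of Google Cloud deviated at the same time
    if project_ms > DEVIATION_RATIO * global_ms:
        return "project-specific"  # only this project looks abnormal
    return "normal"


# Hypothetical readings (ms), not taken from the dashboards above.
print(classify_latency(155, 150, 145))  # normal
print(classify_latency(240, 150, 145))  # project-specific
print(classify_latency(310, 300, 145))  # widespread
```

A “project-specific” result points the investigation at the application or its own network path, while “widespread” suggests a broader Internet event around the time of the incident.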

Use Case #3: Quick stats on the geographies accessing customer workloads

As discussed earlier in this blog, latency is measured using TCP packets. For the per-project Performance Dashboard, latency is based on a sample of your actual traffic. For the global Performance Dashboard, the latency data represents the median across all projects.

Hence, the project-specific numbers are also a good indicator of the geographies that are accessing the customer's applications. See the example below:

Image 14: Snapshot of all geographies that accessed the customer application over the Internet

This shows that in the last 7 days, my resources in asia-east1 received traffic from the GEOs and cities shown. This is a good indicator for application owners who want to disable or restrict access from a particular GEO using other GCP services such as Cloud Armor.
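
If you already know which markets you serve, comparing that list with the countries shown in the project-specific view quickly surfaces unexpected sources of traffic that you might review before restricting them (for example, with a Cloud Armor policy). A minimal sketch; both country lists below are hypothetical examples, and the observed list is assumed to be copied from the dashboard by hand rather than fetched through any API.

```python
def unexpected_geos(observed: set[str], expected: set[str]) -> list[str]:
    """Countries seen in the dashboard that are not in the expected market list."""
    return sorted(observed - expected)


# Hypothetical lists for illustration only.
observed_last_7_days = {"India", "Taiwan", "United States", "Brazil"}
expected_markets = {"India", "Taiwan", "United States"}
print(unexpected_geos(observed_last_7_days, expected_markets))  # ['Brazil']
```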

Closing Notes:

The Internet round-trip time between server and client plays an important role in determining the actual end-user experience. Not everyone works in an office and accesses GCP workloads over a private or dedicated connection, so Internet RTT is one of the crucial factors that help organizations decide where to host workloads and how much round-trip time is ‘good enough’ for applications to deliver the expected experience. The GCP Network Intelligence Center feature “Google Cloud to Internet endpoint RTT” provides great insights that help not only network administrators but also DevOps engineers, application developers, and cloud architects. This simple-looking dashboard provides powerful insights that help at every stage of the GCP journey.

Disclaimer: This is to inform readers that the views, thoughts, and opinions expressed in the text belong solely to the author, and not necessarily to the author’s employer, organization, committee or other group or individual.
