Enabling NetOps with GCP Network Topology

Gauravmadan
Google Cloud - Community
Mar 15, 2022


If you are a network engineer wondering how to adapt to new ways of managing networks in Google Cloud Platform (GCP), a network operations lead wondering how to prove that the network is innocent, someone exploring tools that help with network troubleshooting in GCP, or a DevOps engineer who hates going to network admins for small issues, this blog will surely help you.

Time is a precious commodity, and in this era of transformation we should be spending the precious hours of network engineers on solving larger business problems. Instead of leaving them to struggle with multiple third-party tools and eventually fall back on past experience to solve network problems, we should empower them with tools that give the right insights into what’s going on in their network. This not only reduces the mean time to resolve an issue but also helps in taking proactive measures to prevent issues from happening in the first place.

Figure 1: Common issues faced by a network engineer

At a very high level, the tool sets used in network operations fall into four broad categories:

  1. Network Visibility Tools
  2. Network Diagnostics Tools
  3. Network Insights Tools
  4. Network Root Cause Analysis Tools

This blog focuses on ‘Network Visibility’ and how visibility tools come in handy for honoring SLAs such as those for incident resolution. ‘Network Visibility’ means being aware of everything within and moving through your network. A network visibility tool should therefore deliver on the key task of keeping a constant eye on traffic entering and leaving your network. Like monitoring and alerting tools, visibility tools do not directly solve network issues by themselves; rather, they are components that help us when we monitor and troubleshoot. Visibility tools can also be a strong aid for DevOps engineers and application owners, who can visually get a lot of information about how network traffic flows to an application and hence depend less on network admins.

In Google Cloud Platform, network visibility, monitoring, and troubleshooting come under a common solution named “Network Intelligence Center”. Network Intelligence Center (a.k.a. N.I.C.) is like a Swiss Army knife, with multiple modules that address common visibility, troubleshooting, and monitoring tasks specifically for networks. At the time of this writing, four modules under N.I.C. are generally available to all Google Cloud customers. These are shown in the snip from the GCP console below:

Figure 2: GCP ‘Network Intelligence Center’

For the purpose of this blog, we will focus only on the ‘Network Topology’ module under N.I.C. Network Topology is a visualization tool that shows the topology of Virtual Private Cloud (VPC) networks, hybrid connectivity to and from your on-premises networks, connectivity to Google-managed services, and the associated metrics. A user of the Network Topology module therefore gets insights into the following:

  • Actual traffic flows between various networking nodes and associated metrics
  • Evolution of network topology over time
  • Metrics charts over a six-week time period

Let’s look at the top use cases that the Network Topology module addresses for end customers:

Use case #1: Network Visibility

It is often said that a picture is worth a thousand words. In networking, a topology diagram can save you from issuing hundreds of commands just to collect data. Think of a time when, as a network engineer or application owner, you wondered:

  • How is my traffic going out through the Google Cloud Load Balancer (GCLB)?
  • Are all my EMEA users being served by GCE instances in EMEA regions, or is traffic landing in the wrong region?
  • Can I understand the topology and the actual traffic flow?
  • How is my on-premises traffic reaching Google Cloud?

Network Topology gives its user a high-level view of what the entire network looks like, along with the sources and destinations of traffic flows.

Figure 3: High level view of GCP network topology

The high-level network topology above tells us the following:

  1. Users from all over the world are accessing workloads.
  2. Users reach a load balancer, which sends traffic to backend instances in GCP regions.
  3. The customer connects from on-premises to GCP using a VPN.
  4. Internal load balancing is used to send traffic from a GCP instance in one region to a GCP instance in another region.

This is a good visualization for starting to understand how the topology is structured. Let’s take it one step deeper. I want to see load balancing in action and confirm that end-user traffic is landing in the correct region. The snap below confirms that traffic coming from the Americas is correctly load balanced to us-central1 workloads, and traffic coming from EMEA is correctly sent to europe-west1.

Figure 4: Insights into how global traffic is served by GCP workloads
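The region check above can also be expressed as a small calculation. The following is a minimal Python sketch, not part of Network Topology or any GCP API: it assumes you already have aggregated per-flow numbers in hand (for example, read off the console view), and the `Flow` record, its fields, `EXPECTED_REGION`, and `misrouted_share` are all hypothetical names chosen for illustration. For each source geography it reports what fraction of bytes was served outside the region you expect.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """One aggregated traffic flow (hypothetical record shape)."""
    source_geo: str   # client geography, e.g. "Americas" or "EMEA"
    dest_region: str  # GCP region whose backends served the traffic
    bytes_sent: int


# Hypothetical expectation, mirroring the example in the text.
EXPECTED_REGION = {"Americas": "us-central1", "EMEA": "europe-west1"}


def misrouted_share(flows, expected=EXPECTED_REGION):
    """Return, per source geography, the fraction of bytes served
    outside the expected region. Geographies with no expected region
    are reported as 0.0 (there is nothing to compare against)."""
    totals = {}
    misrouted = {}
    for f in flows:
        totals[f.source_geo] = totals.get(f.source_geo, 0) + f.bytes_sent
        want = expected.get(f.source_geo)
        if want is not None and f.dest_region != want:
            misrouted[f.source_geo] = misrouted.get(f.source_geo, 0) + f.bytes_sent
    return {geo: misrouted.get(geo, 0) / total
            for geo, total in totals.items() if total > 0}


flows = [
    Flow("Americas", "us-central1", 900),
    Flow("Americas", "europe-west1", 100),
    Flow("EMEA", "europe-west1", 500),
]
print(misrouted_share(flows))  # {'Americas': 0.1, 'EMEA': 0.0}
```

A non-zero share for a geography is the numeric counterpart of seeing a flow line in the topology view going to the "wrong" region.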

Similarly, if I am troubleshooting a VPN issue and need to determine whether all VPN users are impacted or some users are still successfully using the VPN to send traffic to GCP workloads, I can use the same network topology to troubleshoot the VPN element as follows:

Figure 5: Insights into how on-premise traffic is landing in GCP project via VPN

The topology above is enough to conclude that the upstream and downstream data volume shown from on-premises is successfully landing on db-instance in one of the US regions of GCP. The effort can therefore be directed toward troubleshooting why a few users are impacted, rather than spending it on tasks like checking VPN tunnel negotiation and tunnel health.

Use case #2: Network Troubleshooting

Network Topology can be a good tool for addressing statements like “something was working fine a day ago, but now it is not working.” To address such statements, we can check whether the network topology has changed in the last few hours, days, or weeks. Because Network Topology can show the last six weeks of data, it can quickly reveal whether user traffic flows or traffic volume have changed significantly in the last few days, which helps troubleshoot the issue faster.

The metrics shown by Network Topology also come in handy as additional information in such cases. For example, if during troubleshooting a user needs to see the trend of VPN traffic coming from on-premises or leaving GCP, to conclude whether there was a sudden dip in traffic through the VPN tunnel, the same network topology view can show it:

Figure 6: Traffic statistics view
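Spotting a “sudden dip” amounts to comparing recent traffic against a longer baseline. Here is a minimal Python sketch of that comparison, assuming you have a chronological list of daily byte counts for a tunnel (up to 42 entries for the six-week window). The function name `detect_dip`, the three-day recent window, and the 50% threshold are hypothetical choices for illustration, not values used by Network Topology.

```python
from statistics import mean


def detect_dip(daily_bytes, recent_days=3, threshold=0.5):
    """Return True if average traffic over the last `recent_days` days
    fell below `threshold` times the average of the earlier baseline.

    `daily_bytes` is a chronological list of daily byte counts
    (hypothetical input format, e.g. up to 42 days for six weeks)."""
    if len(daily_bytes) <= recent_days:
        raise ValueError("need more history than the recent window")
    baseline = mean(daily_bytes[:-recent_days])
    recent = mean(daily_bytes[-recent_days:])
    return baseline > 0 and recent < threshold * baseline


# Six weeks of steady VPN traffic, then a sharp drop over the last three days.
history = [100] * 39 + [30, 25, 20]
print(detect_dip(history))     # True
print(detect_dip([100] * 42))  # False
```

A simple mean-versus-mean comparison is deliberately crude; it ignores weekly seasonality, so in practice you would still eyeball the traffic chart before drawing conclusions.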

Use case #3: Architecture and Cost Optimization

A lot of the time, we need to address the business need to optimize cloud spend. Network Topology can come in handy here to answer questions like:

  1. Are there infrastructure elements that are not getting any user hits?
  2. Are there infrastructure elements that are generating a lot of cross-region traffic?
  3. Can we localize some elements if the majority of traffic comes from only one geography?

Answers to these questions not only address the cost optimization side of the business need but also present an opportunity to optimize the overall application architecture based on user traffic patterns observed over multiple weeks.
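The first two questions above reduce to simple aggregations over the edges of a topology. The Python sketch below assumes a hypothetical list of `Edge` records (source and destination element, their regions, and bytes carried) that you have gathered yourself; `Edge`, `idle_elements`, and `cross_region_talkers` are illustrative names, not a GCP API. It lists elements that carried no traffic at all and ranks elements by bytes sent to another region.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edge:
    """Traffic between two topology elements (hypothetical record shape)."""
    src: str
    src_region: str
    dst: str
    dst_region: str
    bytes_sent: int


def idle_elements(elements, edges):
    """Elements that appear in no edge carrying any bytes."""
    active = set()
    for e in edges:
        if e.bytes_sent > 0:
            active.update((e.src, e.dst))
    return sorted(set(elements) - active)


def cross_region_talkers(edges, top_n=5):
    """Source elements ranked by bytes sent to a different region."""
    totals = defaultdict(int)
    for e in edges:
        if e.src_region != e.dst_region:
            totals[e.src] += e.bytes_sent
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


elements = ["web-us", "web-eu", "db-us", "batch-asia"]
edges = [
    Edge("web-us", "us-central1", "db-us", "us-central1", 500),
    Edge("web-eu", "europe-west1", "db-us", "us-central1", 800),
]
print(idle_elements(elements, edges))  # ['batch-asia']
print(cross_region_talkers(edges))     # [('web-eu', 800)]
```

An element like `web-eu` that sends most of its bytes across regions is a candidate for the third question: moving its dependency closer to where its traffic actually goes.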

Use case #4: Compliance

A lot of the time, the NOC or SOC needs to submit topology views for a given time period to meet audit requirements. Network Topology can be used to export the ‘state of the network’ at any point in the last six weeks to meet such stringent requirements without relying on a third-party service. For example, a view of how much data traversed from each region to the ‘logging’ and ‘monitoring’ Google services can be shown as follows:

Figure 7: Google services view in ‘Network Topology’

If an auditor needs to see which services were accessed by GCP workloads in the last few weeks, this can be shown in a more detailed view as follows:

Figure 8: Breakup of Google services accessed by regional workloads
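For an audit, the views above boil down to a per-region, per-service byte total over a chosen date window. The Python sketch below shows that aggregation, assuming hypothetical daily `ServiceFlow` records you have collected yourself; the record shape and `service_usage_report` are illustrative names, not an export format or API of Network Topology.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class ServiceFlow:
    """Daily traffic from a region to a Google service (hypothetical record shape)."""
    day: date
    region: str
    service: str  # e.g. "logging", "monitoring"
    bytes_sent: int


def service_usage_report(flows, start, end):
    """Sum bytes per (region, service) for flows dated within [start, end]."""
    report = defaultdict(int)
    for f in flows:
        if start <= f.day <= end:
            report[(f.region, f.service)] += f.bytes_sent
    # Sort keys so the report is stable and easy to diff between audits.
    return dict(sorted(report.items()))


flows = [
    ServiceFlow(date(2022, 2, 1), "us-central1", "logging", 300),
    ServiceFlow(date(2022, 2, 2), "us-central1", "logging", 200),
    ServiceFlow(date(2022, 2, 2), "europe-west1", "monitoring", 50),
    ServiceFlow(date(2022, 3, 20), "us-central1", "logging", 999),  # outside window
]
print(service_usage_report(flows, date(2022, 2, 1), date(2022, 2, 28)))
# {('europe-west1', 'monitoring'): 50, ('us-central1', 'logging'): 500}
```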

Closing Notes:

The network is one of the most crucial pillars of GCP: it enables elements to talk to each other within GCP and lets the outside world communicate with GCP. This is one area where we can’t be blindfolded, and it is in the interest of customers and operators to get maximum visibility into what’s going on in their Google Cloud network. Who better to give you that visibility than a module of GCP itself? And guess what: no additional configuration or agents are required to use Network Topology. It therefore makes sense to consume network-related information on a single dashboard and save the time otherwise spent viewing multiple logs or using third-party tools. In networks, time is of the essence, since every second of downtime impacts the business, and Network Topology gives us the visibility we need to keep our house in order!

Stay tuned for the next blog, covering one more powerful tool in the hands of GCP network operators.

Disclaimer: This is to inform readers that the views, thoughts, and opinions expressed in the text belong solely to the author, and not necessarily to the author’s employer, organization, committee or other group or individual.
