Unveiling GCP Asset Inventory: A Comprehensive Guide to Querying VMs, Disks, and Source Images

Utkarsh Sharma
Google Cloud - Community
6 min readMay 16, 2024

Introduction

Google Cloud Platform empowers organizations with a suite of tools to manage cloud infrastructure efficiently. A cornerstone of this suite is the Asset Inventory, a powerful resource that provides a holistic view of your cloud assets, encompassing virtual machines (VMs), disks, and the source images they are derived from. In this guide, we’ll delve into the art of harnessing GCP’s Asset Inventory to query VMs, disks, and source images using distinct tables, offering insights into your environment and aiding in tasks like cost analysis, security audits, and resource optimization.

Understanding Asset Inventory

Before embarking on our querying journey, let’s solidify our understanding of the Asset Inventory. GCP’s Asset Inventory is a centralized repository of metadata about your cloud resources. It serves as a single source of truth, allowing you to discover, track, and analyze your cloud assets across projects, organizations, and even folders. By aggregating metadata, the Asset Inventory enables comprehensive reporting, risk assessment, and informed decision-making.

Why Query VMs, Disks, and Source Images?

The ability to query these interconnected resources opens up numerous possibilities:

  • Cost Optimization: Analyze which VM types consume the most resources and identify opportunities to downsize or switch to more cost-effective options.
  • Security Audits: Track the lineage of VM images to ensure they originate from trusted sources and haven’t been tampered with.
  • Capacity Planning: Assess the distribution of VM sizes and associated disks to plan for future growth or consolidation.
  • Troubleshooting: Quickly pinpoint VMs running on outdated or potentially vulnerable source images.

Step-by-Step Guide: Querying VMs, Disks, and Source Images

Accessing Asset Inventory:

  • Log in to your GCP Console.
  • Navigate to “IAM & Admin” -> “Asset Inventory” or simply search for it in the search bar.
  • Click on “Asset Query” to enter the querying interface.

Exploring the Tables

  • Familiarize yourself with the “Virtual Machines,” “Disks,” and “Images” tables in the Asset Inventory dashboard.
  • Each table offers a unique set of columns containing valuable information.
  • Use the _schema field to get detailed information about the columns and their data types.

Querying VM Instances

Navigate to the “Virtual Machines” section or table in the Asset Inventory dashboard. Here, you’ll find a comprehensive list of all VM instances within your GCP project. You can filter and sort these instances based on attributes like name, zone, and status. Additionally, examine the schema to gather more detailed information.

Retrieving Disk Information

After identifying the VM instance, proceed to the “Disks” section or table in the Asset Inventory dashboard. Here, you can access information about associated disks and query their details.

Writing Queries

Now that you’re familiar with the tables and schema in the asset inventory, let’s begin writing queries to obtain the desired outcome.

Our goal is to gather details about all VMs within the GCP Organization, including their associated disks and utilized source images. We’ll exclude default images and only show running VMs.

To begin, navigate to your GCP Organization, locate “Asset Inventory,” and access “Asset Query.” Then, proceed to paste the provided queries and click “Run” to execute them.

Step 1: Listing the VMs along with the attached disk

This query will display all VMs along with their attached disks retrieved from the instance table.

SELECT
name,
ARRAY(
SELECT AS STRUCT
d.source
FROM
UNNEST(resource.data.disks) AS d
) AS source
FROM
`compute_googleapis_com_Instance`

Step 2: Listing the disk and the source image

This query will display all disks along with the source images utilized, retrieved from the disk table.

SELECT name, resource.data.sourceImage
FROM compute_googleapis_com_Disk

Step 3: Listing the disk and the source image and exclude image with prefix

In this step, we aim to generate a query that will identify all disks along with their corresponding source images from the disk table. Additionally, we’ll filter out any source images that have a specific prefix, such as “abc”.

SELECT name, resource.data.sourceImage
FROM compute_googleapis_com_Disk
WHERE NOT CONTAINS_SUBSTR(resource.data.sourceImage, "images/abc")

Step 5: Use Joins to list Instance, Disk and the Source Image from two different tables

This query retrieves compute instances from compute_googleapis_com_Instance and their boot disk sources. It then joins this information with disk data from compute_googleapis_com_Disk to get the source image of each boot disk. The WHERE clause filters out instances whose boot disk source images contain the string "images/abc".

SELECT vms.name, vms.disk_source, disks.resource.data.sourceImage as source_image
FROM (
SELECT name, source[ordinal(1)] as disk_source
FROM (
SELECT
name,
ARRAY(
SELECT AS STRUCT
d.source
FROM
UNNEST(resource.data.disks) AS d
) AS source
FROM
`compute_googleapis_com_Instance`
)
) as vms
LEFT JOIN compute_googleapis_com_Disk AS disks

ON array_reverse(SPLIT(vms.disk_source.source,'/'))[ordinal(1)]=array_reverse(SPLIT(disks.name,'/'))[ordinal(1)]
WHERE NOT CONTAINS_SUBSTR(disks.resource.data.sourceImage, "images/abc")

Step 6: Use Joins to list Instance, Disk and the Source Image from two different tables and remove jargons

This query extracts project, VM name, disk name, and source image information for Compute Engine instances in a Google Cloud project.

  • It joins instance data with disk data to link the boot disk to its source image.
  • The WHERE clause filters out instances with source images containing “images/abc”.
  • The REGEXP_REPLACE function cleans up the source image URL by removing a standard prefix.
  • The SPLIT and ARRAY_REVERSE functions are used to parse resource names and extract relevant parts.
SELECT vms.name, vms.disk_source, disks.resource.data.sourceImage as source_image
FROM (
SELECT name, source[ordinal(1)] as disk_source
FROM (
SELECT
name,
ARRAY(
SELECT AS STRUCT
d.source
FROM
UNNEST(resource.data.disks) AS d
) AS source
FROM
`compute_googleapis_com_Instance`
)
) as vms
LEFT JOIN compute_googleapis_com_Disk AS disks

ON array_reverse(SPLIT(vms.disk_source.source,'/'))[ordinal(1)]=array_reverse(SPLIT(disks.name,'/'))[ordinal(1)]
WHERE NOT CONTAINS_SUBSTR(disks.resource.data.sourceImage, "images/abc")

Step 6: Final Query to list the Running Instances their Disk and the Source Image and exclude some images

This query generates a report of GCE instances in your project, listing their project, name, boot disk name, and the source image of the boot disk. It excludes instances with source images containing “images/abc,” allowing you to focus on specific types of VM configurations or deployments.

SELECT
SPLIT(vms.name, '/')[OFFSET(4)] AS project,
ARRAY_REVERSE(SPLIT(vms.name, '/'))[OFFSET(0)] AS vm_name,
ARRAY_REVERSE(SPLIT(vms.disk_source.source, '/'))[OFFSET(0)] AS disk_name,
REGEXP_REPLACE(disks.resource.data.sourceImage, r"https://www.googleapis.com/compute/v1/", "") AS source_image
FROM (
SELECT name, source[ORDINAL(1)] AS disk_source
FROM (
SELECT
name,
ARRAY(
SELECT AS STRUCT
d.source
FROM
UNNEST(resource.data.disks) AS d
) AS source
FROM
`compute_googleapis_com_Instance`
)
) AS vms
LEFT JOIN compute_googleapis_com_Disk AS disks
ON ARRAY_REVERSE(SPLIT(vms.disk_source.source,'/'))[ORDINAL(1)]=ARRAY_REVERSE(SPLIT(disks.name,'/'))[ORDINAL(1)]
WHERE NOT CONTAINS_SUBSTR(disks.resource.data.sourceImage, "images/abc");

This SQL query is designed to retrieve information about Google Compute Engine virtual machine instances and their associated disks within a Google Cloud Project.

Here’s a breakdown of its components and functionality:

1. Data Sources:

  • compute_googleapis_com_Instance: This table represents the inventory of GCE virtual machine instances in your Google Cloud Project.
  • compute_googleapis_com_Disk: This table stores information about persistent disks associated with your GCE instances.

2. Subquery (vms):

  • The innermost subquery (SELECT...FROM UNNEST) extracts the name of each VM instance and the source (a URI indicating the source image or snapshot) of its attached disks. It does this by unnesting the resource.data.disks array, which typically contains multiple disks per VM.
  • The next level of the subquery (SELECT name, source[ORDINAL(1)] AS disk_source) selects the VM's name and the source of its first (boot) disk, labeled as disk_source.

3. LEFT JOIN:

  • The main query uses a LEFT JOIN to combine information from the vms subquery with the compute_googleapis_com_Disk table. This join is based on matching the last part of the disk_source.source (disk name) with the last part of the disks.name. A LEFT JOIN is used to ensure that even VMs without matching disk entries in the compute_googleapis_com_Disk table are included in the result.

4. WHERE Clause:

  • The WHERE NOT CONTAINS_SUBSTR clause filters out rows where the source image (disks.resource.data.sourceImage) of the boot disk contains the substring "images/abc". This is presumably used to exclude specific types of images or snapshots from the results.

5. SELECT and Output:

  • The final SELECT statement extracts and formats the desired information:
  • project: The Google Cloud project where the VM resides.
  • vm_name: The name of the VM instance.
  • disk_name: The name of the VM's boot disk.
  • source_image: The source image or snapshot used to create the boot disk, with a standardized format by removing the "https://www.googleapis.com/compute/v1/" prefix using REGEXP_REPLACE.

Conclusion

GCP’s Asset Inventory is a powerful ally in managing your cloud resources. By querying VMs, disks, and source images, you gain insights to optimize costs, enhance security, plan capacity, and troubleshoot efficiently. With a little practice, you can turn the Asset Inventory into your compass for navigating the cloud landscape. Remember, its a dynamic tool that evolves with your infrastructure. Keep exploring and refining your queries to make the most of it.

Let's connect on LinkedIn for more opportunities to learn together!
https://www.linkedin.com/in/utkarsh6/

--

--

Utkarsh Sharma
Google Cloud - Community

Senior Solutions Consultant @ Google | Talks about AWS | GCP | Azure | K8s | IaC | Terraform | CI/CD | Docker| Helm | Migration