Managing your GCP inventory with Cloud Asset API
Using Cloud Functions with Cloud Asset API and BigQuery to keep track of your Google Cloud Platform (GCP) inventory.
The purpose of this article is to promote good housekeeping habits in the cloud. By that I mean making sure resources are cleaned up, especially in lower environments, which often get left behind and forgotten once a project has moved on to production. Admittedly, housekeeping tasks are the not-so-sexy parts of cloud; engineers and architects just want to build things and play with new toys, not track resources they have provisioned.
I hope to show how you can call the Cloud Asset API from a Cloud Function and export the data to BigQuery, where you can visualize it using various BI tools. Cloud Scheduler can invoke the Cloud Function at regular intervals, and over time you will build up a dataset large enough to help detect anomalies.
To accomplish this, we will require the following APIs to be enabled:
- Cloud Asset API
- Cloud Functions API
- Cloud Build API
- Cloud Pub/Sub API
- Cloud Scheduler API
- BigQuery API
Since we will be running a Cloud Function, which uses the default App Engine service account (PROJECT_ID@appspot.gserviceaccount.com), you will need to add the following IAM roles to the service account’s permissions:
We will also create a BigQuery dataset to export the Cloud Asset data to:
bq mk \
--data_location northamerica-northeast1 \
And because we will be using a Pub/Sub trigger, I will create a topic:
gcloud pubsub topics create cf-cloudasset-trigger
Create a folder with the following two files (I will be using the Python runtime):
I could not find documentation for the Python package that explicitly states how to set the partition key, but it is in the google-cloud-asset package code and also in the Ruby documentation. I highly recommend setting a partition key to take advantage of partitioned tables in BigQuery (the alternative would be a separate table for each day). For exporting, I will be calling the exportAssets method (which can also export to a GCS bucket as a JSON file instead, if you wish).
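As a rough illustration of the export call described above, here is a minimal sketch of what a main.py could look like. It is not the original listing: the environment variable names (PROJECT_ID, BQ_DATASET, BQ_TABLE), the default table name, and the choice of REQUEST_TIME as the partition key are assumptions made for the example. The request is built as a plain dict, which the google-cloud-asset client accepts in place of a request object.

```python
import os


def build_export_request(project_id, dataset_id, table, content_type,
                         partition_key, asset_types=None):
    """Assemble the exportAssets request as a plain dict."""
    request = {
        "parent": f"projects/{project_id}",
        "content_type": content_type,
        "output_config": {
            "bigquery_destination": {
                "dataset": f"projects/{project_id}/datasets/{dataset_id}",
                "table": table,
                # Overwrite this snapshot's partition if it already exists.
                "force": True,
                # Partition the destination table instead of one table per day.
                "partition_spec": {"partition_key": partition_key},
            }
        },
    }
    if asset_types:
        # Leaving asset_types out exports every asset type.
        request["asset_types"] = list(asset_types)
    return request


def export_tasks(event, context):
    """Pub/Sub-triggered entry point (the name matches --entry-point)."""
    # Imported here so the helper above can be exercised without the package.
    from google.cloud import asset_v1

    request = build_export_request(
        project_id=os.environ["PROJECT_ID"],         # hypothetical env var
        dataset_id=os.environ["BQ_DATASET"],         # hypothetical env var
        table=os.environ.get("BQ_TABLE", "assets"),  # hypothetical default
        content_type=asset_v1.ContentType.RESOURCE,
        # Assumption: partition on request time; READ_TIME is the other option.
        partition_key=asset_v1.PartitionSpec.PartitionKey.REQUEST_TIME,
        asset_types=[".*.googleapis.com.*Instance"],
    )
    client = asset_v1.AssetServiceClient()
    operation = client.export_assets(request=request)
    print(f"Started export operation: {operation.operation.name}")
```

With this layout, the second file, requirements.txt, would only need to list google-cloud-asset.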
asset_types, if left unspecified, will pull all asset types. You may not want to pull everything, so you can specify a list of asset types instead; the list also supports regular expressions, as in my example. For instance, “.*.googleapis.com.*Instance” will match Compute Engine instances, Cloud SQL instances, Filestore instances, etc.
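To get a feel for what that pattern selects, it can be tried locally against a few asset type names. This is only an approximation: it uses Python's re module rather than the RE2 engine the API uses, and it assumes the pattern must match the whole asset type name, as the examples in the API documentation suggest. The sample list is illustrative, not exhaustive.

```python
import re

PATTERN = ".*.googleapis.com.*Instance"

# A handful of real asset type names to test the pattern against.
SAMPLE_ASSET_TYPES = [
    "compute.googleapis.com/Instance",
    "sqladmin.googleapis.com/Instance",
    "file.googleapis.com/Instance",
    "storage.googleapis.com/Bucket",
    "compute.googleapis.com/InstanceGroup",
]


def matching_asset_types(pattern, asset_types):
    """Return the asset types that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [t for t in asset_types if compiled.fullmatch(t)]


for asset_type in matching_asset_types(PATTERN, SAMPLE_ASSET_TYPES):
    print(asset_type)
```

Under those assumptions the three Instance types are selected, while buckets and instance groups are not, since neither name ends in "Instance".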
Deploy the function with gcloud and set the Pub/Sub topic we created earlier as the triggering topic:
gcloud functions deploy export-cloudasset-to-bq \
--runtime python39 \
--entry-point export_tasks \
--trigger-topic cf-cloudasset-trigger \
--region northamerica-northeast1 \
We want our Cloud Function to run on a schedule, and for that we will use Cloud Scheduler. It is very easy to use and supports scheduling in cron format, which is very comforting for someone like me who comes from a systems administrator background.
gcloud scheduler jobs create pubsub daily-cf-cloudasset-export-job \
--location northamerica-northeast1 \
--schedule "0 6 * * *" \
--topic cf-cloudasset-trigger \
--message-body "Cloud Asset to BigQuery"
By default the schedule is interpreted in UTC, but you can specify a different time zone if you wish (for example with the --time-zone flag).
NOTE: I am exporting my Cloud Asset data to BigQuery once per day, but you may want to run it every 12 hours instead (i.e. 0 */12 * * *). If you decide to export at a higher cadence, you will want to set output_config.bigquery_destination.force = False so that it does not overwrite any existing data for that day (remember, it is date partitioned!).
Other use cases
The Cloud Asset API can also be used to locate resources that are inactive and can probably be deleted/cleaned up. To do this, I will be using the searchAllResources method. It is best suited to small, specific searches, because results are paginated and querying a large dataset can result in a high volume of API calls. The following example will return any global or regional addresses that have been reserved but are not in use (Google will still bill you for those):
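A sketch of such a search with the Python client might look like the following. The query string state:RESERVED and the page size are assumptions for illustration (they rely on the address status being exposed through the searchable state field), and the scope value is a placeholder you would replace with your own project, folder, or organization.

```python
def build_unused_address_search(scope):
    """Assemble a searchAllResources request for reserved, unattached IPs."""
    return {
        # e.g. "projects/my-project" or "organizations/123456789"
        "scope": scope,
        # Assumption: unused addresses report the state RESERVED.
        "query": "state:RESERVED",
        "asset_types": [
            "compute.googleapis.com/Address",        # regional addresses
            "compute.googleapis.com/GlobalAddress",  # global addresses
        ],
        # Larger pages mean fewer API calls for the same result set.
        "page_size": 500,
    }


def find_unused_addresses(scope):
    """Run the search and return (name, location) for each result."""
    # Imported here so the request builder works without the package.
    from google.cloud import asset_v1

    client = asset_v1.AssetServiceClient()
    pager = client.search_all_resources(
        request=build_unused_address_search(scope)
    )
    # Iterating the pager fetches further pages on demand.
    return [(resource.name, resource.location) for resource in pager]
```

Calling find_unused_addresses("projects/my-project") would then list candidate addresses to review before releasing them.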
The above is just an example but there are many options/conditions you can pass to your query to help you discover wasted resources in your project or across your organization.
If you would like to learn more, see the examples in the links below:
Visualizing the data
I would recommend looking into using a BI tool such as Data Studio or Looker to visualize the Cloud Assets data that is being exported to BigQuery. A picture is worth a thousand words, and a bar or line graph can help you quickly detect any anomalies or spikes in your GCP inventory.
Unfortunately the visualization aspect is not my forte, but here is a sample time series chart I created in Data Studio on my sample project:
It only took me 5 minutes to create, and I am sure that someone with more experience would be able to create something more visually appealing.
NOTE: The table at the top shows the total record count across the 7 days during which data was collected, while the time series chart plots the count for each day.