Guide to Monitoring GCP Ops Agents Across Projects

Utkarsh Sharma
Google Cloud - Community
Apr 24, 2024

Google Cloud Platform (GCP) offers a wide range of tools and services for running and operating infrastructure. Among these, the Ops Agent plays a key role: it collects logs and metrics from Compute Engine virtual machines and sends them to Cloud Logging and Cloud Monitoring. However, keeping track of which VMs actually run the agent becomes difficult once you have many projects. In this guide, we’ll explore how to monitor Ops Agents across projects on GCP.

We recently received a compliance requirement from one of our clients: track the Ops Agent on every virtual machine (VM) across all projects in the organization, and produce a report of the VMs that lack the agent so it can be installed on them. We found no direct, built-in option for this kind of tracking, but there are two workable alternatives. Let’s go through each of them.

Option 1

Cloud Monitoring provides a built-in dashboard that shows which VMs in a project have the Ops Agent installed and which do not. To open it:

  1. Navigate to the Monitoring section.
  2. Select Dashboards.
  3. Search for “VM Instances” and open it.
  4. Choose the Table view.

The table lists every VM in the project and indicates whether each one has the Ops Agent. To extend this view to all projects in the organization, add those projects to the current project’s metrics scope:

  1. Visit the Monitoring section.
  2. Access Metrics Scope.
  3. Choose “Metrics monitored by this project”.
  4. Click on Add Projects and include all desired projects.

Once added, the projects become part of the main project’s metrics scope, and their VMs appear in the “VM Instances” dashboard alongside the main project’s own VMs.

These steps give us the full list of VMs in a dashboard view, but unfortunately that view cannot be downloaded or accessed through the CLI or API, so this method cannot produce a CSV report. The “VM Instances” dashboard is a prebuilt default dashboard provided by GCP, and only custom dashboards can be downloaded.
Option 2 produces a CSV report that can then be cleaned up with a script:

Option 2

Another way to identify VMs without the Ops Agent is to run an MQL query in Metrics Explorer, relying on the default values of outer_join to flag VMs that report no agent uptime:

  1. Navigate to the Monitoring section.
  2. Open Metrics Explorer.
  3. Select MQL as the query language.
  4. Enter the following query:
fetch gce_instance
| { metric 'compute.googleapis.com/instance/uptime'
; metric 'agent.googleapis.com/agent/uptime' | filter metric.version =~ 'google-cloud-ops-agent-metrics/.*' }
| group_by [resource.project_id, resource.zone, resource.instance_id, metadata.system.name], .count()
| outer_join 0, 0
| filter val(1)==0

5. Download the resulting CSV.

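The query compares two series for each VM: the Compute Engine uptime metric and the agent uptime metric reported by the Ops Agent. Because outer_join 0, 0 fills in 0 when one side of the join is missing, a VM with no Ops Agent uptime gets an agent count of 0, and filter val(1)==0 keeps exactly those VMs. Like Option 1, the result covers every project included in the metrics scope.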
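To make the join logic concrete, here is a minimal Python sketch that mimics the query on in-memory data. It is not how Cloud Monitoring evaluates MQL: the function name vms_without_ops_agent, the tuple key layout, and the sample values are all hypothetical, and MQL’s =~ regex match is approximated with re.fullmatch.

```python
import re
from collections import Counter

# Hypothetical key layout mirroring the group_by columns:
# (project_id, zone, instance_id, name)
Key = tuple[str, str, str, str]

# Same pattern as the MQL filter on metric.version (approximated with re.fullmatch)
OPS_AGENT_VERSION = re.compile(r"google-cloud-ops-agent-metrics/.*")


def vms_without_ops_agent(
    compute_uptime: list[Key],
    agent_uptime: list[tuple[Key, str]],
) -> list[Key]:
    """Return VMs that report compute uptime but no Ops Agent uptime."""
    # group_by [...], .count() on the compute.googleapis.com/instance/uptime side
    left = Counter(compute_uptime)
    # filter metric.version =~ ..., then group_by/count on the agent uptime side
    right = Counter(
        key for key, version in agent_uptime if OPS_AGENT_VERSION.fullmatch(version)
    )
    # outer_join 0, 0: keep every key from either side; Counter returns 0 for a missing key
    joined = {key: (left[key], right[key]) for key in left.keys() | right.keys()}
    # filter val(1) == 0: no Ops Agent series for this VM
    return sorted(key for key, (_, agent_count) in joined.items() if agent_count == 0)


# Made-up sample data for illustration only
compute = [
    ("proj-a", "us-central1-a", "111", "web-1"),
    ("proj-a", "us-central1-a", "222", "web-2"),
    ("proj-b", "europe-west1-b", "333", "db-1"),
]
agent = [
    (("proj-a", "us-central1-a", "111", "web-1"), "google-cloud-ops-agent-metrics/2.46.0"),
    (("proj-b", "europe-west1-b", "333", "db-1"), "legacy-agent/6.0"),
]
print(vms_without_ops_agent(compute, agent))
# → [('proj-a', 'us-central1-a', '222', 'web-2'), ('proj-b', 'europe-west1-b', '333', 'db-1')]
```

Note that db-1 is reported even though it has an agent series, because its made-up version string does not match the Ops Agent pattern, which is the same reason the real query filters on metric.version.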

The downloaded CSV is not in a convenient tabular layout and needs reshaping to be readable. Either of the scripts below does this: each reads the downloaded file (csv.csv), keeps only its first five lines, and transposes rows into columns so that each VM appears on its own line, discarding the rest of the file.

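#!/bin/bash
set -euo pipefail

# Define filenames: the downloaded export (default csv.csv, or pass a path as $1)
input_file="${1:-csv.csv}"
output_file="output_$(date +"%Y%m%d%H%M%S").csv"

# Take the first 5 lines of the export and transpose rows into columns.
# Assumes no quoted commas inside fields (awk splits on every comma).
head -n 5 "$input_file" | awk -F, '
{
  # Strip a trailing carriage return (CRLF files); this also re-splits the fields
  sub(/\r$/, "")
  # Remember the widest row so END emits every column
  if (NF > max_nf) max_nf = NF
  for (i = 1; i <= NF; i++) {
    if (NR == 1) {
      col[i] = $i
    } else {
      col[i] = col[i] "," $i
    }
  }
}
END {
  # Each original column becomes one comma-separated output row
  for (i = 1; i <= max_nf; i++) {
    print col[i]
  }
}
' > "$output_file"

echo "Output written to $output_file"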

#!/usr/bin/env python3
import csv
import sys
from datetime import datetime
from itertools import islice, zip_longest


def main():
    # Define filenames: the downloaded export (default csv.csv, or pass a path as the first argument)
    input_file = sys.argv[1] if len(sys.argv) > 1 else "csv.csv"
    output_file = f"output_{datetime.now().strftime('%Y%m%d%H%M%S')}.csv"

    # Read the first 5 lines of the CSV file (islice also tolerates shorter files)
    with open(input_file, newline='') as file:
        reader = csv.reader(file)
        first_5_lines = list(islice(reader, 5))

    # Transpose the data; zip_longest pads ragged rows instead of truncating them
    transposed_data = [list(col) for col in zip_longest(*first_5_lines, fillvalue='')]

    # Write the transposed data to a new comma-separated CSV file
    with open(output_file, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(transposed_data)

    print(f"Output written to {output_file}")


if __name__ == "__main__":
    main()
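To see what the transposition does, here is a small worked example that applies the same idea as the Python script (keep the first five lines, then transpose) to an in-memory string. The sample layout, with one column per VM and one row per label, is invented for illustration and may not match the real Metrics Explorer export; transpose_head is a hypothetical helper name.

```python
import csv
import io
from itertools import islice, zip_longest


def transpose_head(csv_text: str, n_rows: int = 5) -> list[list[str]]:
    """Keep the first n_rows of a CSV string and turn its columns into rows."""
    rows = list(islice(csv.reader(io.StringIO(csv_text)), n_rows))
    # zip_longest pads short rows with "" so no column is silently dropped
    return [list(col) for col in zip_longest(*rows, fillvalue="")]


# Made-up, wide-format sample: one column per VM, one row per label,
# followed by a data row that falls outside the first five lines
sample = (
    "label,vm-1,vm-2\n"
    "project_id,proj-a,proj-b\n"
    "zone,us-central1-a,europe-west1-b\n"
    "instance_id,111,222\n"
    "name,web-1,db-1\n"
    "2024-04-24T00:00:00Z,0,0\n"
)

# Prints the label row first, then one row per VM
for row in transpose_head(sample):
    print(row)
```

The sixth line of the sample is dropped because only the first five lines are kept, which mirrors how both scripts discard everything after the label rows.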

Run the Script

Run either the Bash script or the Python script in the directory that contains the downloaded csv.csv file. Each writes the modified CSV to a new file whose name includes a timestamp.

CSV Output

I hope this helps you monitor the status of Ops Agents for your own use case. Thank you for reading, and stay tuned for future updates!


Senior Solutions Consultant @ Google | Talks about AWS | GCP | Azure | K8s | IaC | Terraform | CI/CD | Docker| Helm | Migration