Google Cloud Platform Technology Nuggets — June 1–15, 2025
Welcome to the June 1–15, 2025 edition of Google Cloud Platform Technology Nuggets. The nuggets are also available in the form of a Podcast.
AI and Machine Learning
We have a couple of solution guides published in this period. The first one outlines how to build multimodal AI agents for object detection. It explains how to combine Gemini models with open-source frameworks like LangChain and LangGraph to create agents that can identify objects across various data types such as images, audio, and video. Check out the guide.
Next up is a solution guide to building a production-ready multimodal fine-tuning pipeline on Google Cloud using Axolotl, which provides a streamlined fine-tuning framework. The guide details a five-component pipeline including model selection, data preparation, configuration, infrastructure orchestration, and production integration, with a practical hands-on example of fine-tuning Gemma 3 on the SIIM-ISIC Melanoma dataset. Check out the blog post.
Gen AI Evaluation Service in Vertex AI lets you evaluate any generative model or application and benchmark the evaluation results against your own judgment, using your own evaluation criteria. It is a critical part of your journey to developing apps powered by these models, helping ensure that you are staying on track and getting the results you expect. Based on customer feedback, the Gen AI Evaluation Service has added new features that include the following:
- Scaling your evaluation processes via the new batch evaluation.
- You can now evaluate your autorater's quality and align it with your own judgment.
- Rubric-driven evaluation, where the service automatically generates a unique set of rubrics, which you can review and customize. The autorater uses these generated rubrics to assess the AI's response.
- You can now get detailed insight into your agent's responses by getting a peek into its reasoning process, the sequence of calls, the tools it used, and more.
Check out the blog post for more details.
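To make rubric-driven evaluation concrete, here is a minimal, self-contained sketch of the idea: an "autorater" scores a response against a set of rubrics. The rubric names and the trivial keyword-based scoring are invented for illustration; in the real service, rubric generation and rating are handled for you by an LLM judge via the Vertex AI SDK.

```python
# Illustrative sketch of rubric-driven evaluation (not the Vertex AI SDK).
# Each rubric is a yes/no criterion; the "autorater" here is a simple
# keyword check standing in for an LLM judge.

def autorate(response: str, rubrics: dict) -> dict:
    """Score a response against each rubric; return per-rubric verdicts."""
    verdicts = {}
    for name, required_phrase in rubrics.items():
        verdicts[name] = required_phrase.lower() in response.lower()
    score = sum(verdicts.values()) / len(verdicts)
    return {"verdicts": verdicts, "score": score}

# Hypothetical rubrics a service might generate for a support-bot answer.
rubrics = {
    "mentions_refund_policy": "refund",
    "gives_timeline": "business days",
}

result = autorate("Refunds are processed within 5 business days.", rubrics)
```

Because the rubrics are surfaced as data, you can review and customize them before the autorater applies them, which is the workflow the service enables.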
Looking to deploy Meta's Llama 4 and DeepSeek AI's DeepSeek models on Google Cloud's AI Hypercomputer platform? It's not a trivial task, and it can be hard to know where to start. The blog post provides some solid details: it highlights support for various Llama 4 models, including Scout and Maverick, as well as DeepSeek models, and shows how they can be served using JetStream, MaxText, Pathways, and vLLM on Trillium TPUs and A3 GPUs.
Gartner® has named Google as a Leader in the 2025 Magic Quadrant™ for Data Science and Machine Learning Platforms (DSML) report. The blog post highlights key products in the suite that help position Google in the Leaders quadrant. Download the complimentary 2025 Gartner Magic Quadrant™ for Data Science and Machine Learning Platforms.
Containers and Kubernetes
It's time to understand a new feature called GKE Volume Populator, designed to streamline data transfers for AI/ML workloads on Google Kubernetes Engine (GKE). Think of it as a single place where you put your data, with the service moving it to other target destinations under flexible configurations. It addresses the complexity of moving data between different storage systems, such as Cloud Storage and specialized accelerator storage like Hyperdisk ML. Check out the blog post for more details, including a specific example of moving data from a Cloud Storage bucket to a Hyperdisk ML instance to accelerate the loading of model weights, scale up to 2,500 concurrent nodes, and reduce pod over-provisioning.
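At a high level the workflow is declarative: you describe the source data in a custom resource and reference it from a PersistentVolumeClaim, and the populator handles the transfer. The manifest below is only a rough sketch of that shape; the apiVersion, kinds, field names, and storage class are assumptions to verify against the current GKE documentation before use.

```yaml
# Sketch only - verify every kind and field against the current GKE docs.
apiVersion: datalayer.gke.io/v1        # assumed API group/version
kind: GCPDataSource
metadata:
  name: model-weights-source
spec:
  cloudStorage:
    serviceAccountName: transfer-sa        # assumed field
    bucketName: my-model-weights-bucket    # assumed field
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-weights-pvc
spec:
  accessModes: ["ReadOnlyMany"]
  storageClassName: hyperdisk-ml           # assumed Hyperdisk ML class name
  resources:
    requests:
      storage: 500Gi
  dataSourceRef:
    apiGroup: datalayer.gke.io
    kind: GCPDataSource
    name: model-weights-source
```

The key idea is the `dataSourceRef` on the PVC: pods that mount the claim get a volume pre-populated from the Cloud Storage source, with no copy job to write yourself.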
Identity and Security
Google Cloud Security Command Center, which works to keep your cloud workloads secure and recommends fixes, has announced key new capabilities. These include:
- Agentless scanning for Compute Engine and GKE: This enables discovery of software and OS vulnerabilities in virtual machine instances and GKE clusters without requiring software deployment on each asset.
- Artifact Analysis integration: This feature finds vulnerabilities in container images by supporting vulnerability scanning for images stored in Artifact Registry and deployed to GKE clusters, Cloud Run, or App Engine.
- Threat detection for Cloud Run: This integrates specialised detectors that continuously analyse Cloud Run deployments for potentially malicious activities.
- Foundational log analysis: This uncovers network anomalies by automatically detecting connections to known bad IP addresses through direct, first-party access to internal network traffic logs.
We envision a world full of agents. But what is our approach to securing them? Is securing agents the same as securing general AI, or do agents demand a different approach? That question is the central theme of the first Cloud CISO Perspectives for June 2025.
In 2016, a large Distributed Denial-of-Service (DDoS) attack took down Brian Krebs's website, KrebsOnSecurity, for four days. That incident led Brian to adopt Project Shield, which recently defended the site against one of the largest DDoS attacks ever observed, peaking at 6.3 terabits per second. To imagine the scale: that is roughly 63,000 times the speed of typical U.S. broadband internet, and about 10 times the size of the 2016 attack. Project Shield operates as a reverse proxy, leveraging Google Cloud's robust networking services like Cloud Load Balancing, Cloud CDN, and Cloud Armor to filter malicious traffic and serve cached content. Check out the blog post to understand how the service makes this happen.
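The scale comparisons are easy to sanity-check. Assuming a round 100 Mbps for typical U.S. broadband and roughly 620 Gbps for the 2016 attack (both figures are assumptions for illustration, not from the post):

```python
# Back-of-the-envelope check of the scale comparisons.
attack_bps = 6.3e12          # 6.3 Tbps peak, per the post
broadband_bps = 100e6        # ~100 Mbps, assumed typical U.S. broadband
attack_2016_bps = 620e9      # ~620 Gbps, assumed size of the 2016 attack

vs_broadband = attack_bps / broadband_bps    # ~63,000x broadband
vs_2016 = attack_bps / attack_2016_bps       # ~10x the 2016 attack
```

Both ratios line up with the figures quoted above.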
Data Analytics
The Google Cloud blog is fast introducing "What's new in XYZ" bulletins that might make my job of writing these nuggets obsolete. Despite that, I am going to highlight good summaries that might save you some time and give you a snapshot of "What's new with Google Data Cloud?". Check it out.
A few Google Cloud services don't draw much attention from folks coming into the platform, yet they have been the backbone and glue across many other services. One such workhorse is Cloud Pub/Sub, the fully managed, global-scale messaging platform. Typically, Cloud Pub/Sub carries messages from one system to another, and as you know, the data is often not in a format the receiving side can consume directly, or some filtering is needed before the recipient should get a look at it. These transformations were typically done in the sending or receiving systems via other Google Cloud services, which, as is obvious, introduces its own complexity and latency. Enter Single Message Transforms (SMTs), whose purpose is to make it easy to validate, filter, enrich, and alter individual messages as they move, in real time. The blog post states that "The first SMT is available now: JavaScript User-Defined Functions (UDFs), which allows you to perform simple, lightweight modifications to message attributes and/or the data directly within Pub/Sub via snippets of JavaScript code." Check it out.
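The actual SMTs run as JavaScript UDF snippets inside Pub/Sub, but the transform semantics are easy to picture. Here is a minimal local sketch in Python of the kind of filter-and-enrich step an SMT replaces; the message field names and drop rule are invented for illustration:

```python
import json

def transform(message: dict):
    """Filter-and-enrich step of the kind an SMT performs in flight.

    Returns None to drop the message (filtering); otherwise returns the
    modified message. Field names here are invented for illustration.
    """
    payload = json.loads(message["data"])
    if payload.get("event_type") == "heartbeat":    # filter out noise
        return None
    payload["processed"] = True                     # enrich the payload
    message["data"] = json.dumps(payload)
    message.setdefault("attributes", {})["smt"] = "v1"
    return message

kept = transform({"data": json.dumps({"event_type": "order", "id": 7})})
dropped = transform({"data": json.dumps({"event_type": "heartbeat"})})
```

Doing this inside Pub/Sub, rather than in a separate pipeline stage, is exactly the complexity and latency the feature aims to remove.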
If BigQuery is central to your cloud workloads, it is essential that you control how BigQuery resources are allocated and consumed. You may already be familiar with slot management, reservations, spend-based commitments, and similar terms. As the blog post states, BigQuery workload management is a suite of features that allows you to prioritize, isolate, and manage the execution of queries and other operations (aka workloads) within your BigQuery project. The post then highlights several updates to BigQuery workload management, especially around reservations: fairness, predictability, flexibility, and securability. Check it out.
“The thinking was clear: BigQuery was for analysis, and you used something else for dynamic data manipulation.” That is clearly changing, as the blog post on transactional features in BigQuery states. The post is fairly detailed and highlights three key features:
- How BigQuery now handles targeted UPDATEs, DELETEs, and MERGEs with significantly improved performance and resource efficiency.
- How BigQuery can now capture the granular history of UPDATEs and DELETEs, providing a detailed audit trail of data within your tables.
- How you can apply UPDATE, DELETE, and MERGE operations directly to data as it streams into BigQuery.
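As a concrete (hypothetical) example of the kind of targeted mutation involved, here is a BigQuery MERGE that upserts staged rows into a target table, expressed as a query string you might submit via the BigQuery client library. The project, dataset, table, and column names are all invented for illustration:

```python
# Hypothetical upsert: project/dataset/table/column names are invented.
merge_sql = """
MERGE `my_project.sales.orders` AS target
USING `my_project.sales.orders_staging` AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET target.status = source.status
WHEN NOT MATCHED THEN
  INSERT (order_id, status) VALUES (source.order_id, source.status)
"""

# With the google-cloud-bigquery client this would be submitted as:
#   client.query(merge_sql).result()
```

Statements like this, run against tables that are simultaneously receiving streaming inserts, are precisely the workload the post says BigQuery now handles efficiently.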
Google Cloud Serverless for Apache Spark within BigQuery is now generally available. It provides a unified developer experience in BigQuery Studio, with significant benefits like reduced total cost of ownership, enhanced performance via Lightning Engine, open-standards support, enterprise features, and Gemini-powered assistance. The post also highlights how to get started with Spark in BigQuery. Check out the blog post for more details.
When you use a managed service on Google Cloud, one question quickly comes up: how do you optimize its usage for cost effectiveness and efficiency? Google Cloud has published a hands-on guide to optimizing your Managed Service for Apache Kafka deployments for throughput and latency. It discusses key parameters that affect Kafka consumers (fetch size) and producers (acks, batch.size, linger.ms, and compression), along with a benchmarking test to help you understand them better. Check it out.
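As an illustration of the knobs the guide discusses, here is roughly how those settings map onto producer and consumer configuration, using kafka-python-style parameter names. The values are placeholders to tune via benchmarking against your own workload, not recommendations:

```python
# Placeholder values for the tuning knobs the guide covers - benchmark
# against your own workload rather than copying these numbers.
producer_config = {
    "acks": "all",              # durability vs. latency trade-off
    "batch_size": 65536,        # bytes buffered per partition before a send
    "linger_ms": 10,            # wait up to 10 ms to fill a batch
    "compression_type": "lz4",  # smaller payloads at some CPU cost
}
consumer_config = {
    "fetch_max_bytes": 52428800,          # cap on data returned per fetch
    "max_partition_fetch_bytes": 1048576, # per-partition fetch cap
}

# e.g. KafkaProducer(bootstrap_servers="...", **producer_config)
#      KafkaConsumer("topic", bootstrap_servers="...", **consumer_config)
```

Larger batches and a nonzero linger generally raise throughput at the cost of per-message latency, which is the core trade-off the guide's benchmarks explore.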
Databases
If you are working with databases, you will have seen the increasing capabilities in SQL dialects when it comes to AI-infused features, and, beyond that, the move towards using natural language instead of SQL to interact with database instances, tables, and more. The rise of Model Context Protocol (MCP) tools brings these capabilities to MCP-supported client IDEs: you can now stay in your favourite IDE or tool that supports MCP and interact with a wide range of databases via natural language. The MCP Toolbox for Databases makes it trivial to integrate this feature into modern IDEs and tools that support MCP. Check out the blog post that highlights how MCP Toolbox for Databases enables AI-assisted development.
Bigtable’s single-row read throughput has seen some significant performance improvements. The blog post highlights a 70% increase, now supporting up to 17,000 point reads per second at no extra cost to users. Fundamental to this improvement is the block cache, a low-level cache that stores frequently accessed data in DRAM; a new row cache builds on it by caching data at row granularity.
Continuing with Bigtable, check out a blog post on the Bigtable Spark connector, which allows you to directly read and write Bigtable data using Apache Spark in Scala, with SparkSQL and DataFrames. The post highlights how the connector can help accelerate data science work and make Bigtable serve as a low-latency mechanism for lookups against large-scale datasets.
Developers & Practitioners
Cloud Run continues to add features that make it an even more compelling service in the age of AI applications. How about cost-effective AI inferencing for everyone? That is now possible with NVIDIA GPU support for Cloud Run, which is generally available. You get the goodness of pay-per-second billing, scale to zero, rapid startup and scaling, full streaming support, an SLA, multi-regional GPUs, and more. Check out the blog post or one of the quickstarts.
Infrastructure
Compute Engine M4, the most performant memory-optimized VM with under 6TB of memory, is now generally available. Powered by Intel's 5th Gen Xeon processors, these VMs are best suited to workloads like SAP HANA, SQL Server, and in-memory analytics that benefit from a higher memory-to-core ratio. Check out the blog post.
G4 VMs based on the NVIDIA RTX PRO 6000 Blackwell Server Edition are now available in preview, and are expected to be globally available by the end of the year. These VMs, integrated into Google Cloud's AI Hypercomputer system, are designed to significantly boost performance for AI, graphics, and gaming workloads. Check out the blog post.
Learn more about Google Cloud
In this edition, we focus on Hyperdisk. Choosing the right block storage for your workload is crucial. Hyperdisk is Google Cloud's workload-optimized block storage, designed for the latest VM families (C4, N4, M4, and more); it delivers high-performance storage volumes that are cost-efficient, easily managed at scale, and enterprise-ready. For the different Hyperdisk flavors available on the platform and the scenarios in which each is best used, check out this blog post.
Write for Google Cloud Medium publication
If you would like to share your Google Cloud expertise with your fellow practitioners, consider becoming an author for the Google Cloud Medium publication. Reach out to me via the comments and/or fill out this form, and I'll be happy to add you as a writer.
Stay in Touch
Have questions, comments, or other feedback on this newsletter? Please send Feedback.
If any of your peers are interested in receiving this newsletter, send them the Subscribe link.