Google Cloud Platform Technology Nuggets — May 16–31, 2025
Welcome to the May 16–31, 2025 edition of Google Cloud Platform Technology Nuggets. The nuggets are also available in the form of a Podcast.
Agent Development Kit Hackathon
If you are into developing agents using the Agent Development Kit (ADK), consider taking part in the Google Cloud-sponsored hackathon, which has $50,000 in total prize money. Hurry, since the deadline is June 24, 2025. Check out the blog post for more details.
AI and Machine Learning
As we have mentioned before, if you are invested in Google Cloud AI, there is a monthly summary that highlights everything announced in Google AI. It is an excellent summary, and the second edition is here.
It has been just a few weeks since Cloud Next ’25, where we saw media model updates around Imagen, Veo and more. With the pace at which things are moving, and with consumer and developer expectations through the roof, we now have the next set of updates to the media models:
- Imagen 4 text-to-image generation on Vertex AI in public preview.
- Veo 3, the latest state-of-the-art video generation model from Google DeepMind.
- Lyria 2, Google’s text-to-music model.
Check out the blog post, which covers a range of prompts and the lovely media outputs these models now produce.
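For instance, here is a minimal sketch of calling Imagen from Python with the google-genai SDK; the project ID and the preview model ID are placeholders, so check the Vertex AI Model Garden card for the current Imagen 4 identifier:

```python
# Minimal sketch: text-to-image on Vertex AI via the google-genai SDK.
# The model ID below is an assumption; use the current Imagen 4 preview ID.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project", location="us-central1")

response = client.models.generate_images(
    model="imagen-4.0-generate-preview",  # hypothetical preview ID
    prompt="A watercolor painting of a lighthouse at dawn",
    config=types.GenerateImagesConfig(number_of_images=1),
)
# Save the first generated image to disk.
response.generated_images[0].image.save("lighthouse.png")
```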
The Gemini 2.5 Pro and Flash models have gained extended capabilities:
- You now have access to Thought summaries, which help you validate how the model went about servicing the prompt (a minimal sketch of requesting them follows this list). If you haven’t looked at these summaries yet, they are interesting to watch and learn from.
- Deep Think mode, an enhanced reasoning mode, is now available for complex tasks like coding and math.
- Additionally, there are security enhancements to prevent prompt injection attacks.
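Here is a minimal sketch of requesting thought summaries via the google-genai SDK, assuming Vertex AI access; the project ID is a placeholder:

```python
# Minimal sketch: ask Gemini 2.5 to include thought summaries in the response.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project", location="global")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What is the sum of the first 50 prime numbers?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Thought-summary parts are flagged with part.thought; the rest is the answer.
for part in response.candidates[0].content.parts:
    label = "Thought summary" if part.thought else "Answer"
    print(f"--- {label} ---\n{part.text}")
```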
It all begins with a prompt, doesn’t it? Sure, but the experience still matters. Vertex AI Studio has been revamped and now looks a lot like AI Studio. The revamped interface provides a solid experience: help with refining prompts, a prompt gallery, model cards, sensible defaults for most model settings, grounding your responses with Google Search or RAG Engine, and more. That’s quite a list, and I suggest you check out the blog post or, better still, jump directly to Vertex AI Studio.
While we are speaking of the new experience that Vertex AI Studio brings to the table, Colab Enterprise does not want to be left behind either. There are multiple enhancements there too, from Gemini assistance (code completion, code generation, code explanation, error fixing) to sample notebooks and a revamped experience. Check out the blog post for more details.
Enterprises are rushing to develop AI applications that can search and surface the right results from their corpus of data. Getting the right results is still challenging, and multiple pre- and post-processing techniques are in use to improve them. Google Cloud has launched a new state-of-the-art Vertex AI Ranking API to boost the precision of information surfaced within search, agentic workflows, and retrieval-augmented generation (RAG) systems. But how does a ranking system work, and where does it fit in? A ranking system is a refinement layer that takes the candidate list from your existing search or retrieval system and re-orders it based on deep semantic understanding, as shown below. This ensures better results.
To do that, the Vertex AI Ranking API offers two ranking models: semantic-ranker-default-004 and semantic-ranker-fast-004. Check out the blog post for details on how to get started, a demo and more.
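As a rough sketch of where the API sits, the Discovery Engine client library (which backs the Ranking API) lets you re-rank candidates your retriever already produced; the project and document contents below are placeholders:

```python
# Rough sketch: re-rank retrieval candidates with the Vertex AI Ranking API.
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.RankServiceClient()

# Resource name of the ranking config (project ID is a placeholder).
ranking_config = client.ranking_config_path(
    project="your-project", location="global", ranking_config="default_ranking_config"
)

request = discoveryengine.RankRequest(
    ranking_config=ranking_config,
    model="semantic-ranker-default-004",
    query="how do I migrate a PostgreSQL database to AlloyDB?",
    records=[  # candidates from your existing search/retrieval system
        discoveryengine.RankingRecord(id="1", title="AlloyDB migration guide",
                                      content="Steps to migrate from PostgreSQL..."),
        discoveryengine.RankingRecord(id="2", title="Cloud SQL pricing",
                                      content="Pricing details for Cloud SQL..."),
    ],
)

response = client.rank(request=request)
for record in response.records:  # returned in relevance order, with scores
    print(record.score, record.title)
```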
Cloud Run is probably one of the easiest ways to host your container applications at scale. It is well positioned to become the environment for hosting your AI applications too, and it might win that battle given the minimal steps needed; deployment is just a click away from multiple surfaces across Google products. You can deploy your agent app directly to Cloud Run from AI Studio, from Vertex AI Studio and more. Not just that: Cloud Run now has an MCP server, so you can work with your AI agents to deploy apps with a few commands. Check out the couple of blog posts that cover this in detail: blog 1 and blog 2.
With ever more powerful mobile phones, one expects that a vast majority of consumers will prefer running machine learning on-device. But how does your model perform across the vast range of mobile devices out there? Enter Google AI Edge Portal, in private preview: Google Cloud’s new solution for testing and benchmarking on-device machine learning (ML) at scale. Check out the blog post.
Vertex AI Model Garden has Anthropic’s newest generation of the Claude model family: Claude Opus 4 and Claude Sonnet 4. These are generally available as a Model-as-a-Service (MaaS) offering. Check out the blog post for more details and to get started with these models.
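If you want to try them from Python, a minimal sketch with Anthropic’s Vertex SDK might look like the following; the region and the versioned model ID are assumptions, so check the model card for current values:

```python
# Minimal sketch: calling Claude on Vertex AI via Anthropic's SDK
# (pip install "anthropic[vertex]").
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="your-project", region="us-east5")

message = client.messages.create(
    model="claude-sonnet-4@20250514",  # assumed versioned ID; verify on the model card
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
)
print(message.content[0].text)
```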
And that’s not all. We have Mistral AI’s Le Chat Enterprise, a generative AI work assistant, now available in the Cloud Marketplace, and Mistral OCR 25.05, a leading OCR model, available in Vertex AI Model Garden. Check out the blog post.
Containers and Kubernetes
llm-d is a Kubernetes-native, high-performance distributed LLM inference framework. Google has joined a community effort to make this project widely available. As per the blog post, llm-d has three key innovations:
- Instead of traditional round-robin load balancing, llm-d includes a vLLM-aware inference scheduler that routes requests to instances with prefix-cache hits and low load, achieving latency SLOs with fewer hardware resources (a toy sketch of this idea appears below).
- To serve longer requests with higher throughput and lower latency, llm-d supports disaggregated serving, which handles the prefill and decode stages of LLM inference with independent instances.
- llm-d introduces a multi-tier KV cache for intermediate values (prefixes) to improve response time across different storage tiers and reduce storage costs.

llm-d works across frameworks (PyTorch today, JAX later this year) and both GPU and TPU accelerators, to provide choice and flexibility.
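To make the scheduler idea concrete, here is a toy, framework-agnostic sketch (not llm-d’s actual implementation) of routing a request to the replica with the best trade-off between prefix-cache hits and load:

```python
# Toy illustration of prefix-cache-aware scheduling: prefer replicas that
# already hold a long prefix of the prompt in their KV cache, penalized
# by current load. Real schedulers work on cache blocks, not strings.
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    load: float  # e.g. fraction of KV-cache blocks in use, 0..1
    cached_prefixes: set[str] = field(default_factory=set)

def cached_prefix_len(replica: Replica, prompt: str) -> int:
    # Longest prompt prefix this replica has served before.
    return max((len(p) for p in replica.cached_prefixes if prompt.startswith(p)),
               default=0)

def pick_replica(replicas: list[Replica], prompt: str) -> Replica:
    def score(r: Replica) -> float:
        hit = cached_prefix_len(r, prompt) / max(len(prompt), 1)
        return hit - r.load  # reward cache hits, penalize busy replicas
    return max(replicas, key=score)

replicas = [
    Replica("pod-a", load=0.8, cached_prefixes={"You are a helpful assistant."}),
    Replica("pod-b", load=0.2),
]
# pod-a wins despite higher load, because most of the prompt is already cached.
print(pick_replica(replicas, "You are a helpful assistant. Translate ...").name)
```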
GKE Data Cache, a new feature for Google Kubernetes Engine designed to significantly improve the performance of read-intensive applications by leveraging high-speed local SSDs as a cache layer, is now generally available. It works with existing Persistent Disk or Hyperdisk volumes, automatically caching frequently accessed data to reduce read latency and increase query throughput. Check out the blog post.
Identity and Security
There are a couple of CISO bulletins in this edition. The first one highlights what it takes to pursue the goal of making Google Cloud the most secure cloud. The bulletin discusses how integrating security early in the development process, known as “shifting left,” is crucial for building secure products, along with how threat intelligence, AI and data science pitch in throughout the process.
The second installment this month focuses on how government agencies can leverage Artificial Intelligence (AI) to improve threat detection and simultaneously reduce costs.
Data Analytics
The integration of BigQuery with Vertex AI models has opened up multiple opportunities to utilize the power of the models within the database layer itself. One such area is combining unstructured data with structured data via the models, producing results that would previously have been too complex or required multiple hoops to achieve. Consider the following: you have a bunch of images stored in Google Cloud Storage and have configured an external table in BigQuery to reference them. You can now use AI functions inside BigQuery that fire prompts over the unstructured data (“Which city is this picture from?”) and generate additional data in a structured way. As the blog post states, “a new BigQuery feature called AI.GENERATE_TABLE(), allows you to automatically convert the insights from your unstructured data into a structured table within BigQuery, based on the provided prompt and table schema.”
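A rough sketch of what that could look like from Python follows; the dataset, model name and the exact argument shapes of AI.GENERATE_TABLE are illustrative, so consult the documentation for the precise syntax:

```python
# Rough sketch: structured output from images via AI.GENERATE_TABLE,
# run through the BigQuery Python client. All names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT city
FROM AI.GENERATE_TABLE(
  MODEL `my_dataset.gemini_model`,    -- remote Vertex AI model (hypothetical)
  TABLE `my_dataset.street_photos`,   -- object table over Cloud Storage images
  STRUCT(
    'Which city is this picture from?' AS prompt,
    'city STRING' AS output_schema))  -- schema of the generated table
"""
for row in client.query(sql).result():
    print(row.city)
```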
Apache DataSketches is an open-source library of sketches: specialized streaming algorithms that efficiently summarize large datasets. BigQuery has announced the availability of Apache DataSketches functions. Check out the blog post to understand what sketches are, see sample sketches, learn how customers are using them, and find out how to get started.
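To get a feel for what a sketch buys you, here is a small example using the open-source Apache DataSketches Python bindings (pip install datasketches) rather than the new BigQuery functions:

```python
# A KLL sketch summarizes a stream in small, mergeable state, letting you
# ask for approximate quantiles without keeping the raw values around.
import random
from datasketches import kll_floats_sketch

sketch = kll_floats_sketch()
for _ in range(1_000_000):
    sketch.update(random.gauss(100.0, 15.0))

# Approximate median and p99 of a million values, from a few KB of state.
print("median ~", sketch.get_quantile(0.5))
print("p99    ~", sketch.get_quantile(0.99))
```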
Several updates to the Google Data Cloud lakehouse architecture have been announced, some of which include:
- BigLake Iceberg native storage now leverages Google Cloud Storage.
- BigLake-managed Iceberg data can now be managed via BigQuery (GA) and AlloyDB for PostgreSQL (Preview).
- A new Lightning Engine (Preview) that boosts Apache Spark performance.
Check out the blog post for more details.
Speaking of Lightning Engine, it is a high-performance query engine that pushes Spark performance to new limits. As the blog post states, “For example, at a 10TB dataset size, Lightning Engine accelerates Spark query performance by 3.6x on TPC-H-like workloads when compared to open source Spark running on similar infrastructure.” Lightning Engine is available in preview in both Google Cloud Serverless for Apache Spark and the Dataproc on Google Compute Engine premium tiers.
Databases
Database Center is an AI-assisted dashboard that gives you one centralized view across your entire database fleet. A blog post highlights what Database Center brings to the table, stressing its focus on providing enhanced performance and health monitoring for all Google Cloud databases, including Cloud SQL, AlloyDB, Spanner, Bigtable, Memorystore, and Firestore.
When it comes to working with and querying databases, natural language queries are seeing a surge, and the expectation is that all databases will support them, with translation to SQL handled behind the scenes. NL2SQL is supported across Google Cloud database products like BigQuery and AlloyDB, along with Vertex AI integration across the board. The first part in a new series aims to explore the technical internals of Google Cloud’s text-to-SQL agents. It highlights approaches to context building and table retrieval, how to evaluate text-to-SQL quality effectively with LLM-as-a-judge techniques, the best approaches to LLM prompting and post-processing, and more. Check out the blog post.
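As a toy illustration of the context-building step, a text-to-SQL prompt can simply inline the candidate table schemas; this is an illustrative prompt shape via the google-genai SDK, not Google’s production agent:

```python
# Toy text-to-SQL sketch: supply table schemas as context, ask for SQL only.
from google import genai

client = genai.Client(vertexai=True, project="your-project", location="global")

schema_context = """
Table orders(order_id INT64, customer_id INT64, total NUMERIC, created_at TIMESTAMP)
Table customers(customer_id INT64, name STRING, country STRING)
"""

question = "Top 5 countries by total order value in 2024"

prompt = (
    "You are a text-to-SQL assistant. Using only these tables:\n"
    f"{schema_context}\n"
    f"Write a single BigQuery SQL query answering: {question}\n"
    "Return only the SQL."
)
print(client.models.generate_content(model="gemini-2.5-flash", contents=prompt).text)
```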
With the rise of more capable AI models with multi-modal support, queries that involve both structured and unstructured data are expected to become more prominent, and databases will be expected to support them well. AlloyDB has introduced enhancements to AlloyDB AI’s ScaNN index to improve the performance and quality of search over structured and unstructured data. The blog post highlights filtered vector search, a technique that combines traditional database filters with vector search to improve relevance. These filters are applied based on their selectivity, i.e. whether a filter matches a large or small percentage of the data, using pre-filtering, post-filtering, or inline filtering.
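Since AlloyDB speaks the standard PostgreSQL protocol, a filtered vector search is just SQL; in this sketch the table, embedding operator and connection details are illustrative, and the database decides between pre-, post-, or inline filtering based on the WHERE clause’s selectivity:

```python
# Sketch of a filtered vector search against AlloyDB via psycopg
# (pip install psycopg). Table, operator and host are placeholders.
import psycopg

query_embedding = "[0.12, -0.03, 0.51]"  # placeholder; normally an embedding model's output

with psycopg.connect("host=10.0.0.5 dbname=shop user=app") as conn:
    rows = conn.execute(
        """
        SELECT id, name
        FROM products
        WHERE category = %s              -- traditional filter
        ORDER BY embedding <=> %s        -- vector distance, served by the ScaNN index
        LIMIT 10
        """,
        ("running shoes", query_embedding),
    ).fetchall()
    print(rows)
```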
Looking to migrate from Cassandra to Spanner? Check out the blog post that highlights Spanner’s native support for the Cassandra Query Language (CQL) API. This enables organizations using Apache Cassandra to migrate their applications to Spanner with minimal code changes and downtime.
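To illustrate the “minimal code changes” claim, a hedged sketch: the application keeps using the open-source Cassandra driver and the same CQL, pointed at the Spanner Cassandra adapter (the local endpoint and schema here are assumptions; see the blog post for the actual setup):

```python
# Sketch: unchanged CQL through the standard Cassandra driver
# (pip install cassandra-driver), with Spanner behind the adapter.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"], port=9042)  # adapter endpoint (assumption)
session = cluster.connect("orders_keyspace")  # hypothetical keyspace

row = session.execute(
    "SELECT order_id, total FROM orders WHERE order_id = %s", (42,)
).one()
print(row.order_id, row.total)
```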
Developers & Practitioners
There has been a fair bit of action on the Java front to enable the vast pool of Java developers to start building agentic applications in their favorite language. The Spring Framework is vastly popular, and the good news is that Spring AI 1.0 is out; as the blog post states, “it is a strategic move to position Java and Spring at the forefront of the AI revolution.” Check out the blog post, which dives into what is available today, getting started, and plenty of configuration and code snippets.
Infrastructure
The AI Hypercomputer ecosystem can be a bit difficult to understand in terms of what it offers. A recent blog post highlights various enhancements aimed at improving the AI developer experience. Key improvements include Pathways on Cloud, for seamless scaling of interactive notebook workflows to vast numbers of accelerators, and Xprofiler, for in-depth performance analysis and debugging. The post also highlights pre-built container images and optimized recipes for popular frameworks like PyTorch and JAX to simplify setup and boost training efficiency.
DevOps and SRE
We talked about Personalized Service Health a while back. As the service states, it provides visibility into disruptive events impacting Google Cloud products and services relevant to your projects. All events are available in the Google Cloud console and via a variety of integration points, including custom alerts, an API, and logs. While there is a dedicated section in the Google Cloud console that surfaces this information, with the agentic era upon us the easiest way to interact with Google Cloud is via chat. And when it comes to chat, we are talking about Google Cloud Assist. Personalized Service Health is now integrated with Google Cloud Assist, and you can run queries like:
- Tell me more about the ongoing Incident ID [X] (Replace [X] with the Incident ID)
- Show me the details of Incident ID [X].
- Can you guide me through some troubleshooting steps for [impacted Google Cloud product]?
Check out the blog post for more details.
Write for Google Cloud Medium publication
If you would like to share your Google Cloud expertise with your fellow practitioners, consider becoming an author for the Google Cloud Medium publication. Reach out to me via the comments and/or fill out this form, and I’ll be happy to add you as a writer.
Stay in Touch
Have questions, comments, or other feedback on this newsletter? Please send Feedback.
If any of your peers are interested in receiving this newsletter, send them the Subscribe link.