Google Cloud Platform Technology Nuggets — January 1–15, 2025 Edition
Welcome to the January 1–15, 2025 edition of Google Cloud Platform Technology Nuggets. The nuggets are also available in the form of a Podcast.
I wish you all a Happy New Year 2025 and here’s to bringing you another year of exciting Google Cloud Platform updates. It’s been a light start to 2025 and the updates are trickling in, so this edition is not that long.
If one of your resolutions this year (and it’s a good one, if you ask me) is to write and share your Google Cloud expertise with your fellow practitioners, consider becoming an author for the Google Cloud Medium publication. Reach out to me via comments and/or on LinkedIn and I’ll be happy to add you as a writer.
Machine Learning
RAG (Retrieval Augmented Generation) is one of the best-known techniques for designing Generative AI applications that need to be grounded in data sources specific to the organization. Google Cloud has announced the General Availability (GA) of the Vertex AI RAG API, a fully managed service that helps you build and deploy RAG implementations with your own data and methods. The Vertex AI RAG API supports various data sources like Google Drive and Google Cloud Storage, multiple file formats (PDF, Docs, etc.), and a choice of vector stores, and it automatically does the grunt work of an ingestion pipeline: parsing, chunking, and generating vector embeddings so the data is ready to be retrieved. It does this via the concept of a corpus of documents, which the organization manages through an API. Retrieval, ranking, and serving APIs are also available for integration into client applications, and a RAG corpus can be used as one of the Tools when calling a Gemini model directly.
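To make the parse/chunk/embed/retrieve pipeline concrete, here is a toy, self-contained sketch of the steps the managed service performs for you. Everything here (the chunk sizes, the bag-of-words "embedding", the function names) is illustrative, not the Vertex AI RAG API itself:

```python
# Toy sketch of the ingestion + retrieval steps a RAG pipeline performs.
# The managed Vertex AI RAG API does all of this for you; names and the
# bag-of-words "embedding" below are stand-ins for illustration only.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split a document into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(piece: str) -> dict[str, int]:
    """Stand-in embedding: a simple bag-of-words count vector."""
    vec: dict[str, int] = {}
    for token in piece.lower().split():
        vec[token] = vec.get(token, 0) + 1
    return vec

def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    return sum(a[t] * b[t] for t in a if t in b)

# "Corpus": ingest documents into (chunk, embedding) pairs.
corpus = [(c, embed(c)) for doc in [
    "Vertex AI RAG API manages parsing chunking and embeddings",
    "Cloud Storage and Google Drive are supported data sources",
] for c in chunk(doc)]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda item: similarity(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(retrieve("supported data sources"))
```

The real service swaps each of these toy pieces for production equivalents: document parsers for the supported file formats, learned embedding models, and a vector store of your choice.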
Check out the blog post for the details on the service, high level code and more.
Mistral AI’s newest models, Mistral Large 24.11 and Codestral 25.01, are now available on Vertex AI Model Garden. Codestral 25.01, 2.5 times faster than its predecessor, is specifically suited for coding assistance, with support for over 80 programming languages and developer tasks like code completion, fill-in-the-middle (FIM), and testing.
If you are looking to take the model for a spin, try out the sample code. Check out the blog post for more details.
When it comes to Large Language Models and the need to utilize them for task-specific activities for which you have specific data, tuning those models might be necessary. One such approach is Supervised Fine-Tuning (SFT), which is ideal when you have a specific task in mind and possess labeled data to guide the model. In the next part of their series, the authors discuss how to prime your SFT task for success, including model selection, a high-quality dataset, and evaluation methods. Check out this deep-dive blog post that covers each of those areas in terms of what is available today and best practices around them. And in case you missed the first article, which covered SFT and when you should employ it, do check it out here.
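A labeled SFT dataset is typically a JSONL file of prompt/response pairs. As a hedged illustration, the sketch below builds such a file using a "contents" list of user/model turns, which is my assumption based on the Gemini chat format; check the tuning documentation for the exact schema your model version expects:

```python
import json

# Illustrative SFT dataset: labeled examples as one JSON object per line.
# The "contents"/role/parts schema here is an assumption modeled on the
# Gemini chat format, not a verified Vertex AI tuning contract.
examples = [
    {"contents": [
        {"role": "user", "parts": [{"text": "Classify sentiment: 'Great service!'"}]},
        {"role": "model", "parts": [{"text": "positive"}]},
    ]},
    {"contents": [
        {"role": "user", "parts": [{"text": "Classify sentiment: 'Never again.'"}]},
        {"role": "model", "parts": [{"text": "negative"}]},
    ]},
]

# JSONL: one self-contained JSON object per line.
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl.splitlines()[0])
```

The key point from the article stands regardless of schema details: the quality and consistency of these labeled pairs matters more to SFT success than their sheer quantity.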
Users are placing a high premium on their interactions with search applications and are becoming conscious of search quality. In a world where efficiency is paramount, providing good search results to your users is critical. Users no longer expect to interact with search applications through keywords alone, but with natural-sounding conversational text and even with images. How do we build out multimodal search at scale? Are there some interesting techniques that we can explore? The authors propose an approach that combines Vertex AI Search and vector search, using an ensemble method with weighted Reciprocal Rank Fusion (RRF). In simpler words, the ensemble approach uses both text and image capabilities to search and then combines the results: text search with Vertex AI Search, image search with image embeddings in vector search, and a final merge with weighted RRF. Check out the blog post for more details.
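The fusion step is small enough to sketch end to end. Below is a minimal weighted RRF implementation that merges a text ranking and an image ranking; the weights, the `k` constant, and the SKU identifiers are illustrative values, not numbers from the article:

```python
# Minimal sketch of the ensemble: fuse a text-search ranking and an
# image-similarity ranking with weighted Reciprocal Rank Fusion (RRF).
# Weights and k are illustrative knobs, not values from the article.

def weighted_rrf(rankings: dict[str, list[str]],
                 weights: dict[str, float],
                 k: int = 60) -> list[str]:
    """Each doc scores sum over sources of weight / (k + rank)."""
    scores: dict[str, float] = {}
    for source, docs in rankings.items():
        w = weights[source]
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = weighted_rrf(
    rankings={
        "text":  ["sku-42", "sku-17", "sku-99"],   # e.g. Vertex AI Search results
        "image": ["sku-17", "sku-42", "sku-07"],   # e.g. image-embedding vector search
    },
    weights={"text": 0.6, "image": 0.4},
)
print(fused)
```

Because RRF only looks at ranks, not raw scores, it sidesteps the problem of calibrating text-relevance scores against image-embedding distances; the weights then let you bias the blend toward whichever modality you trust more.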
If you have made solid progress in your Generative AI applications and are looking at MLOps, one area that could benefit from efficiency is the preprocessing cycle, where data is made ready for the models to be trained on. The article proposes a distributed data preprocessing pipeline that leverages the power of Google Kubernetes Engine (GKE), a managed Kubernetes service, and Ray, a distributed computing framework for scaling Python applications.
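The underlying pattern is simple: shard the dataset, apply the same transform to each shard in parallel, then gather the results. With Ray on GKE the workers would be Ray tasks spread across cluster nodes; the self-contained sketch below uses a stdlib thread pool as a stand-in for those workers, so the transform and shard counts are purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the distributed-preprocessing pattern: shard, map a transform
# over shards in parallel, then gather. A thread pool stands in here for
# Ray workers running across GKE nodes, so the sketch is self-contained.

def preprocess(record: str) -> str:
    """A stand-in transform: normalize whitespace and lowercase."""
    return " ".join(record.split()).lower()

def preprocess_shard(shard: list[str]) -> list[str]:
    return [preprocess(r) for r in shard]

records = ["  Hello   World ", "RAY on GKE", "  scaling   PYTHON  apps "]
shards = [records[i::2] for i in range(2)]  # split into 2 shards

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(preprocess_shard, shards))

processed = [r for shard in results for r in shard]
print(processed)
```

In the real pipeline, the win comes from shards living on different machines: GKE autoscaling adds nodes as the dataset grows, and Ray schedules shard tasks onto them without the transform code changing.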
Retailers have been slow to build out their AI foundation, primarily due to significant existing investment in resources that are localized inside their store fronts. The article looks at “inference at the edge”, a technique that runs AI-optimized applications on local devices without relying on distant cloud servers, and how it can transform retail assets into powerful tools. It starts off with the various digital assets that could be tapped into, with employees accessing this wealth of information via digital conversational agents. It then explores the CPU vs. GPU question and more. Check out this guide, which provides a number of ways for retailers to get started.
Identity and Security
Security is a shared responsibility between the cloud provider and the customer. One expects the cloud provider to keep adding features that notify the customer as early as possible about things that should not be happening, and better still, to upgrade existing tools with that capability. One such recently announced feature is Google Cloud Abuse Event Logging. Important abuse and security notifications, like malware, leaked service account keys, crypto mining incidents, etc., were already being tracked, but customers were notified only via email. This has now been augmented with Google Cloud Abuse Event Logging, which logs abuse events using Cloud Logging. That means you can see them in your Google Cloud Logs Explorer, automate reporting by tracking specific events, and see historical trends. Check out the blog post to understand this along with the log event format for abuse events.
There is a new security series titled “How Google Does It” that will cover insights, observations, and top tips about how Google approaches some of today’s most pressing security topics, challenges, and concerns. The first episode of the series covers threat detection and how Google makes it high-quality, scalable, and modern. The article stays at a high level but stresses best practices like automation everywhere, building an asset inventory, and more.
While Google Cloud Next 2025 is around 3 months away, the security team has taken the lead and highlighted what’s in store at the event when it comes to Google Cloud and security. Check out the details. If you are looking for Google Cloud Next registration, check out the main site.
Customer Stories
Deutsche Börse Group began developing a new cloud-native, purpose-built trading platform that can trade all types of assets, from equities to ETFs. Depending on the patterns of the markets, there could be different types of investors, and a key consideration is the various connectivity capabilities they would have, or need, to connect to and use the platform. The architecture used direct ingress to Google Cloud and leveraged a Global External Proxy Network Load Balancer (GEPNLB) for traffic from both TCP/IP socket and WebSocket clients. Each market environment utilizes its own set of Network Endpoint Groups (NEGs) and Google Kubernetes Engine clusters. Check out more on the architecture in this blog post.
Management Tools
What is a “blast radius”? It is a term used to describe the potential damage or impact of an explosion or security breach. Since we are talking about the extent of damage, it is a good way to think about minimizing the impact of new changes being rolled into production, so that they affect as few users as possible. We know that any change (even the smallest of configuration changes) is likely to result in some issues, which can mean a loss of functionality or even full-blown loss of access to the application for your users. You want to minimize that as much as possible. The Google Workspace Site Reliability Engineering team’s approach “to reducing the risk of global outages is to limit the ‘blast radius,’ or extent, of an outage by vertically partitioning the serving stack. The basic idea is to run isolated instances (‘partitions’) of application servers and storage”. Check out the blog post for more details on this.
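To make the partitioning idea concrete, here is a toy sketch (not the Workspace team’s implementation) of the routing that makes partitions useful for limiting blast radius: each user is deterministically mapped to one isolated partition, so a risky change rolled out to a single partition can only touch that slice of users. The partition count and user IDs are arbitrary:

```python
import hashlib

# Toy illustration of vertical partitioning: users map deterministically
# to isolated partitions, so a change rolled out to one partition can
# only affect that slice of users. NUM_PARTITIONS is arbitrary here.

NUM_PARTITIONS = 4

def partition_for(user_id: str) -> int:
    """Stable hash so a given user always lands in the same partition."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Roll out a risky change to partition 0 only: the blast radius is
# roughly 1/NUM_PARTITIONS of users instead of everyone.
canary_users = [u for u in ("alice", "bob", "carol", "dave")
                if partition_for(u) == 0]
print(canary_users)
```

The stability of the mapping is the important property: because a user never hops between partitions, a regression in the canary partition stays contained there while the other partitions keep serving the known-good version.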
Developers and Practitioners
When creating a conversational agent, our mental model usually centers on prompts or queries coming from the user and the agent responding to them. While we understand that the responses can be multimodal in nature, here is an interesting post that uses the agent’s response to drive, i.e. update, the web user interface. Imagine a scenario where you have embedded chat functionality in your web page and the user asks for a product recommendation. The response can be used to update sections of the page with the results. All of this is done via Dialogflow, integrated into your web page through the Dialogflow Messenger component. Check out the post for more details.
What do we mean by identifying natural breaks in a video via scene change detection technology? If we could do that, would advertisers be better off placing ads at natural points in the video to boost user engagement? That’s the premise of this article titled “Enhance viewer engagement with gen AI-powered scene detection for ads”.
The article shows how Gemini can be asked to understand the nuances of video content and generate very granular contextual metadata, and it provides a proof-of-concept architecture for this scenario along with a notebook for video analysis.
Learning: JAX from the lens of a PyTorch developer
If you are a PyTorch developer, familiar with its building blocks but itching to get started with the JAX framework, here is a tutorial that helps you understand JAX constructs through a PyTorch lens. The example in the blog post trains a simple neural network in both frameworks for the classic machine learning (ML) task of predicting which passengers survived the Titanic disaster. If you would like to jump directly into the code samples, check out this Kaggle notebook.
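A tiny taste of the mental shift the tutorial walks through: instead of calling `loss.backward()` and reading `.grad` off mutable tensors, JAX transforms a pure function of the parameters into its gradient function. The linear model and data below are my own minimal example, not the tutorial’s Titanic network:

```python
import jax
import jax.numpy as jnp

# PyTorch-to-JAX shift in miniature: jax.grad turns a pure loss function
# of the parameters into a new function that computes d(loss)/d(params),
# rather than mutating .grad fields on tensors.

def loss(w, x, y):
    """Mean squared error of a linear model y_hat = w * x."""
    return jnp.mean((w * x - y) ** 2)

grad_fn = jax.grad(loss)   # gradient w.r.t. the first argument, w

x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])   # generated by y = 2x, so w=2 is optimal

g = grad_fn(0.0, x, y)   # gradient at w=0; analytically -2*mean(x*y) = -56/3
print(float(g))
```

The same `grad_fn` composes with `jax.jit` for compilation and `jax.vmap` for batching, which is the functional workflow the tutorial contrasts with PyTorch’s imperative autograd.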
Stay in Touch
Have questions, comments, or other feedback on this newsletter? Please send Feedback.
If any of your peers are interested in receiving this newsletter, send them the Subscribe link.