Google Cloud Platform Technology Nuggets — November 1–15, 2024 Edition
Welcome to the November 1–15, 2024 edition of Google Cloud Platform Technology Nuggets.
The nuggets are also available in the form of a Podcast. Subscribe to it today.
Containers and Kubernetes
We got several updates in the Containers and Kubernetes area in this edition.
A new DNS-based endpoint for GKE clusters is available today on every cluster, regardless of version or cluster configuration. It simplifies control plane access and overcomes various challenges associated with reaching the Kubernetes control plane, which previously included setting up a proxy/bastion host, complex firewall/IP configurations, and more. The new DNS-based endpoint provides a unique DNS name, or fully qualified domain name (FQDN), for each cluster control plane, applies security policies to reject unauthorized traffic, and then forwards traffic to your cluster.
Check out the blog post for more details and steps to get started.
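To get a feel for what this looks like in practice, here is a minimal, illustrative sketch (not from the blog post) that points the Kubernetes Python client at a cluster’s DNS-based endpoint. The FQDN is a placeholder, and reusing gcloud credentials for the bearer token is an assumption for brevity; the blog post has the authoritative setup steps.

```python
# Illustrative sketch: talk to a GKE control plane via its DNS-based endpoint.
# Assumptions: the FQDN below is a placeholder, and gcloud is installed and
# authenticated on the machine running this code.
import subprocess
from kubernetes import client

DNS_ENDPOINT = "gke-0123456789abcdef.us-central1.gke.goog"  # placeholder FQDN

# Reuse the caller's Google Cloud credentials as a bearer token.
token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

configuration = client.Configuration()
configuration.host = f"https://{DNS_ENDPOINT}"
configuration.api_key = {"authorization": f"Bearer {token}"}

# List the cluster's nodes to confirm connectivity through the DNS endpoint.
core_v1 = client.CoreV1Api(client.ApiClient(configuration))
for node in core_v1.list_node().items:
    print(node.metadata.name)
```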
Consider the following requirement: managing multiple Kubernetes clusters across different environments, or even across cloud providers. Check out this blog post that presents a solution for this using:
- Google Kubernetes Engine (GKE) fleets
- Argo CD, a declarative, GitOps continuous delivery tool for Kubernetes
- Connect Gateway and Workload Identity
Check out the blog post for a step-by-step process, including source code/scripts that you can use (a small illustrative sketch follows below).
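As a flavor of the GitOps piece, here is a hypothetical sketch that registers an Argo CD Application pointing at a fleet member cluster through its Connect Gateway endpoint, using the Kubernetes Python client. The repo URL, names, and the destination endpoint are placeholders; the blog post’s own scripts remain the reference.

```python
# Hypothetical sketch: create an Argo CD Application object that deploys a Git
# path to a fleet member cluster. Names, URLs, and the destination endpoint are
# placeholders; see the blog post's source code/scripts for the real setup.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig pointing at the cluster running Argo CD

application = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "demo-app", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://github.com/example-org/example-repo",  # placeholder
            "targetRevision": "HEAD",
            "path": "manifests/demo-app",
        },
        # The destination could be a fleet member reached via Connect Gateway
        # (placeholder URL below) instead of a public control plane endpoint.
        "destination": {
            "server": "https://connectgateway.googleapis.com/v1/projects/123456/locations/global/gkeMemberships/prod-cluster",
            "namespace": "demo",
        },
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="argocd",
    plural="applications",
    body=application,
)
```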
When it comes to AI training, data indicates that training large models on modern accelerators already requires clusters that exceed 10,000 nodes. Google Kubernetes Engine (GKE) has pushed the limits further, raising its supported cluster size from 15,000 nodes to 65,000 nodes. A number of features introduced over the last several months have played a role in making this possible. Check out the blog post for details and a good roundup of recent innovations like Secondary boot disk, Custom compute classes, Hyperdisk ML, and more.
Consider the flow of an AI inference model starting up in Kubernetes and getting ready to serve inference requests. Existing inference servers such as Triton, Text Generation Inference (TGI), or vLLM are packaged as containers; they take time to pull and start the container, and then move on to pulling the model. How do we optimize these steps and accelerate data loading for both inference serving containers and model weights? That is exactly what this blog post is about.
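One common technique in this space (an assumption on my part, not necessarily the exact approach in the blog post) is to mount model weights from a Cloud Storage bucket via the Cloud Storage FUSE CSI driver, so the serving container streams them instead of copying everything at startup. A minimal, hypothetical pod sketch using the Kubernetes Python client, with bucket, image, and names as placeholders:

```python
# Hypothetical sketch: a vLLM serving pod that mounts model weights from a
# Cloud Storage bucket through the GCS FUSE CSI driver. Bucket, image, service
# account, and names are placeholders.
from kubernetes import client, config

config.load_kube_config()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "vllm-server",
        "annotations": {"gke-gcsfuse/volumes": "true"},  # enables the FUSE sidecar
    },
    "spec": {
        "serviceAccountName": "inference-sa",  # placeholder, needs bucket access
        "containers": [{
            "name": "vllm",
            "image": "vllm/vllm-openai:latest",
            "args": ["--model", "/models/llama"],  # weights read from the mount
            "volumeMounts": [{"name": "model-weights", "mountPath": "/models"}],
            "resources": {"limits": {"nvidia.com/gpu": "1"}},
        }],
        "volumes": [{
            "name": "model-weights",
            "csi": {
                "driver": "gcsfuse.csi.storage.gke.io",
                "volumeAttributes": {"bucketName": "example-model-bucket"},  # placeholder
            },
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```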
Infrastructure
You’ve heard about Trillium, the sixth-generation Tensor Processing Unit (TPU). The first MLPerf training benchmark results for Trillium have been released by Google Cloud. As the blog post states, “The MLPerf 4.1 training benchmarks show that Trillium delivers up to 1.8x better performance-per-dollar compared to prior-generation Cloud TPU v5p and an impressive 99% scaling efficiency (throughput).” The blog post dives into the technical details of Trillium’s performance: which metrics are considered, how they are measured, and how Trillium compares against its predecessor, Cloud TPU v5p.
Networking
Google Cloud recently announced Service Extensions plugins for Application Load Balancers in Preview. What does this mean? A Service Extension is a custom piece of code, built using WebAssembly (Wasm), that can be injected into the load balancer’s data processing path to modify or enhance the behavior of incoming requests and responses. Think of it as a way to customize the application load balancing process (header addition/manipulation, security, custom logging, etc.) at the edge of the network with custom logic. Service Extensions plugins are now available as part of the existing traffic extension for the global external Application Load Balancer. Check out the blog post for more details.
You’ve heard about Cloud Next-Generation Firewall (NGFW), and the recommendation to move from legacy VPC firewall rules to Cloud NGFW’s powerful and flexible firewall policies. To make this process easier, a migration tool has been developed. Check out this detailed blog post that highlights three scenarios (simple, medium, and advanced complexity) for doing this migration. The tool is integrated into and available from the Google Cloud CLI.
Identity and Security
We knew this would happen someday. Multi-factor authentication (MFA) is coming to Google Cloud for users who currently sign in with just a password, and it will be rolled out to all users in a phased manner across 2025. There is strong evidence pointing to the need for this, including findings from the Mandiant Threat Intelligence team that phishing and stolen credentials remain a top attack vector.
If you’d like to set up 2-Step Verification for your Google Account, the steps are in the blog post.
The first Cloud CISO Perspectives for November 2024 is available here. Organizations continue to use legacy technology and are potentially incurring significant security costs as a result. This edition looks at “Confronting the high security cost of legacy tech” and mitigating some of the threats that come along with it.
The Q4 Security Talks 2024: Defender’s Advantage, a dedicated day-long virtual event, is here. Sign up now. If you miss the event, you can always sign up later and get access to some of the recordings.
Machine Learning
Looking to deploy large, open AI models on GKE? How about Meta’s Llama 3.1 with 405 billion parameters? Check out this step-by-step article that starts with Model Garden to locate the specific model, walks through deployment on GKE, and explains that, due to the sheer size of the model, multi-host deployment and serving is the only viable option, using LeaderWorkerSet with Ray and vLLM on GKE.
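Once a model like this is served (vLLM exposes an OpenAI-compatible HTTP API), querying it from Python is straightforward. A small illustrative sketch, with the service URL and model id as placeholders for whatever the article’s GKE Service exposes:

```python
# Illustrative sketch: query a vLLM deployment through its OpenAI-compatible
# endpoint. The service URL and model id below are placeholders.
import requests

VLLM_URL = "http://llama-service.default.svc.cluster.local:8000"  # placeholder

response = requests.post(
    f"{VLLM_URL}/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-405B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize what GKE fleets are."}],
        "max_tokens": 256,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```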
Moving on from GKE to Cloud Run: with GPU support now available, the blog post looks at deploying Meta’s Llama 3.2 1B Instruct model on Cloud Run.
Vertex AI wants to make prompting easier and more accessible to developers with two new features: Generate Prompt and Refine Prompt. Generate Prompt works on the basis of you defining your objective, and it then suggests a prompt for you. Refine Prompt works the other way: it takes your prompt and provides multiple options to improve upon it. The tools by themselves are a good way to evaluate prompts that you write and get different versions that you could use with your models. Check out the blog post for more details.
Databases
While we did cover this roundup in the last edition, it’s worth a reminder again that if you are looking for a single post that rounds up all Google Cloud Databases news in the month of October, here is the October 2024 Google Cloud Database news roundup.
What is a Translytical Database? I sure had to look up that definition. One definition said “A translytical database is a unified database that supports transactions, analytics, and operational insights in real-time retaining transactional integrity, efficiency, and scale.” Google was named a Leader in The Forrester Wave™: Translytical Data Platforms, Q4 2024 report. AlloyDB was the key differentiator here. Check out the blog post for the details and the link to download the report.
Data Analytics
Do you think NL2SQL is the future when it comes to interacting with systems that require you to write SQL? What are the current challenges with translating a natural language query into SQL? What Google Cloud solutions are out there today to help you assess NL2SQL and its current state? What is a possible solution (using BigQuery, Gemini, and a compute offering) built with Google Cloud services today? Check out this blog post.
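To make the idea concrete, here is a minimal, hypothetical NL2SQL loop (not the blog post’s solution): Gemini on Vertex AI drafts SQL from a natural language question and a known schema, and the BigQuery client executes it. The project, dataset, table, and schema are placeholders, and a real deployment needs validation and guardrails around the generated SQL.

```python
# Hypothetical sketch of an NL2SQL loop: Gemini drafts SQL from a natural
# language question plus a table schema, BigQuery executes it.
# Project, dataset, table, and schema are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel
from google.cloud import bigquery

vertexai.init(project="example-project", location="us-central1")  # placeholders
model = GenerativeModel("gemini-1.5-pro")

question = "What were the top 5 products by revenue last month?"
schema = "orders(order_id INT64, product STRING, revenue NUMERIC, order_date DATE)"

prompt = (
    "Translate the question into a single BigQuery SQL statement.\n"
    f"Table: example-project.sales.orders with schema {schema}\n"
    f"Question: {question}\n"
    "Return only the SQL."
)
sql = model.generate_content(prompt).text.strip().strip("`")

# Execute the generated SQL (after whatever review/validation you require).
for row in bigquery.Client().query(sql).result():
    print(dict(row))
```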
Processing a PDF document and being able to answer queries on it has become one of the most common use cases for Generative AI solutions. But this is not an easy task, and developers are currently grappling with challenges around not just getting accurate results, but also ingesting documents and making them available for querying as they come in. BigQuery, in combination with Document AI’s document processing capability, wants to change that. Check out this blog post that presents a RAG solution using BigQuery.
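For flavor, here is a hypothetical sketch of just the retrieval step in a BigQuery-based RAG pipeline: embed the user question with ML.GENERATE_EMBEDDING and find the closest document chunks with VECTOR_SEARCH. The project, dataset, table, model, and column names are all placeholders, and the blog post’s solution remains the reference.

```python
# Hypothetical sketch of the retrieval step in a BigQuery RAG pipeline.
# Project, dataset, table, embedding model, and column names are placeholders.
from google.cloud import bigquery

question = "What is the refund policy described in the contract?"

sql = """
SELECT base.uri, base.chunk_text, distance
FROM VECTOR_SEARCH(
  TABLE `example-project.docs.chunk_embeddings`,     -- placeholder table
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `example-project.docs.embedding_model`,  -- placeholder model
      (SELECT @question AS content)
    )
  ),
  top_k => 5
)
"""

job = bigquery.Client().query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("question", "STRING", question)]
    ),
)
for row in job.result():
    print(row.uri, row.distance)
```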
Customer Stories
Etsy, the online e-commerce marketplace for handmade, vintage, and unique items, has built its own service platform on Google Cloud Run: a customized platform that streamlines the development, deployment, and management of microservices. Check out this story of what went into building this Toolbox, their choice of Cloud Run, the challenges, and the road ahead.
In our second story, check out how Verve, a leading digital advertising solutions provider, utilized some of this year’s new Google Cloud offerings (C4 machines, GKE Gateway, and Custom Compute Classes) to tackle costs, latency, and more.
Application Modernization
We’ve all played pinball! But how many of us have attempted to connect a pinball machine to the cloud? This hardware solution was one of the big attractions at Innovators Hive at Cloud Next ’24. At that point, the solution demonstrated how the machine was integrated with Google Cloud to capture events and more. And what’s coming next? A feature to analyze your game, use AI to give you recommendations, and more. Check out this blog post that shows how an existing pinball machine with a legacy stack was brought online with a custom Pub/Sub solution on Google Cloud, and what’s next in store for it.
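For flavor, publishing a game event to Pub/Sub from whatever bridges the machine’s legacy stack is only a few lines of Python; the project, topic, and event payload below are purely illustrative, not the project’s actual schema.

```python
# Illustrative sketch: publish a pinball game event to a Pub/Sub topic.
# Project, topic, and event fields are placeholders, not the project's schema.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "pinball-events")  # placeholders

event = {"machine_id": "pinball-01", "event": "ball_drained", "score": 125_000}
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print(f"Published message {future.result()}")
```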
Learn Google Cloud
An ebook, Building a Secure Data Platform with Google Cloud, is available for download. The book is a good overview of key Google Cloud services that help secure data across different usage patterns. It’s a good refresher on how services like Identity and Access Management (IAM), VPC Service Controls, BigQuery’s dataset sharing capabilities, Sensitive Data Protection for PII data, and more fit together. Check out the blog post for an overview and to download the eBook.
Stay in Touch
Have questions, comments, or other feedback on this newsletter? Please send Feedback.
If any of your peers are interested in receiving this newsletter, send them the Subscribe link.