Google Cloud Platform Technology Nuggets — March 16–31, 2024 Edition

Romin Irani
Google Cloud - Community
12 min read · Mar 31, 2024


Welcome to the March 16–31, 2024 edition of Google Cloud Platform Technology Nuggets.

Please feel free to give feedback on this issue and share the subscription form with your peers.

Google Cloud Next 2024

We are less than 10 days away from the biggest Google Cloud Event of the year and the excitement is building up. This year, the event is expected to have a large number of technical sessions, based on the feedback received last year.

Several blog posts have started to highlight key sessions to attend in their respective areas: Networking, Dev Practitioners and more. Even if you are not attending the conference, these posts are good reference points for building out a list of sessions to watch once they are posted online.

Here are some of them:

  • Dev Connect at Next ’24: This is one of the key themes of the conference, and the post highlights specific sessions and areas to hang out with fellow practitioners, covering not just Google Cloud but also how other services like Firebase, Android and more come together.
  • Networking Sessions: 12 must-attend networking and network security sessions at Next ’24. Check out the post.

If you manage IT, there is constant pressure to streamline your infrastructure, manage it seamlessly and keep costs to a minimum. Right? An interesting post highlights the top 5 questions IT pros have been asking, ranging from reducing costs, evaluating the reliability of cloud providers, AI infrastructure, scalability and control requirements and more. The post further highlights the sessions where these questions are likely to be answered. Build out that agenda, I tell you.

Infrastructure

Forrester Research has recognized Google as a Leader in The Forrester Wave™: AI Infrastructure Solutions, Q1 2024. Google received the highest scores of any vendor evaluated in both Current Offering and Strategy categories in the report. Check out the post and download the report.

NVIDIA NeMo is an open-source, end-to-end platform purpose-built for developing custom, enterprise-grade generative AI models. Looking to train models on Google Kubernetes Engine (GKE) using NVIDIA accelerated computing and the NVIDIA NeMo framework? Check out the blog post, which also discusses a reference architecture highlighting the major components, tools and common services used to train a NeMo large language model on GKE.

Google Cloud VMware Engine is now integrated with Google Cloud NetApp Volumes. This availability enables customers to resize volumes without interruption. Separating the scaling of compute from storage gives lots of flexibility and cost control too. Check out the post for more details.

Persistent Disk Asynchronous Replication (PD Async Replication) provides low recovery point objective (RPO) and low recovery time objective (RTO) block storage replication for cross-region active-passive disaster recovery (DR). It is a storage option that provides asynchronous replication of data between two regions. It can be used to manage replication for Compute Engine workloads at the infrastructure level instead of the workload level. Check out the blog post with an example of how it works.
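
The RPO idea can be made concrete with a toy model (this is not the PD Async Replication API; the sequence numbers, timestamps and function names here are illustrative): writes acknowledged on the primary but not yet applied on the secondary are what you stand to lose on failover.

```python
def failover_impact(writes, replicated_seq, failure_time):
    """Toy model of asynchronous cross-region replication.

    writes: list of (seq, timestamp) acknowledged on the primary disk.
    replicated_seq: highest sequence number applied on the secondary.
    Returns the writes lost on failover and the achieved recovery
    point, i.e. how far behind the failure the secondary's state is.
    """
    lost = [(seq, t) for seq, t in writes if seq > replicated_seq]
    replicated_times = [t for seq, t in writes if seq <= replicated_seq]
    # The secondary reflects the primary as of its newest replicated write.
    recovery_point = max(replicated_times) if replicated_times else 0.0
    return lost, failure_time - recovery_point
```

Lower replication lag shrinks both the list of lost writes and the achieved recovery point, which is why the service targets a low RPO.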

Customers

VPC Service Controls (VPC-SC) is a foundational security control that creates an isolation perimeter around managed cloud resources and networks. Via granular ingress and egress rules, you can selectively approve access across perimeter boundaries, playing a key role in preventing data exfiltration. Consider Commerzbank, a leading German bank and a trusted partner to approximately 26,000 corporate client groups and 11 million private and small-business customers. With the shift from IP addresses to API endpoints, a new approach was needed to address its data sharing and movement needs. Especially around data sharing, Commerzbank had clear criteria for evaluating any solution, and VPC-SC met all of these requirements. Check out the post to learn more.

In another interesting customer case study, consider that of Palo Alto Networks. Their solid growth, coupled with mergers and acquisitions, had led to more than 170,000 projects on Google Cloud. A large labelling exercise a few years back helped them identify the team, owner, cost center and environment for these projects (95% coverage). But the final 5% proved to be a challenge until they cracked it with BigQuery ML, the built-in machine learning feature in BigQuery. Check out the post.
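
As a toy illustration of the idea (this is not Palo Alto Networks’ actual pipeline, and it is far simpler than BigQuery ML; all names and labels are made up), a classifier can learn which tokens in project names predict each cost center from the projects that were already labelled, then score the rest:

```python
from collections import Counter, defaultdict

def train_keyword_model(labeled_projects):
    """Learn which project-name tokens predict each cost center.

    labeled_projects: list of (project_name, cost_center) pairs, a toy
    stand-in for the 95% of projects that already carried labels.
    """
    votes = defaultdict(Counter)
    for name, cost_center in labeled_projects:
        for token in name.lower().split("-"):
            votes[token][cost_center] += 1
    return {token: c.most_common(1)[0][0] for token, c in votes.items()}

def predict(model, project_name, default="unknown"):
    """Label an unlabeled project by majority vote of its known tokens."""
    token_votes = Counter(
        model[token]
        for token in project_name.lower().split("-")
        if token in model
    )
    return token_votes.most_common(1)[0][0] if token_votes else default
```

BigQuery ML lets you express the same train-then-predict loop in SQL directly over the project metadata already sitting in BigQuery, which is what made the last 5% tractable.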

Containers and Kubernetes

Managing the growth of your GKE clusters requires insight into specific limits like nodes per cluster, nodes per node pool, pods per cluster and more. Your task just got easier with the ability to directly monitor and set alerts on crucial scalability limits. Check out the post for more details.
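
Conceptually, the new alerting does what this sketch does by hand (the dimension names and limit values here are illustrative, not GKE’s actual quotas):

```python
def scalability_alerts(usage, limits, threshold=0.8):
    """Flag any scalability dimension whose usage has crossed a fraction
    of its limit, returning the utilization ratio for each offender."""
    alerts = {}
    for dim, limit in limits.items():
        ratio = usage.get(dim, 0) / limit
        if ratio >= threshold:
            alerts[dim] = ratio
    return alerts
```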

Ray, an open-source Python framework designed for scaling and distributing AI workloads, is gaining widespread acceptance. While you can run Ray deployments on VMs, the traditional challenges of running workloads yourself on VMs, namely resource efficiency and managing infrastructure, come up. A suggested alternative is to deploy Ray on GKE with KubeRay, an open-source Kubernetes operator that simplifies Ray deployment and management. Check out this post, which dives into the details of why running Ray on GKE would be the best way forward.

Continuing with Ray, are you running it on Google Kubernetes Engine (GKE)? Here is an essential blog post on running Ray securely on GKE. The post delves into the areas you need to address to harden your Ray installation on GKE and provides a solid summary of best practices using Kubernetes and GKE constructs like namespaces, RBAC, NetworkPolicy and more. Safer defaults for running Ray on Kubernetes with KubeRay are a focus area, and Terraform templates are available to spin up a multi-team environment with sample security configurations.

The final piece on Ray covers Kueue, a cloud-native queueing system that provides advanced scheduling for Ray applications on GKE. Check out this blog post, which shows how KubeRay and Kueue work together to achieve this.

Speak of training AI models, and the need for NVIDIA GPUs comes up regularly. GKE is probably one of the best choices available to deploy, scale and manage custom ML platforms. In an added boost, GKE can now automatically install NVIDIA GPU drivers. This process was manual before, and the automatic installation even allows the drivers to be precompiled for the GKE node, which can reduce the time it takes for GPU nodes to start up. Check out the post for more details.

Stanford’s Brain Inferencing Laboratory explores motor systems neuroscience and neuroengineering applications. The data for their research comes from experiments on preclinical models and human clinical studies, and is handled via a complex platform they have set up for standardized and ad hoc analyses. The components of the architecture include containers, Git, CI/CD and compute clusters, specifically GKE running in Autopilot mode. Check out the blog post for more details.

Identity and Security

This edition is heavy on security updates, so let’s begin with zero-day vulnerabilities. But what does that term mean? A zero-day vulnerability is a security flaw in an application or operating system for which no defense or patch yet exists because the vendor has not discovered it. Did you know that Google’s Threat Analysis Group (TAG) and Mandiant found that 97 zero-day vulnerabilities were exploited in 2023? Is that an improvement over 2022 or 2021? Find out about this and more in this post.

You must have heard about Assured Workloads, which allows companies to run regulated workloads in several of Google Cloud’s global regions. Now consider the requirement for your organization to adhere to compliance requirements in more than one geographic region: how do you use Assured Workloads to create regulatory boundaries using a folder structure? Check out this post.

In Google Cloud, the Organization resource sits at the top of the resource hierarchy. With the creation of this resource, you get access to the Organization Policy Service. As part of a new release, a set of organization policies has been introduced that enforce fixes for potentially insecure postures, spanning IAM, Storage and Essential Contacts. For example, a few of the policies at the IAM level include disabling service account key creation, disabling service account key upload and more. These policies apply to all customers creating new organization resources; for existing customers who already have one, the policies are available with no change required. Check out the post to learn more.

Cloud Armor plays an important role in enabling organizations to create a comprehensive DDoS mitigation strategy for their applications. One capability that plays a key role is rate limiting, via which you can curtail traffic to backend resources based on request volume. Check out this post, which highlights the rate-limiting features of Cloud Armor, the two types of actions available for rate-based rules (throttle and rate-based ban), planning your rate-limiting deployment and more.
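
To make the two actions concrete, here is a toy, in-memory model of throttle versus rate-based ban (illustrative only; Cloud Armor enforces these at the edge with its own configuration, and the class and parameter names here are made up):

```python
class RateBasedRule:
    """Toy model of a rate-based rule with a fixed counting window.

    throttle: allow up to `threshold` requests per window, deny the rest.
    ban: once a client exceeds `ban_threshold` in a window, deny all of
    its requests for `ban_duration` seconds, regardless of later rate.
    """

    def __init__(self, threshold, window, ban_threshold=None, ban_duration=60):
        self.threshold = threshold
        self.window = window
        self.ban_threshold = ban_threshold
        self.ban_duration = ban_duration
        self.counts = {}        # client -> (window_start, request_count)
        self.banned_until = {}  # client -> timestamp when the ban lifts

    def check(self, client, now):
        if now < self.banned_until.get(client, float("-inf")):
            return "deny (banned)"
        start, count = self.counts.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new counting window begins
        count += 1
        self.counts[client] = (start, count)
        if self.ban_threshold is not None and count > self.ban_threshold:
            self.banned_until[client] = now + self.ban_duration
            return "deny (banned)"
        if count > self.threshold:
            return "deny (throttled)"
        return "allow"
```

With throttle alone, a client is denied only while it exceeds the threshold; with a ban threshold, a persistent abuser stays blocked for the ban duration even after slowing down.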

Handling sensitive data with respect to security, privacy and compliance is an essential requirement for any application handling customer data. The Discovery Service, part of Google Cloud’s Sensitive Data Protection, helps identify where sensitive data resides, which is the first step. Cloud SQL is now supported by the Discovery Service, which earlier supported BigQuery and BigLake. Check out the post for more details.

Software supply chain attacks are common, and organizations need to bring in strict controls, especially when their modern applications depend heavily on open-source software. Check out this joint blog post from Citi and Google on common open-source attack vectors and the 3 main criteria to evaluate an OSS vendor.

Security will continue to be a big topic at Cloud Next ’24. The second CISO bulletin for March 2024 drops a hint about the key security topics and discussions to expect at the conference. In an earlier edition, we had covered the intersection of Gen AI and security, and the post seems to point in that direction, along with multiple other security areas that will receive attention during the event. The first CISO bulletin for March focused on psychological resilience in cybersecurity leadership.

To end the security updates, here is a slightly different but nevertheless important area. Data shows that knowledge workers spend an average of 63% of their productive time in the browser, and nearly half (48%) of all business-critical applications are now browser-based. Given this, organizations would be well served to consider an enterprise-ready browser like Chrome Enterprise. Check out a recent report by Enterprise Strategy Group, titled “Assessing Enterprise Browser Market Dynamics: Why Organizations Are Turning to Enterprise Browsers to More Effectively Secure Modern Work Styles,” and the blog post for more details.

Machine Learning

Anthropic’s popular Claude 3 family of models has started to become available in Vertex AI Model Garden. Claude 3 Sonnet and Claude 3 Haiku are generally available to all customers on Vertex AI. Check out the post for the details and steps to get started.

Storage, Databases and Data Analytics

BigQuery’s data processing has been extended to Apache Spark with the general availability (GA) of Apache Spark stored procedures in BigQuery. This brings Spark together with BigQuery under a single experience, including management, security and billing. Spark procedures are supported using PySpark, Scala and Java code. Check out the post for more details.

Two new SQL features, windowing and gap filling, are now available in preview in BigQuery to simplify time series analysis. The RANGE data type and supporting functions are also now available to complement the analysis. Check out the post for more details.
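
To build intuition for what gap filling means, here is a toy linear-interpolation version in Python (BigQuery’s actual functions and semantics differ; see the post for the real SQL):

```python
def gap_fill(points, step):
    """Fill holes in an ordered (time, value) series at a fixed step,
    linearly interpolating between the observed neighbors."""
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        out.append((t0, v0))
        t = t0 + step
        while t < t1:
            frac = (t - t0) / (t1 - t0)
            out.append((t, v0 + frac * (v1 - v0)))
            t += step
    out.append(points[-1])
    return out
```

Doing this in SQL usually means generating a dense time spine and joining against it; having it built in removes that boilerplate.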

What are Dataflow streaming modes? Do exactly-once and at-least-once processing ring a bell? Both of these modes are now supported, and understanding them is key to addressing scenarios that trade off latency, overall cost and more. Check out the post for more details.
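
The difference between the two modes comes down to duplicates. Here is a toy sketch (not the Dataflow API; Dataflow’s exactly-once mode performs this kind of deduplication, among other things, for you):

```python
def at_least_once(deliveries):
    """Process every delivery, duplicates included: cheaper and lower
    latency, but the handler must tolerate repeats."""
    return [payload for _msg_id, payload in deliveries]

def effectively_once(deliveries):
    """Deduplicate by message ID before processing, so each logical
    message affects the result exactly once."""
    seen, out = set(), []
    for msg_id, payload in deliveries:
        if msg_id not in seen:
            seen.add(msg_id)
            out.append(payload)
    return out
```

If your sink is idempotent (say, upserting keyed rows), at-least-once may be all you need; if you are counting, duplicates corrupt the result, which is where exactly-once earns its extra cost.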

High-volume data feeds coming in that need data enrichment to be actionable? A Bigtable and Dataflow combination, as illustrated in this blog post, can address this requirement. Check out the post.
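
The shape of the pattern, reduced to a toy (in the real pipeline the reference data lives in Bigtable and the lookup happens inside a Dataflow transform; the field names here are illustrative):

```python
def enrich(events, reference):
    """Attach reference-table context to each streamed event by key,
    making the raw feed actionable downstream."""
    out = []
    for event in events:
        context = reference.get(event["device_id"], {})
        out.append({**event, **context})
    return out
```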

If you’d like to read reports, TechTarget’s Enterprise Strategy Group (ESG) did an extensive study to compare the quantitative and qualitative benefits that organizations can realize with Google Cloud BigQuery when compared with alternative solutions. Download the full report here and read the post here.

Datastream, the change data capture (CDC) service, now supports SQL Server data sources. This addition provides a way to replicate data from a range of relational sources to several Google Cloud services, such as BigQuery, Cloud Storage, AlloyDB and Spanner. Check out the post for more details on the various possibilities with the new feature.

Finally, the cross-cloud functionality of BigQuery Omni and the data sharing capabilities of Analytics Hub have made it possible to access data stored in Salesforce Data Cloud and combine it with data in Google Cloud in a secure, zero-ETL fashion. Check out the post for more details.

Developers and Practitioners

Cloud Run keeps getting better and better. Now available in preview are volume mounts, which enable applications to access shared data stored in a file system. Check out the blog post, which highlights how to mount volumes using gcloud commands and scenarios (loading a vector database, serving a static website, etc.) where this feature is very useful.

If you are using the popular NoSQL database Couchbase, here is an update: you can now integrate the database with a wider range of Google Cloud services via the Couchbase connector inside Google’s Integration-Platform-as-a-Service (iPaaS) solution. Check out this post, which highlights how you can use this connector, its key features, using other Google Cloud services to analyse Couchbase data and more.

Learn Google Cloud

If you are taking your first steps with data on Google Cloud, or even otherwise, and PostgreSQL is your database of choice, you have to choose between Spanner, AlloyDB and Cloud SQL. How do you pick one of these services? Check out this post, which focuses on AlloyDB and Spanner, and determine whether you’d like to pick a specific one or use them in combination!

Stay in Touch

Have questions, comments, or other feedback on this newsletter? Please send Feedback.

If any of your peers are interested in receiving this newsletter, send them the Subscribe link.
