Democratizing AI: How GKE Makes Machine Learning Accessible

Abdellfetah SGHIOUAR
Google Cloud - Community
5 min read · Dec 21, 2023

Image of a robot with GKE written on its chest, generated from Vertex AI Studio using Imagen 2

Generative AI has kept the GKE product team busy over the last year. This article collects a curated list of new GKE features that are especially useful for machine learning, artificial intelligence, and large language model workloads, along with some open-source and community projects that run well on GKE.

This article is largely based on content originally authored by Nathan Beach with the help of Marcus Johansson.

GPUs

Graphics Processing Units (GPUs) are a common type of hardware accelerator used for resource-intensive tasks such as machine learning (ML) training and inference and large-scale data processing. In GKE Autopilot and Standard, you can attach GPU hardware to nodes in your clusters and then allocate GPU resources to containerised workloads running on those nodes.
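
To make this concrete, here is a minimal sketch of requesting a GPU for a container using the official kubernetes Python client. It assumes your kubeconfig already points at a GKE cluster with GPU nodes (for example via gcloud container clusters get-credentials); the pod name, image, and namespace are illustrative.

    # Minimal sketch: request one NVIDIA GPU for a container on GKE.
    # Assumes `pip install kubernetes`; all names here are illustrative.
    from kubernetes import client, config

    config.load_kube_config()  # credentials from your local kubeconfig

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-demo"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="cuda-check",
                    image="nvidia/cuda:12.2.0-base-ubuntu22.04",
                    command=["nvidia-smi"],
                    # Kubernetes schedules the pod onto a node with a free GPU
                    # and gives the container exclusive use of it.
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)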

  • A3 VM, powered by NVIDIA H100 GPUs, is generally available. The A3 VM is optimised for GPU supercomputing and offers 3x faster training and 10x greater networking bandwidth compared to the prior generation. A3 can also operate at scale, enabling users to scale models to tens of thousands of NVIDIA H100 GPUs.
  • G2 VM with NVIDIA L4 GPUs offers great inference performance per dollar. The G2 VM became generally available earlier this year, and we recently announced impressive MLPerf results for it, including up to 1.8x better performance per dollar than a comparable public cloud inference offering.
  • GPU slicing on GKE: When using GPUs with GKE, Kubernetes allocates one full GPU per container even if the container needs only a fraction of it for its workload, which can lead to wasted resources and cost overruns. To improve GPU utilisation, multi-instance GPUs let you partition a single NVIDIA A100 GPU into up to seven slices, each of which can be independently allocated to a container on the node (see the sketch just after this list).
  • GPU dashboard available on the GKE cluster details page: When viewing a specific GKE cluster’s details in the Cloud Console, the Observability tab now includes a dashboard for GPU metrics. This provides visibility into the utilisation of GPU resources, including utilisation by GPU model and by Kubernetes node.
  • Autopilot now supports L4 GPUs in addition to its existing support for NVIDIA T4, A100, and A100 80GB GPUs.
  • Automatic GPU driver installation is available in GKE 1.27.2-gke.1200 and later, which enables you to install NVIDIA GPU drivers on nodes without manually applying a DaemonSet.
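
The GPU slicing item above corresponds to a small change in the pod spec. Here is a hedged sketch, assuming a node pool whose A100 GPUs were created with a 1g.5gb partition size; the node label below is the one GKE applies to such nodes, and the rest of the names are illustrative.

    # Minimal sketch: schedule a pod onto one slice of a multi-instance A100.
    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-slice-demo"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            # Steer the pod to nodes whose A100s are split into 1g.5gb slices.
            node_selector={"cloud.google.com/gke-gpu-partition-size": "1g.5gb"},
            containers=[
                client.V1Container(
                    name="cuda-check",
                    image="nvidia/cuda:12.2.0-base-ubuntu22.04",
                    command=["nvidia-smi"],
                    # "1" here is one slice, not a whole GPU, so up to seven
                    # such pods can share a single physical A100.
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)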

TPUs

Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. Whereas GPUs are general-purpose processors that support many different applications and software, TPUs are optimised to handle the massive matrix operations used in neural networks at high speed. GKE supports adding TPUs to nodes in the cluster to train machine learning models.
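
As a quick illustration of the kind of work TPUs accelerate, here is a minimal sketch using the JAX library from inside a pod scheduled on a GKE TPU node; it assumes a TPU-enabled JAX install (the jax[tpu] wheel).

    # Minimal sketch: run a large matrix multiply on a TPU with JAX.
    import jax
    import jax.numpy as jnp

    print(jax.devices())  # on a TPU node this lists TPU devices

    # jit compiles the function with XLA, which maps the matmul onto the
    # TPU's matrix units, exactly the operation TPUs are optimised for.
    @jax.jit
    def matmul(a, b):
        return jnp.dot(a, b)

    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (4096, 4096))
    b = jax.random.normal(key, (4096, 4096))

    print(matmul(a, b).shape)  # (4096, 4096)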

Orchestration and resource management

Ray on GKE

Ray.io is an open-source framework for easily scaling Python applications across multiple nodes in a cluster. Ray provides a simple API for building distributed, parallelised applications, and it is especially popular for deep learning workloads.
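
Here is a minimal sketch of Ray’s task API, assuming only pip install ray; on GKE you would typically point it at a Ray cluster deployed on the cluster (for example with the KubeRay operator) rather than running locally.

    # Minimal sketch: fan a Python function out across Ray workers.
    import ray

    ray.init()  # connects to a configured cluster, or starts a local one

    @ray.remote
    def square(x):
        return x * x

    # Each call becomes a task that Ray schedules across the cluster.
    futures = [square.remote(i) for i in range(10)]
    print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]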

Recently published resources and tutorials

Visit g.co/cloud/gke-aiml for helpful resources about running AI workloads on GKE.
