Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Meet the New GPU-Enabled Era of Google Cloud Run

2 min read · Aug 22, 2024


Google Cloud is breaking new ground by introducing NVIDIA L4 GPU support to Cloud Run, now available in public preview. This powerful addition allows developers to deploy AI inference workloads and other compute-intensive applications with unmatched efficiency, offering low serving latency and rapid deployment times.

With this launch, Google Cloud solidifies its position as the only public cloud offering a managed serverless platform powered by GPUs.

But that’s not all.

On the same day, Cloud Functions will be rebranded as ‘Cloud Run functions’ and will inherit all the capabilities of Cloud Run, including GPU support. This significant upgrade is also entering public preview, giving developers access to a more robust and versatile platform.

What’s New with Cloud Run and GPUs?

With the introduction of GPU support, Cloud Run enables developers to run AI inference workloads, including open-source large language models (LLMs), directly on the platform. The public preview is initially available only in the us-central1 region. Expansion to other regions is planned by Q4 2024, when the feature is expected to reach general availability (GA).
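To make the preview setup a bit more concrete, here is a minimal sketch in Python that assembles a gcloud deploy command requesting one NVIDIA L4 GPU in us-central1. The service name and image path are hypothetical placeholders, and the CPU and memory values are my assumption about the preview's minimum requirements, so check the flags and values against the current gcloud reference before relying on them.

```python
import shlex


def build_deploy_command(service, image, region="us-central1"):
    """Assemble (but do not run) a gcloud command that requests one L4 GPU."""
    return [
        "gcloud", "beta", "run", "deploy", service,
        "--image", image,
        "--region", region,          # preview region named in this article
        "--gpu", "1",                # number of GPUs per instance
        "--gpu-type", "nvidia-l4",   # the GPU model announced for Cloud Run
        "--cpu", "4",                # assumption: preview minimum
        "--memory", "16Gi",          # assumption: preview minimum
        "--no-cpu-throttling",       # assumption: CPU always allocated with GPUs
    ]


if __name__ == "__main__":
    # Hypothetical service name and image path, for illustration only.
    cmd = build_deploy_command(
        "my-llm-service",
        "us-docker.pkg.dev/my-project/my-repo/my-image",
    )
    print(shlex.join(cmd))
```

The script only prints the command, so you can review it before pasting it into a terminal.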

Here’s what you can expect:

  • Real-Time Inference and Prediction: Use lightweight variants of open-source models like Gemma 2B/7B, Llama 3 8B, and Mixtral 8x7B for real-time predictions.
  • Custom Model Serving: Easily serve custom fine-tuned large language models.
  • Compute-Intensive Applications: From image recognition to video transcoding and streaming, Cloud Run is now equipped to handle a wide range of demanding applications.
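For a feel of what a real-time inference service looks like from the container's side, here is a minimal sketch using only the Python standard library. Cloud Run passes the listening port in the PORT environment variable. The `generate` function is a hypothetical stand-in for a real model call (Gemma, Llama, or a custom fine-tuned model), and the JSON request shape is my own assumption, not a Cloud Run API.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def generate(prompt: str) -> str:
    """Hypothetical placeholder: a real service would run the model on the GPU here."""
    return f"echo: {prompt}"


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            prompt = str(payload["prompt"])
        except (ValueError, KeyError, TypeError):
            self._send(400, {"error": "expected a JSON body with a 'prompt' field"})
            return
        self._send(200, {"completion": generate(prompt)})

    def _send(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int) -> ThreadingHTTPServer:
    # Listen on all interfaces so the platform can route requests to the container.
    return ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)


if __name__ == "__main__":
    # Cloud Run sets PORT; 8080 is the fallback for local runs.
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

In practice you would load the model once at startup, outside the request handler, so that each request only pays for inference and not for model loading.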

Why Choose Cloud Run on GPU?

  • Performance: Cloud Run minimizes infrastructure latency, ensuring your models are served efficiently. It can scale GPU instances within seconds to handle sudden spikes in traffic effortlessly.
  • Cost Efficiency: With Cloud Run, you only pay for what you use. The platform automatically scales down to zero when there’s no incoming traffic, meaning no charges for idle CPU or GPU resources.
  • Developer Velocity: Cloud Run’s serverless architecture and rapid deployment capabilities empower developers to iterate quickly, making testing, releasing, and updating applications faster than ever.
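The cost point is easiest to see with a little arithmetic. The sketch below compares an always-on instance with one that scales to zero when idle. The per-second rate is a made-up placeholder, not real Cloud Run pricing, and the model ignores details such as minimum billing increments and the idle period before an instance shuts down.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical placeholder rate, NOT actual Cloud Run pricing.
HYPOTHETICAL_RATE_PER_SECOND = 0.001


def daily_cost(active_seconds, rate_per_second):
    """Cost for one day, billing only the seconds an instance is running."""
    if not 0 <= active_seconds <= SECONDS_PER_DAY:
        raise ValueError("active_seconds must fit within one day")
    return active_seconds * rate_per_second


def savings_fraction(active_seconds):
    """Share of the always-on cost avoided by scaling to zero when idle."""
    return 1 - active_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    busy = 2 * 60 * 60  # assume the service handles traffic two hours a day
    always_on = daily_cost(SECONDS_PER_DAY, HYPOTHETICAL_RATE_PER_SECOND)
    scaled = daily_cost(busy, HYPOTHETICAL_RATE_PER_SECOND)
    print(f"always on:     {always_on:.2f}")
    print(f"scale to zero: {scaled:.2f}")
    print(f"saved:         {savings_fraction(busy):.0%}")
```

With two busy hours out of 24, the scale-to-zero service is billed for one twelfth of the always-on time, whatever the actual rate turns out to be.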

Read the Cloud Run GPU Blog

Read the Cloud Run functions Blog

Sign up for Cloud Run GPUs here

Subscribe to The Cloud Pilot

Follow me on LinkedIn: Udesh Udayakumar

Thanks for reading! The Cloud Pilot, signing off…


Published in Google Cloud - Community



Written by Udesh Udayakumar

The Cloud Pilot | Google Cloud. If you like my articles, buy me a pizza: https://www.buymeacoffee.com/thecloudpilot
