The Modern AI Stack: Ray

An open-source distributed framework used by Uber, Spotify, Instacart, Netflix, Cruise, ByteDance, and OpenAI

Benedict Neo
bitgrit Data Science Publication
6 min read · Sep 24, 2024


Image by author

History

In 2016, students and researchers at UC Berkeley’s RISELab worked on a class project to scale distributed neural network training and reinforcement learning.

That gave rise to the paper Ray: A Distributed Framework for Emerging AI Applications.

Today, Ray is an open-source distributed computing framework that makes it simple to scale and productionize Python ML workloads.

It solves three key challenges in distributed ML.

  • Removes the compute constraint: gives your code remote access to virtually unlimited compute
  • Fault tolerance: automatically reroutes failed tasks to other machines in the cluster
  • State management: shares data between tasks and coordinates state across them

But before we go any deeper, let’s understand why we need Ray.

Hardware crisis

With the LLM and GenAI boom, there is a growing gap between the demand for compute and its supply.

Let’s look at this graph below.

Source: Ray, a Unified Distributed Framework for the Modern AI Stack

We see that compute demands for training ML systems have been growing 10x every 18 months.

There is a huge gap between the compute needed to train SOTA models and the performance of a single core. Even though specialized hardware delivers impressive gains, it still falls short of demand, and the gap is growing exponentially.

Even if model sizes stopped growing today, it would take decades for specialized hardware to catch up.

The best solution now is to distribute AI workloads.

But that comes with its own challenges.

Challenges of an AI Application

Building AI applications requires developers to stitch together workloads spanning data ingestion, pre-processing, training, fine-tuning, prediction, and serving.

This is challenging as each workload requires different systems, each with its own APIs, semantics, and constraints.

Source: Ray, a Unified Distributed Framework for the Modern AI Stack

With Ray, you have one system that supports all of these workloads.

Stack of Ray libraries — unified toolkit for ML workloads.

From their docs:

Each of Ray’s five native libraries distributes a specific ML task:

  • Data: Scalable, framework-agnostic data loading and transformation across training, tuning, and prediction.
  • Train: Distributed multi-node and multi-core model training with fault tolerance that integrates with popular training libraries.
  • Tune: Scalable hyperparameter tuning to optimize model performance.
  • Serve: Scalable and programmable serving to deploy models for online inference, with optional microbatching to improve performance.
  • RLlib: Scalable distributed reinforcement learning workloads.
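To make this concrete, here’s a minimal sketch of Ray Tune in action, using the Tuner API from Ray 2.x (the toy objective function is my own illustration, not an official example):

from ray import tune

# Toy objective: Tune searches for the x that minimizes (x - 3)^2.
def objective(config):
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.grid_search([0, 1, 2, 3, 4])},
)
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)  # {'x': 3}

Each trial runs as a Ray task, so the grid is evaluated in parallel across whatever CPUs the cluster has.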

How companies are using Ray

Source: Ray, a Unified Distributed Framework for the Modern AI Stack

OpenAI used Ray to coordinate the training of ChatGPT.

Cohere used Ray alongside PyTorch, JAX, and TPUs to train its LLMs at scale.

Below is an image of Alpa, which uses Ray to schedule GPUs for distributed training.

Source: How Ray Solves Generative AI & LLM Infrastructure Challenges

Ray solves the two most common challenges for distributed training of generative models:

  • How to effectively partition the model across multiple accelerators?
  • How to set up your training to be tolerant of failures on preemptible instances?
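On the second point, here’s a hedged sketch of how fault tolerance is typically configured with Ray Train (Ray 2.x API; the empty training loop is a placeholder, not a working trainer):

from ray.train import FailureConfig, RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Your usual PyTorch training loop goes here; checkpoints
    # reported back to Ray let interrupted runs resume.
    ...

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    # Retry up to 3 times if a node (e.g. a preemptible instance) dies mid-run.
    run_config=RunConfig(failure_config=FailureConfig(max_failures=3)),
)
result = trainer.fit()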

Companies like Shopify, Spotify, Pinterest, and Roblox all leverage Ray to scale their ML infrastructure.

Shopify uses Ray in its Merlin platform to streamline ML workflows from prototyping to production, utilizing Ray Train and Tune for distributed training and hyperparameter tuning.

Spotify employs Ray for parallel model training and tuning to optimize its recommendation systems, while Pinterest utilizes Ray for efficient data processing and scalable infrastructure management.

At Roblox, Ray facilitates large-scale AI inference across hybrid cloud environments, enabling robust, scalable ML solutions.

The Core of Ray

Minimalist API

The core API of Ray is just 6 calls.

import ray

ray.init()  # connect to (or start) a Ray cluster

@ray.remote
def slow_function():
    ...

future = slow_function.remote()        # invoke asynchronously; returns an ObjectRef
result = ray.get(future)               # block until the result is ready
ref = ray.put(obj)                     # store an object in the shared object store
ready, not_ready = ray.wait([future])  # split refs into ready / not-ready

ray.shutdown()

Here’s an example of a Python Counter class becoming a Ray actor: a stateful worker whose methods run asynchronously.

Source: Ray, a Unified Distributed Framework for the Modern AI Stack
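In code, that looks roughly like this (it follows the standard Counter example from the Ray docs):

import ray

@ray.remote
class Counter:
    # A stateful actor: each instance lives in its own worker process.
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()                                 # start the actor
futures = [counter.increment.remote() for _ in range(3)]   # async method calls
print(ray.get(futures))                                    # [1, 2, 3]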

Giving Ray a go

After looking at what Ray can do, I decided to give it a try.

Say we have a helper is_prime, which checks whether a number is prime, and sum_primes, which adds up all the primes below a given limit.

import math

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def sum_primes(limit):
    return sum(num for num in range(2, limit) if is_prime(num))

print(sum_primes(100))  # 1060

Let’s time plain Python summing the primes up to 10 million, repeating the calculation 8 times.

%%time
# Sequential execution
n_calculations = 8
limit = 10_000_000
sequential_results = [sum_primes(limit) for _ in range(n_calculations)]

# CPU times: user 12min 31s, sys: 7.56 s, total: 12min 39s
# Wall time: 12min 58s

This took 13 minutes in total!

Let’s see how much Ray can speed this up.

%%time
# Parallel execution (assumes ray.init() was already called)
sum_primes_task = ray.remote(sum_primes)  # wrap the plain function as a Ray task
futures = [sum_primes_task.remote(limit) for _ in range(n_calculations)]
parallel_results = ray.get(futures)

# CPU times: user 477 ms, sys: 366 ms, total: 843 ms
# Wall time: 4min 2s

It’s a 3x improvement! Note that the driver’s CPU time is under a second: the actual work now happens in Ray’s worker processes, and the wall time drops from roughly 13 minutes to 4.

If you open the Ray dashboard (ray.init() prints its URL; by default it runs at http://127.0.0.1:8265), you can watch the progress of your jobs.

What it looks like in the Ray dashboard

I’m just getting started with Ray.

Stay tuned for my next article on computing embeddings with Ray Data!

Want to learn more about Ray?

Watch this 20-minute talk, Ray, a Unified Distributed Framework for the Modern AI Stack, by Ion Stoica, co-founder of Anyscale and Databricks.

Deeper into the Rabbit Hole 🕳️🐇

If you want a deep dive into the architecture of Ray, check out their whitepaper.

The Anyscale blog has a ton of interesting use cases and tutorials.

Here is a list of talks from Ray Summit 2023.


Speaking of the summit,

Join Ray Summit 2024!

https://raysummit.anyscale.com

Ray Summit 2024 is happening next week from Sep 30-Oct 2 in San Francisco!

If you’re in SF and considering attending, grab a ticket with the promo code RodolfoY15 for a sweet discount!

I’ll attend, so please say hi if you catch me there!

Thanks for reading

Be sure to follow the bitgrit Data Science Publication to keep updated!

Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!

Follow Bitgrit below to stay updated on workshops and upcoming competitions!

Discord | Website | Twitter | LinkedIn | Instagram | Facebook | YouTube
