The Modern AI Stack: Ray
An open-source distributed framework used by Uber, Spotify, Instacart, Netflix, Cruise, ByteDance, and OpenAI
History
In 2016, students and researchers at Berkeley's RISELab worked on a class project to scale distributed neural network training and reinforcement learning.
That gave rise to the paper Ray: A Distributed Framework for Emerging AI Applications.
Today, Ray is an open-source distributed computing framework that makes it simple to productionize and scale Python ML workloads.
It solves three key challenges in distributed ML.
- Compute constraints: remote access to virtually infinite compute
- Fault tolerance: failed tasks are automatically rerouted to other machines in the cluster
- State management: data is shared between tasks and state is coordinated across the cluster (see the sketch after this list)
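Here is a minimal sketch of what those three points look like in code (the array and the column_sum task are illustrative, not taken from Ray's docs):

import numpy as np
import ray

ray.init()

# Put a large array into Ray's shared object store once; tasks then
# receive a lightweight reference instead of a copy of the data.
data_ref = ray.put(np.random.rand(1000, 1000))

@ray.remote
def column_sum(data):
    # Ray dereferences the ObjectRef before the task body runs
    return data.sum(axis=0)

# Tasks fan out across the cluster; if a worker dies, Ray automatically
# retries its task on another machine.
results = ray.get([column_sum.remote(data_ref) for _ in range(4)])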
But before we go any deeper, let’s understand why we need Ray.
Hardware crisis
With the LLM and GenAI boom, there is a growing gap between the demand for and the supply of compute.
Let’s look at this graph below.
We see that compute demands for training ML systems have been growing 10x every 18 months.
There is a huge gap between the compute required to train SOTA models and what a single processor can deliver. Even though specialized hardware provides an impressive performance boost, it still falls short of the demand, and the gap keeps widening exponentially.
Even if model sizes stopped growing, it would take decades for specialized hardware to catch up.
The best solution now is to distribute AI workloads.
But that comes with its challenges.
Challenges of an AI Application
Building AI applications requires developers to stitch together workloads spanning data ingestion, pre-processing, training, fine-tuning, prediction, and serving.
This is challenging as each workload requires different systems, each with its own APIs, semantics, and constraints.
With Ray, you have one system that supports all of these workloads.
From their docs:
Each of Ray’s five native libraries distributes a specific ML task:
- Data: Scalable, framework-agnostic data loading and transformation across training, tuning, and prediction.
- Train: Distributed multi-node and multi-core model training with fault tolerance that integrates with popular training libraries.
- Tune: Scalable hyperparameter tuning to optimize model performance (see the sketch after this list).
- Serve: Scalable and programmable serving to deploy models for online inference, with optional microbatching to improve performance.
- RLlib: Scalable distributed reinforcement learning workloads.
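To give a feel for how lightweight these libraries are, here is a minimal sketch of hyperparameter tuning with Ray Tune's Tuner API (the toy objective and search space are made up, and the code assumes a recent Ray 2.x release):

from ray import tune

# Toy objective: Tune searches for the x that minimizes (x - 3)^2
def objective(config):
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(-10, 10)},
    tune_config=tune.TuneConfig(num_samples=20, metric="score", mode="min"),
)
results = tuner.fit()
print(results.get_best_result().config)  # best x found across the 20 samples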
How companies are using Ray
OpenAI used Ray to coordinate the training of ChatGPT.
Cohere used Ray to train their LLMs at scale, alongside PyTorch, JAX, and TPUs.
Below is a diagram of Alpa using Ray to schedule GPUs for distributed training.
Ray solves the two most common challenges for distributed training of generative models:
- How to effectively partition the model across multiple accelerators?
- How to set up your training to be tolerant of failures on preemptible instances?
Companies like Shopify, Spotify, Pinterest, and Roblox all leverage Ray to scale their ML infrastructure.
Shopify uses Ray in its Merlin platform to streamline ML workflows from prototyping to production, utilizing Ray Train and Tune for distributed training and hyperparameter tuning.
Spotify employs Ray for parallel model training and tuning to optimize its recommendation systems, while Pinterest utilizes Ray for efficient data processing and scalable infrastructure management.
At Roblox, Ray facilitates large-scale AI inference across hybrid cloud environments, enabling robust, scalable ML solutions.
The Core of Ray
Minimalist API
The core API of Ray is just 6 calls.
import ray

ray.init()                        # connect to (or start) a Ray cluster

@ray.remote                       # turn a function into a distributed task
def slow_function():
    ...

future = slow_function.remote()   # invoke asynchronously; returns a future
ray.get(future)                   # block and fetch the result
ray.put()                         # store an object in the object store
ray.wait()                        # split futures into ready and not-ready
ray.shutdown()                    # disconnect and tear down the session
Here’s an example of a plain Python Counter class becoming an asynchronous Ray actor, sketched below along the lines of the example in the Ray docs.
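import ray

ray.init()

# Decorating a class with @ray.remote turns it into an actor: a stateful
# worker process whose methods execute asynchronously on the cluster.
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()                                # create the actor
futures = [counter.increment.remote() for _ in range(3)]  # async method calls
print(ray.get(futures))                                   # [1, 2, 3]

Each call to counter.increment.remote() returns a future immediately; the actor processes the calls one at a time, so its state stays consistent.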
Giving Ray a go
After looking at what Ray can do, I decided to give it a try.
Say we have a helper function is_prime, which checks whether a number is prime, and a function sum_primes, which computes the sum of all primes below a limit.
import math

def is_prime(n):
    # Trial division up to sqrt(n)
    if n < 2:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

def sum_primes(limit):
    # Sum every prime below the limit
    return sum(num for num in range(2, limit) if is_prime(num))
Let’s test plain Python first: compute the sum of primes up to 10 million, and repeat the calculation 8 times.
%%time
# Sequential execution
n_calculations = 8
limit = 10_000_000
sequential_results = [sum_primes(limit) for _ in range(n_calculations)]
# CPU times: user 12min 31s, sys: 7.56 s, total: 12min 39s
# Wall time: 12min 58s
This took 13 minutes in total!
Let’s see how much Ray can speed this up.
# One-time setup (outside the timed cell)
import ray
ray.init()
sum_primes_remote = ray.remote(sum_primes)  # wrap sum_primes as a Ray task

%%time
# Parallel execution
futures = [sum_primes_remote.remote(limit) for _ in range(n_calculations)]
parallel_results = ray.get(futures)
# CPU times: user 477 ms, sys: 366 ms, total: 843 ms
# Wall time: 4min 2s
It’s roughly a 3x speedup, from 12min 58s down to 4min 2s!
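If you’d rather consume results as each task finishes instead of blocking on all of them, ray.wait (one of the six core calls above) does exactly that. Here’s a minimal sketch reusing the sum_primes_remote task from the previous cell:

# Process results in completion order rather than waiting for all of them
remaining = [sum_primes_remote.remote(limit) for _ in range(n_calculations)]
while remaining:
    done, remaining = ray.wait(remaining, num_returns=1)  # first ready future
    print(ray.get(done[0]))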
If you open the Ray dashboard (served at http://127.0.0.1:8265 by default), you can see the progress of your jobs.
I’m just getting started with Ray.
Stay tuned for my next article on computing embeddings with Ray Data!
Want to learn more about Ray?
Watch this 20-minute talk, Ray, a Unified Distributed Framework for the Modern AI Stack, by Ion Stoica, co-founder of Anyscale and Databricks.
Deeper into the Rabbit Hole 🕳️🐇
If you want a deep dive into the architecture of Ray, check out their whitepaper.
The Anyscale blog has a ton of interesting use cases and tutorials.
- Fine-tuning Llama-3, Mistral, and Mixtral with Anyscale
- Building RAG-based LLM Applications for Production
- RAG at Scale: 10x Cheaper Embedding Computations with Anyscale and Pinecone
- Lessons from pre-training Stable Diffusion v2 models on 2 Billion Images
Here is a list of talks from Ray Summit 2023.
A few of my favorites are:
- Heterogeneous Training Cluster with Ray at Netflix
- Inference Graphs at LinkedIn Using Ray-Serve
- Build Instacart Training Platform on Ray
- Building Samsara’s Machine Learning Platform with Ray
- Ray Train: A Production-Ready Library for Distributed Deep Learning
Speaking of the summit,
Join Ray Summit 2024!
Ray Summit 2024 is happening next week from Sep 30-Oct 2 in San Francisco!
If you’re in SF and considering attending, grab a ticket with the promo code RodolfoY15 for a sweet discount!
I’ll attend, so please say hi if you catch me there!
Thanks for reading
Be sure to follow the bitgrit Data Science Publication to keep updated!
Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!
Follow Bitgrit below to stay updated on workshops and upcoming competitions!
Discord | Website | Twitter | LinkedIn | Instagram | Facebook | YouTube