Scaling Python with Ray

Sigal Shaharabani
Published in Israeli Tech Radar
Nov 10, 2022 · 2 min read

Lately I’ve been working on an application based on Celery; our motivation was to create distributed workflows whose scale can be controlled by code. While working on this project (which was completed successfully), I stumbled upon Ray, which made me think about how I might write scalable workflows in a year or two.

In this post, I hope to offer a first look at Ray.

Disclaimer: This is not an attempt to belittle or criticize Celery or any other tool to scale Python (or any other language) code.

Photo by Emiliano Vittoriosi on Unsplash

What is it?

From the Ray website:

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute.

In my own words, it is another framework for building Python workloads that run at scale; from my perspective, it competes with Celery and Dask. The reasons I think it is worth my look, and perhaps yours, are:

  1. It allows specifying required resources for tasks
  2. It seems to be AI and ML oriented
  3. It is more focused on parallelizing Python itself

Ray libraries

  • Core — Scale general Python applications
  • Datasets — Scale data loading & processing
  • Train — Scale deep learning
  • Tune — Scale hyperparameter tuning
  • Serve — Scale model serving
  • RLlib — Scale reinforcement learning

Key concepts

Tasks

Asynchronous Ray functions are called “tasks”. Ray enables tasks to specify their resource requirements in terms of CPUs, GPUs, and custom resources.

To create a task, decorate your function with:

@ray.remote
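For example, here is a minimal sketch of a task (the function name and the resource value are my own illustration, not from the Ray docs):

import ray

ray.init()  # start a local Ray runtime

# A task is a plain function decorated with @ray.remote.
# Resource requirements are declared on the decorator.
@ray.remote(num_cpus=2)
def square(x):
    return x * x

# Calling .remote() schedules the task and immediately returns
# an object reference; ray.get() blocks until the result is ready.
refs = [square.remote(i) for i in range(4)]
print(ray.get(refs))  # [0, 1, 4, 9]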

Actors

Actors extend the Ray API from functions (tasks) to classes. An actor is essentially a stateful worker. Like tasks, actors support CPU, GPU, and custom resource requirements.

To create an actor, decorate your class with the same decorator:

@ray.remote
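As a minimal sketch (the Counter class here is an illustrative example):

import ray

ray.init()

# An actor is a class decorated with @ray.remote; each instance
# runs as its own worker process and keeps state between calls.
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

counter = Counter.remote()  # instantiate the actor
refs = [counter.increment.remote() for _ in range(3)]
print(ray.get(refs))  # [1, 2, 3]: state persists across calls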

Objects

Tasks and actors create and compute on objects, which live in Ray’s distributed object store and are referred to via object references (object refs).
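A minimal sketch of how objects move between the object store and tasks (the function and values are illustrative):

import ray

ray.init()

# ray.put() stores a value in the object store and returns an object ref.
data_ref = ray.put([1, 2, 3])

@ray.remote
def total(numbers):
    return sum(numbers)

# Object refs can be passed to tasks; Ray resolves them to values
# before the task runs. ray.get() fetches a result by its ref.
result_ref = total.remote(data_ref)
print(ray.get(result_ref))  # 6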

Note: For more specific details about the decorators, I recommend checking out the core API documentation.

Running Ray

While it is always fun to run our code on our local machine with:

ray.init()

We are more interested in running the application on a Ray cluster; options are available to deploy it yourself, on popular cloud platforms, and on Kubernetes.
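For instance, a sketch of connecting to an existing cluster (the head-node host below is a placeholder, not a real address):

import ray

# When the script runs on a machine that is already part of a cluster:
ray.init(address="auto")

# Or, from outside the cluster, via Ray Client (placeholder host):
# ray.init(address="ray://head-node-host:10001")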

Full details are available here.

Integrations

Ray offers integrations with a variety of libraries, mostly focused on machine learning. To learn more, see here.
