Scaling Python with Ray
Lately I’ve been working on an application based on Celery. Our motivation was to create distributed workflows whose scale can be controlled by code. While working on this project (which was successfully completed), I stumbled upon Ray, which made me think about how I might write scalable workflows in a year or two.
In this post, I hope to offer a first look at Ray.
Disclaimer: This is not an attempt to belittle or criticize Celery or any other tool to scale Python (or any other language) code.
What is it?
From the Ray website:
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for simplifying ML compute.
In my words, it is another framework for building Python workloads that run at scale. From my perspective, it competes with Celery and Dask. The reasons I think it is worth a look, for me and perhaps for you, are:
- It allows specifying required resources for tasks
- It seems to be AI and ML oriented
- It is more focused on parallelizing Python
Ray is made up of the following libraries:
- Core: Scale general Python applications
- Datasets: Scale data loading & processing
- Train: Scale deep learning
- Tune: Scale hyperparameter tuning
- Serve: Scale model serving
- RLlib: Scale reinforcement learning
Asynchronous Ray functions are called “tasks”. Ray enables tasks to specify their resource requirements in terms of CPUs, GPUs, and custom resources.
To create a task, decorate your function with the @ray.remote decorator.
Actors extend the Ray API from functions (tasks) to classes. An actor is essentially a stateful worker. Like tasks, actors support CPU, GPU, and custom resource requirements.
To create an actor, decorate your class with the same @ray.remote decorator.
Tasks and actors create and compute on objects.
Note: For more specific details about the decorators, I recommend checking out the core API documentation.
While it is always fun to run our code on our local machine, we are more interested in running the application on a Ray cluster. Options are available for deploying one yourself, on popular cloud platforms, and on Kubernetes.
Full details are available here.
Ray offers support for a variety of libraries, mostly focusing on machine learning. To learn more, see here.
- https://www.ray.io/ Ray website
- https://github.com/ray-project/ray Ray code
- https://www.youtube.com/watch?v=LmROEotKhJA Excellent lecture about Ray