Ray Tutorial

Seçil Şen · Published in Codable · 6 min read · Oct 12, 2022

This post aims to give a brief introduction to Ray: how it works and why it might be intriguing for the AI community. We’ll build a simple NLP application and distribute its tasks across multiple workers. Prior knowledge of NLP and transformers is required to properly understand this post.

Introduction.

Ray is an open-source framework that enables the distribution of Python applications. It is tailored to the needs of AI/ML systems, which involve tasks like data preprocessing, hyperparameter tuning, model training, and serving. All of those tasks are compute-intensive and require physical resources, preferably remote ones. Thus, the idea of distributing those tasks across a cluster is attractive to most AI researchers. However, distributing a pure Python application without a dedicated tool is daunting, because the researcher has to schedule processes, develop a failure-management mechanism, and handle interprocess communication, all while dealing with the inner challenges of the AI tasks themselves. Here, Ray comes into play: it addresses all of those challenges of AI systems. Moreover, it is very simple; with a few lines of API calls, one can distribute, manage, and serve an application across a cluster without worrying about distributed system management. Ray ships with libraries that target these needs: Ray Tune for hyperparameter tuning, Ray Serve for serving a Python application, and RLlib for the special needs of reinforcement learning. To use the aforementioned libraries, your Python application must be written with the Ray Core API. In this tutorial, we’ll go through the details of the Ray Core API.

Before we start, I’ll introduce the Ray Core API components. Ray uses the actor model to tackle state management; each Ray actor can be considered a stateful component. Any Python class can be converted into a service simply by adding the “@ray.remote” decorator before the class declaration. Then, instead of instantiating the class directly, calling the class’s “remote()” function creates an actor: a worker process that holds the object’s state and whose methods are executed as remote tasks. Ray itself handles thread safety and schedules dependent tasks across the cluster.
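As a warm-up, here is a minimal, hypothetical example (the Counter class is just for illustration and is not part of the tutorial’s application):

```python
import ray

ray.init()

# Adding @ray.remote turns an ordinary class into an actor class.
@ray.remote
class Counter:
    def __init__(self):
        self.value = 0  # state lives inside the actor process

    def increment(self):
        self.value += 1
        return self.value

# remote() creates the actor instead of a plain Python object ...
counter = Counter.remote()
# ... and method calls via remote() return object ids (futures),
# which ray.get() resolves to actual values.
print(ray.get(counter.increment.remote()))  # prints 1
```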

An Example.

As mentioned, we will build a simple transformer-based NLP system that determines the logical relationship between a pair of sentences. The NLP community calls this a Natural Language Inference (NLI) task, where the two sentences are named the premise and the hypothesis, and the relationship is categorized into three types:

  • Entailment: the hypothesis can be inferred from the premise.
  • Contradiction: the negation of the hypothesis can be inferred from the premise.
  • Neutral: all the other cases.

So our system will read two different sentences, feed them to an NLI model that predicts the relationship type, and output its prediction (entailment, contradiction, or neutral). For the sake of simplicity, we’ll use a pre-trained NLI model from the Hugging Face Transformers library. So, make sure that you have already installed PyTorch, Ray, and Transformers. If the installations are successful, you’re all set.

First of all, we have to start Ray.
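In code, this is a single call:

```python
import ray

# Start a local Ray instance on the current machine.
ray.init()
```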

This runs Ray on our local machine. If we want to run on a cluster instead, we need to pass the cluster’s address as a parameter to the “init()” function.

The NLI model expects two different sentences. Let’s say we have two different text resources and we would like to assign a different worker to read each of those lists sequentially. Since those workers will perform identical operations, we can create an actor class called ReaderWorker and instantiate two different objects from that class.
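A minimal sketch of such a class might look as follows (the constructor signature and the exact batching logic are assumptions):

```python
@ray.remote
class ReaderWorker:
    def __init__(self, sentences, batch_size=2):
        self.sentences = sentences    # a simple Python list
        self.batch_size = batch_size
        self.cursor = 0

    def get_next_batch(self):
        # Return the next slice of the list and advance the cursor.
        batch = self.sentences[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch
```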

Here, our text resources are simple Python lists, and the reader reads batches of sentences. As mentioned, we define a plain Python class and, just by adding the “@ray.remote” decorator, we turn that class into a service. Now we can create two different actors from that service:
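For example (the sentences below are the ones that appear in the output at the end of the post):

```python
premises = [
    'A man inspects the uniform of a figure in some East Asian country.',
    'An older and younger man smiling.',
    'A black race car starts up in front of a crowd of people.',
    'A soccer game with multiple males playing.',
]
hypotheses = [
    'The man is sleeping.',
    'Two men are smiling and laughing at the cats playing on the floor.',
    'A man is driving down a lonely road.',
    'Some men are playing a sport.',
]

# Note: remote() instead of the usual ReaderWorker(...) call.
premise_reader = ReaderWorker.remote(premises)
hypothesis_reader = ReaderWorker.remote(hypotheses)
```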

The difference between object creation in Python and actor creation in Ray is obvious here: the “remote()” function call. Note also how parameters are passed while initializing an actor: simply pass the desired parameters to the “remote()” function in the same order as in the “__init__” function, and “remote()” handles the rest.

Now we need a service that loads the transformer model, takes two batches of sentences, passes them to the model, and outputs the result. Let’s name this class NLIWorker and create an actor from it.
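A sketch of what NLIWorker might look like with the Transformers API (the method name predict() and the exact tokenizer arguments are assumptions; the label order follows the model card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

@ray.remote
class NLIWorker:
    def __init__(self, model_name='cross-encoder/nli-distilroberta-base'):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        # Label order as documented on the model card.
        self.labels = ['contradiction', 'entailment', 'neutral']

    def predict(self, premises, hypotheses):
        # Tokenize the sentence pairs and run a single forward pass.
        features = self.tokenizer(premises, hypotheses, padding=True,
                                  truncation=True, return_tensors='pt')
        with torch.no_grad():
            logits = self.model(**features).logits
        predictions = [self.labels[i] for i in logits.argmax(dim=1).tolist()]
        print(predictions)  # printed from the actor process (note the pid prefix)
        return predictions

nli_worker = NLIWorker.remote()
```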

We are using the ‘cross-encoder/nli-distilroberta-base’ model from Hugging Face as the NLI model. Now our “nli_worker” actor is ready for prediction. However, the system somehow has to be aware of the states of the two reader actors: since the NLI model works with batches, the system has to wait for both reader actors’ output at each step before feeding the next batch into the NLI worker. Otherwise, the NLI model might throw an error. This is a common problem when working with distributed systems, because each node might run on hardware with different computational power, and solving it by hand might require manually coordinating with the OS on the current machine. As AI researchers, we don’t want to waste our time on complicated OS operations. Moreover, we might want to run our workers on different machines with different internal states, which would be very hard to maintain. Thanks to Ray (again), this problem is solved by the “ray.wait()” function. The code below shows how “ray.wait()” works:
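Something along these lines (the variable names and the fixed number of batches are illustrative):

```python
NUM_BATCHES = 2  # illustrative: two batches of two sentence pairs each

for i in range(NUM_BATCHES):
    # remote() returns immediately with an object id for each call.
    readers = [premise_reader.get_next_batch.remote(),
               hypothesis_reader.get_next_batch.remote()]

    # Block until both reader operations have finished.
    ready, not_ready = ray.wait(readers, num_returns=2)

    if len(ready) == 2:
        print(f'Premise list for batch #{i}:')
        print(ray.get(readers[0]))
        print(f'Hypothesis list for batch #{i}:')
        print(ray.get(readers[1]))
        print(f'Results for batch #{i}:\n\n')
        # Object ids can be passed straight to another remote call;
        # Ray resolves them to values before predict() runs.
        nli_worker.predict.remote(readers[0], readers[1])
```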

Each reader worker has a “get_next_batch()” function that reads from its list. Methods of an actor are called through “remote()” in the same manner. However, when a method of an actor is called with “remote()”, the call returns an object id pointing to that operation; this is how Ray identifies and communicates with the operation when requested. For each batch, we collect the object ids of the two read operations and pass them as parameters to “ray.wait()”, which makes Ray wait for those operations before starting the next operation in the queue. “ray.wait()” returns two lists, “ready” and “not_ready”, containing the object ids of operations that have and have not completed yet, respectively. Checking the contents of those lists can be crucial in real applications, but we skip that part for now and only check the length of the “ready” list. If both operations have completed, their outputs are fed through the NLI model. Here, readers[0] and readers[1] refer to the object ids of the batches produced by the first and second workers respectively. Beware that these are not the batch values themselves but the object ids representing those batches in the Ray Object Store, since Ray recognizes its objects only by their ids. It is similar to the pointer mechanism, but more readable. If we call “ray.get(object_id)”, Ray checks its object store and returns the value matching that object id. For more information on the Ray Object Store, please refer to this tutorial series.
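The same id-to-value mechanism can be seen in isolation with “ray.put()” and “ray.get()”:

```python
obj_id = ray.put([1, 2, 3])  # store a value in the object store, get back an id
print(ray.get(obj_id))       # resolve the id back to the value: [1, 2, 3]
```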

The output of the whole example is shown below:

Premise list for batch #0:
['A man inspects the uniform of a figure in some East Asian country.', 'An older and younger man smiling.']
Hypothesis list for batch #0:
['The man is sleeping.', 'Two men are smiling and laughing at the cats playing on the floor.']
Results for batch #0:


Premise list for batch #1:
['A black race car starts up in front of a crowd of people.', 'A soccer game with multiple males playing.']
Hypothesis list for batch #1:
['A man is driving down a lonely road.', 'Some men are playing a sport.']
Results for batch #1:


(NLIWorker pid=304472) ['contradiction', 'neutral']
(NLIWorker pid=304472) ['contradiction', 'entailment']

Conclusion.

As we can see from our example, Ray’s abstraction mechanism is simple yet powerful. If you want to learn more about Ray, the tutorials in the official Ray documentation might be useful.

Hope you enjoyed!
