Rayvens: event sources, sinks, and event-driven programming with Ray

Olivier Tardieu
Published in CodeFlare
Jun 22, 2021 · 3 min read

Next-gen cloud-native solutions will tightly couple AI with business logic and events. Ray is a fast-growing platform and ecosystem to power AI applications. With Ray, data scientists can build and train sophisticated machine learning models with a few lines of Python code. Ray offers a serverless experience: Ray programs can run anywhere and seamlessly scale from a single machine to large clusters.

Rayvens is a new Python library that augments Ray with events. With Rayvens, Ray programs can subscribe to event streams and process and produce events. Moreover, Rayvens can tap into hundreds of data services (event sources and sinks) with little effort by leveraging the entire catalog of Apache Camel components.

Example: Tracking the AAPL stock price

Suppose we built a model of the stock market with Ray. With Rayvens, we can feed market data into our model in real time, and publish insights to a medium of choice such as a Slack channel.

Rayvens requires Ray 1.3. To install Rayvens, run:

pip install rayvens

First, we need to initialize Ray and Rayvens:
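
A minimal sketch of this step, assuming Rayvens exposes an `init()` entry point that is called after `ray.init()`. The `start_runtime` wrapper is a hypothetical helper introduced only for illustration:

```python
def start_runtime():
    """Start Ray, then Rayvens (sketch).

    Imports are deferred so the function can be defined on a machine
    where the packages are not installed yet.
    """
    import ray
    import rayvens

    ray.init()      # start a local Ray instance
    rayvens.init()  # assumed Rayvens entry point, called after ray.init()
    return rayvens
```

In a script, the two `init()` calls would typically sit at the top of the program, before any stream is created.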

With Rayvens, we can periodically fetch the AAPL stock price from a public REST service:
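
A hedged sketch of what such a source could look like. The configuration keys (`kind`, `url`, `period`), the `'http-source'` kind, and the `source_config` keyword of `rayvens.Stream` are assumptions to be checked against the Rayvens repository; the helper names and the example URL are hypothetical:

```python
def quote_source_config(url, period_ms=3000):
    """Build the configuration for a source that polls `url` periodically."""
    return dict(
        kind='http-source',  # assumed name of the HTTP polling source
        url=url,             # REST endpoint to fetch
        period=period_ms,    # assumed: polling interval in milliseconds
    )


def open_quote_source(url, period_ms=3000):
    """Create a stream fed by the periodic HTTP source (requires Rayvens)."""
    import rayvens  # deferred so the config helper works without Rayvens

    # Assumption: Stream takes a name and a `source_config` keyword.
    config = quote_source_config(url, period_ms)
    return rayvens.Stream('http', source_config=config)
```

For instance, `source = open_quote_source('https://example.com/quotes/AAPL')` would poll a placeholder endpoint every three seconds, provided Ray and Rayvens have been initialized.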

We can initialize a stream to output messages to Slack:
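
The sink side might be sketched as follows. The `'slack-sink'` kind, the `channel` and `webhook_url` keys (the exact spelling of the webhook key in particular), and the `sink_config` keyword are assumptions rather than confirmed Rayvens API; the helper names are hypothetical:

```python
def slack_sink_config(channel, webhook_url):
    """Build the configuration for a sink that posts to a Slack channel."""
    return dict(
        kind='slack-sink',        # assumed name of the Slack sink
        channel=channel,          # Slack channel name
        webhook_url=webhook_url,  # assumed key for the Slack webhook
    )


def open_slack_sink(channel, webhook_url):
    """Create a stream whose events are posted to Slack (requires Rayvens)."""
    import rayvens  # deferred so the config helper works without Rayvens

    # Assumption: Stream takes a name and a `sink_config` keyword.
    config = slack_sink_config(channel, webhook_url)
    return rayvens.Stream('slack', sink_config=config)
```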

Using source.send_to and sink.append we can publish the raw AAPL quotes to Slack:
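
A minimal sketch of this wiring, using only the two methods named above. The `forward_events` wrapper is a hypothetical helper; it works with any pair of objects that provide `send_to` and `append`:

```python
def forward_events(source, sink):
    """Publish every event received by `source` to `sink`, unchanged."""
    # send_to registers a subscriber on the source stream;
    # append pushes an event into the sink stream.
    source.send_to(lambda event: sink.append(event))
```

Here `source` and `sink` stand for the HTTP source stream and the Slack sink stream described in the previous two steps.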

Every incoming event will result in an invocation of the lambda function, publishing the raw quote to Slack.

To process the quotes, we can use a Ray actor in place of the lambda function:
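
One way such an operator could be written, as a sketch under stated assumptions: each event is taken to be a JSON list whose first item carries a `price` field, `send_to` is assumed to accept an actor handle that exposes an `append` method, and the names `QuoteComparator` and `start_comparator` are hypothetical. The class is kept as plain Python and turned into an actor with `ray.remote`:

```python
import json


class QuoteComparator:
    """Stateful operator: compares each quote with the previous one."""

    def __init__(self, sink):
        self.sink = sink        # any object with an append(message) method
        self.last_price = None  # state kept across invocations

    def append(self, event):
        # Assumption: the event is a JSON list whose first item has a 'price'.
        price = json.loads(event)[0]['price']
        if self.last_price is not None:
            if price > self.last_price:
                self.sink.append('AAPL is up')
            elif price < self.last_price:
                self.sink.append('AAPL is down')
            else:
                self.sink.append('AAPL is unchanged')
        self.last_price = price


def start_comparator(source, sink):
    """Run the comparator as a Ray actor subscribed to `source`."""
    import ray  # deferred so the class above can be tried without Ray

    comparator = ray.remote(QuoteComparator).remote(sink)
    # Assumption: send_to accepts an actor handle exposing `append`.
    source.send_to(comparator)
    return comparator
```

The first quote only initializes the state; a message is published from the second quote onward.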

This code implements a stateful stream operator. The actor instance remembers the last AAPL quote across invocations and compares each live quote with it, publishing the result of the comparison to Slack.

The full example code parses the command-line arguments to obtain the Slack channel name and webhook.
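
A sketch of how this parsing could be done; it is not the code of the full example, and the `slack_settings` name is hypothetical. It treats empty arguments as missing, since an unset shell variable passed as `"$SLACK_CHANNEL"` reaches the script as an empty string:

```python
import sys


def slack_settings(argv):
    """Return (channel, webhook), or None when Slack is not configured."""
    # Both arguments must be present and non-empty.
    if len(argv) >= 3 and argv[1] and argv[2]:
        return argv[1], argv[2]
    return None


settings = slack_settings(sys.argv)
if settings is None:
    print('Slack is not configured: quotes are only logged to the console')
```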

For simplicity, we directly publish to Slack from the model code itself. This is arguably not a proper modular design. We should rather build a model first, then connect this model to a source and a sink. We will illustrate the better design in a future blog post. From there, it is easy to replace our naive market model with any machine learning model built with Ray.

Running Rayvens programs

To run the full example including the Slack notifications, set the environment variables SLACK_CHANNEL and SLACK_WEBHOOK to the desired Slack channel name and the corresponding webhook, respectively. Omit this step to run only the source component and log the AAPL quotes to the console.

To run the example in Docker, run:

curl -Lo aapl.py \
https://raw.github.com/project-codeflare/rayvens/main/examples/aapl.py
docker run --rm -i -t -v "$PWD"/aapl.py:/home/ray/aapl.py \
quay.io/ibm/rayvens \
python aapl.py "$SLACK_CHANNEL" "$SLACK_WEBHOOK"

It is also possible to run the example directly on the host, provided that Rayvens is installed along with a Java 11 JDK, Maven, and the Camel-K client. Run:

python aapl.py "$SLACK_CHANNEL" "$SLACK_WEBHOOK"

To try the example on a Ray cluster, ensure that the Ray cluster.yaml configuration uses the Rayvens image from quay.io/ibm/rayvens for both head and worker nodes and that Ray nodes request at least 2 GB of memory. Replace the ray.init() statement with ray.init(address='auto'). Run:

ray submit cluster.yaml aapl.py "$SLACK_CHANNEL" "$SLACK_WEBHOOK"

We will review installation, deployment options, and configuration parameters for Rayvens in a future blog post. For now, see the Rayvens repository (project-codeflare/rayvens on GitHub) for details.

Conclusions

The Rayvens library makes it easy to work with event and data streams in Ray programs, for instance feeding machine learning models with real-time data and publishing insights to a Slack channel on the fly.

Rayvens is not a competitor to Ray Streaming. The focus of Rayvens is integration. With Rayvens, Ray programs can leverage the entire catalog of Camel components. To process events, Rayvens relies on Python functions, Ray tasks, and actors.

Like Ray, Rayvens is meant to run anywhere and take care of everything automatically. Rayvens can run on a single machine or on a Ray cluster using a Ray+Rayvens container image. Rayvens transparently manages the Camel integrations that implement the external event sources and sinks.

Rayvens is an open-source project with an Apache 2.0 license. We welcome feedback and contributions. Stay tuned for more!
