Distributed Ray as a Native App in Snowpark Container Services

· Introduction
· Quick primer on Native Apps within Snowpark Container Services
· Show me the proof of the Ray cluster
· Provider and Consumer Workflows
As a Native App Provider
As a Native App Consumer (separate snowflake account)
· Ray tests on the Native App
Test 1: Image Classification Batch Inference with PyTorch
Test 2: LLM serving using Ray Serve
· Summary
· Considerations

Introduction

If you have seen my prior blogs (blog 1 and 2), you know I am a fan of the Ray framework; I have written about the great possibilities it offers for distributed data engineering, ML model training, Large Language Model (LLM) fine-tuning, LLM serving, and more.

In this blog, I present a Ray Native App on Snowpark Container Services (SPCS) that can simply be shared from a provider Snowflake account to a consumer Snowflake account, giving the consumer a full-blown multi-node, multi-GPU Ray cluster in minutes, sitting right next to the data in the Snowflake Data Cloud without ever leaving the Snowflake governance boundary. From a consumer's point of view, there is no need to install any Python libraries for Ray, no need for any Docker builds or pushes, and no need to stitch together infrastructure to set up Ray and its associated services. Just get the Ray app from a Snowflake private listing (free), grant the application a few privileges, and configure a Ray cluster from a simple Streamlit app inside Snowflake. That's it.

The short video below shows what a native app consumer needs to do to get that Ray cluster working, in a matter of minutes. The same flow appears in Figure 1, which presents the configuration screen shown to a Snowflake consumer for setting up a Ray cluster on SPCS.

Ray Native App on Snowpark Container Services — Consumer Workflow
Figure 1: As a consumer — Streamlit app inside SPCS native app for setting up a distributed multi-node, multi-GPU heterogeneous Ray cluster on SPCS

Quick primer on Native Apps within Snowpark Container Services

Before I go too deep, here’s a quick primer on Native Apps within Snowpark Container Services.

A Snowflake Native App with Snowpark Container Services (app with containers) is a Snowflake Native App that runs container workloads in Snowflake. Container apps can run any containerized service supported by Snowpark Container Services.

Apps with containers leverage all of the features of the Snowflake Native App Framework, including provider IP protection, security and governance, data sharing, monetization, and integration with compute resources. Figure 2 shows an overview of Native Apps within Snowpark Container Services. To learn more, you can work through this tutorial.

Figure 2: Native Apps in Snowpark Container Services

Show me the proof of the Ray cluster

Figure 3 shows a proof of a Ray cluster running as a native app inside Snowpark Container Services.

Figure 3: Proof of the Ray cluster running inside Snowflake as a native app

This cluster was set up from a Snowflake consumer account through the Streamlit app shown in Figure 1.

All the code I used to deploy this Ray cluster is on Github here. In the following sections, I go through the steps required to build this Ray native application once as a provider and distribute it to multiple consumers of choice as a private listing so that they can get started with Ray in minutes.

Provider and Consumer Workflows

As a Native App Provider

I used the Snowflake CLI to convert my Ray head, worker, custom worker, Grafana, and Prometheus Docker images into a Snowflake Native App within Snowpark Container Services, with an installation script composed of Snowpark Python stored procedures. Read more here.

Figure 4 shows an overview of the application once installed in my provider account.

Figure 4: As a provider: Ray Native app on Snowpark Container Services installed on provider account

Once the app was available in the provider account, I followed these steps to publish the application package into a private listing. In a nutshell, I had to add a release version / patch in order to create a private listing as shown in Figure 5.

Figure 5: As a provider: Adding a new version/patch for the Ray native app on Snowpark Container Services
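Figure 5 shows the Snowsight flow, but the same step can also be scripted. Below is a rough SQL sketch; the application package name DISTRIBUTED_RAY_ON_SPCS_PKG and the stage path are hypothetical placeholders for illustration:

```sql
-- Add a new release version to the application package from staged app files
ALTER APPLICATION PACKAGE DISTRIBUTED_RAY_ON_SPCS_PKG
  ADD VERSION V1 USING '@DISTRIBUTED_RAY_ON_SPCS_PKG.NAPP.APP_STAGE';

-- Subsequent updates to the same version are added as patches
ALTER APPLICATION PACKAGE DISTRIBUTED_RAY_ON_SPCS_PKG
  ADD PATCH FOR VERSION V1 USING '@DISTRIBUTED_RAY_ON_SPCS_PKG.NAPP.APP_STAGE';
```

A version must exist on the package before it can be attached to a listing, which is why this step precedes Provider Studio.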

Once the version/patch was available, I could add a new listing via Snowflake's Provider Studio, as shown in Figure 6, and share the Ray on Snowpark Container Services application with the Snowflake accounts of my choice, without leaving the Snowflake governance boundary.

Figure 6: As a provider: Private Listing of Ray on Snowpark Container Services application

As a Native App Consumer (separate Snowflake account)

As a Native App Consumer AccountAdmin

As an AccountAdmin of another Snowflake account, let's now set up some prerequisites, such as a new Snowflake role that can be granted privileges to work with the app. Let's create a new role RAY_CONSUMER_ROLE as shown in the code snippet below and give it usage on a warehouse, the ability to create a network integration, and database creation privileges. Let's also enable the 2024_04 behavior change bundle, which allows us to use port ranges in SPCS and introduces the concept of service roles for endpoints. Last but not least, the commented template code below shows how to grant the application access to databases, tables, etc. once it is installed. Note that this is just quick sample code; follow the Snowflake documentation for best practices around RBAC.

USE ROLE ACCOUNTADMIN;
CREATE ROLE IF NOT EXISTS RAY_CONSUMER_ROLE;
GRANT ROLE RAY_CONSUMER_ROLE TO ROLE ACCOUNTADMIN;
CREATE WAREHOUSE IF NOT EXISTS XSMALL_WH;
GRANT USAGE ON WAREHOUSE XSMALL_WH to role RAY_CONSUMER_ROLE;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE RAY_CONSUMER_ROLE;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE RAY_CONSUMER_ROLE;
GRANT ROLE RAY_CONSUMER_ROLE to USER PLAKHANPAL;
-- 2024_04 brings in port ranges, as well as management of endpoint access
-- using a service role. See:
-- https://docs.snowflake.com/en/developer-guide/snowpark-container-services/working-with-services#prerequisite-to-use-this-feature
SELECT SYSTEM$ENABLE_BEHAVIOR_CHANGE_BUNDLE('2024_04');

USE ROLE RAY_CONSUMER_ROLE;
CREATE DATABASE IF NOT EXISTS RAY_CONSUMER_DB;
USE DATABASE RAY_CONSUMER_DB;
CREATE SCHEMA IF NOT EXISTS DEMO_SCH;
USE SCHEMA DEMO_SCH;
USE WAREHOUSE XSMALL_WH;
--GRANT CREATE TABLE ON SCHEMA RAY_CONSUMER_DB.DEMO_SCH TO APPLICATION DISTRIBUTED_RAY_ON_SPCS;
--GRANT CREATE STAGE ON SCHEMA RAY_CONSUMER_DB.DEMO_SCH TO APPLICATION DISTRIBUTED_RAY_ON_SPCS;
--GRANT CREATE FILE FORMAT ON SCHEMA RAY_CONSUMER_DB.DEMO_SCH TO APPLICATION DISTRIBUTED_RAY_ON_SPCS;
--GRANT CREATE VIEW ON SCHEMA RAY_CONSUMER_DB.DEMO_SCH TO APPLICATION DISTRIBUTED_RAY_ON_SPCS;

After creating the RAY_CONSUMER_ROLE, the consumer AccountAdmin can go to the Private Sharing section of the Snowsight UI and install the application, as shown in Figures 7 and 8. In Figure 8, installing under the chosen application name ‘DISTRIBUTED_RAY_ON_SPCS’ will create two new databases, ‘DISTRIBUTED_RAY_ON_SPCS’ and ‘DISTRIBUTED_RAY_ON_SPCS_DATA’.

Figure 7: As a consumer Accountadmin: Private listing of Ray application on Snowpark Container Services
Figure 8: As a consumer Accountadmin: Installing the Ray application in Snowflake consumer account

Once the application is installed in the Snowflake consumer account, the AccountAdmin can accept the grants the application needs, as shown in Figure 9.

Figure 9: As a consumer Accountadmin: Grants to be accepted by an Accountadmin

Now, as an AccountAdmin, manage access to the application by granting it to the consumer role RAY_CONSUMER_ROLE. This is shown in Figure 10, where the application role RAY_APP_ROLE has been granted to the role RAY_CONSUMER_ROLE by the consumer Accountadmin.

Figure 10: As a consumer AccountAdmin: Grant access to RAY_CONSUMER_ROLE
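For those who prefer SQL over the Snowsight UI, the grant in Figure 10 can also be expressed as a single statement. This is a sketch assuming the application role is named RAY_APP_ROLE, as shown in the figure:

```sql
-- Application roles are referenced through the installed application's name
GRANT APPLICATION ROLE DISTRIBUTED_RAY_ON_SPCS.RAY_APP_ROLE
  TO ROLE RAY_CONSUMER_ROLE;
```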

As a Native App Consumer with role RAY_CONSUMER_ROLE

Now let's switch to the RAY_CONSUMER_ROLE and enter the app. The first time the consumer logs into the application, it checks whether an external access integration exists. If not, it asks the consumer to create one (currently set up as 0.0.0.0:80 and 0.0.0.0:443) to allow outbound access to the internet, so that Python libraries can be installed with pip and models can be downloaded from Hugging Face. As a provider, you can make this network rule more restrictive if you like (and point it at pip URLs, Hugging Face URLs, GitHub URLs, etc.). The external access integration is a one-time setup; the next time the consumer logs into the app, this popup will not appear.
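For reference, the integration the app asks for looks roughly like the sketch below; the names RAY_OUTBOUND_RULE and RAY_EAI are hypothetical, since the app chooses its own:

```sql
-- Egress rule matching the 0.0.0.0:80 / 0.0.0.0:443 access described above
CREATE OR REPLACE NETWORK RULE RAY_OUTBOUND_RULE
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('0.0.0.0:80', '0.0.0.0:443');

-- External access integration that services can reference for outbound calls
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION RAY_EAI
  ALLOWED_NETWORK_RULES = (RAY_OUTBOUND_RULE)
  ENABLED = TRUE;
```

Restricting the rule is a matter of replacing the wildcard entries in VALUE_LIST with specific hosts (pip mirrors, Hugging Face, GitHub, etc.).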

After the external access integration has been defined, a consumer can create a heterogeneous Ray cluster of their choice. I chose the following setup for the demonstration: a Ray head with GPU_NV_S (1 NVIDIA A10G), 3 Ray workers with GPU_NV_S (1 NVIDIA A10G each), and 2 custom Ray workers with GPU_NV_M (4 NVIDIA A10Gs per custom worker). The initial UI for setting up the Ray cluster is seen in Figure 11 and can be reached in Snowsight through Apps -> Installed Apps -> Distributed Ray on Snowpark Container Services -> Launch App. Once the cluster UI shows up and selections are made for the Ray head instance type, the Ray worker instance type and number of instances, and the Ray custom worker instance type and number of custom workers, the heterogeneous Ray cluster can be created by clicking the Create Cluster / Check Status button. More details on the compute pools available within Snowpark Container Services can be found here.

Figure 11: As a consumer: Streamlit UI to configure Ray cluster
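Under the hood, each node group in the UI maps to an SPCS compute pool. As a rough sketch, a pool like the one backing the custom workers in my setup could be created as follows (the pool name is hypothetical; the app manages its own pools):

```sql
-- 2 nodes, each a GPU_NV_M instance (4 NVIDIA A10Gs per node)
CREATE COMPUTE POOL IF NOT EXISTS RAY_CUSTOM_WORKER_POOL
  MIN_NODES = 2
  MAX_NODES = 2
  INSTANCE_FAMILY = GPU_NV_M;
```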

Once the Ray cluster is created, we can see the status of the Ray head, worker, and custom worker containers, as well as a few public endpoints that can be accessed via URLs, as shown in Figure 11. Wait for the statuses of the Ray head and workers to turn READY before accessing the URLs. If you also chose Ray custom workers, wait for their status to turn READY as well.

A few of the notable URLs are the notebook URL and the Ray dashboard URL. Copy the notebook and Ray dashboard URLs into the browser, authenticate with Snowflake, and access both endpoints.
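The Streamlit app surfaces these URLs for you, but the public endpoints of an SPCS service can also be listed directly in SQL. A sketch, with a hypothetical fully qualified service name since the app names its services internally:

```sql
-- Lists each endpoint's name, port, and public ingress URL
SHOW ENDPOINTS IN SERVICE DISTRIBUTED_RAY_ON_SPCS.CORE.RAY_HEAD_SERVICE;
```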

The Ray dashboard is shown in Figure 12, and the JupyterLab interface where Ray code will be executed is shown in Figure 13.

Figure 13: As a consumer: Jupyterlab URL where Ray code is written

Ray tests on the Native App

I ran two tests on the distributed Ray native app in Snowpark Container Services. The code for the tests below can be found on GitHub here.

Test 1: Image Classification Batch Inference with PyTorch

This test is derived from the documented PyTorch batch inference example in the Ray documentation. The code used is in the GitHub repo above.

Test 2: LLM serving using Ray Serve

This test deploys the `lmsys/vicuna-13b-v1.5-16k` model from Hugging Face as a REST API. The code used is in the GitHub repo above.

Summary

This blog presents an idea for how the complete Ray framework can be brought in as a native app on Snowpark Container Services, and highlights how easily providers can build native apps and share them with consumers. I chose to offer this as a free private listing, but a provider could just as easily put the same thing on the Snowflake Marketplace and make some $.

Considerations

A few considerations before using this app:

  1. Native Apps on SPCS, as well as block storage for SPCS, are still in public preview, so they are not yet recommended for production.
  2. Native Apps currently execute in the owner's context rather than the caller's context. See more here. Running native apps in the caller's context is on the roadmap.
