How to Scale Python on Every Major Cloud Provider
Best practices for cloud computing with Python
In the words of Pywren creator Eric Jonas, “The cloud is too damn hard!”
Once you’ve developed a Python application on your laptop and want to scale it up in the cloud (perhaps with more data or more GPUs), the next steps are unclear, and unless you have an infrastructure team that’s already set it up for you, “Just use Kubernetes” is not so simple.
So you’ll choose one of AWS, GCP, or Azure and navigate over to the management console to sift through instance types, security groups, spot prices, availability zones, instance limits, and so on. Once you’ve got all of that sorted out and managed to rent some virtual machines, you’ve still got to figure out how to actually run your application on them. How exactly does my Python script divide up the work across a 10-machine cluster? At this point, you may try to rewrite your application using PySpark, mpi4py, or Celery.
If that fails, you’ll build a brand new distributed system, as Uber and Lyft recently did with Fiber and Flyte. Whichever path you take, you’re rewriting your application, building a new distributed system, or both, all to speed up a Python script in the cloud.
We’re going to walk through how to use Ray to launch clusters and scale Python on any of these cloud providers with just a few commands.
The workflow breaks down into three steps. Step 1 is to develop your Python application. This usually happens on your laptop, but you can do it in the cloud if you prefer. Step 2 is to launch a cluster on your cloud provider of choice. Step 3 is to run your application wherever you want (on your laptop or in the cloud).
Setup
First, install some Python dependencies.
pip install -U ray \
    boto3 \
    azure-cli \
    azure-core \
    google-api-python-client
Next, configure credentials for the cloud provider of your choice. You can skip this step if you’re already set up to use your cloud provider from the command line.
- AWS: Configure your credentials in `~/.aws/credentials` as described in the boto docs.
- Azure: Log in using `az login`, then configure your credentials with `az account set -s <subscription_id>`.
- GCP: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable as described in the docs.
The Ray cluster launcher is driven by a YAML config file. In it you can specify everything: setup commands to run, instance types to use, files to rsync, autoscaling strategies, and so on.
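For illustration, a minimal AWS config might look something like the following sketch. The field names follow the Ray cluster launcher schema, but the specific values (cluster name, instance type, region) are placeholders, not recommendations.

```yaml
# Hypothetical minimal cluster config; values are illustrative.
cluster_name: basic-ray
min_workers: 2      # keep two worker nodes running
max_workers: 2

provider:
    type: aws
    region: us-west-2

auth:
    ssh_user: ubuntu

head_node:
    InstanceType: m5.large
worker_nodes:
    InstanceType: m5.large

# Commands run on each node after it launches.
setup_commands:
    - pip install -U ray
```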
You can fetch complete, ready-to-use example configs, with additional configurable fields, as follows.
# AWS
wget https://gist.githubusercontent.com/robertnishihara/a9ce1d0d434cd52dd9b07eb57f4c3476/raw/dafd4c6bd26fe4599dc3c6b05e80789188b3e2e5/aws-config.yaml
# Azure
wget https://gist.githubusercontent.com/robertnishihara/a9ce1d0d434cd52dd9b07eb57f4c3476/raw/dafd4c6bd26fe4599dc3c6b05e80789188b3e2e5/azure-config.yaml
# GCP
wget https://gist.githubusercontent.com/robertnishihara/a9ce1d0d434cd52dd9b07eb57f4c3476/raw/1846cbf971d1cd708b3d29d9ae50ad882fbaac50/gcp-config.yaml
You’ll need to make a few minor modifications to the above config files:
- Azure: Replace `~/.ssh/id_rsa` and `~/.ssh/id_rsa.pub` with the appropriate key files.
- GCP: Set the appropriate `project_id`.
Step 1: Create a Python Application
Define a simple Python script that we want to scale up: it launches many small tasks and counts how many run on each machine in the cluster.
You can fetch and run a slightly more complete version of the example as follows.
# Fetch the example.
wget https://gist.githubusercontent.com/robertnishihara/a9ce1d0d434cd52dd9b07eb57f4c3476/raw/4313660c0bd40f8bd909f70c1e0abc4be8584198/script.py

# Run the script.
python script.py
You’ll see the following output.
This cluster consists of
1 nodes in total
16.0 CPU resources in total
Tasks executed
10000 tasks on XXX.XXX.X.XXX
Now run the same script on a Ray cluster. These instructions are for AWS; to use Azure or GCP, simply replace aws-config.yaml with azure-config.yaml or gcp-config.yaml in both of the commands below.
Step 2: Start the Cluster
# AWS
ray up -y aws-config.yaml
# Azure
ray up -y azure-config.yaml
# GCP
ray up -y gcp-config.yaml
Step 3: Run the Script on the Cluster
# AWS
ray submit aws-config.yaml script.py
# Azure
ray submit azure-config.yaml script.py
# GCP
ray submit gcp-config.yaml script.py
You’ll see the following output.
This cluster consists of
3 nodes in total
6.0 CPU resources in total
Tasks executed
3561 tasks on XXX.XXX.X.XXX
2685 tasks on XXX.XXX.X.XXX
3754 tasks on XXX.XXX.X.XXX
If it says there is only 1 node, then you may need to wait a little longer for the other nodes to start up.
In this case, the 10,000 tasks were run across three machines.
You can also connect to the cluster and poke around with one of the following commands.
# Connect to the cluster (via ssh).

# AWS
ray attach aws-config.yaml
# Azure
ray attach azure-config.yaml
# GCP
ray attach gcp-config.yaml
Shut Down the Cluster
Don’t forget to shut down your cluster when you’re done!
# AWS
ray down -y aws-config.yaml
# Azure
ray down -y azure-config.yaml
# GCP
ray down -y gcp-config.yaml
Further Details
Want to add support for a new cloud provider? Just extend the NodeProvider class, which usually takes 200–300 lines of code. Take a look at the implementation for AWS.
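To give a feel for the interface, here is a toy in-memory stand-in, not a real backend and not subclassing the actual `ray.autoscaler.node_provider.NodeProvider` base class. The method names mirror the provider interface; the bodies just track nodes in a dict so the shape of each operation is clear.

```python
import uuid

class InMemoryNodeProvider:
    """Toy node provider: 'launches' nodes as dict entries.

    A real provider would also implement methods such as external_ip
    and internal_ip, and would call the cloud's API in each method.
    """

    def __init__(self, provider_config, cluster_name):
        self.provider_config = provider_config
        self.cluster_name = cluster_name
        self._nodes = {}  # node_id -> {"tags": {...}, "running": bool}

    def create_node(self, node_config, tags, count):
        """Launch `count` nodes described by node_config, labeled with tags."""
        for _ in range(count):
            node_id = uuid.uuid4().hex[:8]
            self._nodes[node_id] = {"tags": dict(tags), "running": True}

    def non_terminated_nodes(self, tag_filters):
        """Return IDs of running nodes whose tags match all tag_filters."""
        return [
            node_id for node_id, node in self._nodes.items()
            if node["running"]
            and all(node["tags"].get(k) == v for k, v in tag_filters.items())
        ]

    def node_tags(self, node_id):
        """Return the tag dict for a node."""
        return self._nodes[node_id]["tags"]

    def terminate_node(self, node_id):
        """Shut a node down."""
        self._nodes[node_id]["running"] = False
```

The autoscaler drives a provider through exactly this kind of loop: create nodes, list the non-terminated ones matching certain tags, and terminate the ones it no longer needs.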
- Learn more about the Ray cluster launcher.
- Learn more about Ray from the documentation, GitHub, and Twitter.
- Learn more about how to scale machine learning training, hyperparameter tuning, model serving, and reinforcement learning.
If you have any questions, feedback, or suggestions, please join our community through Discourse or Slack! If you would like to see how Ray is being used throughout industry, consider joining us at Ray Summit!