So You Want To… Just Get (PyTorch) Running in Snowpark Container Services


TLDR: A slightly more advanced tutorial that bridges the gap between hello_world and world-beating (Snowpark) Container powered products

Containers can seem complex, but they don’t have to be. One reason for this complexity is the abundance of guides that dive right into advanced topics with titles like “Use Containers to Easily Build for Unlimited Scale and Elasticity by Harnessing Multi-node Container Compute Pools”. These guides often assume a high level of container expertise, packed to the brim with presumed knowledge of YAML files, Docker and the command line. In a word, overwhelming. When issues arise, it can be difficult to distinguish between user error and implementation quirks. If you’re like me, you might be dealing with a bit of imposter syndrome, which can make things even more challenging.

Thankfully, Snowflake has simplified the world of containers with its user-friendly Snowpark Container Services. This guide is designed to further ease your journey by starting from the basics. It assumes you have close to zero knowledge about containers and are finding it challenging to bridge the gap between the tutorials and your actual goals.

0. Install Docker

The easiest way to do this is to go to https://docs.docker.com/engine/install/ and download the version for your operating system. You’ll need Docker running in the background later, so you might as well double-click on the app once it’s installed.

1. Do the SIMPLEST Tutorials

I know I said this was simple, but if you want it really simple, you can’t beat the hello world stuff in our docs on Snowpark Container Services. Follow it blindly and get something running if you’ve never even heard of a container before. Then, jump into the next steps and learn a bit more about what’s going on.

2. Setup Local Files (so you can containerise)

You’re going to need three things to make this all work

  1. Python Script (main.py is a good starter name): what runs in the container, referenced in the Dockerfile. This doesn’t have to be a single file. It could be a big package, but let’s keep it simple for now
  2. Dockerfile: describes how to build a Docker image, things like the base environment, the package imports and how to execute the code inside it. Think of it like the conda create process for building a virtual environment
  3. spec.yaml: tells Snowflake how to run your container, mainly what the hardware will look like. This is Snowflake-specific, but it’s pretty similar to a docker-compose.yaml.

Python Script

Easy peasy here; create a script to run in Python. This will be no different from anything else you run; it just needs to respect some of Python’s quirks (i.e., the if __name__ bit). It’ll look something like this:

# all my fancy code...

def run_job():
    # do something like run a pytorch neural network that references my fancy code
    ...

if __name__ == "__main__":
    run_job()

Note: to avoid this being too lengthy, I’ve removed all the complexity here, but you can find an actual script here.

Locally, you’d just navigate to the folder that contains main.py and type “python3 main.py” from the command line to make it run (more on this in the Dockerfile section below). The critical thing to remember is that this can be arbitrarily complex, so it can be print(“hello world”) or some newfangled LLM that will unseat OpenAI.
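To make that skeleton a little more concrete, here’s a minimal sketch of the kind of thing run_job() might do. Everything in it (the tiny model, the random data, the hyperparameters) is a placeholder of my own, not the actual script linked above:

import torch
import torch.nn as nn

def run_job():
    # placeholder: train a tiny linear model on random data, on GPU if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(10, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    x = torch.randn(64, 10, device=device)
    y = torch.randn(64, 1, device=device)

    for epoch in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")

if __name__ == "__main__":
    run_job()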

Dockerfile

Yes, just Dockerfile; no file type to be seen here. It’s going to look something like this:

ARG BASE_IMAGE=continuumio/miniconda3:4.12.0
FROM $BASE_IMAGE
RUN conda install python=3.8 && \
    pip install --upgrade pip && \
    pip install torch==2.0.0 && \
    pip install torchdata==0.6.0 && \
    pip install snowflake-ml-python==1.0.12 && \
    pip install snowflake-snowpark-python==1.9.0 && \
    pip install accelerate==0.29.3

COPY main.py ./
ENTRYPOINT ["python", "main.py"]

ARG BASE_IMAGE is just the base image, i.e., the environment that this will be built on; in our case it’s “continuumio/miniconda3:4.12.0”. More details can be found here.

After that we have RUN, which handles the pip/conda installs. They’ll run pretty much the same way they do in your local environment, so just list out all the packages that your main.py file needs.

This leads us to COPY main.py, which pulls in the local file(s) that define the actual work the container will do.

Finally, ENTRYPOINT will be used to execute the Python. This is no different from what you would run from the command line to execute your script.

spec.yaml

Ok, last up is the spec.yaml file. This is where you tell Snowflake what actually needs to be done to make this file work.

spec:
  containers:
  - env:
      MOUNT_PATH: /dev/shm
    image: /tutorial_db/data_schema/tutorial_repository/my_2job_image:latest
    name: main
    resources:
      limits:
        memory: 192G
        nvidia.com/gpu: 4
      requests:
        memory: 188G
        nvidia.com/gpu: 4
    volumeMounts:
    - mountPath: /opt/training-output
      name: training-output
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: training-output
    source: '@tutorial_db.data_schema.tutorial_stage'
  - name: dshm
    size: 10Gi
    source: memory

At first glance, this may seem complex, but it’s actually quite straightforward once you delve into it. The Snowpark Container Services docs are a great resource; you can find them here.

Even so, I’ll call out a few key points:

  • image: this is the same image that we build and push in steps 4 and 5 below
  • resources: note we’ve specified 4 Nvidia GPUs (A10Gs) because when we create our compute pool we’ll be using a GPU_NV_M, which has 4 Nvidia A10Gs in it, see here
  • volumes: this gives us the ability to persist things in a stage within Snowflake between runs of a job (see the sketch below)
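To show what that last point means in practice, here’s a minimal sketch (my own example, not part of the tutorial files) of how main.py could drop a checkpoint into the mounted path so it ends up in tutorial_stage when the job runs:

import os
import torch
import torch.nn as nn

# /opt/training-output is the mountPath from spec.yaml; anything written here
# is persisted to @tutorial_db.data_schema.tutorial_stage between runs
OUTPUT_DIR = "/opt/training-output"

def save_checkpoint(model: nn.Module, name: str = "model_state.pt") -> None:
    # write the model weights to the mounted volume so they survive the job
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(OUTPUT_DIR, name))

if __name__ == "__main__":
    save_checkpoint(nn.Linear(10, 1))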

3. Get Snowflake Ready

This is well covered in the tutorials in the Snowflake docs, but in the interest of completeness, I’ve included the SQL here too:

USE ROLE ACCOUNTADMIN;

CREATE ROLE test_role;

CREATE DATABASE IF NOT EXISTS tutorial_db;
GRANT OWNERSHIP ON DATABASE tutorial_db TO ROLE test_role COPY CURRENT GRANTS;

CREATE OR REPLACE WAREHOUSE tutorial_warehouse WITH
WAREHOUSE_SIZE='X-SMALL';
GRANT USAGE ON WAREHOUSE tutorial_warehouse TO ROLE test_role;

CREATE SECURITY INTEGRATION IF NOT EXISTS snowservices_ingress_oauth
TYPE=oauth
OAUTH_CLIENT=snowservices_ingress
ENABLED=true;

GRANT BIND SERVICE ENDPOINT ON ACCOUNT TO ROLE test_role;

CREATE COMPUTE POOL tutorial_gpu_pool
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = GPU_NV_M;
GRANT USAGE, MONITOR ON COMPUTE POOL tutorial_gpu_pool TO ROLE test_role;

GRANT ROLE test_role TO USER admin; -- your login

USE ROLE test_role;
USE DATABASE tutorial_db;
USE WAREHOUSE tutorial_warehouse;

CREATE SCHEMA IF NOT EXISTS data_schema;
CREATE IMAGE REPOSITORY IF NOT EXISTS tutorial_repository;
CREATE OR REPLACE STAGE tutorial_stage;

Some things to consider are:

  1. Don’t think you can just run it all with ACCOUNTADMIN; you can’t. This is deliberate and is a departure for those of us who are used to enjoying the god-mode privileges that the role provides. test_role is definitely required (or some other equivalent).
  2. The Compute Pools can be anything in this list; in our case we’re using GPUs (GPU_NV_M, which contains 4 Nvidia A10Gs) because neural networks run faster on them.

4. Build a Container, in your local environment

First, in Snowflake run the following:

SHOW IMAGE REPOSITORIES;

This will return your repositories, including the repository_url column (you could infer the URL, it’s not hard, but it never hurts to be CERTAIN).

Next, from the command line or terminal navigate to the location of your Dockerfile on your local machine and run the following:

docker build --rm --platform linux/amd64 -t repository_url/my_2job_image:latest .

where repository_url would be replaced with the URL above, i.e. sfseeurope-eu-demo211.registry.snowflakecomputing.com/tutorial_db/data_schema/tutorial_repository for me.

  1. docker calls Docker, kinda self explanatory
  2. build is just telling Docker to build an image
  3. --rm removes intermediate containers created by the build
  4. --platform specifies the platform this is intended for
  5. -t repository_url/my_2job_image:latest is the repository and tag that the image will be built under (and later pushed to)

Note, make sure you’ve included the “.” at the end; it’s the build context (your current directory) and it’s easy to miss if you’re not familiar with Docker

Some things to note: I’m being casual here and just using the “latest” tag; practically speaking, it’s much better to use an explicit version tag, explained here.

Now we have an image built, but let’s double-check; just run the following from the command line:
docker image ls

and you’ll be able to see a list of images that Docker has locally.

5. Docker Login via Command Line/Terminal

To enable Docker to push your container into Snowflake you’ll need to log in. Run the following from the terminal/command line:

docker login <registry_hostname> -u <username>
  • registry_hostname takes the form <orgname>-<acctname>.registry.snowflakecomputing.com and will look something like this: sfseeurope-eu_demo1234.registry.snowflakecomputing.com
  • username is the user that you use to log in to the Snowflake GUI
  • password: Docker will prompt you for it; it’s the same password that you log in to the GUI with

6. Push the Container INTO Snowflake

Local images are nice, but we want them in Snowflake! Simple step here too:

docker push repository_url/my_2job_image:latest

7. PUT the YAML in Snowflake

We’re almost there; we just need to get the YAML into Snowflake. The best way to do this is via SnowSQL; just go to the download page. Once you’ve downloaded it, you’ll need to log in with your account details by typing

snowsql -a account_name

where account_name will look something like this: “sfseeurope-eu_demo1234” (at least it does in my case).

Then, after giving your username and password, you’ll be able to type SQL from the CLI. You want to do the following:

USE ROLE test_role;
USE DATABASE tutorial_db;
USE SCHEMA data_schema;
PUT file:///the_file_location @tutorial_stage
AUTO_COMPRESS=FALSE
OVERWRITE=TRUE;

You can see the results in the Snowflake GUI too.

For those who are that way inclined, that blue “+ Files” button at the top right is your ticket to an uploaded YAML too.
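If you’d rather not install SnowSQL at all, the snowflake-snowpark-python package can do the PUT for you. Here’s a rough sketch; the connection details and the local file path are placeholders you’d need to fill in yourself:

from snowflake.snowpark import Session

# placeholder connection details; swap in your own account, user and password
connection_parameters = {
    "account": "<orgname>-<acctname>",
    "user": "<username>",
    "password": "<password>",
    "role": "test_role",
    "warehouse": "tutorial_warehouse",
    "database": "tutorial_db",
    "schema": "data_schema",
}

session = Session.builder.configs(connection_parameters).create()

# upload the spec file to the stage, uncompressed, overwriting any previous copy
session.file.put(
    "/the_file_location/my_job_spec.yaml",  # placeholder local path
    "@tutorial_stage",
    auto_compress=False,
    overwrite=True,
)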

8. Running the Container (as a Service Job)

Then all you need to do is execute the SQL below. Note, I’ve been explicit about the database, role, schema etc. just to ensure we’re using the correct ones. Snowpark Container Services is particularly sensitive to role management, more so than other parts of Snowflake. To avoid any potential issues and save yourself the headache, just be explicit:

USE ROLE test_role;
USE DATABASE tutorial_db;
USE SCHEMA data_schema;
USE WAREHOUSE tutorial_warehouse;
EXECUTE JOB SERVICE
IN COMPUTE POOL tutorial_gpu_pool
NAME=tutorial_MT_job_service
FROM @tutorial_stage
SPEC='my_job_spec.yaml';
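If the rest of your pipeline already lives in Python, the same statement can be fired through a Snowpark session instead of a worksheet. A quick sketch, reusing the same placeholder connection details as in the PUT example:

from snowflake.snowpark import Session

# placeholder connection details, as in the PUT sketch above
connection_parameters = {
    "account": "<orgname>-<acctname>",
    "user": "<username>",
    "password": "<password>",
    "role": "test_role",
    "warehouse": "tutorial_warehouse",
    "database": "tutorial_db",
    "schema": "data_schema",
}

session = Session.builder.configs(connection_parameters).create()

# kick off the job service in the GPU compute pool
session.sql("""
    EXECUTE JOB SERVICE
    IN COMPUTE POOL tutorial_gpu_pool
    NAME = tutorial_MT_job_service
    FROM @tutorial_stage
    SPEC = 'my_job_spec.yaml'
""").collect()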

9. Work Out What Went Wrong

Containers can feel impenetrable when things go wrong, being executed at arm’s length in a cloud platform no less. But really, it’s pretty simple, as long as you’ve included a few critical bits of code. In order of importance:

  1. Does it work locally: This is pretty obvious, but there is a temptation to run something the first time in a container when you might well have a dud container or even dud code in the container.
  2. Quick and Dirty: add some print statements into your code that will checkpoint how far things got and let you interrogate the code
  3. Logging: the more “robust” thing to do is set up proper logging, but practically speaking, it’s going to be similar to just using print() statements (see the sketch after the log query below)

Items 2 and 3 presume you have somewhere to examine the outputs. Fortunately, you do! All you need to run is this:

SELECT SYSTEM$GET_SERVICE_LOGS('TUTORIAL_DB.data_schema.tutorial_MT_job_service', 0, 'main');

This will return the log files from your attempt, and you can then debug what is going on.
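And if you do want the slightly more robust logging option from item 3, a minimal sketch looks like this; anything sent to stdout or stderr from the container shows up in those same service logs (the logger name and format here are arbitrary choices of mine):

import logging
import sys

# log to stdout so the messages end up in the service logs
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("tutorial_job")

logger.info("starting the job")
# ... the actual work ...
logger.info("finished the job")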

Wrapping Up

I hope this guide is helpful in getting you up and running in containers, giving you the confidence to deploy complex applications knowing you’ve got something a little more than “toy” built already.

My final thought goes out to those who see containers as a means to train models and not much more. Snowflake has recently gone into Private Preview on container runtimes that make executing code in containers about 2 mouse clicks away from regular Python execution in a notebook. Watch this space, as things are going to get even easier in the future.

Extra Resources

The full docs can be found here, and my colleagues have written some great blog posts on ways to use Snowpark Container Services. Once you’ve completed the above, these are great next steps.

Also — thanks to Caleb Baechtold for his review comments
