Running Containerized Apps on Snowflake

Luiz
Blue Orange Digital
10 min read · Jan 31, 2024

Snowpark Container Services was released as a private preview feature near the end of last year. It is a fully managed container offering that helps you quickly deploy, manage, and scale containerized applications without having to move data out of Snowflake. It has since become publicly available, in limited form, in two regions, Europe (London) and Asia Pacific (Mumbai), and is expected to reach the following regions soon: US West (Oregon), Europe (Frankfurt), Asia Pacific (Tokyo), and Canada (Central).

What are containers?

Containers are lightweight, portable, and executable software packages that include everything needed to run a piece of software, including the code, runtime, libraries, operating system, and system tools. They encapsulate an application and its dependencies in a standalone unit, making it easy to deploy and run consistently across different environments. Containers provide a way to package, distribute, and manage software more efficiently and reproducibly.

Containerization streamlines software deployment, fostering consistency, agility, and cost efficiency. Key benefits include rapid scaling, resource optimization, enhanced security through isolation, support for microservices, and facilitation of DevOps practices. Containers also enable version control, automated orchestration, and vendor-agnostic infrastructure, promoting flexibility in deployment environments.

Docker is one of the most popular containerization platforms, and it played a significant role in popularizing container technology. However, other containerization solutions like podman, containerd, and rkt also exist, providing alternatives for container deployment.

Snowflake's strategy of keeping the data close to the application processing layer

Snowflake’s strategy of having the application layer close to the data layer is a key aspect of its architecture. This approach is designed to optimize data storage, processing, and analytics, making them faster, easier to use, and more flexible.

Snowflake’s unique architecture consists of three key layers: Database Storage, Query Processing, and Cloud Services. When data is loaded into Snowflake, it reorganizes that data into an internally optimized, compressed, columnar format. This data is stored in cloud storage and is only accessible through SQL query operations run using Snowflake.

This strategy offers several business advantages. It breaks down data silos, enabling seamless access to data, and it allows for on-the-fly scalable compute that can handle massive amounts of data across the organization. The results are improved performance, utility, and cost efficiency.

With Snowflake, customers pay only for the resources they consume, leading to significant cost savings. Snowflake offers a ready-to-use platform, reducing the need for customers to bear setup and maintenance costs and complications. It also integrates data in the cloud across the company, making it easier for businesses to manage and utilize their data.

Snowflake’s strategy also helps address data integrity and governance challenges. As more customers join Snowflake, the data on its cloud can be exchanged with other customers, enhancing the overall value of the platform. These advantages make Snowflake an attractive solution for businesses looking to leverage their data for insights and decision-making.

A simple hands-on example

Let’s create a simple Python web application as an example and run it locally, on Docker, on Kubernetes, and finally on Snowflake.

The sample application

This is a simple Flask web application that, when accessed at the root URL (“/”), returns a “Hello World” message along with the current timestamp. The application runs on the default HTTP port (80) and is accessible from any network interface. Save it as hello.py.

from datetime import datetime
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World: " + str(datetime.now())

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

If you run this code with python hello.py, you can open the localhost URL in the browser and see the following screen.
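If Flask is not installed, the same request/response behavior can be sketched with only the standard library. This is a hedged stand-in for the Flask app above (an http.server handler on an ephemeral port instead of Flask on port 80), useful for verifying the response format without any dependencies:

```python
import threading
import urllib.request
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Same payload shape as the Flask route: "Hello World: <timestamp>"
        body = ("Hello World: " + str(datetime.now())).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to an ephemeral port so the demo does not require root for port 80
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
with urllib.request.urlopen(url) as resp:
    reply = resp.read().decode()
server.shutdown()
print(reply)  # e.g. "Hello World: 2024-01-31 12:00:00.000000"
```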

Containerizing the application

This simple application can be encapsulated in a container so that it can be distributed efficiently across multiple platforms. Keep in mind that Docker is a platform that enables the execution of applications within lightweight isolated containers. Docker uses containerization technology to package an application and its runtime, libraries, and system tools into a single, portable unit. This approach facilitates seamless deployment and scaling, as containers can run consistently on any system that supports Docker. The Docker engine, a key component, manages the creation and execution of containers, abstracting away the underlying infrastructure. This allows developers to focus on building and shipping applications without worrying about compatibility issues or conflicts with the host system, as we can see in the diagram below.

This is done through a configuration file called a Dockerfile. It sets up an Alpine Linux-based image, installs Python 3 and Flask, copies the hello.py script into the container, and configures the entry point to execute Python 3. The default command runs the hello.py script, which contains our Flask application.

FROM alpine:3.15
COPY hello.py /
RUN apk add python3 py3-pip
RUN pip3 install Flask
ENTRYPOINT ["python3"]
CMD ["hello.py"]

Having that ready, we can build the Docker application image using the following command: `docker build --platform=linux/amd64 -t hello .`. This command builds a Docker image named “hello” from the Dockerfile in the current directory, specifying the target platform as Linux on AMD64 architecture.

With the image available in the local repository, it is possible to run a container from it locally with the command docker run -p 80:80 hello. Keep in mind that an image is a static package containing all software dependencies, and a container is a live, executable instance created from an image, enabling portable and consistent application deployment. The command runs a container from the “hello” image, mapping port 80 on the host to port 80 in the container.
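If you script these container runs (for example, from a build pipeline), the port mapping can be parameterized. The helper below is a hypothetical convenience for assembling the docker run argument list, not part of Docker's tooling; you could pass the result to subprocess.run:

```python
def docker_run_args(image, host_port, container_port, name=None):
    """Build the argument list for `docker run` with a host->container port mapping."""
    args = ["docker", "run", "-p", f"{host_port}:{container_port}"]
    if name:
        args += ["--name", name]
    args.append(image)
    return args

print(docker_run_args("hello", 80, 80))
# ['docker', 'run', '-p', '80:80', 'hello']
```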

From this point, it would be possible to push this image to remote repositories like Dockerhub or any of the significant clouds like Azure, AWS, or GCP. And now it is also possible to do the same with Snowflake.

Distributing the containerized application through an orchestrator

An orchestrator like Kubernetes automates containerized applications’ deployment, scaling, and management, optimizing resource usage and enhancing reliability. It ensures efficient resource utilization, facilitates fault tolerance, and enables self-healing. Key business benefits include faster development cycles, cost savings through optimized resource usage, enhanced reliability, scalability, consistency across environments, and flexibility in multi-cloud deployments.

Using a platform like Kubernetes, instead of deploying individual containers, it is more common to deploy a collection of them called pods. The orchestrator’s responsibility is not limited to managing the single application but mainly to managing the cluster of machines that will run it. In order to do that, it is necessary to execute steps like provisioning the cluster, deploying the application on its machines (also called nodes), enabling a service layer to access the app, and opening the required ports in case of a web application.

Doing all this on a platform like GCP can be done through the GKE (Google Kubernetes Engine) command-line interface or a graphical one, in which we would define all these configurations, as shown below.

Notice that even through the GKE UI, a YAML configuration file is generated. It is important to keep this in mind, because we will have to create an equivalent file manually for Snowflake.

How it works on Snowflake

With Snowpark Container Services, in addition to the basic Snowflake objects such as databases and warehouses, you work with these objects: image repository, compute pool, service, and job. Applications, services, and jobs run in a compute pool, which is a collection of one or more virtual machine (VM) nodes. For ML and LLM workloads, it supports machine types equipped with GPUs.

It is a fully managed container offering designed to facilitate the deployment, management, and scaling of containerized applications within the Snowflake ecosystem. This service enables users to run containerized workloads directly within Snowflake, ensuring that data doesn’t need to be moved out of the Snowflake environment for processing. All of this comes with Snowflake platform benefits, most notably ease-of-use, security, and governance features. And you now have a scalable, flexible compute layer next to the robust Snowflake data layer without needing to move data off the platform.

Our development process here will follow the activities described below.

As the first step, we are going to create a Snowflake role named CONTAINER_USER_ROLE and grant it specific privileges such as creating databases, warehouses, and compute pools, as well as monitoring usage. We also grant it imported privileges on the SNOWFLAKE database. Additionally, we grant CONTAINER_USER_ROLE to the ACCOUNTADMIN role and create a database, a warehouse, and two stages with encryption settings.

// Create a CONTAINER_USER_ROLE with required privileges
USE ROLE ACCOUNTADMIN;
CREATE ROLE CONTAINER_USER_ROLE;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT CREATE COMPUTE POOL ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT MONITOR USAGE ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT BIND SERVICE ENDPOINT ON ACCOUNT TO ROLE CONTAINER_USER_ROLE;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE CONTAINER_USER_ROLE;

// Grant CONTAINER_USER_ROLE to ACCOUNTADMIN
grant role CONTAINER_USER_ROLE to role ACCOUNTADMIN;

// Create Database, Warehouse, and Image spec stage
USE ROLE CONTAINER_USER_ROLE;
CREATE OR REPLACE DATABASE CONTAINER_HOL_DB;

CREATE OR REPLACE WAREHOUSE CONTAINER_HOL_WH
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 120
AUTO_RESUME = TRUE;

CREATE STAGE IF NOT EXISTS specs
ENCRYPTION = (TYPE='SNOWFLAKE_SSE');

CREATE STAGE IF NOT EXISTS volumes
ENCRYPTION = (TYPE='SNOWFLAKE_SSE')
DIRECTORY = (ENABLE = TRUE);

Then we switch to the ACCOUNTADMIN role to create an OAuth-based security integration named snowservices_ingress_oauth, switch back to the CONTAINER_USER_ROLE role, and create a compute pool named CONTAINER_HOL_POOL with specific configurations. Lastly, we create an image repository in the specified schema and list the image repositories in that schema.

USE ROLE ACCOUNTADMIN;
CREATE SECURITY INTEGRATION IF NOT EXISTS snowservices_ingress_oauth
TYPE=oauth
OAUTH_CLIENT=snowservices_ingress
ENABLED=true;

USE ROLE CONTAINER_USER_ROLE;
CREATE COMPUTE POOL IF NOT EXISTS CONTAINER_HOL_POOL
MIN_NODES = 1
MAX_NODES = 1
INSTANCE_FAMILY = standard_1;

CREATE IMAGE REPOSITORY CONTAINER_HOL_DB.PUBLIC.IMAGE_REPO;

SHOW IMAGE REPOSITORIES IN SCHEMA CONTAINER_HOL_DB.PUBLIC;

With the initial configuration ready on Snowflake, we can create the development environment on our workstation using conda, with the following conda_env.yml.

name: env
channels:
- https://repo.anaconda.com/pkgs/snowflake
dependencies:
- python=3.10
- snowflake-snowpark-python[pandas]
- ipykernel

Then create the environment with conda env create -f conda_env.yml and activate it with conda activate env. After that, you can install the SnowCLI as follows:

pip install hatch
git clone https://github.com/Snowflake-Labs/snowcli
cd snowcli
hatch build && pip install .

Use the command snow connection add to create the Snowflake connection; it will ask you for the following information:

name : CONTAINER_CONN
account name: <ORG>-<ACCOUNT-NAME> # e.g. MYORGANIZATION-MYACCOUNT
username : <user_name>
password : <password>
role: CONTAINER_USER_ROLE
warehouse : CONTAINER_HOL_WH
Database : CONTAINER_HOL_DB
Schema : public
connection :
port :
Region :

Use the following command to test the connection: snow connection test --connection "CONTAINER_CONN". Back in Snowflake, you can check that the container registry is up and running with:

USE ROLE CONTAINER_USER_ROLE;
SHOW IMAGE REPOSITORIES IN SCHEMA CONTAINER_HOL_DB.PUBLIC;

You can connect to this registry from your local workstation with the following command, based on the information displayed. Remember that you must have Docker Desktop installed on your system. When prompted, enter your Snowflake password.

docker login <snowflake_registry_hostname> -u <user_name>
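The registry hostname is the host portion of the repository_url returned by SHOW IMAGE REPOSITORIES, i.e., everything before the first slash. A small sketch (the URL below is a made-up placeholder, not a real account):

```python
def registry_hostname(repository_url):
    """Extract the registry host (everything before the first '/') from a repository URL."""
    return repository_url.split("/", 1)[0]

# Hypothetical repository_url of the shape returned by SHOW IMAGE REPOSITORIES
repo_url = "myorg-myaccount.registry.snowflakecomputing.com/container_hol_db/public/image_repo"
print(registry_hostname(repo_url))
# myorg-myaccount.registry.snowflakecomputing.com
```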

With our pre-built container image locally available, you can tag it with the repository information and push it into the registry.

docker tag hello:latest <repository_url>/hello:dev
docker push <repository_url>/hello:dev

You can double-check if the image was correctly registered on Snowflake:

USE ROLE CONTAINER_USER_ROLE;
CALL SYSTEM$REGISTRY_LIST_IMAGES('/CONTAINER_HOL_DB/PUBLIC/IMAGE_REPO');

Similarly to what we did on Kubernetes, we can configure the service and deploy the app with the following YAML file. This YAML defines a Snowflake service specification named “hello” with a container called “hello,” using the image we pushed, mounting a volume at /home/hello, exposing a public endpoint on port 80, and configuring network policies to allow internet egress.

spec:
  containers:
  - name: hello
    image: <repo-url>/hello:dev
    volumeMounts:
    - name: hello-home
      mountPath: /home/hello
  endpoints:
  - name: hello
    port: 80
    public: true
  volumes:
  - name: hello-home
    source: "@volumes/hello"
    uid: 1000
    gid: 1000
  networkPolicyConfig:
    allowInternetEgress: true

Let’s upload this file to its appropriate staging area in Snowflake as follows:

snow object stage copy ./hello.yaml @specs --overwrite --connection CONTAINER_CONN

Double-check that it is available there:

USE ROLE CONTAINER_USER_ROLE;
LS @CONTAINER_HOL_DB.PUBLIC.SPECS;

Similarly to what was done with Kubernetes, it is possible to create the service:

USE ROLE CONTAINER_USER_ROLE;
CREATE SERVICE CONTAINER_HOL_DB.PUBLIC.hello
  IN COMPUTE POOL CONTAINER_HOL_POOL
  FROM @specs
  SPEC='hello.yaml';

And check whether it is up and running:

CALL SYSTEM$GET_SERVICE_STATUS('CONTAINER_HOL_DB.PUBLIC.hello');
CALL SYSTEM$GET_SERVICE_LOGS('CONTAINER_HOL_DB.PUBLIC.hello', '0', 'hello',10);
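SYSTEM$GET_SERVICE_STATUS may report a pending state while the container starts, so it is common to poll until the service is ready. The sketch below keeps the polling logic independent of Snowflake: fetch_status is any callable you supply (for example, a wrapper that runs the CALL above through the Snowflake Python connector and extracts the status field); the stub used here is purely illustrative.

```python
import time

def wait_until_ready(fetch_status, timeout=300, interval=5, sleep=time.sleep):
    """Poll fetch_status() until it returns 'READY' or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "READY":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"service not ready, last status: {status}")
        sleep(interval)

# Stubbed demo: the service reports PENDING twice, then READY
statuses = iter(["PENDING", "PENDING", "READY"])
print(wait_until_ready(lambda: next(statuses), sleep=lambda _: None))
# READY
```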

Once you have it running, get the URL created for the app with:

SHOW ENDPOINTS IN SERVICE hello;

Make sure that your user doesn’t have ACCOUNTADMIN, SECURITYADMIN, or ORGADMIN as the default role.

ALTER USER <user> SET DEFAULT_ROLE = CONTAINER_USER_ROLE;

Access the generated URL and log in with your user! You should see something like this:
