Using Lifecycle Configuration Scripts with Amazon SageMaker for Snowpark

Contributors: Bosco Albuquerque (AWS Senior Partner Solution Architect), Kris Skrinak (AWS Machine Learning Segment Lead)

Snowflake and AWS have many joint customers who use Amazon SageMaker Studio with the Snowflake Python packages, combining the robust SageMaker Studio IDE with the rich functionality of Snowpark to push down data preparation and deploy models.

In this blog, we will outline the SageMaker Studio IDE and its components, then walk through the easiest way to set up a Lifecycle Configuration Script to use Snowflake packages. If all you’re looking to do is set up the Lifecycle Configuration Script, feel free to skip directly to the section “Setting up a Lifecycle Configuration Script”.

The Amazon SageMaker Studio IDE

SageMaker Studio Notebooks provide a more integrated experience compared to the traditional SageMaker Notebook Instances. Let’s dive into the components:

Instances:

In the context of Amazon SageMaker, an instance is the compute: the underlying Amazon EC2 (Elastic Compute Cloud) server that runs your Jupyter notebook or performs the tasks you’ve initiated.

These instances come in various types, optimized for different tasks (compute, memory, GPU, etc.), and you can select the appropriate type based on the requirements of your workload, with much of the complexity abstracted away.

Apps:

In the context of Amazon SageMaker Studio, an “App” is the Jupyter instance you’re interacting with.

Amazon SageMaker Studio allows you to have multiple Apps for different tasks. For instance, you can have one App running JupyterLab for data visualization, another for model training, and yet another for monitoring.

Each App is isolated, which means you can safely work on different parts of your ML workflow without them interfering with each other.

Sessions:

A session in SageMaker Studio refers to a user’s active workspace. When you log into SageMaker Studio, a session is created for you.

The session maintains your environment’s state, which includes any running Apps, associated instances, and other activities.

If you close a session (e.g., closing your browser or being inactive for an extended period), you can reopen it later, and SageMaker Studio restores your environment to its previous state. However, there’s a limit to how long a session stays active in the background. If you exceed that limit, you might lose unsaved changes.

It’s essential to understand that while you can close your App (like a JupyterLab interface) and still maintain your session’s state, terminating the underlying instance will lose the session state.

Environments:

An environment in SageMaker Studio encapsulates a specific set of Python libraries and dependencies. It provides an isolated space where you can install specific packages without affecting other environments or the main system.

This is especially useful in data science and machine learning workflows, where one project might require one version of a library (e.g., TensorFlow 2.x) and another project might require a different version (e.g., TensorFlow 1.x).
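Because each kernel runs inside a specific environment, it can be handy to check from a notebook cell which Python version and package versions that environment actually provides. A minimal sketch using only the standard library is below; the package names passed to the helper (e.g. pandas) are just examples and may or may not be installed in any given environment:

import sys
from importlib import metadata
from typing import Optional

def package_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# The interpreter version identifies which environment the kernel is using.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")

# Check an example package; returns None if it isn't installed here.
print("pandas:", package_version("pandas"))

Running this in two different environments makes the isolation concrete: the same notebook code can report different library versions depending on which environment the kernel is attached to.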

Putting it all together

Think of an instance as the underlying computer or server running your tasks.

An App is like a specific software or program running on that computer, such as the Jupyter interface.

A session is your user experience encompassing everything you’re doing in SageMaker Studio at any given time. It’s like your ongoing ‘login’ or ‘workspace’ that keeps track of your activities.

An environment can be thought of as a customized setup on that computer. It determines the specific tools, libraries, and Python version you have available. Just as on a personal computer where you might have different user accounts or profiles with distinct settings and apps, in SageMaker Studio, you can switch between different environments tailored for specific tasks or projects.

Together, these components ensure that SageMaker Studio offers a flexible, efficient, and customizable workspace for machine learning and data science workflows.

Setting up a Lifecycle Configuration Script

Start by going to the SageMaker service and selecting “Domains”, then select the SageMaker domain you will be working in.

Once in the domain, select “Environment”, then scroll down to the section “Lifecycle configurations for personal Studio apps” and select “Attach”. Here you will be able to create a configuration script and attach it at the same time.

Next, select “New configuration”, then select “Jupyter kernel gateway app” and provide a name for the lifecycle configuration script; here we’re naming the script “snowpark”. Copy and paste the provided script into the “Scripts” window, then select “Attach to domain”.

# OVERVIEW
# This script creates a conda environment and installs the Snowpark
# package and related dependencies.

# Activate the conda shell hook so conda commands work in this script:
eval "$(conda shell.bash hook)"

# Create and activate the environment:
conda create -n snowpark_env python=3.8 -y
source activate snowpark_env

# Install Snowpark and supporting packages:
pip install snowflake-snowpark-python
pip install pandas
pip install notebook
pip install scikit-learn
pip install cachetools
pip install pyarrow==10.0.1

Now go into Amazon SageMaker Studio, open the Launcher, and select “Change environment”. Select an image that uses Python 3.8; here we’re using a TensorFlow image with CPUs and Python 3.8. The kernel should be Python 3, and select the startup script that you just created.

Next, select “Create notebook”, and within 2–3 minutes you should have a notebook with the appropriate packages installed by the lifecycle configuration script.
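Before running anything Snowflake-specific, you can confirm from a notebook cell which interpreter the kernel is using. If the lifecycle script ran successfully, the interpreter path should point inside the conda environment the script created (snowpark_env in our script):

import sys

# Show which Python interpreter the kernel is using. With the lifecycle
# script above, the path should include the environment name,
# e.g. .../envs/snowpark_env/bin/python.
print(sys.executable)
print("Python %d.%d" % (sys.version_info.major, sys.version_info.minor))

If the path does not mention the expected environment, the kernel is not attached to it and the packages from the script will not be visible.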

To test that it works, copy and paste the code below into a code cell and run it. These simple imports should run, and you are now on your way to using Snowpark in Studio!

from snowflake.snowpark.session import Session
from snowflake.snowpark.functions import *
from snowflake.snowpark.types import *
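Once the imports succeed, you can create a Snowpark session. A minimal sketch is below; all of the connection parameter values are placeholders you must replace with your own account details, and in practice you should load credentials from AWS Secrets Manager or environment variables rather than hard-coding them in a notebook:

from snowflake.snowpark.session import Session

# Placeholder connection parameters: replace every value with your own.
connection_parameters = {
    "account": "<your_account_identifier>",
    "user": "<your_user>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>",
}

session = Session.builder.configs(connection_parameters).create()

# A quick sanity check: run a trivial query through the session.
print(session.sql("select current_version()").collect())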

Conclusion

In this walkthrough we familiarized you with the Amazon SageMaker Studio IDE and got you up and running with Snowpark in Amazon SageMaker Studio using Lifecycle Configuration scripts.

Many of the templates out there for Lifecycle Configuration scripts have become stale; it is our intent to keep this walkthrough updated and working. In the future this script will become simpler, as users will be able to install all Snowflake distribution packages with the single command pip install snowflake.

For additional information related to using Snowpark with Amazon SageMaker Studio, you can pick up with this quickstart to prepare data, then build and deploy a model with Snowflake and Amazon SageMaker.
