Getting started with Amazon SageMaker Studio Lab

Ioan Catana
10 min read, Dec 3, 2021


Amazon SageMaker Studio Lab was released at re:Invent 2021. As an ML learner, what should I know about it?

Amazon SageMaker Studio Lab image (Image by authors)

Joint post by Mia Chang and Ioan Catana. Reviewed by Simon Zamarin, Swagat Kulkarni, Joe Pringle, Michele Monclova, Emily Webber, Antje Barth, and David Stone.

Becoming a data scientist is a journey that is both challenging and rewarding, and the best way to learn is through hands-on labs, tutorials, and experimentation. Unfortunately, aspiring data scientists often struggle to get access to the compute resources they need to learn, experiment, and build projects. One option is to invest in your own hardware, but on some laptops even getting Python installed correctly can be an uphill battle, even for people who are already familiar with it! Cloud-hosted ML environments are easier to set up, but their compute resources are often expensive, and the free ones may lack persistent storage for custom environments, datasets, and code.

AWS just announced Amazon SageMaker Studio Lab at re:Invent 2021 to address these challenges. It removes the setup hassle while still providing enough power and flexibility for meaningful activities and experiments. SageMaker Studio Lab is free, and it provides a full-fledged JupyterLab environment, CPU and GPU access, persistent storage, example notebooks, and integrations with popular online courses and libraries. With SageMaker Studio Lab, aspiring data scientists and experienced ML practitioners alike can start learning, building, and experimenting in a fully managed, preinstalled data science environment. SageMaker Studio Lab also makes it much easier for ML instructors, trainers, hackathon organizers, and others to design and deliver effective courses and activities without having to troubleshoot configuration issues for each individual user. This blog post walks through the process of registering for SageMaker Studio Lab, outlines its main features, and covers helpful resources for getting started.

What is Amazon SageMaker Studio Lab?

Amazon SageMaker Studio Lab provides access to AWS compute resources and is based on the open-source JupyterLab, a web-based interactive development environment for Jupyter notebooks, code, and data.

Amazon SageMaker Studio Lab gives you instant access to 12-hour sessions on a CPU runtime or 4-hour sessions on a GPU runtime, without waiting for other users of the shared resources. You can also restart your session to continue working. Each account gets 15 GB of free persistent storage for datasets and notebooks, so you can save your work, come back later, and pick up where you left off. Git is fully supported, so you can put your learning and development progress under version control and share the work with others.
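To keep an eye on the 15 GB limit, you can get a rough estimate by adding up the file sizes in your project from a notebook cell. The following is a minimal sketch that uses only the Python standard library. The helper names `directory_size_bytes` and `usage_summary` are hypothetical, and reading "15 GB" as 10^9-byte gigabytes is an assumption. The result only approximates how Studio Lab itself accounts for storage.

```python
import os

# Quota from this post: 15 GB of persistent storage per account.
# Assumption: "GB" means decimal gigabytes (10**9 bytes).
QUOTA_BYTES = 15 * 10**9

def directory_size_bytes(root: str) -> int:
    """Sum the sizes of regular files under root; symlinks are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it for this estimate.
                pass
    return total

def usage_summary(used_bytes: int, quota_bytes: int = QUOTA_BYTES) -> str:
    """Format usage as 'X.XX GB of N GB used (P%)'."""
    pct = 100 * used_bytes / quota_bytes
    return f"{used_bytes / 1e9:.2f} GB of {quota_bytes / 1e9:.0f} GB used ({pct:.1f}%)"
```

For example, assuming your project files live under your home directory, running `print(usage_summary(directory_size_bytes(os.path.expanduser("~"))))` in a cell prints a one-line summary.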

The development environment in Amazon SageMaker Studio Lab comes with Python 3.9. To work with other Python packages, such as PyTorch, TensorFlow, Hugging Face, or OpenCV, you can use either pip or conda, both of which SageMaker Studio Lab supports for package management. Packages you install with pip or conda persist, whether you install them from a notebook or from the command line. In notebooks, remember to use % instead of !, as in %conda install or %pip install. The % magics install into the environment of the running kernel, whereas ! runs a shell command that may pick up a different environment. You can also install and use R kernels.
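The reason % matters is that %pip runs pip with the same Python interpreter as the notebook kernel. The sketch below reproduces that idea in plain Python with `sys.executable`. The function names are hypothetical; in a notebook you would normally just type `%pip install ...`.

```python
import subprocess
import sys

def kernel_pip_command(*packages: str) -> list[str]:
    """Build a pip install command bound to this kernel's interpreter.

    sys.executable is the Python running the notebook kernel, so packages
    land in the kernel's environment. A bare `!pip` uses whichever pip is
    first on PATH, which may belong to a different environment.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    return [sys.executable, "-m", "pip", "install", *packages]

def kernel_pip_install(*packages: str) -> None:
    """Run the install; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(kernel_pip_command(*packages))
```

For example, `kernel_pip_install("sentencepiece")` behaves much like `%pip install sentencepiece`.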

How to register?

To get started with Amazon SageMaker Studio Lab, first go through the registration process and request a free account using your email address. The main steps are described below.

On the main page, the first thing to do is click the Request account button.

Amazon SageMaker Studio Lab website homepage (Image by authors)

Then fill in your information: email address, first and last name, country, organization name, and occupation.

Request account form in Amazon SageMaker Studio Lab (Image by authors)

Once the request is submitted and approved, a confirmation email is sent to the email address you provided. The email contains a link inviting you to create the account. Note that there might be a slight delay in approving the account, because each account request goes through an internal review process to prevent abuse and illegitimate activities.

Account request approval email from The Amazon SageMaker Studio Lab team (Image by authors)

Click the Create Account button inside the email. A new browser window invites you to create a password and choose a username for signing in to Amazon SageMaker Studio Lab. Note that once your request has been approved, you have 7 days to claim your account. After that, your approval is retracted and you will have to request it again.

Create account form with customer agreement, service terms, privacy notice, and acceptable use policy (Image by authors)

As the final step of the registration process, check your inbox again. The new email contains a Verify your email button. Clicking it takes you to the SageMaker Studio Lab homepage.

Verify your email address mail (Image by authors)

Now you are ready to log in to SageMaker Studio Lab using the email or username and the password you set up earlier.

Sign in page of Amazon SageMaker Studio Lab (Image by authors)

After you click the Sign in button, the main page displays the project status, the corresponding instance type, and a list of resources to help you start learning and experimenting with machine learning.

My project page with start runtime and learning resources (Image by authors)

Let’s get started!

To get started after registering, select the compute type on the main page: CPU or GPU. GPU is the better choice for workloads that benefit from GPU acceleration, such as training deep learning models. GPU sessions time out after four hours, but if you are still active, feel free to start another session. For general-purpose computing tasks and for training algorithms that are not parallel or that branch heavily, CPU is a better choice. Projects using CPU have a 12-hour session runtime.
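The two compute types have different time limits, so it can be handy to know when the current session will end. The following is a small, hypothetical helper that encodes the 12-hour CPU and 4-hour GPU limits described in this post; it assumes those limits still apply. It does not query Studio Lab itself, so you supply the session start time.

```python
from datetime import datetime, timedelta

# Session limits as described in this post (Dec 2021); they may change.
SESSION_HOURS = {"cpu": 12, "gpu": 4}

def session_deadline(start: datetime, compute: str) -> datetime:
    """Return when a session started at `start` on `compute` times out."""
    try:
        hours = SESSION_HOURS[compute.lower()]
    except KeyError:
        raise ValueError(f"unknown compute type: {compute!r}") from None
    return start + timedelta(hours=hours)

def time_left(start: datetime, compute: str, now: datetime) -> timedelta:
    """Remaining session time, clamped at zero once the session has expired."""
    return max(session_deadline(start, compute) - now, timedelta(0))
```

For example, a GPU session started at 09:00 would end at 13:00, so `time_left` reports 30 minutes at 12:30.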

Once the compute type is selected, click on the Start runtime button to start using the instance.

My project page with GPU instance started (Image by authors)

To start learning with sample notebooks, you can choose among the Dive into Deep Learning (D2L) notebooks, AWS Machine Learning University (MLU), Hugging Face, or the AWS Machine Learning Blog.

Learning resources and community resources on My project page in Amazon SageMaker Studio Lab (Image by authors)

If you choose Open Hugging Face notebooks, a new tab opens and invites you to log in with GitHub so that Amazon SageMaker Studio Lab can access the Hugging Face notebooks.

Invitation to log in with GitHub and synchronize with Amazon SageMaker Studio Lab (Image by authors)

Now click the Log in with GitHub button, and on the next page click the Authorize Amazon SageMaker button.

Authorizing Amazon SageMaker Studio Lab to access GitHub via identity credentials (Image by authors)

If you’re looking for a link to authorize the Amazon SageMaker GitHub app, you can also try this link.

After the authorization step, you can copy the Hugging Face notebooks to your project directly by clicking on the Copy to project button.

Initializing Hugging Face to be used by copying samples to the project (Image by authors)

A general reminder: you will need to change all of the “!” installs to “%” notation. More details are in this blog post.
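If a notebook contains many `!pip install` or `!conda install` lines, you can rewrite them in bulk. The following is a minimal sketch that edits the notebook's JSON directly with the standard library. It assumes the standard .ipynb layout, in which each cell's `source` is a string or a list of strings. It only touches code-cell lines that start with `!pip install` or `!conda install`; other shell commands such as `!ls` are left alone. The function names are hypothetical.

```python
import json
import re

# A line starting (after optional spaces/tabs) with `!pip install` or `!conda install`.
_BANG_INSTALL = re.compile(r"^([ \t]*)!(pip|conda)(\s+install\b)", re.MULTILINE)

def to_magic(source: str) -> str:
    """Rewrite `!pip install ...` / `!conda install ...` lines to `%pip` / `%conda`."""
    return _BANG_INSTALL.sub(r"\1%\2\3", source)

def convert_notebook(nb: dict) -> dict:
    """Rewrite install lines in code cells in place and return the notebook."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # leave Markdown and raw cells untouched
        src = cell.get("source", "")
        if isinstance(src, list):
            cell["source"] = [to_magic(line) for line in src]
        else:
            cell["source"] = to_magic(src)
    return nb

def convert_file(path: str) -> None:
    """Load an .ipynb file, convert it, and write it back in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    convert_notebook(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

For example, `convert_file("example.ipynb")` rewrites a hypothetical example.ipynb in place, so keep a copy or commit it to Git first.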

You can also copy the entire repository by clicking the Clone Entire Repo button and then the default Clone button on the next screen.

Copying Hugging Face samples by cloning the related repo to the project (Image by authors)

Once the GitHub repository has been cloned, you can see all the sample files in the Amazon SageMaker Studio Lab interface.

View of the Hugging Face samples copied into Amazon SageMaker Studio Lab project (Image by authors)

You can also upload your own custom notebooks to Amazon SageMaker Studio Lab by clicking the Upload Files button in the top left, just as in classic JupyterLab. You can create a new notebook from File/New/Notebook.

Closer look at the Upload Files and the folder structure in Amazon SageMaker Studio Lab (Image by authors)

You can install additional libraries or packages with pip or conda, either from the terminal or directly from a notebook. You can also describe an environment in a YAML file and create it from a terminal window. To open a terminal, select File/New/Terminal; you can then create and edit your YAML file. For example, to install the packages for the previous Hugging Face NLP notebook, you can use a sample env_nlp.yml file like this one:

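```yaml
name: nlp
dependencies:
  - python=3.9
  - pip
  - ipykernel  # lets Jupyter list this environment as a notebook kernel
  # Packages under the nested pip: key are installed with pip, not conda.
  - pip:
      - ipywidgets
      - git+https://github.com/huggingface/transformers
      - datasets
      - sacrebleu
      - torch
      - sentencepiece
```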

To create the environment from this file, run the following command in the terminal:

```shell
conda env create -f env_nlp.yml
```

The terminal window shows the environment installation progress (Image by authors)

This command creates a new Python environment named nlp that contains all the specified packages. To activate the new environment, run:

```shell
conda activate nlp
```

Now select the nlp environment as the kernel for one of the Hugging Face NLP notebook examples. The kernel is the process that runs the notebook's code, and it can use either the default Python environment or a custom one such as nlp. Note that you might need to adapt the Hugging Face notebooks to your specific use case and install additional packages before running them.

Select the nlp environment from preferred kernel list (Image by authors)
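To confirm that a notebook is really running in the new environment, you can inspect the kernel's interpreter from a cell. This sketch assumes the conda convention of placing named environments under `<conda root>/envs/<name>`. The helper `conda_env_name` is hypothetical, and for a non-conda interpreter it would simply report `base`.

```python
import sys
from pathlib import Path

def conda_env_name(prefix: str = sys.prefix) -> str:
    """Infer the conda environment name from an interpreter prefix.

    Named conda environments live under <root>/envs/<name>; any other
    prefix is reported as the base environment.
    """
    p = Path(prefix)
    return p.name if p.parent.name == "envs" else "base"

# In a notebook using the nlp kernel, the second line should read "nlp".
print("Kernel interpreter:", sys.executable)
print("Conda environment:", conda_env_name())
```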

Importantly, all your project files and custom conda environments persist in SageMaker Studio Lab after the session ends. The next time you log in to SageMaker Studio Lab, all your files are ready to reuse, and there is no need to reinstall the related packages.
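At the start of a new session, you can check that your custom environments are still there by asking conda to list them. The sketch below assumes the `conda` executable is on the kernel's PATH. It parses the JSON printed by `conda env list --json`, whose `envs` key holds the environment paths. The helper names are hypothetical.

```python
import json
import subprocess
from pathlib import Path

def parse_env_list(json_text: str) -> dict[str, str]:
    """Map environment name to path from `conda env list --json` output."""
    result = {}
    for env_path in json.loads(json_text).get("envs", []):
        p = Path(env_path)
        # Assumption: the only environment outside an envs/ directory is base.
        name = p.name if p.parent.name == "envs" else "base"
        result[name] = env_path
    return result

def list_conda_envs() -> dict[str, str]:
    """Run conda and return the persisted environments (requires conda on PATH)."""
    completed = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_list(completed.stdout)
```

After creating the environment from env_nlp.yml, `"nlp" in list_conda_envs()` should stay true across sessions.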

Add an “Open in Studio Lab” button to your own notebooks

Last but not least, you can also add your very own “Open in Studio Lab” button to any of your notebooks hosted on GitHub! Doing this is easy. First, copy the URL of the GitHub notebook you want to link to. For the NLP notebook hosted in the Studio Lab Examples, that looks like this:

Next,

Finally, add this as a line in a Markdown cell of your notebook. We like to add it right below the top-level header, so customers can use the code right away.

This should render a purple button nicely on your notebook, like this.

Open in Studio Lab icon (Image by authors)
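If you have many notebooks, you can generate the badge Markdown instead of writing it by hand. This minimal sketch assumes the badge image URL `https://studiolab.sagemaker.aws/studiolab.svg` and the import prefix `https://studiolab.sagemaker.aws/import/github/` from the Studio Lab documentation, so check both against the current docs. The function name and the `org/repo` URL in the example are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: badge image and GitHub import prefix as documented for Studio Lab.
BADGE_IMAGE = "https://studiolab.sagemaker.aws/studiolab.svg"
IMPORT_PREFIX = "https://studiolab.sagemaker.aws/import/github/"

def studio_lab_badge(github_notebook_url: str) -> str:
    """Turn a github.com notebook URL into an 'Open in Studio Lab' Markdown badge."""
    parsed = urlparse(github_notebook_url)
    if parsed.netloc != "github.com" or not parsed.path.endswith(".ipynb"):
        raise ValueError("expected a https://github.com/.../*.ipynb URL")
    # Swap the github.com host for the Studio Lab import prefix, keeping the path.
    import_url = IMPORT_PREFIX + parsed.path.lstrip("/")
    return f"[![Open In SageMaker Studio Lab]({BADGE_IMAGE})]({import_url})"
```

For example, `studio_lab_badge("https://github.com/org/repo/blob/main/notebook.ipynb")` returns a line you can paste below the notebook's title.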

Things to keep in mind

You are now ready to start your machine learning journey on Amazon SageMaker Studio Lab with your own project. One thing worth noting is the compute time limit. Studio Lab is designed for learning and education, not for activities such as mining cryptocurrency or hosting a website. Sessions run for up to 12 hours on CPU instances and 4 hours on GPU instances, and each account has 15 GB of persistent storage.

When your session reaches its time limit, the system saves your work and stops the runtime. You can sign in again later to continue learning on Amazon SageMaker Studio Lab.

What’s next?

Here are some great resources and events to help you practice your data science skills and get involved in hands-on activities.

(1) Learn from the best

Get access to the same machine learning courses used to train Amazon’s own developers on machine learning. Learn how to use ML with the learn-at-your-own-pace AWS Machine Learning University (MLU) Accelerator learning series.

Dive into Deep Learning (D2L) is a free interactive book (150 Jupyter notebooks) that teaches the ideas, the math, and the code. It has been adopted at 300 universities in 55 countries, including Stanford, MIT, Harvard, and Cambridge.

To learn about natural language processing (NLP) from the best, you can visit Hugging Face. Founded in 2016 and based in New York and Paris, Hugging Face offers over 7,000 pre-trained models in 164 languages. It also provides transformers, tokenizers, and datasets that developers can easily use in their NLP projects. You can try out the Hugging Face notebooks on Amazon SageMaker Studio Lab.

We also recommend the AWS Machine Learning Blog, which covers the latest developments, research, and techniques in AI and machine learning. It can help you gain a broader understanding of best practices across a variety of ML projects.

(2) Be part of the community

To connect with other developers who use Amazon SageMaker Studio Lab, you can send feedback through the Studio Lab Samples repository on GitHub using the #studio-lab tag. You can also use the #studio-lab tag on Stack Overflow to collaborate with other developers and ML practitioners.

(3) Join the AWS Disaster Response Hackathon using Studio Lab

AWS Disaster Response Hackathon picture (Image by authors)

Together with Amazon SageMaker Studio Lab, AWS has also launched the AWS Disaster Response Hackathon.

Both the frequency and severity of natural disasters are increasing. This year alone, we have seen significant wildfires across the Western United States and in countries like Greece and Turkey; major floods across Europe; and temperature increases in most countries in Asia.

This hackathon aims to encourage new ways of applying ML to pressing challenges in natural disaster preparedness and response. You can work on open-source projects as a team or as an individual, alongside developers around the globe. ML and disaster response experts will provide guidance and answer your questions during monthly office hours.

The hackathon offers a total of $54,000 USD in prizes and runs through February 7, 2022. It is also an attempt to set the Guinness World Record for the “largest machine learning competition,” so make sure to submit your project in time!

This hackathon is a great way to start learning ML and apply your knowledge and passion while doing good in the world. Join the hackathon here.

We hope this getting started guide was useful and we look forward to seeing you contribute to the SageMaker Studio Lab community!
