Apache Spark and BigQuery with AWS SageMaker Studio
Extend the capabilities of SageMaker Studio container images with new libraries.
In this post, you will learn how to extend the SageMaker Studio Spark container image with additional libraries so that it can interact with Google Cloud services such as BigQuery. We will then create a notebook in Amazon SageMaker Studio that retrieves data from a BigQuery table.
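As a rough illustration of the end goal, one common way to read a BigQuery table from Spark is Google's open-source spark-bigquery-connector, which registers a `bigquery` data source. The sketch below assumes that connector (the post may install a different library), a placeholder connector version, and a service-account JSON key file reachable from the Studio container. The helper names `connector_package`, `table_id`, and `read_bigquery_table` are hypothetical.

```python
from typing import Any, Optional


# Assumption: Google's spark-bigquery-connector, published on Maven Central as
# com.google.cloud.spark:spark-bigquery-with-dependencies_<scala>:<version>.
# The default version is a placeholder; match it to your Spark/Scala build.
def connector_package(scala_version: str = "2.12", connector_version: str = "0.36.1") -> str:
    """Return the Maven coordinate to pass as the spark.jars.packages setting."""
    return f"com.google.cloud.spark:spark-bigquery-with-dependencies_{scala_version}:{connector_version}"


def table_id(project: str, dataset: str, table: str) -> str:
    """Build a fully qualified BigQuery table ID of the form project.dataset.table."""
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)


def read_bigquery_table(spark: Any, project: str, dataset: str, table: str,
                        credentials_file: Optional[str] = None) -> Any:
    """Load a BigQuery table as a Spark DataFrame through the connector's 'bigquery' source."""
    reader = spark.read.format("bigquery").option("table", table_id(project, dataset, table))
    if credentials_file is not None:
        # Hypothetical path to a service-account JSON key inside the container.
        reader = reader.option("credentialsFile", credentials_file)
    return reader.load()
```

In a notebook, you would set `spark.jars.packages` to `connector_package()` when building the `SparkSession`, then call `read_bigquery_table(spark, "my-project", "my_dataset", "my_table")` with your own (here hypothetical) project, dataset, and table names. Keeping the reader logic in a plain function makes it easy to swap the connector version or authentication option without touching notebook cells.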
Introduction
On December 3, 2019, AWS introduced Amazon SageMaker Studio as "the first fully integrated development environment for machine learning." According to AWS, Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
Amazon SageMaker Studio lets you manage your entire ML workflow, with features that improve the overall ML engineering experience. It offers SageMaker Notebooks, which let you create and share Jupyter notebooks without managing infrastructure; SageMaker Experiments, to organize, track, and compare ML training and model evaluation jobs, or data processing jobs run via SageMaker Processing; Amazon SageMaker Debugger, to analyze complex training issues and receive alerts; and SageMaker Autopilot, to build models automatically with…