AWS Certified Machine Learning Cheat Sheet — ML-OPs 2/3

tanta base
4 min read · Nov 24, 2023


This is the second installment of ML-OPs in AWS. The importance of machine learning operations cannot be overstated. Having ML-OPs knowledge can make you a well-rounded machine learning engineer. In this installment we’ll do a brief review of Instance Types, then move on to SageMaker and Kubernetes, SageMaker Projects, Inference Pipelines and Spot Training.

Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.

Getting AWS certified can show employers your machine learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!

So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!

Want to know how I passed this exam? Check this guide out!

This series has you covered on the ML-OPs in AWS:

[Image: robot looking down at a checklist]
Bring your models to life with machine learning ops!

Instance Types

The instance types for the built-in algorithms were covered for each algorithm; you can read more about that in the built-in algorithm series, starting here.

However, generally speaking, algorithms that use deep learning benefit from a GPU instance (P3, g4dn) for training, though GPU instances are more expensive. Inference is less demanding, and compute-optimized instances are generally enough.
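The training-versus-inference rule of thumb above can be sketched as a small helper. This is an illustrative heuristic only; the instance type names are examples of the families mentioned, not recommendations for any specific workload.

```python
# Sketch: picking an instance family by workload (illustrative heuristic).
# Instance names like "ml.p3.2xlarge" are example members of each family.
def pick_instance(task: str, deep_learning: bool) -> str:
    """Return a plausible SageMaker instance type for a workload."""
    if task == "training" and deep_learning:
        return "ml.p3.2xlarge"   # GPU-accelerated: deep learning training
    if task == "training":
        return "ml.m5.xlarge"    # general-purpose CPU is often enough
    return "ml.c5.large"         # compute-optimized suffices for inference

print(pick_instance("training", deep_learning=True))  # ml.p3.2xlarge
```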

SageMaker and Kubernetes

You can integrate Kubernetes with SageMaker in two ways. The first is SageMaker Operators for Kubernetes, which wraps SageMaker operations so Kubernetes can access them. To integrate these, you can use Amazon EKS and create SageMaker jobs natively using the Kubernetes API.

The second is SageMaker Components for Kubeflow Pipelines, which lets you define pipelines and wrap the entire process of building, deploying, testing, and tuning your models in an ML-OPs environment. Both options enable hybrid machine learning workflows (useful if you have sensitive data you want to keep on premises) and integration with existing machine learning platforms built on Kubernetes or Kubeflow.

SageMaker Projects

This is SageMaker's native machine learning operations solution, with continuous integration and continuous deployment. Use SageMaker Projects to create an ML-OPs solution that orchestrates and manages building a custom image for processing, training and inference, feature engineering, training, evaluation, deployment, monitoring, and updating. SageMaker Projects uses code repositories for building and deploying ML solutions, and SageMaker Pipelines to actually define those steps.

Inference Pipelines

An inference pipeline takes in input data and optionally transforms it before making an inference. You can use an inference pipeline to combine preprocessing, prediction, and post-processing data science tasks. The entire assembled pipeline is treated as a single model, with which you can do either real-time or batch inference.

SageMaker handles invocations as a sequence of HTTP requests. The first container handles the first request, then that response is sent as another request to the second container, etc. Finally, SageMaker returns the last response back to the client. When you deploy the pipeline model, SageMaker will install and run all the containers on each EC2 instance in the endpoint or transform job.

A pipeline can have between 2 and 15 containers that work together, with any combination of pre-trained built-in algorithms or your own algorithms in Docker containers. You can use containers from Spark ML or scikit-learn. If you use Spark ML, you can run it with Glue or EMR, and the models will be serialized into MLeap format.
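The chaining described above maps onto SageMaker's CreateModel API, which accepts a Containers list for pipeline models. Below is a minimal sketch of such a request; the model name, role ARN, image URIs, and S3 paths are all placeholders, not real resources.

```python
# Sketch of a CreateModel request for an inference pipeline. The
# Containers list chains preprocessing, prediction, and post-processing;
# SageMaker invokes them in order, feeding each response to the next.
# All names, URIs, and paths below are hypothetical placeholders.
pipeline_model_request = {
    "ModelName": "demo-inference-pipeline",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    "Containers": [
        {   # 1) preprocessing container (e.g. a scikit-learn transformer)
            "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sklearn-preprocess:latest",
            "ModelDataUrl": "s3://my-bucket/preprocess/model.tar.gz",
        },
        {   # 2) the predictor itself (e.g. an XGBoost model)
            "Image": "<account>.dkr.ecr.<region>.amazonaws.com/xgboost:latest",
            "ModelDataUrl": "s3://my-bucket/xgb/model.tar.gz",
        },
        {   # 3) post-processing container to format the response
            "Image": "<account>.dkr.ecr.<region>.amazonaws.com/postprocess:latest",
            "ModelDataUrl": "s3://my-bucket/postprocess/model.tar.gz",
        },
    ],
}

# An inference pipeline must chain between 2 and 15 containers.
assert 2 <= len(pipeline_model_request["Containers"]) <= 15
```

In practice you would pass this request to `boto3`'s SageMaker client (`create_model`), or build the equivalent with the SageMaker Python SDK's `PipelineModel`.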

In summary, this chains multiple inference containers into one pipeline of results.

Spot Training

You can train models using managed Amazon EC2 Spot instances instead of on-demand instances. Spot instances can reduce the cost of training by up to 90% compared to on-demand instances, and SageMaker can manage Spot interruptions on your behalf. You can specify which training jobs use Spot instances, along with a stopping condition that specifies how long SageMaker waits for a job to run on Spot instances. Metrics and logs generated during training runs are available in CloudWatch.

In case of interruption, you can configure your Spot training job to use checkpoints, so the training job resumes from the last checkpoint instead of restarting. Spot instances can increase total training time, because you may have to wait for a Spot instance to become available.
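These Spot settings correspond to fields of SageMaker's CreateTrainingJob API. Here is a minimal sketch of just the Spot-related portion of such a request; the job name and S3 bucket are placeholders. Note that the wait time must be at least as long as the runtime limit.

```python
# Sketch of the managed-spot fields of a CreateTrainingJob request.
# Job name and S3 URI are hypothetical placeholders.
spot_training_config = {
    "TrainingJobName": "demo-spot-job",
    "EnableManagedSpotTraining": True,
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,   # cap on actual training time
        "MaxWaitTimeInSeconds": 7200,  # total time, including waiting for Spot capacity
    },
    "CheckpointConfig": {
        # Checkpoints let an interrupted job resume instead of restarting.
        "S3Uri": "s3://my-bucket/checkpoints/",
    },
}

# SageMaker requires the wait time to be >= the runtime limit.
stop = spot_training_config["StoppingCondition"]
assert stop["MaxWaitTimeInSeconds"] >= stop["MaxRuntimeInSeconds"]
```

The SageMaker Python SDK exposes the same knobs on its `Estimator` (`use_spot_instances`, `max_run`, `max_wait`, `checkpoint_s3_uri`).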

ML-OPs are the gears that bring your models to life

Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series for SageMaker Features:

and high level machine learning services:

and built in algorithms:

  • 1/5 for Linear Learner, XGBoost, Seq-to-Seq and DeepAR here
  • 2/5 for BlazingText, Object2Vec, Object Detection and Image Classification here
  • 3/5 for Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA here
  • 4/5 for KNN, K-Means, PCA and Factorization Machines here
  • 5/5 for IP insights and reinforcement learning here

and this article on lesser known high level features for industrial or educational purposes.

and this article on Security in AWS

Thanks for reading and happy studying!


tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots, and bioinformatics.