AWS Certified Machine Learning Cheat Sheet — ML-OPs 3/3

tanta base
4 min read · Nov 24, 2023


This is the third and final installment of the ML-OPs in AWS series. You made it this far! Hopefully this series has helped you gain knowledge of the operations side of machine learning. In this installment we’ll cover Availability Zones, Serverless Inference, SageMaker Inference Recommender and Auto Scaling.

Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.

Getting AWS certified can show employers your Machine Learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!

So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!

Want to know how I passed this exam? Check this guide out!

This series has you covered on ML-OPs in AWS:

ML-OPs is the final step in model building!

Availability Zones

SageMaker endpoints can help protect your application from Availability Zone outages and instance failures. If an outage occurs or an instance fails, SageMaker automatically attempts to distribute your instances across Availability Zones. AWS recommends that you deploy multiple instances for each production endpoint, and, if you are using a VPC, that you configure it with at least two subnets, each in a different Availability Zone. AWS also recommends using more, smaller instance types in different Availability Zones to host your endpoints.
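For example, here is a minimal Boto3 sketch of that setup (the model name, image URI, role ARN, and VPC IDs are placeholders you would replace with your own):

```python
import boto3

sm = boto3.client("sagemaker")

# Create the model inside a VPC with two subnets in different AZs
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={
        "Image": "<ecr-image-uri>",
        "ModelDataUrl": "s3://<bucket>/model.tar.gz",
    },
    ExecutionRoleArn="<execution-role-arn>",
    VpcConfig={
        "SecurityGroupIds": ["<sg-id>"],
        "Subnets": ["<subnet-in-az-a>", "<subnet-in-az-b>"],
    },
)

# Multiple smaller instances so SageMaker can spread them across AZs
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
    }],
)

sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)
```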

Serverless Inference

This is ideal for workloads that have idle periods between bursts of heavy traffic and can tolerate cold starts (you can use CloudWatch to monitor how long your cold starts take). Serverless endpoints launch compute resources automatically and scale them depending on traffic. Serverless Inference integrates with AWS Lambda to offer high availability, built-in fault tolerance and automatic scaling. When there are no requests, Serverless Inference scales your endpoint down to 0, helping to lower costs. When you have predictable heavy traffic, you can use Provisioned Concurrency with Serverless Inference. Just specify your container, memory and concurrency requirements, and AWS takes care of the amount of hardware. You can use CloudWatch to monitor ModelSetupTime (how long it takes to launch compute resources for your endpoint as it scales up or down), Invocations, and MemoryUtilization over time.
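Here’s a minimal Boto3 sketch of a serverless endpoint config (the names and sizes are placeholder values):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,    # container memory requirement
            "MaxConcurrency": 10,      # max concurrent invocations
            # Optional: keep capacity warm for predictable heavy traffic
            "ProvisionedConcurrency": 2,
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```

ProvisionedConcurrency is optional; leave it out if cold starts are acceptable for your workload.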

This is a good option if you want deployment capacity for your inference set up automatically.

SageMaker Inference Recommender

This reduces the time needed to deploy machine learning models to production by automating load testing and model tuning across SageMaker machine learning instances. You can use Inference Recommender to deploy to the real-time or serverless inference endpoint that delivers the best performance at the lowest cost. It helps you select the best instance type and configuration, or the best serverless configuration.

First, create or register a SageMaker model in the model registry with your model artifacts, then use the AWS SDK for Python (Boto3) or the SageMaker console to run benchmarking jobs for different endpoint configurations. Inference Recommender jobs help you collect and visualize performance and resource-utilization metrics so you can decide which endpoint type and configuration to choose. Existing models may already have benchmarks.

It can do both instance recommendations (runs load tests on recommended instance types) and endpoint recommendations (a custom load test where you specify the instance types, traffic patterns, latency requirements and throughput requirements).
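Here’s a minimal Boto3 sketch of kicking off a Default (instance recommendation) job from a registered model package (the job name, role ARN and model package ARN are placeholders):

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_inference_recommendations_job(
    JobName="my-recommender-job",
    JobType="Default",   # "Advanced" runs the custom load test instead
    RoleArn="<execution-role-arn>",
    InputConfig={
        "ModelPackageVersionArn": "<model-package-arn-from-model-registry>",
    },
)

# Inspect the benchmarking results once the job completes
results = sm.describe_inference_recommendations_job(
    JobName="my-recommender-job"
)
```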

This is a good option if you want to set up deployment capacity for your inference endpoints manually

Auto Scaling

Auto Scaling adjusts the number of instances for your model in response to changes in the workload. When the workload increases, Auto Scaling brings more instances online, and when it decreases, unnecessary instances are removed so you don’t pay for provisioned instances that you aren’t using. You set up a policy that defines target metrics, min/max capacity and cooldown periods, and it automatically adds or removes inference nodes. It works with CloudWatch to monitor performance and scale inference nodes as needed. If you have multiple production variants, you can dynamically adjust the number of instances for each. A good practice is to load test your configuration before using it in production to make sure the scaling policy you set works as expected.
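Here’s a minimal Boto3 sketch of a target-tracking scaling policy for an endpoint variant, configured through Application Auto Scaling (the endpoint and variant names, capacities and target value are placeholders):

```python
import boto3

# SageMaker endpoint scaling is managed by Application Auto Scaling
aas = boto3.client("application-autoscaling")

resource_id = "endpoint/my-endpoint/variant/AllTraffic"

# Register the variant's instance count with min/max capacity
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Scale on the CloudWatch invocations-per-instance metric
aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "TargetValue": 70.0,       # target invocations per instance
        "ScaleOutCooldown": 60,    # cooldown periods, in seconds
        "ScaleInCooldown": 300,
    },
)
```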

This is a good option if you want something in between manual capacity setup and fully automatic deployment capacity for your inference.

ML-OPs are the gears that bring your models to life

Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series for SageMaker Features:

and high level machine learning services:

and built in algorithms:

  • 1/5 for Linear Learner, XGBoost, Seq-to-Seq and DeepAR here
  • 2/5 for BlazingText, Object2Vec, Object Detection and Image Classification here
  • 3/5 for Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA here
  • 4/5 for KNN, K-Means, PCA and Factorization Machines here
  • 5/5 for IP Insights and reinforcement learning here

and this article on lesser-known high-level features for industrial or educational purposes

and this article on Security in AWS

Thanks for reading and happy studying!


tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots and bioinformatics.