AWS Certified Machine Learning Cheat Sheet — SageMaker Features 1/3
This marks the first installment of the SageMaker Features series! SageMaker Features help the model building process in various ways, so whether you are a seasoned professional or just starting out in AWS, this series will have something for everyone. The features in this installment are Automatic Model Tuning, Apache Spark, SageMaker Studio and SageMaker Debugger.
Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.
Getting AWS certified can show employers your Machine Learning and cloud computing knowledge. AWS certifications can also give you life-time bragging rights!
So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!
This series has you covered on the features in SageMaker; from Model Monitor to Autopilot, it has it all!
Want to know how I passed this exam? Check this guide out!
Full list of all installments:
- 1/3 for Automatic Model Tuning, Apache Spark, SageMaker Studio and SageMaker Debugger here
- 2/3 for Autopilot, Model Monitor, Deployment Safeguards and Canvas here
- 3/3 for Training Compiler, Feature Store, Lineage Tracking and Data Wrangler here
TL;DR
- Automatic Model Tuning can iterate many training jobs over your model to find the best model version. The job can run in parallel so many hyperparameter tuning jobs can be run at once. Some best practices are: run at least one successful training job first, don’t try too many hyperparameters at once, choose small ranges, scale logarithmically where appropriate and don’t run too many jobs at once. Since the tuning process learns from each incremental step, too much concurrency can actually hinder that learning.
- Apache Spark is used for large-scale data processing. You can distribute the processing of that data across an entire cluster on Spark. Typically, you put your data in a Spark data frame, run your processing jobs and then access your data in a data frame object. The SageMakerEstimator classes allow tight integration between Spark and SageMaker for several models including XGBoost.
- SageMaker Studio is an IDE that provides a single platform for all your machine learning needs and allows seamless collaboration. It also has a feature called SageMaker Experiments that allows you to log, track, analyze, and visualize the results of multiple models.
- SageMaker Debugger provides tools to debug training jobs and seeks to improve the performance of the model. Debugger will log and create a CloudWatch event when a rule is hit. Supported frameworks are TensorFlow, PyTorch, MXNet, XGBoost and the SageMaker generic estimator. There is also a Debugger Insights Dashboard.
Automatic Model Tuning
What problem does it solve?
Hyperparameter tuning is a core aspect of machine learning and it can greatly impact the performance of the model. The hyperparameters are usually tuned to your specific model and data, so they are unique to each project. Tuning the hyperparameters can be expensive in terms of time and resources.
This problem can quickly become a headache when you have many hyperparameters and you have to try different combinations of them to find the best model.
Why is it the solution?
Automatic Model Tuning (AMT) can iterate many training jobs over your model to find the best model version. You just have to specify the hyperparameters, their ranges, the objective metric you are optimizing and how many iterations you want. You can also choose the scale for integer and continuous hyperparameter ranges.
SageMaker can create a tuning job that trains the model with different combinations of hyperparameters. It doesn’t try every combination possible; it learns as it goes and can determine which hyperparameter values are the best to try out next. The job can run in parallel so many hyperparameter tuning jobs can be run at once.
Afterwards, the most performant model can be deployed.
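As a rough illustration, here is a minimal sketch of setting up a tuning job with the SageMaker Python SDK. The estimator, objective metric and S3 paths are assumptions, not part of the original article:

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Hypothetical ranges; "Logarithmic" scaling suits parameters spanning several orders of magnitude
hyperparameter_ranges = {
    "eta": ContinuousParameter(0.001, 0.3, scaling_type="Logarithmic"),
    "max_depth": IntegerParameter(3, 10),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,                  # an already-configured SageMaker Estimator (assumed)
    objective_metric_name="validation:auc",   # the metric the tuner optimizes
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,                              # total training jobs the tuner may launch
    max_parallel_jobs=2,                      # keep concurrency low so the tuner can learn between steps
)

# Start the tuning job against training/validation channels in S3 (paths assumed)
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```

Keeping max_parallel_jobs low ties back to the concurrency best practice below: the search can only learn from jobs that have already finished.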
What are best practices?
- You should run at least one training job first before trying AMT and the training job should finish without any errors
- You should know your dataset, have an understanding of the algorithm, and know the best measure of success for it.
- Don’t try too many hyperparameters at once and choose small ranges
- If you have a hyperparameter that spans several orders of magnitude then you may want to scale logarithmically. For example, you may want to choose a logarithmic scale for the learning_rate
- In order for AMT to learn the best hyperparameters, don’t run too many jobs at once
- Since the tuning process learns from each incremental step, too much concurrency can actually hinder that learning.
Apache Spark
What problem does it solve?
Big data processing can take a long time, especially if it’s a large amount of data and you have to do some intense computation or transformations on the data. Doing big data processing directly within a SageMaker notebook can be a headache.
Why is it the solution?
Apache Spark is used for large-scale data processing. It is a great option for pre-processing data and has a robust ML library. Using the Spark framework in SageMaker allows you to easily apply data transformations and extract features. You can distribute the processing of that data across an entire cluster on Spark.
Typically, you put your data in a Spark data frame, run your processing jobs and then access your data in a data frame object. Then you can create a SageMaker estimator, like K-Means, PCA or XGBoost. So essentially, process your data in Spark and then build your model in SageMaker. The SageMakerEstimator classes allow tight integration between Spark and SageMaker for several models including XGBoost.
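As an illustration of that hand-off, here is a minimal sketch using the sagemaker_pyspark library's K-Means estimator. The role ARN, instance types and the training_df / test_df DataFrames are assumptions, and exact class and parameter names may vary by library version:

```python
from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Wrap a SageMaker K-Means training job as a Spark Estimator (role ARN is hypothetical)
estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/MySageMakerRole"),
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m5.xlarge",
    endpointInitialInstanceCount=1,
)
estimator.setK(10)            # number of clusters
estimator.setFeatureDim(784)  # dimensionality of the feature vectors

# training_df is a hypothetical Spark DataFrame produced by your pre-processing steps
model = estimator.fit(training_df)

# transform() calls the deployed SageMaker endpoint to get cluster assignments
predictions = model.transform(test_df)
```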
SageMaker Studio
What problem does it solve?
In AWS, typically a model is built within a SageMaker notebook. This has many benefits, but some engineers may not be accustomed to a notebook environment, seeing previous training jobs can be a pain and collaborating with multiple team members within a notebook instance can be a challenge.
Why is it the solution?
SageMaker Studio is an IDE that provides a single platform for all your machine learning needs and allows seamless collaboration. Studio can also lower the learning curve for new machine learning engineers. Studio notebooks launch faster than traditional notebook instances and can be shared with teammates. You can also switch between hardware configurations without leaving the environment. It also has a feature called SageMaker Experiments that allows you to log, track, analyze, and visualize the results of multiple models.
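As a rough sketch of what logging to SageMaker Experiments can look like with a recent SageMaker Python SDK (the experiment name, parameters and metric values here are made up):

```python
from sagemaker.experiments.run import Run

# Track a training run's parameters and metrics so they can be compared in Studio
with Run(experiment_name="churn-model-experiments", run_name="xgboost-baseline") as run:
    run.log_parameter("max_depth", 6)
    run.log_parameter("eta", 0.1)

    # ... train and evaluate the model here ...

    run.log_metric(name="validation:auc", value=0.91)
```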
SageMaker Debugger
What problem does it solve?
Training a model can go sideways in multiple ways. The model can overfit, activation functions can saturate, or gradients can vanish.
Why is it the solution?
Debugger provides tools to debug the training job and seeks to improve the performance of the model. It saves internal model state at periodic intervals.
You can create rules for detecting problems during training, and Debugger will log and create a CloudWatch event when the rule is hit.
Built-in rules can monitor for system bottlenecks, profile model framework operations and debug model parameters.
Debugger can send alerts when anomalies are found during training, can try to resolve those anomalies and identify the cause. Supported frameworks are TensorFlow, PyTorch, MXNet, XGBoost and the SageMaker generic estimator.
The typical steps are:
- add sagemaker_debugger to your training script
- configure a SageMaker training job with SageMaker Debugger (see the sketch after this list)
- start a training job
- receive alerts and take action
- explore deep analysis of the issues
- fix, consider suggestions from Debugger and repeat until your model reaches the target accuracy
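As an illustration of the configuration step, here is a minimal sketch that attaches built-in Debugger rules to a training job; the container image, role and S3 paths are assumptions:

```python
from sagemaker.estimator import Estimator
from sagemaker.debugger import Rule, rule_configs

# Built-in rules; Debugger logs and emits a CloudWatch event when one is triggered
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
]

estimator = Estimator(
    image_uri=training_image_uri,       # hypothetical training container image
    role=execution_role_arn,            # hypothetical IAM execution role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output/",
    rules=rules,                        # Debugger evaluates these while the job runs
)

estimator.fit({"train": "s3://my-bucket/train/"})
```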
There is a Debugger Insights Dashboard to see the ProfilerReport, hardware metrics (such as CPUBottlenecks and GPUMemoryIncreases) and framework metrics. The dashboard also allows you to profile system resource usage and training.
There are also APIs for Debugger on GitHub; using those APIs you can create your own hooks and rules. The library is called SMDebug, the client library for registering hooks that access your training data.
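As a rough sketch of registering a hook manually with SMDebug (shown for PyTorch; the output directory, model and loss_fn objects are assumptions, and newer framework containers can register hooks for you automatically):

```python
import smdebug.pytorch as smd

# Save tensors every 100 steps to an output directory (path assumed)
hook = smd.Hook(
    out_dir="/opt/ml/output/tensors",
    save_config=smd.SaveConfig(save_interval=100),
)

# model and loss_fn are hypothetical PyTorch objects from the training script
hook.register_module(model)   # capture layer outputs, weights and gradients
hook.register_loss(loss_fn)   # capture loss values
```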
Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series for SageMaker Built In Algorithms:
- 1/5 for Linear Learner, XGBoost, Seq-to-Seq and DeepAR here
- 2/5 for BlazingText, Object2Vec, Object Detection and Image Classification here
- 3/5 for Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA here
- 4/5 for KNN, K-Means, PCA and Factorization Machines here
- 5/5 for IP insights and reinforcement learning here
and high level machine learning services:
- 1/2 for Comprehend, Translate, Transcribe and Polly here
- 2/2 for Rekognition, Forecast, Lex, Personalize here
and this article on lesser known high level features for industrial or educational purposes
and for ML-OPs in AWS:
- 1/3 for SageMaker and Docker, Production Variants and SageMaker Neo here
- 2/3 for Instance Types, SageMaker and Kubernetes, SageMaker Projects, Inference Pipelines and Spot Training here
- 3/3 for Availability Zones, Serverless Inference, SageMaker Inference Recommender and Auto Scaling here
and this article on Security in AWS
Thanks for reading and happy studying!