AWS Certified Machine Learning Cheat Sheet — Built In Algorithms 1/5
This series has you covered on the built-in algorithms in SageMaker and reviews supervised, unsupervised and reinforcement learning! In this installment we’ll review Linear Learner, XGBoost, Seq-to-Seq and DeepAR.
Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.
Getting AWS certified can show employers your Machine Learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!
So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!
Want to know how I passed this exam? Check this guide out!
Full list of all installments:
- 1/5 for Linear Learner, XGBoost, Seq-to-Seq and DeepAR here
- 2/5 for BlazingText, Object2Vec, Object Detection and Image Classification here
- 3/5 for Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA here
- 4/5 for KNN, K-Means, PCA and Factorization Machines here
- 5/5 for IP Insights and reinforcement learning here
We’ll cover Linear Learner, XGBoost, Seq-to-Seq and DeepAR in this installment.
TL;DR
- Linear Learner is for both classification and regression tasks. It is a supervised learning technique. For best results normalize and shuffle.
- XGBoost is a gradient boosted tree algorithm for classification, regression and ranking. It is a supervised learning technique. The subsample and eta hyperparameters help prevent overfitting.
- Seq-to-Seq takes a sequence of tokens as input and outputs another sequence of tokens. It is a supervised learning technique. It uses RNNs and CNNs with attention as encoder-decoder architectures. It is used for machine translation, text summarization and speech to text.
- DeepAR Forecasting is an algorithm for forecasting scalar time series. It is a supervised learning technique. It uses RNNs. Some best practices: don't break up a time series or provide only part of it, avoid large values for prediction_length, set context_length and prediction_length to the same value, make sure the total number of observations across training time series is greater than 300, and set prediction_length to the number of time steps the model is set to predict. Finally, ARIMA or ETS might get more accurate results on a single time series.
Linear Learner
What is it?
An algorithm that provides a linear solution for both classification and regression. It maps a vector x to an approximation of the y label. For classification a linear threshold function is used.
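To make the idea concrete, here is a minimal sketch in plain Python of a linear score and a linear threshold function. The weights, bias and threshold value are made up for illustration; this is not how SageMaker implements Linear Learner internally.

```python
from typing import Sequence


def linear_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Return w . x + b, the linear approximation of the label."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def threshold_classify(score: float, threshold: float = 0.0) -> int:
    """Map a real-valued score to a binary label with a linear threshold."""
    return 1 if score > threshold else 0


# Hypothetical learned parameters and one observation.
weights, bias = [0.5, -0.25], 0.1
x = [2.0, 4.0]

score = linear_score(weights, bias, x)  # regression-style output
label = threshold_classify(score)       # classification-style output
```

For regression you would use the score directly; for binary classification the threshold turns it into 0 or 1.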
What type of learning?
Supervised
What problems can it solve?
Classification and Regression
What are inputs?
x and y, where x is a high-dimensional vector and y is a numeric label. The input is a matrix whose rows represent observations and whose columns represent feature dimensions, plus one additional column for the label
What are labels for binary classification?
0 and 1
What are labels for multiclass classification?
0 to num_classes - 1
What are labels for regression problems?
y is a real number
What does it optimize for continuous objectives?
Mean square error, cross entropy loss and absolute error
What does it optimize for discrete objectives?
F1, precision, recall and accuracy
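If you want a quick refresher on how these discrete metrics are computed, here is a small sketch in plain Python. The function name and the sample labels are my own, chosen just for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


metrics = binary_metrics(y_true=[1, 1, 0, 0, 1], y_pred=[1, 0, 0, 1, 1])
```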
What are the requirements?
Input and output locations, objective type, feature dimension
What are the input formats?
RecordIO-wrapped protobuf (float32 tensors only, more efficient) and CSV (the first column is assumed to be the label)
File and pipe mode are both supported. Pipe mode is more efficient with larger training sets.
How are the inferences scored?
For binary classification, the score is a single floating point number. For multiclass classification, the score is a list of one floating point number per class
What are the best practices?
Normalize (linear learner can do this automatically) and shuffle
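Here is a rough sketch of what normalizing and shuffling look like if you do them yourself before training. It uses only the standard library, and the helper names and toy data are assumptions for illustration.

```python
import random
import statistics


def normalize_columns(rows):
    """Scale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def shuffle_together(features, labels, seed=0):
    """Shuffle features and labels with the same permutation."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
labels = [0, 1, 1]

normalized = normalize_columns(features)
shuffled_x, shuffled_y = shuffle_together(normalized, labels)
```

Shuffling features and labels with one shared permutation keeps every observation paired with its own label.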
What are some hyperparameters?
For tuning multiclass models, you can set balance_multiclass_weights so each class has equal importance in the loss function
You can also adjust learning_rate, mini_batch_size, l1 and wd (weight decay, also known as L2 regularization)
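As a sketch, the hyperparameters above could be collected in a plain dictionary before being handed to a training job. The names are lowercase here (l1 and wd), which is how I understand Linear Learner expects them; every value is a made-up placeholder, not a recommendation.

```python
# Hypothetical values for a 3-class problem with 20 features.
linear_learner_hyperparameters = {
    "predictor_type": "multiclass_classifier",
    "num_classes": 3,
    "feature_dim": 20,
    "balance_multiclass_weights": "true",  # equal importance per class in the loss
    "learning_rate": 0.01,
    "mini_batch_size": 200,
    "l1": 0.001,   # L1 regularization
    "wd": 0.0001,  # weight decay (L2 regularization)
}

# Hyperparameters are commonly passed as strings, so convert them up front.
as_strings = {name: str(value) for name, value in linear_learner_hyperparameters.items()}
```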
What EC2 instance does it support?
Single- or multi-machine CPU and GPU instances
XGBoost
What is it?
An open source implementation of the gradient boosted trees algorithm. It predicts a target variable by combining an ensemble of estimates from a set of simpler and weaker models.
What type of learning?
Supervised
What problems can it solve?
Classification, Regression and Ranking
How can you use it?
As a built-in algorithm or as a framework. Using it in SageMaker gives you more flexibility, a smaller memory footprint, better logging and improved hyperparameter validation.
What are the advantages of using it as a framework?
Can run customized training scripts that can incorporate additional data processing
What are the advantages of using it as a built in algorithm?
Runs directly on input datasets
What are the input formats?
CSV (text/csv), libsvm (text/libsvm), Parquet (application/x-parquet) and RecordIO-protobuf (application/x-recordio-protobuf)
What is the training input?
For columnar input, it assumes the first column is the target/label. For CSV there should be no header. For libsvm, it assumes the columns after the label column contain zero-based index:value pairs for features.
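To see the difference between the two text layouts, here is a small sketch that writes the same row as a CSV line (label first, no header) and as a libsvm line (label followed by zero-based index:value pairs). The helper names are hypothetical.

```python
import csv
import io


def to_csv_line(label, features):
    """Format one row as CSV with the label in the first column."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow([label, *features])
    return buffer.getvalue()


def to_libsvm_line(label, features):
    """Format one row as 'label index:value ...' with zero-based indices.

    Zero-valued features are omitted because libsvm is a sparse format.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


row_label, row_features = 1, [0.5, 0.0, 3.0]
csv_line = to_csv_line(row_label, row_features)
libsvm_line = to_libsvm_line(row_label, row_features)
```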
What is the inference input?
For columnar input, it assumes there is no label column. For CSV there should be no header.
What is instance weight support?
It lets you differentiate the importance of labelled data points by assigning each instance a weight value
What are variations in output?
The tree_method hyperparameter determines the tree construction algorithm that XGBoost uses. The methods are approx, hist and gpu_hist (trains on a single GPU instance, more cost effective)
What are some hyperparameters?
- subsample prevents overfitting
- eta is step size shrinkage, prevents overfitting
- gamma is the minimum loss reduction required to make a further partition on a leaf node of the tree. The larger gamma is, the more conservative the algorithm is
- alpha is L1 regularization. The larger alpha is, the more conservative the model is
- lambda is L2 regularization. The larger lambda is, the more conservative the model is
- eval_metric sets the evaluation metric. Can set it to auc if you care about false positives. Can also set it to error or rmse
- scale_pos_weight adjusts the balance of positive/negative weights, good for unbalanced data. For best results set it to sum(negative cases) / sum(positive cases)
- max_depth sets the maximum depth of a tree. Too high of a value can cause overfitting
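Here is a minimal sketch of how you might compute the suggested scale_pos_weight from your labels and collect it with the other hyperparameters. Note that XGBoost spells the L2 term lambda. The objective, num_round and all numeric values are placeholders I picked for illustration.

```python
def suggested_scale_pos_weight(labels):
    """Return sum(negative cases) / sum(positive cases) for 0/1 labels."""
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("labels contain no positive cases")
    return negatives / positives


# Toy unbalanced dataset: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5

xgboost_hyperparameters = {
    "objective": "binary:logistic",
    "num_round": 100,
    "eval_metric": "auc",
    "eta": 0.2,          # step size shrinkage
    "subsample": 0.8,    # fraction of rows sampled per tree
    "gamma": 1.0,        # minimum loss reduction to split a leaf
    "alpha": 0.1,        # L1 regularization
    "lambda": 1.0,       # L2 regularization
    "max_depth": 5,
    "scale_pos_weight": suggested_scale_pos_weight(labels),
}
```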
What EC2 instance does it support?
GPU and CPU
Seq-to-Seq
What is it?
An algorithm that takes a sequence of tokens as input and outputs another sequence of tokens.
What type of learning?
Supervised: RNNs and CNNs with attention as encoder-decoder architectures
What problems can it solve?
Machine translation, text summarization and speech to text.
What are training inputs?
RecordIO-protobuf, with tokens expected as integers
What does the training job expect?
- training data: train.rec (must be tokenized)
- validation data: val.rec (must be tokenized)
- two vocab files: vocab.src.json and vocab.trg.json (the mapping between tokens and words)
Note: pre-trained models and public training sets are available
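To show what "tokenized as integers" means in practice, here is a small sketch that builds a vocabulary and encodes a sentence into integer ids. The special tokens, whitespace tokenization and helper names are my assumptions; the real preprocessing scripts may differ, and this does not produce the .rec files themselves.

```python
import json


def build_vocab(sentences, specials=("<pad>", "<unk>", "<s>", "</s>")):
    """Assign an integer id to every special token and every word seen."""
    vocab = {token: index for index, token in enumerate(specials)}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into integer ids, using <unk> for unseen words."""
    unknown = vocab["<unk>"]
    return [vocab.get(word, unknown) for word in sentence.split()]


source_vocab = build_vocab(["the cat sat", "the dog sat"])
vocab_json = json.dumps(source_vocab)  # roughly what a vocab JSON file could hold
token_ids = encode("the bird sat", source_vocab)
```

You would build one vocabulary for the source language and another for the target language.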
What are inference inputs?
- JSON (supports additional configuration such as {"attention_matrix": "true"}, recommended for small batches)
- RecordIO-protobuf (recommended for bulk inferences)
What does it optimize?
- accuracy if you have a validation data set
- BLEU score for machine translation
- perplexity for machine translation
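Perplexity is the easiest of these to compute by hand: it is the exponential of the average negative log-likelihood the model assigns to the correct tokens. Here is a minimal sketch, with made-up probabilities.

```python
import math


def perplexity(token_probabilities):
    """Return exp of the mean negative log-probability of the correct tokens."""
    negative_log_likelihood = -sum(math.log(p) for p in token_probabilities)
    return math.exp(negative_log_likelihood / len(token_probabilities))


# A model that gives each correct token probability 0.25 is as uncertain
# as a uniform choice between four tokens.
uniform_over_four = perplexity([0.25, 0.25, 0.25, 0.25])
perfect_model = perplexity([1.0, 1.0])
```

Lower is better: a perplexity of 1 means the model was certain about every token.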
What EC2 instance does it support?
GPU only. Training cannot be parallelized across multiple machines, but multiple GPUs on a single machine are supported
DeepAR Forecasting
What is it?
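An algorithm for forecasting scalar time series using recurrent neural networks. Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS), fit a separate model to each time series. DeepAR can train the same model over several related time series, and when the dataset contains many related series it outperforms standard ARIMA and ETS.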
What type of learning?
Supervised: RNNs
What problems can it solve?
Can train a model over all time series for different series groupings, such as different products, server loads, etc. Can generate forecasts for new time series that are similar to ones it was trained on. Can find frequencies and seasonalities.
What are training inputs?
Training and test datasets can be in JSON Lines, gzip-compressed JSON Lines or Parquet format (Parquet has better performance). The input can be a directory or a single file. Other input formats can be specified with content_type
What does the training job expect?
Each record in the input files should have the following fields (start and target are required):
- start in YYYY-MM-DD HH:MM:SS format
- target
- dynamic_feat (optional) array of arrays of custom feature time series, for example whether a promotion was applied to a product in the time series. Missing values are not supported in this feature
- cat (optional) array of categorical features that can encode the groups the record belongs to, the algorithm uses it to extract cardinality of the groups
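The fields above can be sketched as JSON Lines records, one time series per line. The timestamps, values and the check_record helper are made up for illustration.

```python
import json

# Two toy daily series; cat encodes a group, dynamic_feat flags a promotion.
records = [
    {
        "start": "2023-01-01 00:00:00",
        "target": [5.0, 7.0, 6.0, 8.0],
        "cat": [0],
        "dynamic_feat": [[0, 1, 0, 0]],
    },
    {
        "start": "2023-01-03 00:00:00",  # start time and length may differ
        "target": [1.0, 2.0, 4.0],
        "cat": [1],
        "dynamic_feat": [[0, 0, 1]],
    },
]


def check_record(record):
    """Raise ValueError if a dynamic feature does not line up with target."""
    for feature in record.get("dynamic_feat", []):
        if len(feature) != len(record["target"]):
            raise ValueError("dynamic_feat must have the same length as target")
    return record


json_lines = "\n".join(json.dumps(check_record(r)) for r in records)
```

Both records have one categorical feature and one dynamic feature, which is what the training guidelines require.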
What are training guidelines?
Start time and length of time series can differ, but all series must have the same:
- frequency
- number of categorical features
- number of dynamic features
Time series should appear in random order in the training file
If the model is trained with the cat feature, it must be included in inference
If cat is in the dataset but you don't want to use it, set cardinality to ""
If the dataset contains dynamic_feat, the algorithm uses it automatically. It should have the same length as target. If the model was trained with dynamic_feat, it must be included in inference
If dynamic_feat is in the dataset but you don't want to use it, set num_dynamic_feat to ""
What are evaluation metrics?
RMSE, and the accuracy of the forecast distribution using weighted quantile loss
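Both metrics are simple to compute by hand. Here is a sketch, assuming the usual definition of weighted quantile loss: for quantile tau, each error is weighted by (1 - tau) when the forecast is above the actual value and by tau otherwise, then twice the sum is divided by the sum of absolute actual values. The sample numbers are made up.

```python
import math


def rmse(actuals, forecasts):
    """Root mean square error between actual values and point forecasts."""
    squared = sum((f - y) ** 2 for y, f in zip(actuals, forecasts))
    return math.sqrt(squared / len(actuals))


def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """Weighted quantile loss for the tau-quantile forecasts."""
    total = 0.0
    for y, q in zip(actuals, quantile_forecasts):
        if q > y:
            total += (1 - tau) * (q - y)  # forecast above the actual value
        else:
            total += tau * (y - q)        # forecast at or below the actual value
    return 2 * total / sum(abs(y) for y in actuals)


actuals = [10.0, 20.0]
median_forecasts = [12.0, 18.0]

error = rmse(actuals, median_forecasts)
loss = weighted_quantile_loss(actuals, median_forecasts, tau=0.5)
```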
What are inference inputs?
JSON with two fields:
- instances, which includes one or more time series
- configuration, which includes parameters for generating the forecast
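A request body along these lines can be sketched as follows. The num_samples, output_types and quantiles fields are the configuration parameters as I remember them from the DeepAR docs, so double check them before relying on this; the series values are made up.

```python
import json

request = {
    # One or more time series to forecast from.
    "instances": [
        {"start": "2023-01-01 00:00:00", "target": [5.0, 7.0, 6.0, 8.0]},
    ],
    # Parameters that control how the forecast is generated.
    "configuration": {
        "num_samples": 50,
        "output_types": ["mean", "quantiles"],
        "quantiles": ["0.1", "0.5", "0.9"],
    },
}

payload = json.dumps(request)  # the body you would send to the endpoint
```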
What are best practices?
Don't break up a time series or provide only part of it. You can split the dataset for training and testing, but provide the entire time series for training and testing
Avoid large values for prediction_length
Set context_length (the number of points the model sees before making a prediction) to the same value as prediction_length
DeepAR works best if the total number of observations across training time series is greater than 300
Set prediction_length to the number of time steps the model is set to predict. You can use this field to determine what part of the data is for training and what part is for testing
ARIMA or ETS might get more accurate results on a single time series
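One common way to use prediction_length for the train/test split is to keep the full series for testing and drop the last prediction_length points for training. Here is a minimal sketch of that idea; the helper name and toy series are my own.

```python
def split_for_backtest(series, prediction_length):
    """Return (train, test): test is the full series, train drops the tail."""
    if len(series) <= prediction_length:
        raise ValueError("series must be longer than prediction_length")
    train = series[:-prediction_length]  # the model never sees the last points
    test = list(series)                  # evaluation uses the entire series
    return train, test


series = [float(v) for v in range(10)]
prediction_length = 3
context_length = prediction_length  # best practice above: keep them equal

train_target, test_target = split_for_backtest(series, prediction_length)
```

The model is then scored on how well it forecasts the last prediction_length points of each test series.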
What EC2 instance does it support?
GPU and CPU
Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series for SageMaker Features:
- 1/3 for Automatic Model Tuning, Apache Spark, SageMaker Studio and SageMaker Debugger here
- 2/3 for Autopilot, Model Monitor, Deployment Safeguards and Canvas here
- 3/3 for Training Compiler, Feature Store, Lineage Tracking and Data Wrangler here
and high level machine learning services:
- 1/2 for Comprehend, Translate, Transcribe and Polly here
- 2/2 for Rekognition, Forecast, Lex, Personalize here
and this article on lesser known high level features for industrial or educational purposes
and for MLOps in AWS:
- 1/3 for SageMaker and Docker, Production Variants and SageMaker Neo here
- 2/3 for Instance Types, SageMaker and Kubernetes, SageMaker Projects, Inference Pipelines and Spot Training here
- 3/3 for Availability Zones, Serverless Inference, SageMaker Inference Recommender and Auto Scaling here
and this article on Security in AWS
Thanks for reading and happy studying!