AWS Certified Machine Learning Cheat Sheet — Built In Algorithms 5/5

tanta base
5 min read · Nov 13, 2023

This marks the last installment of the built-in algorithms series, because all good things must come to an end. However, we are going to end with a bang with IP Insights and reinforcement learning.

Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.

Getting AWS certified can show employers your Machine Learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!

So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!

This series has you covered on the built-in algorithms in SageMaker and reviews supervised, unsupervised and reinforcement learning!

Want to know how I passed this exam? Check this guide out!

Full list of all installments:

  • 1/5: Linear Learner, XGBoost, Seq-to-Seq and DeepAR
  • 2/5: BlazingText, Object2Vec, Object Detection and Image Classification
  • 3/5: Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA
  • 4/5: KNN, K-Means, PCA and Factorization Machines
  • 5/5: IP Insights and reinforcement learning

We’ll cover IP Insights and reinforcement learning in this installment.

robot in classroom with blackboard behind it
Machine Learning is human learning too!

TL;DR

  • IP Insights learns usage patterns for IPv4 addresses. It is an unsupervised algorithm. You can use it to identify logins from anomalous IP addresses, to trigger a multi-factor authentication system, and to learn vector representations (embeddings) that measure similarity for clustering and visualization (this needs a large hash size). Use num_entity_vectors to set the hash size; set it to twice the number of unique entity identifiers. Use vector_dim to set the size of the embedding vectors; a value that is too large can cause overfitting.
  • Reinforcement Learning trains a model to make decisions under uncertainty and in dynamic environments. It learns through a continuous process of receiving rewards and punishments for every action taken. It is used for supply chain management, HVAC systems, robotics, game AI, dialog systems and autonomous vehicles.

IP Insights

What is it?

Learns usage patterns for IPv4 addresses. It is designed to capture associations between IPv4 addresses and other entities, like user IDs and account numbers.

It takes historical data as input, learns the usage pattern of each entity, and returns a score that indicates how anomalous an event is.

Uses a neural network to learn latent vector representations of entities and IP addresses.

Can generate anomalous samples by randomly pairing entities and IP addresses, which compensates for a dataset that contains few or no anomalous examples.
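To make the random-pairing idea concrete, here is a minimal sketch in plain Python. It is an illustration only, not the algorithm's internal implementation: the function name, the sample data and the retry cap are all assumptions made for this example.

```python
import random

def make_negative_samples(pairs, num_samples, seed=0):
    """Build unseen (entity, ip) pairs by randomly re-pairing observed values."""
    rng = random.Random(seed)
    observed = set(pairs)
    entities = sorted({entity for entity, _ in pairs})
    ips = sorted({ip for _, ip in pairs})
    negatives = []
    # Cap the attempts so the loop ends even when few unseen pairings exist.
    for _ in range(num_samples * 20):
        if len(negatives) == num_samples:
            break
        candidate = (rng.choice(entities), rng.choice(ips))
        if candidate not in observed:
            negatives.append(candidate)
    return negatives

# Hypothetical login history: every pair here was actually observed.
history = [
    ("user_a", "192.0.2.10"),
    ("user_b", "198.51.100.7"),
    ("user_c", "203.0.113.42"),
]
negatives = make_negative_samples(history, num_samples=4)
```

Each generated pair combines a real entity with a real IP address that was never seen together, which is what makes it a plausible "anomalous" example.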

What type of learning?

Unsupervised

What problems can it solve?

Can be used to identify logins from anomalous IP addresses.

Can be used to trigger a multi-factor authentication system.

Can also be used for downstream tasks: it learns vector representations (embeddings) that measure similarity for clustering and visualization. This needs a large hash size.
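As a rough sketch of the multi-factor authentication use case, the snippet below turns a score into a yes/no decision. The score values, the threshold and the function name are hypothetical. Which direction of the score means "anomalous" is also an assumption here, so check it against your model's actual output before relying on it.

```python
def requires_mfa(score, threshold, higher_is_normal=True):
    """Return True when a login's score falls on the anomalous side of the threshold.

    Assumption: by default a higher score means the (entity, IP) pair looks
    more usual. Flip higher_is_normal if your scores run the other way.
    """
    if higher_is_normal:
        return score < threshold
    return score > threshold

# Hypothetical scores for two login events.
login_scores = {"usual_ip": 3.2, "never_seen_ip": -1.7}
decisions = {
    name: requires_mfa(score, threshold=0.0)
    for name, score in login_scores.items()
}
```

In practice the threshold would be tuned on held-out data so that the number of extra MFA prompts stays acceptable.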

What are training inputs?

Supports a training channel and an optional validation channel. The validation channel is used to compute an AUC score.

CSV format only: the first column is a string that identifies the entity, and the second column is an IPv4 address in decimal-dot notation. Only File mode is supported.
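Here is a small sketch of preparing data in that two-column layout, using only the Python standard library. The helper name and sample events are made up for illustration, and writing the file without a header row is an assumption.

```python
import csv
import io
import ipaddress

def write_training_csv(events, out):
    """Write (entity_id, ipv4) rows, skipping rows whose address is not valid IPv4."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for entity_id, ip in events:
        try:
            ipaddress.IPv4Address(ip)  # raises if not a decimal-dot IPv4 address
        except ipaddress.AddressValueError:
            continue
        writer.writerow([entity_id, ip])
        written += 1
    return written

events = [
    ("user_a", "192.0.2.10"),
    ("user_b", "198.51.100.7"),
    ("user_c", "2001:db8::1"),  # IPv6, skipped
    ("user_d", "300.1.1.1"),    # octet out of range, skipped
]
buffer = io.StringIO()
rows_written = write_training_csv(events, buffer)
csv_text = buffer.getvalue()
```

Filtering out IPv6 and malformed addresses up front keeps the training file consistent with the IPv4-only input the algorithm expects.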

What are some hyperparameters?

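num_entity_vectors sets the hash size (the number of entity embedding vectors to train). Set it to twice the number of unique entity identifiers.

vector_dim sets the size of the embedding vectors. A value that is too large can cause overfitting.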
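The "twice the unique entities" rule of thumb is easy to compute from your data. The sketch below does that. The helper name is hypothetical, and the vector_dim value of 128 is only an assumed starting point for this example, not a tuned recommendation.

```python
def suggest_hyperparameters(entity_ids, vector_dim=128):
    """Apply the 2x-unique-entities rule of thumb for num_entity_vectors."""
    unique_entities = len(set(entity_ids))
    return {
        "num_entity_vectors": 2 * unique_entities,
        "vector_dim": vector_dim,  # assumed starting value; tune to avoid overfitting
    }

# Hypothetical entity column from a training file (3 unique users).
entity_ids = ["user_a", "user_b", "user_a", "user_c"]
hyperparameters = suggest_hyperparameters(entity_ids)
```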

What EC2 instance does it support?

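Can use both GPU and CPU instances. GPU instances are recommended for training, ml.p3.2xlarge or higher, and training can use one or more GPUs. Distributed CPU instances can be used for large training datasets. The CPU instance size needed depends on the model size, which is determined by vector_dim and num_entity_vectors.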

IP Insights supports P2, P3, G4dn, and G5 GPU families.
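To get a feel for how vector_dim and num_entity_vectors drive memory needs, here is a back-of-the-envelope sketch. It counts only an entity embedding table stored as 4-byte floats, which is an assumption. The real model has other parameters too, so treat the result as a rough lower bound and not an official sizing formula.

```python
def entity_table_megabytes(num_entity_vectors, vector_dim, bytes_per_value=4):
    """Rough size of the entity embedding table alone, in megabytes (10^6 bytes)."""
    return num_entity_vectors * vector_dim * bytes_per_value / 1_000_000

# Hypothetical setup: 1 million unique entities, so 2 million entity vectors.
estimate_mb = entity_table_megabytes(num_entity_vectors=2_000_000, vector_dim=128)
```

Doubling either hyperparameter doubles this estimate, which is why both matter when picking an instance size.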

Reinforcement Learning

What is it?

Learns a policy that maximizes a numerical reward signal. It is an interactive problem where learning happens through experience: the agent learns by a continuous process of receiving rewards and punishments for every action it takes. It can be trained to make decisions under uncertainty and in dynamic environments.
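A tiny sketch can show what learning from rewards and punishments looks like. The toy setup below is entirely hypothetical: there are two actions, one always rewarded (+1) and one always punished (-1), and the agent keeps a running average of the reward seen for each action while mostly choosing the best one (epsilon-greedy). It is a simplified illustration, not one of the algorithms SageMaker ships.

```python
import random

def reward_for(action):
    """Hypothetical environment: action 1 is rewarded, action 0 is punished."""
    return 1.0 if action == 1 else -1.0

def learn_action_values(steps=50, epsilon=0.1, seed=0):
    """Estimate each action's value from experienced rewards (epsilon-greedy)."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # running average reward per action
    counts = [0, 0]      # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)           # explore: random action
        else:
            action = values.index(max(values))  # exploit: best estimate so far
        reward = reward_for(action)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values

values = learn_action_values()
```

After a handful of steps the estimate for the rewarded action is higher, so the agent keeps picking it.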

Training usually consists of many episodes. An episode consists of all the time steps in a Markov decision process (MDP), from the initial state until the terminal state.

An MDP is a framework for modeling decision making, also described as a discrete-time stochastic control process.

An RL problem based on an MDP has the following components:

  • Environment: The space where the model operates; it can be the real world or a simulator. Example: a chess board
  • State: Information about the environment and past steps that is relevant to the future, such as the position of a robot at the current time step. Example: where the game pieces are
  • Reward: A number that represents the value of the state that resulted from the last action the agent took. The model looks for the strategy that optimizes the cumulative reward over time; this strategy is called a policy. Example: gaining a point for taking an opponent's game piece
  • Observation: Information about the state of the environment that is available to the agent at each step. Example: the surroundings of the game board
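The components above can be sketched as a toy environment and one episode. Everything here is a made-up example: a 5-cell corridor where the agent starts at cell 0 and the episode ends at the last cell. The reset/step shape is a common convention and is used here as an assumption, not a specific toolkit's API.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, terminal state at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0  # the state

    def reset(self):
        self.position = 0
        return self.position  # here the observation is the full state

    def step(self, action):
        """action is -1 (left) or +1 (right). Returns (observation, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only for reaching the goal
        return self.position, reward, done

def run_episode(env, policy, max_steps=100):
    """Run one episode: all time steps from the initial state to the terminal state."""
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps

def always_right(observation):
    """A fixed policy: always move right, whatever the observation."""
    return +1

total_reward, steps = run_episode(CorridorEnv(), always_right)
```

Training would repeat run_episode many times while improving the policy so that the cumulative reward goes up.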

What problems can it solve?

Supply chain management, HVAC systems, robotics, game AI, dialog systems and autonomous vehicles.

What are the Key Features in SageMaker?

To train in SageMaker, use:

  • A deep learning framework. SageMaker supports Apache MXNet and TensorFlow
  • An RL toolkit, which manages the interactions between the model and the environment and provides a selection of RL algorithms. SageMaker supports Intel Coach and Ray RLlib
  • An RL environment, which can be custom, open source or commercial. Some options are MATLAB, Simulink, EnergyPlus, RoboSchool, PyBullet, Amazon Sumerian and AWS RoboMaker

Training and/or environment rollouts can be distributed across multiple cores and multiple instances.

GPUs are helpful, and training supports multiple instances and multiple cores.

Machine learning can seem like a lot, but you got this!

Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series on SageMaker features, plus articles on high-level machine learning services, lesser-known high-level features for industrial or educational purposes, MLOps in AWS and security in AWS.

You made it to 5/5! Thanks for reading and happy studying!

tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots and bioinformatics.