AWS Certified Machine Learning Cheat Sheet — High Level Machine Learning Services 2/2

tanta base
10 min read · Nov 20, 2023


This is the second and last installment covering the high-level machine learning services that Amazon provides. These high-level services can be used out of the box if you don’t have time to train your own model, or if you are a non-engineer who wants a machine learning system in your workflow or enterprise. Rekognition, Forecast, Lex, and Personalize are reviewed in this installment.

If you want more AWS cheat sheets, scroll down for a list of the other series!

Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.

Getting AWS certified can show employers your machine learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!

So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!

Want to know how I passed this exam? Check this guide out!

This series has you covered on the high-level machine learning services that are fully managed and cloud-based. These services can be used without any machine learning expertise.

The installments in this series are:

[Image: a robot sitting at a desk, looking at a monitor and typing on a keyboard. Caption: Machine learning is also human learning!]

Rekognition

What is it?

A service that adds visual analysis to your applications. With Rekognition, you don’t need any deep learning or machine learning expertise; it comes pre-trained and fully managed. With Rekognition you can build applications that search, verify, and organize images. You can also use custom labels by training Rekognition on a small set of labeled images.

Rekognition integrates with Augmented AI to route low-confidence predictions to a human reviewer. You can also use your own face collections. Some use cases are:

  • computer vision
  • object or scene detection
  • image moderation
  • facial analysis
  • celebrity recognition
  • face comparisons
  • text within an image
  • video timeline marking and people pathing

Rekognition Video is a video analysis service that detects activities, tracks the movement of people in frame, and recognizes objects, celebrities, and inappropriate content. You can analyze live video streams or videos stored in S3. Rekognition Video also lets you index metadata like objects, activities, scenes, landmarks, celebrities, and faces, which makes video search easy. Amazon Rekognition Video can use Amazon Kinesis Video Streams to receive and process a video stream; the video must be H.264 encoded, 5–30 FPS, and you should favor resolution over frame rate.
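
As a rough illustration, here is a minimal boto3 sketch of wiring Rekognition Video to a Kinesis video stream with a stream processor. The ARNs, role, collection name, and processor name below are placeholders, not values from the article:

```python
import boto3

rekognition = boto3.client("rekognition")

# Placeholder ARNs -- replace with your own resources.
KVS_ARN = "arn:aws:kinesisvideo:us-east-1:123456789012:stream/my-camera/1234567890"
KDS_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/rekognition-results"
ROLE_ARN = "arn:aws:iam::123456789012:role/RekognitionStreamRole"

# Create a stream processor that searches faces in the live stream
# against an existing face collection.
rekognition.create_stream_processor(
    Name="my-stream-processor",
    Input={"KinesisVideoStream": {"Arn": KVS_ARN}},
    Output={"KinesisDataStream": {"Arn": KDS_ARN}},
    RoleArn=ROLE_ARN,
    Settings={"FaceSearch": {"CollectionId": "my-face-collection",
                             "FaceMatchThreshold": 85.0}},
)

# Start processing the live video; results land in the Kinesis data stream.
rekognition.start_stream_processor(Name="my-stream-processor")
```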

Rekognition Image is an image analysis API that can detect objects, scenes, activities, landmarks, faces, dominant colors, and image quality. In addition, it extracts text, recognizes celebrities, identifies inappropriate content, and lets you compare faces. You can pair it with Lambda to trigger image analysis after an upload.
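
As a quick sketch, calling Rekognition Image on an object already in S3 looks like this (the bucket and key names are placeholders); a Lambda function triggered by an S3 upload event could run essentially the same call:

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect labels (objects, scenes, activities) in an image stored in S3.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-photo-bucket", "Name": "uploads/photo.jpg"}},
    MaxLabels=10,        # return at most 10 labels
    MinConfidence=75.0,  # drop predictions below 75% confidence
)

for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')
```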

What are best practices?

Rekognition works best with consumer and professional videos taken with a frontal field of view, in normal color and lighting conditions, and with good visibility of the eyes. Rekognition was not tested on black-and-white, IR, or extreme lighting conditions. Custom labels cannot be used on faces. It supports image sizes up to 15 MB when passed as an S3 object (S3 is generally faster) and up to 5 MB when submitted as an image byte array. It supports videos up to 10 GB in size and up to 6 hours in length when passed as an S3 object.

AWS recommends using VGA (640x480) resolution or higher; going below QVGA (320x240) may cause Rekognition to miss objects. AWS also recommends that the smallest object or face present in the image be at least 5% of the smaller image dimension. The system is trained to recognize faces larger than 32 pixels. Resolution, blur, fast-moving subjects, lighting, and pose can all affect prediction quality.

Forecast

What is it?

Forecast generates accurate time-series forecasts with machine learning. It is a fully managed service and no machine learning expertise is needed. Forecast can work with virtually any historical time-series data and can combine it with associated data to find relationships. There is an AutoML option to choose the best model for your time-series data. It integrates with AWS CloudFormation and AWS Step Functions so you can quickly deploy an end-to-end workflow.

It can track the accuracy of your model over time as new data is added. Forecast can create an explainability report, which provides insights that help you better manage business operations. You can also include local weather information with the Weather Index. By default, Forecast generates forecasts at three quantiles: 10%, 50%, and 90%. You can subset your data to create predictions with Forecast. Forecast provides six different accuracy metrics.
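
As a rough sketch, training a predictor with the AutoML option and the three default quantiles might look like this with boto3 (the dataset group ARN, names, and horizon are placeholders):

```python
import boto3

forecast = boto3.client("forecast")

# Train a predictor; PerformAutoML lets Forecast pick the best algorithm.
forecast.create_predictor(
    PredictorName="demand_predictor",
    ForecastHorizon=14,                      # predict 14 future periods
    PerformAutoML=True,                      # let Forecast choose the algorithm
    ForecastTypes=["0.10", "0.50", "0.90"],  # the three default quantiles
    InputDataConfig={
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/retail_demo"
    },
    FeaturizationConfig={"ForecastFrequency": "D"},  # daily data
)
```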

Some Forecast algorithms are:

  • CNN-QR: best suited for large datasets with hundreds of time series; the only model that accepts related historical time series data and item metadata.
  • DeepAR+: uses RNNs; best suited for large datasets with hundreds of time series; accepts related forward-looking time series and item metadata.
  • Prophet: an additive model that can handle non-linear trends and seasonality.
  • NPTS: non-parametric time series; good for sparse data; has variants for seasonal and climatological forecasts.
  • ARIMA: good for simple datasets with fewer than 100 time series.
  • ETS: good for simple datasets with fewer than 100 time series.
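
Once a predictor has finished training, you generate a forecast from it and query individual time series. A minimal sketch, with placeholder ARNs and item ID:

```python
import boto3

forecast = boto3.client("forecast")
forecast_query = boto3.client("forecastquery")

PREDICTOR_ARN = "arn:aws:forecast:us-east-1:123456789012:predictor/demand_predictor"

# Generate forecasts for every item covered by the predictor.
response = forecast.create_forecast(
    ForecastName="demand_forecast",
    PredictorArn=PREDICTOR_ARN,
)
forecast_arn = response["ForecastArn"]

# Query the forecast for a single item; returns the requested quantiles.
result = forecast_query.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id": "sku-123"},
)
print(result["Forecast"]["Predictions"].keys())  # e.g. p10, p50, p90
```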

Lex

What is it?

Lex is powered by the same technology as Alexa. It is a fully managed service that handles speech recognition and natural language understanding, and it uses deep learning to improve over time. You can use Lex to build a chatbot that can be deployed to chat platforms, mobile clients, and IoT devices through integrations such as the AWS Mobile SDK, Facebook Messenger, Slack, and Twilio.

Amazon Lex can be integrated with AWS Lambda for intent fulfillment, Amazon Cognito for user authentication, and Amazon Polly for text-to-speech. Lambda can also be used to validate user input via a code hook. Lex takes speech or text and learns the intent behind it. You can monitor your bot to understand how users are interacting with it. There are no bandwidth constraints with Lex.
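
For example, a minimal sketch of sending text to a published bot through the classic Lex runtime API (bot name, alias, and user ID below are placeholders; Lex V2 uses the lexv2-runtime client and recognize_text instead):

```python
import boto3

lex = boto3.client("lex-runtime")

# Send a user's text to the bot and read back the detected intent and reply.
response = lex.post_text(
    botName="OrderFlowersBot",  # placeholder bot name
    botAlias="prod",            # placeholder alias
    userId="user-123",          # any unique ID for this conversation
    inputText="I would like to order a dozen roses",
)

print(response["intentName"])  # the intent Lex matched
print(response["slots"])       # slot values gathered so far
print(response["message"])     # the bot's next prompt or confirmation
```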

Some vocabulary:

  • Intents perform an action in response to natural language input from the user
  • Utterances are spoken or typed phrases that invoke an intent
  • Slots are inputs needed to fulfill the intent
  • Prompts are used to elicit a value for a slot
  • Fulfillment is the mechanism that carries out the intent

To create a bot, you define the actions the bot can take; these actions are known as intents, and for each intent you add utterances and slots. Finally, you add the business logic that executes the action.
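
That business logic is typically a Lambda function. Here is a minimal sketch of a fulfillment code hook using the classic (V1) Lex Lambda event format, with a hypothetical OrderFlowers intent and FlowerType slot:

```python
# Lambda handler acting as the fulfillment code hook for a Lex bot.
# The event/response shapes follow the classic (V1) Lex Lambda format.
def lambda_handler(event, context):
    intent = event["currentIntent"]["name"]  # e.g. "OrderFlowers"
    slots = event["currentIntent"]["slots"]  # slot values gathered by Lex
    flower = slots.get("FlowerType") or "flowers"

    # ... place the order in your backend here ...

    # Tell Lex the intent is fulfilled and what to say back to the user.
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": f"Your order for {flower} has been placed.",
            },
        }
    }
```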

You can use the Automated Chatbot Designer to automate your chatbot design. You provide transcripts and Lex creates a chatbot from them by applying NLP and deep learning, removing overlaps and ambiguity. It can extract intents, user requests, phrases, and values for slots. It ensures intents are well defined and separated, providing a better user experience. You can integrate this with Amazon Connect to automatically feed transcripts into the bot designer.

Personalize

What is it?

A fully managed, cloud-based, and scalable machine learning service that uses data you provide to build a product and content recommendation system. A personalized recommendation engine can increase engagement, customer satisfaction, loyalty, and sales, which can lead to higher revenue and profitability. It is an effective solution for companies with large or expanding user bases. Personalize automates the process of creating tailored suggestions and can have a recommendation model ready in days.

You can stream real-time data or import historical data into Personalize. In addition, Personalize lets you surface items that have no previous user interactions, which helps users discover new products and items (set the “new item exploration weight” when creating the model). Personalize can integrate with OpenSearch to personalize search results. You can provide an explicit schema in Avro format. The two main runtime APIs are GetRecommendations and GetPersonalizedRanking.
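
A minimal sketch of calling both runtime APIs with boto3 (the campaign ARNs, user ID, and item list are placeholders; each API is served by a campaign trained with the matching recipe):

```python
import boto3

runtime = boto3.client("personalize-runtime")

# Placeholder campaign ARNs: one from a user-personalization recipe,
# one from a personalized-ranking recipe.
RECS_CAMPAIGN_ARN = "arn:aws:personalize:us-east-1:123456789012:campaign/recs-demo"
RANK_CAMPAIGN_ARN = "arn:aws:personalize:us-east-1:123456789012:campaign/ranking-demo"

# GetRecommendations: top-N items for a user.
recs = runtime.get_recommendations(
    campaignArn=RECS_CAMPAIGN_ARN,
    userId="user-123",
    numResults=10,
)
print([item["itemId"] for item in recs["itemList"]])

# GetPersonalizedRanking: re-rank a candidate list for the same user.
ranked = runtime.get_personalized_ranking(
    campaignArn=RANK_CAMPAIGN_ARN,
    userId="user-123",
    inputList=["item-1", "item-2", "item-3"],
)
print([item["itemId"] for item in ranked["personalizedRanking"]])
```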

You can also measure the business outcomes of Personalize through events sent to the system, and you can visualize and evaluate them to develop a data-driven personalization strategy (define a “metric attribution” for the metrics you want to evaluate and report on).
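
A minimal sketch of streaming a single real-time interaction event (the tracking ID, user ID, and item ID are placeholders; the tracking ID comes from creating an event tracker for your dataset group):

```python
from datetime import datetime

import boto3

events = boto3.client("personalize-events")

# Record one click event against an event tracker.
events.put_events(
    trackingId="11111111-2222-3333-4444-555555555555",  # placeholder tracking ID
    userId="user-123",
    sessionId="session-abc",
    eventList=[
        {
            "eventType": "click",       # matches the event type used in training
            "itemId": "item-42",
            "sentAt": datetime.now(),   # timestamp of the interaction
        }
    ],
)
```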

Some use cases:

  • Recommendations tailored to the user’s profile, behavior, preferences and history boost customer engagement and satisfaction and can drive higher conversion rates. For example, add multiple types of personalized video recommendations to your streaming app or add product recommendations to a retail app
  • Personalized ranking to surface relevant items or content to a specific user
  • Recommend similar items to encourage exploration, upsell and cross-sell
  • Create personalized emails or targeted marketing campaigns using customer segmentation

To get started, you provide this data:

  • Data about your users: age, location, device type
  • Data about your items: genre, price
  • Interactions between users and items: clicks, purchases
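
For example, the minimal interactions dataset needs USER_ID, ITEM_ID, and TIMESTAMP columns. A sketch of registering that schema with boto3 (the schema name is a placeholder):

```python
import json

import boto3

personalize = boto3.client("personalize")

# Minimal Avro schema for the interactions dataset:
# who (USER_ID) interacted with what (ITEM_ID) and when (TIMESTAMP).
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

personalize.create_schema(
    name="demo-interactions-schema",
    schema=json.dumps(interactions_schema),
)
```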

Personalize has a three-step process, available in the AWS Management Console or as a set of API calls:

  • Point Amazon Personalize to your data in an S3 bucket, upload your data in an API call, or use SageMaker Data Wrangler to prep and import your data. You can optionally add datasets that contain additional information about your catalog and customer base. For better performance, you can also provide event type, event value, contextual metadata, and item and user metadata. Personalize can also analyze the data you provide and offer suggestions to improve your data preparation.
  • Train a custom and private recommendation model on your data, either by letting the service choose the algorithm with AutoML or by manually choosing one of the several algorithms available.
  • Then the model can be deployed with a single API call and used in production applications to get real-time recommendations (see the sketch after this list).
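
A rough sketch of the train-and-deploy steps with boto3 (the dataset group ARN, names, and the user-personalization recipe choice are placeholders/assumptions; in a real workflow you would wait for each resource to reach ACTIVE status before the next call):

```python
import boto3

personalize = boto3.client("personalize")

DATASET_GROUP_ARN = "arn:aws:personalize:us-east-1:123456789012:dataset-group/demo"

# Train a solution (model) with an explicitly chosen recipe.
solution = personalize.create_solution(
    name="demo-solution",
    datasetGroupArn=DATASET_GROUP_ARN,
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)

# Train a concrete version of that solution.
version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"],
)

# Deploy the trained version behind a real-time endpoint (campaign).
personalize.create_campaign(
    name="demo-campaign",
    solutionVersionArn=version["solutionVersionArn"],
    minProvisionedTPS=1,
)
```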

You can use these metrics to evaluate Personalize’s recommendations:

  • A/B testing — considered the best measure of the model’s impact on business metrics. If you do not have A/B testing in place, you can use Amazon CloudWatch Evidently
  • Offline metrics — calculated by splitting the dataset into training and testing sets; they let you view the effects of modifying the hyperparameters and algorithms used to train your model against historical data. Use these metrics to get a directional sense of the quality of one solution version against others
  • Online metrics — empirical results observed in your users’ interactions with real-time recommendations served in a live environment. Running the recommendation system for a few weeks before testing is recommended

Top features of Personalize:

  • User segmentation — segment users for targeted messages
  • Domain optimized recommenders — pre-built recommenders for common business use cases
  • New item recommendations — create recommendations for new products and content when user preference is sparse (also known as cold-start problem)
  • Real-time or batch recommendations
  • Personalized search — surface relevant search results for the user
  • Unstructured text support — NLP and attention-based modeling to automatically extract key information, e.g. gather information from product descriptions, reviews, movie synopses, or other unstructured text
  • Contextual recommendations — generate recommendations with a context such as user segment, device type, location, time of day.
  • Business rules — use filters and promotions that control the percentage of promoted content for each user, can filter out items the user has already bought, can filter in premium content, or filter in a percentage of some category
  • Trending recommendations — recommend items that are gaining popularity
  • Recommendation impact — measure total business impact of any events, such as page view, video start, click, etc.

Some hyperparameters:

  • hidden_dimension (HPO): The number of hidden variables used in the model. Hidden variables recreate users’ purchase history and item statistics to generate ranking scores.
  • bptt (backpropagation through time): Determines whether to use the back-propagation through time technique, which updates weights in recurrent neural network-based algorithms. Use bptt for long-term credit assignment, i.e. to connect delayed rewards to early events.
  • recency_mask: Determines whether the model should consider the latest popularity trends in the Interactions dataset. Latest popularity trends might include sudden changes in the underlying patterns of interaction events.
  • min_user_history_length_percentile: The minimum percentile of user history lengths to include in model training. History length is the total amount of data about a user. Use min_user_history_length_percentile to exclude a percentage of users with short history lengths.
  • max_user_history_length_percentile: The maximum percentile of user history lengths to include in model training. History length is the total amount of data about a user. Use max_user_history_length_percentile to exclude a percentage of users with long history lengths because data for these users tend to contain noise.
  • exploration_weight: Determines how frequently recommendations include items with less interactions data or relevance. The closer the value is to 1.0, the more exploration. At zero, no exploration occurs and recommendations are based on current data (relevance).
  • exploration_item_age_cut_off: Specify the maximum item age in days since the latest interaction across all items in the Interactions dataset. This defines the scope of item exploration based on item age. Amazon Personalize determines an item’s age based on its creation timestamp or, if creation timestamp data is missing, interactions data.
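
A minimal sketch of how such hyperparameters might be passed when creating a solution, assuming the user-personalization recipe (the ARNs and values shown are arbitrary placeholders):

```python
import boto3

personalize = boto3.client("personalize")

personalize.create_solution(
    name="tuned-solution",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
    solutionConfig={
        # Hyperparameter values are passed as strings; these are placeholders.
        "algorithmHyperParameters": {
            "hidden_dimension": "100",
            "bptt": "32",
            "recency_mask": "true",
        },
    },
)
```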

Want more AWS Machine Learning Cheat Sheets? Well, I got you covered! Check out this series for SageMaker Built In Algorithms:

  • 1/5 for Linear Learner, XGBoost, Seq-to-Seq and DeepAR here
  • 2/5 for BlazingText, Object2Vec, Object Detection and Image Classification here
  • 3/5 for Semantic Segmentation, Random Cut Forest, Neural Topic Model and LDA here
  • 4/5 for KNN, K-Means, PCA and Factorization Machines here
  • 5/5 for IP insights and reinforcement learning here

and this installment for SageMaker Features:

and this article on lesser known high level features for industrial or educational purposes

and for ML-OPs in AWS:

and this article on Security in AWS

Thanks for reading and happy studying!


tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots and bioinformatics.