AWS Certified Machine Learning Cheat Sheet — Security

tanta base
5 min read · Nov 24, 2023

Data demands our protection, and it’s up to us to advocate for data privacy. AWS is proud of its security architecture and is touted as one of the most secure global cloud infrastructures. In this article we’ll review SageMaker and security. Any one of these topics could be its own separate article; this is just a brief overview with some important highlights.

Machine Learning certifications are all the rage now and AWS is one of the top cloud platforms.

Getting AWS certified can show employers your Machine Learning and cloud computing knowledge. AWS certifications can also give you lifetime bragging rights!

So, whether you want a resume builder or just to consolidate your knowledge, the AWS Certified Machine Learning Exam is a great start!

Want to know how I passed this exam? Check this guide out!

Robot holding a lock
Have a secure environment before your machine learns!

SageMaker Security

Overview

Tenets:

  • Customer data is not retained
  • Encryption at rest and in transit
  • Compute and network isolation
  • Secure, fully-managed infrastructure
  • Auditability
  • Compliance
  • Least privilege

Common Isolation Architecture:

  • Secure, managed VPC only accessible to control plane
  • Each job or notebook instance gets its own fresh EC2 instance; EC2 instances are never reused
  • Network isolation with crafted security group
  • VPC endpoint for accessing Amazon S3, so data is never intercepted and never crosses the internet
  • Optional internet access
  • VPC support
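
To make the isolation options above a bit more concrete, here is a minimal sketch of a training job request that runs in your own VPC with network isolation turned on. The function names, instance type, volume size and runtime limit are my own placeholder assumptions, not values from AWS; the field names follow the boto3 `create_training_job` request shape.

```python
from typing import Any, Dict, List


def build_isolated_training_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    output_s3_uri: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Placeholder sizing; pick values that fit your workload.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Attach the job to subnets and security groups in your own VPC.
        "VpcConfig": {
            "Subnets": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        },
        # Prevent the training container from making outbound network calls.
        "EnableNetworkIsolation": True,
    }


def submit_training_job(request: Dict[str, Any]) -> None:
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("sagemaker").create_training_job(**request)
```

With a VPC configured, S3 traffic for the job goes through the S3 VPC endpoint you set up in that VPC rather than over the public internet.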

SageMaker Notebook

The notebook is web based, so data can stay in the cloud and remain subject to your organization’s security policies. Notebooks are usually launched from the SageMaker console, but you can also build your own portal or get a URL programmatically. Communication is encrypted with SSL/TLS, and the notebook can be accessed like any other web app; no tunneling or VPN is required. Only users with the right permissions can access a notebook. You can also access your notebook through AWS PrivateLink, which can restrict access based on AWS Direct Connect or a custom VPC.
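
Getting a notebook URL programmatically could look roughly like the sketch below. The helper name and the optional `client` argument (handy for testing without AWS) are my own assumptions; the call itself is boto3's `create_presigned_notebook_instance_url`, and the caller still needs IAM permission for it.

```python
from typing import Any, Optional


def get_notebook_url(
    notebook_name: str,
    expires_in_seconds: int = 1800,
    client: Optional[Any] = None,
) -> str:
    """Return a time-limited URL that opens the given notebook instance."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials at call time

        client = boto3.client("sagemaker")
    response = client.create_presigned_notebook_instance_url(
        NotebookInstanceName=notebook_name,
        SessionExpirationDurationInSeconds=expires_in_seconds,
    )
    return response["AuthorizedUrl"]
```

A custom portal could call this on the server side and redirect an already authenticated user to the returned URL.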

Notebooks can be created without internet access: you can control or prohibit internet access when the notebook is created. Notebooks can also route internet access through a custom VPC. You can configure notebooks with lifecycle hooks to set up the way they work. You can also connect the notebook to a Git repo for code sharing, if specified. GitHub, AWS CodeCommit, and private repos are supported via a VPC.
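
As a rough illustration of those options, the sketch below builds a request for a notebook instance with direct internet access disabled, plus an optional lifecycle configuration and default Git repository. The helper name and instance type are placeholder assumptions; the keys follow the boto3 `create_notebook_instance` request shape.

```python
from typing import Any, Dict, List, Optional


def build_private_notebook_request(
    name: str,
    role_arn: str,
    subnet_id: str,
    security_group_ids: List[str],
    lifecycle_config_name: Optional[str] = None,
    code_repository: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateNotebookInstance API."""
    request: Dict[str, Any] = {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",  # placeholder size
        "RoleArn": role_arn,
        # A subnet is needed once direct internet access is disabled;
        # any outbound traffic then follows the routing of this VPC.
        "SubnetId": subnet_id,
        "SecurityGroupIds": security_group_ids,
        "DirectInternetAccess": "Disabled",
    }
    if lifecycle_config_name is not None:
        request["LifecycleConfigName"] = lifecycle_config_name
    if code_repository is not None:
        request["DefaultCodeRepository"] = code_repository
    return request
```

The resulting dictionary can be passed as `boto3.client("sagemaker").create_notebook_instance(**request)`.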

By default, users who log into a notebook instance have root access. If you don’t want users to have root access to a notebook instance, set the RootAccess field to Disabled. You can also disable root access for users when you create or update a notebook instance in the Amazon SageMaker console.
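
Setting the RootAccess field through the API might look like the sketch below. The helper name and the injectable `client` argument are my own assumptions. My understanding is that the notebook instance has to be stopped before it can be updated, so check its status first.

```python
from typing import Any, Optional


def disable_root_access(notebook_name: str, client: Optional[Any] = None) -> None:
    """Turn off root access for users of an existing notebook instance."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials at call time

        client = boto3.client("sagemaker")
    client.update_notebook_instance(
        NotebookInstanceName=notebook_name,
        RootAccess="Disabled",
    )
```

The same `RootAccess="Disabled"` setting can be passed to `create_notebook_instance` when the notebook is first created.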

SageMaker Hosting

You can host deployed models behind a RESTful HTTPS endpoint. Clients call the model through that endpoint, but each call still needs AWS SigV4 authentication. With AWS PrivateLink you can add the service directly to your VPC, so a backend service can call for inferences with data that it constructs without that data crossing the internet.
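
Here is a minimal sketch of calling a hosted model. boto3 signs the request with SigV4 using whatever credentials it finds, so there is no manual signing in the code. The helper name, the injectable `client` argument, and the assumption that the model accepts and returns JSON are mine; your container may use a different content type.

```python
import json
from typing import Any, Optional


def invoke_json_endpoint(
    endpoint_name: str,
    payload: Any,
    client: Optional[Any] = None,
) -> Any:
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials at call time

        client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream, so read it before decoding.
    return json.loads(response["Body"].read())
```

When the caller runs inside a VPC that has an interface endpoint for the SageMaker runtime, the same code should work unchanged while the traffic stays on the AWS network.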

SageMaker Distributed Training

A cluster (a group of EC2 instances) is allocated for each training job. The cluster shares the same managed VPC with other training jobs in that cell of SageMaker. AWS wants these EC2 instances to be able to talk freely with one another, so one security group is made for each cluster job. That security group allows cross talk between the EC2 nodes within that job and also allows communication with the control plane.

You can optionally add encryption here, but encryption can impact performance for some algorithms. If encryption is turned on, then all communication between nodes is encrypted. If encryption is not turned on, then communication is still kept within the VPC and limited by the security group.

By default, some intranetwork data in transit (inside the service platform) is unencrypted. This includes communication between nodes during distributed processing jobs and distributed training jobs.
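
Turning on encryption between nodes is a single flag on the training job request. The sketch below assumes you already have a dictionary of `create_training_job` arguments; the helper name is my own.

```python
from typing import Any, Dict


def with_inter_node_encryption(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a CreateTrainingJob request with node traffic encrypted."""
    updated = dict(request)  # shallow copy so the caller's dict is untouched
    updated["EnableInterContainerTrafficEncryption"] = True
    return updated
```

Because encryption can slow down some distributed algorithms, it may be worth timing a job with and without the flag before making it your default.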

When each node is created, it needs local disk storage. This EBS storage is created fresh for each job, and there is a reset in between to guarantee the blocks are completely clean and fully secure. The EBS storage is encrypted automatically; you can supply your own key or AWS can generate a key.
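
Supplying your own key for the training volumes can be sketched as below. The helper name and the sizing arguments are my own assumptions; `VolumeKmsKeyId` is the field in the `ResourceConfig` part of a `create_training_job` request. Leaving it out lets AWS pick the key.

```python
from typing import Any, Dict, Optional


def build_resource_config(
    instance_type: str,
    instance_count: int,
    volume_size_gb: int,
    volume_kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the ResourceConfig section of a CreateTrainingJob request."""
    config: Dict[str, Any] = {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "VolumeSizeInGB": volume_size_gb,
    }
    if volume_kms_key_id is not None:
        # Encrypt the attached storage volume with a customer-supplied KMS key.
        config["VolumeKmsKeyId"] = volume_kms_key_id
    return config
```

The training job's execution role also needs permission to use the KMS key, otherwise the job cannot use the encrypted volume.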

Compliance

AWS views compliance as a shared responsibility between AWS and the customer. AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. Customers are also responsible for other factors including the sensitivity of their data, the company’s requirements, and applicable laws and regulations. AWS recommends that you protect AWS account credentials and set up individual users with AWS IAM Identity Center or AWS Identity and Access Management (IAM). AWS also recommends that you never put confidential or sensitive information, such as your customers’ email addresses, into tags or free-form text fields.

AWS recommends that you secure your data in the following ways:

  • Use multi-factor authentication (MFA) with each account.
  • Use SSL/TLS to communicate with AWS resources. AWS requires TLS 1.2 and recommends TLS 1.3.
  • Set up API and user activity logging with AWS CloudTrail.
  • Use AWS encryption solutions, along with all default security controls within AWS services.
  • Use advanced managed security services such as Amazon Macie, which assists in discovering and securing sensitive data that is stored in Amazon S3.
  • If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint.
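
One way to back the MFA recommendation with policy is an IAM statement that denies actions when the caller did not sign in with MFA. The sketch below builds such a policy document as a Python dictionary. The helper name, the `Sid`, and the choice of actions are my own assumptions; `aws:MultiFactorAuthPresent` is the IAM global condition key for this check.

```python
import json
from typing import Any, Dict, Iterable


def build_require_mfa_policy(actions: Iterable[str]) -> Dict[str, Any]:
    """Build an IAM policy that denies the given actions without MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWithoutMFA",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                # BoolIfExists also matches requests where the key is absent,
                # such as calls made with long-term access keys.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


def to_policy_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy for pasting into IAM or passing to an API call."""
    return json.dumps(policy, indent=2)
```

For example, `to_policy_json(build_require_mfa_policy(["sagemaker:*"]))` gives a document you could attach to a group of SageMaker users.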

Keep your data secure!

Want more AWS Machine Learning Cheat Sheets? Well, I’ve got you covered! Check out the rest of this series, which covers SageMaker features, high-level machine learning services, lesser-known high-level features for industrial or educational purposes, and MLOps in AWS.

Thanks for reading and happy studying!

tanta base

I am a data and machine learning engineer. I specialize in all things natural language, recommendation systems, information retrieval, chatbots, and bioinformatics.