AWS SageMaker for Seamless Model Building and Deployment
Amazon Web Services (AWS) provides SageMaker so that ML engineers, data scientists, and even analysts can clean and transform their data and then train and test models on it. Machine learning is an iterative process that requires workflow tools and dedicated hardware to process data sets. Training teaches a model how to behave based on recurring patterns recognized in the given dataset, so that it can then react to patterns in fresh data. Once data scientists have optimized the ML model, software development teams turn the finished model into products or APIs for wider access.
AWS SageMaker is a cloud-based ML platform that enables us to design, train, and deploy our machine learning models. These models can be easily shared and combined with other online instances for cross-collaboration and integration. We can include a freshly trained model as part of a data pipeline that extracts data from the source, loads it, cleans and transforms it, trains and tests a model on it, and finally renders a visualization by passing the results to a dashboard or a hosted BI solution such as Amazon QuickSight or another customizable BI tool.
Before we look at what SageMaker is and what its benefits are, let’s check why we need it to begin with. We can build our models in a traditional Python development environment or in one of the popular online notebooks that let us run our code. But flexibility, scalability, and integration are limited in this scenario, and this is where services like SageMaker come in to streamline the entire process, making the path from training to deployment faster and easier.
While building a model, a data scientist or ML engineer will need to perform the following steps.
- Data processing: As a rough rule of thumb, 70% to 75% of our time will go into cleaning, transforming, and preparing our data. Data comes from various raw sources and usually requires extensive cleaning and transformation. Cleaning means removing null values, outliers, and garbage values. Transformation means converting our data values or results into a more suitable format that can later be used for further processing. The details vary from project to project and from organization to organization.
- Select algorithm: Choose a learning algorithm or ensemble technique that can extract insights from the data or explore it for domain-specific analysis.
- Select frameworks: Frameworks are collections of libraries or functions that enable developers to build and support their algorithms. They are selected based on our problem statement and business use cases.
- Train, test, and optimize (TTO): Once the algorithm has been selected and the code works, we need to train our model on the dataset provided. First we split the dataset into training and test sets, commonly with a ratio of 75–25 or 70–30, depending on data size. Training is where the machine learns from the input features and, in the case of supervised learning, from the target variables provided with them. Once the model has been trained, it is tested on the test set and a score is produced. Based on the score, we try to improve accuracy and optimize our results by trying different algorithms, changing the features being considered, and selecting a different sample from the dataset.
- Integration: The results of the optimized model then need to be visualized and presented for further use cases, so the model needs to be integrated with front-end dashboards, BI tools, or datasheets.
- Deployment: The final step is to deploy all our code, dashboards, data-cleaning platform, and front-end environments.
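The data processing and train/test/optimize steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dataset, the 3-standard-deviation outlier rule, and the single-threshold "model" are all made up for the example and are not part of SageMaker or of any particular framework.

```python
import random
import statistics

def clean(rows):
    """Drop rows with missing values, then drop feature outliers (assumed 3-sigma rule)."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    xs = [x for x, _ in complete]
    mean = statistics.mean(xs)
    spread = statistics.pstdev(xs)
    if spread == 0:
        return complete
    return [(x, y) for x, y in complete if abs(x - mean) <= 3 * spread]

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and split it, e.g. 75-25."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train_rows):
    """Toy supervised model: the midpoint between the two class means."""
    zeros = [x for x, y in train_rows if y == 0]
    ones = [x for x, y in train_rows if y == 1]
    return (statistics.mean(zeros) + statistics.mean(ones)) / 2

def accuracy(threshold, rows):
    """Fraction of rows whose predicted label matches the target."""
    correct = sum(1 for x, y in rows if (1 if x > threshold else 0) == y)
    return correct / len(rows)

# Hypothetical raw data: (feature, target) pairs with one null and one outlier.
raw_rows = (
    [(1.0 + 0.1 * i, 0) for i in range(20)]
    + [(7.0 + 0.1 * i, 1) for i in range(20)]
    + [(None, 1), (1000.0, 0)]
)

cleaned = clean(raw_rows)
train_rows, test_rows = train_test_split(cleaned, test_ratio=0.25, seed=42)
threshold = fit_threshold(train_rows)
score = accuracy(threshold, test_rows)
print(len(cleaned), len(train_rows), len(test_rows), score)
```

In a real project the same sequence would normally be handled by a data library and an ML framework; the point here is only the order of the steps: clean, split, train, then score on data the model has not seen.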
SageMaker Composition and what it does for us
- Select, clean, and process the data used for training
- Select ML or ensemble learning models to train on your data
- Test and optimize the models
- Set up the environments
- Deploy the model
- Scale and manage the production environment
Popular ML frameworks, tools, and programming languages supported
SageMaker model building Pipeline
(From AWS documentation)
We can categorize it broadly into two categories: Model Build and Model Deploy.
- When we submit the model for training, a training job is created.
- The training job uses an S3 bucket for storage, and the input data is fetched from there.
- Once the job is created, SageMaker launches the compute instances.
- It then trains the model on the training set and stores the output in the S3 bucket.
- The algorithm gets saved in the AWS SageMaker critical system processes on our ML instances.
- A registry keeps track of the on-premises jobs and their current state.
- Amazon CloudWatch is used for setting alerts or periodic updates on job training.
- Amazon SageMaker Model Monitor monitors the quality of the ML models in production. We can set up continuous monitoring for a batch transform job that runs regularly, or on-schedule monitoring for asynchronous batch transform jobs. With Model Monitor, we can set alerts that notify us when there are deviations in the model quality.
- Monitor data quality — Monitor drift in data quality.
- Monitor model quality — Monitor drift in model quality metrics, such as accuracy.
- Monitor Bias Drift for Models in Production — Monitor bias in your model’s predictions.
- Monitor Feature Attribution Drift for Models in Production — Monitor drift in feature attribution.
- S3 buckets are used here as well for result verification and storage.
- A more detailed approach can be seen in the image below
(From AWS Documentation)
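The training-job steps listed above (input data in S3, compute instances launched by SageMaker, output written back to S3) map onto the request that the boto3 `create_training_job` call accepts. The sketch below only builds that request as a dictionary, so it runs without AWS credentials. The bucket name, role ARN, image URI, job name, instance type, and timeout are hypothetical placeholders, not values from this article.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               input_s3_uri, output_s3_uri,
                               instance_type="ml.m5.large", instance_count=1):
    """Assemble the keyword arguments for SageMaker's create_training_job."""
    return {
        "TrainingJobName": job_name,
        # Container image holding the training algorithm.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # IAM role that SageMaker assumes to read and write on our behalf.
        "RoleArn": role_arn,
        # Where the input data is fetched from.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Where the trained model artifacts are stored.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Compute instances that SageMaker launches for the job.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def submit(request):
    """Submit the job. Needs boto3 and AWS credentials; not called in this sketch."""
    import boto3
    return boto3.client("sagemaker").create_training_job(**request)

# All identifiers below are hypothetical placeholders.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    input_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(request["OutputDataConfig"]["S3OutputPath"])
```

Keeping the request as plain data makes it easy to review or version before anything is launched; calling `submit(request)` would be the step that actually creates the job.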
Working around SageMaker
Creating roles is very important. It enables you to work with a team and provides different levels of access by giving each developer only the permissions they require. This secures your data and prevents possible losses due to either deliberate actions or honest mistakes.
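As a minimal sketch of what "only the access they require" can look like, the snippet below builds two IAM policy documents as Python dictionaries: a trust policy that lets the SageMaker service assume a role, and a permissions policy limited to reading one S3 bucket. The bucket name is a hypothetical placeholder, and a real project would likely need additional permissions.

```python
import json

def sagemaker_trust_policy():
    """Trust policy allowing the SageMaker service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

def bucket_read_only_policy(bucket):
    """Permissions policy restricted to listing and reading a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }

# "example-bucket" is a hypothetical name used only for illustration.
trust = sagemaker_trust_policy()
read_only = bucket_read_only_policy("example-bucket")
print(json.dumps(read_only, indent=2))
```

A developer who only needs to inspect training data could be given a role with the read-only policy, while write access to the bucket stays with a separate, more restricted role.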
SageMaker can be configured in two ways. The first is the quick setup, where most of our policies and configurations are predefined, saving us the hassle of going through each stage and setting it up.
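SageMaker Setup Link. You need to log in to the AWS console in order to access it. The root user works, but an IAM user with sufficient permissions is enough and is what AWS recommends for everyday tasks.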
As we can see here, the quick setup is completed in just 1 minute.
Configurations that will be set up automatically
- Public internet access, and standard encryption
- SageMaker Studio Integration
- Sharable SageMaker Studio Notebooks
- SageMaker Canvas
- IAM Authentication
The standard setup takes about 10 minutes but gives you the flexibility to configure every setting manually, which lets you scale as per load.
- The domain name should be unique within your AWS account and can otherwise be anything you like. A user profile name and role should be assigned so that other collaborators and contributors can be added to the project.
- In the standard setup, we first set the authentication method and then the permissions. In the next step, we set up networking, choosing either public internet access or a Virtual Private Cloud (VPC), and specify the encryption key.
- Select your JupyterLab notebook version and decide how you plan to share notebooks with others, as viewers or editors, and what privileges you wish to assign to them. As we can see, the storage is an S3 bucket, and we can also encrypt it while sharing.
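The choices made in the standard setup above (domain name, authentication method, execution role, network mode, encryption key, notebook sharing) correspond roughly to the parameters of the boto3 `create_domain` call. The sketch below only assembles and validates the request, so it runs without an AWS account. Every ARN, ID, and name in it is a hypothetical placeholder.

```python
VALID_AUTH_MODES = {"IAM", "SSO"}
VALID_NETWORK_MODES = {"PublicInternetOnly", "VpcOnly"}

def build_domain_request(domain_name, execution_role_arn, vpc_id, subnet_ids,
                         auth_mode="IAM", network_mode="PublicInternetOnly",
                         kms_key_id=None, notebook_share_s3_path=None):
    """Assemble the keyword arguments for SageMaker's create_domain."""
    if auth_mode not in VALID_AUTH_MODES:
        raise ValueError(f"auth_mode must be one of {sorted(VALID_AUTH_MODES)}")
    if network_mode not in VALID_NETWORK_MODES:
        raise ValueError(f"network_mode must be one of {sorted(VALID_NETWORK_MODES)}")

    user_settings = {"ExecutionRole": execution_role_arn}
    if notebook_share_s3_path:
        # Shared notebook output is stored in S3.
        user_settings["SharingSettings"] = {
            "NotebookOutputOption": "Allowed",
            "S3OutputPath": notebook_share_s3_path,
        }

    request = {
        "DomainName": domain_name,
        "AuthMode": auth_mode,
        "DefaultUserSettings": user_settings,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "AppNetworkAccessType": network_mode,
    }
    if kms_key_id:
        # Optional customer-managed encryption key.
        request["KmsKeyId"] = kms_key_id
    return request

# All identifiers below are hypothetical placeholders.
domain_request = build_domain_request(
    domain_name="example-domain",
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    network_mode="VpcOnly",
    notebook_share_s3_path="s3://example-bucket/shared-notebooks/",
)
print(sorted(domain_request))
```

Passing the result to `boto3.client("sagemaker").create_domain(**domain_request)` would create the domain, assuming valid credentials and real resource IDs.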
AWS SageMaker Canvas lets us use machine learning to generate predictions without needing to write code. We can use the SageMaker Canvas UI to import our data and perform analyses.
With new innovations and technologies, building better and more accurate models is no longer very difficult. Cloud computing has made deployment and integration smoother, leaving us to focus only on development and model building.