Day 10 of 100DaysofML
Machine Learning/Deep Learning on AWS. I'm going to explain things from the absolute basics so that the entire process feels easy. But first, why AWS?
AWS, or Amazon Web Services, provides a number of ML-based tools that help us get results in a fraction of a second. It also provides services that can be embedded into our applications for security or other miscellaneous processes.
Let us start with the close relationship between AI, ML and DL by taking a look at the common hierarchy diagram.
From the diagram alone, we can see that ML and DL are subcategories of AI. DL makes use of complex neural networks, many of which come pretrained on AWS. Companies can take the data they receive from user activity and use it to make decisions for the business.
FYI: One of my first projects on AWS was a facial recognition application, and it took me about 45 minutes to build it and integrate it with my own server. That's how powerful AWS is. AWS is opening most of its AI-based tools to the market so that businesses can integrate them with their own services. These AWS services are integrated using APIs (a minimal example follows the list below). The 3 key services on offer are:
1. Vision based: Amazon Rekognition
2. Speech based: Amazon Polly
3. Chatbot based: Amazon Lex
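To give a feel for what "integrated using APIs" means in practice, here is a minimal sketch using the boto3 SDK. The region, the bot name/alias and the output file are placeholder assumptions of mine, not values from any real project.

```python
import boto3

# Each AI service is exposed through a plain API client (region is an assumption).
polly = boto3.client("polly", region_name="us-east-1")
lex = boto3.client("lex-runtime", region_name="us-east-1")

# Speech: turn a sentence into an MP3 with Amazon Polly.
speech = polly.synthesize_speech(
    Text="Hello from AWS!", OutputFormat="mp3", VoiceId="Joanna"
)
with open("hello.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())

# Chatbot: send one utterance to a (hypothetical) Lex bot and read the reply.
reply = lex.post_text(
    botName="OrderFlowers",   # assumed bot name
    botAlias="prod",          # assumed alias
    userId="demo-user",
    inputText="I would like to order roses",
)
print(reply.get("message"))
```

A few lines of client code like this are all it takes to call Rekognition, Polly or Lex from your own application or server.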
While making use of ML-based services on AWS, there are 3 main steps we need to keep in mind:
1. Train your model
2. Evaluate and optimize
3. Retrieve predictions
“Where traditional Machine Learning focuses on feature engineering, Deep Learning focuses on end-to-end learning based on raw features. Each layer is responsible for analyzing additional complex features in the data.”
What resources do you use on AWS?
There are multiple resources available on AWS, but computing power is the most important one. AWS lets you launch P-type EC2 instances (powerful, GPU-backed machines) for predictions and other heavy computations, or attach additional GPU capacity to an EC2 instance for more compute power. All of these resources can be launched with a click and are easily deployable, as in the sketch below.
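As a rough illustration of launching such an instance from code rather than the console, here is a boto3 sketch. The AMI ID and key pair name are placeholders, not real values.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Launch a single GPU-backed P3 instance for deep learning workloads.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: e.g. an AWS Deep Learning AMI
    InstanceType="p3.2xlarge",        # P-type instance with one NVIDIA V100 GPU
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",            # placeholder key pair
)
print(response["Instances"][0]["InstanceId"])
```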
Another major advantage of AWS is that it is highly economical. For neural networks with over a hundred internal layers, the hardware requirements would be very high, and AWS provides all of those resources, launchable with just a click.
ML pipeline for business solutions
The most important thing to do while planning to use an ML model for your web app or your business is to understand whether ML is actually the right approach for your specific use case. Let's get to the roots.
ML uses data to train a model, which is then used to make predictions. Take a call center, for instance: there may be a number of fraudulent calls, and getting a user to the right customer service executive can be a tedious process since there are so many departments. This is where we need to understand our customers' patterns and try to redirect them to the right executive, improving the process for them. This is just one example, though. Now, there are a number of algorithms out there, so which one should we use? Since we are not looking for continuous values but for classes (the department a call should be routed to), we can term this a multiclass classification problem. A small sketch of such a classifier follows below.
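To make that concrete, here is a tiny multiclass classifier sketch with scikit-learn. The features, labels and departments are invented purely for illustration and are not from any real call-center dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented features: [call duration (min), menu options pressed, previous calls this week]
X = np.array([[3, 2, 0], [12, 5, 1], [1, 1, 4], [8, 3, 0], [2, 2, 3], [15, 6, 1]])
# Invented labels: the department each call should have been routed to
y = np.array(["billing", "tech_support", "fraud", "tech_support", "fraud", "billing"])

# Logistic regression handles more than two classes out of the box.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Predict the department for a new incoming call.
print(clf.predict([[10, 4, 0]]))
```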
Now, let's get to the ML pipeline:
- Data Collection and Integration: There are a number of data sources related to our problem, and it is good to use them, but preprocessing is a challenge, as is identifying which features matter for the problem and should be used to train the model. One important rule of thumb is: you should have at least 10 times as many data points/observations as features. E.g., if you have 5 features, you should have a minimum of 50 observations in the dataset.
- Data Preparation: As a data scientist, it is important to know what data you are missing, what is needed to make the model more accurate, and whether the model matches our expectations. The main job is to manually and critically explore your data, filling in anomalies or missing values along the way. We should also verify that the labels are closely related to our ML problem.
- Data Visualization and Analysis: This is a very important step in understanding the vastness of the data we are working with. It gives us a better overview of the outliers and of how the data is spread out, by splitting the data into histograms and scatter plots; scatter plots help us find relationships between our features. We also need to identify noisy observations and eliminate them, because they will affect the performance of the model.
- Feature Selection and Engineering: This involves selecting features from our dataset. Here another concept, Feature Engineering, comes in, whereby a given feature is converted into an engineered feature that is more useful to the model. It is one of the most time-consuming parts of the ML pipeline.
- Model Training: This is the most important part. Here, we need to split our data into 3 sections (a small end-to-end sketch follows this list):
- Training data: This includes the features and the labels, and we pass it in to train the model.
- Testing data: This contains only the features; their labels need to be predicted.
- Validation data: held out from training and used to tune the model before the final test.
Make sure to randomize (shuffle) the data so that the model is properly trained; otherwise the model adapts to a fixed ordering of values.
Low variance and low bias are what we usually need; I'll try to cover this in another blog.
- Model Evaluation: After the initial stage of training is done, the model needs to be evaluated using the held-out data from the split.
- However, we need to make sure that we do not overfit or underfit the data. This can be diagnosed using the model's confusion matrix, which can be printed with sklearn, and we can also print the precision and accuracy scores to evaluate the model.
- Prediction: The final stage, whereby we map inputs to outputs and compare the two.
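As promised above, here is a minimal sketch of the training and evaluation steps with scikit-learn. The built-in iris dataset is used purely as a stand-in for whatever business data you actually have.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Split into training and testing data; shuffle=True randomizes the order
# so the model does not adapt to a fixed sequence of values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

# Model training on the labelled training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Model evaluation on the held-out split: confusion matrix, accuracy and precision.
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
```

A further split of the training data would give the validation set used for tuning before this final evaluation.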
AWS has Amazon SageMaker, which combines all of these pipeline steps and is one of the most powerful tools they have developed.
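For a feel of what that looks like in practice, here is a hedged sketch using the SageMaker Python SDK to run a scikit-learn training script as a managed training job. The script name, IAM role, S3 bucket and framework version are assumptions for illustration, not values from a real account.

```python
from sagemaker.sklearn.estimator import SKLearn

# A managed training job: SageMaker provisions the instance, runs train.py, then tears it down.
estimator = SKLearn(
    entry_point="train.py",                                   # assumed local training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder IAM role
    instance_type="ml.m5.large",
    instance_count=1,
    framework_version="1.2-1",                                # assumed scikit-learn container version
)

# Train on data stored in S3 (placeholder bucket/prefix).
estimator.fit({"train": "s3://my-demo-bucket/call-center/train/"})

# Deploy the trained model behind a real-time endpoint to retrieve predictions.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```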
Now, to get started with the most prominent AWS-based ML tools, let us begin with Amazon Rekognition.
- Amazon Rekognition is a fully managed, AWS-based Deep Learning tool that helps with facial recognition tasks through a simple API, and it can be used for authentication as well. It can identify faces, landscapes and even a person's mood, and it can be used with zero experience in ML. It can also be used for image moderation by triggering a Lambda function; in fact, it has a number of use cases (see the sketch after this list).
- AWS DeepLens is a Deep Learning based video tool for developers. DeepLens is essentially a wireless-enabled camera and development platform integrated with the AWS cloud, and it helps developers build computer vision applications. A DeepLens project runs with the help of AWS Lambda, Amazon SageMaker and AWS Greengrass. You might not be familiar with these services, so I'd suggest reading about them, since they can come in very handy for a number of applications. The gist: SageMaker is used to create a model or import a pre-trained one, and Lambda provides the project function that makes inferences. DeepLens itself loads that function and generates the predictions, and Greengrass is then used to deploy the application project to the device. The basic architecture can be seen in the diagram given below:
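Since Rekognition is the service you are most likely to try first, here is a minimal boto3 sketch that detects faces and their apparent emotions (the "mood" mentioned above) in an image stored in S3. The bucket and file names are placeholders.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")  # region is an assumption

# Analyse an image in S3 and return face details, including predicted emotions.
response = rekognition.detect_faces(
    Image={"S3Object": {"Bucket": "my-demo-bucket", "Name": "guests/photo-001.jpg"}},
    Attributes=["ALL"],  # request the full attribute set, emotions included
)

for face in response["FaceDetails"]:
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    print(top_emotion["Type"], round(top_emotion["Confidence"], 1))
```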
Fun fact: Rekognition can also be used to create a video by taking samples from S3 storage and editing them into a number of timed cuts.
One of its famous use cases was seen at the Royal Wedding in England: as each guest entered, they were tagged and identified using AWS. A representation is shown below:
Besides these, there are a number of other services, such as AWS DeepComposer and Amazon SageMaker, which I shall cover in future blogs. Keep learning.
Cheers.