Stories by Sriramya Kannepalli on Medium

Invoke an Amazon SageMaker endpoint using AWS Lambda

Sriramya Kannepalli — Fri, 01 May 2020 01:48:19 GMT

Step-by-step guide for calling an Amazon SageMaker XGBoost regression model endpoint using API Gateway and AWS Lambda

I assume you are here because you already have a deployed SageMaker ML model endpoint and want to understand how to call it through AWS Lambda functions. If so, let’s get started.

Note: If not, you can create an XGBoost SageMaker endpoint by reading my previous blog, Random Forest and XGBoost on Amazon SageMaker, and working through this Jupyter notebook.

https://medium.com/media/867cae4bd72dce0a07f95847bdeda816/href

Where is my Endpoint?

After training and deploying the model in SageMaker, copy the name of your deployed endpoint from SageMaker > Inference > Endpoints in the SageMaker console. We will use this endpoint name when defining the environment variables of the AWS Lambda function.

Creating AWS Lambda Function

Search for ‘Lambda’ in the AWS console and click ‘Create function’ under Functions.

Screenshot — Create new Function in AWS Lambda

Select Runtime — Python 3.6 and add the below sample code in Function code:

https://medium.com/media/ebff6f0e37989e8cc217958f8b9e8786/href

and click on ‘Save’.

Sample lambda_function-Code in python
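For readers who cannot see the embedded sample, here is a minimal sketch of what such a function could look like. It is not necessarily the exact code from the sample: the "data" key in the request body and the optional runtime argument (added only so the function can be tried without AWS access) are assumptions of this sketch, and it expects a single data point per request.

```python
import json
import os


def get_runtime_client():
    """Create the SageMaker runtime client."""
    import boto3  # imported lazily so the module loads even without AWS access
    return boto3.client("sagemaker-runtime")


def lambda_handler(event, context, runtime=None):
    # ENDPOINT_NAME is read from the Lambda environment variables.
    endpoint_name = os.environ["ENDPOINT_NAME"]
    # Lambda only passes event and context; runtime is a hook for local testing.
    runtime = runtime or get_runtime_client()

    # Assumption: the request body looks like {"data": "<comma-separated feature values>"}.
    payload = event["data"]

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    # For one CSV row, the response body is expected to hold the prediction as text.
    prediction = float(response["Body"].read().decode("utf-8"))
    return {"prediction": prediction}
```

Creating the client inside a helper keeps the handler easy to exercise with a stand-in object before wiring it to a real endpoint.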

Click on Edit environment variables > Add environment variable and add:

Key: ‘ENDPOINT_NAME’

Value: the name of your endpoint. Then click ‘Save’.

If you are using the XGBoost model endpoint from the sample tutorial above, your value string may have the format:

Value: ‘xgboost-YEAR–MONTH–DATE–xx–xx–xx–xxx’

Defining IAM role

Then scroll up and click on ‘Permissions’ tab.

Click on Execution Role > Role name

You will be redirected to the IAM console.

1. Click on your policy under Policy name.
2. Click on Edit policy > JSON.
3. Add a comma after the last existing statement in the JSON and include the following statement after it.

https://medium.com/media/30de5ff3caaaf6470eded50b52f26e78/href

4. Don’t forget to click ‘Review Policy’ and ‘Save Changes’.

This will give your Lambda function permission to invoke a SageMaker model endpoint.
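As a rough illustration of the policy edit, the sketch below appends an InvokeEndpoint statement to a policy document and prints the result. The existing policy shown here is hypothetical, and "Resource": "*" is an assumption that allows every endpoint; you may prefer to narrow it to your endpoint ARN.

```python
import json

# Statement that allows invoking SageMaker endpoints.
# Assumption: "*" covers every endpoint; restrict it to one ARN if you prefer.
INVOKE_STATEMENT = {
    "Effect": "Allow",
    "Action": "sagemaker:InvokeEndpoint",
    "Resource": "*",
}


def add_invoke_permission(policy):
    """Return a copy of an IAM policy document with the InvokeEndpoint statement appended."""
    updated = json.loads(json.dumps(policy))  # cheap deep copy of JSON-compatible data
    statements = updated.setdefault("Statement", [])
    if INVOKE_STATEMENT not in statements:
        statements.append(dict(INVOKE_STATEMENT))
    return updated


# Hypothetical existing policy, similar to a basic logging policy.
existing_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "logs:CreateLogGroup", "Resource": "*"},
    ],
}

print(json.dumps(add_invoke_permission(existing_policy), indent=2))
```

The printed JSON has the same shape as what you end up with in the IAM JSON editor after adding the comma and the new statement.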

API Gateway

Now search for API Gateway in AWS Console

Click on ‘Import’ under REST API section.

Select ‘New API’ under ‘Create new API’.

Enter ‘API name’ in Settings and click on ‘Create API’. You will be redirected to below screen —

Click on ‘Create Resource’. Enter a ‘Resource Name’ and click ‘Create Resource’. For example, I named my resource ‘housing-predictor’.

Click on ‘Actions’ and ‘Create Method’

Select ‘POST’ method and click on ‘✔️ ’.

Enter the name of your Lambda function and click ‘Save’.

Now go to ‘Actions’ and hit ‘Deploy API’.

Select ‘New Stage’ and enter a Stage Name, for example test or prod, and click ‘Deploy’.

Now click on POST under Stages > test and copy the Invoke URL ending with

Note: Your URL should end with and not ‘test’.

Now that we have the Lambda function, an API Gateway, and the test data (copy a single data point from your test data), let’s test it using Postman, an HTTP client for testing web services. You can download the latest version of Postman here.

Place the Invoke URL into Postman as shown in the following screenshot and choose POST as method. In the Body tab, place the test data as shown in the following screenshot. Choose the Send button and you will see the returned result as “1005792.625” for the case of the test data shown in XGBoost tutorial.
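If you would rather test from a script than from Postman, the same request can be sketched with Python's standard library. The JSON shape with a "data" key is an assumption about how your Lambda function reads its input, and the URL and CSV row are placeholders you need to replace with your own values.

```python
import json
import urllib.request


def build_request(invoke_url, csv_row):
    """Build the POST request: a JSON body that carries one row of test data."""
    # Assumption: the Lambda function reads the CSV row from a "data" key.
    body = json.dumps({"data": csv_row}).encode("utf-8")
    return urllib.request.Request(
        invoke_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(invoke_url, csv_row, timeout=30):
    """Send the request and return the decoded response body."""
    request = build_request(invoke_url, csv_row)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling call_api with your Invoke URL and a single data point should return the same value that Postman shows.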

Hurrah!! 🎊🎉🎊 You have created a model endpoint deployed and hosted by Amazon SageMaker. Then you called the endpoint using a serverless architecture (an API Gateway and a Lambda function) that invokes the endpoint.

Now you know how to call a machine learning model endpoint hosted by Amazon SageMaker using AWS Lambda serverless functions. Congratulations!!! 👏👏👏

Photo by Susan Quiles Photography on Unsplash

Invoke an Amazon SageMaker endpoint using AWS Lambda was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

Random Forest and XGBoost on Amazon SageMaker and AWS Lambda

Sriramya Kannepalli — Thu, 30 Apr 2020 17:58:06 GMT

Step-by-Step process for implementing regression model using Random Forest and XGBoost on Amazon SageMaker and AWS Lambda Functions.

Photo by Kevin Ku on Unsplash

Introduction

I wrote this blog as part of my virtual talk on deploying ML models using Amazon SageMaker and Lambda functions at Minneapolis Women in Machine Learning & Data Science (WiMLDS). So, here we go -

The best way to learn how to use Amazon SageMaker is to create, train, and deploy a simple machine learning model on it. We will take a top-down approach: log in to the AWS Console, start a SageMaker notebook instance, understand decision trees (the building block of Random Forest and XGBoost), and then train the models, deploy the endpoints, and call them from AWS Lambda.

Let’s get started..

2. Search for Amazon SageMaker in ‘Find Services’ and open SageMaker dashboard.

3. Click on Notebook instances and Create Notebook instance.

4. Enter the Notebook instance name.

Select the Notebook instance type ‘ml.t2.medium’ from the dropdown. We only plan to use this notebook instance as a development environment and rely on on-demand instances for the heavy lifting of training and deployment jobs, i.e. we will assign ‘ml.m4.xlarge’ instances in our training and deployment scripts. For info on other notebook instance types, please refer to Amazon SageMaker Pricing.

5. Grant permissions to the notebook instance through IAM role, so that necessary AWS resources can be accessed from the notebook without the need to provide AWS credentials every time.

If you don’t have IAM role in place, Amazon SageMaker will automatically create a role for you with your permission.

6. Click on ‘Create Notebook Instance’.

7. It takes around 1–2 mins to change into ‘Active’ status from ‘Pending’.

8. Now Click on ‘Open Jupyter’.

9. You can upload your own files from your local machine using ‘Upload’, just as you would in a normal Jupyter notebook interface. Remember that these files are saved on the current ‘ml.t2.medium’ notebook instance, so if you delete the notebook instance after your work is done, you will lose the files too.

10. If you are new to SageMaker, you can always refer to the huge list of ‘SageMaker examples’ written by AWS SMEs as a starting point.

Now let’s move on to regression with Random Forest and the Amazon SageMaker XGBoost algorithm. To do this, you need the following:

A dataset. We will use the Kaggle dataset: House sales prediction in King County, Seattle US. This dataset contains sale prices of houses sold in King County, Seattle, between May 2014 and May 2015. It’s a great dataset for evaluating simple regression models.
An algorithm. We will use the Random Forest algorithm in scikit-learn and XGBoost Algorithm provided by Amazon SageMaker to train the model using the housing dataset and predict the prices.

You also need a few resources for storing your data and running the code in Amazon SageMaker:

An Amazon Simple Storage Service (Amazon S3) bucket to store the training data and the model artifacts that Amazon SageMaker creates when it trains the model (don’t worry about this now; we will assign it in our code below).
An Amazon SageMaker notebook instance to prepare and process data and to train and deploy a machine learning model (we already started a notebook instance above).
A Jupyter notebook to use with the notebook instance to prepare your training data and train and deploy the model (if you are following along from the beginning, you already have the Jupyter notebook open).

We will be writing our code in Python 3 -

Important: To train, deploy, and validate a model in Amazon SageMaker, you can use one of these methods.

Amazon SageMaker Python SDK.
AWS SDK for Python (Boto 3).

Amazon SageMaker Python SDK vs AWS SDK for Python (Boto 3)

The Amazon SageMaker Python SDK abstracts several implementation details and is easy to use. If you’re a first-time Amazon SageMaker user, AWS recommends that you use it to train, deploy, and validate the model.

On the other hand, Boto 3 is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy to use, object-oriented API, as well as low-level access to AWS services.

Today we will learn how to create all of the resources that you need to train and deploy a model using the Amazon SageMaker Python SDK.

The steps include:

Fetching the dataset.
Explore and Transform the Training Data so that it can be fed to Amazon SageMaker algorithms.
Feature Engineering and Data Visualizations.
Prepare the data.
Data Ingestion.
Train a Model.
Launching a training job with the Python SDK.
Deploy the Model to Amazon SageMaker.
Validate the Model.
Integrating Amazon SageMaker Endpoints into Internet-facing Applications.
Clean up

Before we start working with the data let’s quickly understand —

What is a Decision Tree and how do Tree Ensembles form the basis for Random Forest and XGBoost?

Let’s start with a decision tree :

Decision Tree

A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).
Decision tree builds regression or classification models in the form of a tree structure.

The process of repeatedly partitioning the data to obtain homogeneous groups is called recursive partitioning.

Step 1: Identify the binary question that splits data points into two groups that are most homogeneous.

Step 2: Repeat Step 1 for each leaf node, until a stopping criterion is reached.
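To make Step 1 concrete, here is a small sketch for a single numeric feature. It measures homogeneity by the weighted variance of the target, which is a common choice for regression trees but still an assumption of this sketch, and the toy data is made up.

```python
def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Find the threshold on one feature that gives the two most homogeneous groups.

    Homogeneity is scored by the weighted variance of the target (lower is better).
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: living area (sq ft) and sale price.
sqft = [800, 950, 1100, 2400, 2600, 3000]
price = [200_000, 210_000, 220_000, 600_000, 620_000, 650_000]
threshold, score = best_split(sqft, price)
print(f"Binary question: is sqft <= {threshold}?")
```

Step 2 would call best_split again on each of the two groups until a stopping criterion, such as a minimum group size, is reached.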

Source : Diego Lopez Yse (Apr 17, 2019). Decision tree. Retrieved from Medium: https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14

Fable of blind men and elephant

Source: Jinde Shubham.(Jul 3, 2018). Ensemble learning is Fable of blind men and elephant. Retrieved from Medium: https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e

The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner, thus increasing the accuracy of the model. In the above picture, four blind men are trying to identify an elephant by touching its parts. Though each prediction is right from its own perspective, each man is a weak learner when it comes to predicting an elephant. When these weak learners discuss together they can identify an elephant, hence forming an ensemble.

Wisdom of the Crowd

“In an ensemble, predictions could be combined either by majority-voting or by taking averages. Below is an illustration of how an ensemble formed by majority-voting yields more accurate predictions than the individual models it is based on: ”

Source : Annalyn Ng and Kenneth Soo (July 27, 2016). How a tree is created in a random forest. Retrieved from algobeans.com: https://algobeans.com/2016/07/27/decision-trees-tutorial/
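The majority-voting idea can be sketched in a few lines. The predictions below are made up, and they are chosen so that the three models make their mistakes on different samples, which is exactly the situation where voting helps.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most models."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions from three models on five samples.
actual = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # wrong on sample 4
model_b = [1, 1, 1, 1, 0]  # wrong on sample 2
model_c = [0, 0, 1, 1, 0]  # wrong on sample 1

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]

for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, actual))
```

If the models tended to be wrong on the same samples, the vote would simply repeat their shared mistakes, which is why ensembles work best with weakly correlated models.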

Bagging and Boosting:

Source: Zulaikha Lateef(Jun 28,2019). Bagging and Boosting . Retrieved from Edureka.co :https://www.edureka.co/blog/boosting-machine-learning/

Bagging:

“ Refers to non-sequential learning.

- For T rounds, a random subset of samples is drawn (with replacement) from the training sample.

- Each of these draws are independent of the previous round’s draw but have the same distribution.

- These randomly selected samples are then used to grow a decision tree (weak learner). The most popular class (or average prediction value in case of regression problems) is then chosen as the final prediction value.

The bagging approach is also called bootstrapping.”
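Here is a minimal sketch of the sampling-and-averaging part of bagging for a regression problem. To keep it short, the weak learner is a stand-in that just predicts the mean of its sample (an assumption of this sketch); a random forest would grow a decision tree on each sample instead.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_prediction(data, fit, rounds, seed=0):
    """Fit one model per bootstrap sample and average their predictions (regression)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    predictions = [fit(bootstrap_sample(data, rng)) for _ in range(rounds)]
    return statistics.mean(predictions)


# Hypothetical sale prices; the stand-in learner predicts the mean of its sample.
prices = [200_000, 210_000, 220_000, 600_000, 620_000, 650_000]
print(bagged_prediction(prices, fit=statistics.mean, rounds=25))
```

Each round is independent of the others, which is why bagging can be run in parallel, unlike boosting.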

Boosting:

“ Boosting describes the combination of many weak learners into one very accurate prediction algorithm.

- A weak learner refers to a learning algorithm that only predicts slightly better than randomly.

- When looking at tree-based ensemble algorithms a single decision tree would be the weak learner and the combination of multiple of these would result in the AdaBoost algorithm, for example.

- The boosting approach is a sequential algorithm that makes predictions for T rounds on the entire training sample and iteratively improves the performance of the boosting algorithm with the information from the prior round’s prediction accuracy. “

Source: Julia Nikulski(Mar 16, 2020). Bagging and Boosting. Retrieved from Medium: https://towardsdatascience.com/the-ultimate-guide-to-adaboost-random-forests-and-xgboost-7f9327061c4f
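The sequential nature of boosting can be sketched as follows. Note that this is not AdaBoost: it uses the residual-fitting flavour of boosting (the idea gradient boosting builds on), with one-split "stumps" as weak learners. The data and the learning rate are made up for illustration.

```python
def fit_stump(xs, targets):
    """Weak learner: one split on one feature, predicting the mean target on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Each round fits a stump to what the previous rounds still get wrong."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.5, 3.0, 3.5, 6.0, 6.5]
print(mse(boost(xs, ys, rounds=1), xs, ys), mse(boost(xs, ys, rounds=20), xs, ys))
```

The training error shrinks as rounds are added, because every new stump is fitted to the errors left over by the rounds before it.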

Random Forest

Now, Random Forest is a combination of tree ensemble and bagging.

“ A random forest is an example of an ensemble, which is a combination of predictions from different models. It also uses bagging. Bagging is used to create thousands of decision trees with minimal correlation. In bagging, a random subset of the training data is selected to train each tree. Furthermore, the model randomly restricts the variables which may be used at the splits of each tree. Hence, the trees grown are dissimilar, but they still retain certain predictive power.”

Source : Annalyn Ng and Kenneth Soo (July 27, 2016). Wisdom of crowd . Retrieved from algobeans.com: https://algobeans.com/2016/07/27/decision-trees-tutorial/

“In the above example, there are 9 variables represented by 9 colors. At each split, a subset of variables is randomly sampled from the original 9. Within this subset, the algorithm chooses the best variable for the split. The size of the subset was set to the square root of the original number of variables. Hence, in our example, this number is 3.”
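The random variable selection from the quote can be sketched like this. The feature names are hypothetical stand-ins for the 9 colors, and the fixed seed is only there to make the output reproducible.

```python
import math
import random

# Hypothetical names standing in for the 9 variables (colors) in the example.
features = [f"feature_{i}" for i in range(1, 10)]

subset_size = math.isqrt(len(features))  # square root of 9 is 3
rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# A fresh random subset is drawn at every split; the tree then picks
# the best variable within that subset.
for split in range(3):
    candidates = rng.sample(features, subset_size)
    print(f"split {split}: choose among {candidates}")
```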

Now with this understanding let’s move on to Random Forest implementation on Amazon SageMaker Notebook Instance. For this you need to download the Jupyter notebook from here and data from here. Upload them into your SageMaker notebook instance as explained above and follow along.

https://medium.com/media/ee680fe5ec8ca94701bd7676730f4f24/href

XGBoost Algorithm

XGBoost (eXtreme Gradient Boosting) was introduced by Chen & Guestrin in 2016.

It was developed mainly to increase speed and performance, while introducing regularization parameters to reduce overfitting.

To begin with, let us first learn about the model choice of XGBoost: decision tree ensembles. The tree ensemble model consists of a set of classification and regression trees (CART). Here’s a simple example of a CART that classifies whether someone will like a hypothetical computer game X.
We classify the members of a family into different leaves, and assign them the score on the corresponding leaf. A CART is a bit different from decision trees, in which the leaf only contains decision values. In CART, a real score is associated with each of the leaves, which gives us richer interpretations that go beyond classification. This also allows for a principled, unified approach to optimization.
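To illustrate scores in leaves, here is a tiny sketch of two CARTs whose leaf scores are added up to form the ensemble prediction. The features, thresholds and scores are hypothetical and only chosen to fit the computer game example.

```python
# Each tree maps a person to the score of the leaf they fall into.
# The features, thresholds and scores below are hypothetical.

def tree_age_gender(person):
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1
    return -1.0


def tree_computer_use(person):
    return 0.9 if person["uses_computer_daily"] else -0.9


def ensemble_score(person, trees):
    """The ensemble prediction is the sum of the leaf scores of all trees."""
    return sum(tree(person) for tree in trees)


boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": False}
trees = [tree_age_gender, tree_computer_use]
print(ensemble_score(boy, trees), ensemble_score(grandpa, trees))
```

A higher total score means the ensemble considers that person more likely to enjoy the game.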

So Let’s get started with XGBoost implementation on Sagemaker —

https://medium.com/media/867cae4bd72dce0a07f95847bdeda816/href

Integrating Amazon SageMaker Endpoints into Internet-facing Applications

In a production environment, you might have an internet-facing application sending requests to the endpoint for inference. The following high-level example shows how to integrate your model endpoint into your application.

To use Amazon API Gateway and AWS Lambda to set up and deploy a web service that you can call from a client application:

Create an IAM role that the AWS Lambda service principal can assume. Give the role permissions to call the Amazon SageMaker InvokeEndpoint API.
Create a Lambda function that calls the Amazon SageMaker InvokeEndpoint API.
Call the Lambda function from a mobile application.

Starting from the client side,

A client script calls an Amazon API Gateway API action and passes parameter values.
API Gateway is a layer that provides API to the client. In addition, it seals the backend so that AWS Lambda stays and executes in a protected private network.
API Gateway passes the parameter values to the Lambda function.
The Lambda function parses the value and sends it to the SageMaker model endpoint.
The model performs the prediction and returns the predicted value to AWS Lambda. The Lambda function parses the returned value and sends it back to API Gateway. API Gateway responds to the client with that value.

But, what is AWS Lambda ?

AWS Lambda is a serverless compute service provided by Amazon as a part of AWS that lets you run code without provisioning or managing servers.
It runs code in response to events and automatically manages the computing resources required by that code.

For integrating the endpoints created in this notebook with AWS Lambda please read my blog Invoke an Amazon SageMaker endpoint using AWS Lambda.

Final words on Amazon Sagemaker Pricing -

Try Amazon SageMaker for two months, free!

As part of the AWS Free Tier, you can get started with Amazon SageMaker for free. If you have never used Amazon SageMaker before, for the first two months, you are offered a monthly free tier of 250 hours of t2.medium or t3.medium notebook usage for building your models, plus 50 hours of m4.xlarge or m5.xlarge for training, plus 125 hours of m4.xlarge or m5.xlarge for deploying your machine learning models for real-time inferencing and batch transform with Amazon SageMaker. Your free tier starts from the first month when you create your first SageMaker resource.

References

Diego Lopez Yse(Apr 17, 2019). Decision tree. Retrieved from Medium: https://towardsdatascience.com/the-complete-guide-to-decision-trees-28a4e3c7be14
Jinde Shubham.(Jul 3, 2018). Ensemble learning is Fable of blind men and elephant. Retrieved from Medium :https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). Association for Computing Machinery, New York, NY, USA, 785–794. DOI:https://doi.org/10.1145/2939672.2939785
Annalyn Ng and Kenneth Soo (July 27, 2016). How a tree is created in a random forest. Retrieved from algobeans.com: https://algobeans.com/2016/07/27/decision-trees-tutorial/
AWS SageMaker Screenshots and figures. Retrieved from Amazon Web Services, Inc. : https://docs.aws.amazon.com/sagemaker/#amazon-sagemaker-overview

Random Forest and XGBoost on Amazon SageMaker and AWS Lambda was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

Quick setup instructions for installing PyTorch and fastai on Raspberry Pi 4

Sriramya Kannepalli — Sat, 04 Apr 2020 04:33:21 GMT

PyTorch and fastai on Raspberry Pi 4 Model B for doing deep learning tasks like image classification and object detection.

source: https://www.raspberrypi.org/products/raspberry-pi-4-model-b/

Yes, like everyone who has just started with Raspberry Pi to test their deep learning models, I got too excited and curious to deploy my image classifier Baby_Vibes, built with the deep learning libraries PyTorch and fastai, on my brand new Raspberry Pi 4 Model B bought from amazon.com.

When I wanted to jump into model inference, there was no clear documentation readily available for setting up Raspberry Pi 4 with PyTorch and fastai on Python 3.7. However, I found this discussion link from the fastai forums very useful to get started.

Let’s quickly understand what we are dealing with:

What is PyTorch?

Source : https://pytorch.org/

PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is primarily developed by Facebook’s AI Research lab. It is free and open-source software released under the Modified BSD license.

There are several PyTorch online tutorials and YouTube videos available now but still to highlight my favorites : the official PyTorch tutorials and fastai — Practical Deep Learning for Coders, v3

What is Fast.ai?

source: https://www.fast.ai/

fastai is a modern deep learning library, available from GitHub as open source under the Apache 2 license, which can be installed directly using the conda or pip package managers. It includes complete documentation and tutorials, and is the subject of the book Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD (Howard and Gugger 2020).

And finally what is Raspberry Pi 4?

Your tiny, dual-display, desktop computer

…and robot brains, smart home hub, media centre, networked AI core, factory controller, and much more..

Raspberry Pi 4 Model B

Below are the libraries/packages we will be installing -

Python 3.7
PyTorch dependencies
torch
torchvision
fast.ai
fast.ai dependencies

Note: If this is the first time you are switching on your Raspberry Pi, please refer to the Raspberry Pi 4 Getting Started video and complete the initial boot up. Once you are done, open “Terminal” as shown below -

Raspberry Pi 4 Model B Terminal Screen

Run the below command to find out the ARM processor architecture of our Pi, which we need in order to search for a compatible PyTorch wheel -

uname -a

If the output is armv7l GNU/Linux continue with the installation.
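The same check can be done from Python, which is handy if you want a setup script to stop early on an unexpected board. This is only a sketch; the helper name is made up for illustration.

```python
import platform


def is_armv7l(machine=None):
    """True when the machine type is armv7l, the architecture the wheels used here target."""
    machine = machine or platform.machine()
    return machine == "armv7l"


print(platform.machine(), "->", "continue" if is_armv7l() else "look for different wheels")
```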

Wait!!! What is an ARM processor and how does it differ from the Intel processors present in most of our desktop PCs??

ARM (Advanced RISC (Reduced Instruction Set Computing) Machines) has been at the center of modern microprocessors and embedded design.

ARM processors are extensively used in consumer electronic devices such as smartphones, tablets, multimedia players and other mobile devices, such as wearables. Because of their reduced instruction set, they require fewer transistors, which enables a smaller die size for the integrated circuitry (IC).

Intel processors, on the other hand, fit into a family called CISC which stands for Complex Instruction Set Computing.

Unlike RISC computers, the instructions available on a CISC are more focused on performing complex tasks with large amounts of flexibility. Intel, for its part, has mainly produced processors aimed at high performance and high throughput environments, including desktop PCs, laptops, servers, and even supercomputers.

If you are interested to know more, read this: Understanding the Differences Between ARM and x86 Processing Cores.

That is why we cannot take a Python package with compiled code, such as PyTorch, from our PC/laptop/standard x86_64 machine and use it directly on our Raspberry Pi: it will not be compatible with the Pi’s processor architecture. Instead, such packages must be compiled for the Pi’s specific architecture.

This being said there are two ways of installing PyTorch on Raspberry Pi:

Building PyTorch from source: If you are interested in this , please refer to the amazing blog — Building PyTorch for the Raspberry Pi boards
Using pre-built PyTorch wheels uploaded by some great people who did all the hard work for us.

We will be going with the easy one, option 2 and using the pre-built PyTorch wheels uploaded by others compatible with armv7l GNU/Linux.

So go ahead and download the torch-1.3 and torchvision-0.4 wheel files and copy them to your Pi with a USB drive, or download them directly from the links using the Chromium browser on your Pi.

Python has two flavors, Python 2.x and Python 3.x. We will be working with Python 3.x for our installations.

In a terminal window, check for python 3.7 by typing :

python3 --version

If you get the Python version as 3.7, continue with the installation; if not, refer to How to install Python 3.7 on Raspberry Pi. If you are new to virtual environments, please refer to this and move on.

sudo apt update && sudo apt upgrade
mkdir project_folder
cd project_folder
python3 -m venv env
source env/bin/activate

Install PyTorch dependencies first:

sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-dev python3-yaml python3-setuptools python3-wheel python3-pillow python3-numpy

Ensure torch and torchvision wheel files are saved in the project_folder and type in terminal:

sudo apt install libatlas3-base

sudo pip3 install numpy

python3 -m pip install Pillow==6.1

pip3 install torch-1.3.0a0+deadc27-cp37-cp37m-linux_armv7l.whl

pip3 install torchvision-0.4.0a0+d31eafa-cp37-cp37m-linux_armv7l.whl

pip3 install fastai --no-deps
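The wheel filenames above encode which Python version and which architecture they were built for, which is why we checked both earlier. A rough sketch of reading those tags, assuming the filename has no optional build tag:

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its standard dash-separated parts."""
    parts = filename[: -len(".whl")].split("-")
    # name-version-python_tag-abi_tag-platform_tag (no optional build tag assumed)
    name, version, python_tag, abi_tag, platform_tag = parts
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches(filename, python_tag, machine):
    """Check that the wheel targets the given Python tag and machine type."""
    info = parse_wheel_name(filename)
    return info["python"] == python_tag and info["platform"].endswith(machine)


wheel = "torch-1.3.0a0+deadc27-cp37-cp37m-linux_armv7l.whl"
print(parse_wheel_name(wheel))
print(matches(wheel, "cp37", "armv7l"))
```

Here cp37 stands for CPython 3.7 and linux_armv7l for the Pi's architecture, so a wheel with other tags is expected to be rejected by pip on this setup.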

To test if everything is installed correctly, log into your python terminal and run the commands:

$ python3.7

>>> import torch

>>> import torchvision

>>> import fastai
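If you prefer a single script over typing the imports by hand, something like the following sketch reports which packages are importable and, where available, their versions. The helper name is made up for illustration.

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: version string or error message}."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "installed")
        except ImportError as exc:
            report[name] = f"missing ({exc})"
    return report


for name, status in check_imports(["torch", "torchvision", "fastai"]).items():
    print(f"{name}: {status}")
```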

If you get further errors while doing —

from fastai.vision import *

Create new text file in project_folder and copy below contents. Name it requirements.txt

beautifulsoup4
bottleneck
fastprogress>=0.2.1
matplotlib
numexpr
nvidia-ml-py3
packaging
pandas
pyyaml
requests
scipy

Now type in terminal:

pip3 install -r requirements.txt

This should resolve your error and let you import torch, torchvision and fastai successfully for vision projects. We have skipped the spacy dependency needed for the fastai text package, so only vision is expected to work. Refer to the fastai forums discussions.

Final Note: You may get warnings if you trained your model file ‘export.pkl’ with one version of PyTorch and installed a different version on the Pi. If you feel these warnings can be ignored, Python’s standard warnings module can silence them. The check on sys.warnoptions (the list of -W options passed on the command line) keeps any warning settings you pass explicitly. This is how you can handle it -

import sys

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

Add this code snippet to your inference.py or app.py file.

If you have followed along up to here, phew, we did it!! We are all set to test our deep learning PyTorch fastai image classification model inference on Raspberry Pi 4 Model B. Hurray!!

https://medium.com/media/a9269748f7d375e66eeb341ae47e9dfd/href

If you want sample code for testing, please clone the Baby-Vibes GitHub repository and start working.

Baby-Vibes uses image classification to identify a crying baby and passes a voice command to Google Home to play their favorite cartoon (Tom and Jerry in this case), while we excuse ourselves from everyone and come to their rescue.

If you are interested in learning about PyTorch fastai Azure Web service deployment, you can refer to this article.

Happy Coding !!!😊😊😊

Quick setup instructions for installing PyTorch and fastai on Raspberry Pi 4 was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

PyTorch Web Service deployment using Azure Machine Learning Service and Azure Web Apps from VS Code

Sriramya Kannepalli — Fri, 20 Mar 2020 19:34:35 GMT

This blog is mainly focused on deploying a PyTorch fastai deep learning Image Classification model to Azure Web apps/Azure Web Services using Azure machine learning SDK and VS Code in Linux.

Photo by Valeria Zoncoll on Unsplash

Data collection and prototyping in Google Colab
Training on Azure Machine Learning service.
Deployment in Azure Container Instances as Azure Web Service.
Consuming the web service from front end deployed as a Python flask Azure Web app.

Data Collection and prototyping in Google Colab:

Check out fastai’s build your own data set module, which contains a few helper functions that allow you to build your own dataset for image classification. In this project I downloaded images of crying babies and happy babies from Google Images and quickly tested them in Google Colab for an initial baseline. Check my Google Colab notebook.

Training on Azure Machine Learning:

Once our dataset and model parameters were baselined, I chose Microsoft Azure for training and deployment as a web app.

Prerequisites -

VS Code enabled with following extensions:

Python
Azure Account
Azure CLI Tools - The Azure CLI provides commands for managing Azure resources.
Azure Machine Learning — The machine learning extension to the CLI provides commands for working with Azure Machine Learning resources.
Azure App Service
Azure Storage

Note: If you are new to VS Code, refer to this.

An Azure subscription. If you do not have one, try the free or paid version of Azure Machine Learning.
Git local

We will be using the GitHub repository Baby Vibes — Image classifier to detect crying babies and play Tom and Jerry for making them laugh.

Let’s get started -

Open ubuntu shell and type:

>> mkdir Baby_Vibes
>> cd Baby_Vibes
>> git clone https://github.com/SriramyaK/Baby_Vibes_Pytorch_Azure_Webservice
>> code .

Open a new terminal from VS Code and type:

>> conda env create -f myenv.yml
>> conda activate myenv   # use the environment name defined in myenv.yml

Login into Azure from VS Code Machine Learning space:
Click on “Create Workspace”

Input a name for Azure ML Workspace name
Select an Azure Subscription
Select a resource group or create a new resource group if you don’t have any.
Select a location
Select a workspace sku “Basic”

You can also check out other ways of creating workspaces:

4. Open Baby_Vibes_train.ipynb. Update your workspace name, subscription-id and resource group and select “Run all cells”.

5. After training is done successfully download ‘Babies.pkl’ from “VS Code > Machine learning > workspace > Models > Babies.pkl” and save it as ‘export.pkl’ in the project folder.

Why as ‘export.pkl’ only ? Check docs of fast.ai

6. Now you can detach Machine Learning compute from “VS Code > Machine Learning > Subscription name > workspace name > compute > gpu — compute” to save dollars and move onto deployment phase.

Deployment as Azure Web Service using Azure Containers:

Open deploy.ipynb, update your workspace name, subscription-id and resource group, and select “Run all cells”.

Checkpoints:

Ensure “export.pkl” and score_and_track.py are both in the current project folder.
We are deploying with the below Azure Container Instance configuration. Check deploy.ipynb

from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=2, memory_gb=4, tags={'data': 'Babies', 'method': 'transfer learning', 'framework': 'pytorch'}, description='Image classification of Baby Vibes')

which means we are using 2 CPU cores and 4 GB of memory for running our model inference. (Remember we removed our GPU compute after training the model!!!)

Make a note of your web service uri, we need it for Web app deployment

Consuming the web service from front end deployed as a Python flask Azure Web app:

1. mkdir flask_deploy
2. cd flask_deploy
3. python3 -m venv env
4. source env/bin/activate
5. Move the following files into the flask_deploy folder and ensure your folder structure looks like this:

6. Click on Azure App service pane in VS Code and create a new Web App.

7. Enter a globally unique name for the new web app, in our case ‘babyvibes1’.

8. Select runtime for our linux app i.e python 3.6

Once we get a message that new web app “BabyVibes1” is created we can go to Home>All resources>appsvc_linux_centralus >Apps > BabyVibes1 > Configuration > General Settings

Copy the text from “startup.txt” saved in your current “flask_deploy” folder into the “Startup Command” field shown in the above screen.

Why are we doing this?

The reason is App Service uses Gunicorn WSGI HTTP Server to run an app, which looks for a file named application.py or app.py. Since our main module is in app.py file, we have to customize the startup command as shown below -

gunicorn --bind=0.0.0.0 --timeout 600 app:app
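The app:app part of the startup command means "the object named app inside the module app.py". Gunicorn expects that object to be a WSGI callable, which a Flask application object is. As a stand-in for the real Flask app, here is a minimal plain-WSGI sketch that shows what Gunicorn calls on each request; the response text is made up.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI callable; in "app:app" the second "app" is this object."""
    body = b"Hello from app:app"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Call the app once the way a WSGI server such as Gunicorn would.
environ = {}
setup_testing_defaults(environ)  # fills in a fake request for local testing
captured = {}


def start_response(status, headers):
    captured["status"] = status


body = b"".join(app(environ, start_response))
print(captured["status"], body.decode())
```

If your main module or application object had a different name, the last argument of the startup command would change accordingly.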

For more info read azure docs

Our goal is to deploy our app from local git to Azure App Service that we created. To enable that, we first have to configure some deployment settings. Click “Deployment center”, and select “Local Git”:

Scroll down to see “local git” option

On the next step, select “Kudu” as the build server:

Click on “Continue”

Click “Finish”, wait for notification and you will get Git Clone Uri like this:

https://babyvibes1.scm.azurewebsites.net:443/babyvibes1.git

Also, click “Deployment Credentials” to see your app credentials:

Here you can see your app username and password. You can change them, or create separate user credentials for deployment purposes.

Now we are ready for our deployment. Go to your local terminal and add an Azure remote to your local Git repository, using the URL of the Git remote that you got in the previous step, i.e. https://babyvibes1.scm.azurewebsites.net:443/babyvibes1.git

Initialize a Git repository in the current folder:

>> git init
>> git add .gitignore
>> git commit -m ".gitignore added"
>> git add .
>> git commit -m "Code files added"
>> git remote add azure-baby https://babyvibes1.scm.azurewebsites.net:443/babyvibes1.git

Now commit any remaining changes to local Git:

>> git commit -a -m "first commit"

And push to the “azure-baby” remote to deploy your app with the following command:

>> git push azure-baby master

When prompted for credentials by Git Credential Manager, make sure that you enter the deployment credentials you created for the app, not the credentials you use to sign in to the Azure portal.

This command may take a few minutes to run. We can check the status of the web app “babyvibes1” in VS Code and do the final “Deploy to Web App” step:

Select the folder to zip and deploy, in our case flask_deploy.

We can see a pop-up like this at the bottom left corner:

Followed by a successful deployment message like this:

If the deployment is successful, you can see the website page below:

https://babyvibes.azurewebsites.net/

References:

PyTorch Web Service deployment using Azure Machine Learning Service and Azure Web Apps from VS Code was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

Working on Jupyter notebooks in VS Code from virtual conda environment

Sriramya Kannepalli — Tue, 29 Oct 2019 00:41:48 GMT

Now we can do native editing of Jupyter notebooks from Visual Studio Code, visualize interactive graphs and deploy data science projects from one place!

Photo by Caspar Camille Rubin on Unsplash

When I came across this announcement by Microsoft regarding Native Editing of Jupyter Notebooks in VS Code I got excited to use this feature as you can now directly edit .ipynb files and get the interactivity of Jupyter notebooks with all of the power of VS Code.

Why VS Code ?

“VS Code has a lot of built-in features like debugging, embedded Git control and GitHub, syntax highlighting, intelligent code completion, snippets, and code refactoring. It is very much customizable, allowing users to change the theme, keyboard shortcuts, preferences, and install plugins that add additional functionality. It also has a terminal embedded into it.” — Datacamp

But while trying out this new feature, being new to VS Code and Anaconda virtual environments, it took me some time to fix environment issues. I would like to share my experience mainly for people like me who are used to doing data science in Jupyter notebooks and want to start using VS Code native editing of Jupyter notebooks.

Let’s get started:

1. Install Anaconda from anaconda.com. (If you are confused about Anaconda vs Miniconda vs conda, go through the excellent blog by Daniel Bourke. If you already have Anaconda installed on your system, you can start from step 3.)
2. If you prefer using a command line interface (CLI), you can use conda to verify the installation using Anaconda Prompt on Windows or a terminal.
3. To open Anaconda Prompt:

Windows: Click Start, search or select Anaconda Prompt from the menu.

Source: anaconda.com

Note: Anaconda Prompt comes with the Anaconda installation and is different from the regular Windows PowerShell or Command Prompt. For detailed instructions, check out the official Anaconda website.

4. a) If you want to create virtual conda environment/project folder in the base location:

(base)C:\Users\

Type in Anaconda prompt :

(base)C:\Users\ conda create -n my-proj python=3.7 pandas numpy matplotlib scikit-learn jupyter notebook

With the above command we are asking conda to create a virtual environment named my-proj with Python 3.7 and the pandas, numpy, matplotlib, scikit-learn, jupyter and notebook packages. You can edit the package names based on your project requirements.

Type in the below command to activate the virtual environment.

(base) C:\Users\ conda activate my-proj

(my-proj) C:\Users\ code

Additional libraries can be added with ‘conda install ’ or ‘pip install ’ commands.
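Once the environment is activated, a quick way to confirm that you are running the interpreter and packages you expect is a small check script like the sketch below. The module list mirrors the packages in the conda create command above (scikit-learn is imported as sklearn); the function name check_environment is made up for illustration.

```python
import importlib.util
import sys

# Import names for the packages requested in the conda create command
# (scikit-learn is imported as "sklearn").
EXPECTED_MODULES = ["pandas", "numpy", "matplotlib", "sklearn", "notebook"]


def check_environment(modules):
    """Return {module_name: True/False} depending on whether each can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])
for name, available in check_environment(EXPECTED_MODULES).items():
    print(f"{name:12s} {'ok' if available else 'MISSING'}")
```

If the interpreter path printed here does not point inside the my-proj environment, the environment is probably not activated in that terminal.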

5. Add your project folder to the VS Code project workspace.

Note: Ensure Visual Studio Code is installed and the Anaconda, Python and Jupyter extensions are enabled. Jupyter Notebook support for Visual Studio Code gives complete details for accessing Jupyter Notebook from Visual Studio Code. This blog is mainly focused on native editing of Jupyter notebooks from a conda environment.

6. Open a terminal from VS Code and ensure you are in the virtual conda environment. If not, activate the environment once again:

C:\Users\\Anaconda3\envs\my-proj> conda activate my-proj

7. Select interpreter from available interpreters and ensure it is same as the virtual conda environment path that we created above followed by python.exe:

 C:\Users\\\python.exe

8. In case you are getting errors, open the .vscode folder created in your project folder. Verify that the ‘python.pythonPath’ value in settings.json is set to

“C:\\Users\\\\\\python.exe”

If not, edit the path to point to the current virtual environment location followed by ‘python.exe’, and replace each ‘\’ with ‘\\’.
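The doubled backslashes are needed because settings.json is a JSON file, and JSON treats a single backslash as an escape character. The short sketch below shows the escaping being done by Python's json module; the user name in the path is a made-up placeholder, so substitute your own environment location.

```python
import json

# Hypothetical path for illustration only; use your own environment location.
interpreter = r"C:\Users\someuser\Anaconda3\envs\my-proj\python.exe"

# json.dumps escapes every backslash, producing the "\\" form that settings.json needs.
settings_text = json.dumps({"python.pythonPath": interpreter}, indent=4)
print(settings_text)

# Reading the JSON back yields the original single-backslash path.
round_trip = json.loads(settings_text)["python.pythonPath"]
print(round_trip)
```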

9. Create a new jupyter notebook and start coding!!

Download plot_iris_dataset.ipynb and add it to your project folder to test the native editing of Jupyter notebooks in Visual Studio Code. If everything works fine, you will be able to see the visualization below.

Plot viewer

The Plot Viewer gives you the ability to work more deeply with your plots. In the viewer you can pan, zoom, and navigate plots in the current session. You can also export plots to PDF, SVG, and PNG formats.

Within the Notebook Editor window, double-click any plot to open it in the viewer, or select the plot viewer button on the upper left corner of the plot (visible on hover) — code.visualstudio.com

Disclaimer: Starting the Jupyter server for the first time in the virtual environment takes around 2–3 minutes.

Thank you to the following for helping me understand Anaconda, miniconda and Virtual environments in Python and experimenting with Visual Studio Code.

Working on Jupyter notebooks in VS Code from virtual conda environment was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

Vizag Smart City to model itself on San Francisco

Sriramya Kannepalli — Sat, 08 Jun 2019 13:08:03 GMT

K-Means Clustering And Segmentation Of Neighborhoods — Unsupervised Machine Learning Algorithm

Report based on Machine Learning/K-means Algorithm Neighborhoods segmentation and clustering

Introduction
Data collection and preprocessing
Methodology
Results
Discussion and Conclusion

1.1 Description and Discussion of the Background

Based on the article published in Business-Standard.com on 15th June 2016

Can Vizag remodel itself as San Francisco? If yes, what is the current growth rate of both the cities? How are they interrelated with respect to geography and demographics?

As a part of the IBM Data Science Professional Certificate course, I decided to compare the neighborhoods of Vizag, AP, India with the neighborhoods of San Francisco, US, using clustering and segmentation techniques from machine learning (ML), to understand the investment opportunities and the overall growth and development of the city relative to San Francisco. Data visualizations (using seaborn and matplotlib in Python) are created to explore GDP, per capita income, climatic conditions, tourism and educational institutions of both cities. All these data points will help us understand the rate of growth in Visakhapatnam and the scope for development in different sectors.

1.2 Problem

The problem is to find data that shows the current status of the two cities and identifies potential areas and sectors of investment in Visakhapatnam. This is achieved by comparing the neighborhoods of Vizag and San Francisco and visualizing data to identify patterns in their geographical and demographic similarities.

1.3 Interest

This project will highlight investor opportunities, with increased scope for attracting NRI (Non-Resident Indian) investments, which can help Vizag realize its ambitious economic growth goals while preserving and enhancing livability for the benefit of local citizens [6].

2. Data

2.1 Data Requirements

The following datasets have been used in the project:

Postal Codes of Visakhapatnam. Data has been scraped and cleaned from Yo!Vizag — City’s Exclusive Magazine and Portal [1] using Beautiful Soup and pandas libraries and saved in .csv format.
Foursquare API to get the most common venues of given boroughs of Visakhapatnam and San Francisco respectively.[2]
Visakhapatnam [3] and San Francisco Wikipedia Pages [4] have been scraped and cleaned for creating Word clouds.
Zip codes of San Francisco. Data has been downloaded in .csv format from https://datasf.org/ and cleaned using pandas.
Economy of Visakhapatnam
Per Capita Income of San Francisco
Population data of Visakhapatnam
GDP data of San Francisco

2.2 Data Analysis:

2 Cities will be analyzed in this project: Visakhapatnam and San Francisco.

I will be using the below datasets for analyzing Visakhapatnam.

Data 1: Neighborhood has a total of 684 areas. Most notable areas of the city include urban areas like Dwaraka Nagar, Gajuwaka, Gopalapatnam, Jagadamba Centre, Maddilapalem, Madhurawada, Seethammadhara and semi-rural suburbs such as Simhachalam, Pendurthi, and Parwada.

Data has been scraped and cleaned from Yo!Vizag — City’s Exclusive Magazine and Portal– using Beautiful Soup and pandas libraries and saved in .csv format.

Code Link — https://gist.github.com/SriramyaK/662166d0fa646b4bacd918bb2b03be2d

We don’t have geocode data readily available, i.e. the latitude/longitude coordinates required for plotting a folium map. I used GeoPy: geopy is a Python 2 and 3 client for several popular geocoding web services.

Geopy makes it easy for Python developers to locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources to get the data.

Code Link — https://gist.github.com/SriramyaK/ca9993d229e744ca23c1e1b492d49b1d

San Francisco Data:

Data 2: SFO has a total of 36 neighborhoods [4], but due to the limited data available we could analyze only 26 neighborhoods. Data has been downloaded in .csv format from https://datasf.org/ and cleaned using pandas. Below are the first five neighborhoods:

Data 3: For the below analysis we will get data from Wikipedia:

Visakhapatnam and San Francisco City Demographics.
Visakhapatnam Tourism and Attractions.
San Francisco Tourism and conventions.

Data 4: Visakhapatnam and SFO geographical coordinates will be utilized as input for Foursquare API, that will be leveraged to extract information for each neighborhood respectively.

Data 5: Population, GDP, Per capita Income, Tourism, Educational Institutions and Weather data of Vizag and San Francisco.

3. Methodology:

Analytic Approach:

In this project, the first part is the clustering of Visakhapatnam using the k-means algorithm. Visakhapatnam has 648 pin codes/areas/postal codes, but geocodes of only 326 locations have been included in the data analysis. We will explore the areas around central Visakhapatnam and compare them with the neighborhoods of San Francisco to understand the geographical similarities.

The 2nd part comprises the clustering of San Francisco. Out of the 36 neighborhoods of San Francisco, venues of 27 neighborhoods have been explored in this project using the Foursquare API.

The 3rd part includes data visualizations and a comparison of the available data of both cities, for insights to support investment decisions in Vizag. Word clouds created from the wiki pages of Vizag and SFO further add value to our discussion.

Exploratory Data Analysis:

Data 1: Visakhapatnam Geographical Coordinates Data.

We use geopy and folium libraries to create a map of Visakhapatnam city with neighborhoods imposed on it. 326 areas are plotted using their latitude and longitude values to obtain a high-level visualization of the neighborhoods.

Fig: Visakhapatnam Neighborhood Visualization

Now let’s explore venues around Andhra University, one of the oldest and most prestigious universities in Andhra Pradesh, located in central Vizag. We selected this location because Andhra University is located on the uplands of Visakhapatnam and the university campus is scenic, with the Bay of Bengal on one side and the green Kailasagiri hill range on the other. This location is apt for our analysis as San Francisco was also chosen because of the geographical similarity.

The latitude and longitude values of Andhra University, Sivajipalem Road, Sector 4, Pedda Waltair, Visakhapatnam, Andhra Pradesh, 530001, India are 17.7376312, 83.3300513027767.

Now, let’s get the top 10 venues around Andhra University within a radius of 500 meters.
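For readers who want to see what such a venue request might look like, here is a hedged sketch that only builds the request URL. It assumes the legacy Foursquare v2 "venues/explore" endpoint and its client_id, client_secret, v, ll, radius and limit parameters; newer Foursquare API versions work differently, and the version date, credentials and helper name build_explore_url below are placeholders rather than values from the original notebook.

```python
from urllib.parse import urlencode

# Assumed legacy Foursquare v2 "explore" endpoint; newer API versions differ.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=10, version="20180605"):
    """Build the query URL for venues within `radius` meters of (lat, lng)."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # assumed API version date
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of Andhra University as given in the text above.
url = build_explore_url(17.7376312, 83.3300513027767,
                        "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The same helper can be reused for every neighborhood by passing its coordinates, and for the San Francisco analysis by raising limit to 100.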

Foursquare API gave only 2 unique venues

Now we repeat the same steps for all the neighborhoods around Andhra University to get the most common venue categories. Below is a snapshot of the first 5 neighborhoods and their venue categories.

There are 39 unique categories of venues in the neighborhoods of Andhra University.

Now we repeat the same for all the neighborhoods in Visakhapatnam city. Let’s look at the first 2 neighborhoods with their top 5 most common venues to get an idea. Refer here for Code
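The "most common venues per neighborhood" step boils down to counting venue categories within each neighborhood and keeping the top few. A dependency-free sketch is shown below; the neighborhood names and venue rows are made up purely for illustration and are not the Foursquare results, and the author's notebook may well use pandas for this instead.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) rows, for illustration only.
venue_rows = [
    ("Neighborhood A", "Clothing Store"),
    ("Neighborhood A", "Breakfast Spot"),
    ("Neighborhood A", "Clothing Store"),
    ("Neighborhood B", "Beach"),
    ("Neighborhood B", "Indian Restaurant"),
    ("Neighborhood B", "Beach"),
    ("Neighborhood B", "Beach"),
]


def top_categories(rows, n=5):
    """Return {neighborhood: [n most common venue categories, most frequent first]}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {name: [category for category, _ in counter.most_common(n)]
            for name, counter in counts.items()}


for name, categories in top_categories(venue_rows).items():
    print(name, categories)
```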

Now we run the k-means algorithm to cluster the neighborhoods into 4 clusters. The number of clusters is decided using the elbow method for the optimal k. In our scenario the optimal k is 4.
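To make the elbow method concrete, here is a small, dependency-free sketch: it runs a plain k-means for several values of k on toy 2-D points and prints the inertia (sum of squared distances to the nearest centroid) for each k. The points are synthetic, and this hand-written k-means is only an illustration; in practice a library implementation such as scikit-learn's KMeans would normally be used, and the original analysis clusters venue-frequency vectors rather than 2-D points.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means; returns (centroids, labels, inertia)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Mean of the members; an empty cluster keeps its old centroid.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia


# Synthetic data: four tight groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10), (10, 10)]
          for dx, dy in [(0, 0), (0, 1), (1, 0), (1, 1)]]

inertia_by_k = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 2))
```

Plotting inertia against k and looking for the point where the curve stops dropping sharply gives the "elbow"; that k is then used for the final clustering.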

Below horizontal Bar Chart shows the count of most common venues in each cluster. Based on the analysis, we can clearly see the presence of clothing Store/Shopping complex in every cluster which shows the amount of urbanization and development throughout every neighborhood of Visakhapatnam. Breakfast spots, food restaurants are other common venues in cluster 1 and 2.

Horizontal Bar Chart using Matplotlib — https://gist.github.com/SriramyaK/dd6a590eff690d8c12b2930bb89d2ad0

Fig. Clustering and segmentation of Visakhapatnam using k means algorithm –

Cluster 1 has the maximum no. of venues and development. There is a significant population increase in recent past. Below is the Bar chart depicting the population of both the cities in last 5 years.

Let’s explore the data further.

We can see the presence of historic sites, a harbor, fish markets and a beach, which gives us some idea of the geographical similarity between Vizag and SFO. Let’s visualize this in word clouds with tourism data of Vizag and San Francisco scraped from the travel website TripAdvisor. In case you are wondering what a word cloud is, you can refer to the Datacamp tutorial:

Fig. Word cloud of San Francisco list of tourist Attractions:

Word Cloud of list of Visakhapatnam tourist attractions:

The above word clouds show the similarity between the two cities, with Museum, Park and Beach/Bay being the most common terms among them. Some other existing natural tourist spots adding to the beauty of Visakhapatnam are waterfalls, caves, hills, wildlife and temples.

But when we closely observe the word cloud of tourism of San Francisco, there are several untapped opportunities like Fisherman’s wharf, Pier 39, Twin Peaks, Big Bus Hop on Hop off tour etc. that can be implemented in Visakhapatnam due to similar geographical features and weather conditions.

Box plot of weather conditions of Visakhapatnam and San Francisco in a Year:

The hot and humid conditions of Visakhapatnam as compared to San Francisco clearly show huge scope for establishment of amusement water parks and recreational activities. Cruises, Sailing, Hiking trails and Water tours can create major spike in tourism and boost GDP of Visakhapatnam.

Though there is a significant difference in the GDP and Per Capita Income of Vizag and San Francisco, Visakhapatnam has managed to top the charts of urban population amongst all the 13 districts in Andhra Pradesh, India. According to data uploaded onto the CM’s Dashboard, the 2011 Census of India states that Visakhapatnam stood first in the state with 47.45% of urban populace.

The difference in GDP and Per Capita Income of the two cities signify the importance of technology and investments required for the city to remodel itself as San Francisco in the next 10 years.[1] Achieving the vision will require a “Smart City” approach to regional development and infrastructure planning and delivery. For further information please refer the below link — https://www.smartvizag.in/index.php/projects/

To summarize, I created the word cloud using seaborn libraries, with the text scraped from the Wikipedia page using Beautiful Soup.

Fig. Word Cloud of Visakhapatnam Wikipedia Page

In this word cloud we can clearly see that Visakhapatnam has a coast, port, railway, naval base, university and stadium, and is a metropolitan city with historic sites and an international airport.
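A word cloud is essentially a picture of word frequencies, so the core computation can be sketched without any plotting library: split the text into words, drop common stop words, and count what remains. The stop-word list below is a tiny illustrative one, and the sample text is simply the sentence above rather than the scraped Wikipedia page.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real word-cloud tools ship much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "with", "has", "in", "on", "to"}


def word_frequencies(text, top_n=10):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(top_n)


sample = ("Visakhapatnam has a coast, port, railway, naval base, university, "
          "stadium and is a metropolitan city with historic sites and "
          "international airport.")

for word, count in word_frequencies(sample):
    print(word, count)
```

A word-cloud renderer then draws each word with a size proportional to its count.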

With this information we move on to the analysis of San Francisco and identify potential ideas for development.

Data 2: SFO Geographical Coordinates Data is downloaded in .csv format from https://datasf.org/ and cleaned using pandas. We explored 27 neighborhoods of San Francisco in our analysis.

Below are the first 5 neighborhoods.

Fig. SFO Neighborhood Visualization using Folium and geopy libraries.

As we explore each neighborhood further for identifying similarities with Visakhapatnam, let’s start with venues around the neighborhood surrounded with Beach in SFO.

Now, let’s get the top 100 venues that are in North Beach, SF, California within a radius of 500 meters.

Foursquare API gave 100 unique venues. Let’s explore the data –

We will do the same analysis for all the neighborhoods of North Beach, SF and explore the venues returned by Foursquare API to understand the most common venue categories.

We repeat the same for all the neighborhoods of SF. There are 261 unique categories in SF. Now we run the k-means algorithm to cluster the neighborhoods into 4 clusters. The number of clusters is decided using the elbow method for the optimal k. In our scenario the optimal k is 4.

Fig.Clustering of neighborhoods of San Francisco using k means algorithm

Below is the horizontal bar chart for most common venues in each cluster

As we can see from the above analysis, neighborhoods in cluster 0 are highly developed, with a wide range of restaurants, dance studios, juice bars, coffee shops, event spaces etc. The venues in Vizag and San Francisco are largely different in nature because the two cities are at different levels of development and urbanization.

But this analysis gives a high-level idea of the new categories of venues that could attract investment in Visakhapatnam and be tailored to the needs of the local population. Some categories like juice bars, dance studios and event spaces, which are currently not among the most common venue categories in Vizag, leave some scope for new investments.
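The comparison described here, finding venue categories that are common in one city but missing from the other, is a set difference. A minimal sketch follows; the two category lists are short illustrative samples built from category names mentioned in this report, not the full Foursquare results, and missing_categories is a made-up helper name.

```python
def missing_categories(reference_categories, target_categories):
    """Categories present in the reference city but absent from the target city."""
    return sorted(set(reference_categories) - set(target_categories))


# Illustrative samples only, not the complete most-common-venue lists.
sf_common = ["Coffee Shop", "Juice Bar", "Dance Studio", "Event Space", "Restaurant"]
vizag_common = ["Clothing Store", "Shopping Mall", "Breakfast Spot", "Restaurant"]

print(missing_categories(sf_common, vizag_common))
```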

Finally, we will look at the word cloud of San Francisco created from Wikipedia to explore further.

Fig. Word cloud of San Francisco created from Wikipedia

We can see the words military, Bay area, hill, Pacific Ocean, Ferry, waterfront, historic building etc. which show some similarity in the geographic and demographic data of Visakhapatnam and San Francisco.

Word cloud of Educational institutions and universities in San Francisco:

Despite its limited geographical space, San Francisco, California is home to a multitude of colleges and universities. San Francisco Conservatory of Music, San Francisco School of Digital Filmmaking, San Francisco Art Institute and Art Institute of California — San Francisco, a private campus which focuses on video game and design-based education (interior, fashion etc.) are some of the unique colleges and universities which can be further explored and established in Visakhapatnam.

4. Results:

Though the given data set yielded only limited results on demographic and geographical factors from the clustering and segmentation of the two cities and the word clouds of the Wikipedia pages of Visakhapatnam and San Francisco, we could bring out some business ideas on new venue categories like dance studios, juice bars, coffee shops, event spaces and a wide range of restaurants (sushi, Mediterranean etc.), which can be tailored based on the priorities and interests of the local population in Visakhapatnam. Tourism, when developed in the right way with advanced technologies and FDI, can play a major role in boosting the city’s economy so that it can remodel itself on San Francisco in the near future. Educational institutions form one more area of potential development.

5. Discussion and Conclusion:

Tourism has huge potential of development as a part of Smart city initiatives in Vizag. Cruises, Sailing, Hiking trails and Water tours can create major spike in tourism and boost GDP of Visakhapatnam.
Educational Institutions data can be explored further.
Business investors looking for real estate investment can further explore areas/neighborhoods in cluster 1 of Visakhapatnam, as these are the areas with the highest development, with restaurants, breakfast spots, shopping complexes etc., compared to the places in other clusters.
For people interested in coming up with startup ideas in the food sector of smart city — dance studios, juice bars, coffee shops, event spaces and wide range of restaurants like sushi restaurant, Mediterranean restaurant etc. are some of the new business ideas that can be experimented with based on further data analysis.
Individual investors looking for investment in residential plots can further explore areas in cluster 0 and cluster 2 of Visakhapatnam.

For Code: Github

References: