An introduction to the Computer Vision services on Azure

A look at the Azure computer vision services including AutoML for Imagery.

Ethan Jones
Geek Culture
10 min read · Apr 9, 2022


Introduction

Azure is quite an extensive cloud platform with what sometimes seems like an endless number of different services; today we’ll be focusing on a small subset of these — Computer Vision or CV services. Azure offers CV services that are low-code / no-code, e.g. Custom Vision as well as programmatic, e.g. AutoML for imagery. We will also briefly touch on the Azure CLI for creating resource groups and resources.

Options on Azure

At the time of writing, we have the following CV options on Azure:

  • Azure Computer Vision — Part of the Azure Cognitive Services offering, Computer Vision offers users the chance to get hands-on with out-of-the-box algorithms for a variety of use-cases.
  • Azure Custom Vision — Custom Vision offers users the chance to get hands-on and train their own custom models in a low code/no code environment whilst offering an easy-to-use SDK for programmatic development.
  • AutoML for Imagery (Public Preview) — Part of the Azure ML offering, AutoML for Imagery offers users a code-first approach to training and deploying models.

Now we’ve been introduced to the services, let’s dive a bit deeper into each. Feel free to jump straight to the service that you're interested in as opposed to reading the whole post!

Azure Computer Vision

Azure’s Computer Vision service is part of the Azure Cognitive Services offering and allows users to interact with a range of specialised, out-of-the-box models via API endpoints. The Computer Vision service offers OCR capabilities as well as image and spatial analysis.

Creating the resource — Azure CLI set-up

Before we dive into the example, we first need to create a Computer Vision resource within our Azure subscription — you can create a free subscription here. We’ll be creating this resource using the Azure CLI, but it can easily be done in the portal as well.

Once you’ve installed the CLI from the link above, we can get started with creating a new resource group and also a Computer Vision resource.

az login

First off, we want to sign in using the command above. Upon running it, you should be taken to the Azure login page, and once you’ve logged in, a list of your subscriptions will be displayed.

az group create -l uksouth -n medium_blog_3

Next, let’s create a resource group using the above command — be sure to replace the location and name flags to match your needs.

az cognitiveservices account create -n "Comp_Vision_Medium" -g "medium_blog_3" --kind "ComputerVision" --sku "S1" -l UKSouth --yes

Next, we’ll create a Computer Vision resource using the above command. This should yield something like this when looking in the portal:

From here you will need to fetch the key and endpoint for your resource by navigating in the Azure portal to the Keys and Endpoint tab within the Computer Vision resource.
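If you’d rather stay in the CLI, the same values can be retrieved with the two commands below (using the resource and group names from earlier):

az cognitiveservices account keys list -n "Comp_Vision_Medium" -g "medium_blog_3"

az cognitiveservices account show -n "Comp_Vision_Medium" -g "medium_blog_3" --query "properties.endpoint"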

Programmatic example

Now we’ll have a look at a programmatic example using some of the Computer Vision API. Please bear in mind this example is in Python, so it assumes that you have Python up and running on your machine.
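Here’s a minimal sketch of the script, assuming the azure-cognitiveservices-vision-computervision and msrest packages are installed; treat the structure as illustrative rather than definitive:

"""Query the Azure Computer Vision Read (OCR) API with a remote or local image."""
import argparse
import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

KEY = "<your-key>"            # from the Keys and Endpoint tab
ENDPOINT = "<your-endpoint>"


def parse_args():
    """Parse the command-line arguments: an image path or URL, plus a boxes flag."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--url", help="URL of a remote image to OCR")
    parser.add_argument("--path", help="path to a local image to OCR")
    parser.add_argument("--boxes", action="store_true", help="print bounding boxes")
    return parser.parse_args()


def print_results(result, print_boxes):
    """Custom print function: each recognised line, optionally with its bounding box."""
    if result.status == OperationStatusCodes.succeeded:
        for page in result.analyze_result.read_results:
            for line in page.lines:
                print(line.text, line.bounding_box if print_boxes else "")


def auth_client():
    """Authenticate a Computer Vision client using the key and endpoint."""
    return ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(KEY))


def poll_read_result(client, response):
    """The Read API is asynchronous, so poll until the operation finishes."""
    operation_id = response.headers["Operation-Location"].split("/")[-1]
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in (OperationStatusCodes.running, OperationStatusCodes.not_started):
            return result
        time.sleep(1)


def ocr_remote_image(client, image_url, print_boxes=False):
    """OCR a remote image by URL."""
    result = poll_read_result(client, client.read(image_url, raw=True))
    print_results(result, print_boxes)


def ocr_local_image(client, image_path, print_boxes=False):
    """OCR a local image; the only difference is the endpoint called."""
    with open(image_path, "rb") as image:
        response = client.read_in_stream(image, raw=True)
    print_results(poll_read_result(client, response), print_boxes)


def main():
    """Dictate the flow of the program based on the arguments supplied."""
    args = parse_args()
    client = auth_client()
    if args.url:
        ocr_remote_image(client, args.url, args.boxes)
    if args.path:
        ocr_local_image(client, args.path, args.boxes)


if __name__ == "__main__":
    main()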

Now that may seem like one huge block of code, but we’ll be breaking it down — don’t fret! Here I have also followed the guidelines and advice I set out in my blog about docstrings to help with code readability — check it out here!

  • The top of the script covers the module’s docstring and the libraries we’ll need to import. Just after that, you can see the global variables that will store the key and endpoint we fetched at the end of the last section.
  • Next come the command-line arguments for the script, which include the image path or URL, as well as the custom print function for the results.

From there, the script begins to be fleshed out…

  • The next function defined is auth_client(), which uses the key and endpoint to authenticate our client ready for querying the API endpoints.
  • Then comes our first API call: querying the OCR endpoint with a remote image. As you can see, we pass in the authenticated client as well as the image URL and an optional flag for whether we want the bounding boxes printed. The rest of the function makes the request and processes the response.
  • The local-image function is pretty much a carbon copy of the remote one; the only difference is that it calls the endpoint that accepts an image stream rather than a URL.
  • After that we find the main() function, which dictates the flow of the program.

Azure Custom Vision

Azure Custom Vision is an image recognition service that allows users to build and train their own custom model — the service allows users to specify their own labels and subsequently detect them. The service is available in the form of an SDK or a no-code friendly web portal — you can create, test and train models in either interface depending on your preference.

Creating a Custom Vision resource

To begin with let’s create a Custom Vision project by navigating over to the portal over at Custom Vision — Home. Once signed in, you’ll be prompted to create a resource and subsequently a project — for this example we’ll be choosing an object detection project and selecting General [A1] as our domain.

No-code development of a model

So, we’re all set up with our resource and project, which means we can now start creating our model! For this example, I’ll be creating a model to detect the 3-ball from an American pool set of balls.

To begin with, we need to upload our images to the project which intuitively can be done using the ‘add images’ button. From there, the next step is to now tag our images with our own labels — adding a new label class can be done by selecting an image and then entering the label in the text box at the top right of the box.

The UI makes it quite easy to label your images similarly to the functionality in the Azure ML Studio — we just drag a box around the object we wish to detect and then add a label for that bounding box… simple right?

Once we’ve labelled our images, we can click the Train button to train the first iteration of our model. Here you will get the option to select quick or advanced training; without going into too much detail, advanced training unfreezes the last couple of layers in the underlying neural network, whereas quick training keeps them frozen.

Once the model has been trained, you will be presented with the metrics: Precision, Recall & mAP (Mean Average Precision). From here, if you are happy with the results, you can hit the Publish button at the top left of the UI to publish the model so it’s accessible via the prediction API.

For this example, though, I will just be using the Quick Test functionality to check the model’s output on a test image. At this stage you can test the model using a local or remote image and change the threshold to see the model’s predictions.

If there’s enough interest, I can do a deeper dive into the Custom Vision API & SDK — I didn’t want this blog to go on for too long by including it!

AutoML For Imagery (Public Preview)

AutoML is an Azure Machine Learning feature that empowers both professional and citizen data scientists to build machine learning models rapidly.

Recently, added support for vision tasks was announced in Public Preview. We are now able to easily generate models trained on image data for scenarios like Image Classification (multi-class, multi-label), Object Detection and Instance Segmentation.

Training and deploying a model using AutoML for Imagery and Azure Kubernetes Service

We’ll be developing and training our model programmatically in a notebook using the Azure ML Studio, so we’ll intuitively kick off this section by creating an Azure Machine Learning resource; I’ll be doing this through the portal.

I won’t be spending time going through the Studio itself, but if you wanted to learn more then I’ll leave some links to resources in the references section.

To begin with, we’ll head over to the compute section to create a lightweight compute instance to run our notebook cells. At this point, I’ll add in the warning: DON’T FORGET TO TURN YOUR COMPUTE INSTANCES ETC. OFF. You have been warned!

For our data, we’ll be using the data labelling functionality from within the Azure ML Studio to label and export our data into a registered dataset — for guidance on preparing data see the documentation page. I’ll be using the same images from the Custom Vision example for ease.

To label your data, head over to the Data Labelling section of the studio and click New Project. From here, you’ll have all the options to define the labelling project i.e., Media type & labelling type. For my project I will again be choosing object detection. Follow the instructions onscreen and upload your images until you see a UI like so:

This dashboard gives a nice overview of the labelling project and will update as we go. To begin labelling, head over to the Label data tab, where you’ll be greeted by a UI similar to that of Custom Vision.

Once the dataset is labelled within the Azure ML Studio, we need to export it and register it as an Azure ML dataset. To do this, head back into your project, go to the Export tab and select Azure ML Dataset. If you now look at the Datasets tab along the left-hand side, you should be able to see the exported dataset!

Out of fear of this becoming my dissertation, I will direct you to this public repository for some fully-fledged notebooks — I’ll just be going over the steps at a higher level. Similar to the Custom Vision SDK, if there is enough demand for a separate blog for AutoML for Imagery, I will happily do another post specifically for it.

To begin with, we’ll want to define our workspace and create our experiment — this experiment acts as what I like to refer to as the ‘housing’ for all of the different runs. It allows us to easily compare the metrics of different models and hyper-parameters, aiding us in choosing the model we want to deploy.
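As a minimal sketch (the experiment name is illustrative, and Workspace.from_config() assumes you’ve downloaded the workspace’s config.json from the Studio):

from azureml.core import Workspace, Experiment

# Load the workspace from the config.json downloaded from the Studio
ws = Workspace.from_config()

# The experiment acts as the 'housing' that groups all of our runs together
experiment = Experiment(ws, name="automl-image-pool-balls")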

Next up, we’ll need to provision a GPU cluster to train our model on; this can be done in the UI or programmatically using the SDK. Once provisioned, we’ll want to read in our dataset and set up the run.
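A sketch of those two steps might look like this (the cluster name, VM size and dataset name are illustrative):

from azureml.core import Dataset
from azureml.core.compute import AmlCompute, ComputeTarget

# Provision a GPU cluster to train on
compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_NC6", max_nodes=1)
compute_target = ComputeTarget.create(ws, "gpu-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)

# Fetch the labelled dataset we exported from the Data Labelling project
training_dataset = Dataset.get_by_name(ws, name="pool-balls-labelled")

With compute_target and training_dataset in place, setting up and submitting the run looks something like this: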

from azureml.automl.core.shared.constants import ImageTask
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import GridParameterSampling, choice

image_config_yolov5 = AutoMLImageConfig(
    task=ImageTask.IMAGE_OBJECT_DETECTION,
    compute_target=compute_target,
    training_data=training_dataset,
    hyperparameter_sampling=GridParameterSampling({'model_name': choice('yolov5')}),
    iterations=1)
automl_image_run = experiment.submit(image_config_yolov5)
automl_image_run.wait_for_completion(wait_post_processing=True)

From the code snippet, we can see that I’m training atop the YoloV5 model. For some clarity, compute_target is my GPU cluster and training_dataset is my labelled dataset.

Once the run has completed, we can navigate over to the Experiments section of the studio to look at the metrics of each of the child runs.

It’s here that we hit a bit of a crossroads in terms of what we can do next: depending on whether we’re happy with the model’s performance, we can either submit another run with some hyper-parameter tuning or look at deploying our model to a Kubernetes cluster.

For the former, we set up and submit the run in almost an identical way to last time.
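The snippet below is a sketch based on the public preview notebooks rather than a verbatim copy; the search space and run budget are illustrative:

from azureml.automl.core.shared.constants import ImageTask
from azureml.train.automl import AutoMLImageConfig
from azureml.train.hyperdrive import BanditPolicy, RandomParameterSampling, choice, uniform

# Illustrative search space over the model and its hyper-parameters
parameter_space = {
    'model_name': choice('yolov5'),
    'learning_rate': uniform(0.0001, 0.01),
    'model_size': choice('small', 'medium'),
}

image_config_tuned = AutoMLImageConfig(
    task=ImageTask.IMAGE_OBJECT_DETECTION,
    compute_target=compute_target,
    training_data=training_dataset,
    hyperparameter_sampling=RandomParameterSampling(parameter_space),
    # Stop poorly performing runs early to save GPU time
    early_termination_policy=BanditPolicy(evaluation_interval=2, slack_factor=0.2, delay_evaluation=6),
    iterations=10,
    max_concurrent_iterations=2)

automl_image_run = experiment.submit(image_config_tuned)
automl_image_run.wait_for_completion(wait_post_processing=True)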

Once we’re happy with our model’s metrics, we can look at deploying to AKS. To begin this process, we start by provisioning a GPU inference cluster, which again can be done either in the UI or programmatically using the SDK. See here for further details on deploying.
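Here’s a minimal sketch of that flow using the v1 azureml-core SDK; the cluster, model and service names are illustrative, and the scoring-script path is an assumption based on the preview notebooks:

from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice

# Provision a GPU inference cluster
prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6")
aks_target = ComputeTarget.create(ws, "aks-gpu-inference", prov_config)
aks_target.wait_for_completion(show_output=True)

# Register the best model from the AutoML run and grab its scoring script
best_child_run = automl_image_run.get_best_child()
model = best_child_run.register_model(model_name="pool-ball-detector", model_path="outputs/model.pt")
best_child_run.download_file("outputs/scoring_file_v_1_0_0.py", output_file_path="score.py")

# Deploy the model behind a web service on the AKS cluster
inference_config = InferenceConfig(entry_script="score.py", environment=best_child_run.get_environment())
deployment_config = AksWebservice.deploy_configuration(autoscale_enabled=True, cpu_cores=1, memory_gb=4)
service = Model.deploy(ws, "pool-ball-service", models=[model],
                       inference_config=inference_config,
                       deployment_config=deployment_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)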

After the model has been deployed, we can query its endpoint with some test data using the requests library; the model’s endpoint is just a REST endpoint, so it can be used as such.
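A quick sketch of that query, assuming the service object from the deployment step above and an illustrative test image:

import requests

scoring_uri = service.scoring_uri
key, _ = service.get_keys()

with open("test_image.jpg", "rb") as f:
    image_bytes = f.read()

headers = {
    "Content-Type": "application/octet-stream",
    "Authorization": f"Bearer {key}",
}

# The endpoint is plain REST: POST the image bytes and read back the JSON predictions
response = requests.post(scoring_uri, data=image_bytes, headers=headers)
print(response.json())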

If you are looking at downloading the .ONNX file for the model and deploying that on an edge device, you can find out more details here.

Conclusion

So, there we have it! We have looked at the computer vision services on Azure, from the Computer Vision API through Custom Vision to AutoML for Imagery.

As always, take care

~ Ethan

References
