Your analytics platform gone rogue, Part 2: Machine Learning

Finding the right Azure technology for your ML use case

Marjam Bahari
Data Science at Microsoft
16 min read · Jul 6, 2021

Photo by Markus Winkler on Unsplash.

Introduction: Decision-making for ML capabilities

Machine Learning has become a common part of today’s data analytics environment. Many companies are embracing ML capabilities to get new insights from data and advance their analytics practices. Developments in ML and the ways they are used in real-life cases are a source of inspiration to many in the field of data management.

If you are examining ML capabilities for your platform, or have examined them before, you might have noticed that it can be difficult to understand where ML capabilities fit within your analytics platform and which one to use. There are many perspectives to consider, such as your end goal, the effort needed to reach it, the people and products you need, and whether the investment will be worth it. And even if you’ve already started using ML as part of your analytics platform, continuous evaluation of whether you’re reaching your end goal — and in the most effective way — is still necessary.

I’ve written this article with the following goals in mind:

  • To help increase your understanding of the complex environment of ML solutions and, at a high level, when to use which product.
  • To provide insight into what is possible with ML by walking through some different types of ML use cases and problems.
  • To present my own overview of Microsoft Azure’s AI & ML solutions for you to consider.

I start with a deep dive into the multitude of available ML solutions, especially those in Microsoft’s Azure portfolio, as I’m most familiar with them. I discuss the differences among products, what they are typically used or designed for, and the resources required to create related ML capabilities along the lines of time, people, and operation. I also switch perspectives and discuss common types of ML problems and the solutions that can be used to solve them. By taking different perspectives, I hope not only to provide an understanding of the possibilities in ML, but also to help you think about which ML solution would fit your use case. Please keep in mind this article reflects my own opinion based on my work in this area and does not necessarily reflect the opinion of my employer, Microsoft.

A bird’s eye view of ML technologies in Azure

Let’s start with a definition. Machine Learning (ML) refers to the process of using mathematical models of data to help a computer learn without direct instruction. ML uses algorithms to identify patterns within data and uses these patterns to create a data model that can make predictions. As the model is fed new data over time, its predictions can become more accurate, much like how humans improve with more practice.

And who doesn’t want their IT, predictions, and solutions to self-improve? To address the growing need for ML solutions, many products have been created to enable you to include ML in your data analytics platform. While making a choice among solutions depends strongly on the business problem, other factors also come into play, like who will build and maintain the ML solution, the skills of your ML engineers and their previous experience, whether you want to make use of cloud or on-premises technologies, and the kind of support you need while building the solution. In this section, I provide a bird’s eye view of ML technologies in three categories (pre-built ML capabilities, in-database ML solutions, and overall ML platforms) and discuss when to use them. Building on the architecture presented in the first article of this series, the figure below shows this architecture with ML capabilities added. The ML technologies will be discussed in the next paragraphs.

Architecture of a common modern data platform with ML capabilities.

Pre-built ML capabilities

Pre-built ML technologies are a great option for infusing ML into your analytics platform without having to build and maintain your own models. With this approach, you typically don’t need ML engineers or data science expertise, as the technology works through APIs and SDKs. Adding a few lines of code to your application gives it access to pre-trained models that analyze its data. It’s important to note, however, that these capabilities are designed mainly for very specific problems. But if your use case fits within the scope of a pre-built solution, using it can be efficient and effective.
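To give a feel for what "a few lines of code" can look like, here is a minimal sketch that builds (but does not send) a REST request asking a pre-trained text model for sentiment. The resource URL and key are placeholders, the helper name build_sentiment_request is hypothetical, and the path and version shown are assumptions that should be checked against the current service documentation.

```python
import json
import urllib.request


def build_sentiment_request(endpoint: str, key: str, text: str) -> urllib.request.Request:
    """Build (without sending) a sentiment request for a pre-trained text model."""
    # Assumed REST path and version; verify against the service documentation.
    url = endpoint.rstrip("/") + "/text/analytics/v3.1/sentiment"
    body = {"documents": [{"id": "1", "language": "en", "text": text}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # key of your own resource
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and key: replace with the values of your own resource.
request = build_sentiment_request(
    "https://example-resource.cognitiveservices.azure.com",
    "YOUR_KEY",
    "The new dashboard is fantastic.",
)
print(request.get_method(), request.full_url)
```

With a real endpoint and key, passing the request to urllib.request.urlopen would return a JSON document with the model's output. No model training or data science work is involved on the caller's side.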

Pre-built ML technologies generally excel at processing images and videos, dealing with speech and language, automating conversations, analyzing text or documents, helping with decision-making (such as by creating recommendations), and optimizing search. Because these use cases are universal, company-specific data is not necessary to build the model. In fact, creating your own model that rivals the accuracy and performance of pre-built solutions can be quite challenging. Before duplicating what exists elsewhere, consider trying pre-built ML technologies to see whether the results are good enough for you.

Some solutions also offer semi–pre-built technology, allowing you to improve pre-trained models by using your own data. The increased flexibility broadens the use cases for this technology. You can read more about Microsoft’s pre-built ML capabilities by looking into Azure Cognitive Services. Microsoft has also created Azure Applied AI Services, which are services built on top of the Cognitive Services APIs that fit specific scenarios. I delve into each of these services later in the use cases portion of this article.

In-database ML

Technologies that can be accessed in a non-ML product are another category. They are useful because sometimes the answer to your ML needs can be met by tapping into the existing ML capabilities of the product you are already using. Many databases, data warehouses, and data analysis services have ML functionalities or extensions that can be added to solve ML problems. While their applicability depends on your use case, it’s easy to underestimate their capabilities.

For example, Azure has a feature called SQL Machine Learning that you can use in both on-premises and cloud databases to add ML capabilities, such as predictive analytics, by writing and executing Python or R scripts in the database. Another example is the Machine Learning Extension for Azure Data Studio. This new feature, still in preview, can be used to import ML models, make predictions, and create notebooks for analyzing data on SQL databases. In Azure Synapse, Microsoft’s data warehouse for big data workloads, there are two options for using ML: You can use either Spark pools or SQL pools to create and deploy models. In the Spark pool you can use libraries such as Spark MLlib to create models, or you can tap into AutoML (more on this in the ML platforms section below).

You may have noticed that there is a skillset required to use ML in these solutions, and it’s true that creating code and knowing how to create an ML model is necessary. These solutions, however, are helpful in incorporating ML capabilities into your analytics platform without needing to use many new products or create a full ML environment.

ML platforms

Do you want to build a complete and independent ML environment? Does your team have deep ML knowledge and skills to create something special? Do you have a problem that’s not easy to solve? In these cases, looking into ML platforms is likely the way to go.

ML platforms typically offer tools to build and deploy models, organize your ML activities, manage model lifecycles, and connect them to other products. They often include notebooks for your code, clusters for processing, libraries with algorithms, interoperations with popular ML open-source tools, SDKs for your preferred environment, and an interface to organize your activities.

Microsoft’s most comprehensive ML platform is Azure Machine Learning (AML). This is a cloud-based environment for training, deploying, automating, managing, and tracking ML models. AML enables a wide range of ML technologies (e.g., classical ML, deep learning, and supervised and unsupervised learning). You can also choose your preferred creation environment, including writing in Python or R, using the SDK, or using Azure Machine Learning Studio, which provides low-code and no-code options for building models by using a visual design interface.

Besides enabling you to build the models yourself, AML also offers the option of AutoML, which aims to accelerate the time-consuming process of developing ML models by automating several parts of the process. AutoML tries out different algorithms and parameters during model training. It iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score indicating how well the model fits the data. AutoML can also be leveraged in Synapse by connecting it with the Azure Machine Learning (AML) workspace.
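The iteration described above can be illustrated with a toy loop. This is only a sketch of the general idea (try every algorithm and feature pairing, score each, keep the best) and not how AutoML is implemented; the data, the two "algorithms," and all names are made up for illustration.

```python
from itertools import product
from statistics import mean


def fit_mean(xs, ys):
    """Baseline 'algorithm': always predict the average target value."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Least-squares line through one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def r_squared(model, xs, ys):
    """Training score: 1.0 is a perfect fit, 0.0 is no better than the mean."""
    my = mean(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical training data: car features and prices (in thousands).
features = {
    "age_years": [1, 2, 3, 4, 5, 6],
    "doors": [4, 2, 4, 2, 4, 2],
}
price = [30, 27, 25, 21, 19, 16]

algorithms = {"mean_baseline": fit_mean, "linear_regression": fit_line}

# One iteration per (algorithm, feature) pairing, each yielding a scored model.
results = []
for (algo_name, fit), feature in product(algorithms.items(), features):
    model = fit(features[feature], price)
    score = r_squared(model, features[feature], price)
    results.append((score, algo_name, feature))

best_score, best_algo, best_feature = max(results)
print(best_algo, best_feature, round(best_score, 3))
```

A real AutoML run covers far more algorithms, hyperparameters, and feature transformations, and it typically scores models on held-out data rather than only on the training set.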

Databricks is another solution in this category, an Apache Spark–based analytics platform that is designed for massive-scale data engineering and collaborative data science. Microsoft offers this solution in a version that is optimized for Azure: Azure Databricks. In addition to many features for data engineering, it includes an integrated end-to-end ML environment with services for experiment tracking, model training, feature development and management, and feature and model serving.

Machine Learning use cases

It’s useful to classify ML use cases (i.e., ML problems) into several broad categories. This provides a general overview of what’s possible with ML and allows a discussion about how ML technologies enable these use cases. The categorization is based on both the underlying ML technology and the problem to be solved.

There are many ways to solve an ML problem. As I’ve discussed, depending on the type and complexity of the data, you can use built-in ML solutions that extrapolate the data into predictive values, use automated ML solutions to easily build models, or create your own ML environment. While choosing an ML technology depends on the problem you’re trying to solve, other factors also play a role, such as the amount of your data, your plans for the model, and how much time and resources you have to build the model. In categorizing the options, I describe example use cases and (mostly Azure) technologies applied to them, to provide an idea of directions to consider.

Prediction and classification

In prediction use cases, the aim is typically to predict variables based on data obtained in the past. In ML terms, prediction refers to the output of the algorithm trained on a historical dataset and used on new data to forecast the likelihood of a specific outcome.

Use cases include forecasting demand for suppliers or stores, predicting customer likelihood of disengaging from a service (i.e., churn), visualizing a company’s future sales performance, or examining the potential popularity of games or movies. Prediction problems usually involve finding the shape of a model or line that’s as close to the data as possible.

An important subcategory is classification, a different form of prediction in which the desired output is categorical. With classification, the aim is to predict the group to which an observation belongs. For example, can recent user behavior on a shopping website be classified as churn or not, or can a specific received email be classified as spam?
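As a small illustration of both ideas, the sketch below fits a least-squares line to past demand to forecast the next month, and uses a nearest-centroid rule to classify a customer as churned or active. The numbers and names are invented, and the techniques are deliberately the simplest possible stand-ins for what a real model would do.

```python
from statistics import mean

# --- Prediction: fit a line to past demand, then forecast the next month ---
months = [1, 2, 3, 4, 5]
demand = [100, 110, 120, 130, 140]  # hypothetical units sold per month

m_mean, d_mean = mean(months), mean(demand)
slope = sum((m - m_mean) * (d - d_mean) for m, d in zip(months, demand)) / sum(
    (m - m_mean) ** 2 for m in months
)
intercept = d_mean - slope * m_mean
forecast_month_6 = intercept + slope * 6

# --- Classification: assign a new observation to the closest known group ---
# Hypothetical history: (days since last visit, observed outcome).
history = [
    (2, "active"), (3, "active"), (5, "active"),
    (20, "churn"), (25, "churn"), (30, "churn"),
]
centroids = {
    label: mean(days for days, outcome in history if outcome == label)
    for label in ("active", "churn")
}


def classify(days_since_last_visit):
    """Return the label whose average is closest to the new observation."""
    return min(centroids, key=lambda label: abs(days_since_last_visit - centroids[label]))


print(forecast_month_6, classify(4), classify(28))
```

The prediction output is a number on a continuous scale, while the classification output is one of a fixed set of categories. That difference is what separates the two kinds of problems.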

Many databases or technologies have built-in prediction capabilities that don’t require you to build a model, such as Power BI and Microsoft Excel. These solutions, however, aren’t a great fit for analyzing huge amounts of data and creating a model that’s invoked by applications. In that case, AutoML and AML are a better choice. These no-code tutorials show how to apply prediction and classification: “Creating a predictive model with AML designer to predict car prices” and “Create a classification model to predict adult income level with AML Studio.”

Anomaly detection

Identifying specific data points within a dataset is a common ML use case that can be implemented in a variety of ways. Anomaly detection focuses on finding observations or events that differ significantly from most of the data. Such observations are important to find because these data points indicate uncommon activity that can bring issues to light, or they might indicate a malfunctioning data collection method. Example use cases include fraud detection at financial or governmental institutions, IT environment intrusion detection, or healthcare patient monitoring.

An easy way to enable anomaly detection in time-series data (i.e., data that is collected at different points in time) is through Microsoft’s Anomaly Detector API. This API ingests time-series data and finds the best-fitting anomaly detection algorithm based on accuracy. You don’t need to know how ML works as the API automatically identifies and applies the best-fitting models to the data. It detects anomalies such as spikes, dips, and deviations from periodic patterns.
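To make "spikes and dips" concrete, here is a minimal sketch that flags points deviating strongly from the median of the preceding window. It is a generic robust-statistics heuristic chosen for illustration, not the algorithm the Anomaly Detector API uses, and the function name, window size, and threshold are assumptions.

```python
from statistics import median


def find_anomalies(series, window=5, threshold=5.0):
    """Flag points far from the median of the preceding `window` values.

    Distance is measured in units of the median absolute deviation (MAD),
    which stays stable even when the window itself contains an outlier.
    """
    anomalies = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        center = median(recent)
        mad = median(abs(value - center) for value in recent)
        # If the window is perfectly flat (mad == 0), any deviation is flagged.
        if abs(series[i] - center) > threshold * mad:
            anomalies.append((i, series[i]))
    return anomalies


# Hypothetical sensor readings with one spike (30) and one dip (2).
readings = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12, 11, 2, 10, 11]
print(find_anomalies(readings))
```

The median-based approach is used here because a plain mean and standard deviation would be inflated by the spike and could then hide the later dip.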

Some technologies also have built-in anomaly detection features. For instance, Azure Stream Analytics, which enables analysis and processing of fast-moving streams of data and delivering real-time insights for mission-critical scenarios, has built-in ML-based anomaly detection capabilities for monitoring the most commonly occurring anomalies.

For non–time-series data, AML is the main playground for creating models for detecting anomalies. This example illustrates an experiment for creating a model with AML Studio to predict credit risk anomalies, which gives insight into building these types of models yourself.

Search and ranking

Where anomaly detection aims to find outliers, search use cases involve finding specific data quickly and efficiently. We are all familiar with search engines on the internet, but sometimes we need similar functionality in our internal IT environment. The technology for finding data generally works by making use of two main parts: indexing and querying. With indexing, a digital library of information is created by loading content into the search service and making it searchable. Crawlers continually look for new data and collect the information needed to index a source correctly. Querying (sometimes called ranking) refers to creating a request to retrieve the information you need and the process that follows for finding the right information. Algorithms are used to retrieve the most applicable results.
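The two parts, indexing and querying, can be sketched with a tiny inverted index. This is a simplified illustration with made-up documents and a naive term-count ranking; production search services use far richer analysis and scoring.

```python
from collections import Counter, defaultdict

# Hypothetical internal documents to make searchable.
documents = {
    "doc1": "quarterly sales report for retail stores",
    "doc2": "employee handbook and travel policy",
    "doc3": "retail sales forecast and sales targets",
}


def build_index(docs):
    """Indexing: map each term to the documents (and counts) it appears in."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Querying: score documents by how often they contain the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest-scoring documents first.
    return [doc_id for doc_id, _ in scores.most_common()]


index = build_index(documents)
print(search(index, "sales forecast"))
```

Because the index is built once up front, each query only touches the entries for its own terms instead of scanning every document.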

Many technologies exist for solving search problems. Microsoft offers multiple options to enable search capabilities. Azure Cognitive Search has the widest range of features for building a search experience. It’s an Azure Applied AI service that offers APIs and tools for building search capabilities over private, web, mobile, and enterprise applications. Common use cases include catalog or document search, e-commerce site search, or knowledge mining for data science. The Cognitive Search service offers optional AI enrichment that creates text-searchable content out of images and raw unstructured text.

If you need a solution tailored toward the internet, the Bing Web Search API may be a straightforward solution. This API enables building web-connected apps and services that find web pages, images, news, locations, and more. Several in-database solutions might also fit your use case. For instance, SQL Server has a full-text search feature that lets users and applications run full-text queries against data in SQL Server tables, and Cosmos DB makes use of indexes that can be queried. The Azure Cognitive Search indexer is the way to go for the richest search experience. The indexer automates data ingestion and retrieval from Azure data sources and prepares the usage of Cognitive Search features on that data. And if your content is in SharePoint, Microsoft Search is created for Microsoft 365–authenticated users who need to query content in SharePoint.

Recommendation

We encounter recommendation engines often, knowingly or not. Shopping websites, search engines, content streaming solutions, and social media use them to show content tailored to individuals’ online experiences. Suggesting relevant content or products has become a crucial part of obtaining and retaining customers in today’s business world. As a result, there are many solutions for obtaining such capabilities. I highlight a couple of solutions in this category related to Microsoft Azure.

Personalizer is an API for decision-making from the Cognitive Services series. It allows applications to choose the best content to show users based on their online experiences, such as by making product suggestions for shoppers or determining the optimal position for an online advertisement. The solution evaluates the effectiveness of the content by monitoring user reaction and relaying a reward score to the Personalizer service. Based on this mechanism, the ML model making the recommendations is continuously improved.
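The choose, observe, and reward loop described above can be sketched with a simple epsilon-greedy strategy. This is a generic illustration of learning from reward scores, not Personalizer's actual model or API; the class, the content names, and the fixed simulated rewards are all hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Picks content, then updates its estimates from reported reward scores."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {action: 0 for action in actions}
        self.values = {action: 0.0 for action in actions}  # average reward so far

    def choose(self):
        # Show every option once before relying on the estimates.
        untried = [a for a, n in self.counts.items() if n == 0]
        if untried:
            return untried[0]
        # Occasionally explore a random option; otherwise exploit the best one.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def reward(self, action, score):
        # Incremental running average of the reward scores for this action.
        self.counts[action] += 1
        self.values[action] += (score - self.values[action]) / self.counts[action]


# Simulated user reaction per item (fixed here; real feedback would be noisy).
simulated_reward = {"article_a": 0.2, "article_b": 1.0, "article_c": 0.5}

recommender = EpsilonGreedyRecommender(list(simulated_reward))
for _ in range(200):
    shown = recommender.choose()
    recommender.reward(shown, simulated_reward[shown])

best = max(recommender.values, key=recommender.values.get)
print(best, recommender.counts)
```

The application's only responsibilities are to ask which item to show and to report back how the user reacted; the estimates improve as more reward scores arrive.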

If Personalizer doesn’t fit your use case (read more at “When to use Personalizer?”) or if you want to build the model yourself, you can use Azure Machine Learning services to build your own recommendation engine. This GitHub repository includes best-practice examples for how to train, test, optimize, and deploy recommender models on Azure with AML. Several notebooks show how to run the recommender algorithms in the repository on Azure. When creating your own model, you can also make use of Databricks (shown in this example of building a movie recommendation system), where model training and evaluation can be performed on the managed Spark cluster. This is effective if you need to autoscale up and down to help reduce resources and costs associated with scaling a cluster manually.

If you don’t plan to use Spark or you have a smaller workload where you don’t need distributed training, consider using Data Science Virtual Machine (DSVM) instead of Azure Databricks. This is an Azure virtual machine with deep learning frameworks and tools for ML. See this example of building a movie recommendation system with AML and DSVM to get an idea of the process and architecture.

Image and video

While improved processing power has accelerated innovation in ML technology in general, developments in image processing are truly astonishing. Making sense of how we perceive the world with ML is a universal use case, which makes it possible to create a variety of solutions. From identifying cats in pictures to blurring faces in videos, ML makes many things possible. Because the performance of visual ML solutions depends mainly on the size of the data used to train the model, it’s difficult to create a model that works better than those used in solutions created by companies with huge image databases that use costly processing power to quickly analyze images. Therefore, I don’t recommend building your own solution for visual analysis with, for example, AML. Only if your use case is very specific and the existing solutions don’t work well enough should it be considered.

Azure Cognitive Services offers several APIs in the vision category: Computer Vision API, Custom Vision Service API, and Face API. Computer Vision API provides access to advanced algorithms for processing images and returning information based on the visual features you want to analyze. It offers three capabilities: optical character recognition (OCR), which extracts text from images; image analysis, which can extract a wide variety of visual features from images; and spatial analysis, which analyzes the presence and movement of people in a video feed to produce events. These solutions are relatively easy to implement by writing some lines of code and using a REST API or client library. See this explanation for using OCR to read text from images. No model building is required.

If the models used in the Computer Vision API do not perform well enough, consider using Azure Custom Vision to build, deploy, and improve your own image identifiers (i.e., labels representing classes or objects based on the visual characteristics). The service uses a pre-built ML algorithm to analyze images, but also allows you to specify the labels and train custom models to detect them. See this tutorial for building an object detector with Custom Vision to get an idea of how it works.

Several Azure Applied AI services, which are built on top of the Cognitive Services APIs, should also be mentioned in this category. Azure Video Analyzer (now in preview) enables you to use video intelligence without building ML models. It can generate real-time insights from video streams, process data near the source, and can be used to enhance IoT solutions with video analytics. Azure Form Recognizer makes it easier to deal with images of text such as invoices, IDs, and receipts by enabling identification and extraction of text in such images.

If you want to play around with a free ML application that can be installed on your device, consider trying out Lobe, which Microsoft acquired in 2018 and whose goal is to make AI more accessible. Lobe is currently in beta and free to download. It enables people with no data science experience to import and label images. Lobe automatically trains the model with this data. Users can directly evaluate the model’s strengths and weaknesses with real-time visual results and give feedback to improve the model’s performance.

Language and conversation

In the last couple of years, a range of possibilities has unfolded in the fields of language, speech, and conversation. Chatbots have become common solutions for customer care, mobile phones react when they hear a certain name, and laptops have taken over the job of notetaker with the technology to write down speech. The application of ML for communication-related purposes increasingly has an impact on our daily life.

This is also where Artificial Intelligence (AI) and ML can be distinguished. While ML is used to create Artificial Intelligence, AI is a specific field of technology, focused on making computers behave in ways previously requiring human intelligence. For example, perhaps a person you thought you were speaking with on the phone to make a hotel reservation was actually a bot. Many of the capabilities of this technology are related to visual perception, speech recognition, decision-making, and translating among languages. Creatively using these types of capabilities can help improve the customer experience, automate certain tasks, and make life easier in general by improving communication.

Microsoft offers various solutions for communication-related use cases. The Cognitive Services series includes a Speech API, which can convert speech to text, convert text to speech, and perform speech translation. It also has several language APIs to enable you to do more with text. Language Understanding (LUIS) applies ML to a user’s conversational, natural language text to predict overall meaning and pull out relevant information. It can, for example, be used to create a Commerce chatbot that offers a conversational interface for banking, travel, and entertainment scenarios. QnA Maker allows developers to build a question-and-answer service from semi-structured content. Text Analytics provides natural language processing over raw text for sentiment analysis, key phrase extraction, and language detection. Also, Translator provides machine-based text translation in near real time.

Several Azure Applied AI services should also be mentioned in this category, such as Immersive Reader and Bot Service. Immersive Reader is built to implement proven techniques to improve reading comprehension for language learners and people with learning differences such as dyslexia. Its functionalities include displaying pictures for common words in a text, reading content aloud, and translating content in real time. Bot Framework Composer (i.e., Bot Service) provides a visual canvas to enable developers to create, test, and manage conversational experiences. The composer environment enables easier modeling of conversational experiences compared to working only with code. It also includes the language-understanding models and QnA capabilities of the APIs described above.

Conclusion

With ML offering so many opportunities in areas such as boosting analytics capabilities, improving customer satisfaction, creating the most appealing products, and so much more, it is important to understand the type of use cases that can be realized. When you decide on a use case or problem to solve with ML, selecting the right technology to enable it is the next step. All the available options can make selecting the most effective technology more difficult. With this article I hope I have increased your understanding of ML opportunities for different use cases and provided some guidance for technology to consider.

Marjam Bahari is on LinkedIn.

To read the first article in this series, check out the following:

Marjam Bahari
Data Science at Microsoft

AI customer engineer @ Google | Ex-Microsoft | Opinions are my own