Developing Machine Learning Projects with MLaaS

Christian Caruso
Published in Eni digiTALKS
Feb 23, 2023 · 9 min read

Typical approaches for ML projects, leveraging Machine Learning as a Service (MLaaS) offerings such as Azure ML Services

Photo by Andrej Lišakov on Unsplash

Local-to-cloud development

Developing a Machine Learning project can be complex, and even in enterprise organizations it rarely follows a standard approach as strictly as other software development projects or initiatives do.

Some aspects of a Machine Learning project can become bottlenecks or challenges: data volume, computational power, and tracking and sharing model performance assessments.

In this section, three different approaches are presented to give an overview of the problems faced in ML use cases and their solutions: user workstations, on-premises servers, and cloud computing.

Different approaches to developing an ML project

User workstation

Developing on a user workstation requires adequate hardware to guarantee proper performance for developers in every project phase, so that a valuable artifact (in this case, the ML model) is produced.

These workstations must also be kept up to date, yet may be used only to a limited extent in some project phases. In addition, depending on the number of user workstations, their purchase can represent a significant cost to acquire and sustain.

On premise servers

A first step away from local development is an infrastructure of on-premises servers that can execute massive jobs. This addresses some of the problems described above, such as data volume and computational power. However, with this solution a physical infrastructure still has to be managed and maintained, for example with continuous updates of development libraries.

Cloud computing

The third and last approach is a cloud solution which, by its fundamental principles, provides resources on demand, simplifying or removing the constraints described above and guaranteeing virtually infinite scalability, used only when necessary.

Hybrid approach

The three approaches just described can be mixed into a hybrid approach, where coding and development happen on a user workstation while massive jobs are delegated to cloud computation.

For example, in a real ML use case the most onerous phase is the training job, which is usually massive and long-running. With a hybrid approach, developers can work on personal workstations to develop and prepare the training job, while the job itself runs on a temporary cloud instance with correctly sized hardware and a well-defined software environment.
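As an illustration of this hybrid pattern, a training entry script can be written so that the same code runs unchanged on a workstation or on a temporary cloud instance; only the launch parameters change. A minimal sketch, where the file layout, parameter names, and "training" logic are all hypothetical placeholders:

```python
import argparse
import json
from pathlib import Path


def train(data_path: str, epochs: int) -> dict:
    """Placeholder training loop; a real job would fit a model here."""
    # Pretend each epoch improves a loss metric.
    loss = 1.0
    for _ in range(epochs):
        loss *= 0.9
    return {"data_path": data_path, "epochs": epochs, "final_loss": round(loss, 4)}


def main(argv=None):
    # The same CLI works locally and on a remote compute target:
    # the orchestrator only has to change --data-path and --output-dir.
    parser = argparse.ArgumentParser(description="Environment-agnostic training entry point")
    parser.add_argument("--data-path", default="./data")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--output-dir", default="./outputs")
    args = parser.parse_args(argv)

    metrics = train(args.data_path, args.epochs)
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Persisting metrics to a file lets the cloud runtime collect them as job output.
    (out / "metrics.json").write_text(json.dumps(metrics))
    return metrics
```

Locally one might run `python train.py --epochs 2` for a quick test, while the cloud submission passes larger values and a mounted data path; the script itself never needs to know where it is running.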

However, even when the resource obstacle is overcome, two main issues remain: the distribution of specific software to developers must be managed internally, and a way of sharing project details (e.g., results and metrics) must be defined to manage the project life cycle.

Development with MLaaS

To overcome the issues described in the previous section, and thanks to the ever-growing interest in Machine Learning over the years, several products designed specifically for these initiatives have been developed and brought to market.

These are solutions based on a set of integrated cloud services that solve not only the infrastructural issues but also those related to the development and management of an ML project.

This type of product takes the name "Machine Learning as a Service" and, under one hat, responds end-to-end to the needs of a machine learning project: data processing, training, evaluation, deployment, and inference are managed collaboratively and shared with the project team.

Limiting the list to a few major cloud providers, the MLaaS offerings include:

  • AWS: Sagemaker
  • Microsoft: Azure Machine Learning Services
  • GCP: Vertex AI
  • IBM: Watson
Examples of MLaaS products offered by the major cloud providers

There are also multicloud products, usable across different cloud providers. One of the main and most popular is Databricks, which offers a complete platform for data analysis and model management tools through MLflow.

Examples of multicloud MLaaS products

In addition to Databricks, other examples of multicloud products for developing Machine Learning are C3.ai and Dataiku.

Azure Machine Learning Services Overview

The cloud service dedicated to Machine Learning offered by Microsoft on its Azure cloud is Azure Machine Learning Services (aka AMLS), which joins its offering of other Artificial Intelligence services.

The first version of AMLS was released in 2014 and, in recent years, thanks to continuous evolution and growing maturity, it has become a product that integrates services and features to support users end-to-end in all phases of a machine learning project: data analysis, development, modeling, and deployment.

With its monitoring and tracking of daily tasks, the product provides a fundamental tool for sharing progress and results with the project team.

Main workflow phases in an ML project

These aspects are enablers of the MLOps methodology, an extension of DevOps specific to Machine Learning that automates some operations (such as retraining jobs). This methodology simplifies the model lifecycle, keeping models up to date and performing well. The next sections present more details about AMLS.

Target Users & Skills Required

Thanks to a wide variety of features offered at different levels of abstraction, AMLS can be used by the different people involved in the project, even when they have different roles and levels of technical skill in Data Science.

AMLS allows its users to develop a solution in different ways:

  • By writing code (e.g., Notebooks, SDK), obtaining a fully custom solution;
  • By using UI tools to design workflows with pre-built components (e.g., Designer);
  • By using high-level tools that can auto-generate models (e.g., Automated ML).

Alternatively, for a set of common Data Science use cases such as image recognition or natural language processing, it is possible to use the off-the-shelf services included in Azure Cognitive Services, which are not addressed in this article.

Azure Machine Learning tools and their target users

Infrastructure

Usually, AMLS is referred to as a single service, but its infrastructure is actually composed of four additional cloud services:

  • Storage Account: storage service for both the data produced within the AML service (logs, artifacts, internals) and user datasets;
  • Key Vault: security service for managing credentials, such as those of Datastores;
  • Container Registry: repository containing the Docker images used for jobs on Compute Instances/Clusters (training) and for models deployed as REST APIs;
  • Application Insights: metrics service for monitoring the logs and performance of models developed and deployed via REST API.
Azure Machine Learning Services Infrastructure

Once the AMLS service infrastructure is created, its beating heart is the Workspace, which ties together the four services just described.

The Workspace manages the objects created within it such as Datastores (to give access to data) and Compute Instances/Clusters (VMs configured ad hoc for Data Science and intended for both development and job execution). In addition, the Workspace manages a number of other Assets that are created both in the modeling phase and in day-to-day tasks (e.g., experiment tracking, model versioning, job execution pipeline).

Working with AMLS: Studio (the UI way)

The Azure Machine Learning Services Studio is the web GUI for interacting with the Workspace.

Within the Studio it is possible to:

  • Collect run metrics, logs, results, and reports;
  • Develop algorithms with Notebooks or scripts;
  • Manage all workspace assets (described later);
  • Define AutoML jobs.

A complete overview of the AMLS components is shown in the image below. These components are divided into three categories: Linked Services, Managed Resources, and Assets.

AMLS Components

Linked Services:

  • Datastore: stores the connection information for a data service, such as Data Lake Storage or SQL Database, and lets users access the underlying data without connection strings, secrets, or explicit authentication.
  • Compute Targets: external resources linked inside AMLS but not directly managed by the service, for example Databricks or Azure Synapse.
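The idea behind Datastores can be sketched in plain Python, independently of any SDK: connection details are registered once (typically by an administrator), and consumers refer to data only by name, never touching secrets directly. All class, host, and secret names below are illustrative, not part of AMLS:

```python
class SecretVault:
    """Stand-in for a secret manager such as Key Vault."""
    def __init__(self):
        self._secrets = {}

    def set(self, name, value):
        self._secrets[name] = value

    def get(self, name):
        return self._secrets[name]


class DatastoreRegistry:
    """Maps datastore names to connection info, resolving secrets on demand."""
    def __init__(self, vault):
        self._vault = vault
        self._stores = {}

    def register(self, name, host, secret_name):
        # Only a *reference* to the secret is stored, never the secret itself.
        self._stores[name] = {"host": host, "secret_name": secret_name}

    def connect(self, name):
        store = self._stores[name]
        key = self._vault.get(store["secret_name"])
        # A real implementation would open an authenticated connection here.
        return f"connected to {store['host']} (key ending {key[-4:]})"


vault = SecretVault()
vault.set("lake-key", "s3cr3t-ABCD")

registry = DatastoreRegistry(vault)
registry.register("training-data", host="datalake.example.net", secret_name="lake-key")

# A data scientist only needs the datastore name, not its credentials:
status = registry.connect("training-data")
```

This mirrors the division of responsibilities in AMLS: the Key Vault holds the credentials, the Datastore holds the pointer, and users work by name.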

Managed Resources:

  • Compute Instances: similar to a cloud virtual machine preconfigured with an environment for Data Science activities. An instance can be used as a training or inference resource for development and testing, and is usually assigned to a single user.
  • Compute Cluster: like Compute Instances, a managed compute infrastructure, allowing users to create a cluster of CPU and/or GPU nodes in the cloud. It is usually shared among the project team.

Assets:

  • Datasets: registers and versions specific data sets mapped to a Datastore, abstracting access to the data it contains. There are two categories: tabular datasets (structured data) and file datasets (unstructured data such as images).
    With Datasets, you can easily access data by name without worrying about connection strings or data paths.
  • Jobs: allows users to monitor all jobs (experiments) performed in the workspace, displaying information about runs and the metrics of the results obtained.
  • Components: reusable and shareable code, treated as a building block analogous to a function that executes one step of a machine learning pipeline.
  • Pipelines: workflows composed of various processing steps. The steps can be built-in or custom Components, and they help formalize a transformation process, making it reproducible and shareable.
    A pipeline can run as a Pipeline Job for a single execution or be exposed as a Pipeline Endpoint to be invoked from external systems via REST API.
  • Environments: registers and versions the (Python) environments used in different jobs.
    They come in two types: Curated Environments, managed by the cloud provider, and Custom Environments, defined by a Docker image with all code dependencies (Python packages or specific software).
  • Models: registers and versions the trained models in the workspace. Models can be registered from an experiment or from an external source (local upload).
    Every model is enriched with metadata, such as the experiment that created the artifact and the endpoints that use the model for inference, making lineage easy to track.
  • Endpoints: manages the endpoint addresses for model inference via REST API for models deployed through AMLS.
    Endpoints come in two types: Batch Endpoints (for high volumes of data contained in storage) and Realtime Endpoints (which process the data contained in the request).
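The Component and Pipeline concepts above can be illustrated without any SDK: each component is a reusable, versioned step, and a pipeline is an ordered composition of steps whose execution is reproducible and traceable. A minimal conceptual sketch (all step names and logic are hypothetical):

```python
from typing import Callable


class Component:
    """A reusable, versioned processing step."""
    def __init__(self, name: str, version: int, fn: Callable):
        self.name, self.version, self.fn = name, version, fn

    def run(self, data):
        return self.fn(data)


class Pipeline:
    """An ordered composition of components; running it yields a traceable result."""
    def __init__(self, steps):
        self.steps = steps

    def run(self, data):
        trace = []
        for step in self.steps:
            data = step.run(data)
            # Record exactly which component version produced each intermediate,
            # which is what makes the run reproducible and shareable.
            trace.append(f"{step.name}:v{step.version}")
        return data, trace


# Two hypothetical steps of a tiny preprocessing workflow.
clean = Component("clean", 1, lambda xs: [x for x in xs if x is not None])
scale = Component("scale", 2, lambda xs: [x / max(xs) for x in xs])

pipe = Pipeline([clean, scale])
result, trace = pipe.run([2, None, 4, 8])
```

In AMLS the same composition is declared through the SDK or the Designer UI, and the trace corresponds to the lineage information stored with each Job.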

Working with AMLS: “Developer” (the code way)

The AMLS service provides further options to interact and work with the project Workspace. Again, it offers several approaches that use different languages to perform the same operations, making it usable by developers with different backgrounds.

In fact, the service exposes interfaces based on common developer tools:

  • SDK (Python)
  • REST APIs
  • CLI
Azure Machine Learning Services development interface

Through these tools it is possible to fully interact with the service, both for administration (creating, configuring, and managing the assets of AMLS instances) and for development activities such as coding, job execution, and tracking.
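As a taste of the code-first approach, a training job can be described declaratively in YAML and submitted with the CLI (`az ml job create --file job.yml`). The sketch below follows the general shape of an AMLS v2 command-job specification; the script name, environment, and compute references are placeholders to adapt to your own workspace:

```yaml
# job.yml - hypothetical command job definition
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --epochs 20
code: ./src                                  # local folder uploaded with the job
environment: azureml:my-training-env@latest  # a registered Environment asset
compute: azureml:cpu-cluster                 # a Compute Cluster in the workspace
experiment_name: demo-training
```

Note how the specification ties together the assets described earlier: the environment and compute are referenced by name, so the job definition stays portable across team members and runs.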

Another useful developer tool is the extension for the VS Code IDE, which allows users to navigate the services and components within the workspace and facilitates code production.

Conclusion

The importance placed on data keeps growing, increasing the focus within companies and bringing new processes and challenges.

Thanks to AI and Machine Learning, the value of data is further amplified, and Data Science has become one of the most in-demand disciplines.

We have seen that developing a Machine Learning solution in a structured and functional way requires considering several factors, but by adopting modern, purpose-designed solutions such as MLaaS, the needs of both developers and project management can be met in a timely manner.

Azure Machine Learning, one of the most popular MLaaS products, serves as an example for gaining insight into the functionality of these solutions. Future articles may explore some aspects in more depth, such as MLOps methodologies or specific adoptions of these solutions.

— — — — — — — — — — — — — — — — — — — — — — — — — — — —

Bibliography & Reference

MLaaS:
https://www.geeksforgeeks.org/machine-learning-as-a-service-mlaas/

Data Science Solutions on Azure (Chapter 4):
https://link.springer.com/book/10.1007/978-1-4842-6405-8

AMLS Product:
https://azure.microsoft.com/en-us/products/machine-learning

AMLS Documentation:
https://learn.microsoft.com/en-us/azure/machine-learning/

AMLS Skill Levels:
https://azure.microsoft.com/en-us/blog/azure-machine-learning-ml-for-all-skill-levels/

AMLS Workspace:
https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace

AMLS Extensions for VS Code:
https://code.visualstudio.com/docs/datascience/azure-machine-learning
