Modern AI Stack & AI as a Service Consumption Models

Sriram Subramanian
Mar 2, 2018

If you are a developer building new AI capabilities, or line-of-business (LOB) applications/services that leverage AI capabilities, you have various tools at your disposal. A necessary and sufficient collection of such tools can be visualized as the Modern AI Stack.

Artificial Intelligence (AI) refers to the tools and techniques that enable machines/software to exhibit intelligence. Machine Learning (ML) is a subset of AI that enables a computing system to learn from data, without being explicitly programmed. Artificial Neural Networks (ANNs) are computing systems loosely inspired by the neural networks of animal brains; they form the core of many modern machine learning systems.
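To make the ANN definition a little more concrete, here is a minimal Python sketch of a single artificial neuron and a layer of such neurons. It assumes a sigmoid activation and uses hand-picked, purely illustrative weights; real networks learn their weights from data and are normally built with an ML platform rather than by hand.

```python
import math


def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus a bias, squashed by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is several neurons reading the same inputs; networks stack such layers."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical, hand-picked weights; a trained network would learn these from data.
hidden = layer([1.0, 0.0], weight_rows=[[2.0, -1.0], [-1.0, 1.0]], biases=[-1.0, 0.5])
output = neuron(hidden, weights=[1.5, -2.0], bias=0.0)
print(hidden, output)
```

Each neuron only computes a weighted sum and a non-linearity; the "intelligence" comes from learning good weights across many connected neurons.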

It is interesting to note that the core techniques/algorithms used for AI haven’t changed fundamentally in the past 20+ years. The tools, however, have. If you were an AI researcher just ten years ago, you had to write code to build and train your neural networks. Not anymore. Gone are the days when you had to build such a stack from scratch: ML platforms, libraries, computing, and data platforms are now readily available off the shelf. Some of these capabilities are also available as services that you can consume directly. This post provides an overview of such a stack and of the different models available to consume AI capabilities as a Service.

Modern AI Stack

(Figure: Modern AI Stack)

The Modern AI Stack consists of two components: infrastructure and the developer environment.

Infrastructure refers to the tools, platforms, and techniques used to run computations, store data, and build and train AI/ML algorithms, as well as the algorithms themselves.

Developer Environment refers to the tools that help you write the code that brings out AI capabilities.

LOB applications and services are technically not part of the AI Stack. They derive value from the AI Stack.

Infrastructure

Compute

Compute refers to the raw computational power required to run AI/ML algorithms. One has a wide choice of physical servers, virtual machines, containers, and specialized hardware such as GPUs, as well as cloud-based computational resources, including VMs, containers, and serverless computing.

Data

Data is an important component of any machine learning system. Just as you are what you eat, a machine learning model is only as good as the data it is trained on. One has a wide choice of data platforms: structured and unstructured databases, big data platforms, managed databases, and cloud-based databases.

Machine Learning Algorithms

Machine learning algorithms fall into three categories: supervised, unsupervised, and reinforcement learning, with many algorithms to choose from in each category.

Supervised Learning refers to learning the best-fit function that maps input data to output data, based on training data made of input-output pairs. Learning continues until the model reaches the desired accuracy.
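As a minimal sketch of supervised learning, assuming the simplest possible model (a straight line y ≈ w·x + b) and batch gradient descent as the learning method, the code below fits that line to hypothetical input-output pairs and stops once the mean squared error drops below a tolerance, which stands in for "the desired accuracy". The data points, learning rate, and tolerance are illustrative assumptions, not values from any particular ML platform.

```python
def train_linear(pairs, lr=0.05, tolerance=1e-6, max_epochs=10_000):
    """Fit y ≈ w*x + b to (input, output) pairs with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    mse = float("inf")
    for _ in range(max_epochs):
        # Prediction error for every training pair under the current model.
        errors = [(w * x + b) - y for x, y in pairs]
        mse = sum(e * e for e in errors) / n
        if mse < tolerance:  # stop once the model is "accurate enough"
            break
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, pairs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, mse


# Hypothetical training pairs generated from y = 2x + 1.
training_data = [(0, 1), (1, 3), (2, 5), (3, 7)]
w, b, mse = train_linear(training_data)
print(f"learned w={w:.3f}, b={b:.3f}, mse={mse:.2e}")
```

The learner is never given the formula y = 2x + 1; it recovers w ≈ 2 and b ≈ 1 purely from the labeled examples.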

Unsupervised Learning refers to learning to find structure, such as groups or patterns, in data whose categories are not known in advance (‘unlabeled’ data).
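To illustrate unsupervised learning, here is a sketch of k-means clustering (one common unsupervised algorithm, chosen here for illustration) on unlabeled one-dimensional points. Nothing tells the algorithm which group each point belongs to; it discovers the two groups on its own. The data and the naive "first k points" initialization are assumptions kept simple for readability.

```python
def kmeans_1d(points, k, max_iters=100):
    """Group unlabeled 1-D points into k clusters using Lloyd's k-means algorithm."""
    # Naive initialization: use the first k points as starting centroids
    # (an assumption for brevity; k-means++ seeding is more common in practice).
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical unlabeled measurements; no categories are given up front.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels, centroids = kmeans_1d(data, k=2)
print(labels, centroids)
```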

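Reinforcement Learning refers to learning by trial and error: an agent takes actions, receives rewards or penalties from its environment, and adjusts its behavior to maximize the total reward.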
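Trial-and-error learning can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning setups (chosen here for illustration, not taken from any specific platform). The agent repeatedly picks one of several actions ("arms"), observes a reward, and updates its estimate of each arm's value. The payout probabilities, epsilon, and step count below are hypothetical.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which arm pays off most often (epsilon-greedy bandit)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running estimate of each arm's average reward
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate so far
        # The environment responds with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts


# Hypothetical payout rates; the agent is never told these directly.
values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print(values, counts, best_arm)
```

With enough trials, the agent ends up choosing the highest-paying arm most of the time, even though it only ever sees individual rewards.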

Machine Learning Platform

Machine Learning Platforms/frameworks provide the capabilities needed to develop ML capabilities. Such platforms usually accept training data from different types of sources, provide a choice of training algorithms, and support multiple programming languages. Commonly used ML platforms include Apache MXNet, TensorFlow, Caffe2, CNTK, SciKit-Learn¹, and Keras².

Developer Environment

Libraries

You have a variety of libraries at your disposal, whether you need advanced mathematical operations (NumPy) or a specific cognitive capability such as computer vision (OpenCV) or language translation (OpenNMT). In particular, if you are building cognitive services, say a smart video surveillance service, you can use these libraries along with ML platforms.

IDE

Whether you are developing ML models or applications/cognitive services that leverage the underlying ML platform’s capabilities through APIs, you will be writing a good amount of code. An IDE makes that job easier.

A variety of integrated development environments (IDEs) are available, such as PyCharm, Microsoft VS Code, Jupyter, and MATLAB. Note that IDEs for AI/ML may not have the advanced debugging capabilities one is used to with procedural or object-oriented programming languages.

Data Visualization

As noted above, data is an important component of machine learning, so data visualization naturally plays an important role. One could argue that visualization is not an essential component of the AI Stack, but given the importance of datasets, we consider it an important part of the stack. Visualization choices include MATLAB, Seaborn, Facets, and data analytics platforms such as Tableau.

Workflow

We include workflow tools in the AI Stack because they make sharing, collaboration, and automation much easier. As more developers start leveraging AI/ML capabilities, developer collaboration becomes more important. A variety of workflow automation tools are available, such as Jupyter, Anaconda, GitHub, and VSTS.

Consumption Models

(Figure: Modern AI Stack compared with cloud services consumption models)

Public cloud service providers are making more and more AI/ML capabilities available as a service, which removes the need to deploy/implement the entire stack from scratch. These capabilities are also offered at different levels of abstraction, so one can consume them at whichever level one prefers. As of now, AI/ML services can be consumed in the following ways.

AI Stack

This is the reference consumption model where every infrastructure component (ML platform, algorithms, compute, and data) is deployed and managed by the user. The user builds, trains, and deploys ML models. The user is also responsible for installing and managing all components of the developer environment.

This model is analogous to the consumption of on-prem/ private cloud services.

AI-aaS

AI-aaS refers to AI infrastructure services offered by service providers that one can consume directly. In this model, one continues to use one’s own models, algorithms, types of data stores, and compute resources, as one would with the AI Stack model, but does not install or manage the infrastructure components. Instead, one leverages ML capabilities that are available as a service (Google Cloud ML Engine, Amazon ML), along with IaaS for compute and data requirements.

This model is analogous to the consumption of IaaS capabilities. Naturally, you will see a lot of Lift & Shift :).

Managed AI-aaS

It turns out that it is not trivial to build and train ML models. When ML models get complex, managing the supporting compute/ datastore is also not trivial.

Wouldn’t it be better if there were an easy way to train ML models? Wouldn’t it be better if compute resources were automatically allocated/managed as the model requires? In short, wouldn’t it be more efficient if the developer could just focus on getting value out of ML without having to worry about the underlying infrastructure?

Managed AI-aaS services such as Google Cloud AutoML, Amazon SageMaker, and Azure ML Studio belong to this category. They make ML easier to consume by removing these pain points.

This model is analogous to consuming Managed IaaS capabilities.

Cognition-aaS

Cognition-aaS refers to the consumption model in which advanced cognitive capabilities themselves are available as a service. For example, to build a video surveillance application, one can consume video recognition capabilities offered as a service (Amazon Rekognition Video, Google Cloud Vision, Azure Computer Vision, etc.). There is no need to build these capabilities using computer vision libraries and ML.

With such cognitive capabilities being readily available, an application developer can focus on business logic without having to worry about the underlying AI infrastructure components at all.

Last year, I had postulated that

Cognitive computing capabilities available as a service will double approximately every year

Cognition as a Service — A Comparison (Dec 2017)

With service providers enabling more Cognition-aaS capabilities, and niche players adding more cognitive capabilities to the mix (…, Grammarly), this trend will continue.

Cognition-aaS is analogous to consuming PaaS capabilities. I don’t see it as a SaaS category, as some might, because these capabilities are not complete solutions the way a SaaS offering would be³.


Choose the right consumption model based on your application needs and in-house ML expertise.

If you want total control and everything in-house, choose AI Stack/ On-Prem.

If you want to build/ train ML models, but don’t want the overhead of managing ML platform/ underlying infrastructure components, choose AI-aaS.

If you want to leverage ML capabilities, but want the provider to manage training and the underlying compute/data infrastructure for you, choose Managed AI-aaS.

If you would like to just focus on the business value, choose Cognition-aaS.
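The rules of thumb above can be captured in a small decision function. This is only a sketch: the parameter names are hypothetical, and a real choice would weigh application needs and in-house ML expertise with more nuance than three yes/no questions allow.

```python
from enum import Enum


class ConsumptionModel(Enum):
    AI_STACK = "AI Stack / On-Prem"
    AI_AAS = "AI-aaS"
    MANAGED_AI_AAS = "Managed AI-aaS"
    COGNITION_AAS = "Cognition-aaS"


def choose_consumption_model(needs_full_control: bool,
                             builds_own_models: bool,
                             wants_managed_training: bool) -> ConsumptionModel:
    """Rough encoding of the rules of thumb above (hypothetical parameter names)."""
    if needs_full_control:
        # Total control, everything in-house.
        return ConsumptionModel.AI_STACK
    if not builds_own_models:
        # Only business value matters: consume ready-made cognitive services.
        return ConsumptionModel.COGNITION_AAS
    if wants_managed_training:
        # The provider takes care of training and compute allocation.
        return ConsumptionModel.MANAGED_AI_AAS
    # You build/train models yourself on a platform and infrastructure offered as a service.
    return ConsumptionModel.AI_AAS


# Example: a team that trains its own models but wants compute managed for it.
print(choose_consumption_model(False, True, True).value)
```

The order of the checks mirrors the trade-off in the guidance: each step down hands more of the stack to the service provider.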

  1. Please note that some might consider SciKit-Learn to be a library rather than an ML framework itself
  2. Please note that some of these can be used together (for example, Keras with TF)
  3. Solutions like Amazon DeepLens don’t fit into these models; they are more like appliances/HCI. When more such offerings become available, the AI Services Consumption model would be expanded to include that category.


CloudDon - catalyzing modern enterprise IT transformations
