What is the difference between Artificial Intelligence, Machine Learning and Cognitive Technology?

Recently there has been an explosion of buzzwords related to Artificial Intelligence. My goal with this article is to discuss the differences between Artificial Intelligence, Machine Learning and Cognitive Technology. I invite you to join in; it is very likely I am missing elements that you might be more familiar with.

Artificial Intelligence

Artificial Intelligence (AI) is the hype of the moment. In most cases, the hype relates to a specific subclass of AI known as narrow AI. It is called narrow because it is applied in a reduced (narrow) environment. Think of a chess game: there are clearly defined moves, rules and win/lose situations.

A narrow AI agent seeks to maximise the likelihood of winning the game. To control the environment and ensure good decisions are made (those that maximise the chance of winning), the decision space is usually mapped into a representation an algorithm can compute on. In most cases this is a decision tree, codified so that an algorithm can parse and manipulate it to find a value-maximising path to an end-game node (for example, a game-winning board configuration may have the value 100 and a less preferable board state a lower value). In this example, the world of the narrow AI agent is the tree representation of the chess board. That's it, nothing more. It is as if you were born only to play chess: your world is a chess board, you have to act based on pre-set rules, and your only purpose is to predict your next move by calculating whether it maximises a given value. That is a pretty narrow world to operate in. And yet it is complex. It is hard to imagine how complex.
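The value-maximising search described above can be sketched in a few lines. This is a minimal illustration, not real chess: the tiny tree and its leaf values (100 for a winning board, lower values for less preferable states) are made up, and the agent simply assumes the opponent will pick the move that is worst for us.

```python
# A hypothetical game tree: inner nodes are lists of child states,
# leaves are integer values of end-game board configurations.
def minimax(node, maximising=True):
    """Return the best value reachable from this node, assuming the
    opponent minimises our score on their turns."""
    if isinstance(node, int):        # leaf: a scored end-game state
        return node
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

game_tree = [
    [100, 20],   # move A: opponent will steer us towards 20
    [60, 55],    # move B: opponent will steer us towards 55
]

# Pick the move whose guaranteed outcome is highest.
best = max(range(len(game_tree)),
           key=lambda i: minimax(game_tree[i], maximising=False))
print("best move:", "AB"[best])   # prints "best move: B"
```

Note that move A contains the single highest leaf (100), yet the agent prefers move B, because it only cares about the value it can guarantee against an adversary.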

To fully understand your world (the chess game) you would need a representation of every possible board state and every possible move: a complete view of all possibilities in your world. Theoretically, you could then accurately predict the effect of each of your moves on the overall game. Even though that is mathematically possible, it is not practical, and therefore not economical. An algorithm would take too long to parse all states, and pre-computing the whole tree would be equally complex and expensive. To overcome this challenge, narrow AI can learn to approximate outcomes: basically, to make educated guesses based on the information available at a certain point in time. This approach is known as Machine Learning. Before discussing it, we first have to mention a more science-fiction-related class of AI known as general AI.

That’s the whole Matrix movie thing, and it relates to all the talk about machines taking your job and replacing us humans some day (the word singularity comes up a lot here; it refers to the moment at which a machine becomes as intelligent as a human). There is an interesting, deeper philosophical debate behind general AI. Narrow AI agents are based on known situations and known data, and have limited ability to adapt to new situations. Comparing this with human-like abilities (which is what general AI would mean) effectively means that a general AI agent would need some sort of higher cognitive model representing self-consciousness and a sense of purpose.

The narrow AI example has one single pre-designed purpose: to maximise a value based on mathematical predictions towards an end state. The narrow AI has no concept of playing a game or knowing what chess is, let alone being aware of any such thing; it is simply a mathematical function seeking to maximise variables towards a target value. To reach general AI we would first have to establish that the world and everything in it can be modelled mathematically, after which we would have to figure out how to design an AI agent that wants to live for the purpose of existing. Put simply: how on earth can we model this world? The core problem, therefore, is that we humans haven't figured any of that out ourselves yet.

Machine Learning

The subject of Machine Learning has been around for a long time. The core idea, that machines can learn via mathematical representation, can be traced back to the 17th century. Arguably, the most influential work comes from Alan Turing and his paper discussing whether machines can think (http://www.loebner.net/Prizef/TuringArticle.html). In a nutshell, Machine Learning can be seen as the art of applying statistical approaches to predict the outcome of an ‘unknown’ situation. Machine Learning is part of narrow AI and is based on correlating data. You take ‘a lot’ of data that is labelled with a certain outcome, for example the weather data of the last 50 years. You label each day with an outcome, such as Sunny or Rainy. Then you choose a set of variables (aka features), such as the amount of rain, the hours of sun, and the direction and velocity of the wind. In this simplified example, these observations form a matrix with numeric values (data points) for each feature (e.g. 3 hours of sun, 150 mm of rain, 3 mph wind), in which each row represents one day of the last 50 years together with an observed outcome (label), e.g. Sunny or Rainy. This data now becomes your training data: your ground truth. Now you need to choose a probabilistic model that represents your data and helps you correlate new ‘unknown’ data points with a predicted outcome.
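The weather example above can be made concrete with a few lines of code. This is a deliberately simplified sketch: the feature rows and labels are invented, and instead of a full probabilistic model it uses a nearest-neighbour rule (label an unseen day like the most similar day in the ground truth), just to show how labelled rows turn into predictions.

```python
import math

# Hypothetical training data: (hours_sun, rain_mm, wind_mph) -> label.
# Each row stands for one observed day with its known outcome.
training_data = [
    ((9.0,   0.0,  3.0), "Sunny"),
    ((8.0,   2.0,  5.0), "Sunny"),
    ((1.0, 150.0, 12.0), "Rainy"),
    ((2.0,  90.0,  9.0), "Rainy"),
]

def predict(day):
    """Label an unseen day after its closest day in the training data."""
    closest = min(training_data, key=lambda row: math.dist(row[0], day))
    return closest[1]

print(predict((7.5, 1.0, 4.0)))   # a bright, dry day -> "Sunny"
```

A real system would of course use far more rows and features and a properly fitted model, but the shape of the problem is the same: labelled observations in, a predicted label for new data out.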

In Machine Learning the selection of a ‘model’ is a rather big deal. It helps you avoid issues such as overfitting, which means your model fits the training data too closely. It may seem counter-intuitive, but an overfitted model can only perform well when predicting outcomes (based on new ‘unknown’ data) that are very similar to the data it was trained on. Therefore, an interesting aspect of Machine Learning is that you want an approach that allows a margin of error. Precisely by allowing that error the machine can learn. For example, once your model encounters new data it is not sure about (high error), it can add that data to the training set and work it into the underlying model/experience. Typical models are Decision Trees, Naive Bayes and Deep Neural Nets.

Now you may think: what’s the big deal? Why is everyone so excited about Machine Learning?

Two reasons: 1) data and 2) computing power. To make predictions with reasonable accuracy you need enough data. Weather is an interesting case. In the example here we only used a handful of features; a weather prediction system can have millions of features that consider everything from ocean temperature to the angle of the earth’s axis to the temperature of the gulf stream. That requires a lot of computing power. Or take a self-driving car. First, you need a lot of training data so the car learns to react to all possible situations in a reasonable manner. Staying with that example, it is very hard to grasp the number of variables (features) a self-driving car model needs to be trained on. There are cats and dogs crossing the road, crossroads, falling trees, different surfaces, different traffic, different other cars and so on. In practice, a self-driving car runs on thousands of differently trained Machine Learning models — one for road type, one for weather type and so on — all working together to get you home safely. Complexity at that level can only be computed if you have access to a lot of computing power. The more complex the environment the Machine Learning agent engages with, the more data is needed to compute prediction models.

Cognitive Technologies

Cognitive technologies are mainly commercial products based on some element of Artificial Intelligence (AI). These applications use mature and commercially applicable Artificial Intelligence that has left the lab and can stand up to scalable commercial usage. To gain an overview of the application areas, see this Deloitte consulting chart.

An increasingly interesting and rapidly growing application space is Cognitive Technologies based on Natural Language Processing. This includes chat bots, content topic detection and content marketing applications, to name only a few.

In my next post, I will go through the different Cognitive Technologies one by one: how they work and where they can be found.