As executives, it is essential for us to know clearly what AI is, whether our firm uses AI, and which of our firm’s products use it. To do that, we need an agreed-upon definition of AI, which has been elusive until recently. We now have that definition, thanks to the EU’s High-Level Expert Group on AI, and a perspective on AI from DARPA.
But let’s start simple.
When machines exhibit intelligent behavior, they are said to have Artificial Intelligence (AI).
The above definition, adapted from an Nvidia blog, is succinct. ‘Machine’ here could be an autonomous car that is behaving intelligently by changing lanes to overtake another car. ‘Machine’ could be a computer program that recognizes people from a photograph. ‘Machine’ could be a robot that can recognize and pick up only red Lego blocks.
But is the above definition sufficient? While the ‘machine’ part is clear, what is ‘intelligent behavior’? This is where AI definitions get murky. How do we define ‘Artificial’ Intelligence when we can’t even agree on just Intelligence?
To further aggravate the problem, when John McCarthy and team coined the term Artificial Intelligence in 1956, they itemized some aspects of the term, but did not really put forth a clean definition. So, the term AI has been vague from the start.
This post is about removing that vagueness from the definition. Let’s start with the formal definition.
Definition of AI
For a formal definition of AI, I am going to use the one recently put forth by the EU, more specifically, the European Commission’s High-Level Expert Group on AI. The full definition is verbose. While I will close this post with it, let’s start with a pictorial representation included in the definition document and this abbreviated definition:
Artificial intelligence (AI) refers to systems that display intelligent behaviour by analysing their environment and taking actions — with some degree of autonomy — to achieve specific goals
See below for the pictorial representation of the full definition. Be warned however. In order to keep a consistent thread across this post, I will be using this picture over and over again.
There are two reasons for using this definition. First, the definition is clean, contemporary, and comprehensive. Second, and more importantly, it’s a definition coming from part of an entity with large regulatory and legislative authority. If we are going to find an official definition, this is the one. I will supplement this definition with a perspective on AI from DARPA, one of the most forward-looking agencies in the world, if not the most forward-looking.
Back to Intelligence
While we now have an AI definition we can use, we still need to agree on what intelligence is and what it is not, so that we can look at a computer system or a machine (e.g., a robot) and tell whether it’s using Artificial Intelligence.
Defining ‘Intelligence’ is a monumental task. One could do a PhD in it, only to be less intelligent for doing so. Crowdsourcing the definition hasn’t worked — Wikipedia has a long list of definitions for intelligence. And to further illustrate the futility of this, researchers collected over 70 good definitions of intelligence and tried to find a good composite of the 70 definitions.
And then there’s the temporal element of intelligence. 30,000 years back, a man that could light a fire would have been considered intelligent, and that’s hardly the case in 2019. Or if you do not know how to carve the kill of the day today, you will not be considered less intelligent, but you would be hard pressed to find a date 30,000 years back.
Despite these complications with defining intelligence, EU’s definition document discusses the notion of rationality. The ability to be rational is a significant component of intelligence. We could therefore use rational behavior capability as a measure to determine if a system exhibits intelligence.
Rational behavior means that, (a) given a goal to achieve and information about the current environment, we (b) use the knowledge at our disposal and (c) choose the best course of action to achieve that goal. Note that ‘best course of action’ implicitly means that there could be multiple courses of action, and we are using our knowledge to pick the best one.
rationality: goal-knowledge-best course of action
Let’s use this definition of rationality and hence of intelligence on some examples.
Example 1: Rat or Cat Detector
Let’s say I installed a camera system in my backyard farm and wanted the system to tell rats from cats so that I can decide if I have a rat problem. The goal of the system is to tell if it sees a rat or a cat. This is the ‘goal’ part of rationality.
Let’s assume that I have somehow encoded the knowledge on how to tell if the image is that of a rat or a cat into my camera system. This is the ‘knowledge’ part of rationality.
The system then captures an image of the animal, and converts the image data into usable form (e.g., pixel values). It then uses the encoded knowledge to make the decision as best as it can on rat vs cat. This is the ‘best course of action’ of rationality.
This system satisfies goal-knowledge-best course of action of the definition. Therefore, it’s behaving rationally, and hence being intelligent. So this system has Artificial Intelligence. Artificial because the camera system is not a human being, but a machine.
Example 2: Calculator dividing 666 by 3.14159
Again, the calculator system also lacks a knowledge base to infer or reason the output number from. It’s simply electronic circuitry. So, it does not have AI either.
Example 3: House Pricing Model
Let’s say I found a way to encode house pricing knowledge in a formula such as price = 2000*size + 50000*school_quality - 1000*house_age…
I then built a system around this knowledge. Given a new house with features (size, quality of schools, how old the house is, etc.), the goal of the system is to predict the price of the house.
The system can use the encoded house-pricing knowledge, which happens to be in the form of a formula, and provide the best estimate of the price. This system satisfies the conditions, and hence uses Artificial Intelligence.
So we now get a gist of how to determine what is Artificial Intelligence and what is not.
So, AI is about imparting knowledge to machines, so that given a goal, the machine can use that knowledge to make a rational decision by itself. A ‘rational decision’ is the best course of action given the current state of the surroundings or the problem.
The key here is the knowledge. How does one give a machine knowledge? There are two predominant techniques to doing that. Each technique is a box inside the bigger AI box in EU’s definition. Let’s hold off on the Robotics topic for now.
The two techniques are called Machine Learning and Symbolic AI (this is labeled ‘Reasoning’ in the image above). Machine Learning is also called Subsymbolic AI, Connectionist AI, or Statistical Learning. As you are aware, Machine Learning is exceptionally popular now, and rightfully so: it has been stunningly effective in building solutions this decade.
Let’s start with the technique that is less talked about these days, but not less important — the Symbolic approach.
As the name indicates, in the Symbolic approach, we give machines knowledge by hand crafting it in some symbolic form, like text. There are a few ways to do this, as listed in the illustrations below.
Knowledge Representation & Reasoning form of AI
In the example below, we have hand crafted some knowledge.
Now let’s say, we have some information about David as shown below
Now, given this information, our goal is to answer some questions about David.
Is David over 18 years old?
Can I serve Hamburgers to David?
In a Symbolic AI system, the machine will take the input about David, and the goal of answering the questions. Then, it will use the Symbolic handcrafted knowledge and ‘reason’. The machine can start with the input that David’s a good driver, and then go through the knowledge map. From the hand-crafted knowledge, the machine can see that for him to be a driver, he needs a license, and to have a license he needs to be at least 21 years old. So David is over 18 years old. Similarly, it can reason the answer for the other question as well that I shouldn’t serve Hamburgers to David.
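To make the reasoning step concrete, here is a minimal sketch of how a machine could walk through hand-crafted knowledge. Only the chain from driver to license to age comes from the example above; the fact names, the RULES list, and the forward_chain function are my own assumptions for illustration, not how any particular Symbolic AI product works.

```python
# Hand-crafted knowledge: each rule says "if the premise holds, so does the conclusion".
# The fact names are hypothetical labels chosen for this sketch.
RULES = [
    ("is_good_driver", "is_driver"),
    ("is_driver", "has_license"),
    ("has_license", "age_at_least_21"),
    ("age_at_least_21", "age_over_18"),
]


def forward_chain(facts, rules):
    """Keep applying rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Input: all we are told is that David is a good driver.
david_facts = forward_chain({"is_good_driver"}, RULES)
print("age_over_18" in david_facts)  # True
```

The machine never stored David’s age. It derived the answer by chaining rules, which is also why every conclusion can be traced back to the rules that produced it.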
Now imagine we expand the knowledge base to include all the knowledge about the world. Maybe the machine can participate in a quiz competition like Jeopardy and beat the all-time human champions? That’s exactly what happened.
Search and Optimization form of AI
Another way to impart knowledge to a machine is to give it information about a particular environment: what is allowed, what is not allowed, what the best outcome towards a goal is, and so on. Take Chess, for example. Let’s say we give the machine information about what an allowed move in Chess is and, when comparing board layouts, which layout is more beneficial and which is less.
With that information, we can have the machine play chess. If you are familiar with Chess, given a board position, there are many options one can play.
We have given the machine knowledge on how to evaluate a given board layout. The evaluation metric could simply be assigning weights in the order of importance of the pieces and evaluating the board based on total weights.
The machine can then try each possible move. For each move, it assumes that the opponent will play the best available reply. Given the opponent’s reply, it evaluates all of its own possible moves, and so on. So the machine takes one possible move and keeps going down the tree of choices up to a certain depth (let’s say 3 moves), then assigns a score to that line of play based on the pieces left on the board. It repeats this for each of the other possible moves.
This way, the machine is searching through a large number of solutions and picking the optimal one. (This is similar to how we humans would play chess. We look ahead to the extent our mind allows). Since the machine is searching through a large number of solutions, this technique is often called Search & Optimize.
Optimize comes from the fact that, in general, a Search technique will require a way to optimize. For example, in chess, the number of solutions gets unmanageable, especially if you look ahead more than a few moves. In such a situation, it is very time consuming for the machine to evaluate each and every possible option. We impart some knowledge on how to Optimize so that it can skip some of the solutions and still be effective.
In summary, we give the machine a model of ‘Chess’. And then we use that model to have the machine make decisions when it is playing chess.
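Here is a minimal sketch of the look-ahead described above. It is not a chess engine: the toy tree, its leaf scores, and the names minimax and best_move are assumptions I made for illustration. The skipping of options is done with alpha-beta pruning, one common technique that this post does not otherwise name.

```python
# A toy game tree, two moves deep: each inner list holds the opponent's replies
# to one of our moves, and each number is a board score from our point of view.
# The scores are made up for illustration.
TOY_TREE = [[3, 5], [2, 9], [4, 6]]


def minimax(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Score a position, assuming both sides always pick their best option."""
    if isinstance(node, int):  # leaf: the evaluation of the board
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # we already have a better option elsewhere
            break
    return best


def best_move(tree):
    """Return the index of our move with the highest guaranteed score."""
    scores = [minimax(replies, False) for replies in tree]
    return max(range(len(scores)), key=scores.__getitem__)


print(minimax(TOY_TREE, True))  # 4
print(best_move(TOY_TREE))      # 2
```

In the second branch, once the opponent has a reply worth 2, the machine stops looking at the reply worth 9, because the first branch already guarantees 3. That skipped evaluation is the Optimize part.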
Planning & Scheduling form of AI
Planning and Scheduling is an extension of search and optimize. In Planning and Scheduling, we give the machine knowledge similar to what we gave it in search and optimize. Except that here, the problem scenario requires planning not just one move but a few steps, and scheduling those steps. Scheduling simply means deciding when to execute each step.
Let’s look at an example of building parts of AI for autonomous vehicles.
Our autonomous car needs to take this Exit 7. The car is in the middle lane, and there’s a BMW in the right lane. (I use BMW just so that I don’t have to keep saying car A and car B). Our autonomous car has a few solution options:
Slow down and let the BMW pass us. Then change lane towards the exit
Speed up and get ahead of the BMW, and then change lane in front of the BMW
Signal, and wait to see the BMW’s response: does the BMW slow down to give way to us?
All options are unsafe. Skip the exit altogether and plan to take the next exit. (I know, who does that!)
As with Chess, we have given our machine the ability to itemize options (the solution space), and information about speeds and trajectories. The machine can then use that information to evaluate each of the options and assess the best option, given the criteria. The evaluation criteria could be a weighted average of various factors such as time to destination, safety, and fuel consumption.
As with the Search and Optimize approach, our AI will search through the options, pick the best one, and then plan and schedule. The plan could be (a) accelerate now for 2 seconds to reach a speed of 63 mph; (b) at the third second, give the right turn indicator; (c) at the fourth second, change lane; (d) at the eighth second, take the exit.
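The weighted-average evaluation can be sketched in a few lines. Everything numeric here is an assumption of mine: the weights, the 0 to 1 ratings (higher is better), and the option labels are invented to show the mechanics, not measured from any real vehicle.

```python
# Hypothetical importance of each factor; the weights sum to 1.
WEIGHTS = {"time": 0.3, "safety": 0.6, "fuel": 0.1}

# Hypothetical ratings for the four options, each on a 0 to 1 scale.
OPTIONS = {
    "slow down and merge behind the BMW": {"time": 0.6, "safety": 0.9, "fuel": 0.8},
    "speed up and merge ahead of the BMW": {"time": 0.9, "safety": 0.5, "fuel": 0.4},
    "signal and wait for the BMW": {"time": 0.7, "safety": 0.7, "fuel": 0.7},
    "skip the exit": {"time": 0.1, "safety": 1.0, "fuel": 0.2},
}


def score(ratings, weights):
    """Weighted average of the ratings for one option."""
    return sum(weights[factor] * value for factor, value in ratings.items())


def pick_best(options, weights):
    """Search all options and return the name of the highest-scoring one."""
    return max(options, key=lambda name: score(options[name], weights))


print(pick_best(OPTIONS, WEIGHTS))  # slow down and merge behind the BMW
```

Change the weights and the choice changes with them. If time mattered far more than safety, speeding up would win, which is why choosing the criteria is itself part of the knowledge we hand the machine.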
Limitations of Symbolic AI
As powerful as Symbolic AI has been, there are some limitations, and it has inherited an unfortunate name of Good Old Fashioned AI (GOFAI). Talk about getting dumped!
In our example of Knowledge Representation AI, it is very difficult to represent all the knowledge in the world. Even if we are trying to represent a very narrow domain (let’s say Car Insurance), there is just too much to represent, and there is additional work every time the information changes or new information is added.
In our example of Search & Optimization or Planning & Scheduling, not all problem scenarios will have clear indications on the possible solution space or a possible evaluation approach of potential solutions.
These techniques therefore become very challenging to use, if not outright impossible when considering problems such as: Can we build an AI solution to predict the price of houses? There’s no clear way to model a house price like we modeled Chess.
Machine Learning (Subsymbolic AI)
Enter Machine Learning, also called Subsymbolic AI, or Statistical Learning, or Connectionist AI. With the availability of large amounts of data and virtually unlimited computing, we are seeing the incredible power of this approach to AI.
Let’s start with a frivolous human comparison. There are two means by which we exhibit our intelligence. One is through accumulated knowledge. We have knowledge that red apples are sweet and green apples are tart.
The second is what we have learned from examples or experience. We learned by identifying patterns. We know that apples with brown spots are most likely rotten. We have learned from experience to detect the nature of the spots that indicate rotten apple, versus spots that are just coloration of the skin of the apple.
Symbolic AI that we have been talking about so far, is like the accumulated knowledge part. There’s some knowledge and it is used to reason.
Machine Learning is like the learning from experience part. Once we have seen enough examples of apples with spots, we know which spots were indications of rotten apples and which ones were just colorations. We are then able to generalize in our mind about what spots are most likely signs of rotten apples.
Example: House Pricing
No machine learning discussion is complete without the canonical house-prices example.
Our goal is, given some features or information about a house, can we build an AI system that will tell us what the price (selling price) of the house could be?
We intuitively know that this cannot be represented as hand-crafted knowledge, as there are no hard and fast rules like there were in chess. We know bigger houses command higher prices, but there are just too many dimensions. School districts matter, as do city vs rural area, crime data, the taxes, how many bedrooms, how many baths, what the flooring is like, and so on.
What if we look at some examples of houses that were sold recently and see if we can extract a formula that could explain the prices of those houses? For the formula to explain the prices of those houses that are in our examples, the formula must output a price that is not very different from the actual selling price of the house. Or to put it differently, the formula, when used on our examples, should output prices that have low errors. For example, if the actual price of the house was 650,000, we want our formula to output, say, 649,900. So the error is only $100.
Let’s say we have the data above. We have some features of the house: Size of the house, school rating, which is on a scale of 0–5 with higher number indicating good schools in the area. House age is how old the house is in years. The actual selling price of the house is in USD.
We know that house prices are driven by these features: size, quality of schools in the area, and how old the house is. Obviously, there are hundreds of other features, but for the sake of our example, we’ll just assume these three. So maybe we can approximate the price of the house as
price = w1 * size + w2 * school rating + w3 * house age
where the numbers w1, w2, and w3 determine how much the price depends on each of the features. If the price depends a lot on the school rating, maybe w2 will be a high number. If the price gets lower for older houses (i.e., higher age), maybe w3 should be a negative number.
Now the problem statement is to find w1, w2, and w3 such that when the formula is applied to my data, the price it outputs has small errors, that is, it’s close to the actual price.
We can do this by trying different numbers for w1, w2, and w3. Let’s say I tried many numbers and I arrived at the following values: w1 = 200, w2 = 50,000 and w3 = -1,000.
With these numbers, the formula outputs prices that are each 10,000 lower than the actual prices. So if I add a constant value of 10,000 to the formula, it now outputs the price of each house correctly.
Now we have a formula that is able to explain the prices of the houses sold recently. With some assumptions and testing, we can conclude that given that this formula is explaining prices in the past, it can predict prices of future houses.
We now have the house prices knowledge in the form of a formula. Our prediction machine will carry this formula as its knowledge. When presented with the goal of predicting the price of a new house, given some features, it’ll use this knowledge (i.e., apply the formula), and output the predicted price.
At a very high level, this is how machine learning works. The effort lies in discovering the weights w1, w2, and w3. We try millions and millions of combinations of weights, and we use a computer to do that. Besides, with calculus and other techniques, there are systematic ways of discovering the appropriate weights.
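As a rough illustration of one such systematic way, the sketch below uses ordinary least squares to find the weights of a formula of the form price = w1 * size + w2 * school rating + w3 * house age + constant. The six houses are made up: I generated their prices from the weights we arrived at above (200, 50,000, -1,000, plus the constant 10,000) so that we can check whether the method recovers them. The function name fit_weights and the use of exact fractions are my own choices for this sketch.

```python
from fractions import Fraction

# Hypothetical training data: (size in sq ft, school rating 0-5, house age in years).
HOUSES = [
    (2000, 4, 10),
    (1500, 3, 20),
    (2500, 5, 5),
    (1800, 2, 30),
    (3000, 4, 15),
    (1200, 1, 40),
]
# Selling prices in USD, generated from 200*size + 50000*rating - 1000*age + 10000.
PRICES = [600000, 440000, 755000, 440000, 795000, 260000]


def fit_weights(rows, prices):
    """Least squares: solve (X^T X) w = X^T y for the weights w."""
    # Append a constant 1 to each row so the last weight is the constant term.
    X = [[Fraction(v) for v in row] + [Fraction(1)] for row in rows]
    y = [Fraction(p) for p in prices]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination; exact fractions avoid rounding error.
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n] for row in A]


w1, w2, w3, constant = fit_weights(HOUSES, PRICES)
print(float(w1), float(w2), float(w3), float(constant))  # 200.0 50000.0 -1000.0 10000.0
```

Real data will never fit this perfectly. The same calculation then returns the weights with the smallest total squared error rather than zero error.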
Deep Learning (or Neural Networks)
One of the key reasons for the explosion of the Machine Learning approach has been the power of deep learning. Given that this article is not about how deep learning works, I will keep it brief: it is loosely modeled after how neurons are connected in a human brain.
A deep learning network contains layers and layers of such artificial neurons, much like how a human brain contains interconnected neurons.
Deep Learning and Neural Networks are often used interchangeably. The accurate distinction is that Deep Learning is a Neural Network with more than one hidden layer. We saw in the house pricing example that there is one weight per feature (e.g., w1 for the size of the house). Because of the way a Neural Network is structured, each feature is associated with a very large number of weights, in combination with a very large number of other features. This enables the NN to retain more information about the data that it is trained on.
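To show what “each feature is associated with many weights” looks like, here is a toy network with three inputs, three hidden neurons, and one output. All the weight values and the scaled input features are made up, the ReLU activation is my assumption, and I left out bias terms to keep it short, so treat this as a sketch of the structure only.

```python
def relu(x):
    """A common activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)


def forward(features, hidden_weights, output_weights):
    """One pass through a network with a single hidden layer."""
    hidden = [
        relu(sum(w * x for w, x in zip(neuron_weights, features)))
        for neuron_weights in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


# Made-up weights: every hidden neuron has one weight per input feature.
HIDDEN_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.6, 0.4, 0.9],
]
OUTPUT_WEIGHTS = [1.0, -1.5, 0.7]

# Hypothetical features (size, school rating, house age), scaled to 0-1.
scaled_house = [0.2, 0.8, 0.1]
prediction = forward(scaled_house, HIDDEN_WEIGHTS, OUTPUT_WEIGHTS)

weight_count = sum(len(ws) for ws in HIDDEN_WEIGHTS) + len(OUTPUT_WEIGHTS)
print(weight_count)  # 12
```

Even this toy has 12 weights for 3 features, against the 3 weights of the plain formula, and each feature now influences the output through every hidden neuron.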
Reinforcement Learning is an incredibly powerful technique, which when married with deep learning, has been yielding some unbelievable results the past few years. In reinforcement learning we teach the machine knowledge (again, in the form of mathematical formulas if we are using deep learning along with reinforcement learning) by teaching like we would teach a child — rewarding good (well, expected or preferred. Good is subjective!) behaviors and penalizing bad behaviors.
Let’s say we want to teach a robot to tidy up a room. We use a system of rewards and penalties. The software is given the objective of maximizing the positive rewards and minimizing the negative penalties. When the robot picks up an object it is supposed to pick up, it receives a signal indicating a positive reward, and vice versa. When the software is allowed to do this a very large number of times, it learns to pick up only the items it is supposed to pick up.
A ground-breaking example of reinforcement learning is this work by DeepMind (Google), Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. The authors were able to build a super-human level chess player by letting the machine play against itself and learn from rewards and penalties.
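Here is a heavily simplified sketch of that reward loop. It is a one-step simplification, closer to what is called a contextual bandit than to full reinforcement learning, and the item names, reward values, and parameters are all assumptions of mine. The robot sees an item, decides to pick it up or skip it, and nudges its estimate of that action’s value towards the reward it received.

```python
import random


def train(items, is_target, steps=3000, epsilon=0.2, learning_rate=0.5, seed=0):
    """Learn by trial and error which items are worth picking up."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    actions = ["pick", "skip"]
    value = {item: {action: 0.0 for action in actions} for item in items}
    for step in range(steps):
        item = items[step % len(items)]
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore: try something at random
        else:
            action = max(actions, key=lambda a: value[item][a])  # exploit
        if action == "pick":
            reward = 1.0 if is_target[item] else -1.0  # reward or penalty
        else:
            reward = 0.0  # skipping earns nothing
        # Move the estimate for this action a step towards the observed reward.
        value[item][action] += learning_rate * (reward - value[item][action])
    # The learned behavior: for each item, the action with the highest value.
    return {item: max(actions, key=lambda a: value[item][a]) for item in items}


policy = train(
    items=["toy", "sock", "lamp"],
    is_target={"toy": True, "sock": True, "lamp": False},
)
print(policy)  # {'toy': 'pick', 'sock': 'pick', 'lamp': 'skip'}
```

Nobody told the software that lamps should stay where they are. It found that out from the penalty alone.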
If we were to compare Machine Learning to Symbolic AI, in Machine Learning, we represent knowledge mathematically, in the form of a formula (there are some exceptions to this in non-parametric models). In symbolic AI, we are representing knowledge as symbols, for example, text.
Limitations of Machine Learning approach
Since the machine learning approach looks for patterns in the data, the results are only as good as the data that is used to learn the knowledge. If the data has bias in it, the knowledge will carry that bias. The model’s prediction will hence be biased. There are many examples of bias in machine learning models. My next post will be on that topic.
The other critical challenge with machine learning, specifically deep learning, is explainability. The mathematical formula gets so incredibly large that it is impossible to tell why the formula is predicting what it’s predicting. As an example, even a toy-like, trivial neural network will result in a formula like the one shown below. The formula shown below has 12 weights. Imagine this with tens of millions of weights (Ws), which is the case in real world problems.
There are some other challenges as well. The paper Deep Learning: A Critical Appraisal itemizes them.
While there are challenges, the results of deep learning cannot be overstated. Deep Learning has enabled us to do things we could not have imagined just a few years back.
A DARPA perspective on AI
I want to close this post with a DARPA perspective on AI, which aligns with our structure of discussion. I also encourage you to watch the 15-minute video on the DARPA page linked above. It is one of the most eloquently put together overviews of AI that I have seen.
If we were to map DARPA’s perspective to the EU’s AI definition, here’s what it would look like
DARPA’s third wave, called Contextual Adaptation, is about simultaneously using the two approaches to AI: Handcrafted Knowledge/Symbolic AI and Machine Learning/Statistical Learning/Subsymbolic AI. This has some critical benefits. It combines the power of machine learning with the explainability of handcrafted knowledge, thus enabling us to build models that have better explainability.
And the combination enables us to build models that are not possible with one approach alone. For example, in many cases there are limitations in how much data is available. In such cases, a combination of the two approaches is a great way to solve the problem.
We have been using this pictorial part of EU’s AI definition so far. Let’s briefly talk about the last box in the image, Robotics.
Robotics is a field that uses both approaches to AI: the symbolic and, increasingly, machine learning. For example, a robot’s vision sensors may use machine learning, while its motion actuators may use Symbolic AI (e.g., planning and scheduling).
A robot will also include various other techniques outside of the field of AI (e.g., a PID controller for physically moving parts of a robot) to integrate it into the physical world. Because of this use of non-AI techniques, the robotics box sticks out of the overall AI box in the definition picture.
Here is the verbal definition of AI from EU’s definition document. While the language below is verbose, it should now make sense.
The formal definition:
“Artificial intelligence (AI) systems are software (and possibly also hardware) systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal. AI systems can either use symbolic rules or learn a numeric model, and they can also adapt their behavior by analyzing how the environment is affected by their previous actions.
As a scientific discipline, AI includes several approaches and techniques, such as machine learning (of which deep learning and reinforcement learning are specific examples), machine reasoning (which includes planning, scheduling, knowledge representation and reasoning, search, and optimization), and robotics (which includes control, perception, sensors and actuators, as well as the integration of all other techniques into cyber-physical systems).”
In closing here’s my annotation of the definition
You mean apart from this absolute clarity on what is AI?
The takeaway is about what DARPA’s perspective calls the third wave of AI. If you are serious about cutting-edge AI, building hybrid/combination models that combine the two approaches is the future.