What is machine intelligence & how can we measure it?

Pranav Budhwant
Published in binaryandmore · 7 min read · Dec 20, 2018

Introduction

The field of Artificial Intelligence has seen tremendous development in the last two decades or so. At the heart of the discipline of AI is the idea that one day we will be able to build machines that are at least as intelligent as humans. Such systems are often referred to as Artificial General Intelligence.
This goal raises fundamental questions that challenge our understanding of intelligence: What exactly is intelligence? Can we write down a mathematical equation that measures the intelligence of an arbitrary machine?

In the paper Universal Intelligence: A Definition of Machine Intelligence, Shane Legg & Marcus Hutter tackle this problem. They extract essential features from a number of well-known informal definitions of human intelligence & mathematically formalize them to produce a general measure of intelligence for arbitrary machines.

The essence of intelligence

Considering various informal definitions of intelligence, we observe some similarities in them:
1. Intelligence is seen as a property of an individual who is interacting with an external environment, problem or situation.
2. An individual’s intelligence is related to their ability to succeed or “profit”.
3. Intelligence is not the ability to deal with a fully known environment, but rather the ability to deal with some range of possibilities which cannot be wholly anticipated.

Bringing these key features together, Shane & Marcus come up with their own informal definition of intelligence which gives us the essence of intelligence in its most general form:

Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

A definition of machine intelligence

The informal definition mentioned above has 3 basic components: An agent, environments & goals.
The agent & the environment must be able to interact with each other by sending signals back and forth. The signals sent by the agent to the environment are called actions, and the signals sent by the environment to the agent are called perceptions.
The definition also requires there to be some kind of goal — which is nothing but the objective that the agent actively pursues by interacting with its environment.

The existence of a goal raises another problem: how does the agent know what the goal is? One possibility is that the goal is known in advance and this knowledge is built into the agent. This, however, limits the agent to a single goal, so we need a more flexible way to inform the agent of its goal. With humans this is easily done through language, but we cannot assume that the agent possesses a sufficiently high-level language for us to communicate the goal to it.
To overcome this difficulty, we define another signal that indicates how good the agent’s current situation is. We call this signal the reward. The agent’s goal is then simply to maximize the reward it receives. In some sense the goal is fixed, but we are not limiting the agent, as we have not defined what causes different levels of reward to occur.

Each perception contains a reward and a non-reward part, the observation.
Hence the goal, in a broader sense, is defined by the environment, since it is the environment that determines when and what rewards are generated.
This is a commonly used framework in reinforcement learning.

Formalizing the framework

Agent-Environment Interaction:
The agent sends information to the environment by sending symbols from some finite set which we call the action space, denoted by A. For example, A := {left, right, up, down}. Similarly, the environment sends signals to the agent using symbols from a finite set called the perception space, denoted by P. Every perception consists of two separate parts, an observation & a reward. The reward space R is a subset of the rational unit interval, [0, 1] ∩ Q. For example, P := {(cold, 0.0), (warm, 1.0), (hot, 0.2)}, where the first part of each tuple is the observation & the second part is the reward.
To denote actions, observations and rewards we use a, o and r, indexed so that a¹ is the agent’s first action, o¹ is its first observation & r¹ is its first reward. The agent & the environment take turns at sending symbols, which produces an interaction history such as o¹r¹a¹o²r²a²o³r³a³…
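To make this concrete, here is a minimal Python sketch of the interaction, reusing the example symbol sets above. The random behaviour of both sides is purely illustrative; a real agent and environment would condition on the history.

```python
import random

# A toy instance of the agent-environment framework. The symbol sets reuse
# the article's examples; the random behaviour is only illustrative.
A = ["left", "right", "up", "down"]               # action space
P = [("cold", 0.0), ("warm", 1.0), ("hot", 0.2)]  # perception space: (observation, reward)

def environment(history):
    """Toy environment: sends a perception. A real environment would
    condition on the interaction history; this one is random."""
    return random.choice(P)

def agent(history):
    """Toy agent: sends an action. A real agent would condition on the
    history; this one is random."""
    return random.choice(A)

history = []  # will hold o1, r1, a1, o2, r2, a2, ...
for _ in range(3):
    o, r = environment(history)   # environment's turn: a perception
    a = agent(history + [o, r])   # agent's turn: an action
    history += [o, r, a]

print(history)  # e.g. ['warm', 1.0, 'up', 'hot', 0.2, 'left', ...]
```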

The Agent
Formally, the agent is a function, denoted by π, which takes the current interaction history as input and chooses the next action as output. We can then represent the agent as a probability measure over actions conditioned on the complete interaction history. Thus, π(a⁴|o¹r¹a¹o²r²a²o³r³a³o⁴r⁴) is the probability of action a⁴ given the current history o¹r¹a¹o²r²a²o³r³a³o⁴r⁴. In AI the agent will be a machine, and so π will be a computable function.
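As a hedged illustration, a toy π might look like the following. The particular rule (preferring to repeat the last action) is an arbitrary assumption, chosen only to show a conditional probability measure over actions.

```python
def pi(action, history, A=("left", "right", "up", "down")):
    """A toy agent as a probability measure over actions given the history:
    pi(action | history). The 'repeat the last action' preference is an
    arbitrary illustrative choice."""
    last_action = next((x for x in reversed(history) if x in A), None)
    if last_action is None:
        return 1.0 / len(A)           # no actions yet: uniform over A
    if action == last_action:
        return 0.7                    # favour repeating the last action
    return 0.3 / (len(A) - 1)         # split the rest uniformly

# pi("up", ["warm", 1.0, "up"]) -> 0.7; the probabilities over A sum to 1
```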

The Environment
The environment, denoted by μ, is defined in a similar way to the agent. Specifically, the probability of o⁴r⁴, given the current interaction history o¹r¹a¹o²r²a²o³r³a³, is given by the probability measure μ(o⁴r⁴|o¹r¹a¹o²r²a²o³r³a³).
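Continuing the sketch, a toy μ with the same shape might be (again, the uniform behaviour is just an illustrative assumption):

```python
def mu(observation, reward, history,
       P=(("cold", 0.0), ("warm", 1.0), ("hot", 0.2))):
    """A toy environment as a probability measure over perceptions given the
    history: mu(observation, reward | history). It ignores the history and is
    uniform over the perception space, purely for illustration."""
    return 1.0 / len(P) if (observation, reward) in P else 0.0
```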

The Measure of Success
We now need to formalize the idea of “profit” or “success” for an agent. Informally, we know that the agent must try to maximize the amount of reward it receives, but this requires us to determine how we value the reward in the near future versus reward in a more distant future. In some cases we might want the agent to perform well fairly quickly & in others we might only care that it eventually reaches a level of performance that is as high as possible.
We define a value function V which, for a given agent π and environment μ, gives the expected sum of future rewards. We also want the temporal preference (how much reward in the near future is valued relative to reward in the more distant future) to be built into the rewards returned by the environment itself, rather than into the value function. Imposing an additional condition on the rewards returned by the environment gives us:

V^π_μ := E(r¹ + r² + r³ + …) ≤ 1
Here the expected value is taken over all possible interaction sequences between the agent π and the environment μ. The additional condition is that the total reward returned by the environment can never exceed 1; environments satisfying this are called reward-summable. This keeps the sum of rewards finite, and it also leaves it to each environment to decide how that reward is spread out over time, which in effect defines a temporal preference.
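As a rough sketch of how V could be estimated, assuming `agent` and `environment` are sampling functions like the toy ones above (note that the toy environment above is not actually reward-summable), one can average the summed rewards over sampled interaction sequences:

```python
def expected_value(agent, environment, episodes=1000, horizon=100):
    """Monte Carlo estimate of V(pi, mu): the expected sum of rewards over
    interaction sequences. `agent(history)` and `environment(history)` are
    assumed to be sampling functions. The formally infinite sum is truncated
    at `horizon`; for a reward-summable environment the ignored tail is small."""
    total = 0.0
    for _ in range(episodes):
        history, episode_return = [], 0.0
        for _ in range(horizon):
            o, r = environment(history)
            a = agent(history + [o, r])
            history += [o, r, a]
            episode_return += r
        total += episode_return
    return total / episodes
```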

A formal definition of machine intelligence
In order to define an overall performance measure, we need a way to combine an agent’s performance in many different environments into a single number. As there are infinitely many environments, we cannot simply take a uniform distribution over them; mathematically, we must weight some environments more heavily than others. We do this by weighting each environment according to its complexity.
We now need to measure the complexity of a given environment. For this, we use the Kolmogorov complexity. The Kolmogorov complexity of a binary string x is defined as the length of the shortest program that computes x:

K(x) := min_p { l(p) : U(p) = x }
where p is a binary string which we call a program, l(p) is the length of this string in bits, and U is a prefix universal Turing machine called the reference machine.

An important property of K is that it is nearly independent of the choice of U. To see why, consider what happens if we switch from U, in the above definition of K, to some other universal Turing machine U′. Due to the universality property of U′, there exists a program q that allows U′ to simulate U. Thus, if we give U′ both q and p as inputs, it can simulate U running p and thereby compute U(p). It follows that switching the reference machine changes K(x) by at most a constant (roughly the length of q) that does not depend on x. This near-independence makes K an excellent universal complexity measure.
Using a simple encoding scheme, we can express each environment as a binary string which is the description of the environment. This lets us define the complexity of an environment μ to be K(μ). We will use this complexity of the environment as a weight associated to the future value of a given environment.
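Kolmogorov complexity itself is not computable, so nothing can implement K exactly. As a loose illustration of the underlying idea (a shorter description means lower complexity), one can compare compressed lengths, which serve only as crude upper-bound-style proxies:

```python
import os
import zlib

def compressed_length_bits(x: bytes) -> int:
    """A crude, computable stand-in for the idea behind K(x): the length in
    bits of one particular compressed description of x. This is only an
    upper-bound-flavoured illustration; K itself is not computable."""
    return 8 * len(zlib.compress(x, 9))

print(compressed_length_bits(b"ab" * 500))       # very regular -> few bits
print(compressed_length_bits(os.urandom(1000)))  # incompressible -> ~8000+ bits
```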

We can now define the formal measure of intelligence for arbitrary machines as:

Υ(π) := Σ_{μ ∈ E} 2^(−K(μ)) · V^π_μ
where E is the space of all computable reward-summable environmental measures with respect to the reference machine U, K is the Kolmogorov complexity function, and Υ(π) is the universal intelligence of the agent π.

In other words, the universal intelligence of an agent is its expected performance summed over a wide range of environments, with each environment μ weighted by 2^(−K(μ)), so that simple environments count more towards the score than complex ones.
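As a heavily simplified sketch of the definition’s shape: if we had a finite list of environments, each paired with its complexity in bits and a routine returning the agent’s expected value in it, Υ would be the weighted sum below. In reality E is infinite and K is not computable, so this is only an idealised approximation, with all names assumed.

```python
def universal_intelligence(agent, environments):
    """Upsilon(pi) ~= sum over environments mu of 2^(-K(mu)) * V(pi, mu).
    `environments` is assumed to be a list of (complexity_bits, value_fn)
    pairs, where value_fn(agent) returns the agent's expected value V in that
    environment. A finite list and exact complexities are idealisations."""
    return sum(2.0 ** (-k) * value_fn(agent) for k, value_fn in environments)
```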

References

Legg, S., & Hutter, M. (2007). Universal Intelligence: A Definition of Machine Intelligence. arXiv:0712.3329v1.

If you learnt something new from this article, please hit the 👏 icon to support it. This will help other Medium users find it.
Share it, so that others can read it.
