A brief overview of the quest for machine intelligence

Published in

Incogito

12 min readApr 26, 2017

When a student discover the field of AI in its current form, it is hard to imagine the incremental discoveries that led to its modern formalisms. In this short summary of the story of AI I will try to present key concepts in its evolution through the history of search algorithms in discrete and deterministic state-spaces.

Problems in discrete worlds

Our conception of an intelligent agent is historically one that can solve problems. One classic type of problem consist of a relaxation (approximation) of our continuous and infinite reality into discrete state worlds.

Such worlds are defined by a certain number of clearly defined states, the “state-space”, and a set of actions to go from one state to the next. Those problems constituted the core of AI research for the most part of 20th century and still play a very central role for intelligent systems.

In such approximated spaces, a problem can be defined by deciding on a initial state, and a goal state. Those problems become complex exponentially fast due to the combinatorial explosion of the possible sequence of actions to reach a certain goal as the number of states and actions increase.
Games for example are naturally described in term of such discrete problems . They are one of the main area of research of AI, and victories against world champion are key landmarks for its evolution. In chess the victory of Deep Blue against Garry Kasparov in 1996 shook the world, as for the first time the supremacy of the machine over man, in a game of “intelligence”, was publicly recognised.

In the game of Go, supposed to be decades away to be beaten due to the shear complexity of the state space, saw in 2016 the victory of Alpha Go against Lee Sedol, a 9th dan Go player.

Besides games, solving problems (such as the 15-puzzle invented by Noyes Chapman in 1870, the 8-queens problems, or the missionaries and cannibal problems etc.) was also a strong incentive and benchmark for algorithm development.

Search in an atomic state space

Research on state space search originated in the 50’s with work of Newell and Simon’s “The Logic Theorist” (1957) and GPS (1961). As an example, one of the simplest of the “uninformed search algorithm”, breadth first search was formulated by Moore in 1959.

Fig2. Two example of uninformed search algorithm

The breadth-first search algorithm, that explore every branch at each level of the tree before going deeper, will always find the optimal solution given that it exists and that the algorithm can run for an unlimited amount of time with unlimited memory.

An uninformed search algorithm is one for which the structure of the problem does not inform which branch of the tree should be explored, the different algorithms will always explore the tree in a specific way whatever the problem (i.e breadth first search will always explore the tree from left to right, one level at a time, in the order given initially).
Research went on in the 60’s and a strong theoretical framework was solidified by Nilsson in 1971 in his book “Automated Problem Solving”. Algorithm got better with time, for example through caching of previously observed state such as in graph search, or in A* search (Hart, Nilsson and Raphael, 1968) by combining both a notion of the explored path cost and an approximation of the distance remaining to the goal.

Numerous alteration to classic algorithm were tested and published, such as Pohl dynamically weighted A* search (1973). Of course, besides the algorithm, available computation power has been increasing exponentially and problems that were intractable for computers in 1960 became progressively accessible.

Factored Representation

Not all problems are created equals and some classes of complex problems, such as NP-Hard problems for which an optimal solution is found only at a cost growing exponentially with the number of parameters (Non-deterministic polynomial time problems). Those harder problems benefited from research into what is called “informed search”. Informed in the sense that the structure of a specific problem will affect the choice of the search algorithm in its exploration of the state space. Those algorithms use what is referred to an “heuristic” to give a value to the states they have explored that should approximate their distance to the goal. It was proven early on that relaxation of problems, by reducing the number of constraints, allow to define accurate heuristic that still encode relevant informations about the problem. For example, the 15-puzzle problem has the constraint that two numbers cannot occupy the same cell, and that a number can only move to a neighbouring cell. By “relaxing” the problem, relieving it of some of its constraints, one can build an heuristic of the distance remaining to the goal without having to explore the complete tree.

Fig3. Only the 7 is in the right position

In the problem presented in Fig 3. if the two constraint are relaxed the heuristic will simply be number of misplaced number, here 14 given that only the 7 is in the right position.

For the Fig 4. the same heuristic would give 2, which is understandably a better value to this state. This is an example of when such approximation fail, indeed this approximated heuristic would mislead us into thinking this is a better state, but in reality this specific configuration is unsolvable.

The goal of those approximation heuristics is to guide the search and reduce the number of explored branch. And here also the march of science is moving at an iterative but assured pace, classic problems such as the Traveling salesperson for example that has been proven NP-Hard by Karp in 1972, has been relaxed into a full polynomial approximation in 1998 by Aurora. In this context, a polynomial approximation means that the search using this approximated heuristic takes a time that grows proportionally to a polynomial of the number of parameters as opposed to exponentially with the number of parameters.

Approximations and Heuristics

While the state-space representation is enough to describe any discrete system, the language through which such state space is defined also evolved to represent specific class of problems. Factored representation consider the states has a collection of variables instead of an atomic element (i.e. each state is indivisible). Those representations and the algorithms developed around it integrate at least in part the structure of the problem and help reduce the state-space search.
One such type of problems are the constraint satisfaction problems. In such problems the variables that define the states constrain each other, and the goal is then to find a set of values for those variables that meet the constraints.

An example of CSPs is the old mathematical problem of graph coloring. If represented on a map, the problem consists on coloring each country with a limited number of color, say red, blue, and green, in such a way that no two neighboring countries have the same color.

One characteristic of such problems is that the order of actions has no effect on the outcome, making the solution an unordered combination of variable values. A famous algorithm to solve called backtracking, solves those kind of problem by moving back up the tree each time the constraints cannot be met to expand a sibling of the current node from a parent node. If that parent node had no admissible child node (i.e. which respects the constraints) the algorithm would back up once more and so on until a solution is found or the problem is proven infeasible. In this way the search is more cost effective and a large portion of the search space is eliminated.

Fig6. Short video where you can see the trial and error of a backtracking algorithm

This algorithm was formulated by mathematicians in the 19th century but implemented for solving CSP by Bitner and Reingold in 1975.

Knowledge-based agents

Closer to how we represent problems, other representations can formalize higher order interactions between variables, or allow for the definition of objects enriched with properties and functions upon those objects.
This knowledge based representation allows to solve complex problems in which actions and states can be defined by a certain knowledge of the world. Agents implementing such representations are referred to as knowledge-based agents, they are better suited to solve real world problems and because they can reason about the world based on a set of higher level rules, a structured knowledge, they can build a dynamic state space (declarative approach), and not have the search space hard coded initially (procedural approach). Usually the goal of those representation is to be able, inside of the framework of the language, to derive new unknown facts from other known facts.

One example of a declarative representation language is well… our natural language. For example in english you could say “If you drop an object, it falls to the ground”, and that will inform you on any situation, anywhere, with any object. You would know, if you believed me, that if you drop it it will fall to the ground and you could predict the future for any object dynamically. In a procedural language, taking our atomic state-action world for example, you would have to create a state for every object in the world like: “Has apple in hand”, “Has lemon in hand”, “Has computer in hand” etc. Then you would have to create every action for every object you were holding: “Has dropped apple” (…) and finally you would have to create every new state “Has no apple in hand and apple is on the ground” (…). You would need all those states and action to make inference, that is predicting how the world would evolve, and which state you will be in after the action is done. We easily see how structured representation might come handy in the real world, and for our sanity.

The beauty of the human brain is our one shot learning. This should be the subject of another post, because it is fascinating in itself. But imagine you are on a distant planet, you see your buddy walk on a weird blue algae and immediately die in agonizing pain. To help you avoid such fate in the future this horrid vision surely would start a sequence of learning through reflection, reviviscence, and inferences. You would revisit everything that happened leading to that event. You will think of everything your buddy did, every little element of the environment, and you would flag those actions in that environment as dangerous. But you would also go higher up, you would identify the algae by its shape, form, and color. But at the same time, if you were to see a red algae instead of a blue one, you would not walk on it. You would also probably infer that it is not ok to touch the algae with any appendix of your body, or try to smell it. And finally you would probably make a mental note that there are things on the planet that can kill you and you should be cautious of everything out of the ordinary. This vast knowledge, aquired from a single visual stimuli, would enable you to synthesize your knowledge with a simple:

“DANGEROUS ALGAE. YOU MIGHT DIE.”

That is one of my main opposition to people who say that human communication is slow. It is true that the bitrate might seem slow, but it is misleading to think we are slow communicators, because our language is the expression of the shared structure of our brain and common memories, which contains a vast amount of information. In that sense natural language is not only a beautifully compressed form of communication, but it is a partial form of communication that makes the bet that a common structure is present at the receiving end and that it will be able to deploy the same knowledge as you from a partial message. You do not have to say “do not walk on the algae” or do not smell it or what not. If the receiver has the same past experiences as you, or if he knows that you do not usually joke about weird algae, he will get it and will build by himself those inferences.*

The language of logic, as another important example, is one able to define a certain domain of knowledge about the world. The first known systematic study of logic dates back to Aristotle in posthumous treaty Organon assembled in 322 B.C. Since then logic was formulated into a formal mathematical system by George Bool in 1847, and the first comprehensive exposition of propositional logic (and first order logic) was developed by Gottlob Frege in 1879. McCarthy in 1958 introduced first order logic as a tool to build AI systems and Robinson’s in 1965 developed a complete procedure for first order inference. The two main algorithms developed to solve first order logic problem, the forward (start from the initial state) and backward (start from the goal) chaining were developed around 1970.

Research in the field of logic helped develop more powerful factored representation language based on it such as the Planning Domain Definition Language (PDDL). This formalism uses objects and functions to represent complex planning problem in which the state space is vast but strongly structured. The language was formalized by Ghallab in 1998, and has since then been used in international competition to define planning problems.

Fig7. Note that Cargo and Plane are objects and actions use variable to describe complex conditions and effects irrespective of the specific plane or cargo. © S. Russel, P. Norvig

The power of representation

The current state of AI rests upon the shoulders of giants, a long lineage of scientists from many field, philosophy, mathematics, and more recently computer science. Through history it is evident that it is how we represent problems that is the real bottleneck. The representation, the structure of problems, the language we use to talk about it comes first, then a robust solution can be found and progress can be made on the resolution algorithms.

Although some of the representation introduced here are able to code complex knowledge about the world (e.g PDDL) real world problem often escape such approximations. The real world is stochastic, continuous, ever changing, filled with interacting and unpredictable complex systems. Our inability to embed the world in its full complexity gives the renewed trend of neural networks and deep learning all its meaning. Indeed through deep learning, a knowledge that we did not know how to represent can find an embedding (i.e. a learned representation) through supervised learning in a vastly distributed and structured system of single unit processors, the neural network, probably not unlike how our brain does it. That is what they are good at, through their structure and associated weights and given some constraints of smoothness, a neural network with one layer of unbounded size could represent any function whatever its complexity.

Universal approximation theorem - Wikipedia

In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed…

en.wikipedia.org

It might not be able to learn the function during your lifetime, because learning also depend on how you make it learn, but it can in theory represent it. The rebirth of neural nets was probably spurred into action by the 2012 ImageNet challenge when Alex Krizhevsky et al. from U. of Toronto achieved far better results than the state of the art in image classification using a convolutional neural network. Since then, researchers from academia and the private sector have shown that given enough examples of quality data, and good learning parameters, deep learning can probably “embed” any sort of knowledge you could think of. The structure of the problem (the invariance in the data) extracted by the NN to form a knowledge base is often incomprehensible to us due to its distributed complexity. But it is an embedding nonetheless, a structured and approximated representation, able to guide the decision of the machine.

To conclude I believe that the future of AI will rely both on those loose and complex embedding, but also on structured knowledge gathered by mankind across history that we will share (or encode) through structured language to our learning agents. The way we will represent this knowledge seems key and rather than a declarative representation, I believe it could be structural in nature, akin to the notion of distributed representation (or the more crude connectivity patterns) as seen in the brain.

*DISCLAIMER: If you happen to be living on a weird planet with a child you will have to forbid him to go out on his own or if old enough explain him in more detail.