History of Artificial Intelligence (AI) — defining key milestones from the 4th century B.C. to 2017

Marcin Lukowicz
17 min read · May 1, 2024


In this article, I outline a detailed history of Artificial Intelligence (AI) development, going as far back as the 4th century B.C. and covering modern times up to 2017. By reading it you will learn what the key eras in artificial intelligence development are, who the key contributors to AI's progress were, and which AI achievements marked the key milestones shaping the evolution of this transformative technology.

As explained in my previous article “AI — what artificial intelligence really is, explained in detail through science”, artificial intelligence at a high level can be defined in any of three ways: 1) a set of advanced technologies that enable machines to perform highly complex tasks, with the aim of solving problems encountered by humans in the most effective manner; 2) a branch of computer science investigating different problem areas, such as speech recognition, robotics, and image and video processing, by developing computer programs exhibiting intelligent behaviour; or 3) a computer-based analytical work process in which machines aim to replicate or surpass human abilities, performing tasks considered to require intelligence if performed by humans. You can find more details on what AI is, how we categorise AI techniques and where they are currently applied in that article.

The current state of Artificial Intelligence is the sum of continuous thought-provoking discussions, versatile multidisciplinary research, and improvements through trial and error over the span of a few decades. The overall evolution of this discipline can be divided into four periods: Pre-Computer (before 1950), Beginnings (1950 to 1970s), Winters (1970s to 2000s), and Reawakening (2000s onwards). Table 1 below summarises the major contributions to the AI field, which are then discussed in detail.

Table 1 AI History Periods and Major Contributions

Pre-Computers (4th century B.C. to 1950)

Not surprisingly, the beginnings of the intellectual tradition of AI originate in ancient Greek philosophy, with Aristotle’s “Physics” and “Logic”, written in the 4th century B.C. In the former, ‘The Father of Western Philosophy’ laid the foundation for the ideas of data abstraction and symbolic calculus by making a distinction between matter and form. In the latter, he argued that the basis of knowledge lies in the study of thought (Kornienko et al., 2015). These and other contributions of the Greek philosopher became the background for studying the formal axiomatization of logical reasoning (Russell and Norvig, 2016).

In the 13th century, Raymond Lull proposed that mechanical artefacts could be used for conducting reasoning, believing that new truths could be deduced through different combinations of known concepts (Kornienko et al., 2015). Four centuries later, the “Discourse on the Method” by René Descartes became the grounding work for the modern concepts of thinking and mind. This, among his other studies, played a crucial role in connecting the different intellectual traditions of AI research.

In his quest to identify the foundations of reality and being, he proposed that the physical world, the mind, and most importantly their interactions are a “necessary condition of being; therefore, it is necessary to find a way to reunite them”, which, in a way, is what AI researchers try to achieve (Kornienko et al., 2015).

Lastly, of great significance was the early 20th-century work of Bertrand Russell and Alfred Whitehead. The “Principia Mathematica” they published was a milestone in formal logic that set the foundations for the formal representation of mathematics. By successfully using logic in the form of symbols to represent mathematical expressions, it allowed Alan Turing in 1936 to show that machines can process all forms of mathematical reasoning using 0s and 1s (Bulusu and Abellera, 2018).

The progress of thought was accompanied by developments in technology. The first mechanical calculator was conceived around 1500 by Leonardo da Vinci, and the first calculating machine, the “Calculating Clock” (1623), was constructed by Wilhelm Schickard. This preceded Blaise Pascal’s “Pascaline” (1642), capable of adding and subtracting two numbers directly (and of multiplying and dividing by repetition), and Gottfried Wilhelm Leibniz’s “Stepped Reckoner” (1694). The first successful, mass-produced mechanical calculator was the “Arithmomètre” (1820), created by Charles Xavier Thomas de Colmar, which allowed for long multiplication and division and was manufactured for almost a century. While Charles Babbage envisioned the first programmable computing machine around the same time (the 1830s), the IBM 650, the first ever mass-produced computer, came roughly 120 years later (Miklós, 2013).

The Beginnings (1950 to 1970s)

Alan Mathison Turing, the pioneering British mathematician and famous cryptanalyst (code-breaker), is widely considered to be the father of theoretical computer science and artificial intelligence (Beavers, 2013). He is well known for developing the principles of the modern computer with the invention of the Turing Machine in 1936, and for playing a major role in breaking ciphers during the Second World War, which inspired the historical drama “The Imitation Game” (Hodges, 2012). It was in 1950 that Turing wrote his philosophical paper on machine thinking, “Computing Machinery and Intelligence”, launching and inspiring much of the development and philosophy of Artificial Intelligence (Paschek et al., 2017; Hall and Pesenti, 2017). In it he explored the meanings of “machines” and “thinking” and asked whether a program can be truly intelligent. He also invented what came to be known as the Turing test: an experiment in which a human communicates via a computer terminal with two unseen partners, a program and another person (Turing, 2009; Hall and Pesenti, 2017). At the end, the test person must decide which one is human. Turing proposed that if a machine conducts a conversation well enough that it cannot be distinguished from a conversation with a human being, the machine could be said to be “thinking”.

The very earliest AI research aimed to explore Turing’s idea. To do so, various tools and techniques were developed, focused on symbolic programming and initiated by McCarthy’s group with the paper “Recursive Functions of Symbolic Expressions” (McCarthy, 1960). The key idea they introduced, in the words of the authors, was that “People write formulas that say how to manipulate formulas”. Therefore, in addition to formulating functions of numbers and strings of data, we can also formulate functions of expressions, allowing programs to write programs and to treat themselves and other programs as data.

In fact, many special-purpose languages that let machines manipulate expressions in their own programming languages were written at that time, including LISP in the US as well as POP-2 and Edinburgh Prolog in the UK (Hall and Pesenti, 2017).
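
To make the “formulas that manipulate formulas” idea concrete, here is a minimal, purely illustrative Python sketch (not period LISP): expressions are represented as nested tuples, and a function treats those program-like structures as data it can rewrite.

```python
# A minimal sketch of "programs as data": expressions are nested tuples, and
# simplify() is itself a formula that manipulates formulas. Illustrative Python,
# not the original LISP notation.

def simplify(expr):
    """Recursively apply rewrite rules such as x + 0 -> x and x * 1 -> x."""
    if not isinstance(expr, tuple):          # a number or a symbol like 'x'
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == '+' and right == 0:
        return left
    if op == '*' and right == 1:
        return left
    if op == '*' and right == 0:
        return 0
    return (op, left, right)

# ('+', ('*', 'x', 1), 0) represents the expression (x * 1) + 0
print(simplify(('+', ('*', 'x', 1), 0)))     # prints: x
```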

Ever since the beginnings of the AI field, the public’s attention has been directed towards AI performance in games as an indicator of its progress. Christopher Strachey was the first to write a programme to play draughts, which first ran successfully in 1952; he also wrote a combinatory algorithm that produced love-letter poems, considered the first piece of digital literature (Wardrip-Fruin, 2005).

Another early success was MENACE (Machine Educable Noughts And Crosses Engine), developed by Donald Michie in 1961: a machine composed of 304 matchboxes filled with coloured beads that played Noughts and Crosses and progressively learned to become unbeatable (one could at best draw against it) (Michie and Chambers, 1968).
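
A minimal sketch of the matchbox idea, in Python rather than matchboxes: each board state gets a “box” of beads (one colour per legal move), moves are drawn with probability proportional to the beads, and after each game beads are added or removed depending on the result. The class name, reward values and usage below are illustrative assumptions rather than Michie’s exact parameters.

```python
import random
from collections import defaultdict

class Menace:
    """A MENACE-style learner: one 'matchbox' of beads per board state."""

    def __init__(self, initial_beads=4):
        self.initial_beads = initial_beads
        self.boxes = defaultdict(dict)            # state -> {move: bead count}

    def choose_move(self, state, legal_moves):
        box = self.boxes[state]
        for move in legal_moves:                  # seed the box on first visit
            box.setdefault(move, self.initial_beads)
        moves, beads = zip(*box.items())
        return random.choices(moves, weights=beads)[0]

    def learn(self, history, result):
        """Reinforce the beads used this game: add for a win, remove for a loss."""
        delta = {"win": 3, "draw": 1, "loss": -1}[result]
        for state, move in history:
            box = self.boxes[state]
            box[move] = max(1, box.get(move, self.initial_beads) + delta)

# A game driver would record (state, move) pairs and call learn() once per game.
agent = Menace()
move = agent.choose_move("X..|...|...", legal_moves=[1, 2, 3, 4, 5, 6, 7, 8])
agent.learn([("X..|...|...", move)], result="win")
```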

Turing’s work indeed started a very vivid discussion, which led to the first academic conference on the subject, an eight-week summer workshop held in the summer of 1956 at Dartmouth College and attended by some of the key thinkers of the time, including John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon. To describe it, the term “artificial intelligence” was coined by McCarthy, who defined it as “the science and engineering of making intelligent machines, especially intelligent computer programs” (McCarthy, 2007).

During the 1960s a variety of more practical approaches and areas were explored by scientists, including automated reasoning processes used for problem-solving; developments in robotics, used for process automation; and natural language processing, used for retrieving information from documents and for word-sense disambiguation. Indeed, the first digitally operated and programmable robot, Unimate, developed by George Devol in 1954, became the world’s first industrial robot when it was put to use on a General Motors assembly line in 1961 (Robotics Online, 2018).

Three years later, the natural language processing (NLP) work of Joseph Weizenbaum gave birth to ELIZA, the world’s first chatbot, which simulated conversations through “pattern matching” and substitution methods (Weizenbaum, 1976; Norvig, 1992).
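
The pattern-matching-and-substitution idea can be illustrated with a tiny regex-based sketch; the two rules below are invented stand-ins for Weizenbaum’s actual DOCTOR script.

```python
import re

# A toy ELIZA-style responder: match a pattern in the input, then substitute the
# captured text into a canned reply. The rules are invented examples.
rules = [
    (re.compile(r"i need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
]

def respond(utterance):
    for pattern, template in rules:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return "Please tell me more."

print(respond("I am worried about my work."))
# prints: How long have you been worried about my work?
```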

Finally, Shakey, the first robot “aware” of its surroundings, was built in 1970, and Freddy I and II in the following years were the first systems to merge vision, movement, versatility and intelligence, creating what some call the first fully integrated AI systems (Computerhistory.org, 2018).

Winters (1970s to 2000s)

Despite huge early enthusiasm and hype around artificial intelligence, the 1970s marked the start of recurring periods of research stagnation, referred to in the literature as “winters”, that lasted until the 2000s.

Overall, the attempts to build fully developed AI systems in line with the early philosophers’ principles of logic failed due to a lack of computing power, a lack of access to large enough amounts of data, and the difficulty of dealing with ambiguity and uncertainty. Despite continuous advances, significant real-world (commercial) applications were few and far between. This led to cuts in funding in the 1970s, as research backers, first and foremost the US government, grew tired of waiting (“a long winter of minimal Federal research funding ensued from 1975 to 2000”) (Chui et al., 2017). The UK government also limited its support to only three higher education institutions (Edinburgh, Sussex and Essex), partially as a result of Sir James Lighthill’s paper “Artificial Intelligence: A General Survey”, published in 1973 (Hall and Pesenti, 2017). In this report he evaluated academic research in the field and concluded that it had failed to address the issue of combinatorial explosion, crucial for real-world problem solving (Lighthill, 1973). This mattered because the dominant approach at the time, modelling complex reasoning as a search in decision trees, was exposed to this fundamental problem: the number of possible combinations that one needs to examine grows exponentially, so fast that “even the fastest computers will require an intolerable amount of time to examine them” (Tsang, 2005). Lighthill was thus sceptical about the ability of AI techniques to scale up to very complex problems, and cast a pessimistic view over the field for years to come.
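
A rough back-of-the-envelope calculation shows why combinatorial explosion was so crippling: with a branching factor b (legal moves per turn) and a look-ahead of d moves, a naive search examines on the order of b^d positions. The branching factors below are rough, commonly quoted averages, used purely for illustration.

```python
# Illustrative estimate of how naive game-tree search explodes with depth.
# Branching factors are rough averages (chess ~35, Go ~250), not exact counts.
for game, b in [("draughts", 8), ("chess", 35), ("Go", 250)]:
    for depth in (4, 8, 12):
        print(f"{game:8s} branching {b:3d}, depth {depth:2d}: "
              f"~{float(b) ** depth:.2e} positions")
```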

Nonetheless, research did not stop there.

Advances in symbolic programming enabled researchers to better understand the nature of high-level problem-solving intelligence. In the 1980s, progress in tools supporting and simulating complex expert reasoning in relatively well-structured domains led to the development of “expert systems”, which were centred around knowledge and relied heavily on the early philosophers’ concepts of formal reasoning. Expert systems, a type of knowledge-based system, were essentially computer programs that made logical inferences over sets of facts using a stored database of expert knowledge, with the aim of supporting human decision making and offering solutions to problems (Chui et al., 2017; Guo and Wong, 2013). These systems, along with advances in natural language processing and interfaces, allowed AI to find industry applications in a diverse range of fields, such as stock trading, commodity-price prediction and medicine. This in turn moved the field’s focus from producing machines that “think” to creating machines that perform tasks which, if undertaken by humans, would be said to require intelligence (Hall and Pesenti, 2017).
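
The basic inference step such systems performed can be shown with a toy forward-chaining sketch: rules fire whenever all of their conditions are present in the fact base, adding new facts until nothing more can be derived. Real expert systems were far richer (certainty factors, explanation facilities), so treat the rules and facts below as invented examples.

```python
# A toy forward-chaining inference engine: facts are strings, rules are
# (conditions, conclusion) pairs. Purely illustrative of the expert-system idea.
rules = [
    ({"engine cranks", "engine does not start"}, "fuel or ignition fault"),
    ({"fuel or ignition fault", "fuel tank empty"}, "advise: refuel the car"),
]

def infer(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                       # keep firing rules until a fixed point
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"engine cranks", "engine does not start", "fuel tank empty"}, rules))
```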

Because of this shift, the main approach to problem solving with AI became “brute force” methods, which were nothing like human reasoning. These are essentially trial-and-error methods that involve crunching enormous numbers of possibilities through exhaustive effort rather than intellectual insight. In the context of games, that means analysing every possible game scenario that could occur after every possible move and then picking the choice with the highest probability of winning. Such a process requires large amounts of computing power, which was not universally available at the time. Two techniques were therefore developed that made playing board games tractable: heuristics and production systems. The former were algorithms that apply rules of thumb, limiting brute-force searching by selecting solutions that were good enough rather than necessarily the best. The latter were programs that applied complex pre-programmed rules to manipulate and categorise symbols. Altogether, this approach allowed AI to defeat humans in backgammon in 1979 and in chess in 1997, when IBM’s Deep Blue supercomputer defeated the undisputed world chess champion and grandmaster Garry Kasparov (Berliner, 1980; Campbell et al., 2002).
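
The combination of exhaustive search and a heuristic cut-off can be sketched as a generic depth-limited minimax. The game-specific pieces (legal_moves, apply_move, evaluate) are hypothetical placeholders you would supply for a particular board game; Deep Blue’s real search added alpha-beta pruning, opening books and specialised hardware on top of this basic idea.

```python
# A generic depth-limited minimax sketch: "brute force" search of the game tree,
# cut off at a fixed depth where a heuristic evaluation stands in for the true
# game outcome. legal_moves, apply_move and evaluate are placeholder functions.

def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:          # heuristic cut-off or terminal position
        return evaluate(state), None
    best_score = float("-inf") if maximizing else float("inf")
    best_move = None
    for move in moves:
        score, _ = minimax(apply_move(state, move), depth - 1, not maximizing,
                           legal_moves, apply_move, evaluate)
        if (maximizing and score > best_score) or (not maximizing and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move
```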

Other important milestones also took place in this period. In 1986, Honda began its research into humanoid machines capable of interacting with humans. Three years later, MIT created Genghis, a hexapod robot famous for its cheap and quick production (Simpson and Jordanides, 1991). Towards the end of this period, the U.S. Defense Advanced Research Projects Agency (DARPA) further drove research on automatic information extraction, as a result of CIA and DoD (Department of Defense) concerns about the ability to read and summarise electronic documents given the exponential growth of data produced each year (Shih, 2016). As a result they developed the TIPSTER program, aimed at:

- “finding documents containing information of interest from a stream of text going by or a group of documents that somebody might have “acquired””,

- “Locating specific information within a large collection of text” and

- “Extracting the key ideas that summarized a text” (Shih, 2016).

To further encourage and evaluate research progress, annual workshops called the Text Retrieval Conference (TREC) were organised, which kept increasing the difficulty of the data used for testing and later started a question-answering track (“computer programs had to answer factoid, list, and definition style questions”).

Nonetheless, most of the technological capabilities and techniques from this era were very costly to implement in real-world problem scenarios, and access to the field’s advances remained limited to the largest corporations and academic groups.

Things changed, however, with the rise of neural networks.

Reawakening (2000s to 2017)

The beginning of the 21st century marked an era of increasing public and commercial interest in AI, due to progress in “deep learning” research and the evolution of the neural-network approach to problem solving. Such an approach simulates higher-level processing comparable to how the human brain operates, with different layers of neurons organised in a “network of pattern-recognition modules” that evolves and changes through the process of “learning” (Ayoub and Payne, 2016).
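
A minimal sketch of such a layered network, in Python/NumPy: each layer transforms the output of the previous one, and “learning” means nudging the weights to reduce the prediction error. This is a generic two-layer example trained on the XOR function, not any particular system from the period.

```python
import numpy as np

# A minimal two-layer neural network learning XOR: stacked layers of "neurons"
# whose weights are adjusted by backpropagation. Illustrative only.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    hidden = sigmoid(X @ W1 + b1)            # first layer of pattern detectors
    output = sigmoid(hidden @ W2 + b2)       # second layer combines them
    grad_out = (output - y) * output * (1 - output)
    grad_hid = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ grad_out          # backpropagation: adjust the weights
    b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_hid
    b1 -= 0.5 * grad_hid.sum(axis=0)

print(output.round(2))                       # should approach [[0], [1], [1], [0]]
```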

This, along with the exponentially increasing amount of widely available data that machines could use, as well as the growing speed and power of computers, convinced investors and researchers that it might finally be profitable to apply AI to practical real-world problems (Chui et al., 2017).

In this era, Sony developed one of the early domestic pet robots, AIBO, in 1999; it was the first emotional AI, capable of displaying 60 emotional states (Kaplan, 2000). Honda released the first version of ASIMO in 2000, the most advanced humanoid robot at the time (Sakagami et al., 2002).

In 2004, in DARPA’s Grand Challenge, scientists competed to produce autonomous vehicles for prize money, while NASA’s Spirit and Opportunity robotic exploration rovers were exploring Mars’ surface autonomously (Thrun et al., 2006; Lemmon et al., 2004). Three years later, DARPA pushed driverless cars to obey traffic rules in the Urban Challenge, which eventually led to Google’s work on its self-driving car in 2009 (Poczter and Jankovic, 2014; Waymo, 2018). What followed was the global commercial adoption of AI technologies, led by the tech giants. The award-winning machine learning for human motion capture used in Microsoft’s Kinect device for the Xbox 360 allowed millions of users to interact in a completely new way (Han et al., 2013). Apple’s launch of Siri in 2011, followed by Google Now and Microsoft’s Cortana, proved that natural language processing was ready to be used by millions of smartphone users to perform simple tasks like making recommendations and conducting web searches (Canbek and Mutlu, 2016).

Along with global adoption came concern about the ethics and dangers of AI practices. This resulted in an open letter calling for a ban on the development and use of autonomous weapons, signed by some of the best-known figures in technology and science, such as Elon Musk, Stephen Hawking and Steve Wozniak, along with around 24 thousand other researchers and endorsers (Future of Life Institute a, 2018). Moreover, the 23 Asilomar AI Principles were formulated in 2017 at the Asilomar Conference on Beneficial AI to guide future AI development (Future of Life Institute b, 2018).

However, to date, the two most important and famous achievements of this era have been those of IBM’s Watson and Google DeepMind’s AlphaGo.

In 2011, the IBM T.J. Watson Research Center, the same lab that built the Deep Blue supercomputer which defeated Garry Kasparov, entered its “Watson” AI program to compete in Jeopardy!.

Jeopardy! is an American TV show in which three contestants compete across three rounds in six categories and have to come up with the question matching the answer they are given. The participants’ confidence in their answers is an important variable in the process, as points are added for correct responses but deducted for wrong ones. The whole game is based around wordplay, slang, puns and ambiguous clues, and requires participants to really understand the language. It posed a real challenge to develop a machine not only capable of competing in such a setting, but also of successfully taking on human champions.

Firstly, in contrast to deterministic approaches that rely on pre-determining a small selection of answer types, and to traditional software development projects that involve gathering requirements, analysing the problem and writing a detailed specification, the game play required the machine to operate in a very broad, open domain (High, 2012; Shih, 2016). It was impossible to anticipate all possible questions: in the past there had been around 2,500 distinct question types, and for each one thousands of things could be asked. Because Jeopardy! did not have clearly defined requirements (there was no obvious mapping from the way queries are expressed to their answers, and thus no closed-form solution), deterministic (traditional) approaches would not yield enough correct answers. Instead, the program had to discover the requirements in real time from the clue given, and take a probabilistic approach to producing answers.

Secondly, due to the complexity, ambiguity and context-dependence of the language used in the show, Watson required enough data to generalise the query patterns well enough to predict how the meaning of a clue might be expressed in the content, that is, to understand what it should be asking for.

Thirdly, the system had to be able to accurately assess its own confidence in its answers, which it needed in order to decide whether or not to hit the buzzer.

What’s more, Watson had to be fast enough to scan through its huge database of over 200 million text pages, both structured and unstructured, inside 4 terabytes of data; this required more than 100 algorithms working in parallel, using around 6 million logic rules, in order to give answers in real time. Lastly, it had to be incredibly precise to challenge human champions, who were able to answer around 70% of all questions with 90% precision (Jackson, 2011). Indeed, IBM’s team had to innovate and combine multiple technologies and techniques, drawing on advances in natural language processing, knowledge representation and reasoning, statistical machine learning, information retrieval and more.
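
The confidence-gated buzz decision can be illustrated with a deliberately simplified sketch: many candidate answers, each with an estimated confidence, and a buzz only when the best candidate clears a threshold. The candidates, scores and threshold below are invented; Watson’s actual pipeline combined evidence from its many parallel algorithms to produce such confidence estimates.

```python
# A toy illustration of confidence-gated answering: pick the best-scoring
# candidate and only "buzz in" when its confidence clears a threshold.

def decide_to_buzz(candidates, threshold=0.6):
    """candidates: list of (answer, confidence in [0, 1]) pairs."""
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return (answer, confidence) if confidence >= threshold else (None, confidence)

clue_candidates = [("What is Toronto?", 0.14), ("What is Chicago?", 0.83)]
answer, conf = decide_to_buzz(clue_candidates)
print(answer or "stay silent", f"(confidence {conf:.2f})")
```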

As a result of five years of work, in January 2011 IBM’s Watson supercomputer won against the long-running Jeopardy! champions Ken Jennings and Brad Rutter, scoring $77,147 against Jennings’ $24,000 and Rutter’s $21,600 (Shih, 2016).

This proved that machines were capable of “understanding” human language and led to more practical applications of Watson in the fields of healthcare, education and customer experience.

In 2015, Google DeepMind built the AlphaGo program to compete in the ancient game of Go.

Go is an ancient Chinese strategy board game, invented more than 2,500 years ago, that is often compared to European chess even though the two are completely different.

DeepMind’s achievement in creating AlphaGo could be seen as the modern equivalent of IBM’s Deep Blue from 20 years before. In reality, however, the two illustrate how AI research moved from a “brute force” approach to a “neural network” approach to winning.

Go is a much more complex game than chess, with a 19x19 board compared to chess’s 8x8 and a number of possible board configurations that increases as the game progresses, compared to a decreasing one in chess (Dyster et al., 2016).

No Go piece is inherently more valuable than any other; rather, a piece’s value depends on its position on the board relative to all the other pieces. The pieces are called stones and are played on the intersections of the grid lines, with the objective of surrounding as much territory as possible. The complexity of the game is often explained with a famous comparison: the number of Go game positions, around 10¹⁷⁰, is greater than the number of atoms in the observable universe, around 10⁸⁰ (Senseis.xmp.net, 2018). Moreover, the game involves a high degree of subjectivity, as the value of each move depends on the whole board configuration and is sometimes hard to explain even for experts.
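
The scale of that comparison is easy to reproduce: with 361 intersections and three possible states per intersection (empty, black or white), a crude upper bound on board configurations is 3^361, of which roughly 10^170 are legal positions.

```python
from math import log10

# Crude upper bound on Go board configurations: 361 intersections, each of which
# can be empty, black or white. Only a fraction (~10^170) are legal positions.
upper_bound = 3 ** 361
print(f"3^361 is about 10^{log10(upper_bound):.0f}")   # about 10^172
print("atoms in the observable universe: about 10^80")
```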

These characteristics made the creation of a Go-winning machine a very daunting challenge.

To address it, a novel decision-making algorithm was developed that employed “value networks” to evaluate board positions and “policy networks” to choose the next Go moves (Dyster et al., 2016). Instead of calculating each stone’s position individually, it looks for patterns on the board that might offer good tactical opportunities, in a process similar to image classification and facial recognition. DeepMind’s approach uses deep neural networks (DNNs), computational techniques originally developed for analysing complex visual information. DNNs consist of layers of computational units, called “neurons”, that are tiled on top of each other and run in parallel (Dyster et al., 2016). This allows the network to analyse the same problem from different perspectives, with each layer processing the board information according to different criteria.

Choosing legal moves, identifying uncontrolled areas and tracking how long a specific area has been occupied are just a few examples of the 48 layers of neurons used. Each of them uniquely evaluates the board configuration, which is fed into the DNN as a 19x19 image (Dyster et al., 2016).

The network was then trained by a novel combination of supervised and reinforcement learning strategies, which allowed it to make predictions based on previously observed patterns without being explicitly taught how to do so. First, the program was trained on expert moves, being exposed to over 30 million positions (Dyster et al., 2016). This allowed the AI to build its policy network, which could predict and mimic the tactics of Go professionals. Then the machine went through a reinforcement learning process, playing more than 1 million rounds against itself with the goal of maximising its chance of winning against its previous versions (policies) (Dyster et al., 2016). Running regression on the data collected from self-play produced its value network, used to predict the game’s final outcome from any board configuration encountered during the reinforcement learning training.

With its policy and value networks combined with a new optimised search algorithm, AlphaGo was able to evaluate potential moves and choose the best ones to play.
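
A heavily simplified sketch of how a policy prior and a value estimate might be combined to rank candidate moves is shown below. AlphaGo’s real search is Monte Carlo tree search guided by these networks, so the linear weighting and the placeholder functions (policy_net, value_net, apply_move) are illustrative assumptions, not DeepMind’s algorithm.

```python
# A simplified "policy + value" move ranking. In the real system these scores
# come from deep networks inside a Monte Carlo tree search; here they are
# stand-in functions so the combination of the two signals is visible.

def rank_moves(state, legal_moves, policy_net, value_net, apply_move, mix=0.5):
    """Score each move by mixing the policy prior with the value of the resulting position."""
    scored = []
    for move in legal_moves:
        prior = policy_net(state, move)             # how likely would an expert play this?
        value = value_net(apply_move(state, move))  # how good is the resulting board?
        scored.append((mix * prior + (1 - mix) * value, move))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [move for _, move in scored]
```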

As a result, the AlphaGo Lee version defeated the 18-time world Go champion Lee Sedol in a 4–1 series in March 2016, marking a new era in which artificial intelligence outperformed humans in the most complicated game ever played (DeepMind, 2018).

The program’s improvement did not stop there, however, and in 2017 the latest version, AlphaGo Zero, was released.

Its main difference was that, contrary to previous versions, which were trained on thousands of human games, it skipped the supervised learning phase and learned by playing against itself only. In this novel form of reinforcement learning, in which AlphaGo Zero becomes its own teacher, the program starts from zero, knowing nothing about the game and playing completely at random (Silver et al., 2017). As it plays against itself, it uses a single neural network, combining the previous two, which is constantly updated to predict moves and the eventual winner. A new, better version of the program is then created by recombining the search algorithm with the updated neural network. This process is repeated thousands and millions of times, steadily improving AlphaGo Zero’s performance and the accuracy of its neural network (Silver et al., 2017).
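
In outline, that loop reads something like the sketch below; self_play_game, train and evaluate are hypothetical placeholders for the components described above (search plus a single policy-and-value network), not DeepMind’s code.

```python
# Pseudocode-style sketch of the self-play training loop described above.
# self_play_game, train and evaluate are hypothetical placeholder functions.

def self_improve(network, iterations, games_per_iteration,
                 self_play_game, train, evaluate):
    best = network
    for _ in range(iterations):
        # 1. The current best network plays against itself, recording
        #    (position, search-improved move probabilities, final winner) examples.
        examples = [ex for _ in range(games_per_iteration) for ex in self_play_game(best)]
        # 2. The single policy-and-value network is updated to better predict
        #    the recorded moves and winners.
        candidate = train(best, examples)
        # 3. The candidate replaces the best network only if it wins head-to-head.
        if evaluate(candidate, best) > 0.55:
            best = candidate
    return best
```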

In only three days of self-play it beat the AlphaGo Lee version 100–0, and after 40 days it defeated the AlphaGo Master version 89–11, the version known for beating Go’s top player Ke Jie in May 2017 (Silver et al., 2017).

Such an amazing feat was possible because the machine was not limited by the human knowledge, and human errors, embedded in recorded games. Over millions of games played against itself in a matter of days, AlphaGo Zero matched and then outperformed the knowledge and experience accumulated by humans over 2.5 thousand years, coming up with completely new, creative and unconventional strategies.

Closure

Looking at the past, we can say that the history of AI is, in a way, a history of knowledge representation and of game competitions against humans. The latter were a big driver of research progress, and the achievements of Deep Blue, Watson and AlphaGo represent the evolution of approaches to machine problem solving, marking major milestones in AI’s progress.

AlphaGo Zero’s achievement marked the start of a new era, which I call the Revolution (2017 to the present day). I will cover this era and its major milestones in a future article acting as a sequel to this one.

Note from the author

Please note that the contents of this article are excerpts from my scientific paper “To be or not to be — linking Artificial Intelligence with Strategic Decision Making — the Analyses of United Kingdom’s AI scene” originally published in April 2018.

How to reference this article:

Lukowicz, Marcin. (2024). History of Artificial Intelligence (AI) — defining key milestones from the 4th century B.C. to 2017. [online] Available at: https://medium.com/@marcin.lukowicz/ai-what-artificial-intelligence-really-is-explained-in-detail-through-science-f2ebb32e7188.

--


Marcin Lukowicz

Business Strategy and Innovation Advisor, Biohacker, DeepTech enthusiast.