SkyNet Journeys [3]: Hardware, Software, Data, Intelligence

Elisha Rosensweig
18 min read · Jul 31, 2024


After discussing the Turing Test in the previous installment of this series, let us open today’s chapter not with AI, but with a story from the world of animal intelligence, specifically a creature named Clever Hans. This story posed a riddle to many people in the first years of the 20th century, a riddle that was only answered several years later.

Clever Hans was — a horse. In exhibitions beginning shortly after the turn of the century, led by his trainer, Wilhelm von Osten, Hans demonstrated what appeared to be “human” intelligence by answering questions. Von Osten would ask him questions, and Hans would answer by tapping his hoof on the ground, as his trainer had taught him. He could answer questions in basic math, such as addition and multiplication, but also more complex ones, such as “what is the square root of 4?”. There was also a set of non-mathematical questions he could answer — von Osten assigned each letter of the alphabet a number, and Hans could spell out a name by tapping the code for each letter in turn. In this way, Hans astounded both the general public and the leading scholars of the day, and planted in people’s minds the possibility that perhaps mankind is not the only, or even the most, intelligent species on the planet. Perhaps horses are as smart as we are, and all that stands between us and a non-human higher intelligence is the barrier of language.

As the horse became more widely known, he attracted the attention of noted investigators, and von Osten agreed that they should come and examine the horse for themselves. From what I have read on the subject, von Osten was not a fraud or anything of the sort — he was genuinely enthusiastic about the fact that his horse was so smart and that he had been able to teach it so much, and so was happy to show it off wherever and whenever he could. The bottom line is that the horse posed a puzzle: since Hans was capable of displaying intelligence in such a convincing manner, does this prove that he was intelligent in the human sense of the word? And conversely, if he was not really that smart, how was he able to answer such a wide range of questions with such high precision?

This puzzle plagued the minds of people of that era until it was finally resolved in 1907 — we will reveal the end of the story later in today’s installment. For now, I trust you see how this story relates to our series, for what happened here was very much like applying the Turing Test, only to horses instead of computers.

Let us return to 1950, smack in the middle of the twentieth century, when Turing proposed the Turing Test for the identification of artificial intelligence. As I mentioned briefly in the previous chapter, in that same paper Turing also offered his own prediction of when this goal would be met:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

Turing’s estimates were a bit optimistic. In the end it took roughly 75 years, not 50, to get where we are today, and as far as his estimate of storage capacity is concerned, he was off by several orders of magnitude. A billion bits is roughly 125 megabytes — less than the size of a movie you might download from the Internet — while models like GPT and its ilk require thousands of times more space to run. But when compared to other predictions people have made about the future, the truth is that Turing’s estimate was quite impressive in its accuracy. Consider that in Turing’s time the computer world was in its infancy, so he could never have imagined the scope of what has been built since.

To connect the Turing Test to the principles of AI today, let us now devote a little time to a quick survey of some milestones along the way from there to here — steps that took the field of computer science from that infant stage to the more adult state it is in today.

The world of computing began long before Turing. If we define a “computer” as a device that performs calculations automatically, then such things already existed in the nineteenth century. At the time, though, these were not electrical systems but mechanical ones — that is, all calculations were performed by a brilliant arrangement of physical components and the physical interactions between them. For example, at the end of the nineteenth century Albert Michelson, a Polish-born American, developed an instrument capable of calculating the Fourier Transform, one of the most central analytical tools in the field of signal processing. It was a completely mechanical device, using gears, springs, pendulums, and other components to automatically calculate sines and cosines.

I will add here a link to a video for those who would like to see how it works — even if you have no idea what a Fourier Transform is, there is pleasure in seeing this device do… something… in an elegant manner. In any case, it was only much later, around the time of World War II, that digital computers based on electrical signals began to emerge — and even there, the electrical flow simply replaced the forces of gravity, friction, and the like. In modern parlance, one could say that all the computers of that era were hardware-based. Just as you cannot take a knife and turn it into a spoon, so every computer did one thing only, which was defined by its physical structure.

The truth is, our generation too was privileged to know and use such hardware-based computers, such as the plastic calculators that are sold to children today in toy stores or when they enter grade school, and which are capable of nothing more than simple arithmetic. I was curious, so I checked: the first such device came out in 1971, produced by Busicom and sold at a price of no less than $395, and it was considered a luxury item for a while.

The capabilities of that 1971 calculator were all determined by its hardware, and for the machine that computed the Fourier transform in the nineteenth century with pendulums and springs there is no question that this was so. But in the next stage, alongside the hardware, software appeared. The hardware was designed so that different software programs could be installed on it, each capable of performing radically different functions. Multiple software programs could run concurrently on a single hardware platform, and frequent software upgrades could easily improve functionality. All of this was made possible only by the separation of hardware from software. Of course, to enable this change, revolutionary discoveries and innovations were required in the way hardware was built, an entire domain of programming languages in which software is written had to be invented, and so on and so forth.

When we turn our attention to these programs, we will immediately realize that they can be classified along many axes, and for our purposes I would like to focus on one: the interface, or relationship, between the program and the data that the user inputs into the system, and how the program’s function depends upon and is affected by that data. At one end of the spectrum are programs that respond to the user’s data in a localized fashion: at the time of use, and no further. An example of such a program is the simple calculator, which knows how to solve mathematical questions by using universally known formulas. This calculator, of course, applies the exact same logic on any computer or smartphone it is installed on — if it gave different answers to different users on different devices, I think we would all uninstall it with the feeling that it had a serious bug. Another such program is the word processor Word — each user will type different content into it, but the program behaves identically on every computer and is unaffected by the question of whether one is typing a poem, a novel, or a rough draft of a Medium post.

At the other end, however, there are also systems that are considerably impacted by the data they are exposed to, data that comes from the user and from the world. For these systems, it is precisely this dynamism that is their strength. Consider a Google search — there is a sophisticated layer of software that helps to find and present search results, but the results it presents are influenced on the one hand by the user’s search history and on the other by the information that is out there, i.e., what the vast database called the Internet provides Google. Another example familiar to all of us is a service like Netflix — the shows and movies we choose to watch today are subjected to an automated analysis, from which the app will infer what shows we will want to watch tomorrow. What Netflix will suggest to me will be different from what it will suggest to you — and unlike the case of the computer calculator app, we tend to be pleased with this behavior.

And herein lies the connection to artificial intelligence. For what do we have, after all, in these Google and Netflix programs? Computer programs that know how to look at data and adjust their behavior accordingly. And consider this: when people do something similar — that is, observe reality, absorb the new information and adjust their behavior accordingly — what do we call this process? We call it — thinking. A person who can impressively and efficiently adapt to what he sees around him will be considered an intelligent person, and the more impressive and effective the adaptations, the more we will appreciate his intelligence. And so we can intuitively understand why someone might propose to us that “systems that can do such things are intelligent”.

Let us elaborate on this point. Two powerful analytical tools that we humans apply are induction and deduction. Induction is the process of identifying patterns from individual elements and formulating rules that capture those patterns, while deduction is the process of applying those rules to particular cases and drawing operational conclusions. An example of induction is to see the sun rise every day for a month and conclude that this is a general rule, a natural law that will continue on in the future. An example of deduction is to use the rule that the sun rises every morning in the east to infer that tomorrow, a few hours from now, the sun’s rays will find their way into my room through the eastern window, penetrate the curtain and wake me from my slumber.

These two abilities, induction and deduction, would be considered by most of us (possibly all of us) a critical part of what makes someone smart or intelligent. In addition, you will be pleased to hear that one can use the Turing Test to probe whether the interviewee can perform such logical operations. One simply asks the mysterious figure on the other end of the chat a suitable question: say, one gives it a simple series of numbers and checks to see if it is able to identify the recurring pattern by induction. Alternatively, one might ask it a riddle that requires the application of prior general knowledge to a particular case, in a way that demonstrates deduction. And so, based on both our intuitions and the Turing Test, it will not surprise us to find that many AI systems try to incorporate these abilities.
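
To make this a bit more concrete, here is a tiny sketch of what such a number-series exercise looks like when spelled out in code. It is entirely my own toy (the series, the function names, everything), not something from Turing or from any real AI system, but it shows the two moves side by side: induction extracts a rule from the examples, and deduction applies that rule to predict the next case.

```python
# A toy of the two moves described above. Everything here is invented for
# the illustration: a short arithmetic series stands in for the question
# the interrogator might type into the chat.

def induce_rule(series):
    """Induction: look at the examples and extract a general rule,
    here simply the common difference of an arithmetic progression."""
    differences = {b - a for a, b in zip(series, series[1:])}
    if len(differences) != 1:
        raise ValueError("these examples do not follow a single 'add k' rule")
    return differences.pop()

def deduce_next(series, step):
    """Deduction: apply the general rule to a particular case
    (the last number we saw) to predict what comes next."""
    return series[-1] + step

examples = [3, 7, 11, 15]            # the series shown to the machine
step = induce_rule(examples)         # induction: "each number is 4 more than the last"
print(deduce_next(examples, step))   # deduction: prints 19
```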

[Side note for those of you who come from the AI field: if you have not thought of it before, consider this — deduction can be thought of as the logical analogue of Forward Propagation, while induction is the logical analogue of Backward Propagation. I’ll leave you to ponder this connection in your spare time, but here is a tiny sketch of what I mean.]
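
To be clear, what follows is a toy of my own making, not anything from Turing or from a real network: a single made-up weight and made-up numbers. Still, it shows the two directions side by side. The forward pass applies the current rule to a case, which is the deduction step; the backward pass nudges the rule to better fit the examples, which is the induction step.

```python
# A single-weight "model" with made-up numbers, just to make the analogy
# concrete. Forward pass: apply the current rule to a case (deduction).
# Backward pass: adjust the rule to better fit the examples (induction).

w = 0.0                                            # the model's current "rule"
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]    # (input, target) pairs; here y = 2x

for _ in range(20):                                # repeat the "experience" a few times
    for x, y in examples:
        y_hat = w * x                              # forward pass (deduction)
        error = y_hat - y                          # how far off was the prediction?
        w -= 0.1 * error * x                       # backward pass (induction)

print(round(w, 2))                                 # settles near 2.0, the rule behind the data
```

Back to the main story!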

Since we have already introduced two concepts here, let us add a third, another word that can be used to describe the processing of new information into existing knowledge: learning. When a baby girl is born, her brain is empty of information about Paris and planets and advanced medicine. Throughout her life she will be constantly confronted with the reality around her and will learn new things, which will enable her to deal better with her environment. Induction and deduction are tools that we use in that process.

As with induction and deduction, let us ask — how does “learning” square with the Turing Test? Well, it is quite reasonable to assume that no computer could pass the Turing Test without a basic learning capability, since if at the outset of the conversation I told it that I was allergic to peanuts, without any learning it could inadvertently ask me immediately thereafter if I liked peanut butter, and thereby rather blatantly reveal itself to be a computer rather than a human being.

In the beginning of this post I was a bit hard on Turing for being overly optimistic in his prediction of when the test he described would be passed. But let me tell you, the man was a genius and a visionary, and this insight, regarding the importance of the ability to learn, he articulated already back then. More than that — he proposed that this would be the foundation for building a machine that would pass his test. In a talk he gave to a few scholars in 1951, which was never published until it was found among his papers posthumously, he said the following — I am quoting selected passages here with some omissions:

If the machine were able in some way to “learn by experience” it would be much more impressive. If this were the case there seems to be no real reason why one should not start from a comparatively simple machine, and, by subjecting it to a suitable range of “experience” transform it into one which was more elaborate, and was able to deal with a far greater range of contingencies. This process could probably be hastened by a suitable selection of the experiences to which it was subjected. This might be called “education”… I suggest that the education of the machine should be entrusted to some highly competent schoolmaster who is interested in the project…

What Turing describes here is the process of induction that we have been discussing: learning from experience, extracting general rules from a set of particulars. Later, he even shares some thoughts he had about how this learning process would actually take place, and I will quote him on the subject in a moment. For those of you who know how AI systems are trained today — “training” is the modern term for what Turing called “education” — if your mouth is open, don’t close it, because you’re going to be gaping again momentarily. Here is what he said in 1951:

I suggest that there should be two keys which can be manipulated by the schoolmaster, and which represent the ideas of pleasure and pain. At later stages in education the machine would recognise certain other conditions as desirable owing to their having been constantly associated in the past with pleasure, and likewise certain others as undesirable. Certain expressions of anger on the part of the schoolmaster might, for instance, be recognised as so ominous that they could never be overlooked, so that the schoolmaster would find that it became unnecessary to “apply the cane” any more.

As promised, those of you who are familiar with today’s AI systems will find yourselves excited by this quote, for Turing here foresaw one of the mechanisms that are used today to train such systems. You take an empty AI system, present it with data, and ask it to respond. For example, you present it with an image and ask whether it contains a cat. The system guesses something and is given feedback — “You were right” or “You were wrong”. The system takes this feedback and makes adjustments to its internal configuration: it reinforces the considerations it employed where it was right and modifies its behavior in cases like those where it was wrong, and so, as it receives more and more examples, its performance improves. What is actually happening here, then, is a process of induction, of generalizing from details — of learning.
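
To give a feel for what adjusting that internal configuration can look like, here is a deliberately tiny caricature of such a feedback loop. Everything in it is invented for the illustration (the two features, the update rule, the examples), and it bears no resemblance to a real image-recognition system; it only shows the shape of the process: guess, receive feedback, adjust, repeat.

```python
# A tiny caricature of the feedback loop described above. The "system" is just
# two numbers; the features and the update rule are invented for illustration.

weights = [0.0, 0.0]                        # the system's internal configuration

def guess_is_cat(features):
    """The system's current guess: a weighted vote over two made-up features
    (say, 'has pointy ears' and 'has whiskers')."""
    score = sum(w * f for w, f in zip(weights, features))
    return score > 0

# labeled examples: (features, is it really a cat?)
examples = [([1, 1], True), ([1, 0], False), ([0, 0], False)]

for features, truth in examples * 20:       # show it the data over and over
    prediction = guess_is_cat(features)
    if prediction == truth:
        continue                            # "you were right": keep doing that
    sign = 1 if truth else -1               # "you were wrong": adjust the internal
    weights = [w + sign * f                 # numbers so that similar cases go
               for w, f in zip(weights, features)]  # better next time

print(weights)                              # the "knowledge" is just these numbers
print([guess_is_cat(f) for f, _ in examples])   # now matches the feedback it received
```

Note that nothing in the final state resembles a definition of a cat; the learning is entirely encoded in a couple of numbers that happen to reproduce the feedback.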

Let us connect this idea to what we saw in the previous chapter about the Turing test. I am going to do a bunch of hand-waving here, but bear with me — it’s the principles that are important here.

Let us suppose that we have succeeded in building such a machine, specifically one that knows how to use human language and also understands it. If we want it to pass the Turing Test, this is how we’d do it: we’d give it a lot of examples of dialogues between two human beings, and let it guess what the appropriate response is at a given point in the conversation. If the answer provided by the system is convincing — something that a normal person would say in that situation — we will give it a star sticker in its notebook and activate the pleasure mechanism that Turing spoke of, so that it will know to persist in such answers. In case of a mistake, on the other hand, we will send it to “timeout” and tell it what we expected it to say, so that it will learn for the future. And if we do this often enough, then since it is a learning machine, over time it will learn how to sound like a human being. Turing Test — here we come!
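
Since I have been waving my hands, let me wave them in code as well. Here is a runnable toy of that loop, with everything in it invented for the illustration: the “model” is nothing but a lookup table of stock replies, success is defined as matching what a human actually said, and we keep quizzing and correcting until every answer passes.

```python
# A runnable toy of the training loop sketched above. Everything here is
# invented for illustration: the "model" is a lookup table of stock replies,
# and success is defined as matching what a human actually said.

dialogues = [
    ("How are you today?",          "Not bad, thanks. And you?"),
    ("I'm allergic to peanuts.",    "Good to know, I'll remember that."),
    ("What did you do yesterday?",  "Mostly read and went for a walk."),
]

model = {}                                   # the machine starts out "empty"

def propose_reply(prompt):
    """The model's guess at how a human would respond."""
    return model.get(prompt, "DOES NOT COMPUTE.")

rounds = 0
while True:
    rounds += 1
    mistakes = 0
    for prompt, human_reply in dialogues:
        candidate = propose_reply(prompt)
        if candidate == human_reply:
            continue                         # star sticker: keep answering like that
        mistakes += 1                        # timeout: here is what we expected
        model[prompt] = human_reply
    if mistakes == 0:                        # every answer now sounds human,
        break                                # i.e. it passed our mini-test

print(f"passed after {rounds} rounds of feedback")
```

Of course, this toy “passes” only by memorizing the answer key, a shortcut we will come back to in a moment.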

But — wait a minute…

Do you realize what we just described here as the learning process of AI systems today?

We have in fact described a continuous Turing Test. Consider the following alternative way to phrase the process I presented above:

  • Administer the Turing Test to the system
  • If it succeeds — we’re done! Huzzah!
  • If it fails — we give it feedback and repeat.

In other words: the entire training process of AI systems like ChatGPT is geared towards building machines that are capable of adapting themselves well enough to our expectations. They use induction to extract patterns, and use deduction to demonstrate the fruits of their learning to us. And thus, what we have basically arrived at is an entire process of training an AI system that is aimed at one thing: passing the Turing test.

Cool, right?

So, what do we think of this concept? It is certainly brilliant, but does it have any drawbacks or flaws? Well, think back to your high-school years. Remember how we studied for exams, complaining that it was no fun, and how we strategized our studying with the singular goal of passing the exam rather than really understanding the material? Do you recall how that strategy affected the manner and depth of our preparation?

Well, this is exactly what is happening here. The process I have described, of a learning machine, is a process in which the machine is judged not by the depth of its understanding, but by its performance on a test given to it. True, as its teachers, we hope that good test results reflect a deep understanding of the material, and we will attempt to construct tests that ensure this is the case. But in the final analysis, we reward the AI system only for passing the test and not for “understanding”, which is a much harder thing to measure directly. And now that we realize this, it should give us pause to wonder whether we are being had. Perhaps this “learning” we are seeing is not really what we thought it was.

This concern is well-founded, AI professionals are well aware of it, and we will touch on it again later in the series. In the meantime, to conclude this chapter, I would like to give two examples where this concern proved warranted. The first is, of course, our beloved horse, Clever Hans. How did the story end? You are probably dying to know!

Well, the matter was clarified in 1907 and documented in a report by Oskar Pfungst, a student at the Psychological Institute of the University of Berlin, who conducted a series of experiments with Hans. He concluded that Hans, in fact, understood nothing of arithmetic and was totally illiterate. Instead, Hans was responding to very subtle, probably involuntary, physiological cues from von Osten and the audience at his performances. He sensed when his trainer and the people around him were beginning to get excited, and based on that knew when to stop his tapping. The experiments that proved this were quite creative — for example, they made sure to ask Hans questions without an audience, and with his trainer hidden behind a screen. Perhaps the coolest experiment I have heard of was one in which Hans was asked questions that the questioner himself did not know the answer to — in such a situation, the questioner’s body language could not give Hans a clue as to when to stop. Cool, right? I mean, less cool than if Hans really could think like a human… but still cool.

And by analogy to our story — in the years prior to this revelation, people must have thought that Hans was passing the Animal Turing Test with flying colors. Even his trainer believed this, and even after the truth came out no one suspected him of having deliberately deceived people. But in retrospect it became clear that Hans had indeed learned something — just not the things people thought he had learned. His success was like passing a test with a cheat sheet — there is no doubt that it takes special skills to cheat on tests, so I am not belittling his ability to read human body language; but the lesson of this story for us is obvious: when a machine appears to be “intelligent”, we should stay vigilant and make sure that it is the machine that is intelligent, and not the human operators providing it with overt or covert hints about how to act. And as I said — AI professionals are quite familiar with this problem and its many variations, such as data leakage, overfitting, and so on, and they have developed multiple ways to address and defend against it.

So that was one story — and here is another, closer to the world of AI. In 2015, the internet was abuzz: a few months earlier, Google had released a cool feature — the company’s AI algorithm went through photos in Google Photos and automatically tagged all kinds of entities in them, marking, for example, which pictures contained cats, cars, and so on. A nifty feature, right? So what was the fuss about?

Well, it burst into the twittersphere when an African-American named Jacky Alciné tweeted that he was shocked to discover that the algorithm had tagged pictures of him and his partner as containing… gorillas.

Now, of course, no one then or now thought that Google had deliberately decided to classify African-Americans as gorillas. Google simply failed to notice the problem that had arisen incidentally from their work on the algorithm.

And here, too, the root of the matter is the same as what happened with Clever Hans. Google fed a lot of tagged images to their AI system and hoped that it would figure out on its own how to identify different animals. Of course, “figure it out on its own” conceals a great deal of sophistication and depth of insight into how to build such a thing, so I am not trying to minimize the genius of the system. But that, at bottom, is what it was. They hoped that the machine would learn to accurately identify a gorilla. After all, humans don’t confuse gorillas with people, so why should the AI system? Moreover, other animals were identified with high accuracy, and add to that the fact that Google’s AI was exposed to many times more images than the average person… it was reasonable to assume that it would succeed at a task that any child would succeed at.

But without anyone noticing, the machine had learned something akin to the rule that “any humanoid face with a dark hue is a gorilla”. Their test results were impressive, but not comprehensive. In the corners their tests did not reach, other insights had become embedded in the machine — insights that led to the aforementioned controversy.

So here again we see one of the weak points of the approach we have been discussing today: we have extensive control over the type of data we feed the machine, but our control over the conclusions it draws from that data is still severely limited. And with these limitations come the unpredictable behaviors of the devices we have built. These challenges overlap with the important domains of AI Safety and AI Alignment, which we will hopefully discuss later in this series — but that is for another day.

To conclude this example, it is worth adding that while back then everyone probably thought the face-tagging problem would be fixed quickly, it has turned out to be a difficult problem to solve. It has cropped up again and again over the years, not just at Google, and from a brief search online I gather that at least as of 2023, Big Tech still refuses to automatically tag humanoid creatures such as gorillas and apes, for fear that the tagging would lead to something offensive, as it did back then.

So — that’s it for today! I hope you enjoyed it and learned something from it. For the uninitiated, I’m sure a bunch of things here were new to you, and I hope you found the presentation clear and helpful. As for the AI professionals in the crowd — I want to once more emphasize the correspondence between induction/deduction and backward/forward propagation. I think that comparison has a lot of power and can be useful when designing your research.

Till next time…
