Evaluating artificial intelligence: From Turing test to now

Vuong Nguyen
Feb 7, 2018

The Turing Test

Proposed by Alan Turing in 1950, the Turing test was designed to determine whether a computer is “intelligent”. In other words, it tests a computer’s ability to exhibit intelligent behavior. The test sets up an imitation game between a human and a machine. A human judge evaluates a natural-language conversation with both, without knowing which participant is the machine designed to generate human-like responses. If the judge is unable to distinguish the real human from the machine, the machine is said to pass the test.

The problems with the Turing test

The Turing test was said to be simple and reliable. However, it offers no measurable metric for evaluating the artificial intelligence of a machine. There was no specified number of interactions or exchanged sentences, and no specific qualification being evaluated. It was a game in which a machine and a human each try to convince a fellow human that their text-based conversation is “natural” and human.

I find the Turing test rather problematic because it rests solely on human perception, without a judging rubric. Consider the case where you’re in an online group conversation with two other humans. If their sentences seem incoherent, full of typos or sometimes meaningless, you may think they are drunk or not in the right state of mind. You are less likely to judge whether the two individuals are thinking properly, because human-like intelligence is much more complicated to measure.

Let’s look at a different experiment with Eliza, a computer program developed by MIT computer scientist Joseph Weizenbaum. Eliza simply follows a script to interact with a person, in a conversational style that resembles modern intelligent agents such as Google Assistant or Alexa. For example, Eliza starts with “Hello, I am Eliza. How can I help you?”. If you reply with “I’m reading a book about world peace.”, Eliza will say: “How long have you been reading a book about world peace?”, and the conversation goes on. Eliza seems to show sympathetic responses by simply recognizing patterns in human language and responding accordingly. Eliza will say “Please go on” if it doesn’t recognize the sentence. Under the surface, however, Eliza’s language processing is very different from that of modern intelligent assistants. Siri or Alexa actually process our sentences to extract meaning, in order to provide a meaningful answer. Eliza doesn’t understand us. Yet most people took it seriously and thought carefully about its responses. To us, Eliza exhibits certain sympathetic behaviors. However, it doesn’t have human-like intelligence.
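To make the pattern-matching idea concrete, here is a minimal sketch in Python of how an Eliza-style program could produce the exchange above. The rules, the `respond` function and the fallback wording are my own assumptions for illustration, not Weizenbaum’s original script, which was considerably richer.

```python
import re

# Hypothetical rules for illustration only (not the original Eliza script).
# Each rule pairs a pattern with a reply template that reuses the user's words.
RULES = [
    (re.compile(r"^i['’]m (.+?)[.!]?$", re.IGNORECASE),
     "How long have you been {0}?"),
    (re.compile(r"^i feel (.+?)[.!]?$", re.IGNORECASE),
     "Why do you feel {0}?"),
]

# Used when no pattern matches the sentence.
FALLBACK = "Please go on."


def respond(sentence: str) -> str:
    """Return a scripted reply by matching the sentence against each rule."""
    text = sentence.strip()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Echo the captured fragment back; no meaning is extracted.
            return template.format(match.group(1))
    return FALLBACK


if __name__ == "__main__":
    print(respond("I’m reading a book about world peace."))
    print(respond("The weather is nice."))
```

Nothing in this sketch models what a book or world peace is. The program only copies part of the input into a canned reply, which is why the responses can feel attentive without any understanding behind them.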

The Tokyo Test

A group of Japanese AI researchers is determined to create a program capable of passing the nation’s college entrance exams, including the entrance exam at the University of Tokyo. The goal of the Tokyo test is for the program to trick college admissions officers into thinking that it’s a human. This test is considered extremely difficult, since the program would have to reason across a wide variety of disciplines, from history to math to reading comprehension to essay writing. The Tokyo test is an adaptation of the Turing test, yet it demands a higher level of intelligence, perhaps even higher than that of a human. This goal may still be far from reality, but it sparks a new discussion on evaluating artificial intelligence.

What makes us “human”?

Perhaps what differentiates us from computers is the ability to express emotions, personality, beliefs and creativity. There have been many research projects aiming to design artificial emotions and personality. Social robots are now able to recognize speech, gestures and human-like behaviors in order to interact with us naturally. Some of these traits, which are considered part of our unique intelligence, are still very difficult to build with current technology. However, we’ve come a long way from the Turing test, when we considered artificial intelligence only as text-based responsiveness. I predict that in the near future, we will be evaluating the social intelligence of machines.
