AI-resistant captchas: Are they really possible?

Published in Idena · 10 min read · May 8, 2019

Nowadays it is often hard to tell whether you are dealing with a human or a bot online: An AI can act and communicate almost like a person and easily solve captchas that are supposed to be a shield against robots. This happens because existing captchas are not AI-resistant.

An AI-resistant captcha should be a reverse Turing test, which cannot be passed by a machine but can be easily passed by a human.

Is it possible to construct an AI-resistant captcha that could be put into practice? Is it possible to resist the dynamic development of AI connected with neural networks and deep learning?

Actually, the possibilities of deep learning are overestimated, and the range of tasks it can potentially solve is limited. AI-hard problems such as understanding the meaning of a text, reasoning, and using common-sense logic are beyond deep learning. Over the past 60 years of AI development, progress on these tasks has been close to zero. One of the founders of deep learning, Yoshua Bengio, draws our attention to this:

I think we need to consider the hard challenges of AI and not be satisfied with short-term, incremental advances. I’m not saying I want to forget deep learning. On the contrary, I want to build on it. But we need to be able to extend it to do things like reasoning, learning causality, and exploring the world in order to learn and acquire information.

Let us try to use the gap between narrow AI and human common sense to model an AI-resistant captcha.

Required properties of an AI-resistant captcha

We have identified the characteristics that an AI-resistant captcha must possess. The following sections describe them in detail.

1. Belonging to the class of AI-hard problems

We assume that for a captcha to be called AI-resistant, it should not fall under the class of “recognition” problems, which are solved, for example, by neural networks. Instead, the task of solving a captcha should fall into the class of AI-hard problems, such as understanding the meaning of a text, using common-sense reasoning, and so on.

An AI-resistant captcha should draw on people’s ability to understand and interpret information. Solving it should resemble the way people interpret what they say to each other when they communicate, including their ability to read the unsaid “between the lines.”

2. Created by humans

A crucial feature of an AI-resistant captcha is that the test should not be created algorithmically. That is, in contrast to the well-known Google reCaptcha and other captchas generated by algorithms, an AI-resistant captcha must be created by a human. Only then will the captcha fall outside the class of “recognition” tasks: solving it will require understanding the meaning intended by a human, and it will consequently be truly AI-hard.

Thus, an AI-resistant captcha is essentially an AI-resistant reverse Turing test. Strictly speaking, it is not a captcha, as the word captcha stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” But as the term captcha is widely understood to mean any test that tells computers from humans, we will continue to use it in this general sense throughout the article.

3. Unpredictability and an unlimited set of possible captchas

The range of possible tasks should not be limited (just as in the task of understanding the meaning of a text, where there can be an infinite range of texts and meanings).

Free-form semantic content allows for an endless number of AI-resistant captchas. To achieve this, the meanings conveyed by AI-resistant captchas must be as non-repeating and unpredictable as possible.

4. No major systemic vulnerabilities

A vulnerability in the structure of an AI-resistant captcha can make it possible to solve the problem algorithmically with the help of a narrow AI, without solving the AI-hard task. In this case, what we mean is not the vulnerability of one single captcha, but a systemic vulnerability, which allows the algorithmic solving of hundreds of thousands of captchas with high probability, above 80 percent.

Risk factors:

Limited or predictable range of meanings of AI-resistant captchas. Since we assume that the semantic content of a captcha is provided by people, the human factor may cause a homogeneity problem with the generated captchas. It is important that people creating AI-resistant captchas do not stick to one template, but use their imagination in the context of a unique situation.
The availability of a large training data set that allows the building of a model to compare a given captcha with the training set. In this case, the training set should cover most of the possible semantic messages of AI-resistant captchas.
The form of the captcha allows the solution to be found without an understanding of the semantics, for example, on the basis of statistical language models.

Desirable characteristics of an AI-resistant captcha

Apart from the required properties, there are some other characteristics that would allow for wider and smoother application of AI-resistant captchas. Let us call them “desirable” and consider them below.

Marginal accuracy. The marginal accuracy of solving a single captcha using a narrow AI should be less than 80 percent. We want a captcha that only a human can solve with a probability of 80 percent or higher (that is, for example, at least 16 out of 20 captchas). Ideally, the probability of an algorithmic solution would not exceed 50 percent, the probability of a coin flip. In practice, however, anything under 80 percent can be considered a good result, because running a series of tests reduces the final probability of passing to an acceptable level. By an acceptable level, we mean that the AI cannot pass a series of tests with a probability higher than 50 percent. For example, a 48 percent probability of solving at least 16 captchas out of 20 is an acceptable result for us.
Internationality. A person speaking any language should be able to solve an AI-resistant captcha.
Clarity. An AI-resistant captcha is created by a human. People may have various world views and ethnic and social backgrounds. Despite all the differences, it is desirable for AI-resistant captchas to remain clear for any adult person irrespective of their background. Also, a captcha is not an IQ test, so the meaning of the messages embedded in an AI-resistant captcha should not be difficult for an average person to understand.
Ease of creation. It should not be difficult for a human to create an AI-resistant captcha.

It would be great to come up with a universal AI-resistant captcha that is equally easy to solve for people with disabilities and for people living in various socio-cultural environments. Although this property is desirable, we leave this requirement out for now.
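The marginal-accuracy criterion above can be explored numerically. The following is a minimal sketch, not the authors' calculation: it assumes that each captcha in a series is solved independently with the same per-captcha accuracy, so the number of correct answers follows a binomial distribution. The function name and its parameters are illustrative.

```python
from math import comb


def series_pass_probability(accuracy, total=20, required=16):
    """Probability of solving at least `required` of `total` captchas.

    Assumes independent attempts with the same per-captcha `accuracy`
    (a simplifying assumption, not something stated in the article).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if not 0 <= required <= total:
        raise ValueError("required must be between 0 and total")
    # Sum the binomial probabilities of exactly k successes for k >= required.
    return sum(
        comb(total, k) * accuracy**k * (1.0 - accuracy) ** (total - k)
        for k in range(required, total + 1)
    )


# Illustrative per-captcha accuracies; none of them is taken from the article.
for accuracy in (0.5, 0.7, 0.8):
    print(f"accuracy {accuracy:.2f}: pass probability {series_pass_probability(accuracy):.3f}")
```

Under these assumptions, a coin-flip solver passes a 16-out-of-20 series well under 1 percent of the time, while a solver with exactly 80 percent accuracy passes roughly 63 percent of the time. The 48 percent figure quoted above would therefore seem to correspond to a per-captcha accuracy somewhat below 80 percent, which the article does not state.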

Examples of AI-resistant captchas

Though the vast majority of existing captchas cannot be called AI-resistant, we suppose that following the required and desirable characteristics proposed in this article it is possible to create a test that will allow us to tell humans and robots apart. Below we will consider one of the existing tests and recommend how it could possibly be reinforced.

Winograd Schema Challenge

In terms of the chosen criteria, the Winograd Schema Challenge (WSC) seems to be the closest to the AI-resistant captcha concept: As stated by the authors, the test requires common-sense reasoning.

Consider the following WSC example:

The trophy would not fit in the brown suitcase because it was too big. What was too big?
1. The trophy
2. The suitcase

The highest known accuracy on WSC tests achieved by a narrow AI, a language model trained on 40 GB of text, is 70.7 percent. Humans solve WSC tests correctly in 92 percent of cases.
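To make the structure of such a test concrete, here is a small sketch of how the trophy example could be represented as data. The class and field names are hypothetical and are not part of any official WSC tooling. Substituting each candidate for the pronoun yields the two readings, only one of which agrees with common sense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """A binary-choice question built on an ambiguous pronoun (illustrative structure)."""
    sentence: str
    pronoun: str
    candidates: tuple  # the two possible referents
    answer: int  # index of the referent that agrees with common sense

    def readings(self):
        """Return the sentence with the pronoun replaced by each candidate."""
        return [
            self.sentence.replace(f" {self.pronoun} ", f" {candidate} ", 1)
            for candidate in self.candidates
        ]

    def is_correct(self, choice):
        """Check a solver's choice (an index into `candidates`)."""
        return choice == self.answer


schema = WinogradSchema(
    sentence="The trophy would not fit in the brown suitcase because it was too big.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answer=0,
)
for reading in schema.readings():
    print(reading)
```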

However, due to its textual representation, the challenge does not meet the criterion of internationality and thus has drawbacks that do not allow its wide use as a reverse Turing test in practice.

Winograd Schema Challenge 2.0 (Flip Challenge)

How can we fix the Winograd Schema Challenge’s problems? Let’s take the typical task of the WSC and try to get rid of the textual representation:

The trophy would not fit in the brown suitcase because it was too big. What was too big?

The meaning of the phrase is that the trophy does not fit into the suitcase. How can we encode the meaning without using text? We all “read” comics without text in our childhood. So, we could use pictures.

We can search for pictures on the Internet. It is not that easy to find a picture in which there would be a trophy not fitting in a suitcase. Instead, you can find something like this:

We could additionally come up with the context: why we actually encountered this problem of fitting a trophy into a suitcase. Suppose someone won a cup at a sports competition, then returned to the hotel room, began to pack a suitcase to go home, and faced the problem of the sports cup not fitting in the suitcase. There may be other contexts, but let’s dwell on this. It is likely that four pictures are enough to encode our story.

This is what we’ve got. You can come up with a completely different scenario and find other or even more suitable pictures.

Try showing these pictures to people who have not read this article and ask them to “read” the story. The stories told by different people may differ a bit, but they will all be able to see the logic.

Now we need to create a multiple-choice challenge from this story. The binary nature of the Winograd Schema Challenge is based on pronoun ambiguity: if the pronoun is wrongly attributed, the message will contradict common sense.

We now have a logical, meaningful story in pictures. To create an alternative, we could distort our scenario so that the picture sequence would not make up a story, or at least so that the new sequence would be less meaningful than the original. Let’s artificially create a semantic contradiction by rearranging the first two pictures. For a human, it is obvious which of these sequences is correct.

There can be many variants of distorting the scenario, but it is important that the incorrect one contains a clear contradiction to common sense and most probably cannot be chosen by a human as a meaningful message.

Now try to ask someone which sequence of pictures has a meaningful story behind it — left or right. Our experiments show that more than 95 percent of people easily flip the correct answer.

We call it the Flip Challenge. Flip is an acronym for “Filter for Live Intelligent People”.
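The construction described above, a meaningful picture sequence paired with a distorted copy, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Flip` class, the `make_flip` function, and the file names are hypothetical, and the code simply swaps the first two pictures as in the example. It cannot judge whether the result actually contradicts common sense; that remains a human decision.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Flip:
    """Two picture sequences; exactly one of them tells the meaningful story."""
    left: tuple
    right: tuple
    correct_side: str  # "left" or "right"


def make_flip(story, rng):
    """Pair a story with a copy in which the first two pictures are swapped."""
    original = tuple(story)
    if len(original) < 2 or original[0] == original[1]:
        raise ValueError("need at least two different leading pictures to swap")
    distorted = (original[1], original[0]) + original[2:]
    # Randomize the side so that position alone carries no information.
    if rng.random() < 0.5:
        return Flip(left=original, right=distorted, correct_side="left")
    return Flip(left=distorted, right=original, correct_side="right")


# Hypothetical file names for the four-picture trophy story.
story = ["wins_trophy.jpg", "hotel_room.jpg", "packs_suitcase.jpg", "trophy_too_big.jpg"]
flip = make_flip(story, random.Random())
print(flip.correct_side, flip.left, flip.right)
```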

The framework for people creating Flip Challenge tasks

The number of possible flips is not limited if they are created by people and not by algorithms. However, it is important to provide a proper framework so that people can create new and unique flips each time.

We see giving people individual assignments as a possible way to ensure the non-repeatability of flips. For example, an assignment can consist of two random words selected from a large dictionary, and the person should come up with a scenario related to these words. The words are a sort of associative hint that stimulates the person’s imagination and do not impose any hard restrictions on the scope of possible scenarios.

This may seem like a step back to the textual representation, but we tend not to consider it as such. Individual words can now be translated automatically with practically no distortion of meaning. Moreover, any distortion of a word’s meaning will have no effect, as the words are only used as hints to inspire people’s creativity when making a flip. The words themselves do not appear in the flip.
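The two-word assignment can be sketched as follows. This is a minimal illustration, not a description of an actual implementation: the function name and the tiny word list are placeholders, and a real assignment would draw on a large dictionary. The sketch assumes that hints should be hard to predict, so by default it draws from the operating system's entropy source.

```python
import random


def pick_hint_words(dictionary, count=2, rng=None):
    """Return `count` distinct words drawn at random from `dictionary`."""
    words = sorted(set(dictionary))  # drop duplicates so the hints always differ
    if len(words) < count:
        raise ValueError("dictionary is too small")
    if rng is None:
        rng = random.SystemRandom()  # OS entropy, so assignments are hard to predict
    return rng.sample(words, count)


# A tiny placeholder word list used only for this example.
sample_dictionary = ["trophy", "suitcase", "umbrella", "violin", "lighthouse", "pancake"]
print(pick_hint_words(sample_dictionary))
```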

Another important issue is creating a good distorted version of the scenario. Sometimes this can be even harder than creating the original story. The only reliable way to make sure you have created a good flip is to have it checked by someone else, so flip authors should be required to test their flips. If any of their friends, having seen the pictures, can unambiguously choose the meaningful scenario within 15 seconds, then it can be considered a quality flip.
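The testing rule above could be recorded in a simple check like the one below. It is a sketch under one loudly stated assumption: the flip is accepted only if every tester picked the meaningful side within the time limit, which is a stricter reading than the text necessarily implies. The names `ReviewerAnswer` and `is_quality_flip` are hypothetical; only the 15-second limit comes from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewerAnswer:
    """One tester's response to a flip (illustrative structure)."""
    chosen_side: str  # "left" or "right"
    seconds: float  # time taken to decide


def is_quality_flip(correct_side, answers, time_limit=15.0):
    """Accept a flip only if every tester chose correctly within the time limit.

    Requiring all testers to succeed is an assumption of this sketch.
    """
    if not answers:
        return False  # an untested flip is never accepted
    return all(
        answer.chosen_side == correct_side and answer.seconds <= time_limit
        for answer in answers
    )


answers = [ReviewerAnswer("left", 6.2), ReviewerAnswer("left", 11.0)]
print(is_quality_flip("left", answers))
```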

Flip Challenge resistance to “narrow” AI

Further tests will show whether this new challenge is truly AI-resistant. We assume that the Flip Challenge will show results no worse than the Winograd Schema Challenge.

Artificial General Intelligence Threat to AI-resistant captchas

The AI-hard problem can potentially be solved with Artificial General Intelligence (AGI). It seems quite safe to assume that AGI does not exist at the moment.

Creating a full-fledged AGI will cause drastic changes in the world, as we all know from science fiction. As a result, many current things and values will lose their relevance. At the moment, discussions about AGI seem to be more of a way to attract attention amid the hype around the active development of deep learning and its introduction into various business areas.

***

In conclusion, the tests widely used as captchas today cannot tell robots from humans. To perform this task, a captcha should be AI-resistant, which is to say that it belongs to the class of AI-hard problems, is created by humans and not by algorithms, is unpredictable and drawn from an unlimited set of possible captchas, and has no systemic vulnerabilities. The suggested Flip Challenge meets all the criteria and thus, in our opinion, can be considered an AI-resistant captcha.