The “IDENA AI” with 60% accuracy at solving FLIPS (Part 1/3)

Jan Moritz
3 min read · Jul 26, 2020


This is the first of a series of three articles in which I will provide some insight into the development of my FLIP-solving AI. For those who are not familiar with the topic, FLIPs are “AI-resistant captchas” designed by IDENA to provide a blockchain-based proof-of-person solution. To prove that FLIPs are AI-resistant, the IDENA team issued a first $25,000 AI challenge that was later replaced with a $55,000 AI challenge.

What is a FLIP?

FLIP stands for “Filter for live intelligent people” and, as mentioned before, FLIPs are “AI-resistant captchas” with some particularities:

  • They have binary answers: there are only two possible responses, LEFT or RIGHT.
  • They are generated by other humans, not by a computer, and therefore cannot strictly be called captchas, as CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”.
  • They require a certain level of chronological reasoning, the assumption being that current AIs are not capable of this kind of logical reasoning.

Images are better than words, so below is a screenshot of a FLIP:

In the case of the FLIP above the correct answer is LEFT, as this is the only ordering that tells a logical story. Obviously, solving one FLIP is not enough to prove that you are a human, since the probability of picking the correct answer at random is 50%. This is why, during a typical IDENA verification session, users have to solve 31 FLIPs. The probability of randomly picking the correct response for all of them is only about one in two billion.
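That figure is simply 0.5 raised to the power of 31; a quick back-of-the-envelope check:

```python
# Probability of guessing all 31 binary FLIP answers correctly at random
p = 0.5 ** 31
print(p)       # ~4.66e-10
print(1 / p)   # ~2.15 billion equally likely guess sequences
```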

Screenshot of an IDENA verification session

Let’s go back to the AI part. The most intuitive approach to having an AI solve FLIPs is to label the individual images. I tried several providers and decided to use Google AI Vision, as it provided the best results.
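For illustration, here is a minimal sketch of labeling a single FLIP image with the Vision API’s label detection, using the google-cloud-vision Python client. The original client code is not shown in this article, so the function name and file paths below are assumptions, not the actual script:

```python
from google.cloud import vision

def label_image(path):
    """Return the lowercase label keywords Vision assigns to one FLIP image."""
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [label.description.lower() for label in response.label_annotations]

# Hypothetical usage:
# label_image("flip_0_img_1.png")  ->  ["kitchen", "cooking", "tableware", ...]
```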

Once the images are labeled with keywords, it is still necessary to find the order in which they make sense. The obvious approach here is to use machine learning with a neural network, and this is what I ended up doing, but I first went for a more straightforward approach.

Having analysed many FLIPs, I noticed some patterns, one of them being that similar images tend to follow each other. This can be verified with the example FLIP above: when the images are labeled using Google AI Vision, this is what we get:

I outlined the keywords that appear in more than one image

To prove that this approach works, I wrote a little Python script that applies this methodology to the 12,422 FLIPs I was able to collect from the IDENA block explorer and label using Google AI Vision.

The results were better than I first expected. The model provided the correct answer for 5,689 FLIPs, an incorrect answer for 3,354 FLIPs, and no answer for 2,796 FLIPs. Assuming a random answer is picked for the FLIPs the model has no answer for, this gives a total accuracy of about 60%.

correct: 5689
incorrect: 3354
draw: 2796
correct percentage: 59.86147478672185 %
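The “correct percentage” treats each draw as a coin flip, i.e. half a correct answer, which is consistent with the numbers above:

```python
correct, incorrect, draw = 5689, 3354, 2796
total = correct + incorrect + draw
accuracy = (correct + 0.5 * draw) / total
print(accuracy * 100)  # 59.86147478672185
```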

Conclusion

This is a simple demonstration outlining the weaknesses of FLIPs. I will publish two more articles describing how I was able to get much better results using two other approaches, which gave me an accuracy of up to 76%.

Note that although I was able to reach this level of accuracy, I was not able to win any of the cascade prizes from the IDENA AI challenge, because the use of Google AI Vision is unfortunately not allowed by the current challenge rules.

Python script
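The script applies the keyword-overlap heuristic described above: score each of the two candidate orderings by how many Vision labels consecutive images share, and pick the side with the higher score. The sketch below is a minimal reconstruction of that logic, not the original script; the data structure (a list of label lists per side) and the tie-breaking rule are assumptions.

```python
def shared_keywords(labels_a, labels_b):
    """Number of Vision labels two consecutive images have in common."""
    return len(set(labels_a) & set(labels_b))

def order_score(ordering):
    """Sum keyword overlaps over consecutive image pairs in one ordering."""
    return sum(shared_keywords(a, b) for a, b in zip(ordering, ordering[1:]))

def solve_flip(left_ordering, right_ordering):
    """Return 'LEFT', 'RIGHT', or 'DRAW' (no answer) for one FLIP.

    Each argument is a list of label lists, one per image, in the order
    proposed by that side of the FLIP.
    """
    left, right = order_score(left_ordering), order_score(right_ordering)
    if left > right:
        return "LEFT"
    if right > left:
        return "RIGHT"
    return "DRAW"
```

Running a classifier like this over the labeled FLIPs and comparing against the published answers is what produces the correct/incorrect/draw tallies shown earlier.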

