CAPTCHA: Using a Black Box Causes More Problems

Published in Arkose Labs · 7 min read · Dec 4, 2014

Creators of the typed-in CAPTCHA are finally admitting what I’ve been saying for years now: CAPTCHAs cause huge problems. They drive away genuine users and let bots through. If you are a website operator, these CAPTCHAs lower the conversion rate of your online forms as your users get frustrated with twisty letters and leave, increasing the bounce rate of your signup or comment pages.

Recent attempts to kill the CAPTCHA have touted the use of a “black box”: a magical secret bit of code that sorts the users of your site into groups. If the user is put in the group deemed “probably not a bot”, they get no challenge, or one that is not very secure. If the user is put in the group deemed “probably a bot” or “not enough information to decide”, the user gets the old, nasty typed-in challenge that stops both bots and people from continuing.
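To make the sorting concrete, here is a minimal sketch of the routing step described above. The names (`Verdict`, `choose_challenge`, the challenge labels) are hypothetical and not taken from any real product; how a real black box reaches its verdict is exactly the part you cannot see, so it is left out.

```python
from enum import Enum


class Verdict(Enum):
    """The three groups a black box might sort a visitor into (hypothetical names)."""
    PROBABLY_HUMAN = "probably not a bot"
    PROBABLY_BOT = "probably a bot"
    UNKNOWN = "not enough information to decide"


def choose_challenge(verdict: Verdict) -> str:
    """Map the black box's verdict to the challenge the visitor sees."""
    if verdict is Verdict.PROBABLY_HUMAN:
        # No challenge at all, or one that is not very secure.
        return "none_or_checkbox"
    # "Probably a bot" and "not enough information" are treated the same way:
    # both get the old typed-in challenge.
    return "typed_in_captcha"
```

Note that the "unknown" group lands on the same hard path as the "bot" group, which is why the size of that group matters so much to your signup numbers.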

This black box would be wonderful if it actually worked, but this idea seldom pans out, and I’ll try to explain why. As a website operator, you should ask some hard questions about any spam-blocking solution that relies on a black box.

Will the black box mistakenly treat genuine users as bots?

When the black box mistakenly sorts genuine users of your website into the group “probably a bot”, that is called a false positive. It’s like a medical test that mistakenly says a healthy patient has a disease. Maybe the user’s IP address was used in the past by a bot. Maybe their system is compromised. Maybe they’ve gotten a lot of CAPTCHAs wrong in the past for their own legitimate reasons. Maybe the user was put into a “blacklist” database by mistake. They will probably never know why they are suspected of being a bot, and never know how to fix it.

If a user of your site is sorted by the magical black box into the “bot” group, the user gets blocked entirely, or gets a CAPTCHA challenge just as nasty, frustrating, long, and difficult as ever — or even more so. Those users have a big chance of bouncing away from your site. And you’ll never know you lost them. Your site is like a hot-air balloon with a hole somewhere up there. Your signups and comments are not rising as fast as you think they should, but you don’t know where the leak is, or how to fix it.

Many developers are talking now about their bad experiences with the black box mistaking them for a bot, followed by impossible-to-solve puzzles. Their concerns are rising rapidly.

Will the black box mistakenly treat bots as genuine users?

When the black box mistakenly sorts a spambot visitor to your website into the group “probably not a bot”, that is called a false negative. It’s like a medical test that mistakenly says a sick patient is disease-free. Maybe the black box is simply not very accurate. Maybe the bot has been deliberately written to appear human. Maybe the bot is cleverly using the resources of a genuine user, like a ghost hovering over their shoulder. This all happens because spammers are determined, and bots can be adapted to fool the black box. The history of computing tells the story of this arms race over and over again, and the black box always loses.

If a bot visiting your site is sorted by the magical black box into the “genuine user” group, the bot gets no challenge, or a trivially easy challenge, such as ticking a box. It’s then very easy for the bot to pass that challenge, and get into your site free and clear. A bot that succeeds will usually signal this, and a torrent of bots will then come rushing in. Your site can get filled with spam overnight, taking weeks to clean up. Sometimes even the very creators of a black-box defense are getting hit with spam!

Again, many developers are talking now about how black boxes can be deeply analysed, which should allow spammers to design bots to get through the black box CAPTCHAs and create lots of spam on their sites.
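If you can label even a sample of your traffic by hand, the two kinds of mistake can be measured separately. The sketch below is one way to do that; the function name and the counts in the usage note are made up for illustration and are not measurements of any real system.

```python
def error_rates(humans_flagged: int, humans_total: int,
                bots_passed: int, bots_total: int) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    false positive: a genuine user sorted into "probably a bot"
    false negative: a bot sorted into "probably not a bot"
    """
    if humans_total <= 0 or bots_total <= 0:
        raise ValueError("need at least one labeled human and one labeled bot")
    false_positive_rate = humans_flagged / humans_total
    false_negative_rate = bots_passed / bots_total
    return false_positive_rate, false_negative_rate
```

For example, with hypothetical counts of 50 flagged visitors out of 1,000 known humans and 20 passed visitors out of 200 known bots, `error_rates(50, 1000, 20, 200)` gives a 5% false positive rate and a 10% false negative rate. With a black box you usually cannot get these counts at all, which is the problem.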

Will the black box require users to use the internet a certain way?

When the black box sorts genuine users of your website into the group “not enough information to decide”, it has to assume the user is a clever bot, which creates all the problems of the false positive I described above. But why can’t the black box tell? You have to ask and experiment to figure out why. Many developers have already found this depends on the user’s browsing history, or cookies, or on whether the user is logged into a particular service. As one developer put it, these black box CAPTCHAs are a good way to test how much a company knows about you. It can depend on whether the user is running particular anti-snooping software, or using a browser that’s not very common. You may find that your most interesting and valuable website visitors also happen to be the kind of people who resist using the internet in a conventional way. Why drive them away just because they are not doing what the black box wants them to do?

Even if you personally find all this a bit paranoid, you have to consider how to accommodate your customers who have these concerns. Many find it creepy to see on your site a chunk of code that relies on knowing a huge amount of information about your user: what one observer called the “panopticon” and another called a “habit of overstepping the limits of what consumers will allow it to learn about them”. They see it as Trojan code that can be updated and changed without your knowledge by a company that openly says that it wants to thoroughly track user behavior across the web.

What’s the alternative to the black box?

An alternative to a black box is a transparent box, aligned with the open source ideal. For example, our alternative solution FunCaptcha blocks spammers without resorting to a black box. FunCaptcha is open (if not quite open-source) about its inner workings, and if you try FunCaptcha for yourself you can probably figure it out anyway. At the heart of FunCaptcha is a visual puzzle that is impractical for spammers to attack. (I’ll post more about that later, and share the positive things that security experts have said about FunCaptcha’s approach; it’s a whole other fascinating subject.) FunCaptcha will change the nature of its challenge based on a user’s history, but most importantly, that judgment is easy for you to understand. Furthermore, even if that judgment produces a false positive or false negative, there’s no harm done. A bot mistaken for a genuine user will still get stopped, and a genuine user mistaken for a bot will still get a challenge that is quick and easy to solve. All this sidesteps the secretiveness that makes the black box approach vulnerable.

If a bot tries to randomly guess its way through FunCaptcha, its odds of getting through are low, much lower than the chances of a bot getting through a typed-in CAPTCHA. The IP address of the user may be suspect, because it is on the Stop Forum Spam list or because it has gotten FunCaptcha wrong more often than right in the past. If the IP is suspect, the FunCaptcha challenge becomes a little longer: more images to turn the right way up, or faces to move into the middle. Even then, FunCaptcha’s completion rate remains extremely high, far higher than the completion rate for typed-in CAPTCHAs, and the challenge takes fifteen seconds on average.
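To see why a longer challenge hurts a guessing bot far more than it hurts a person, here is a rough back-of-the-envelope sketch. It rests on assumptions that are mine, not published FunCaptcha figures: each image has the same number of equally likely positions, the images are independent, and the bot guesses uniformly at random.

```python
from fractions import Fraction


def random_guess_odds(positions_per_image: int, images: int) -> Fraction:
    """Chance that uniform random guessing solves every image in a challenge.

    Assumes (hypothetically) that each image has `positions_per_image`
    equally likely positions and that images are independent.
    """
    if positions_per_image < 1 or images < 1:
        raise ValueError("both arguments must be at least 1")
    return Fraction(1, positions_per_image) ** images
```

With a hypothetical 8 positions per image, a three-image challenge gives a guessing bot odds of 1 in 512, and a five-image challenge gives 1 in 32,768. Each extra image costs a person a few seconds but divides the bot's odds by the number of positions.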

If a user’s IP address has a clean history, and your site’s FunCaptcha security setting is “Automatic”, then the user will get a short challenge — it could be just one image to turn the right way up, or face to move to the middle. On average that short challenge takes less than five seconds to complete. FunCaptcha is slated for a feature that makes it even easier and faster for an IP that has gotten FunCaptcha correct a few times in a row. At the easiest level, the user will get a “free pass” challenge: one click, with no wrong answer. This will make FunCaptcha’s completion rate even higher, and let users through even faster.
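The rules in the last two paragraphs are simple enough to write down, which is the point of a transparent box. What follows is only a sketch of those rules as described here, assuming the “Automatic” security setting; it is not FunCaptcha's actual code. The names `IpHistory`, `is_suspect`, and `challenge_level` are hypothetical, and so is the threshold of three for “a few times in a row”.

```python
from dataclasses import dataclass

# Hypothetical threshold for "a few times in a row"; the real value is not stated.
FREE_PASS_STREAK = 3


@dataclass
class IpHistory:
    on_spam_list: bool = False  # e.g. the IP appears on the Stop Forum Spam list
    correct: int = 0            # challenges this IP has answered correctly
    wrong: int = 0              # challenges this IP has answered incorrectly
    correct_streak: int = 0     # consecutive correct answers, most recent first


def is_suspect(history: IpHistory) -> bool:
    """Suspect if listed as a spammer, or wrong more often than right."""
    return history.on_spam_list or history.wrong > history.correct


def challenge_level(history: IpHistory) -> str:
    """Pick a challenge length from the IP's history ("Automatic" setting assumed)."""
    if is_suspect(history):
        return "long"        # more images to turn, or faces to move
    if history.correct_streak >= FREE_PASS_STREAK:
        return "free_pass"   # planned feature: one click, no wrong answer
    return "short"           # could be just one image
```

Every branch still ends in a challenge of some kind, so a wrong guess about the visitor changes how long the challenge is, not whether a bot is stopped or a person can get through.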

To put it simply…

There is an alternative to the magical black box: a transparent process that the creator is happy to explain to you, or you can quickly figure out by playing with the solution yourself. Make sure that when a spam-blocker sorts a visitor correctly, bots get stopped and users get a very easy and fast challenge. Even if it sorts a visitor incorrectly, you should be assured that bots can’t get far, and genuine users get a quick challenge with an extremely high completion rate. Don’t rely on black-box solutions made by companies that track every aspect of a user’s behavior. Don’t take a chance that your users are being quietly categorized as bots and subjected to terrible typed-in CAPTCHAs, making them leave. Don’t take a chance that a clever botnet will be given trivial challenges, flooding your site with spam. Use a solution that does not rely on a black box.