Everything you need to know about CAPTCHA

AlterEgo
5 min read · Apr 22, 2017

Even if some of you are not really familiar with the actual phrase “CAPTCHA”, I’m pretty sure everyone knows what it is.

The program has infuriated web users for almost two decades at this point: simply put, CAPTCHA is that annoying login test that asks you to type in an impossible-to-read sequence of letters and numbers in order to prove that you’re actually human.

Get just one character, letter or number wrong, and you’ll be denied access — you robot scum!

All jokes aside, before we go further down the rabbit hole, we should take a quick look at the history of CAPTCHA and examine some of its potential benefits first.

According to the article “Telling Humans and Computers Apart Automatically”, the word CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was coined back in 2000 by a team of Carnegie Mellon University researchers, who developed it for Yahoo. If the acronym sounds a bit confusing, it may help to explain what the Turing test actually is. The test is named after the famous mathematician Alan Turing, and it’s basically a test of whether a machine can pass for a human in conversation.

What is the Turing test?

So a CAPTCHA is an automated Turing test. Simple enough. The most well-known version of the test presents the user with a distorted piece of text, on the assumption that any average human being will be able to decipher it while a program will not.
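To make the idea concrete, here is a minimal sketch of the challenge-and-response logic behind a text CAPTCHA, written in Python. It is only an illustration: the function names `new_challenge` and `check_answer`, the six-character length, and the choice to skip look-alike characters are all assumptions of ours, and the step that actually matters in practice (rendering the text as a distorted image) is left out entirely.

```python
import hmac
import secrets
import string

# Look-alike characters (0/O, 1/I/L) are left out. This is an illustrative
# choice for the sketch, not something any particular CAPTCHA is known to do.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1IL"
)


def new_challenge(length: int = 6) -> str:
    """Return a random string that a site would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected: str, typed: str) -> bool:
    """Compare the visitor's answer with the stored challenge.

    Case and surrounding whitespace are ignored; the comparison itself
    runs in constant time.
    """
    normalized = typed.strip().upper()
    return hmac.compare_digest(expected.encode(), normalized.encode())


if __name__ == "__main__":
    challenge = new_challenge()
    print("Challenge:", challenge)
    print("Lowercase answer accepted:", check_answer(challenge, challenge.lower()))
```

The server keeps `expected` to itself and only ever shows the visitor the distorted picture, which is why the whole scheme depends on the picture being hard for software to read.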

CAPTCHA in practice

For years, CAPTCHA has been viewed as a necessary evil by most internet users. And why is it so necessary?

Well, many websites out there require users to confirm that they are human in order to prevent brute-force attacks, in which automated bots hammer a login form with guess after guess until a password works. The same kind of bot can also overload a server with traffic.

Mainly, the program is there to protect the websites you love and visit on a daily basis from spam and abuse. Ticketmaster is an excellent example of this — the site uses CAPTCHA to prevent bots from buying hundreds of tickets before a concert sells out, and reselling them for higher prices.

So why does it have such a bad reputation among Internet users?

While the program has evolved since its early days, most of us still have to squint and take a wild guess at a distorted jumble of characters in order to prove our humanity. Just go on Twitter, look up “worst CAPTCHA ever” and you’ll see what we’re talking about.

https://twitter.com/robnia/status/821194287132057603

However, while CAPTCHA may be frustrating, are these Twitter examples just isolated incidents? Better yet, can CAPTCHA actually foil those attacks without introducing too much friction to the login process? And does it actually work? Let’s take a look at the numbers.

Is CAPTCHA keeping our accounts safe?

A couple of years ago, Stanford University conducted a study that examined just how much friction CAPTCHA adds for real users. And what they discovered was pretty interesting, to say the least.

On average, a user needs around 10 seconds to solve a text-based CAPTCHA. The audio version, on the other hand, takes almost 30. And those numbers only relate to native English speakers; for non-native speakers, it takes even longer.

And when the process takes longer than 10 seconds, some people even give up in frustration. Although estimates vary, Dean Takahashi of VentureBeat suggests that 20% of users give up after a few tries.

Worst of all, no matter how undecipherable they seem, most CAPTCHA schemes are in fact easily breakable. Don’t believe it? Let’s look at a concrete example, shall we?

A couple of years ago, Stanford University researchers created a program called Decaptcha that was able to defeat audio CAPTCHAs, leaving them vulnerable to automated attacks. During the trial run, the program was able to successfully decode audio CAPTCHAs 50% of the time. And today, there are hundreds of similar programs that automatically collect email addresses for spam campaigns.

Additionally, there are some online services, operated from countries where the law is not enforced effectively, which offer actual humans to solve the CAPTCHA for your bot. Basically, automation through people. They even offer price tiers and Service Level Agreements, just like any legitimate business would. Through their APIs, a human being halfway around the world sees the code and fills it in within 5–6 seconds or so. Their success rate is regularly over 90%.
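To put those success rates in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes that every attempt is independent and that the attacker simply gets a fresh CAPTCHA after each failure, which is a simplification of ours; the function names are made up for this sketch, and 5.5 seconds is just the midpoint of the 5–6 second range quoted above.

```python
def expected_attempts(success_rate: float) -> float:
    """Average number of tries until one CAPTCHA is solved.

    Assumes independent attempts, so the count follows a geometric
    distribution with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in the range (0, 1]")
    return 1 / success_rate


def expected_seconds(success_rate: float, seconds_per_attempt: float) -> float:
    """Average time spent until one CAPTCHA is solved."""
    return expected_attempts(success_rate) * seconds_per_attempt


if __name__ == "__main__":
    # A solver that is right 50% of the time, like the Decaptcha trial run.
    print(expected_attempts(0.5))
    # A human solving service: 90% success, assumed 5.5 seconds per attempt.
    print(round(expected_seconds(0.9, 5.5), 1))
```

Under these assumptions, a solver that is right only half the time still gets through after two tries on average, and a human service gets through in roughly six seconds. Neither is much of an obstacle for a bot that can retry all day.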

Invisible reCAPTCHA and the privacy issues ahead

A few years ago, Google decided to simplify the “prove-you-are-a-human” test by introducing a simple “I’m not a robot” click that detects if you’re a bot based on how you tick the box (bots are more likely to click right in the middle of the box).

However, if you’re thinking that this test is both more convenient and more secure, you’re gravely wrong. Recent advances in AI have resulted in smart machines that are now able to crack even the most difficult versions of the test with an astounding 99.8% accuracy, according to recent Google research.

https://twitter.com/Leech/status/824709545771626499

And that’s why the latest version of the program, the “Invisible reCAPTCHA”, removed the clicking entirely. Instead, the program monitors how you’ve interacted with a certain website to date and how you move your mouse around, and reviews any past data the company holds on you.

But according to Business Insider, it seems like the new version of CAPTCHA is collecting far more information than it suggests. In fact, it appears that the program is collecting personally identifiable data like IP address, screen size and CSS information from the web page you’re on.

The takeaway

The bottom line is this: when it comes to anti-account-hijacking capabilities, websites cannot rely on CAPTCHA alone to prevent spam attacks. The test will continue to evolve and improve, but at what cost?

Also, just because a website has CAPTCHA, it doesn’t mean that your data is safe.

By collecting so much personal information, the system is unfortunately contributing to the trend of making the Internet harder to use than ever for people who value their privacy.

AlterEgo team

We’ve developed AlterEgo for those who want to protect their personal data. It generates email addresses, passwords, usernames or any credential you need for subscriptions. Give it a try here. It’s free.

AlterEgo

AlterEgo is an identity protector. It generates full virtual credentials (email addresses, strong passwords, usernames, etc.) so you don’t expose personal data.