reCaptcha Fights Spam And Sparked The Burger vs. Sandwich Debate
Ying Liu is a Software Engineer at Google and was part of the team that created reCaptcha nine years ago. reCaptcha has changed a lot since then and there are some really exciting initiatives she is working on at Google that she told us about at the 3rd Annual LDV Vision Summit in 2016.
reCaptcha is an anti-abuse tool. Our mission is to keep the internet organic, green, and free of spam and abuse. Earlier I was talking to someone in the group, trying to describe what reCaptcha is. At first he didn’t get it, so I started describing reCaptcha verbally, and suddenly he got it and said, “Oh, it’s that annoying thing on the internet.” First of all, it’s very sad to hear that associated with reCaptcha’s brand. I hope that by the end of the session I will change your mind on this.
This is reCaptcha seven to nine years ago. What we did at that time was distort synthetic text so that computers could not read it but humans still could. As computer vision improved, OCR got better, and machines became really good at recognizing this kind of distortion. As a result, we had to make the distortion harder and harder, until it looked like this three years ago. I’m going to give you a second to try to transcribe what it says, but don’t blame me if it hurts your eyes.
What we did was say, let’s test this on humans and see how well humans can solve them. Even among known humans, only one-third could recognize this. Then we said, “Okay, how about machines?” Computer vision is getting really good. So we trained an advanced machine learning system inside Google on these hard captchas. And guess what? It could solve them at 99.8% accuracy. The whole game had turned around: reCaptcha was now easy for bots and machines, and hard for humans. That’s when we knew we had to totally change the game in order to get back into it.
This is what we launched a year and a half ago in late 2014. This is the new reCaptcha experience. We call it “No captcha reCaptcha”.
Here’s how it works. You are presented with a checkbox labeled “I’m not a robot”. What you do as a user is click on the checkbox to prove to us, reCaptcha, that you’re indeed a human. If we can verify that you’re a human, you’ll get back this green check. But the story is not that simple. In the back end, we have an advanced risk analysis system that, based on your click and several interactions with us, pre-classifies you on a spectrum between human and bot. If we think you’re a human, a green check is returned automatically. If we think you’re a bot, we tell you so and reject you right on the spot.
For every other case where we’re not so sure or we think that you’re kind of suspicious, we give you different captcha challenges. Here I’m just explaining two examples.
The one on the left is a 3 by 3 grid of natural images, where you as a user are asked to select all the images that show a common object. The one on the right is harder. You’re given one picture and asked to localize exactly where that object is. As of today in 2016, this is still considered a difficult task for an advanced AI. I know that earlier in today’s session, people were saying, “Oh, image recognition is a solved problem.” Well, unfortunately, it’s not solved for us, not until we have some off-the-shelf solution that says, “I can recognize any object in the world.”
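The three-way decision described earlier (automatic pass, immediate rejection, or a captcha challenge for everyone in between) can be sketched roughly as follows. This is only an illustration, not reCaptcha's actual risk analysis: the single `bot_score` input, the `triage` function, and both thresholds are hypothetical names and values invented for the example.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "green check"
    CHALLENGE = "captcha challenge"
    REJECT = "rejected"


# Hypothetical thresholds, chosen only for illustration.
HUMAN_THRESHOLD = 0.2
BOT_THRESHOLD = 0.9


def triage(bot_score: float) -> Outcome:
    """Map a risk score in [0, 1] (0 = surely human, 1 = surely bot) to an outcome."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    if bot_score <= HUMAN_THRESHOLD:
        return Outcome.PASS        # confident human: no visual test
    if bot_score >= BOT_THRESHOLD:
        return Outcome.REJECT      # confident bot: rejected on the spot
    return Outcome.CHALLENGE       # unsure or suspicious: show a challenge


if __name__ == "__main__":
    for score in (0.05, 0.5, 0.95):
        print(score, triage(score).value)
```

In a real system the score would come from many signals rather than one number, and the difficulty of the challenge could itself depend on how suspicious the visitor looks.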
We launched a year and a half ago. How did this new captcha experience do? I’m going to share some of the insights.
In the past one and a half years, we grew our footprint over the internet. We now have over a million 7-day active clients. The captcha widget that we’re showing, “I’m not a robot”, speaks 56 languages and covers 240 countries and regions. Every day we receive hundreds of millions of captcha solutions. Among all the correct solutions, roughly one third come from the “No captcha” experience, which is defined as a direct pass without solving a visual test.
Our mission is to keep the internet free of spam and abuse. To do that, we cannot drive humans away, so improving the usability of reCaptcha for humans has always been our top priority. In v1, because of the pre-classification I was talking about earlier, we could serve users much easier tasks if we pre-classified them as humans. That meant easier text distortions, and we were getting an 89% pass rate. Pass rate here is defined as the number of passed solutions over the total number of solutions. In v2, that gets much better: the pass rate increases to 96%. And for the remaining 4% of humans, you can always try again.
Solving captchas has also become much faster for human users. Again, in v1, because of the text distortion, you had to type in the answer on a keyboard, which is particularly cumbersome for mobile users. In v2, that becomes two mouse clicks or screen touches. By doing that, we cut the solving time of a captcha almost in half. That’s a few seconds saved for internet users on every captcha they solve. Cumulatively, that is 50,000 hours that we save the internet every day. 50,000 hours is almost 6 man-years. That is a lot of time that you could spend watching cat and dog videos on YouTube rather than solving captchas.
Captchas are getting easier for human users. Here, we’re showing some stats from the bot analysis. For the pre-classified bots, we give them much harder captchas, and we see a significant attrition rate. The blue bar is how many times they click on “I’m not a robot” and the red bar is how many times they actually attempt to solve a captcha. As you can see, only 5% of the clicks lead to a solution attempt. The remaining 95% of the time, the bots basically abandon the experience and walk away. We checked the same thing for human users: is it because the captcha is hard that people walk away? Interestingly, human users attempt to solve the captcha more than 90% of the time.
This is the overall pass rate we observed globally for reCaptcha v1. The map is color coded: red is a low pass rate, meaning most of the solutions failed, and green is a high pass rate, meaning most of them succeeded. As we move into the v2 experience, the No captcha experience, the map turns into a land of green. This is a very good thing for all internet users, because whenever you encounter a reCaptcha v2 on the internet, you’re most likely to solve it correctly. Unless you’re a bot, in which case you’re going to walk away.
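The pass-rate definition and the "you can always try again" remark can be made concrete with a small sketch. The function names are made up for this example, and the retry calculation rests on an assumption the talk does not state: that each attempt by a human is independent and succeeds at the overall pass rate.

```python
def pass_rate(passed: int, total: int) -> float:
    """Passed solutions divided by all submitted solutions."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


def pass_within(rate: float, attempts: int) -> float:
    """Chance of at least one pass in `attempts` tries.

    Assumes independent attempts, which is a simplification.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - rate) ** attempts


if __name__ == "__main__":
    old = pass_rate(89, 100)   # the 89% figure from the talk
    new = pass_rate(96, 100)   # the 96% figure from the talk
    print(f"older version: {old:.0%}, newer version: {new:.0%}")
    print(f"newer version, two tries: {pass_within(new, 2):.2%}")
```

Under that independence assumption, a human who fails the first time at a 96% pass rate would very rarely fail twice in a row.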
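As a quick check of the arithmetic, 50,000 hours can be converted to years directly. The sketch below assumes a year here means a calendar year of 365 days at 24 hours each, which is one reading of the figure quoted in the talk; the constant and function names are illustrative.

```python
HOURS_SAVED_PER_DAY = 50_000   # figure quoted in the talk
HOURS_PER_YEAR = 24 * 365      # assumption: a calendar year, not a working year


def hours_to_years(hours: float) -> float:
    """Convert a number of hours to calendar years."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    years = hours_to_years(HOURS_SAVED_PER_DAY)
    print(f"{HOURS_SAVED_PER_DAY} hours is about {years:.1f} years")
```

That works out to a little under six years saved per day, matching the "almost 6" in the talk.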
To recap what I said just now, reCaptcha is getting easier and faster for human users and getting harder for bots. But this is not the end of our story. The other part I want to share with you is how reCaptcha is helping to improve and to make humanity better.
When we started reCaptcha seven to nine years ago, it was an anti-abuse tool, but just as importantly it was also built to help digitize books. Remember that in v1 we showed two words. One is a distorted text that we use to verify that you’re human. The other word actually comes from a book scan, so it’s a book word. If you answer the verification word correctly, we assume that you also transcribed the book word correctly. In doing so, we have transcribed millions of books.
After book digitization, we tried reCaptcha on transcribing street numbers and street names. Here we gathered the largest image training set that is online, and we have donated a significant chunk of it to the open research community. This is helping us build a better, more accurate maps experience for all internet users.
You can pretty much guess what I’m trying to say here. As we move into v2, we’re showing natural images for labeling. We’re gathering the intelligence of the internet to help teach machines and make AI smarter.
We’re also celebrating holidays with internet users. Here are two example pictures from New Year’s captchas.
The other thing that I didn’t show here, as I was telling some of you during the break, is that there are some funny things happening at reCaptcha. We started the biggest debate on the internet about what is a burger and what is a sandwich. People love to argue about those things. Or is a cupcake a cake? Those kinds of discussions. By doing that, we won a lot of internet love for reCaptcha.
To conclude my whole talk: reCaptcha is making a continuous effort to fight spam on the internet. We’re making the internet a better experience for all human users. We’re also pushing the boundaries of research and making AI smarter.
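The two-word idea can be sketched in a few lines: the answer to the book word is only trusted when the verification word was answered correctly. The class, its method names, and the rule that several trusted answers must agree before a transcription is accepted are all assumptions added for illustration; the talk only says that a correct verification word makes the book word answer count.

```python
from collections import Counter, defaultdict
from typing import Optional

# Assumption: a book word is accepted once this many trusted answers agree.
MIN_AGREEING_VOTES = 3


class BookWordTranscriber:
    """Collects book-word answers from users who passed the verification word."""

    def __init__(self, min_votes: int = MIN_AGREEING_VOTES) -> None:
        self.min_votes = min_votes
        self._votes: defaultdict = defaultdict(Counter)

    def submit(self, control_answer: str, control_truth: str,
               book_word_id: str, book_answer: str) -> bool:
        """Return True if the user passes; record the book answer only then."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # failed verification: ignore the book answer
        self._votes[book_word_id][book_answer.strip().lower()] += 1
        return True

    def transcription(self, book_word_id: str) -> Optional[str]:
        """Return the agreed transcription, or None if agreement is too low."""
        votes = self._votes.get(book_word_id)
        if not votes:
            return None
        word, count = votes.most_common(1)[0]
        return word if count >= self.min_votes else None


if __name__ == "__main__":
    t = BookWordTranscriber(min_votes=2)
    t.submit("morning", "morning", "scan-1", "upon")
    t.submit("mourning", "morning", "scan-1", "upan")  # fails verification
    t.submit("Morning", "morning", "scan-1", "Upon")
    print(t.transcription("scan-1"))
```

Requiring agreement between several users is one simple way to guard against a human who passes the verification word but mistypes the book word.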
Originally published at www.ldv.co.