Anti-bot Test: CAPTCHA!

By Xah Lee. Date:

You've seen CAPTCHAs like this on the web:

captcha

It is a test designed to stop bots that spam websites. Software can be written to fill in web forms automatically, which means it can leave blog comments or create new web accounts. So, spammers use such software to create hundreds of comments or accounts within seconds, to spread their advertisements or malware.

To prevent that, one needs something that computers cannot do: some sort of bot test. So, you have the distorted image, which computers cannot yet recognize well.
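The server side of such a bot test can be sketched roughly like this. This is a minimal sketch, not how any particular captcha service works: the names `new_challenge`, `check_response`, and the in-memory `_pending` store are all made up for illustration, and the step that draws the answer as a distorted image is left out.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store: challenge token -> expected answer.
# A real site would likely keep this in a session or a database.
_pending = {}

# Skip characters that are easy to confuse once distorted (O/0, I/1).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length=6):
    """Create a random answer and a token that identifies this challenge."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    # The answer would be drawn as a distorted image and sent with the token.
    return token, answer


def check_response(token, typed):
    """Return True only for a correct answer; each challenge works once."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    # Compare as bytes, ignoring case and surrounding whitespace.
    return hmac.compare_digest(
        expected.encode(), typed.strip().upper().encode()
    )
```

The challenge is removed from the store on the first attempt, so a bot cannot keep guessing against the same image.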

The name CAPTCHA is supposed to be: Completely Automated Public Turing test to tell Computers and Humans Apart.

Google's reCAPTCHA

There is a free captcha service called reCAPTCHA, now owned by Google, at http://recaptcha.net/ . It allows webmasters to put captchas on their sites.

There is one interesting aspect of reCAPTCHA. The distorted words actually come from the process of digitizing books. When digitizing a book, a photo is taken of each page, and the image is read by Optical Character Recognition (OCR) software to translate it into text. (OCR is software designed to recognize text in image form, kinda the opposite of a captcha.) When the OCR software cannot understand a word, that word's image becomes the source of a Google captcha image. So, when you type in the weird-looking word from reCAPTCHA, you are helping to digitize books too.

Google has a blog post explaining their captcha service, at: http://googlewebmastercentral.blogspot.com/2010/01/protect-your-site-from-spammers-with.html . The post also features a video of [Luis von Ahn] ( https://en.wikipedia.org/wiki/Luis_von_Ahn ) explaining reCAPTCHA. Reading Wikipedia on Luis, he turns out to be a well-recognized genius with many awards.

Though, i must say, my experience with reCAPTCHA is that it is often hard to read. Often, after several tries, i still cannot pass. Apparently many people feel the same, as you can see from their comments on Google's blog. This is a critical problem, because when a human fails a captcha twice, there's a high chance he'll just abandon whatever he's doing out of frustration. The chance of abandonment grows quickly, and the bad experience sticks. When he sees reCAPTCHA at some other site, he'll simply not even give it a try.

Artificial Intelligence

Captchas are quite interesting in several respects. Differentiating human from machine is a simple artificial intelligence problem. It is a good toy problem in image recognition. It is also an interesting problem of website security.

Note, the Wikipedia article on CAPTCHA cites many research projects that have broken captchas, and also mentions alternatives such as image captchas: for example, showing you several images of animals and asking you to pick the one that's a cat. Also, captcha breakers have several methods to defeat captchas, including farms of cheap human labor.
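The "pick the cat" kind of image captcha can be sketched in a few lines. This is only an illustration under made-up assumptions: the `IMAGE_POOL` labels and file names are hypothetical, and a real service would need a large, frequently changing pool so that bots cannot just memorize the answers.

```python
import secrets

# Hypothetical labelled image pool; the file names are placeholders.
IMAGE_POOL = {
    "cat": ["cat1.jpg", "cat2.jpg", "cat3.jpg"],
    "dog": ["dog1.jpg", "dog2.jpg"],
    "bird": ["bird1.jpg", "bird2.jpg"],
}


def make_image_challenge(target="cat", choices=4):
    """Return (images to show, index of the one image matching target)."""
    rng = secrets.SystemRandom()
    correct = rng.choice(IMAGE_POOL[target])
    # Every image whose label is not the target is a possible decoy.
    decoys = [
        img
        for label, imgs in IMAGE_POOL.items()
        if label != target
        for img in imgs
    ]
    shown = rng.sample(decoys, choices - 1) + [correct]
    rng.shuffle(shown)
    return shown, shown.index(correct)


def is_correct_pick(picked_index, correct_index):
    """The correct index stays on the server; only the pick comes back."""
    return picked_index == correct_index
```

With only four choices, a bot guessing at random passes one time in four, which is one reason real image captchas ask for several picks in a row.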

The history of the cop vs robber game in the computing realm is itself a fascinating story.

Overall, i really don't think captchas are a good solution to the web spam problem, at all. They are frustrating to use, waste time, and aren't that effective in preventing spam. In fact, i'm quite surprised that spam has just increased and increased over the past 20 years, everywhere: in my Yahoo email account, my Gmail account, my several instant message chats, my web logs, and in spam blogs that use randomized snippets of text from my website. Today i even get spam from Skype, a few times a week now. Captchas and spam are a phenomenon of the tech geekers trying to solve a human nature problem by technology. (See: Spy vs Spy; Tech Geekers vs Spammers)
