Designing an unbeatable CAPTCHA

The CAPTCHA has been around for years, but most designs are still not perfect. In fact most can be defeated with some simple image manipulation and a decent OCR algorithm. For those of you who don’t know, a CAPTCHA is a challenge-response system in which the user is presented with a problem that they must solve in order to verify that they are human and not a spam-bot. There are many popular designs around (reCAPTCHA, Egglue, etc.) but the important question is which is the best?

There are 5 main types of CAPTCHA:

  • Image – An image containing text is displayed to the user, and the user must provide a response based on the contents of that image.
  • Sound – The same as an image CAPTCHA, but with a sound clip instead.
  • Text – Text is displayed to the user and the user must write it out in a textbox.
  • Question – The user is asked a question such as “what colour is the sky?” and they must respond with an appropriate answer.
  • Completion – The user must fill in a blank to complete a sentence, for example “The water was ___ and therefore scalded his hand.”

I’m going to address each type individually, and explain the pros and cons.

Image

Image based CAPTCHA systems are perhaps the most widely used, as (when implemented properly) they make it very difficult for automated processes to break them. The most common type of image CAPTCHA involves displaying a set of characters and asking the user to type them into a text box. As a spam bot cannot “read” an image like it can read text, it cannot solve the CAPTCHA. After a while, spam bot writers began using OCR (Optical Character Recognition) algorithms that could analyse the CAPTCHA image to deduce the word or string of characters it contained. To combat this, modern CAPTCHAs distort the image in a way that makes it extremely hard for OCR algorithms to understand whilst still allowing a human to read the image. Some methods of image distortion were quite easily defeated using noise reduction and other techniques, whilst others remain less susceptible to these techniques.

An OCR algorithm works by taking an image and splitting it into individual parts that each contain one character, then analysing each character (using one or more methods) in an attempt to identify it. Splitting the image into individual parts (known as segmentation) requires the algorithm to be able to identify what area each character resides in. Many methods of analysis involve comparing a database of known character images to the input image in order to calculate which is the most likely result. Some algorithms split the image into sections of light and dark (i.e. dark is where part of the letter is) and use this map to generate a series of vectors that describe the character. It then compares these vectors to its database in order to work out which letter is shown.
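To make the two stages concrete, here is a minimal sketch of segmentation followed by database comparison. It assumes a clean black-and-white image with a blank column between characters, and the 3x3 "font" in TEMPLATES is made up purely for illustration; a real OCR engine is far more involved.

```python
def segment_columns(bitmap):
    """Return (start, end) column ranges for each run of columns containing ink.

    bitmap is a list of rows; 1 marks a dark (ink) pixel, 0 marks background.
    """
    width = len(bitmap[0])
    has_ink = [any(row[x] for row in bitmap) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def match_character(glyph, templates):
    """Return the label of the template that agrees with glyph on the most pixels."""
    def score(template):
        return sum(
            g == t
            for g_row, t_row in zip(glyph, template)
            for g, t in zip(g_row, t_row)
        )
    return max(templates, key=lambda label: score(templates[label]))


def read_text(bitmap, templates):
    """Segment the bitmap, then identify each segment against the templates."""
    glyphs = [[row[a:b] for row in bitmap] for a, b in segment_columns(bitmap)]
    return "".join(match_character(g, templates) for g in glyphs)


# Hypothetical 3x3 "font" and a clean image containing an L followed by a T.
TEMPLATES = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
BITMAP = [
    [1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 1, 0],
]
print(read_text(BITMAP, TEMPLATES))
```

Notice that everything hinges on finding that blank column: if the characters touch, segment_columns returns one wide blob and the comparison step has nothing sensible to work with.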

There are several ways to make recognition difficult for OCR algorithms:

  • Background noise – Adding random dots, lines, shapes and characters in the background can make it difficult for OCR algorithms to differentiate between the text itself and the image background.
  • Text colour – Giving each letter of the text a different colour is intended to make it difficult to identify characters.
  • Character font, size and rotation – Altering the font, size and rotation of each letter makes it difficult for OCR algorithms to correctly identify the character.
  • Wave distortion – Twisting and convoluting each character or the string as a whole causes the position and shape of the text to be changed.
  • Pinch/Punch distortion – Stretching the image in certain areas produces an effect similar to wave distortion.
  • Foreground noise – Adding random dots, lines and shapes in the foreground puts small defects in the characters, making it hard for OCR algorithms to tell what the character says.
  • Anti-segmentation – Pressing characters together or drawing a thick line at a random angle through the text makes it difficult for the OCR algorithm to split the image into individual characters.

Some of these are somewhat redundant – Gaussian noise can be filtered out and different text colours can be removed by converting the image to greyscale and altering the contrast. Anti-segmentation works well, but can often make it difficult for humans to read the image too. The same applies to distortion and foreground noise – too little and it is pointless, too much and it makes it hard to read for humans.
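As a rough illustration of why coloured text buys you very little, here is a sketch of the greyscale-and-contrast trick. The luma weights are the standard ITU-R BT.601 ones; the threshold of 128 and the sample pixel values are my own assumptions, and a hard threshold is simply the most extreme form of contrast adjustment.

```python
def to_greyscale(pixel):
    """Convert an (r, g, b) pixel to a single 0-255 brightness value."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarise(image, threshold=128):
    """Map every pixel to 1 (ink) if it is darker than threshold, else 0."""
    return [[1 if to_greyscale(p) < threshold else 0 for p in row] for row in image]


# Hypothetical pixels: two differently coloured letters, background and faint noise.
RED_LETTER = (200, 0, 0)
BLUE_LETTER = (0, 0, 200)
BACKGROUND = (255, 255, 255)
FAINT_NOISE = (200, 200, 200)

row = [RED_LETTER, BACKGROUND, BLUE_LETTER, FAINT_NOISE]
print(binarise([row]))
```

Both letters end up as plain ink and the light grey noise vanishes with the background, so the colour information never reaches the recognition stage.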

One of the biggest mistakes in creating CAPTCHAs is using real words. I know I’m going to take a lot of heat for saying that, as reCAPTCHA and many other popular systems use real words, but it’s a case of giving the spam bot more information than you need to. Consider how T9 predictive text works on phones – as you type letters it compares your input so far to a list of known words and uses it to deduce what you were trying to type. If a bot can deduce that your CAPTCHA has 7 letters and that letters 1, 2, 5 and 6 are “ob__qu_”, it then only has to decide between “oblique”, “obloquy” and “obsequy”. If the bot can calculate whether the third letter is more likely “s” or “l”, and the last letter is more likely “e” or “y”, it has solved the CAPTCHA. Even if it cannot definitely (or even probabilistically) say which word it is, it still has a 33% chance of choosing the right answer if it randomly selects a possible word. If you use random strings such as “K2Tv6p” the bot cannot use dictionary prediction to solve your CAPTCHA.
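The dictionary attack is only a few lines of code. This is a sketch, assuming the bot represents unrecognised letters as underscores; the tiny WORDS list stands in for a full dictionary file.

```python
import re


def candidates(pattern, words):
    """Return the words that fit a partially recognised pattern.

    Each underscore in pattern stands for one letter the OCR could not read.
    """
    regex = re.compile(
        "".join("[a-z]" if c == "_" else re.escape(c) for c in pattern)
    )
    return [w for w in words if regex.fullmatch(w)]


# Stand-in for a real dictionary file.
WORDS = ["oblique", "obloquy", "obsequy", "obscure", "antique", "opaque"]

print(candidates("ob__qu_", WORDS))   # three words survive
print(candidates("obl_qu_", WORDS))   # knowing the third letter leaves two
```

With a random string there is no word list to filter against, so every unread character has to be recognised on its own merits.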

Warping your text using a wave or pinch/punch method with random parameters makes it hard for bots to compare the characters against a database, as the warp alters not only the position and rotation of the character but also causes straight lines to become curved.
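Here is a minimal sketch of a wave warp on a black-and-white bitmap. It shifts each column up or down by a sine of its x position, which is one simple way to do it rather than the method any particular CAPTCHA library uses; the parameter names are my own, and in practice you would draw amplitude, wavelength and phase from a random number generator for every challenge.

```python
import math


def wave_distort(bitmap, amplitude, wavelength, phase=0.0):
    """Shift each column of bitmap vertically along a sine wave.

    bitmap is a list of rows (1 = ink). Pixels pushed off the edge are dropped.
    """
    height, width = len(bitmap), len(bitmap[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        shift = round(amplitude * math.sin(2 * math.pi * x / wavelength + phase))
        for y in range(height):
            src = y - shift
            if 0 <= src < height:
                out[y][x] = bitmap[src][x]
    return out


# A perfectly straight horizontal stroke on row 2 of a 5x8 image.
STRAIGHT = [[1 if y == 2 else 0 for _ in range(8)] for y in range(5)]
WARPED = wave_distort(STRAIGHT, amplitude=1, wavelength=8)

for line in WARPED:
    print("".join("#" if p else "." for p in line))
```

The straight stroke comes out as a curve, which is exactly what breaks a pixel-for-pixel comparison against a database of undistorted characters.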

Some CAPTCHA systems display an image that contains characters of two or more different colours and ask the user to only write down the letters that are of a certain colour. For example if the image displayed “QPT7XB5H6R” and asked them to write in the white letters, their response would be T7BHR. The problem with this is that you are telling the spam bot what colour it should search for to find the right letters, which makes segmentation much easier.
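To show how much help the colour hint is, here is a sketch of the mask a bot could build once it has been told which colour to look for. The tolerance value and the sample pixels are assumptions for illustration only.

```python
def isolate_colour(image, target, tolerance=60):
    """Return a mask with 1 wherever a pixel is within tolerance of target.

    image is a list of rows of (r, g, b) tuples; distance is Euclidean in RGB.
    """
    def close(pixel):
        return sum((a - b) ** 2 for a, b in zip(pixel, target)) <= tolerance ** 2

    return [[1 if close(p) else 0 for p in row] for row in image]


WHITE = (255, 255, 255)
# Hypothetical row: a white letter pixel, a slightly off-white one,
# a black letter pixel and a red letter pixel.
sample = [[WHITE, (250, 250, 245), (0, 0, 0), (200, 0, 0)]]
print(isolate_colour(sample, WHITE))
```

Everything that is not the requested colour drops out, leaving the bot with only the characters it actually needs to read.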

Another type of image based CAPTCHA involves showing the user a picture of an object (e.g. a cat, hammer, bowl, etc) and asking them to respond with the name of the object. Whilst this system is easy to create, its downfall is that it is very easy to request the CAPTCHA image repeatedly until you have a copy of each image. When the challenge is issued, the image file’s hash can be compared to it and the CAPTCHA is solved. If the CAPTCHA inserts subtle noise or modifies the image slightly so that the file hash changes, the bot can simply split the image up into blocks and calculate the average R, G and B levels for each. The image in the database that most closely matches the challenge image is chosen.
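The block-averaging approach can be sketched as follows. The images here are tiny hypothetical grids of (r, g, b) tuples rather than real files, and the 2x2 block layout is an arbitrary choice; a real bot would decode the image first and probably use more blocks.

```python
def block_signature(image, blocks=2):
    """Average the R, G and B levels of each block in a blocks x blocks grid."""
    height, width = len(image), len(image[0])
    signature = []
    for by in range(blocks):
        for bx in range(blocks):
            y0, y1 = by * height // blocks, (by + 1) * height // blocks
            x0, x1 = bx * width // blocks, (bx + 1) * width // blocks
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            signature.append(
                tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))
            )
    return signature


def distance(sig_a, sig_b):
    """Sum of squared differences between two signatures."""
    return sum((a - b) ** 2 for pa, pb in zip(sig_a, sig_b) for a, b in zip(pa, pb))


def identify(challenge, known):
    """Return the name of the stored signature closest to the challenge image."""
    sig = block_signature(challenge)
    return min(known, key=lambda name: distance(sig, known[name]))


def two_tone(top, bottom, size=4):
    """Build a hypothetical size x size test image with two flat colour bands."""
    return [[top if y < size // 2 else bottom for _ in range(size)] for y in range(size)]


CAT = two_tone((200, 150, 100), (50, 50, 50))
HAMMER = two_tone((20, 20, 200), (220, 220, 220))
KNOWN = {"cat": block_signature(CAT), "hammer": block_signature(HAMMER)}

# The server tweaks every pixel slightly so the file hash no longer matches.
NOISY_CAT = [[(r + 3, g - 2, b + 1) for (r, g, b) in row] for row in CAT]
print(identify(NOISY_CAT, KNOWN))
```

The hash of the noisy image would be completely different from the original, but the block averages barely move, so the lookup still lands on the right object.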

Sound

When designing image based CAPTCHA systems, accessibility must be considered – people who are blind or have reading difficulties are unlikely to be able to solve the CAPTCHA properly. Most websites that use a CAPTCHA give a link to a page where the administrator can be contacted in order to disable CAPTCHAs for a specific account or IP address. Providing an audio version of the CAPTCHA is not advised, since manipulating audio in PHP or ASP is difficult and it isn’t difficult for a bot to capture the audio and analyse it to deduce the contents of the CAPTCHA. In general, sound CAPTCHAs are not very secure.

Text

Some CAPTCHA systems must be installed on servers that do not have image manipulation libraries such as GD available, and so must resort to other methods of displaying a CAPTCHA. Text based CAPTCHA systems display a word or phrase and the user must type it out. Unfortunately this has the downside that the text (or at least a derivative of it) must be stored in the page’s code. To make it more difficult for bots to break, the CAPTCHA system can write the letters in separate elements and shuffle their positions round using CSS and/or JavaScript. This means that “<span>bought</span> <span>you</span> <span>cat</span> <span>a</span>” in the code could be re-organised to read “you bought a cat” when the browser displays it. By studying your script, however, the bot writer could easily reverse engineer your code to allow the bot to decode the solution.
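Here is a sketch of both sides of that arms race. It assumes the page re-orders the words with the CSS flexbox order property written inline on each span, which is just one of several ways to do it; the function names are my own.

```python
import random
import re


def scramble(sentence, rng=random):
    """Emit the words in shuffled source order, with CSS restoring the real order."""
    indexed = list(enumerate(sentence.split()))
    rng.shuffle(indexed)
    spans = "".join(f'<span style="order:{i}">{w}</span>' for i, w in indexed)
    return f'<div style="display:flex">{spans}</div>'


def unscramble(html):
    """What a bot does once its author has read the page source."""
    pairs = re.findall(r'<span style="order:(\d+)">([^<]*)</span>', html)
    return " ".join(word for _, word in sorted((int(i), word) for i, word in pairs))


page = scramble("you bought a cat")
print(page)
print(unscramble(page))
```

The defence costs the site owner a page of code and costs the attacker one regular expression, which is why I wouldn't rely on it alone.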

Other CAPTCHA systems display the word in ASCII art to make it difficult for bots to understand. Mixed styles and random alterations can make it quite difficult for a bot to deduce what the text says, but by looking for the patterns between spaces and characters a map can be generated in order to effectively guess the solution.

Question

Question based CAPTCHAs may use text or an image to pose a question to the user which they must answer correctly to complete the CAPTCHA. The problem with questions is not only that they must be incredibly easy so anyone can solve them, but that many questions must be entered in order to make it hard for the bot to work out which one is being asked. In the case of text questions, natural language processing can usually deduce the answer, but in most cases one may simply make a template of every question (e.g. “What is 34 + 12?” becomes “What is x + y?”) and match the template in order to solve the problem. Other questions are ambiguous, for example most people would respond to “what colour is the sky?” with “blue”, but it is also black at night, orange at sunrise, white when it’s cloudy and a whole bunch of colours if you happen to live near the northern lights. This also requires people to know a good amount of English.
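Template matching of this kind is trivial to write. The sketch below covers three hypothetical arithmetic templates; a bot author would simply keep adding entries as new question styles turn up.

```python
import re

# Each entry pairs a question template with the function that answers it.
TEMPLATES = [
    (re.compile(r"what is (\d+) \+ (\d+)\??", re.IGNORECASE), lambda x, y: x + y),
    (re.compile(r"what is (\d+) plus (\d+)\??", re.IGNORECASE), lambda x, y: x + y),
    (re.compile(r"what is (\d+) - (\d+)\??", re.IGNORECASE), lambda x, y: x - y),
]


def solve(question):
    """Return the answer as a string, or None if no template matches."""
    for pattern, answer in TEMPLATES:
        match = pattern.fullmatch(question.strip())
        if match:
            return str(answer(int(match.group(1)), int(match.group(2))))
    return None


print(solve("What is 34 + 12?"))
print(solve("what is 5 plus 66"))
print(solve("what colour is the sky?"))
```

The last question falls through because there is no template for it yet, not because it is hard: one more dictionary entry mapping it to "blue" would cover it.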

Questions written within an image need to be obscured too, otherwise bots can simply use OCR to extract the question. The problem is that questions are written in a language and therefore dictionary prediction is possible. Obscuring a whole sentence may also make it difficult to read for the user.

Completion

Completion CAPTCHAs are interesting, because they have a very fluid structure – i.e. there are billions of possible challenges that the script can issue. Natural language processing of books and online texts may allow hundreds of thousands of appropriate sentences to be extracted and imported into a database. They usually require the user to be quite literate, but not so much that it prohibits users with limited English from solving the CAPTCHA. The limitations are still similar to question based CAPTCHAs, but since a sentence with a blank isn’t actually asking for a logical response (whereas “what is 5 plus 66” is) it makes it harder for bots to deduce an answer.
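A completion challenge could be generated along these lines. This is only a sketch: the sentence, the word to blank out and the set of accepted answers are assumed to come from a database you have already built, and the function names are made up.

```python
def make_challenge(sentence, blank_word):
    """Replace the first occurrence of blank_word in sentence with a blank."""
    words = sentence.split(" ")
    if blank_word not in words:
        raise ValueError(f"{blank_word!r} does not appear as a word in the sentence")
    words[words.index(blank_word)] = "___"
    return " ".join(words)


def check(response, accepted):
    """Accept any answer from the set, ignoring case and stray whitespace."""
    return response.strip().lower() in accepted


SENTENCE = "The water was hot and therefore scalded his hand."
ACCEPTED = {"hot", "boiling", "scalding"}

print(make_challenge(SENTENCE, "hot"))
print(check(" Boiling ", ACCEPTED))
print(check("cold", ACCEPTED))
```

Accepting a set of answers rather than a single word matters here, since a human could reasonably fill the blank in several ways.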

In a nutshell

It is best to use an image based CAPTCHA with enough distortion and noise to make analysis difficult without making it too hard to read. Adding a line that is about the width of the lines in the characters through the text or squashing the letters together helps prevent bots from performing segmentation properly. Multiple fonts make a CAPTCHA more resilient, whilst multiple colours often do not. Using words instead of random strings is a bad idea. It is a good idea to randomize the rotation and vertical positions of characters, as well as the character spacing.
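As a starting point, here is a sketch of generating the random string and the per-character layout values to feed into whatever drawing code you use. Dropping look-alike characters such as 0/O and 1/l/I is my own addition, and the rotation, offset and spacing ranges are arbitrary assumptions you would tune by eye.

```python
import random
import secrets
import string

# Letters and digits, minus characters that humans easily confuse.
ALPHABET = "".join(c for c in string.ascii_letters + string.digits if c not in "0O1lI")


def random_challenge(length=6):
    """Return an unpredictable string that is not a dictionary word by design."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def random_layout(text, rng=None):
    """Pick a rotation, vertical offset and spacing for every character."""
    rng = rng or random.SystemRandom()
    return [
        {
            "char": c,
            "rotation_deg": rng.uniform(-25, 25),  # tilt of the character
            "y_offset_px": rng.randint(-4, 4),     # vertical jitter
            "gap_px": rng.randint(-2, 3),          # negative values squash letters together
        }
        for c in text
    ]


challenge = random_challenge()
for item in random_layout(challenge):
    print(item)
```

The negative gap values are what push neighbouring letters into each other, which is the cheap way of getting some anti-segmentation for free.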

If you’ve not got image manipulation libraries available, use ASCII art to display a random string instead. Randomize the characters you use to write the art with where possible, and insert a few characters in random places to throw off the bots. You can also jumble the ASCII around with CSS and JavaScript.

And that’s about it – go forth and create good CAPTCHAs!

Posted: September 23rd, 2009
at 5:54pm by admin


Categories: Development, Tutorials

Comments: 1 comment


Twisted Pair Design

I’ve started a professional web design company called Twisted Pair Design. I offer template based designs for as low as £120, and custom web design from £400. I’m also offering a 10% discount to businesses in Nottinghamshire and Derbyshire. So, if you’re looking for a new website for your business or for personal use, check out Twisted Pair Design.

Posted: September 1st, 2009
at 10:15pm by admin


Categories: General

Comments: No comments


PHP Tutorial – Creating a Secure Login System

I wrote an article over at Vorbb.com about creating a secure login system in PHP, including all the pitfalls where people tend to fall down. Feel free to take a look.

Posted: June 17th, 2009
at 9:35pm by admin


Categories: Development, Tutorials

Comments: No comments


Digital Clock Project

I’m building a clock from TTL components and have just finished the simulation and testing in Multisim. It uses counters and basic logic to detect when the maximum for each digit is reached. Schematic attached, displays are in reverse order (unit seconds on far left, tens of hours on far right).

Schematic:
Circuit schematic for a 24h clock
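For anyone who prefers code to schematics, the counting logic can be modelled roughly like this. It is a software analogy rather than a description of the actual TTL wiring: the digit limits and the 24-hour reset are my reading of how a clock like this has to behave, and the digits are stored in the same reverse order as the displays.

```python
# Maximum value of each digit, least significant first:
# unit seconds, tens of seconds, unit minutes, tens of minutes, unit hours, tens of hours.
LIMITS = [9, 5, 9, 5, 9, 2]


def tick(digits):
    """Advance the clock by one second and return the new digit list."""
    digits = list(digits)
    for i, limit in enumerate(LIMITS):
        if digits[i] < limit:
            digits[i] += 1      # this counter simply increments
            break
        digits[i] = 0           # maximum reached: reset and carry into the next digit
    # Extra detection for the 24-hour rollover: hours reading "24" resets both hour digits.
    if digits[5] == 2 and digits[4] == 4:
        digits[4] = digits[5] = 0
    return digits


def as_time(digits):
    """Format the reverse-ordered digits as HH:MM:SS."""
    s0, s1, m0, m1, h0, h1 = digits
    return f"{h1}{h0}:{m1}{m0}:{s1}{s0}"


print(as_time(tick([9, 5, 9, 5, 3, 2])))   # one second after 23:59:59
print(as_time(tick([9, 5, 9, 5, 9, 1])))   # one second after 19:59:59
```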

Posted: May 26th, 2009
at 11:42pm by admin


Categories: Hardware

Comments: No comments


Download Festival 2009

My girlfriend and I got tickets for Download Festival this year, as the lineup is excellent (unlike last year). We’re only going for one day, so we chose the Saturday as it has to be the greatest set of bands ever. Slipknot and Marilyn Manson are headlining, and there’s also Dragonforce, DevilDriver, Five Finger Death Punch, The Prodigy and for some reason Pendulum. Not sure why there’s a drum and bass artist at Download, but nevertheless. Definitely looking forward to that.

It’s also my girlfriend’s birthday on the 4th. Not saying what I’ve got her, because she becomes some sort of private investigator when presents are mentioned. If you’re reading this Lidija, for all you know I could’ve got you anything from a cabbage on a string* to a large hadron collider. Now go away!

Anyhoo, I’ve also got back into coding Uplink mods. I re-purchased the developer CD as my old one went missing and the drive I had my backup on died, and have just downloaded the latest developer patch. Should be interesting to see what I can do.

*Last year I actually got her a cabbage on a string as a spoof present.

Posted: May 26th, 2009
at 11:53am by admin


Categories: General

Comments: No comments


Network communication using Pastebin

Communicating with botnets has always been an issue for writers, especially considering that they must remain anonymous. The usual solution is a P2P network, but this can be cumbersome and difficult to design. However, I came up with a somewhat unusual method of talking to a botnet without having to resort to such lengths. It uses Pastebin, a website that allows any user to post clips of text or code without logging in or signing up. Pastebin can be used in “private” mode, which means that given any subdomain the user can post stuff to others only if they know the subdomain. For example, if I post a clip in http://ncmd1234.pastebin.com/ you would have to literally type in that subdomain to get at the posts created in it. However, using something static such as http://mybotnet.pastebin.com/ would quickly be detected and the subdomain would probably be disabled. To get around this, I came up with an ingenious idea:

Choose a few high profile RSS feeds that are generally updated once or twice per hour. Things like the BBC, Slashdot or CNN are good ideas.
Have your botnet download the feeds in XML format (in .NET the WebClient.DownloadString() method works fine) and append them to each other in one big string and call this strXml.
Hash strXml with MD5 or SHA1, then take the last 8 bytes of the hash and convert them to hex. Use this as your subdomain for pastebin, e.g. http://bc3b775ecf4ce7bd.pastebin.com/

This means that every 10 minutes or so, the location from which the botnet receives commands will change. The botnet master can simply calculate the hash, then go onto the page using Tor and post a command. If the bots are checking the page every minute, then 80-90% of the time they should get updated with the latest command. To be sure, a repeat post can be made by the botnet master at the next URL when it changes (due to a new entry in the RSS).

Posted: May 15th, 2009
at 9:41pm by admin


Categories: General

Comments: 4 comments


Gadget Show ‘09

I was lucky enough to go see the Gadget Show Live this year, as my parents got tickets for my family. It was a pretty good show, with a more than amusing performance from the gadget show team themselves. I got to see some really awesome tech, including a PC with two separate water cooling systems for its multitude of insanely overclocked components.

The guys from XLeague TV were there too, showcasing a Counter Strike: Source tournament between a couple of professional clans. They also brought along the world champion Guitar Hero player to showcase his skills and I must say I was very impressed. He played Through the Fire and Flames by Dragonforce on expert level and didn’t miss a note, and did the whole thing without even looking like he was making an effort! It takes a combination of pure finger-bending skill and a gargantuan number of hours of practice to achieve that level.

Another high point was the showcasing of two rather interesting display technologies – OLED and Sony’s MotionFlow. Unfortunately, the OLED displays were 9″ and playing The Dark Knight, so it was hard to judge their quality too well. From what I could tell it was pretty good, and the blacks and whites were very literal and crisp. The MotionFlow stand however delivered exactly what it said on the tin. They had three Sony monitors in a row, each with the same 1080p HD video clip on them. The first had MotionFlow turned off, the second had their +100Hz option switched on and the third had MotionFlow 200Hz. In panning images (especially smooth pans on brightly coloured images) the effect was dramatic. The first monitor showed glitches in vertical and horizontal synchronization and generally performed pretty poorly. The second monitor did markedly better, but still had issues with bright colour transitions. The third monitor (the 200Hz one) looked perfect – there were no synch issues at all. So, if you watch a lot of action movies, this is the technology for you.

Posted: April 23rd, 2009
at 2:35pm by admin


Categories: General

Comments: No comments


Still Alive

I’m not dead. I’ve had serious PC problems and I’ve now had to resort to using my old PC that has a busted IDE controller (I’m having to run everything off its IDE RAID slots). My laptop is completely dead, and has been scrapped for parts. My main PC has at least a dead motherboard, if not a dead everything else. Hardware hates me.

Having limited use of a PC has allowed me to do a lot more work on old projects that I haven’t touched in a while, mainly due to my lack of access to an installation of Visual Studio. I s’pose I could break into my college and use their HE department computers (they have VS2005), but I guess that would be frowned upon.

I’ve been playing Diablo 2 a lot recently, as it’s a fun game that I can play without needing a decent PC. I’ve trained up a trap assassin and it’s currently at level 83 or so. It has a Ber’ed Shako, a pair of RF’ed Bartucs (one perfect Bartucs, one not, both RFs are perfect), Enigma, Maras, Magefist, Arachnid Mesh, Sandstorm Trek, a pair of SOJs, Assassin Torch and 6 skillers. My alt weapons are CTA and Spirit for BO when there’s not a barb around. I’m planning on making a sorc next.

Posted: March 20th, 2009
at 10:36am by admin


Categories: General

Comments: No comments


Lasers and Shaders

I’ve just bought a few bits of fun kit. Firstly, an XFX 8800 Ultra 768MB to replace the damaged BFG 8800 GTX OC2 768MB currently in my machine. It actually works out to be roughly the same GPU and GDDR clock frequency, so the performance shouldn’t be much different. It only cost me £160 too, which is great considering people are still trying to sell the BFG one for £350.

The other great thing I’ve bought is a pair of 50mW green lasers. I’m planning on using them for an electronic paintball-style laser gun game I’m working on. It’s basically just a small amplifier and comparator circuit that uses the small current produced by LEDs when hit by a bright light as a sensor. I’ve simulated the design and it works well, and I’ve begun drawing up a veroboard layout for the circuit. When my lasers arrive, I’ll get to building!

Posted: January 5th, 2009
at 6:45pm by admin


Categories: General

Comments: No comments


Poisonous hemi-parasitic foliage and fermented grape products

Deck the halls with pints of Fosters, tra-la-la-la-la, let’s all get pissed! It’s xmas day and I’m rather drunk. Got myself a nice 8GB pile of DDR2, some clothes, chocolate and a plush domo! For the win.

I fear I have said too much, due to my somewhat inebriated state. I can’t spell that when I’m sober, so who knows if I got it right. Wait, I did! Joyous. Anyhoo, that is all.

Posted: December 26th, 2008
at 12:54am by admin


Categories: General

Comments: No comments

