If you’ve never completed a CAPTCHA before, chances are this is the first time you’ve ever logged on to the world wide web. In which case, welcome! What kept you?! For the rest of us internet obsessives, however, encountering a CAPTCHA can occur multiple times a day.
From setting up a new email account to buying concert tickets, that trusty “I’m not a robot” check box is never far away. Nor are the incessant requests to “Select all squares with… vehicles / storefronts / street signs / traffic lights.” It’s enough to make you see red.
But what is the purpose of CAPTCHA and how is the fight against non-human “bots” evolving over time?
A CAPTCHA is a type of challenge-response test. One that is, in theory, easy for humans to complete but hard for computers. By checking a certain box or typing specific letters, it is possible to prove your existence as a sentient being and, moreover, a valued customer, contributor or content consumer.
Bots, by contrast, are the pests of cyberspace. They spam and manipulate. They pose a substantial threat that needs to be neutralised.
CAPTCHA – perhaps unsurprisingly – is an acronym. It stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”. Catchy, right?
Whilst the term itself was coined in 2003, the most common type of CAPTCHA test – with alphanumerical characters in a distorted graphic – was invented over 20 years ago, in 1997. Users must type out the characters correctly before they can take any further action, which makes it a form of reverse Turing Test. The rationale was that software sophisticated enough to read and reproduce the characters accurately didn’t exist, or at least was not available to the average user.
The Turing Test was developed in 1950 by Alan Turing – widely regarded as a founding figure of modern computing – with the aim of testing a machine’s ability to exhibit intelligent behaviour that is indistinguishable from that of a human. Turing proposed that a human evaluator would judge a text-only conversation between a human and a machine to see if they could reliably tell them apart. If they could not, then the machine had passed the test.
However, it is important to consider the supposed flaws in the methodology. For instance, the skill level of the interrogator could vary significantly from person to person. Similarly, the test is not designed to gauge whether a computer behaves intelligently, only whether it behaves like a human. As such, it fails to account for the fact that some human behaviour is unintelligent, whilst some intelligent behaviour is inhuman. The complexities at play in the dynamic between human behaviour and intelligence therefore make it extremely difficult to deliver objective, conclusive results.
Almost 70 years have passed since the Turing Test was developed, and whilst some of the greatest strides in technology have occurred during that time, tricky new challenges have also emerged.
Soon after the creation of CAPTCHA, software was being developed to analyse patterns in the generating engine in order to reverse-engineer it. In 2014, Google engineers demonstrated a system that could defeat CAPTCHA challenges with a whopping 99.8% accuracy. This system relies on an algorithm developed by Google researchers to recognise house numbers from blurry images captured by their Street View mapping team. Whilst the algorithm can now accurately recognise 90% of house numbers on Google Maps, it would also be well-suited to cracking CAPTCHA puzzles – a highly-coveted prize for spammers.
It is therefore vital that different CAPTCHAs are presented to different users.
Most, but not all, rely on a visual test to take advantage of the sophisticated way in which humans can process visual data. If you’ve ever seen a shape in the clouds or a face on the moon, you’ll know that your mind is highly adept at associating previously-held information with new patterns and shapes.
Other methods include audio CAPTCHAs for the visually impaired, or contextual CAPTCHAs that test the reader’s comprehension skills.
A particularly novel way to increase the value of CAPTCHA can be found in an application called reCAPTCHA. The application harnesses user responses to verify the text of scanned pages, helping to digitise books. The program selects two words from a digitised image and presents them to users to verify. The same words are shown time and time again until the application has received enough responses to verify each word with a high degree of certainty. Whilst it sounds like a slow and painful process, it’s nice to know that we’ve been helping to safeguard the future of literary works in this digital age every time that annoying little text box pops up.
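To make the idea of repeated verification concrete, here is a minimal Python sketch of how responses for a single scanned word might be pooled. It assumes that one of the two words shown is a "control" word whose answer is already known, and that a user's answer for the other word only counts when the control word is typed correctly. The class name WordVerifier, its methods and both thresholds are hypothetical choices for illustration, not reCAPTCHA's actual code or values.

```python
from collections import Counter


class WordVerifier:
    """Collects human transcriptions of one scanned word until they agree."""

    def __init__(self, min_votes=3, min_agreement=0.75):
        # Both thresholds are illustrative assumptions, not real reCAPTCHA values.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = Counter()

    def submit(self, control_answer, control_truth, unknown_answer):
        """Count a vote for the unknown word only if the control word was right."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # probably a bot or a typo, so ignore this response
        self.votes[unknown_answer.strip().lower()] += 1
        return True

    def verified_word(self):
        """Return the agreed transcription, or None if consensus is not reached."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_agreement else None


verifier = WordVerifier()
verifier.submit("morning", "morning", "overlooks")
verifier.submit("morning", "morning", "Overlooks")
verifier.submit("morning", "morning", "overlooks")
print(verifier.verified_word())
```

Requiring several matching answers is what makes the process slow, but it also means that a single careless or malicious response cannot decide how a word is recorded.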
Whilst I am a staunch advocate for online security (it’s a big part of my job, after all!), I am fascinated by the cat-and-mouse games at play between those who create CAPTCHAs and the hackers who break them. Not least because it gives us an up-to-date insight into the growth and development of artificial intelligence.
The hard task is to teach a computer to process information the way humans do. For the most part, programmers will reduce the complexity of a problem using a phased approach. The first phase might be an algorithm that converts the CAPTCHA image to grayscale, removing one of the levels of obfuscation.
The second phase directs the algorithm to detect patterns within the remaining black and white image. The program then compares each pattern to a normal letter, looking for matches to complete the puzzle.
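The phased approach described above (strip the colour, reduce the image to black and white, then compare each pattern with known letters) might be sketched in Python as follows. This is a toy illustration under heavy assumptions: the 3x3 letter templates, the brightness threshold of 128 and the function names are all invented for the example, and the characters are assumed to be already separated from one another. Real CAPTCHA solvers work on much larger, noisier images.

```python
# Hypothetical 3x3 letter templates: 1 is ink, 0 is background.
TEMPLATES = {
    "I": ((0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)),
    "L": ((1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)),
    "T": ((1, 1, 1),
          (0, 1, 0),
          (0, 1, 0)),
}


def to_grayscale(rgb_image):
    """Phase 1: collapse each (r, g, b) pixel to one brightness value (0-255)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def to_binary(gray_image, threshold=128):
    """Phase 2: mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in gray_image]


def match_glyph(binary_glyph):
    """Phase 3: pick the template letter that agrees on the most pixels."""
    def score(template):
        return sum(pixel == expected
                   for glyph_row, template_row in zip(binary_glyph, template)
                   for pixel, expected in zip(glyph_row, template_row))
    return max(TEMPLATES, key=lambda letter: score(TEMPLATES[letter]))


def solve(rgb_glyphs):
    """Run all three phases on a list of pre-separated character images."""
    return "".join(match_glyph(to_binary(to_grayscale(glyph)))
                   for glyph in rgb_glyphs)
```

Because the final step picks the closest template rather than demanding an exact match, a few stray pixels of noise do not necessarily stop the letter from being recognised, which is one reason simple distortion alone offers limited protection.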
Every success in breaking a CAPTCHA is essentially a step forward for AI.
Machines are becoming increasingly sophisticated and powerful, and each time they break down the barriers that CAPTCHAs put up, the line between humans and machines becomes that little bit more blurred.
Perhaps robots will be the ones testing us in another 70 years’ time…