CAPTCHAs are everywhere on the web now. They are the distorted text that you are asked to identify before being allowed to register for an account. The purpose is to prevent computer programs from gaining quick access to many accounts for nefarious purposes (spam, for example).
reCAPTCHA piggy-backs on CAPTCHA. You are asked to identify two words. The first is a standard CAPTCHA. If you enter the correct word you identify yourself as a human. The second is a word that has been optically scanned from a book that is being digitized. It has found its way into this reCAPTCHA because the computer doing the optical character recognition was not able to identify it. If you have identified yourself as a human via the first CAPTCHA, your answer to the second word is assumed to be correct and used in the digital translation. You are digitizing the book.
According to Wikipedia, 20 years of the New York Times archive have been digitized with the help of reCAPTCHA, which “provides about the equivalent of 160 books per day, or 12,000 manhours per day of free labor.”
The first reaction to this is obvious. The labor is not free. In fact it costs exactly 12,000 man hours. Lots of things can be produced with 12,000 man hours. Lots of leisure can be consumed in 12,000 hours. Is digitizing the New York Times the best use of this people-time? On top of that, reCAPTCHA is a tax that reduces the quantity of online accounts transacted, and that is a deadweight loss.
But it is just a few seconds of your time, right? Something about that seems to change the calculation. I bet most people would say that they don’t mind giving away two seconds of their time. Part of this is due to an illusion of marginal vs total. People are tempted to treat the act as a gift of two seconds of their time in return for a whole digitized library. But in fact they are giving away two seconds of their time for one digitized word.
A second part of this is due to a scale illusion. You may successfully convince said reCAPTCHAer that she is getting just a tiny fraction of the book for her two seconds, but she will probably still say that she is happy with that. But if you ask her whether she is willing to contribute 1000 seconds for 500 words, probably not. And, to take increasing marginal costs out of the question, if you asked her whether she thought digitizing the New York Times is worth however many thousands of woman-hours of (dispersed) uncompensated labor, she again might start to see the point.
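To make the marginal-versus-total arithmetic concrete, here is a rough Python sketch. The 12,000 hours and 160 books per day are the figures quoted above; the two seconds per word is the post's own round number, and treating all of the quoted labor as going to word-deciphering is an assumption made only for illustration. The function names are hypothetical.

```python
# Quoted figures from the post, plus one labeled assumption.
HOURS_PER_DAY = 12_000     # quoted: man-hours of labor per day
BOOKS_PER_DAY = 160        # quoted: equivalent books digitized per day
SECONDS_PER_WORD = 2       # assumption: the post's "two seconds" per word


def words_per_day(hours=HOURS_PER_DAY, seconds_per_word=SECONDS_PER_WORD):
    """Words deciphered per day if every quoted second goes to one word."""
    return hours * 3600 // seconds_per_word


def hours_per_book(hours=HOURS_PER_DAY, books=BOOKS_PER_DAY):
    """Total labor implied per digitized book."""
    return hours / books


def seconds_for_words(n_words, seconds_per_word=SECONDS_PER_WORD):
    """Time one person gives up to decipher n_words words."""
    return n_words * seconds_per_word


if __name__ == "__main__":
    print(words_per_day())         # total: words per day across all users
    print(hours_per_book())        # total: hours of labor behind each book
    print(seconds_for_words(1))    # marginal: one user's single contribution
    print(seconds_for_words(500))  # the 500-word request from the text
```

Under these assumptions the marginal contribution is two seconds for one word, while the total is on the order of 75 hours of labor per book. The per-day word count is an upper bound at best, since it ignores any time spent on the verification word.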
But still, not everybody. And I think there must be some sound rationale underneath this. I would not argue that digitizing books is necessarily the highest priority public good, but the mechanism is inherently linked to deciphering words. True, we could require everyone who signs up at Facebook to donate 1 penny to fight global warming, but A) it is never possible to know exactly what “1 penny toward fighting global warming” means, whereas there is no way to redirect my contribution if I decipher a word. That is not a liquid asset. And B) two seconds of most people’s time is worth less than 1 penny (we are talking about Facebook users, remember) and we don’t have a micro-payments system in place to go down to fractions of pennies.
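Point B can be checked with one line of arithmetic. The sketch below computes the hourly value of time at which two seconds is worth exactly one penny; the function name and defaults are illustrative, and the two-second figure is the post's own assumption.

```python
def breakeven_hourly_value(payment_cents=1, seconds=2):
    """Hourly value of time (in dollars) at which `seconds` of effort
    is worth exactly `payment_cents` cents."""
    cents_per_hour = payment_cents * 3600 / seconds
    return cents_per_hour / 100


if __name__ == "__main__":
    # Below this hourly value, two seconds of time is worth less than a penny.
    print(breakeven_hourly_value())
```

With the defaults the break-even is 18 dollars per hour, so the claim holds for anyone who values their time below that rate.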
Perhaps what we have here is a unique opportunity to utilize a public-goods contribution mechanism that is transparent and non-manipulable and guarantees to each contributor that he will not be free-ridden on: everyone else is committed to the same contribution.

17 comments
July 13, 2009 at 7:33 am
Todd
The really interesting thing about the CAPTCHAs is that they are wholly due to the inaccuracies of both the scanning and the OCR performed on the book.
If you are going to go to the trouble to digitize books, manuscripts and the like, you need to do it right the first time. For my money, the solutions provided by Kirtas Technologies are best. They have the complete customizable solution from beginning to end. No Captchas at Kirtas. Say that three times fast 😉
Great Blog!
July 13, 2009 at 12:58 pm
Carolina
I loved your article, because it made me think. And I think reCAPTCHA is not a public good.
Isn’t reCAPTCHA really a monopsony? They’re the only buyer who can take in labor supply in a captured market (FACEBOOK) in exchange for pay (FRIENDING).
July 13, 2009 at 2:07 pm
Sparky
I think CAPTCHAs are a necessary evil in today’s Internet environment (web security, preventing spammers, etc.), so they might as well get some good out of it. I think what they have done is extremely innovative and impressive.
July 13, 2009 at 7:37 pm
Matt Rognlie
I remember Luis von Ahn (the creator of reCAPTCHA) saying that it actually takes less time than many other CAPTCHA systems, because it’s simpler for humans to read two real words than a random jumble of letters, even if the latter is shorter.
In this interpretation, reCAPTCHA is a legitimate Pareto improvement over other systems: it both saves users’ time and makes their labor useful for society as a whole. This isn’t as implausible as it sounds: the whole point of CAPTCHAs is to provide text that bots aren’t able to decipher, and text from old books that OCR programs can’t read is an excellent candidate for this end.
July 13, 2009 at 10:48 pm
Divya
“I bet most people would say that they don’t mind giving away two seconds of their time. Part of this is due to an illusion of marginal vs total.”
I don’t think of this as “giving away” my time. I don’t care if a website uses CAPTCHA or reCAPTCHA. It is just something I need to do to gain access to the content I want to read (like the key to your door). reCAPTCHA is popular because it is free and caters for people who can’t see (unlike other CAPTCHA tools).
I really don’t see how the marginal time quoted in this article works in this scenario. We all spend 2 seconds of our day opening our house doors with keys. Combining the 2 seconds for all the people on earth who have doors with locks, that is a lot of man hours. How does that even count?
July 14, 2009 at 7:20 am
jeff
divya:
reCAPTCHA doubles the work you have to do for the “unlocking.” So it requires you to use an additional 2 seconds that are not necessary for the unlocking. 2 seconds go to unlocking and 2 seconds go to digitizing a word. (Imagine somebody going around and putting a second lock on everybody’s door.) With CAPTCHA it would just be 2 seconds for the unlocking. In that sense you are giving away those additional 2 seconds.
July 14, 2009 at 12:24 am
rbhui
I agree with the above comments. This situation reminds me of something Michael Ghiselin wrote about the division of labour. Labour is divided when the various tasks interfere with each other; plumbing and carpentry do, and so tend to be carried out by different individuals, while babysitting and studying naturally work fine together. In this case, the two functions are distinguishing humans from bots and digitizing books, and they are quite complementary. They interfere with each other so minimally that there really is a Pareto improvement.
July 14, 2009 at 9:37 am
Divya
@jeff
I understand now what you mean. But my only argument is that people do not think about “saving books” or “getting access to digitized content from NY Times” when they fill in reCAPTCHA. They just want to get it over with and access whatever they would like to. I bet less than 1% of users actually click on the URL for reCAPTCHA linked from that tool. Website owners, for their part, are looking for an effective CAPTCHA tool that is free. reCAPTCHA fills the need on both ends.
July 14, 2009 at 1:05 pm
jeff
An end user’s only alternative is not to complete the registration, so she naturally pays no attention to the economics and just goes on. A web designer optimally adopts reC if there is no free alternative.
But we can still ask whether the reC system should be replaced altogether by a free CAPTCHA system which takes half the time and doesn’t digitize books. The developers of reC are making the choice of imposing the tax and using it to digitize books. You the web site developer are complicit in that you pass on this tax to your users.
July 14, 2009 at 10:00 am
Ashley
@jeff
I agree with Divya. What she says makes absolute sense. It doesn’t matter. The user just wants to move on. Simple. I don’t mind spending another two seconds helping digitize a book for all it’s worth, and most people I know wouldn’t either.
July 14, 2009 at 10:12 am
Ashish
I really don’t think that reCAPTCHA actually doubles the time. One thing about reCAPTCHAs is that both of the words are dictionary words. It takes much less (or maybe the same) time to type in 2 dictionary words than a weird-looking 8-character string.
July 14, 2009 at 1:20 pm
Divya
@Jeff
The “tax” exists because you are assuming reCAPTCHA is slower than CAPTCHA, but is there evidence that it is so? As Ashish says, since it is using dictionary words, it might be faster (and there are rarely any numbers).
July 14, 2009 at 1:40 pm
jeff
Not being an expert, I am not aware of the full range of off-the-shelf systems available, but clearly reC is slower than a version which is identical to reC but uses only one (dictionary) word. The reC developers could switch to such a system and I imagine that such a switch would be trivial to implement.
July 14, 2009 at 1:47 pm
Divya
@Jeff
Some people don’t use reCs at all. http://zeldman.com is an example (he uses a simple question). It really depends on what kind of spambots the website faces. reC is the strongest free solution available as far as I know (see http://boingboing.net/2004/01/27/solving-and-creating.html on how spambots try to overcome CAPTCHAs).
July 16, 2009 at 2:05 pm
Gamesmanship of reCAPTCHA « Cheap Talk
[…] 16, 2009 in Uncategorized | Tags: game theory, incentives, the web | by jeff To remind you, reCAPTCHA asks you to decipher two smeared words before you can register for, say, a gmail […]
August 18, 2009 at 5:04 pm
TruthLluva
I’m sorry but after reading this article i can only call you an arrogant narrow-sighted pinhead.
September 18, 2009 at 12:43 pm
Google Acquires reCAPTCHA « Cheap Talk
[…] | Tags: economics, the web | by jeff We all work for google now. Previous posts on reCAPTCHA here and here. beanie bow: lance fortnow. […]