Great Idea, But…

May 24th, 2007 at 08:42pm Posted by Eli

I think I might be missing something here…

PITTSBURGH (AP) — A few simple keystrokes may soon turn blather into books.

Researchers at Carnegie Mellon University have discovered a way to enlist people across the globe to help digitize books every time they solve the simple distorted word puzzles commonly used to register at Web sites or buy things online.

The word puzzles are known as CAPTCHAs, short for “completely automated public Turing tests to tell computers and humans apart.” Computers can’t decipher the twisted letters and numbers, ensuring that real people and not automated programs are using the Web sites.

Researchers estimate that about 60 million of those nonsensical jumbles are solved every day around the world, taking an average of about 10 seconds each to decipher and type in.

(…)

“Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon. He helped develop the CAPTCHAs about seven years ago. “Is there any way in which we can use this human time for something good for humanity, do 10 seconds of useful work for humanity?”
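As a rough check on that figure, here is the arithmetic using only the two numbers quoted above (60 million puzzles a day, about 10 seconds each). The variable names are mine, and the inputs are the article's estimates, so treat the result as a ballpark.

```python
# Inputs are the estimates quoted in the article, not measured values.
puzzles_per_day = 60_000_000
seconds_per_puzzle = 10

# Convert total seconds spent per day into hours.
hours_per_day = puzzles_per_day * seconds_per_puzzle / 3600
print(f"{hours_per_day:,.0f} hours per day")
```

That works out to roughly 167,000 hours a day, in the same ballpark as the 150,000 von Ahn cites; his number presumably rounds down or uses slightly different inputs.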

Many large projects are under way now to digitize books and put them online, and that’s mostly being done by scanning pages of books so that people can “page through” the books online. In some cases, optical character recognition, or OCR, is being used to digitize books to make the texts searchable.

But von Ahn said OCR doesn’t always work on text that is older, faded or distorted. In those cases, often the only way to digitize the works is to manually type them into a computer.

Von Ahn is working with the Internet Archive, which runs several book-scanning projects, to use CAPTCHAs for this instead. Internet Archive scans 12,000 books a month and sends von Ahn hundreds of thousands of files that are images that the computer doesn’t recognize. Those files are downloaded onto von Ahn’s server and split up into single words that can be used as CAPTCHAs at sites all over the Internet.

If enough users decipher the CAPTCHAs in the same way, the computer will recognize that as the correct answer.

It’s an ingenious idea, but… How does it handle the first few people to type in one of these snippets? Are they on hold until enough other blog users type in the same answer, or are the first few users to get that snippet just given a free pass? The whole premise of the CAPTCHA process is that there is an absolute right answer that the CAPTCHA system already knows.

Also, what happens if the system passes a snippet that is genuinely unreadable, or in a different alphabet, or not even text at all?
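For what it's worth, here is one way those questions could be answered. This is a guess at a workable design, not a description of what von Ahn's system does: the article only says that agreement among enough users counts as the correct answer. The sketch assumes each puzzle pairs the unknown scanned word with a control word whose answer is already known, so the first users are graded on the control word alone while their reading of the unknown word is simply recorded as a vote. Every name and threshold here (`SnippetTally`, `check_user`, `AGREE_THRESHOLD`, `MAX_ATTEMPTS`) is made up for illustration.

```python
from collections import Counter

AGREE_THRESHOLD = 3   # assumed: matching answers needed to accept a reading
MAX_ATTEMPTS = 10     # assumed: give up on a snippet after this many votes


class SnippetTally:
    """Collects guesses for one unknown scanned word (hypothetical design)."""

    def __init__(self):
        self.votes = Counter()

    def record(self, guess):
        # Normalize lightly so "Book " and "book" count as the same vote.
        self.votes[guess.strip().lower()] += 1

    def status(self):
        """Return ('accepted', word), ('unreadable', None) or ('pending', None)."""
        if self.votes:
            word, count = self.votes.most_common(1)[0]
            if count >= AGREE_THRESHOLD:
                return ("accepted", word)
        if sum(self.votes.values()) >= MAX_ATTEMPTS:
            # No consensus: perhaps not text, or not in a readable alphabet.
            return ("unreadable", None)
        return ("pending", None)


def check_user(control_answer, control_guess, unknown_guess, tally):
    """Pass or fail the user on the control word only; nobody is put on hold."""
    passed = control_guess.strip().lower() == control_answer.lower()
    if passed:
        # Only count votes from users who got the known word right.
        tally.record(unknown_guess)
    return passed


# Four users see the same unknown snippet next to the control word "books".
tally = SnippetTally()
for guess in ["blather", "Blather", "bladder", "blather"]:
    check_user("books", "books", guess, tally)
print(tally.status())
```

Under these assumptions nobody waits and nobody gets a free pass, and a snippet that never draws agreement ends up flagged as unreadable instead of blocking anyone.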

Entry Filed under: Blogosphere, Books, Coolness, Technology

