Bibliotheke points to the recent adventures of Greg Duffy, a talented Texas college student who figured out how to read entire copyrighted books in Google Print by “baking” the cookies (small pieces of data a website sends to your browser to store preferences for specific sites and pages) that Google uses to impose search limits on protected material. Duffy took on the challenge largely out of curiosity, but doesn’t deny that he fantasizes about his chutzpah landing him a job at Google. He hasn’t been hired yet, but he did manage to attract a great deal of attention and over 10,000 hits to his site from more than 60 countries. And in the sudden commotion, he mysteriously disappeared from Google’s web search results, only to reappear shortly after Google Print had been fixed to repel the hack. Any connection between the two events was cheerily denied by a Google representative writing in the comments on Duffy’s blog under the nom de plume “Google Guy.” Conspiracy theories abound, but Duffy has retained an excellent sense of humor throughout the whole affair, and still makes no secret of his hopes that sheer audacity and a display of chops might yet get him hired by the juggernaut he so admires and loves to tease.
It’s a bit tech-heavy, but it’s worth reading his post and the updates that follow, if for no other reason than his amusing riff on the cookie motif.
“So recently I wrote some software to grab and store up a bunch of cookies, keep them for more than 24 hours, and then automate searching for pages by this method. If I wanted to view page 100, the software would search for it and attempt to extract the image with a regular expression. If that doesn’t work, it will search for page 99 and extract the “next page” link to get to page 100. It will continue doing this for page 101, 98, and 102 until it finds the correct page. Whenever a cookie would hit the hard limit, I’d replace it with a new cookie from the queue. By grabbing the “next” and “previous” links automatically in this “inductive” fashion and using the search for skipping, I could view an entire book on Google Print with one click every time. I later modified the software to spit out a PDF of the book. I used simple components like GoogleCookie (cookie with accessible properties), GoogleCookieOven (queue with “baking time”, i.e. it only pops when the head of the queue is old enough to get the ability to search), and GoogleCookieBaker (thread that keeps the oven full of baking cookies by querying Google for new ones when the number drops below a certain threshold).”
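For the curious, the “oven” idea on its own is just a first-in, first-out queue that refuses to hand over its oldest item until that item has aged long enough. Here is a minimal Python sketch of that data structure and nothing more. Duffy’s post doesn’t include his source, so the class name `AgedQueue`, the `pop_ready` method, and the injectable `clock` parameter are all my own assumptions for illustration, not his code, and the sketch makes no network requests.

```python
import time
from collections import deque


class AgedQueue:
    """FIFO queue whose head can only be popped once it is old enough.

    Hypothetical illustration of the "oven" described in the quote;
    not taken from Duffy's actual software.
    """

    def __init__(self, min_age_seconds, clock=time.monotonic):
        self.min_age_seconds = min_age_seconds
        self._clock = clock      # injectable so the behavior can be tested
        self._items = deque()    # (enqueue_time, item) pairs, oldest first

    def put(self, item):
        """Add an item and record when it started "baking"."""
        self._items.append((self._clock(), item))

    def pop_ready(self):
        """Return the oldest item if it has aged enough, otherwise None."""
        if not self._items:
            return None
        enqueued_at, item = self._items[0]
        if self._clock() - enqueued_at < self.min_age_seconds:
            return None          # head of the queue is still baking
        self._items.popleft()
        return item

    def __len__(self):
        return len(self._items)
```

Because items go in in time order, only the head ever needs checking: if the oldest item isn’t ready, nothing behind it can be either. The “baker” in the quote would then be a separate loop that calls `put` whenever `len(queue)` falls below some threshold.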
Scam free books out of Google Print
…If I’m reading this correctly, he was basically collecting up all the loose pages of a book, then stitching them together in PDF and outputting it as a standalone book. Pretty impressive, I think…