Category Archives: google_print

the book is reading you

I just noticed that Google Book Search requires users to be logged in on a Google account to view pages of copyrighted works.
google book search account.jpg
They provide the following explanation:

Why do I have to log in to see certain pages?
Because many of the books in Google Book Search are still under copyright, we limit the amount of a book that a user can see. In order to enforce these limits, we make some pages available only after you log in to an existing Google Account (such as a Gmail account) or create a new one. The aim of Google Book Search is to help you discover books, not read them cover to cover, so you may not be able to see every page you’re interested in.

So they’re tracking how much we’ve looked at and capping our number of page views. Presumably a bone tossed to publishers, who I’m sure will continue suing Google all the same (more on this here). There’s also the possibility that publishers have requested information on who’s looking at their books — geographical breakdowns and stats on click-throughs to retailers and libraries. I doubt, though, that Google would share this sort of user data. Substantial privacy issues aside, that’s valuable information they want to keep for themselves.
That’s because “the aim of Google Book Search” is also to discover who you are. It’s capturing your clickstreams, analyzing what you’ve searched and the terms you’ve used to get there. The book is reading you. Substantial privacy issues aside, (it seems more and more that’s where we’ll be leaving them) Google will use this data to refine Google’s search algorithms and, who knows, might even develop some sort of personalized recommendation system similar to Amazon’s — you know, where the computer lists other titles that might interest you based on what you’ve read, bought or browsed in the past (a system that works only if you are logged in). It’s possible Google is thinking of Book Search as the cornerstone of a larger venture that could compete with Amazon.
There are many ways Google could eventually capitalize on its books database — that is, beyond the contextual advertising that is currently its main source of revenue. It might turn the scanned texts into readable editions, hammer out licensing agreements with publishers, and become the world’s biggest ebook store. It could start a print-on-demand service — a Xerox machine on steroids (and the return of Google Print?). It could work out deals with publishers to sell access to complete online editions — a searchable text to go along with the physical book — as Amazon announced it will do with its Upgrade service. Or it could start selling sections of books — individual pages, chapters etc. — as Amazon has also planned to do with its Pages program.
Amazon has long served as a valuable research tool for books in print, so much so that some university library systems are now emulating it. Recent additions to the Search Inside the Book program such as concordances, interlinked citations, and statistically improbable phrases (where distinctive terms in the book act as machine-generated tags) are especially fun to play with. Although first and foremost a retailer, Amazon feels more and more like a search system every day (and its A9 engine, though seemingly always on the back burner, is also developing some interesting features). On the flip side Google, though a search system, could start feeling more like a retailer. In either case, you’ll have to log in first.

thinking about google books: tonight at 7 on radio open source

While visiting the Experimental Television Center in upstate New York this past weekend, Lisa found a wonderful relic in a used book shop in Owego, NY — a small, leatherbound volume from 1962 entitled “Computers,” which IBM used to give out as a complimentary item. An introductory note on the opening page reads:

The machines do not think — but they are one of the greatest aids to the men who do think ever invented! Calculations which would take men thousands of hours — sometimes thousands of years — to perform can be handled in moments, freeing scientists, technicians, engineers, businessmen, and strategists to think about using the results.

This echoes Vannevar Bush’s seminal 1945 essay on computing and networked knowledge, “As We May Think”, which more or less prefigured the internet, web search, and now, the migration of print libraries to the world wide web. Google Book Search opens up fantastic possibilities for research and accessibility, enabling readers to find in seconds what before might have taken them hours, days or weeks. Yet it also promises to transform the very way we conceive of books and libraries, shaking the foundations of major institutions. Will making books searchable online give us more time to think about the results of our research, or will it change the entire way we think? By putting whole books online do we begin the steady process of disintegrating the idea of the book as a bounded whole and not just a sequence of text in a massive database?
The debate thus far has focused too much on the legal ramifications — helped in part by a couple of high-profile lawsuits from authors and publishers — failing to take into consideration the larger cognitive, cultural and institutional questions. Those questions will hopefully be given ample air time tonight on Radio Open Source.
Tune in at 7pm ET on local public radio or stream live over the web. The show will also be available later in the week as a podcast.

google print on deck at radio open source

Open Source, the excellent public radio program (not to be confused with “Open Source Media”) that taps into the blogosphere to generate its shows, has been chatting with me about putting together an hour on the Google library project. Open Source is a unique hybrid, drawing on the best qualities of the blogosphere — community, transparency, collective wisdom — to produce an otherwise traditional program of smart talk radio. As host Christopher Lydon puts it, the show is “fused at the brain stem with the world wide web.” Or better, it “uses the internet to be a show about the world.”
The Google show is set to air live this evening at 7pm (ET) (they also podcast). It’s been fun working with them behind the scenes, trying to figure out the right guests and questions for the ideal discussion on Google and its bookish ambitions. My exchange has been with Brendan Greeley, the Radio Open Source “blogger-in-chief” (he’s kindly linked to us today on their site). We agreed that the show should avoid getting mired in the usual copyright-focused news peg — publishers vs. Google etc. — and focus instead on the bigger questions. At my suggestion, they’ve invited Siva Vaidhyanathan, who wrote the wonderful piece in the Chronicle of Higher Ed. that I talked about yesterday (see bigger questions). I’ve also recommended our favorite blogger-librarian, Karen Schneider (who has appeared on the show before), science historian George Dyson, who recently wrote a fascinating essay on Google and artificial intelligence, and a bunch of cybertext studies people: Matthew G. Kirschenbaum, N. Katherine Hayles, Jerome McGann and Johanna Drucker. If all goes well, this could end up being a very interesting hour of discussion. Stay tuned.
UPDATE: Open Source just got a hold of Nicholas Kristof to do an hour this evening on Genocide in Sudan, so the Google piece will be pushed to next week.

sober thoughts on google: privatization and privacy

nypl reading room.jpg
Siva Vaidhyanathan has written an excellent essay for the Chronicle of Higher Education on the “risky gamble” of Google’s book-scanning project — some of the most measured, carefully considered comments I’ve yet seen on the issue. His concerns are not so much for the authors and publishers that have filed suit (on the contrary, he believes they are likely to benefit from Google’s service), but for the general public and the future of libraries. Outsourcing to a private company the vital task of digitizing collections may prove to have been a grave mistake on the part of Google’s partner libraries. Siva:

The long-term risk of privatization is simple: Companies change and fail. Libraries and universities last…..Libraries should not be relinquishing their core duties to private corporations for the sake of expediency. Whichever side wins in court, we as a culture have lost sight of the ways that human beings, archives, indexes, and institutions interact to generate, preserve, revise, and distribute knowledge. We have become obsessed with seeing everything in the universe as “information” to be linked and ranked. We have focused on quantity and convenience at the expense of the richness and serendipity of the full library experience. We are making a tremendous mistake.

This essay contains in abundance what has largely been missing from the Google books debate: intellectual courage. Vaidhyanathan, an intellectual property scholar and “avowed open-source, open-access advocate,” easily could have gone the predictable route of scolding the copyright conservatives and spreading the Google gospel. But he manages to see the big picture beyond the intellectual property concerns. This is not just about economics, it’s about knowledge and the public interest.
What irks me about the usual debate is that it forces you into a position of either resisting Google or being its apologist. But this fails to get at the real bind we all are in: the fact that Google provides invaluable services and yet is amassing too much power; that a private company is creating a monopoly on public information services. Sooner or later, there is bound to be a conflict of interest. That is where we, the Google-addicted public, are caught. It’s more complicated than hip versus square, or good versus evil.
Here’s another good piece on Google. On Monday, The New York Times ran an editorial by Adam Cohen that nicely lays out the privacy concerns:

Google says it needs the data it keeps to improve its technology, but it is doubtful it needs so much personally identifiable information. Of course, this sort of data is enormously valuable for marketing. The whole idea of “Don’t be evil,” though, is resisting lucrative business opportunities when they are wrong. Google should develop an overarching privacy theory that is as bold as its mission to make the world’s information accessible – one that can become a model for the online world. Google is not necessarily worse than other Internet companies when it comes to privacy. But it should be doing better.

original google.jpg Two graduate students in Stanford in the mid-90s recognized that search engines would the most important tools for dealing with the incredible flood of information that was then beginning to swell, so they started indexing web pages and working on algorithms. But as the company has grown, Google’s admirable-sounding mission statement — “to organize the world’s information and make it universally accessible and useful” — has become its manifest destiny, and “information” can now encompass the most private of territories.
At one point it simply meant search results — the answers to our questions. But now it’s the questions as well. Google is keeping a meticulous record of our clickstreams, piecing together an enormous database of queries, refining its search algorithms and, some say, even building a massive artificial brain (more on that later). What else might they do with all this personal information? To date, all of Google’s services are free, but there may be a hidden cost.
“Don’t be evil” may be the company motto, but with its IPO earlier this year, Google adopted a new ideology: they are now a public corporation. If web advertising (their sole source of revenue) levels off, then investors currently high on $400+ shares will start clamoring for Google to maintain profits. “Don’t be evil to us!” they will cry. And what will Google do then?
images: New York Public Library reading room by Kalloosh via Flickr; archive of the original Google page

google print is no more

Not the program, of course, just the name. From now on it is to be known as Google Book Search. “Print” obviously struck a little too close to home with publishers and authors. On the company blog, they explain the shift in emphasis:

No, we don’t think that this new name will change what some folks think about this program. But we do believe it will help a lot of people understand better what we’re doing. We want to make all the world’s books discoverable and searchable online, and we hope this new name will help keep everyone focused on that important goal.

having browsed google print a bit more…

…I realize I was over-hasty in dismissing the recent additions made since book scanning resumed earlier this month. True, many of the fine wines in the cellar are there only for the tasting, but the vintage stuff can be drunk freely, and there are already some wonderful 19th century titles, at this point mostly from Harvard. The surest way to find them is to search by date, or by title and date. Specify a date range in advanced search or simply enter, for example, “date: 1890” and a wealth of fully accessible texts comes up, any of which can be linked to from a syllabus. An astonishing resource for teachers and students.
The conclusion: Google Print really is shaping up to be a library, that is, of the world pre-1923 — the current line of demarcation between copyright and the public domain. It’s a stark reminder of how over-extended copyright is. Here’s an 1899 english printing of The Mahabharata:
A charming detail found on the following page is this old Harvard library stamp that got scanned along with the rest:
mahabharata harvard stamp.jpg

pages á la carte

The New York Times reports on programs being developed by both Amazon and Google that would allow readers to purchase online access to specific sections of books — say, a single recipe from a cookbook, an individual chapter from a how-to manual, or a particular short story or poem from an anthology. Such a system would effectively “unbind” books into modular units that consumers patch into their online reading, just as iTunes blew apart the integrity of the album and made digital music all about playlists. We become scrapbook artists.
It seems Random House is in on this too, developing a micropayment model and consulting closely with the two internet giants. Pages would sell for anywhere between five and 25 cents each.