Category Archives: google_print

google print’s not-so-public domain

Google’s first batch of public domain book scans is now online, representing a smattering of classics and curiosities from the collections of libraries participating in Google Print. Essentially snapshots of books, they’re not particularly comfortable to read, but they are keyword-searchable and, since no copyright applies, fully accessible.
The problem is, there really isn’t all that much there. Google’s gotten a lot of bad press for its supposedly cavalier attitude toward copyright, but spend a few minutes browsing Google Print and you’ll see just how publisher-centric the whole affair is. The idea of a text being in the public domain really doesn’t amount to much if you’re only talking about antique manuscripts, and these are the only books that they’ve made fully accessible. Daisy Miller’s copyright expired long ago but, with the exception of Harvard’s illustrated 1892 copy, all the available scanned editions are owned by modern publishers and are therefore only snippeted. This is not an online library, it’s a marketing program. Google Print will undeniably have its uses, but we shouldn’t confuse it with a library.
(An interesting offering from the stacks of the New York Public Library is this mid-19th century biographic registry of the wealthy burghers of New York: “Capitalists whose wealth is estimated at one hundred thousand dollars and upwards…”)

the creeping (digital) death of fair use

Meant to post about this last week but it got lost in the shuffle… In case anyone missed it, Tarleton Gillespie of Cornell has published a good piece in Inside Higher Ed about how sneaky settings in course management software are effectively eating away at fair use rights in the academy. Public debate tends to focus on the music and movie industries and the ever more fiendish anti-piracy restrictions they build into their products (the latest being the horrendous “analog hole”). But a similar thing is going on in education and it is decidedly under-discussed.
Gillespie draws our attention to the “Copyright Permissions Building Block,” a new add-on for the Blackboard course management platform that automatically obtains copyright clearances for any materials a teacher puts into the system. It’s billed as a time-saver, a friendly chauffeur to guide you through the confounding back alleys of copyright.
But is it necessary? Gillespie, for one, is concerned that this streamlining mechanism encourages permission-seeking that isn’t really required, and argues that teachers should simply invoke fair use. To be sure, a good many instructors never bother with permissions anyway, but if they stop to think about it, they probably feel that they are doing something wrong. Blackboard, by sneakily making permissions-seeking the default, plays to this misplaced guilt, lulling teachers away from awareness of their essential rights. It’s a disturbing trend, since a right not sufficiently exercised is likely to wither away.
Fair use is what oxygenates the bloodstream of education, allowing ideas to be ideas, not commodities. Universities, and their primary fair use organs, libraries, shouldn’t be subjected to the same extortionist policies of the mainstream copyright regime, which, like some corrupt local construction authority, requires dozens of permits to set up a simple grocery store. Fair use was written explicitly into law in 1976 to guarantee protection. But the market tends to find a way, and code is its latest, and most insidious, weapon.
Amazingly, few academics are speaking out. John Holbo, writing on The Valve, wonders:

Why aren’t academics – in the humanities in particular – more exercised by recent developments in copyright law? Specifically, why aren’t they outraged by the prospect of indefinite copyright extension?…
…It seems to me odd, not because overextended copyright is the most pressing issue in 2005 but because it seems like a social/cultural/political/economic issue that recommends itself as well suited to be taken up by academics – starting with the fact that it is right here on their professional doorstep…

Most obviously on the doorstep is Google, currently mired in legal unpleasantness for its book-scanning ambitions and the controversial interpretation of fair use that undergirds them. Why aren’t the universities making a clearer statement about this? In defense? In concern? Soon, when search engines move in earnest into video and sound, the shit will really hit the fan. The academy should be preparing for this, staking out ground for the healthy development of multimedia scholarship and literature that necessitates quotation from other “texts” such as film, television and music, and for which these searchable archives will be an essential resource.
Fair use seems to be shrinking at just the moment it should be expanding, yet few are speaking out.

microsoft joins open content alliance

Microsoft’s forthcoming “MSN Book Search” is the latest entity to join the Open Content Alliance, the non-controversial rival to Google Print. ZDNet says: “Microsoft has committed to paying for the digitization of 150,000 books in the first year, which will be about $5 million, assuming costs of about 10 cents a page and 300 pages, on average, per book…”
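A quick back-of-the-envelope check of the ZDNet figures, as a minimal Python sketch. The variable names are mine, not from the article, and the per-page cost and page count are simply the assumptions stated in the quote.

```python
# Back-of-the-envelope check of the figures quoted by ZDNet.
# Work in cents so the arithmetic stays in exact integers.
books = 150_000          # books Microsoft committed to in the first year
pages_per_book = 300     # assumed average length, per the quote
cents_per_page = 10      # assumed scanning cost, per the quote

total_cents = books * pages_per_book * cents_per_page
total_dollars = total_cents // 100

print(f"${total_dollars:,}")  # prints "$4,500,000"
```

So the quoted “about $5 million” looks like the $4.5 million product rounded up.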
Apparently having learned from Google’s mistakes, OCA operates under a strict “opt-in” policy for publishers vis-a-vis copyrighted works (whereas with Google, publishers have until November 1 to opt out). Judging by the growing roster of participants, including Yahoo, the National Archives of Britain, the University of California, Columbia University, and Rice University, not to mention the Internet Archive, it would seem that less hubris equals more results, or at least lower legal fees. Supposedly there is some communication between Google and OCA about potential cooperation.
Also a story in the NY Times.

to some writers, google print sounds like a sweet deal

Wired has a piece today about authors who are in favor of Google’s plans to digitize millions of books and make them searchable online. Most seem to agree that obscurity is a writer’s greatest enemy, and that the exposure afforded by Google’s program far outweighs any intellectual property concerns. Sometimes to get more you have to give a little.
The article also mentions the institute.

debating google print

The Washington Post has run a pair of op-eds, one from each side of the Google Print dispute. Neither says anything particularly new. Moreover, they reinforce the perception that there can be only two positions on the subject — an endemic problem in newspaper opinion pages with their addiction to binaries, where two cardboard boxers are allotted their space to throw a persuasive punch. So you’re either for Google or against it? That’s awfully close to you’re either for technology — for progress — or against it. Unfortunately, like technology’s impact, the Google book-scanning project is a little trickier to figure out, and a more nuanced conversation is probably in order.
The first piece, “Riches We Must Share…”, is submitted in support of Google by University of Michigan President Mary Sue Coleman (a partner in the Google library project). She argues that opening up the elitist vaults of the world’s great (English) research libraries will constitute a democratic revolution. “We believe the result can be a widening of human conversation comparable to the emergence of mass literacy itself.” She goes on to deliver some boilerplate about the “Net Generation” — too impatient to look for books unless they’re online etc. etc. (great to see a major university president being led by the students instead of leading herself).
Coleman then devotes a couple of paragraphs to the copyright question, failing to tackle any of its controversial elements:

Universities are no strangers to the responsible management of complex copyright, permission and security issues; we deal with them every day in our classrooms, libraries, laboratories and performance halls. We will continue to work within the current criteria for fair use as we move ahead with digitization.

The problem is, Google is stretching the current criteria of fair use, possibly to the breaking point. Coleman does not acknowledge or address this. She does, however, remind the plaintiffs that copyright is not only about the owners:

The protections of copyright are designed to balance the rights of the creator with the rights of the public. At its core is the most important principle of all: to facilitate the sharing of knowledge, not to stifle such exchange.

All in all a rather bland statement in support of open access. It fails to weigh in on the fair use question — something about which the academy should have a few things to say — and does not indicate any larger concern about what Google might do with its books database down the road.
The opposing view, “…But Not at Writers’ Expense”, comes from Nick Taylor, writer, and president of the Authors Guild (which sued Google last month). Taylor asserts that mega-rich Google is trampling on the dignity of working writers. But a couple of paragraphs in, he gets a little mixed up about contemporary publishing:

Except for a few big-name authors, publishers roll the dice and hope that a book’s sales will return their investment. Because of this, readers have a wealth of wonderful books to choose from.

A dubious assessment, since publishing conglomerates are not exactly enthusiastic dice rollers. I would counter that risk-averse corporate publishing has steadily shrunk the number of available titles, counting on a handful of blockbusters to drive the market. Taylor goes on to defend not just the publishing status quo, but the legal one:

Now that the Authors Guild has objected, in the form of a lawsuit, to Google’s appropriation of our books, we’re getting heat for standing in the way of progress, again for thoughtlessly wanting to be paid. It’s been tradition in this country to believe in property rights. When did we decide that socialism was the way to run the Internet?

First of all, it’s funny to think of the huge corporations that dominate the web as socialist. Second, this talk about being paid for appropriating books for a search database is revealing of the two totally different worldviews that are at odds in this struggle. The authors say that any use of their book requires a payment. Google sees including the books in the database as a kind of payment in itself. No one with a web page expects Google to pay them for indexing their site. They are grateful that they do! Otherwise, they are totally invisible. This is the unspoken compact that underpins web search. Google assumed the same would apply with books. Taylor says not so fast.
Here’s Taylor on fair use:

Google contends that the portions of books it will make available to searchers amount to “fair use,” the provision under copyright that allows limited use of protected works without seeking permission. That makes a private company, which is profiting from the access it provides, the arbiter of a legal concept it has no right to interpret. And they’re scanning the entire books, with who knows what result in the future.

Actually, Google is not doing all the interpreting. There is a legal precedent for Google’s reading of fair use established in the 2003 9th Circuit Court decision Kelly v. Arriba Soft. In the case, Kelly, a photographer, sued Arriba Soft, an online image search system, for indexing several of his photographs in their database. Kelly believed that his intellectual property had been stolen, but the court ruled that Arriba’s indexing of thumbnail-sized copies of images (which always linked to their source sites) was fair use: “Arriba’s use of the images serves a different function than Kelly’s use – improving access to information on the internet versus artistic expression.” Still, Taylor’s “with who knows what result in the future” concern is valid.
So on the one hand we have many writers and most publishers trying to defend their architecture of revenue (or, as Taylor would have it, their dignity). But I can’t imagine how Google Print would really be damaging that architecture, at least not in the foreseeable future. Rather it leverages it by placing it within the frame of another architecture: web search. The irony for the authors is that the current architecture doesn’t seem to be serving them terribly well. With print-on-demand gaining in quality and legitimacy, online book search could totally re-define what is an acceptable risk to publishers, and maybe more non-blockbuster authors would get published.
On the other hand we have the universities and libraries participating in Google’s program, delivering the good news of accessibility. But they are not sufficiently questioning what Google might do with its database down the road, or the implications of a private technology company becoming the principal gatekeeper of the world’s corpus.
If only this debate could be framed in a subtler way, rather than the for-Google-or-against-it paradigm we have now. I’m cautiously optimistic about the effect of having books searchable on the web. And I tend to believe it will be beneficial to authors and publishers. But I have other, deep reservations about the direction in which Google is heading, and feel that a number of things could go wrong. We think the censorship of the marketplace is bad now in the age of publishing conglomerates. What if one company has total control of everything? And is keeping track of every book, every page, that you read. And is reading you while you read, throwing ads into your peripheral vision. I’m curious to hear from readers what they feel could be the hazards of Google Print.

yahoo! announces book-scanning project to rival google’s

Yahoo, in collaboration with The Internet Archive, Adobe, O’Reilly Media, Hewlett Packard Labs, the University of California, the University of Toronto, The National Archives of England, and others, will be participating in The Open Content Alliance, a book and media archiving project that will greatly enlarge the body of knowledge available online. At first glance, it appears the program will focus primarily on public domain works, and in the case of copyrighted books, will seek to leverage the Creative Commons.
Google Print, on the other hand, is more self-consciously a marketing program for publishers and authors (although large portions of the public domain will be represented as well). Google aims to make money off its indexing of books through keyword advertising and click-throughs to book vendors. Yahoo throwing its weight behind the “open content” movement seems on the surface to be more of a philanthropic move, but clearly expresses a concern over being outmaneuvered in the search wars. But having this stuff available online is clearly a win for the world at large.
The Alliance was conceived in large part by Brewster Kahle of the Internet Archive. He announced the project on Yahoo’s blog:

To kick this off, Internet Archive will host the material and sometimes helps with digitization, Yahoo will index the content and is also funding the digitization of an initial corpus of American literature collection that the University of California system is selecting, Adobe and HP are helping with the processing software, University of Toronto and O’Reilly are adding books, Prelinger Archives and the National Archives of the UK are adding movies, etc. We hope to add more institutions and fine tune the principles of working together.
Initial digitized material will be available by the end of the year.

More in:
NY Times
Chronicle of Higher Ed.

enter the cybrarian

The recent buzz surrounding Google’s library initiative has everyone talking about the future of research, which inevitably raises the question: how will the digitization of library collections change the role of the librarian? I would guess that, far from becoming obsolete, their role will in fact be elevated in importance, if not necessarily in status. They could very well come to be our indispensable guides through the labyrinth – if perhaps invisible, engineering behind the digital walls.
It’s also important to consider the question of visualization. When you run a search on Google you are given an enormous list. This is already deeply ingrained in the day-to-day business of finding information. But these lists are basically the electronic equivalent of scrolls, with the items algorithmically determined to be most relevant placed at the top. But sooner or later we have to admit that using scrolls for this kind of business is ludicrous. There has to be a better way of arraying these vast harvests of information in a way that allows the researcher to zoom across degrees of specificity and through associative chains of context and meaning. I see no reason why a search shouldn’t take place in some kind of virtual library, emulating the physical architecture of research settings, and allowing for some of the associative or accidental echoes that so often enrich a paper trail blazed through a brick-and-mortar library. Or cannot knowledge resemble a tree, or an arterial matrix? Must we be bound to the scroll?
Returning to the question of the librarian’s role, I recalled this passage from James J. O’Donnell’s 1996 paper The Pragmatics of the New: Trithemius, McLuhan, Cassiodorus:
“The librarians of the world have, moreover, already led the way, for academics at least, into the new information environment, not least because they are caught between rising demand from their customers (faculty and students) and rising supply and prices from their suppliers, and so have already been making reality-based decisions about ownership versus access, print versus electronics, and so on. In short, they are just now our leading pragmatists. Can we imagine a time in our universities when the librarians are the well-paid principals and the teachers their mere acolytes in a distribution chain? I do not think we can or should rule out that possibility for a moment”
Related articles:
“Questions and Praise for Google Web Library” – NY Times
“Google’s library plan ‘a huge help’” – USA Today
“Making books readable on computer proves trying task” – USA Today
Also, I found this on Searchblog. For a trip down memory lane, check out the original Google in the Stanford archives. Unfortunately, although it seems interactive, a search just brings up a bunch of stylesheets.

books behind bars – the Google library project

How useful will this service be for in-depth research when copyrighted books (which will account for a huge percentage of searchable texts) cannot be fully accessed? In such cases, a person will be able to view only a selection of pages (depending on agreements with publishers), and will find themselves bombarded with a variety of retail options. On a positive note, the search will be able to refer the user to any local libraries where the desired book is available, but still, the focus here remains squarely on digital texts as simply a means of getting to print texts.
Absent a major paradigm shift with regard to the accessibility and inherent virtue of electronic texts, this ambitious project will never achieve its full potential. For someone searching outside the public domain, the Google library project may amount to nothing more than a guided tour through a prison of incarcerated texts. I’ve found this to be true so far with Google Scholar – it turned up a lot of interesting stuff, but much of it was password protected or required purchase.
article in Filter: Google — 21st Century Dewey Decimal System (washingtonpost.com)