A Nov. 18 post on Adam Green’s Darwinian Web makes the claim that the web will “explode” (does he mean implode?) over the next year. According to Green, RSS feeds will render many websites obsolete:
The explosion I am talking about is the shifting of a website’s content from internal to external. Instead of a website being a “place” where data “is” and other sites “point” to, a website will be a source of data that is in many external databases, including Google. Why “go” to a website when all of its content has already been absorbed and remixed into the collective datastream.
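To make Green's mechanism concrete, here's a rough sketch (in Python, using the third-party feedparser library; the feed URL is made up for illustration) of how an aggregator absorbs a site's content straight from its feed, without anyone ever loading the site's own pages:

```python
# A minimal sketch of feed-based syndication: the reader pulls a site's
# content from its RSS/Atom feed and never visits the site itself.
# Assumes the third-party "feedparser" package; the URL is hypothetical.
import feedparser

FEED_URL = "http://example.com/blog/rss"  # hypothetical feed address

feed = feedparser.parse(FEED_URL)

for entry in feed.entries:
    # Title, link, and body text all travel inside the feed itself,
    # ready to be stored, indexed, or remixed by an external database.
    print(entry.title)
    print(entry.link)
    print(entry.get("summary", ""))
    print("-" * 40)
```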
Does anyone agree with Green? Will feeds bring about the restructuring of “the way content is distributed, valued and consumed?” More on this here.
world digital library
The Library of Congress has announced plans for the creation of a World Digital Library, “a shared global undertaking” that will make a major chunk of its collection freely available online, along with contributions from other national libraries around the world. From The Washington Post:
…[the] goal is to bring together materials from the United States and Europe with precious items from Islamic nations stretching from Indonesia through Central and West Africa, as well as important materials from collections in East and South Asia.
Google has stepped forward as the first corporate donor, pledging $3 million to help get operations underway. At this point, there doesn’t appear to be any direct connection to Google’s Book Search program, though Google has been working with LOC to test and refine its book-scanning technology.
google print is no more
Not the program, of course, just the name. From now on it is to be known as Google Book Search. “Print” obviously struck a little too close to home with publishers and authors. On the company blog, they explain the shift in emphasis:
No, we don’t think that this new name will change what some folks think about this program. But we do believe it will help a lot of people understand better what we’re doing. We want to make all the world’s books discoverable and searchable online, and we hope this new name will help keep everyone focused on that important goal.
all your base are belong to google
Google Base is live and ready for our stuff.
In AP: “New Project Will Expand Google’s Reach”
the book in the network – masses of metadata
In this weekend’s Boston Globe, David Weinberger delivers the metadata angle on Google Print:
…despite the present focus on who owns the digitized content of books, the more critical battle for readers will be over how we manage the information about that content – information that’s known technically as metadata.
…we’re going to need massive collections of metadata about each book. Some of this metadata will come from the publishers. But much of it will come from users who write reviews, add comments and annotations to the digital text, and draw connections between, for example, chapters in two different books.
As the digital revolution continues, and as we generate more and more ways of organizing and linking books – integrating information from publishers, libraries and, most radically, other readers – all this metadata will not only let us find books, it will provide the context within which we read them.
The book in the network is a barnacled spirit, carrying with it the sum of its various accretions. Each book is also its own library by virtue not only of what it links to itself, but of what its readers are linking to, of what its readers are reading. Each book is also a milk crate of earlier drafts. It carries its versions with it. A lot of weight for something physically weightless.
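As a rough illustration of what such layered metadata might look like in practice, here's a sketch (in Python, with invented field names — not any actual catalog schema) of a record that combines publisher-supplied data with the reader-generated reviews, annotations, and cross-links Weinberger describes:

```python
# A sketch of "masses of metadata" around a single book: publisher-supplied
# fields alongside reader-contributed reviews, annotations, and links.
# Field names are invented for illustration, not an actual catalog schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    reader: str
    location: str        # e.g. "ch. 3, para. 12"
    text: str

@dataclass
class CrossLink:
    source_location: str  # passage in this book
    target_book: str      # another book in the network
    target_location: str  # the passage it connects to

@dataclass
class BookRecord:
    # publisher-supplied metadata
    title: str
    author: str
    publisher: str
    year: int
    # reader-generated metadata, accreting over time
    reviews: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)
    cross_links: List[CrossLink] = field(default_factory=list)
```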
having browsed google print a bit more…
…I realize I was over-hasty in dismissing the recent additions made since book scanning resumed earlier this month. True, many of the fine wines in the cellar are there only for the tasting, but the vintage stuff can be drunk freely, and there are already some wonderful 19th century titles, at this point mostly from Harvard. The surest way to find them is to search by date, or by title and date. Specify a date range in advanced search or simply enter, for example, “date: 1890” and a wealth of fully accessible texts comes up, any of which can be linked to from a syllabus. An astonishing resource for teachers and students.
The conclusion: Google Print really is shaping up to be a library, that is, of the world pre-1923 — the current line of demarcation between copyright and the public domain. It’s a stark reminder of how over-extended copyright is. Here’s an 1899 English printing of The Mahabharata:
A charming detail found on the following page is this old Harvard library stamp that got scanned along with the rest:
pages à la carte
The New York Times reports on programs being developed by both Amazon and Google that would allow readers to purchase online access to specific sections of books — say, a single recipe from a cookbook, an individual chapter from a how-to manual, or a particular short story or poem from an anthology. Such a system would effectively “unbind” books into modular units that consumers patch into their online reading, just as iTunes blew apart the integrity of the album and made digital music all about playlists. We become scrapbook artists.
It seems Random House is in on this too, developing a micropayment model and consulting closely with the two internet giants. Pages would sell for anywhere between five and 25 cents each.
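As a back-of-the-envelope illustration of what that pricing could mean (the per-page range is the one reported; the section lengths below are made up), buying a few sections à la carte adds up quickly:

```python
# Rough arithmetic on the reported per-page pricing of 5 to 25 cents.
# The section page counts are invented examples.
PRICE_LOW = 0.05    # dollars per page
PRICE_HIGH = 0.25   # dollars per page

sections = {
    "single recipe": 2,       # pages (hypothetical)
    "how-to chapter": 18,
    "short story": 30,
}

for name, pages in sections.items():
    low, high = pages * PRICE_LOW, pages * PRICE_HIGH
    print(f"{name}: {pages} pages -> ${low:.2f} to ${high:.2f}")
```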
google print’s not-so-public domain
Google’s first batch of public domain book scans is now online, representing a smattering of classics and curiosities from the collections of libraries participating in Google Print. Essentially snapshots of books, they’re not particularly comfortable to read, but they are keyword-searchable and, since no copyright applies, fully accessible.
The problem is, there really isn’t all that much there. Google’s gotten a lot of bad press for its supposedly cavalier attitude toward copyright, but spend a few minutes browsing Google Print and you’ll see just how publisher-centric the whole affair is. The idea of a text being in the public domain really doesn’t amount to much if you’re only talking about antique manuscripts, and these are the only books that they’ve made fully accessible. Daisy Miller‘s copyright expired long ago but, with the exception of Harvard’s illustrated 1892 copy, all the available scanned editions are owned by modern publishers and are therefore only snippeted. This is not an online library, it’s a marketing program. Google Print will undeniably have its uses, but we shouldn’t confuse it with a library.
(An interesting offering from the stacks of the New York Public Library is this mid-19th century biographic registry of the wealthy burghers of New York: “Capitalists whose wealth is estimated at one hundred thousand dollars and upwards…”)
the creeping (digital) death of fair use
Meant to post about this last week but it got lost in the shuffle… In case anyone missed it, Tarleton Gillespie of Cornell has published a good piece in Inside Higher Ed about how sneaky settings in course management software are effectively eating away at fair use rights in the academy. Public debate tends to focus on the music and movie industries and the ever more fiendish anti-piracy restrictions they build into their products (the latest being the horrendous “analog hole”). But a similar thing is going on in education, and it is decidedly under-discussed.
Gillespie draws our attention to the “Copyright Permissions Building Block,” a new add-on for the Blackboard course management platform that automatically obtains copyright clearances for any materials a teacher puts into the system. It’s billed as a time-saver, a friendly chauffeur to guide you through the confounding back alleys of copyright.
But is it necessary? Gillespie, for one, is concerned that this streamlining mechanism encourages permission-seeking that isn’t really required, when teachers could often simply invoke fair use. To be sure, a good many instructors never bother with permissions anyway, but if they stop to think about it, they probably feel that they are doing something wrong. Blackboard, by sneakily making permission-seeking the default, plays to this misplaced guilt, lulling teachers away from awareness of their essential rights. It’s a disturbing trend, since a right not sufficiently exercised is likely to wither away.
Fair use is what oxygenates the bloodstream of education, allowing ideas to be ideas, not commodities. Universities, and their primary fair use organs, libraries, shouldn’t be subject to the extortionist policies of the mainstream copyright regime, which, like some corrupt local construction authority, requires dozens of permits to set up a simple grocery store. Fair use was written explicitly into law in 1976 to guarantee this protection. But the market tends to find a way, and code is its latest, and most insidious, weapon.
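To see how code can quietly tilt the field, consider a purely hypothetical sketch (not Blackboard's actual software or API) in which permission-seeking is the default and fair use has to be explicitly asserted:

```python
# A purely hypothetical sketch of how a course-management system's defaults
# can steer behavior: clearing permissions is automatic, while asserting
# fair use requires a deliberate extra step. Not Blackboard's actual API.

def add_course_material(item, assert_fair_use=False):
    """Attach a reading to a course site.

    By default the system requests (and bills for) copyright clearance,
    even when fair use would plainly apply. The instructor has to opt
    out of permission-seeking rather than opt in to it.
    """
    if assert_fair_use:
        return {"item": item, "status": "posted under fair use"}
    # the path of least resistance: pay for a license nobody required
    return {"item": item, "status": "permission requested from rights holder"}

# Most instructors take the default:
print(add_course_material("chapter scan"))
# Asserting fair use requires knowing the option exists:
print(add_course_material("chapter scan", assert_fair_use=True))
```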
Amazingly, few academics are speaking out. John Holbo, writing on The Valve, wonders:
Why aren’t academics – in the humanities in particular – more exercised by recent developments in copyright law? Specifically, why aren’t they outraged by the prospect of indefinite copyright extension?…
…It seems to me odd, not because overextended copyright is the most pressing issue in 2005 but because it seems like a social/cultural/political/economic issue that recommends itself as well suited to be taken up by academics – starting with the fact that it is right here on their professional doorstep…
Most obviously on the doorstep is Google, currently mired in legal unpleasantness for its book-scanning ambitions and the controversial interpretation of fair use that undergirds them. Why aren’t the universities making a clearer statement about this? In defense? In concern? Soon, when search engines move in earnest into video and sound, the shit will really hit the fan. The academy should be preparing for this, staking out ground for the healthy development of multimedia scholarship and literature that necessitates quotation from other “texts” such as film, television and music, and for which these searchable archives will be an essential resource.
Fair use seems to be shrinking at just the moment it should be expanding, yet few are speaking out.
microsoft joins open content alliance
Microsoft, with its forthcoming “MSN Book Search,” is the latest to join the Open Content Alliance, the non-controversial rival to Google Print. ZDNet says: “Microsoft has committed to paying for the digitization of 150,000 books in the first year, which will be about $5 million, assuming costs of about 10 cents a page and 300 pages, on average, per book…”
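ZDNet's figure checks out as a round number; here's the back-of-the-envelope arithmetic using the assumptions quoted above:

```python
# Back-of-the-envelope check of ZDNet's estimate: 150,000 books at
# roughly 300 pages each and about 10 cents a page.
books = 150_000
pages_per_book = 300
cost_per_page = 0.10  # dollars

total = books * pages_per_book * cost_per_page
print(f"${total:,.0f}")  # $4,500,000 -- i.e. roughly $5 million
```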
Apparently having learned from Google’s mistakes, OCA operates under a strict “opt-in” policy for publishers vis-a-vis copyrighted works (whereas with Google, publishers have until November 1 to opt out). Judging by the growing roster of participants, including Yahoo, the National Archives of Britain, the University of California, Columbia University, and Rice University, not to mention the Internet Archive, it would seem that less hubris equals more results, or at least lower legal fees. Supposedly there is some communication between Google and OCA about potential cooperation.
There’s also a story in the NY Times.