
google, digitization and archives: despatches from if:book

In discussing with other Institute folks how to go about reviewing four years’ worth of blog posts, I’ve felt torn at times. Should I cherry-pick ‘thinky’ posts that discuss a particular topic in depth, or draw out narratives from strings of posts, each of which is not in itself a literary gem but which cumulatively form the bedrock of the blog? But having thought about it, I realised that you can’t really have one without the other.
Fair use, digitization, public domain, archiving, the role of libraries and cultural heritage are intricately interconnected. But the name that connects all these issues over the last few years has been Google. The Institute has covered Google’s incursions into digitization of libraries (amongst other things) in a way that has explored many of these issues – and raised questions that are as urgent as ever. Is it okay to privatize vast swathes of our common cultural heritage? What are the privacy issues around technology that tracks online reading? Where now for copyright, fair use and scholarly research?
In-depth coverage of Google and digitization has helped to draw out many of the issues central to this blog. Thus, to follow the narrative of if:book’s Google coverage is, by extension, to watch a political and cultural stance emerge. So in this post I’ve tried to have my cake and eat it: to trace a story, and to give a sense of the depth of thought going into that story’s discussion.
In order to keep things manageable, I’ve kept this post to a largely Google-centric focus. Further reviews covering copyright-related posts, and general discussion of libraries and technology will follow.
2004-5: Google rampages through libraries, annoys Europe, gains rivals
In December 2004, if:book’s first post about Google’s digitization of libraries gave the numbers for the University of Michigan project.
In February 2005, the head of France’s national libraries raised a battle cry against the Anglo-centricity implicit in Google’s plans to digitize libraries. The company’s seemingly relentless advance brought Europe out in force to find ways of forming non-Google coalitions for digitization.
In August, Google halted book scans for a few months to appease publishers angry at encroachments on their copyright. But this was clearly not enough: in October 2005, Google was sued (again) by a string of publishers for massive copyright infringement. Undeterred by either European hostility or legal challenges, the company that same month made moves to expand Google Print into Europe. Also in October 2005, Yahoo! launched the Open Content Alliance, which Microsoft joined around the same time. Later that month, a Wired article put the case for authors in favor of Google’s searchable online archive.
In November 2005 Google announced that from here on in Google Print would be known as Google Book Search, as the ‘Print’ reference perhaps struck too close to home for publishers. The same month, Ben savaged Google Print’s ‘public domain’ efforts – then recanted (a little) later that month.
In December 2005 Google’s digitization was still hot news – the Institute did a radio show/podcast with Open Source on the topic, and covered the Google Book Search debate at the American Bar Association. (In fact, most of that month’s posts are dedicated to Google and digitization and are too numerous to do justice to here).
2006: Digitization spreads
By 2006, digitization and digital archives – with attendant debates – are spreading. From January through March, three posts (‘The book is reading you’, parts 1, 2 and 3) looked at privacy, networked books, fair use, downloading and copyright around Google Book Search. Also in March, a further post discussed Google’s and Amazon’s incursions into publishing.
In April, the Smithsonian cut a deal with Showtime making the media company a preferential media partner for documentaries using Smithsonian resources. Jesse analyzed the implications for open research.
In June, the Library of Congress and partners launched a project to make vintage newspapers available online. Google Book Search, meanwhile, was tweaked to reassure publishers that the new dedicated search page was not, in fact, a library. The same month, Ben responded thoughtfully to a French book attacking Google, and by extension America, for cultural imperialism. The debate continued with a follow-up post in July.
In August, Google announced downloadable PDF versions of many of its public-domain books, and the publication of Google’s contract with UCAL’s library prompted some debate. In October we reported on Microsoft’s growing book digitization list, and on some criticism of it from Brewster Kahle. The same month, we reported that the Dutch government was pouring millions into a vast public digitization program.
In December, Microsoft launched its (clunkier) version of Google Books, Microsoft Live Book Search.

2007: Google is the environment

In January, former Netscape player Rich Skrenta crowned Google king of the ‘third age of computing’: ‘Google is the environment’, he declared. Meanwhile, having seemingly forgotten 2005’s tussles, the company hosted a publishing conference at the New York Public Library. In February the company signed another digitization deal, this time with Princeton; Cornell followed in August, the same month the Economist compared Google’s databases to the banking system of the information age. The following month, Siva’s first Monday podcast discussed the Googlization of libraries.
By now, while Google remains a theme, commercial digitization of public-domain archives is a far broader issue. In January, the US National Archives cut a digitization deal with Footnote, effectively paywalling digital access to a slew of public-domain documents; in August, a deal followed with Amazon for commercial distribution of its film archive. The same month, two major audiovisual archiving projects launched.
In May, Ben speculated about whether some ‘People’s Card Catalog’ could be devised to rival Google’s gated archive. The Open Archive launched in July, to mixed reviews, the same month that the ongoing back-and-forth between the Institute and academic Siva Vaidhyanathan bore fruit: Siva’s networked writing project, The Googlization of Everything, was announced (it launched in September). Then, in August, we covered an excellent piece by Paul Duguid discussing the shortcomings of Google’s digitization efforts.
In October, several major American libraries refused digitization deals with Google. By November, Google and digitization had found their way into the New Yorker; the same month, the Library of Congress put out a call for e-literature links to be archived.

2008: All quiet?

In January we reported that LibraryThing interfaces with the British Library, and in March on the launch of an API for Google Books. Siva’s book found a print publisher the same month.
But if Google coverage has been slighter this year, that’s not to suggest a happy ending to the story. Microsoft abandoned its book scanning project in mid-May of this year, raising questions about the viability of the Open Content Alliance. It would seem as though Skrenta was right. The Googlization of Everything continues, less challenged than ever.

looking at libraries

A few weeks back, through the auspices of TED, I paid a visit to a private library. The owner doesn’t want publicity, and I won’t reveal details, but it was a staggeringly beautiful (if idiosyncratic) collection, and I can’t imagine that there are many collections in private hands in the United States that rival it in value. Just about every lavish book imaginable was present: an elephant folio of Audubon along with a full set of John Gould’s more sumptuous prints of birds; a Kelmscott Chaucer; a page from a Gutenberg Bible; a first edition of Johnson’s Dictionary; countless antique atlases of anatomy and cosmography; the Arion Press edition of Ulysses illustrated by Robert Motherwell; hand-illuminated Books of Hours. There were exquisite jeweled bindings, books woven entirely from silk, and doubtless many more things that couldn’t be seen in a three-hour tour. The collector mentioned in passing that he was thinking of buying a Wyclif Bible for around $600,000 because he didn’t have one yet.

Being no stranger to libraries, I’d seen many of these books before. Generally they’re the sort of books you see in the context of a museum or library, occasionally for sale in a gallery. They’re the sort of books that are generally found safely behind glass, books that one wears white gloves to touch. This was not such a collection: it’s not open to the public at all, only to the collector’s friends. A librarian would also be astonished that this collection of 30,000 books has no catalogue – the owner shelves all the books himself (by height, for which there’s historical precedent) and claims that he remembers where he put things. But what was most striking to me about my visit was how freely the books were handled by the owner, and how freely he allowed his guests to handle his books – not in a cavalier way, but in the way one touches a book one owns. The librarian in me suppressed a gasp when the owner explained how in the summer he opens the bay windows of the library and lets the breeze in. I’m sure that’s not how the Morgan Library works.

The collector can afford to let his visitors touch his books. In a way, the books in his collection are functioning as they are intended to function: as objects to be read and appreciated. They’re also functioning as signifiers of luxury. His collection is a repository of wealth in a less metaphorical sense than we usually mean when we call libraries repositories. No library, private or public, exists entirely outside of this economic system; it’s an integral part of the way we consider books.

Walking north on LaGuardia Place last week, I was struck by how monolithic NYU’s Bobst Library appears from the south: it’s a hulking red-brick edifice that admits no entrance:

the outside and inside of bobst library at nyu

From inside it’s all windows and light, open stacks to be browsed. But there’s the matter of getting inside: admission is reserved for those with an NYU ID card, and those without cards are excluded. This is a necessary condition for the library to function: long ago on this blog I bemoaned the condition of the Brooklyn Library, where it’s almost impossible to find any book you’re looking for, though there’s still the pleasure of browsing. The quality of a collection seems to be inversely related to the number of people let in. The principle of keeping the books in and the world out is demonstrated elegantly by the thin marble windows of Yale’s Beinecke Library, which admit a small amount of light but not the viewer’s gaze:

outside and inside the beinecke

What’s inside and outside – who’s inside and outside – are completely separated. The poet Susan Howe inspects this separation in her book The Midnight, a volume which takes as one of its primary subjects interleaves, the sheets of tissue paper that publishers once put next to plates in books “in order to prevent illustration and text from rubbing together.” Howe’s work tends to be archivally based: she looks at how manuscripts are read or misread, and consequently has spent a lot of time in libraries. In this prose passage from the book, part of a section entitled “Scare Quotes II”, she looks at the way one enters Houghton, Harvard’s analogue to the Beinecke:

1991. Entering Houghton Library: Harvard Yard, 9:00 a.m., a fine June summer morning. At the entrance to the red-brick building designed by Robert C. Dean of Perry, Shaw and Hepburn in 1940, two single wooden doors with hinges, concealing two modernist plate glass doors without frames, have been swung into recesses to the left and right so as to be barely visible during open hours. The only metal fitting in each glass consists of a polished horizontal bar at waist height a visitor must pull to open. I enter an oval vestibule, about 10 feet wide and 5–6 feet deep, before me double doors again; again plate glass.

Passing through this first vestibule I find myself in an oval reception antechamber about 35 feet wide and 20 feet deep under what appears to be a ceiling with a dome at its apex. I think I see sunlight but closer inspection reveals electric light concealed under a slightly dropped form, also oval, illuminating the ceiling above. This first false skylight resembles a human eye and the central oval disc its ‘pupil.’ Maybe ghosts exist as spatiotemporal coordinates, even if they themselves do not occupy space, even if you’ve never seen one, so what? If the design of the antechamber can be read in terms of power and regimes of library control, and if ghosts ‘presently’ ‘occupy’ papers, you need to understand the present tense of ‘occupy.’

To enter this neo-Georgian building (a few Modernist touches added) with its state of the art technology for air filtration, security and controlled temperature and humidity for the preservation of materials, is to turn away from contemporary city life with all its follies and parasites in search of a second coming for dry bones. When the soul of a scholar has an inward bent and bias for an author in the Kingdom of Houghton, it is never at rest, until here. Perversely, nothing in Houghton awakens security sooner than curiosity.

Here – every researcher can be a perpetrator.

(pp. 120–121)

While Houghton isn’t as architecturally ostentatious as the Beinecke, Howe’s scrutiny of the architecture of its entrance reveals it to be just as concerned with control. There’s a pessimistic view of human behavior embedded in library construction and in the watchfulness of the sentries who guard libraries: if we, the public, could get at the books, we would most certainly destroy them.

There was an expectation that the barriers would be torn down with the coming of electronic libraries, that once the book’s spirit left its object, it would likewise escape its economic shackles. Certainly it makes sense: an electronic text isn’t degraded by copying, whereas every reading is an infinitesimal destruction of a physical book. It’s unclear, however, that the media universe now unfolding is following this pattern: while sites like archive.org present a new model, projects like Google Books simply reconfigure the gates.

holiday round up

The Institute is pleased to announce the release of the blog Without Gods. Mitchell Stephens is using this blog as a public workshop and forum for his work on his latest book, which focuses on the history of atheism.
The Wikipedia debate continues as Chris Anderson of Wired Magazine weighs in, arguing that people are uncomfortable with Wikipedia because they cannot comprehend that emergent systems can produce “correct” answers on the macroscale even if no one is really watching the microscale. Lisa counters that Anderson goes too far in his defense of Wikipedia, and that blind faith in the system is equally disconcerting, if not more so.
The ITP Winter 2005 show featured several Institute-related projects, among them explorations in digital graphic reinterpretation of poetry, student social networks, navigating New York through augmented reality, and manipulating video to document the city by freezing time and space.
Lisa discovered an interesting historical parallel in an 1899 article from The Dial, which laments the demise of the specialized bookseller after department stores began selling books as loss leaders, not unlike today’s criticisms of Amazon. As libraries increase their relationships with the private sector, Lisa notes, some bookstores are taking on a role that libraries traditionally held: fostering intellectual and cultural communities.
Lisa looked at Reed Johnson’s assertion in the Los Angeles Times that 2005 was the year mass media gave way to the consumer-driven techno-cultural revolution.
The discourse on game space continues to evolve with Edward Castronova’s new book Synthetic Worlds. As millions of people spend more and more time in immersive environments, Castronova uses an economist’s lens to look at how the real and the virtual are blending.
In another advance for content-creation volunteerism, LibriVox is creating and distributing audio files of public domain literature, ranging from The Wizard of Oz to the US Constitution. Lisa was impressed by the quality of the recordings, which are voiced and recorded by volunteers who feel passionate about a particular work.

the future of the book(store), circa 1899 and 2005

Leafing through an 1899 issue of the literary magazine The Dial, I came across an article called “The Distribution of Books” which resonated with the present moment at several uncanny junctures, and got me thinking about the evolving relationship between publishers, libraries, bookstores, and Google Book Search — thoughts which themselves evolved after a conversation with a writer from Pages magazine about the future of bookstores.
“The Distribution of Books” focused mainly on changes in the way books were marketed and distributed, warning that bookstores might go out of business if they failed to change their own business practices in response. “Once more the plaint of the bookseller is heard in the land,” lamented the author, “and one would be indeed stony-hearted who could view his condition without concern.”

According to “The Distribution of Books,” what should have been the privileged domain of the bookseller was being eroded at the century’s end by the book sales of “the great dealers in miscellaneous merchandise.” The article was referring to the department stores that sold books at a loss in order to lure in customers: a bit less than a century later, critics would make the same claims about Amazon, that great dealer in miscellaneous merchandise now celebrating its tenth anniversary. “The Distribution of Books” also complains of publishers who attempted to market to readers directly. This past year, similar complaints were made after Random House joined Scholastic and Simon and Schuster in establishing a direct-sale online presence.
Of course, 2005 is not 1899, and this is what makes the Dial piece so startling in its familiarity: in 1899, after all, the distinction between publisher and bookseller was much fresher than it is now. The hybrid merchant-tradesman who printed, marketed and distributed books had been the norm for a much longer interval than the shop owner who ordered books from a variety of different publishing houses. In this sense, the publisher’s “new” practice of selling books directly was in fact a modification of bookselling practices that predated the specialized bookshop. Ultimately, the Dial piece is less about the demise of the bookseller than about the imagined demise of a relatively recent phenomenon: the specialized bookseller with an investment in promoting the culture of books generally rather than the work of a specific author or publisher.
This tension between specialization and generalization also revealed itself in the article’s most indignant passage, in which the author expressed outrage over the idea that libraries might themselves get involved in bookselling. According to the Dial, bookstore owners had been subjected to:
an onslaught so unexpected and so startling it left [them] gasping for breath — [a suggestion] made a few months ago by librarian Dewey, who calmly proposed that the public libraries throughout the country should be book-selling as well as book-circulating agencies… Booksellers have always looked askance at public libraries, not understanding how they create an appetite for reading that is sure in the end to redound to the bookseller’s advantage, but their suspicious fears never anticipated the explosion in their camp of such a bombshell as this.
After delivering the “bombshell,” the author goes on to reassure the reader that Dewey’s suggestion (yes, that would be Melvil Dewey, inventor of the Dewey Decimal System) could never be taken seriously in America: such a venture on the part of the nation’s libraries would represent a socialistic entangling of the spheres of government and industry. Books sold by libraries would be sold without an eye to profit, conjectured the author, and publishing (and perhaps the notion of the private sector itself) would collapse. “If the state or the municipality were to go into the business of selling books at cost, what should prevent it from doing the like with groceries?”
While the Dial piece made me think about the ways in which the perceived “new” threats to today’s bookstores might not be so new, it also made me consider how Dewey’s proposal might emerge in modified form in the digital era. Present-day libraries may not be proposing the sale of books, but they certainly are planning to get into the business of marketing and distribution, as the World Digital Library attests. They are also proposing, as Librarian of Congress James Billington has said, a shift toward significant partnerships with for-profit businesses which have (for various reasons) serious economic stakes in sifting through digital materials. And, as Ben noted a few weeks ago, libraries themselves have been using various strategies from online retailers to catalog and present information.
Just as libraries are starting to embrace the private sector, many bookstores are heading in the other direction: driven to the verge of extinction by poor profits, they are reinventing themselves as nonprofits that serve a valuable social and cultural function. Sure, books are still for sale, but the real “value” of a bookstore now lies not in its merchandise but in the intellectual or cultural community it fosters: in that respect, some bookstores are akin to the subscription libraries of the past.
Is it so impossible to imagine a future in which one walks into a digital distribution center, orders a latte, and uses an Amazon-type search engine to pull up the ebook that can be read at one’s reading station after the requisite number of ads have flashed on the screen? Is this a library? Is this a bookstore? Does it matter? Should it?

more on wikipedia

As summarized by a Dec. 5 article in CNET, last week was a tough one for Wikipedia: on Wednesday, a USA Today editorial by John Seigenthaler called Wikipedia “irresponsible” for not catching significant mistakes in his biography, and on Thursday, the Wikipedia community got up in arms after discovering that former MTV VJ and longtime podcaster Adam Curry had edited out references to other podcasters in an article about the medium.
In response to the hullabaloo, Wikipedia founder Jimmy Wales has barred anonymous users from creating new articles. The change, which went into effect today, could help prevent a repeat of the Seigenthaler debacle: now that Wikipedia will have a record of who posted what, people may be less likely to post potentially libelous material. According to Wales, almost all users who post to Wikipedia are already registered, so this won’t represent a major change to Wikipedia in practice. Whether this is the beginning of a series of changes that push Wikipedia away from its “hive mind” origins remains to be seen.
I’ve been surprised at the amount of Wikipedia-bashing that’s occurred over the past few days. At a historical moment when there’s so much distortion of “official” information, there’s something peculiar about this sudden outrage over the unreliability of an open-source information system. Mostly, the conversation seems to have shifted how people think about Wikipedia. Once an information resource developed by and for “us,” it’s now an unreliable threat to the idea of truth, imposed on us by an unholy alliance between “volunteer vandals” (Seigenthaler’s phrase) and the outlaw Jimmy Wales. This shift is exemplified by the post that opened a discussion of Wikipedia on the Association of Internet Researchers listserv over the past several days. The scholar who posted it suggested that researchers boycott Wikipedia, and prohibit their students from using the site, until Wikipedia develops “an appropriate way to monitor contributions.” In response, another poster noted that rather than boycotting Wikipedia, it might be better to monitor the site, or better still, to write for it.
Another comment from that discussion is worth considering: in a post to the same AOIR listserv, Paul Jones notes that in the 1960s World Book Encyclopedia, RCA employees wrote the entry on television, scarcely mentioning television pioneer Philo Farnsworth, longtime nemesis of RCA. “Wikipedia’s failing are part of a public debate,” Jones writes, “Such was not the case with World Book to my knowledge.” In this regard, the flak over Wikipedia might be considered a good thing: at least it gives those concerned with the construction of facts the opportunity to engage with the issue. I’m just not sure that making Wikipedia the enemy contributes much to the debate.