Category Archives: archives

google launches archival news search

Today Google unveiled a major extension of its news search service, expanding into periodical archives that stretch back to the mid-18th century. Most of the articles are pay downloads, or pay-per-view, and are offered by Google through licensing agreements with newspapers and existing document retrieval services including The New York Times Co., The Washington Post Co., The Wall Street Journal, Reed Elsevier, LexisNexis and Factiva. Google won’t actually host content or handle payments; it simply presents items with titles, brief excerpts and ordering information. Google also crawls free archives already on the web and mixes these in, and (a nice touch) links all search results to “related web pages,” plugging keywords into a general web search. Google won’t run ads in this service, at least for now. More coverage here and here.
This is a fine service, but it only underscores the need for a non-commercial alternative. Much of the material here is public domain, but is provided through commercial services. Google simply adds a new web-integrated layer. Anyone who believes that the public domain ought to be fully accessible to all should be thinking bigger than Google.

showtiming our libraries

Google’s contract with the University of California to digitize library holdings was made public today after pressure from The Chronicle of Higher Education and others. The Chronicle discusses some of the key points in the agreement, including the astonishing fact that Google plans to scan as many as 3,000 titles per day, and its commitment, at UC’s insistence, to always make public domain texts freely and wholly available through its web services.
But there are darker revelations as well, and Jeff Ubois, a TV-film archivist and research associate at Berkeley’s School of Information Management and Systems, homes in on some of these on his blog. Around the time that the Google-UC deal was first announced, Ubois compared it to Showtime’s now-infamous compact with the Smithsonian, which caused a ripple of outrage this past April. That deal, the details of which are secret, basically gives Showtime exclusive access to the Smithsonian’s film and video archive for the next 30 years.
The parallels to the Google library project are many. Four of the six partner libraries, like the Smithsonian, are publicly funded institutions. And all the agreements, with the exception of U. Michigan’s and now UC’s, are covered by non-disclosure terms. Brewster Kahle, leader of the rival Open Content Alliance, put the problem clearly and succinctly in a quote in today’s Chronicle piece:

We want a public library system in the digital age, but what we are getting is a private library system controlled by a single corporation.

He was referring specifically to sections of this latest contract that greatly limit UC’s use of Google copies and would bar the university from pooling those copies in cooperative library systems. I voiced these concerns rather forcefully in my post yesterday, and may have gotten a couple of details wrong, or slightly overstated the point about librarians ceding their authority to Google’s algorithms (some of the pushback in comments and on other blogs has been very helpful). But the basic points still stand, and today’s revelations from the UC contract only underscore them. This ought to galvanize librarians, educators and the general public to ask tougher questions about what Google and its partners are doing. Of course, all these points could be rendered moot by one or two bad decisions from the courts.

presidents’ day

Few would disagree that Presidents’ Day, though in theory a celebration of the nation’s highest office, is actually one of our blandest holidays — not so much about history as the resuscitation of commerce from the post-holiday slump. Yesterday, however, brought a refreshing change.

Daguerreotype of Dolley Madison

Spending the afternoon at the institute was Holly Shulman, a historian from the University of Virginia well known among digital scholarship circles as the force behind the Dolley Madison Project — a comprehensive online portal to the life, letters and times of one of the great figures of the early American republic. So, for once we actually talked about presidential history on Presidents’ Day — only, in this case from the fascinating and chronically under-studied spousal perspective.
Shulman came to discuss possible collaboration on a web-based history project that would piece together the world of America’s founding period — specifically, as experienced and influenced by its leading women. The question, in terms of form, was how to break out of the mould of traditional web archives, which tend to be static and exceedingly hierarchical, and tap more fully into the energies of the network? We’re talking about something you might call open source scholarship — new collaborative methods that take cues from popular social software experiments like Wikipedia, Flickr and del.icio.us yet add new layers and structures that would better ensure high standards of scholarship. In other words: the best of both worlds.
Shulman lamented that the current generation of historians is highly resistant to the idea of electronic publication as anything more than supplemental to print. Even harder to swallow is the open ethos of Wikipedia, commonly regarded as a threat to the hierarchical authority and medieval insularity of academia.
Again, we’re reminded of how fatally behind the times the academy is in terms of communication — both communication among scholars and with the larger world. Shulman’s eyes lit up as we described the recent surge on the web of social software and bottom-up organizational systems like tagging that could potentially create new and unexpected avenues into history.
A small example that recurred in our discussion: Dolley Madison wrote eloquently on grief, mourning and widowhood, yet few would know to seek out her perspective on these matters. Think of how something like tagging, still in an infant stage of development, could begin to solve such a problem, helping scholars, students and general readers unlock the multiple facets of complex historical figures like Madison, and deepening our collective knowledge of subjects — like death and war — that have historically been dominated by men’s accounts. It’s a small example, but points toward something grand.
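To make the tagging idea concrete, here is a minimal sketch in Python of a tag index over a collection of letters. The letter identifiers and the tags attached to them are hypothetical placeholders, not real catalog records from the Dolley Madison Project, and `build_tag_index` is a name invented for this illustration.

```python
from collections import defaultdict


def build_tag_index(documents):
    """Map each normalized tag to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc_id, tags in documents.items():
        for tag in tags:
            # Normalize case and whitespace so "Grief" and "grief" meet.
            index[tag.strip().lower()].add(doc_id)
    return index


# Hypothetical identifiers and reader-supplied tags, for illustration only.
letters = {
    "letter-001": ["grief", "widowhood"],
    "letter-002": ["politics", "war"],
    "letter-003": ["Grief", "mourning"],
}

index = build_tag_index(letters)
grief_letters = sorted(index["grief"])
```

A reader who starts from the subject ("grief") rather than from the author would land on `letter-001` and `letter-003` without needing to know in advance that Madison wrote on the topic, which is the kind of sideways entry into an archive that a strict hierarchy tends to hide.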

questions about blog search and time

Does anyone know of a good way to search for old blog entries on the web? I’ve just been looking at some of the available blog search resources and few of them appear to provide any serious advanced search options. The couple of major ones I’ve found that do (after an admittedly cursory look) are Google and Ice Rocket. Both, however, appear to be broken, at least when it comes to dates. I’ve tried them on three different browsers, on Mac and PC, and in each case the date menus seem to be frozen. It’s very weird. They give you the option of entering a specific time range but won’t accept the actual dates. Maybe I’m just having a bad tech day, but it’s as if there’s some conceptual glitch across the web vis-à-vis blogs and time.
Most blog search engines are geared toward searching the current blogosphere, but there should be a way to research older content. My first thought was that blog search engines crawl RSS feeds, most of which do not transmit the entirety of a blog’s content, just the more recent. That would pose a problem for archival search.
Does anyone know what would be the best way to go about finding, say, old blog entries containing the keywords “new orleans superdome” from late August to late September 2005? Is it best to just stick with general web search and painstakingly comb through for blogs? If we agree that blogs have become an important kind of cultural document, then surely there should be a way to find them more than a month after they’ve been written.
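For what it’s worth, the query itself is not hard to express once the entries have actually been collected and dated. The sketch below, in Python, assumes a hypothetical local archive of entries (the `Entry` class, the example.com URLs and the sample text are all invented for illustration) and filters it by keywords and a date range. It says nothing about how any real blog search engine works; the hard part remains getting hold of the older entries in the first place.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    url: str
    published: date
    text: str


def search_entries(entries, keywords, start, end):
    """Return entries dated within [start, end] that contain every keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for entry in entries:
        # Skip anything outside the requested date range (inclusive).
        if not (start <= entry.published <= end):
            continue
        body = entry.text.lower()
        if all(k in body for k in wanted):
            hits.append(entry)
    # Oldest first, so the results read as a timeline.
    return sorted(hits, key=lambda e: e.published)


# Hypothetical sample entries, for illustration only.
archive = [
    Entry("http://example.com/a", date(2005, 9, 1),
          "Reports from the New Orleans Superdome"),
    Entry("http://example.com/b", date(2006, 3, 4),
          "New Orleans Superdome reopening plans"),
    Entry("http://example.com/c", date(2005, 9, 10),
          "Notes on tagging and archives"),
]

results = search_entries(
    archive,
    ["new", "orleans", "superdome"],
    start=date(2005, 8, 25),
    end=date(2005, 9, 30),
)
```

Here only the first entry matches: the second has the right keywords but the wrong date, and the third has the right date but none of the keywords.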