Category Archives: digitization

national archives/amazon agreement released

From Rick Prelinger:

In a rapid FOIA response, NARA has released the partnership agreement between them and Amazon’s CustomFlix (now CreateSpace) subsidiary. It’s downloadable here (I’m responsible for the poorly derived PDF). I’ll be reading and analyzing it soon.

cornell joins google book search

…offering up to 500,000 items for digitization. From the Cornell library site:

Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book’s title and the author’s name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project’s other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university’s own digital library.

audiovisual heritage double play

Two major preservation and access initiatives just reported by Peter Brantley over at O’Reilly Radar (1 and 2):
1. Reframe (set to launch in September ’07)

The Reframe project is a new initiative of Renew Media in partnership with Amazon and with major support from the John D. & Catherine T. MacArthur Foundation, which promises to offer exciting solutions for the dissemination of important media arts and the preservation and accessibility of our visual heritage.
The Reframe project will help connect audiences of independent media to a robust collection of media arts via an integrated, resourceful website. Reframe will aggregate content from individual filmmakers, broadcasters, distributors, public media resources, archives, libraries and other sources of independent and alternative media. Serving as both an aggregator of content and a powerful marketing tool, Reframe enables content-holders to digitize, disseminate and make available their content to a vast potential audience via a powerful online resource.
Renew Media will create a specialized Reframe website, which will interact with the Amazon storefront, to assist institutions (universities, libraries or museums) and consumers of niche content in browsing, finding, purchasing or renting Reframe content. Reframe website visitors will find it easy to locate relevant content through a rich menu of search and retrieval tools, including conventional search, recommender systems, social networking tools and curated lists. Reframe will allow individual viewers to rate and discuss the films they have seen and to sort titles according to their popularity among users with similar interests.

2. Library of Congress awards to preserve digitized and born-digital works

The Library of Congress, through its National Digital Information Infrastructure and Preservation Program (NDIIPP), today announced eight partnerships as part of its new Preserving Creative America initiative to address the long-term preservation of creative content in digital form. These partners will target preservation issues across a broad range of creative works, including digital photographs, cartoons, motion pictures, sound recordings and even video games. The work will be conducted by a combination of industry trade associations, private sector companies and nonprofits, as well as cultural heritage institutions.
Several of the projects will involve developing standardized approaches to content formats and metadata (the information that makes electronic content discoverable by search engines), which are expected to increase greatly the chances that the digital content of today will survive to become America’s cultural patrimony tomorrow. Although many of the creative content industries have begun to look seriously at what will be needed to sustain digital content over time, the $2.15 million being awarded to the Preserving Creative America projects will provide added impetus for collaborations within and across industries, as well as with libraries and archives.

Partners include the Academy of Motion Picture Arts and Sciences, the American Society of Media Photographers, ARTstor and others. Go here and scroll down part way to see the full list.
One project that caught both my eye and Peter’s is an effort by the University of Illinois at Urbana-Champaign to address a particularly vexing problem, namely how to preserve virtual environments and other complex interactive media:

Interactive media are highly complex and at high risk for loss as technologies rapidly become obsolete. The Preserving Virtual Worlds project will explore methods for preserving digital games and interactive fiction. Major activities will include developing basic standards for metadata and content representation and conducting a series of archiving case studies for early video games, electronic literature and Second Life, an interactive multiplayer game. Second Life content participants include Life to the Second Power, Democracy Island and the International Spaceflight Museum. Partners: University of Maryland, Stanford University, Rochester Institute of Technology and Linden Lab.

privatizing public goods (our tax dollars at work)

The National Archives is at it again. After announcing in January its exclusive agreement with Footnote.com to digitize and offer priced access to millions of public domain historical records, NARA (National Archives and Records Administration) has now inked a deal with Amazon to distribute significant parts of its vast archival films collection commercially on DVD and online.
As reported by the Cumberland Times News:

The arrangement allows Amazon – and a subsidiary, CustomFlix Labs Inc. – to copy National Archives films and video onto DVDs, and sell them to the public via the Internet.
The Archives will initially make available its collection of Universal Newsreels, dating from 1920 to 1967. Thousands of other public-domain and government films will be made available later.
Included in the initial offerings are events as diverse as the famous 1959 “Kitchen Debate” between then-Vice President Richard Nixon and Soviet Premier Nikita Khrushchev, and footage of a youthful Fidel Castro after the communist revolution in Cuba. Newsreels that will become available later include coverage of the death of President Franklin D. Roosevelt, the end of World War II, and the royal wedding of Princess Margaret.
National Archives officials said the arrangement will greatly expand the availability of the collection. Previously, such films could only be viewed and recorded at the Archives facility in College Park.

No doubt NARA should be doing everything in its power to digitize and increase access to its vaults, but locking materials down through commercial partnerships is no way to run a public trust. In a more commendable move, NARA put up a draft of another digitization/distribution agreement it has in the works, this one with the Genealogical Society of Utah (GSU), and they’ve even opened it up to public comment. They ought to do the same with the Amazon deal, and while they’re at it, offer less antiquated mechanisms for the public to make their voices heard. As it stands, comments on the GSU draft can be submitted in the following ways:
* regulations.gov
* fax
* postal mail
* hand delivery or courier
Hey, why not use CommentPress?

google library dominoes

Princeton is the latest university to partner up with the Google library project, signing an agreement to have 1 million public domain books scanned over the next six years. Over at ALA Techsource Tom Peters voices the growing unease among librarians worried about the long-term implications of commercial enclosure of the world’s leading research libraries.

give away the content and sell the thing

On my way to that rather long discussion of ARGs the other day, I fielded something Pat Kane said to me a while back about the growing importance of live gigs to the income of musicians.
So I was tickled when Paul Miller pointed me to a piece Chris Anderson blogged yesterday about the same thing. Increasingly, musicians are giving their music away for free in order to drive gig attendance – and it’s driving music reproduction companies crazy. And yet, what can they do? “The one thing that you can’t digitize and distribute with full fidelity is a live show”.
A minor synchronicity; but then I stop by here and find Gary Frost and bowerbird vigorously debating the likeliness of the digitisation of everything, and of the death of ‘the original’ as even a concept, in the context of Ben’s piece about the National Archives sellout. And then I remember that, the day before, someone sent me a spoof web page telling me to get a First Life. And I start to wonder if there’s some kind of post-digital backlash taking shape.
OK, Anderson is talking about music; it’s hard to speculate about how the manifest ‘authentic’ appeal of a time-bound, ephemeral ‘gig’ experience translates literally to the field of physical books without falling back into diaphanous stuff about tactility and marginalia and so on. But, in the light of people’s manifest willingness to pay ridiculous sums to see the ‘real’ Madonna in real time and space, is it really feasible to talk, as bowerbird does, about the coming digitisation of everything?
As far as I can see, as digitisation progresses, authenticity is becoming big business. I think it’s worth exploring the possibilities of a split between ‘book’ as pure content, and book as ‘authentic’ object. In particular, I think it’s worth exploring the possible economics of this: the difference in approach, genesis, theory, self-justification, style and paycheque of content created for digital reproduction, and text created for tangible books. And finally, I think whoever manages to suss both has probably got it made.

national archives sell out

This falls into the category of deeply worrying. In a move reminiscent of last year’s shady Smithsonian-Showtime deal, the U.S. National Archives has signed an agreement with Footnote.com to digitize millions of public domain historical records — stuff ranging from the papers of the Continental Congress to Matthew B. Brady’s Civil War photographs — and to make them available through a commercial website. They say the arrangement is non-exclusive but it’s hard to see how this is anything but a terrible deal.
Here’s a picture of the paywall:

nationalarchivespaywall.jpg

Dan Cohen has a good run-down of why this should set off alarm bells for historians (thanks, Bowerbird, for the tip). Peter Suber has the open access take: “The new Democratic Congress should look into this problem. It shouldn’t try to undo the Footnote deal, which is better than nothing for readers who can’t get to Washington. But it should try to swing a better deal, perhaps even funding the digitization and OA directly.” Absolutely. (Actually, they should undo it. Scrap it. Wipe it out.) Digitization should not become synonymous with privatization.
Elsewhere in mergers and acquisitions, the University of Texas at Austin is the newest partner in the Google library project.

microsoft launches live search books

Windows Live Search Books, Microsoft’s answer to Google Book Search, is officially up and running and looks and feels pretty much the same as its nemesis. Being a Microsoft product, the interface is clunkier, and they have a bit of catching up to do in terms of navigation and search options. The one substantive difference is that Live Search is mostly limited to out-of-copyright books, i.e. pre-1923 editions of public domain works. So the little they do have in there is fully accessible, with PDFs available for download. Like Google’s public domain books, however, the scans are of pretty poor quality, and not searchable. Update: readers point out that Microsoft, unlike Google, does in fact include a layer of low-quality but entirely searchable OCR text in its public domain downloads.
windows live books.jpg

brewster kahle on the google book search “nightmare”

kahlevidscreenshot.jpg
“Pretty much Google is trying to set themselves up as the only place to get to these materials; the only library; the only access. The idea of having only one company control the library of human knowledge is a nightmare.”
From a video interview with Elektrischer Reporter (click image to view).
(via Google Blogoscoped)

google makes slight improvements to book search interface

Google has added a few interface niceties to its Book Search book viewer. It now loads multiple pages at a time, giving readers the option of either scrolling down or paging through left to right. There’s also a full screen reading mode and a “more about this book” link taking you to a profile page with links to related titles plus references and citations from other books or from articles in Google Scholar. Also on the profile page is a searchable keyword cluster of high-incidence names or terms from the text.
bartlebygoogle2.jpg
Above is the in-copyright Signet Classic edition of Billy Budd and Other Tales by Melville, which contains only a limited preview of the text. You can also view the entire original 1856 edition of Piazza Tales as scanned from the Stanford Library. Public domain editions like this one can now be viewed with facing pages.
Still a conspicuous lack of any annotation or social reading tools.
bartlebygoogle.jpg