dutch fund audiovisual heritage to the tune of 173 million euros

Larry Lessig writes in Free Culture:

Why is it that the part of our culture that is recorded in newspapers remains perpetually accessible, while the part that is recorded on videotape is not? How is it that we’ve created a world where researchers trying to understand the effect of media on nineteenth-century America will have an easier time than researchers trying to understand the effect of media on twentieth-century America?

Twentieth-century Holland, it turns out, will be easier to decipher:

The Netherlands Government announced in its annual budget proposal the support for the project “Images for the Future” (in Dutch). Images for the Future is a large-scale conservation and digitalisation operation comprising 285,000 hours of film, television and radio recordings, and 2.9 million photos. The investment of 173 million euro, is spread over a period of seven years.
…It is unprecedented in its scale and ambition. All these films, programmes and photos will be made available for educational and creative purposes. An infrastructure for digital distribution will also be developed. A basic collection will be made available without copyright or under a Creative Commons licence. Making this heritage digitally available will lead to innovative applications in the area of new media and the development of valuable services for the public. The income/expense analysis included in the project plan shows that on balance the project will produce a positive social effect in the Dutch economy to the value of 20 to 60 million euros.
— from Association of Moving Image Archivists list-server

Pretty inspiring stuff.
Eddie Izzard once described the Netherlandish brand of enlightenment in a nutshell: “The Dutch speak four languages and smoke marijuana!” We now see that they also deem it wise policy to support a comprehensive cultural infrastructure for the 21st century, enabling their citizens to read, quote and reuse the media that shapes their world (while they whiz around on bicycles over tidy networks of canals). Not so here in the States, where the government works for the monopolies, keeping big media on the dole through Sonny Bono-style protectionism. We should pass our benighted politicos a little of what the Dutch are smoking.

microsoft steps up book digitization

Back in June, Microsoft struck deals with the University of California and the University of Toronto to scan titles from their nearly 50 million (combined) books into its Windows Live Book Search service. Today, the Guardian reports that they’ve forged a new alliance with Cornell and are going to step up their scanning efforts ahead of a launch of the search portal sometime around the beginning of next year. Microsoft will focus on public domain works, but is also courting publishers to submit in-copyright books.
Making these books searchable online is a great thing, but I’m worried by the implications of big corporations building proprietary databases of public domain works. At the very least, we’ll need some sort of federated book search engine that can leap the walls of these competing services, matching text queries to texts in Google, Microsoft and the Open Content Alliance (which to my understanding is mostly Microsoft anyway).
But more important, we should get to work with OCR scanners and start extracting the texts to build our own databases. Even when they make the files available, as Google is starting to do, they’re giving them to us not as fully functioning digital texts (searchable, remixable), but as strings of snapshots of the scanned pages. That’s because they’re trying to keep control of the cultural DNA scanned from these books — that’s the value added to their search service.
But the public domain ought to be a public trust, a cultural infrastructure that is free to all. In the absence of some competing not-for-profit effort, we should at least start thinking about how we as stakeholders can demand better access to these public domain works. Microsoft and Google are free to scan them, and it’s good that someone has finally kickstarted a serious digitization campaign. It’s our job to hold them accountable, and to make sure that the public domain doesn’t get redefined as the semi-public domain.

google launches archival news search

Today Google unveiled a major extension of its news search service, expanding into periodical archives that stretch back to the mid-18th century. Most of the articles are pay downloads, or pay-per-view, and are offered by Google through licensing agreements with newspapers and existing document retrieval services including The New York Times Co., The Washington Post Co., The Wall Street Journal, Reed Elsevier, LexisNexis and Factiva. Google won’t actually host content or handle payments; it simply presents items with titles, brief excerpts and ordering information. Google also crawls free archives already on the web and mixes these in, and (a nice touch) links all search results to “related web pages,” plugging keywords into a general web search. Google won’t run ads in this service, at least for now. More coverage here and here.
This is a fine service, but it only underscores the need for a non-commercial alternative. Much of the material here is public domain, but is provided through commercial services. Google simply adds a new web-integrated layer. Anyone who believes that the public domain ought to be fully accessible to all should be thinking bigger than Google.

google offers public domain downloads

Google announced today that it has made free downloadable PDFs available for many of the public domain books in its database. This is a good thing, but there are several problems with how they’ve done it. The main thing is that these PDFs aren’t actually text; they’re simply strings of images from the scanned library books. As a result, you can’t select and copy text, nor can you search the document, unless, of course, you do it online in Google. So while public access to these books is a big win, Google still has us locked into the system if we want to take advantage of these books as digital texts.
A small note about the public domain. Editions are key. A large number of books scanned so far by Google have contents in the public domain, but are in editions published after the cut-off (I think we’re talking 1923 for most books). Take this 2003 Signet Classic edition of Darwin’s The Origin of Species. Clearly, a public domain text, but the book is in “limited preview” mode on Google because the edition contains an introduction written in 1958. Copyright experts out there: is it just this that makes the book off limits? Or is the whole edition somehow copyrighted?
Other responses from Teleread and Planet PDF, which has some detailed suggestions on how Google could improve this service.

DRM and the damage done to libraries

[Image: New York Public Library]

A recent BBC article draws attention to widespread concerns among UK librarians (concerns I know are shared by librarians and educators on this side of the Atlantic) regarding the potentially disastrous impact of digital rights management on the long-term viability of electronic collections. At present, when downloads represent only a tiny fraction of most libraries’ circulation, DRM is more of a nuisance than a threat. At the New York Public Library, for instance, only one “copy” of each downloadable ebook or audio book title can be “checked out” at a time, a frustrating policy that all but cancels out the value of its modest digital collection. But the implications further down the road, when an increasing portion of library holdings will be non-physical, are far more grave.
What these restrictions in effect do is place locks on books, journals and other publications, locks for which there are generally no keys. What happens, for example, when a work passes into the public domain but its code restrictions remain intact? Or when materials must be converted to newer formats but can’t be extracted from their original files? The question we must ask is: how can librarians, now or in the future, be expected to effectively manage, preserve and update their collections in such straitjacketed conditions?
This is another example of how the prevailing copyright fundamentalism threatens to constrict the flow and preservation of knowledge for future generations. I say “fundamentalism” because the current copyright regime in this country is radical and unprecedented in its scope, yet traces its roots back to the initially sound concept of limited intellectual property rights as an incentive to production, which, in turn, stemmed from the Enlightenment idea of an author’s natural rights. What was originally granted (hesitantly) as a temporary, statutory limitation on the public domain has spun out of control into a full-blown culture of intellectual control that chokes the flow of ideas through society — the very thing copyright was supposed to promote in the first place.
If we don’t come to our senses, we seem destined for a new dark age where every utterance must be sanctioned by some rights holder or licensing agent. Free thought isn’t possible, after all, when every thought is taxed. In his “An Answer to the Question: What is Enlightenment?” Kant condemns as criminal any contract that compromises the potential of future generations to advance their knowledge. He’s talking about the church, but this can just as easily be applied to the information monopolists of our times and their new tool, DRM, which, in its insidious way, is a kind of contract (though one that is by definition non-negotiable since enforced by a machine):

But would a society of pastors, perhaps a church assembly or venerable presbytery (as those among the Dutch call themselves), not be justified in binding itself by oath to a certain unalterable symbol in order to secure a constant guardianship over each of its members and through them over the people, and this for all time: I say that this is wholly impossible. Such a contract, whose intention is to preclude forever all further enlightenment of the human race, is absolutely null and void, even if it should be ratified by the supreme power, by parliaments, and by the most solemn peace treaties. One age cannot bind itself, and thus conspire, to place a succeeding one in a condition whereby it would be impossible for the later age to expand its knowledge (particularly where it is so very important), to rid itself of errors, and generally to increase its enlightenment. That would be a crime against human nature, whose essential destiny lies precisely in such progress; subsequent generations are thus completely justified in dismissing such agreements as unauthorized and criminal.

We can only hope that subsequent generations prove more enlightened than those presently in charge.

google print’s not-so-public domain

Google’s first batch of public domain book scans is now online, representing a smattering of classics and curiosities from the collections of libraries participating in Google Print. Essentially snapshots of books, they’re not particularly comfortable to read, but they are keyword-searchable and, since no copyright applies, fully accessible.
The problem is, there really isn’t all that much there. Google’s gotten a lot of bad press for its supposedly cavalier attitude toward copyright, but spend a few minutes browsing Google Print and you’ll see just how publisher-centric the whole affair is. The idea of a text being in the public domain really doesn’t amount to much if you’re only talking about antique manuscripts, and these are the only books that they’ve made fully accessible. Daisy Miller’s copyright expired long ago but, with the exception of Harvard’s illustrated 1892 copy, all the available scanned editions are owned by modern publishers and are therefore only snippeted. This is not an online library, it’s a marketing program. Google Print will undeniably have its uses, but we shouldn’t confuse it with a library.
(An interesting offering from the stacks of the New York Public Library is this mid-19th century biographic registry of the wealthy burghers of New York: “Capitalists whose wealth is estimated at one hundred thousand dollars and upwards…”)