Category Archives: Libraries, Search and the Web

reading over your shoulder

A particularly offensive section of the Patriot Act was slapped down yesterday in Congress. From Reuters:

The U.S. House of Representatives on Wednesday defied President Bush by approving a measure making it harder for federal agents to secretly gather information on people’s library reading habits and bookstore purchases.

web news as gated community

Just found out about this on diglet. Launched in April, the National Digital Newspaper Program (NDNP) is a joint effort of the Library of Congress and the National Endowment for the Humanities to create a comprehensive web archive of the nation’s public domain newspapers.

Ultimately, over a period of approximately 20 years, NDNP will create a national, digital resource of historically significant newspapers from all the states and U.S. territories published between 1836 and 1922. This searchable database will be permanently maintained at the Library of Congress (LC) and be freely accessible via the Internet.

(A similar project is getting underway in France.)
It’s frustrating that this online collection will stop at 1922. Ordinary libraries maintain up-to-date periodical archives and make them available to anyone willing to make the trip. But if they put those collections on the web, they’ll be sued. Archives are one of the few ways newspapers have figured out to make money on the web, so they’re not about to let libraries put their microfilm and periodical reading rooms online. The paradigm has flipped: in print, you pay for the current day’s edition, but the following day it ends up in the trash, or wrapping a fish. The passage of 24 hours makes it worthless. On the web, most news is free. It’s the fish wrap that costs you.
The web has utterly changed what things are worth. For most people, when a news site asks them to pay, they hightail it out of there and never look back. Even being asked to register is enough to deter many readers. But come September, the New York Times will start charging a $50 annual fee for what it considers its most distinctive commodities – editorials, op-eds, and selected other features. Is a full subscription site not far off? With its prestige and vast readership, the Times might be able to pull it off. But smaller papers are afraid to start charging, even as they watch their print circulation numbers plummet. If one paper puts up a tollbooth, it instantly becomes irrelevant to millions of readers. There will always be a public highway somewhere nearby.
A friend at the Columbia School of Journalism told me that the only way newspapers can be profitable on the web is if they all join together in some sort of league and charge bulk subscription fees for universal access. If there’s a wholesale move to the pay model, then readers will have no choice but to shell out. It will be like paying for cable service, where each newspaper is a separate channel. The only time you register is when you pay the initial fee. From then on, it’s clear sailing.
It’s a compelling idea, but could just be collective suicide for the newspapers. There will always be free news on offer somewhere. Indian and Chinese wire services might claim the market while the prestigious western press withers away. Or people will turn to state-funded media like the BBC or Xinhua. Then again, people might be willing to pay if it means unfettered access to high quality, independent journalism. And with newspapers finally making money on web subscriptions, maybe they’d start loosening up about their archives.

“an invaluable resource that they had an extremely limited role in creating”

Good piece today in Wired on the transformation of scientific journals. There’s a general feeling that commercial publishers like Reed Elsevier enjoy unreasonable control over an evolving body of research that should be freely available to the public. With exorbitant subscription fees, affordable only for large institutions, most journals are effectively inaccessible, and the authors retain few or no reproduction rights. Recently, however, free article databases have sprung up on the web – the Public Library of Science (PLoS), BioMed Central, and NIH’s PubMed – some of which, like PLoS, have begun publishing their own journals. It’s a welcome change, considering how much labor and treasure are poured into scientific publications (from funders, private and public, and from the scientists themselves), and yet how little comes back in return. Shifting to a non-profit model, as PLoS has done, preserves much of the financial architecture that supports the production of journals, but totally revolutionizes the distribution.

PLoS journals are free and allow authors to retain their copyrights, as long as they allow their work to be freely shared and distributed (with full credit given, naturally). They also require that authors pay $1,500 from their grants, or directly from their sponsors or institutions, to have their work published. These groups pay the bulk of the $10 billion that goes to scientific and medical publishers each year, and what do they get in return? Limited access to the research they funded, and no right to reuse the information.
“It’s ridiculous to give publishers complete control of an invaluable resource that they had an extremely limited role in creating,” said Eisen, a geneticist and a founder of PLoS.

In many ways, though, the tougher question is how to shift the architecture of prestige – peer review – to these new kinds of journals.

visual bookmarks

Wists is a visual bookmarking system for the web, doing for images what del.icio.us does for web pages. It’s like browsing the web with a camera, or creating your own hand-selected Google image search. Find an image you want to keep track of and Wists will create a thumbnail for you, linking back to the original site. If it’s a whole page you want to capture, Wists will take an automatic screenshot of the entire page. Add a title, tags and description and it goes into the system – a photo album of the web. Much like del.icio.us, Wists arranges popular tags on the sidebar and allows you to browse the latest entries. It also enables you to add other users’ bookmarks to your own gallery, clearing the slate for your own tags and descriptions. Best of all, it keeps track of people you’ve taken items from, and people who have taken items from you. Trails become apparent and the archive becomes interconnected. Here’s a grab of my “jaws” tag page – combing around for images, I found an amusing juxtaposition.
These are the kind of basic curatorial tools that would be great on Flickr. Currently, you are only able to apply tags to your own photos, or those of friends, family or mutual contacts. But part of the fun of Flickr is browsing the photos of total strangers. You can comment on any photo or mark it as a favorite, but there is no way to curate your own collection of images from the community at large. Wists suggests how the gap between del.icio.us and Flickr might be bridged.

Google Print gets its own address

Google Print now has its own exclusive search page. But make no mistake, this is not a library. Google makes it very clear in a paragraph intended to reassure nervous publishers:

Google Print is a book marketing program, not an online library, and as such your entire book will not be made available online unless you expressly permit it.

If you reach your limit of permitted pages you get this:
(image: Google Print restricted page notice)

self-destructing books

In January I bought my first ebook (ISBN: B0000E68Z2), which is published by Wiley. I have one copy on my laptop and a backup on my external harddrive. Last week, I downloaded and installed Adobe Professional (writer 6.0) from our company network (Norwegian School of Management, BI) – during the installation some files from the Adobe version that I downloaded and installed when I bought the ebook (from Amazon.com UK) were deleted. Since then, I have not been able to access my ebook – I have tried to get help from our computer staff but they have not been able to help me.
Adobe thinks that I’m using another computer, while I’m not – and it didn’t help to activate the computer through some Adobe DRM Activator stuff. Now I have spent at least 10 hours trying to access my ebook – hope you can help…

Boing Boing points to this story illustrating the fundamental flaws of digital rights management (DRM) – about a Norwegian professor who paid $172 for an ebook on Amazon UK only to have it turn to unreadable gibberish after he updated his Acrobat software. He made several pleas for help – to Adobe, to Wiley (the publisher), and to Amazon. All were in vain. It turns out that after the story appeared on Boing Boing (within the past 24 hours, it seems), Wiley finally sent a replacement copy. But the problem of built-in obsolescence in ebooks goes unaddressed.
I’m convinced that encrypting single “copies” is lunacy. For everything we gain with electronic texts – search, multimedia, connection to the network etc. – we lose much in the way of permanence and tactility. DRM software only makes the loss more painful. Publishers need to get away from the idea of selling “copies” and start experimenting with charging for access to a library of titles. You pay for the service, not for the copy. Digital books are immaterial – so the idea of the “copy” has to be revised.
Another example of old thinking with new media is the New York Public Library’s ebook collection. That “copies” of electronic titles are set to expire after 21 days is not surprising. The “copy” is “returned” automatically and you sweep the expired file like a husk into the trash. What’s incredible is that the library only allows one “copy” to be checked out at a time, entirely defeating one of the primary virtues of electronic books: they can always be in circulation. Clearly terrified by the implications of the new medium (or of the retribution of publishers), the NYPL keeps ebooks on an even tighter tether than it does its print books. As a result, it has set up a service that’s too frustrating to use. The library should rethink this idea of the single “copy” and save everyone the “quote” marks.

academic publishers get snippety with Google

Last Friday, the Association of American University Presses (AAUP) sent Google a long letter expressing concern over what might amount to “systematic infringement of copyright on a massive scale” in its library project. BusinessWeek reports. The AAUP letter can be read here. Much of it asks Google to clarify its position on a number of points – to provide, as it were, the fine print on Google Print. Here’s a great item:

Snippet is used so consistently in describing Google Print for Libraries that it’s taking on the status of a technical term, and thus requires a specific definition. How long is a “snippet?”

Google defends its mass digitization project on the grounds of “fair use” (Section 107 of the US Copyright Act). In other words, it asserts the right to copy copyrighted materials and make them browseable on the web for research purposes as long as it restricts the amount that can be seen for free. Any commercial use of the text will take place only in the context of a publisher agreement. Publishers have the right to opt out, and apparently a couple already have, though most are holding their breath and waiting to see if they might be able to profit from Google’s project. The tricky question is, can a book that has been withheld from the publisher program be included in the library program?
You could say that the web is one enormous copying machine, which makes fair use questions more important than ever before. Will Google be the juggernaut that breaks down the door into a more permissive fair use era for all? Or will it use its power to establish an exclusive, Google-only fair use zone and set up a cartel with publishers? Or will a few well-aimed lawsuits sink the project before it gets off the ground?

libraries improve the front end

There’s a nice article in yesterday’s NY Times on how some university libraries are rethinking how they arrange their space – moving or redistributing print collections to make way for an “electronic information commons.” It’s not about abandoning the books, or relegating them to a lesser status. It’s more about re-positioning them as a sort of physical database. If a library is a big computer, then the database exists toward the back end. These days, digging through the stacks, even if it results in a paper return, is generally done digitally. That’s the front end, or the interface, and this is what smart libraries are seeking to improve. Research and scholarly production may be going digital, but the social, conversational space of the university, and in turn, the library, is still vital. The strategy for the libraries, then, is to restructure and expand that space as a compelling social software environment – one that is both physical and virtual. Sounds good in theory, but I’m not sure that’s how these facilities will actually be used. Turning libraries into a hi-tech rec center might be sacrificing more than it saves.
The article focuses on reorganization efforts at the University of Texas at Austin. Press release here concerning the transformation of the Flawn Academic Center (likely a more durable link than the Times story).
(image: Perry-Castañeda Library at UT Austin)

Europe aims canon at Google

Google is riding high. With nearly 50 percent market share, it is the most widely used search engine on the web. It is even beginning to act suspiciously like a portal (notice the “login” link tucked discreetly in the upper right corner?), handling your mail, hosting your blog, helping you find files on your desktop, and even storing a history of all your web searches. (No doubt, Google’s expansion into web-based applications has Microsoft scared – they’ve always considered software their turf.) Google recently patented a system that ranks news searches by the quality and credibility of the source. Gorgeous satellite maps have put the surface of the earth in our web browser and left people breathless (see the Google Sightseeing blog, or the memorymap tag on Flickr as examples of sheer exultation in seeing the world through Google maps). Late last year, Google announced plans to digitize and put online major portions of the libraries of Stanford, U. Michigan, Harvard, Oxford and the NY Public Library. And on top of all this, its stock has continued to soar. Already this year, the newly public company shattered predictions, earning $369.1 million in the first quarter alone, more than covering the cost of the projected 10-year library scanning project. It seems there is no limit to what Google might do.
That is precisely what has Europeans so worried. Across the Atlantic, Google is coming to be seen as yet another symbol of American cultural hegemony, bestriding the web like a colossus. And the library project touches a particularly sensitive nerve, raising questions of cultural heritage – and cultural destiny. If the future of libraries is solely in Google’s hands, what will be left out in the process? Will English become the lingua franca not only for politics and commerce, but for all intellectual discourse? Not content simply to ask questions, Europe has responded. In February, Jean-Noël Jeanneney, chief librarian of the Bibliothèque nationale de France, warned that Google Print would effectively anglicize the world’s knowledge, and called for a French digitization effort to beat back the surging English tide. Less than a month later, President Jacques Chirac gave Jeanneney’s proposal the green light. Then, last week, nineteen national libraries, evidently moved by France’s determination, signed a joint motion urging the creation of a giant pan-European digital library to counterbalance the nascent Googlian stacks. A couple days ago, 16 EU culture ministers, several heads of state, and over 800 artists and intellectuals met in Paris to close the deal, issuing a strong, continent-wide directive to preserve and promote culture, beginning with the digitization of European library collections.
More serious digitization projects are undoubtedly a good thing. The European effort, being a more purely civic enterprise, might in fact turn out far better than Google Print, which clearly has a large commercial dimension (deals with publishers, advertising etc.). The Euro initiative might produce bona fide electronic editions, not just searchable scans – fully structured, annotated and perhaps employing other scholarly resources (but let’s not hold our breath). To be fair, Google has never said it wants to be the only game in town. Rather, they hope to act as a catalyst for other digitization efforts. And judging by Europe’s reaction, it seems to be working. Looking at this latest transatlantic folly, it’s funny to think of the Bush administration trying to undercut European unity, splitting the continent into “old” and “new” in the hope of drumming up support for the Iraq war. We’ve seen where that kind of destructive diplomacy has led us. But quite wonderfully, Google appears to have achieved the opposite, galvanizing a united Europe with a big, visionary idea. If the Euro library project exists for no other reason than the perceived imperialism of Google, then so be it. It will result in a great gift for all. If only our foreign policy were so deft.
(image by libraryman via Flickr)