the open library

A little while back I was musing on the possibility of a People’s Card Catalog, a public access clearinghouse of information on all the world’s books to rival Google’s gated preserve. Well thanks to the Internet Archive and its offshoot the Open Content Alliance, it looks like we might now have it – or at least the initial building blocks. On Monday they launched a demo version of the Open Library, a grand project that aims to build a universally accessible and publicly editable directory of all books: one wiki page per book, integrating publisher and library catalogs, metadata, reader reviews, links to retailers and relevant Web content, and a menu of editions in multiple formats, both digital and print.

Imagine a library that collected all the world’s information about all the world’s books and made it available for everyone to view and update. We’re building that library.

The official opening of Open Library isn’t scheduled till October, but they’ve put out the demo now to prove this is more than vaporware and to solicit feedback and rally support. If all goes well, it’s conceivable that this could become the main destination on the Web for people looking for information in and about books: a Wikipedia for libraries. On presentation of public domain texts, they already have Google beat, even with recent upgrades to the GBS system including a plain text viewing option. The Open Library provides TXT, PDF, DjVu (a high-res visual document browser), and its own custom-built Book Viewer tool, a digital page-flip interface that presents scanned public domain books in facing pages that the reader can leaf through, search and (eventually) magnify.
Page turning interfaces have been something of a fad recently, appearing first in the British Library’s Turning the Pages manuscript preservation program (specifically cited as inspiration for the OL Book Viewer) and later proliferating across all manner of digital magazines, comics and brochures (often through companies that you can pay to convert a PDF into a sexy virtual object complete with drag-able page corners that writhe when tickled with a mouse, and a paper-like rustling sound every time a page is turned).
This sort of reenactment of paper functionality is perhaps too literal, opting for imitation rather than innovation, but it does offer some advantages. Having a fixed frame for reading is a relief in the constantly scrolling space of the Web browser, and there are some decent navigation tools that gesture toward the ways we browse paper. To either side of the open area of a book are thin vertical lines denoting the edges of the surrounding pages. Dragging the mouse over the edges brings up scrolling page numbers in a small pop-up. Clicking on any of these takes you quickly and directly to that part of the book. Searching is also neat. Type a query and the book is suddenly interleaved with yellow tabs, with keywords highlighted on the page, like so:

But nice as this looks, functionality is sacrificed for the sake of fetishism. Sticky tabs are certainly a cool feature, but not when they’re at the expense of a straightforward list of search returns showing keywords in their sentence context. These sorts of references to the feel and functionality of the paper book are no doubt comforting to readers stepping tentatively into the digital library, but there’s something that feels disjointed about reading this way: that this is a representation of a book but not a book itself. It is a book avatar. I’ve never understood the appeal of those Second Life libraries where you must guide your virtual self to a virtual shelf, take hold of the virtual book, and then open it up on a virtual table. This strikes me as a failure of imagination, not to mention tedious. Each action is in a sense done twice: you operate a browser within which you operate a book; you move the hand that moves the hand that moves the page. Is this perhaps one too many layers of mediation to actually be able to process the book’s contents? Don’t get me wrong, the Book Viewer and everything the Open Library is doing is a laudable start (cause for celebration in fact), but in the long run we need interfaces that deal with texts as native digital objects while respecting the originals.
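A plain list of search returns with each keyword shown in its sentence context is simple to sketch. What follows is a minimal illustration in Python, not anything drawn from the Open Library or its Book Viewer: the function name `keyword_in_context` and the sample text are invented for the example, and the sentence splitting is deliberately naive.

```python
import re

def keyword_in_context(text, keyword):
    """Return (sentence_number, sentence) pairs for sentences containing keyword."""
    # Naive split: break after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for number, sentence in enumerate(sentences, start=1):
        if pattern.search(sentence):
            # Bracket each match so it stands out in a plain-text listing
            marked = pattern.sub(lambda m: "[" + m.group(0) + "]", sentence)
            hits.append((number, marked))
    return hits

# Invented sample text, standing in for the OCR text of a scanned book
sample = ("The whale surfaced at dawn. Nobody on deck spoke. "
          "Then the Whale dived again!")

for number, sentence in keyword_in_context(sample, "whale"):
    print(number, sentence)
```

A real viewer would presumably report page numbers rather than sentence numbers, but the principle is the same: the reader sees every hit at once, in context, instead of flipping from tab to tab.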
What may be more interesting than any of the technology previews is a longish development document outlining ambitious plans for building the Open Library user interface. This covers everything from metadata standards and wiki templates to tagging and OCR proofreading to search and browsing strategies, plus a well thought-out list of user scenarios. Clearly, they’re thinking very hard about every conceivable element of this project, including the sorts of things we frequently focus on here such as the networked aspects of texts. Acolytes of Ted Nelson will be excited to learn that a transclusion feature is in the works: a tool for embedding passages from texts into other texts that automatically track back to the source (hypertext copy-and-pasting). They’re also thinking about collaborative filtering tools like shared annotations, bookmarking and user-defined collections. All very very good, but it will take time.
Building an open source library catalog is a mammoth undertaking and will rely on millions of hours of volunteer labor, and like Wikipedia it has its fair share of built-in contradictions. Jessamyn West of librarian.net put it succinctly:

It’s a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify.

But the only realistic alternative may well be the library that Google is building, a proprietary database full of low-quality digital copies, a semi-accessible public domain prohibitively difficult to use or repurpose outside the Google reading room, a balkanized landscape of partner libraries and institutions left in its wake, each clutching their small slice of the digitized pie while the whole belongs only to Google, all of it geared ultimately not to readers, researchers and citizens but to consumers. Construed more broadly to include not just books but web pages, videos, images, maps etc., the Google library is a place built by us but not owned by us. We create and upload much of the content, we hand-make the links and run the search queries that program the Google brain. But all of this is captured and funneled into Google dollars and AdSense. If passive labor can build something so powerful, what might active, voluntary labor be able to achieve? Open Library aims to find out.

outages

Apologies to everyone who has had difficulty viewing our sites today. We’ve determined that the problem lies not with our server but with Time Warner. Apparently, sites are viewable through different ISPs but Time Warner/Roadrunner appears to be undergoing some sort of maintenance that prevents them from resolving our URLs. We thought this might just be a New York problem but recently got a message from someone in North Carolina that they’re experiencing the same thing. If you’re reading this now it probably means you are unaffected.
A paranoid thought did occur to me: is this momentary glitch a preview of a post net neutrality world in which less privileged, non-premium sites like ours get the shaft?
Hopefully this will all pass soon.
Update: it seems to have passed.

welcome siva vaidhyanathan, our first fellow

We are proud to announce that the brilliant media scholar and critic Siva Vaidhyanathan will be establishing a virtual residency here as the Institute’s first fellow. Siva is in the process of moving from NYU to the University of Virginia, where he’ll be teaching media studies and law. While we’re sad to be losing him in New York, we’re thrilled that this new relationship will bring our work into closer, more dynamic proximity. Precisely what “fellowship” entails will develop over time but for now it means that the Institute is the new digital home of SIVACRACY.NET, Siva’s popular weblog. It also means that next month we will be launching a new website devoted to Siva’s latest book project, The Googlization of Everything, an examination of Google’s disruptive effects on culture, commerce and community.
Siva is one of just a handful of writers to have leveled a consistent and coherent critique of Google’s expansionist policies, arguing not from the usual kneejerk copyright conservatism that has dominated the debate but from a broader cultural and historical perspective: what does it mean for one company to control so much of the world’s knowledge? Siva recently gave a keynote talk at the New Network Theory conference in Amsterdam where he explored some of these ideas, which you can read about here. Clearly Siva’s views on these issues are sympathetic to our own so we’re very glad to be involved in the development of this important book. Stay tuned for more details.
Welcome aboard, Siva.

a little weekend reading

There’s an interesting post about writing and the Web by Kenneth Goldsmith at Harriet, the blog of the Poetry Foundation. Kenneth Goldsmith is probably best known – or not known? – to those who read if:book as the force behind UbuWeb; there was a fascinating interview with him recently at Archinect which provides a great deal of background on his work there. He’s also an accomplished poet; see, for example, his piece Soliloquy. In his post at Harriet, Goldsmith starts with a provocative statement: “With the rise of the web, writing has met its photography.” He argues that writing needs to redefine itself for the new parameters the Web offers; it’s a bold argument, and one that deserves to stir up a broad discussion.

perspectives on distributed creativity

Assignment Zero, an experimental news site that brings professional journalists together with volunteer researcher-reporters to collaboratively write stories, has kicked off its tenure at Wired News by doing an extended investigation of “crowdsourcing.” Crowdsourcing is the latest internet parlance used to describe work traditionally carried out by one or a few persons being distributed among many people. I’ve always found something objectionable about the term, which is more suggestive of a business model than a creative strategy and sidesteps the numerous ethical questions about peer production and corporate exploitation that are inevitably bound up in it. But it’s certainly a subject that could use a bit of scrutiny, and who better to do it than a journalistic team composed of the so-called crowd?
It is in this self-reflexive spirit that Jay Rosen, an exceedingly sharp thinker on the future of journalism and executive editor of Assignment Zero (and the related NewAssignment.net), presents an interesting series of features assembled by his “pro-am” team that look at a wide variety of online collaboration forms. This package has been in development for several months (many of the pieces contain links back to the original “assignments” and you can see how they evolved) and there’s a lot there: 80 Q&A’s, essays and stories (mostly Q&A’s) looking at innovative practices and practitioners across media types and cultural/commercial arenas. From an initial sifting, it’s less an analysis than just a big collection of perspectives, but this is valuable I think, if for no other reason than as a jumping-off point for further research.
There are many of the usual suspects like Benkler, Lessig, Jarvis, Shirky, Surowiecki, Wales etc., but as many or more of the pieces venture off the beaten track. There’s a thought-provoking interview with Douglas Rushkoff on open source as a cultural paradigm, some stuff on the Wu Ming fiction collective (which is fascinating), a piece about Sydney Poore, a Wikipedia “super-contributor,” and some coverage of our work: an interview with McKenzie Wark about Gamer Theory and collaborative writing. There’s also an essay by one of the Assignment Zero contributors, Kristin Gorski, synthesizing some of the material gathered on the latter subject: “Creative Crowdwriting: The Open Book.”
All in all this seems like a successful test drive for an experimental group that is still inventing its process. I’m interested to see how it develops with other less “wired” subjects.

of shelves and selves

William Drenttel has a lovely post over on Design Observer about the exquisite information of bookshelves, a meditation spurred by 60 photographs of the library of renowned San Francisco designer, typographer, printer and founder of Greenwood Press Jack Stauffacher. Each image (they were taken by Dennis Letbetter) gives a detailed view of one section of Stauffacher’s shelves, a rare glimpse of one individual’s bibliographic DNA, made browseable as a slideshow (unfortunately, the images are not reassembled at the end to give a full view of the collection).

Early evidence suggests that the impulse toward personal mapping through media won’t abate as we go deeper into the digital. Delicious Library and Library Thing are more or less direct transpositions of physical shelves to the computer environment, the latter with an added social dimension (people meeting through their virtual shelves). More generally, social networking sites from Facebook to MySpace are full of self-signification through shelves, or rather lists, of favorite books, movies and music. Social bookmarking sites too bear traces of identity in the websites people save and tag (the tags themselves are a kind of personal signature). Though much of the texture and spatial language of the physical may be lost, a new social terrain has opened up, one which we’re only beginning to understand.
But it’s not as though physical bookshelves haven’t always been social. We arrange books not only for our own conceptual orientation, but to give others who venture into our space a sense of our self (or what we’d like to appear as our self), our distinct intellectual algorithm. Browsing a friend’s thoughtfully arranged shelf is like looking through a lens calibrated to their view of the world, especially when those books have played a crucial role, as in Stauffacher’s, in shaping a life’s work. Drenttel savors the idiosyncrasies that inevitably are etched into such a collection:

I have seen many great rare book libraries…. But the libraries I most enjoy are working libraries, where the books have been used and cited and annotated – first editions marred with underlining, notes throughout their pages. (I will always remember the chaos of Susan Sontag’s library, where every book had been touched, read and filled with notes and ephemera.) The organization of a working library is seldom alphabetical…but rather follows some particular mental construct of its owner. Jack Stauffacher’s shelves have some order, one knows. But it is his order, his life.

Or, in Stauffacher’s own words:

Without this working library, I would have no compass, no map, to guide me through the density of our human condition.

six blind men and an elephant

Thomas Mann, author of The Oxford Guide to Library Research, has published an interesting paper (pdf available) examining the shortcomings of search engines and the continued necessity of librarians as guides for scholarly research. It revolves around the case of a graduate student investigating tribute payments and the Peloponnesian War. A Google search turns up nearly 80,000 web pages and 700 books: an overwhelming retrieval with little in the way of conceptual organization and only the crudest of tools for measuring relevance. But, with the help of the LC Catalog and an electronic reference encyclopedia database, Mann manages to guide the student toward a manageable batch of about a dozen highly germane titles.
Summing up the problem, he recalls a charming old fable from India:

Most researchers – at any level, whether undergraduate or professional – who are moving into any new subject area experience the problem of the fabled Six Blind Men of India who were asked to describe an elephant: one grasped a leg and said “the elephant is like a tree”; one felt the side and said “the elephant is like a wall”; one grasped the tail and said “the elephant is like a rope”; and so on with the tusk (“like a spear”), the trunk (“a hose”) and the ear (“a fan”). Each of them discovered something immediately, but none perceived either the existence or the extent of the other important parts – or how they fit together.
Finding “something quickly,” in each case, proved to be seriously misleading to their overall comprehension of the subject.
In a very similar way, Google searching leaves remote scholars, outside the research library, in just the situation of the Blind Men of India: it hides the existence and the extent of relevant sources on most topics (by overlooking many relevant sources to begin with, and also by burying the good sources that it does find within massive and incomprehensible retrievals). It also does nothing to show the interconnections of the important parts (assuming that the important can be distinguished, to begin with, from the unimportant).

Mann believes that books will usually yield the highest quality returns in scholarly research. A search through a well tended library catalog (controlled vocabularies, strong conceptual categorization) will necessarily produce a smaller, and therefore less overwhelming, quantity of returns than a search engine (books do not proliferate at the same rate as web pages). And those returns, pound for pound, are more likely to be of relevance to the topic:

Each of these books is substantially about the tribute payments – i.e., these are not just works that happen to have the keywords “tribute” and “Peloponnesian” somewhere near each other, as in the Google retrieval. They are essentially whole books on the desired topic, because cataloging works on the assumption of “scope-match” coverage – that is, the assigned LC headings strive to indicate the contents of the book as a whole….In focusing on these books immediately, there is no need to wade through hundreds of irrelevant sources that simply mention the desired keywords in passing, or in undesired contexts. The works retrieved under the LC subject heading are thus structural parts of “the elephant” – not insignificant toenails or individual hairs.

If nothing else, this is a good illustration of how libraries, if used properly, can still be much more powerful than search engines. But it’s also interesting as a librarian’s perspective on what makes the book uniquely suited for advanced research. That is: a book is substantial enough to be a “structural part” of a body of knowledge. This idea of “whole books” as rungs on a ladder toward knowing something. Books are a kind of conceptual architecture that, until recently, has been distinctly absent on the Web (though from the beginning certain people and services have endeavored to organize the Web meaningfully). Mann’s study captures the anxiety felt at the prospect of the book’s decline (the great coming blindness), and also the librarian’s understandable dread at having to totally reorganize his/her way of organizing things.
It’s possible, however, to agree with the diagnosis and not the prescription. True, librarians have gotten very good at organizing books over time, but that’s not necessarily how scholarship will be produced in the future. David Weinberger ponders this:

As an argument for maintaining human expertise in manually assembling information into meaningful relationships, this paper is convincing. But it rests on supposing that books will continue to be the locus of worthwhile scholarly information. Suppose more and more scholars move onto the Web and do their thinking in public, in conversation with other scholars? Suppose the Web enables scholarship to outstrip the librarians? Manual assemblages of knowledge would retain their value, but they would no longer provide the authoritative guide. Then we will have either of two results: We will have to rely on “‘lowest common denominator’ and ‘one search box/one size fits all’ searching that positively undermines the requirements of scholarly research”…or we will have to innovate to address the distinct needs of scholars….My money is on the latter.

As I think is mine. Although I would not rule out the possibility of scholars actually participating in the manual assemblage of knowledge. Communities like MediaCommons could to some extent become their own libraries, vetting and tagging a wide array of electronic resources, developing their own customized search frameworks.
There’s much more in this paper than I’ve discussed, including a lengthy treatment of folksonomies (Mann sees them as a valuable supplement but not a substitute for controlled taxonomies). Generally speaking, his articulation of the big challenges facing scholarly search and librarianship in the digital age is well worth the read, although I would argue with some of the conclusions.

the paper e-book

Manolis Kelaidis, a designer at the Royal College of Art in London, has found a way to make printed pages digitally interactive. His “blueBook” prototype is a paper book with circuits embedded in each page and with text printed with conductive ink. When you touch a “linked” word on the page, your finger completes a circuit, sending a signal to a processor in the back cover, which communicates by Bluetooth with a nearby computer, bringing up information on the screen.

(image from booktwo.org)
I’ve heard from a number of people that Kelaidis brought down the house last week at O’Reilly’s “Tools of Change for Publishing” conference in San Jose. Andrea Laue, who blogs at jusTaText, did a nice write-up:

He asked the audience if, upon encountering an obscure reference or foreign word on the page of a book, we would appreciate the option of touching the word on the page and being taken (on our PC) to an online resource that would identify or define the unfamiliar word. Then he made it happen. Standing O.
Yes, he had a printed and bound book which communicated with his laptop. He simply touched the page, and the laptop reacted. It brought up pictures of the Mona Lisa. It translated Chinese. It played a piece of music. Kelaidis suggested that a library of such books might cross-refer, i.e. touching a section in one book might change the colors of the spines of related books on your shelves. Imagine.

So there you have it. A networked book – in print. Amazing.
It’s not surprising to hear that the O’Reilly crowd, filled with anxious publishers, was ecstatic about the blueBook. Here was tangible proof that print can be meaningfully integrated with the digital world without sacrificing its essential formal qualities: the love child of the printed book and the companion CD-ROM. And since so much of the worry in publishing is really about the crumbling of business models and only secondarily about the essential nature of books or publishing, it was no doubt reassuring to imagine something like the blueBook as the digital book of the future: a physical object that can be reliably bought and sold (and which, with all those conductors, circuits and processors involved, would be exceedingly difficult to copy).
Kelaidis’ invention definitely sounds wonderful, but is it a plausible vision of things to come? I suppose electronic paper of all kinds, pulp and polymer, will inevitably get better and cheaper over time. How transient and historically contingent is our attachment to paper? There’s a compelling argument to be made (Gary Frost makes it, and we frequently debate it around the table here) that, in spite of all the new possibilities opened up by digital technologies, the paper book is a unique ergonomic fit for the human hand and mind, and, moreover, that its “bounded” nature allows for a kind of reading that people will want to keep distinct from the more fragmentary and multi-directional forms of reading we do on computers and online. (That’s certainly my personal reading strategy these days.) Perhaps, with something like the blueBook, it would be possible to have the best of both worlds.
But what about accessibility? What about trees? By the time e-paper is a practical reality, will attachment to print have definitively ebbed? Will we be used to a greater degree of interactivity (the ability not only to link text but to copy, edit and recombine it, and to mix it directly, on the “page,” with other media) than even the blueBook can provide?
Subsequent thought: A discussion about this on an email list I subscribe to reminded me of the intellectual traps that I and many others fall into when speculating about future technologies: the horse race (which technology will win?), the either/or question. What do I really think? The future of the book is not monolithic but rather a multiplicity of things – the futures of the book – and I expect (and hope) that well-crafted hybrid works like Kelaidis’ will be among those futures.
We just found out that next week Kelaidis will be spending a full day at the Institute so we’ll be able to sift through some of these questions in person.

poetry in motion

I’m not sure why we didn’t note QuickMuse last year when it debuted. No matter: the concept isn’t dated and the passing year has allowed it to accrue an archive worth visiting. On the backend, QuickMuse is a project built on software by Fletcher Moore that tracks what a writer does over time; when played back, the visitor with a JavaScript-enabled browser sees how the composition was written over time, sped up if desired. On the front, editor Ken Gordon has invited a number of poets to compose a poem in fifteen minutes, based, usually, on some found text. The poetry thus created isn’t necessarily the best, but that’s immaterial: it’s interesting to see how people write. (If you’d like to try this yourself, you can use Dlog.)
Composition speeds vary. Rick Moody starts writing early, making mistakes and minor corrections, but ceaselessly moving forward at a formidable clip until his fifteen minutes are up; you get the impression he could happily keep writing at the same pace for hours. The sentence “Every year South American disappears” hangs alone in Mary Jo Salter’s composition for thirty seconds; you imagine the poet turning the phrase over in her mind to find the next sentence. Lines are added, slowly, always with time passing.
What this underscores in my mind is how writing is a weirdly private act. In a sense, the reader of QuickMuse is very close to the writer, watching the poem as it unfolds; the letters appear at the exact speed at which the writer’s fingers type them in. There’s a sense of intimacy that comes with the shared time. But the thought behind the action of typing is conspicuously absent. Is the pause a pregnant moment of decision, or simply the writer not paying attention? It’s impossible to say.
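The record-and-replay idea described above can be sketched in a few lines. This is only a guess at the general shape, in Python, and not Fletcher Moore’s actual software: the names `frames` and `play`, the snapshot format and the sample log are all invented for illustration. It assumes the recorder stores a timestamp in seconds alongside the text of the draft at that moment.

```python
import time

def frames(events, speed=1.0):
    """Turn (timestamp, text) snapshots into (delay, text) playback steps."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    previous = events[0][0] if events else 0.0
    for timestamp, text in events:
        # Wait as long as the writer did, divided by the playback speed
        yield (timestamp - previous) / speed, text
        previous = timestamp

def play(events, speed=1.0, sleep=time.sleep, show=print):
    """Replay the snapshots in order, pausing between them."""
    for delay, text in frames(events, speed):
        sleep(delay)
        show(text)

# Invented log: seconds since the start, and the draft at that moment
log = [
    (0.0, "A first line."),
    (2.0, "A first line. A second"),
    (32.0, "A first line. A second thought."),
]

# List the playback steps at double speed without actually waiting
for delay, text in frames(log, speed=2.0):
    print(delay, repr(text))
```

Calling `play(log, speed=2.0)` would replay the draft with real pauses. The thirty-second gap in the invented log survives as a shorter silence, which is all such a recording can preserve: the timing of the pause, not the thought behind it.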

johannes who?

This is the oldest existing document in the world printed with metal movable type: an anthology of Zen teachings, Goryeo Dynasty, Korea… 1377. It’s a little known fact, at least in the West, that metal movable type was first developed in Korea circa 1230, over 200 years before that goldsmith from Mainz came on the scene. I saw this today in the National Library of Korea in Seoul (more on that soon). This book is actually a reproduction. The original resides in Paris and is the subject of a bitter dispute between the French and Korean governments.