Category Archives: search

a few rough notes on knols

Think you’ve got an authoritative take on a subject? Write up an article, or “knol,” and see how the Web judgeth. If it’s any good, you might even make a buck.
Google’s new encyclopedia will go head to head with Wikipedia in the search rankings, though in format it more resembles other ad-supported, single-author info sources like the or Squidoo. The knol-verse (how the hell do we speak of these things as a whole?) will be a Darwinian writers’ market where the fittest knols rise to the top. Anyone can write one. Google will host it for free. Multiple knols can compete on a single topic. Readers can respond to and evaluate knols through simple community rating tools. Content belongs solely to the author, who can license it in any way he/she chooses (all rights reserved, Creative Commons, etc.). Authors have the option of having contextual ads run to the side, revenues from which are shared with Google. There is no vetting or editorial input from Google whatsoever.
Except… Might not the ads exert their own subtle editorial influence? In this entrepreneurial writers’ fray, will authors craft their knols for AdSense optimization? Will they become, consciously or not, shills for the companies that place the ads (I’m thinking especially of high impact topic areas like health and medicine)? Whatever you may think of Wikipedia, it has a certain integrity in being ad-free. The mission is clear and direct: to build a comprehensive free encyclopedia for the Web. The range of content has no correlation to marketability or revenue potential. It’s simply a big compendium of stuff, the only mention of money being a frank electronic tip jar at the top of each page. The Googlepedia, in contrast, is fundamentally an advertising platform. What will such an encyclopedia look like?
In the official knol announcement, Udi Manber, a VP for engineering at Google, explains the genesis of the project: “The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal.” You can see embedded in this statement all the trademarks of Google’s rhetoric: a certain false humility, the pose of incorruptible geek integrity and above all, a boundless confidence that every problem, no matter how gray and human, has a technological fix. I’m not saying it’s wrong to build a business, nor that Google is lying whenever it talks about anything idealistic, it’s just that time and again Google displays an astonishing lack of self-awareness in the way it frames its services -? a lack that becomes especially obvious whenever the company edges into content creation and hosting. They tend to talk as though they’re building the library of Alexandria or the great Encyclopédie, but really they’re describing an advanced advertising network of Google-exclusive content. We shouldn’t allow these very different things to become as muddled in our heads as they are in theirs. You get a worrisome sense that, like the Bushies, the cheerful software engineers who promote Google’s products on the company’s various blogs truly believe the things they’re saying. That if we can just get the algorithm right, the world can bask in the light of universal knowledge.
The blogosphere has been alive with commentary about the knol situation throughout the weekend. By far the most provocative thing I’ve read so far is by Anil Dash, VP of Six Apart, the company that makes the Movable Type software that runs this blog. Dash calls out this Google self-awareness gap, or as he puts it, its lack of a “theory of mind”:

Theory of mind is that thing that a two-year-old lacks, which makes her think that covering her eyes means you can’t see her. It’s the thing a chimpanzee has, which makes him hide a banana behind his back, only taking bites when the other chimps aren’t looking.
Theory of mind is the awareness that others are aware, and its absence is the weakness that Google doesn’t know it has. This shortcoming exists at a deep cultural level within the organization, and it keeps manifesting itself in the decisions that the company makes about its products and services. The flaw is one that is perpetuated by insularity, and will only be remedied by becoming more open to outside ideas and more aware of how people outside the company think, work and live.

He gives some examples:

Connecting PageRank to economic systems such as AdWords and AdSense corrupted the meaning and value of links by turning them into an economic exchange. Through the turn of the millennium, hyperlinking on the web was a social, aesthetic, and expressive editorial action. When Google introduced its advertising systems at the same time as it began to dominate the economy around search on the web, it transformed a basic form of online communication, without the permission of the web’s users, and without explaining that choice or offering an option to those users.

He compares the knol enterprise with GBS:

Knol shares with Google Book Search the problem of being both indexed by Google and hosted by Google. This presents inherent conflicts in the ranking of content, as well as disincentives for content creators to control the environment in which their content is published. This necessarily disadvantages competing search engines, but more importantly eliminates the ability for content creators to innovate in the area of content presentation or enhancement. Anything that is written in Knol cannot be presented any better than the best thing in Knol. [his emphasis]

And lastly concludes:

An awareness of the fact that Google has never displayed an ability to create the best tools for sharing knowledge would reveal that it is hubris for Google to think they should be a definitive source for hosting that knowledge. If the desire is to increase knowledge sharing, and the methods of compensation that Google controls include traffic/attention and money/advertising, then a more effective system than Knol would be to algorithmically determine the most valuable and well-presented sources of knowledge, identify the identity of authorites using the same journalistic techniques that the Google News team will have to learn, and then reward those sources with increased traffic, attention and/or monetary compensation.

For a long time Google’s goal was to help direct your attention outward. Increasingly we find that they want to hold onto it. Everyone knows that Wikipedia articles place highly in Google search results. Makes sense then that they want to capture some of those clicks and plug them directly into the Google ad network. But already the Web is dominated by a handful of mega sites. I get nervous at the thought that could gradually become an internal directory, that Google could become the alpha and omega, not only the start page of the Internet but all the destinations.
It will be interesting to see just how and to what extent knols start creeping up the search results. Presumably, they will be ranked according to the same secret metrics that measure all pages in Google’s index, but given the opacity of their operations, who’s to say that subtle or unconscious rigging won’t occur? Will community ratings factor in search rankings? That would seem to present a huge conflict of interest. Perhaps top-rated knols will be displayed in the sponsored links area at the top of results pages. Or knols could be listed in order of community ranking on a dedicated knol search portal, providing something analogous to the experience of searching within Wikipedia as opposed to finding articles through external search engines. Returning to the theory of mind question, will Google develop enough awareness of how it is perceived and felt by its users to strike the right balance?
One last thing worth considering about the knol -? apart from its being possibly the worst Internet neologism in recent memory -? is its author-centric nature. It’s interesting that in order to compete with Wikipedia Google has consciously not adopted Wikipedia’s model. The basic unit of authorial action in Wikipedia is the edit. Edits by multiple contributors are combined, through a complicated consensus process, into a single amalgamated product. On Google’s encyclopedia the basic unit is the knol. For each knol (god, it’s hard to keep writing that word) there is a one to one correspondence with an individual, identifiable voice. There may be multiple competing knols, and by extension competing voices (you have this on Wikipedia too, but it’s relegated to the discussion pages).
Viewed in this way, Googlepedia is perhaps a more direct rival to Larry Sanger’s Citizendium, which aims to build a more authoritative Wikipedia-type resource under the supervision of vetted experts. Citizendium is a strange, conflicted experiment, a weird cocktail of Internet populism and ivory tower elitism -? and by the look of it, not going anywhere terribly fast. If knols take off, could they be the final nail in the coffin of Sanger’s awkward dream? Bryan Alexander wonders along similar lines.
While not explicitly employing Sanger’s rhetoric of “expert” review, Google seems to be banking on its commitment to attributed solo authorship and its ad-based incentive system to lure good, knowledgeable authors onto the Web, and to build trust among readers through the brand-name credibility of authorial bylines and brandished credentials. Whether this will work remains to be seen. I wonder… whether this system will really produce quality. Whether there are enough checks and balances. Whether the community rating mechanisms will be meaningful and confidence-inspiring. Whether self-appointed experts will seem authoritative in this context or shabby, second-rate and opportunistic. Whether this will have the feeling of an enlightened knowledge project or of sleezy intellectual link farming (or something perfectly useful in between).
The feel of a site -? the values it exudes -? is an important factor though. This is why I like, and in an odd way trust Wikipedia. Trust not always to be correct, but to be transparent and to wear its flaws on its sleeve, and to be working for a higher aim. Google will probably never inspire that kind of trust in me, certainly not while it persists in its dangerous self-delusions.
A lot of unknowns here. Thoughts?

“digitization and its discontents”

Anthony Grafton’s New Yorker piece “Future Reading” paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote sums it up well – ?a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization:

The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.

Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature On Native Grounds almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library – ?a reservoir of knowledge, free to all – ?and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article, but it would have been valuable for Grafton, whose piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism – ?”crowded public rooms where the sunlight gleams on varnished tables….millions of dusty, crumbling, smelly, irreplaceable documents and books” – ?which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We’re still compiling our notes but expect a report soon.

all the news that’s fit to search

Placing a long-term bet on online advertising and the power of search engines, the New York Times will, effective tomorrow, close down its two-year-old “Select” subscription service (which was actually making money for the paper) and opened up access to columnists, Select blogs, and archives from 1987 to the present, and 1851 to 1922. Nice!
From PaidContent, quoting the Times’ own coverage:

The change is because of what’s happened in the internet in the past two years – ?particularly the power of search.” She [Vivian Schiller, senior vp and general manager of] added later: “Think about this recipe – ?millions and millions of new documents, all seo’d [search engine optimized], double-digit advertising growth.” The Times expects “the scale and the power of the revenue that would come from that over time” to replace the subscriptions revenue and then some.

google news adds an interesting (and risky) editorial layer

Starting this week, Google News will publish comments alongside linked stories from “a special subset of readers: those people or organizations who were actual participants in the story in question.”
John Murrell and Steve Rubel have good analyses of why moving beyond pure aggregation is a risky move for Google, whose relationship with news content owners is already tense to say the least.

six blind men and an elephant

Thomas Mann, author of The Oxford Guide to Library Research, has published an interesting paper (pdf available) examining the shortcomings of search engines and the continued necessity of librarians as guides for scholarly research. It revolves around the case of a graduate student investigating tribute payments and the Peloponnesian War. A Google search turns up nearly 80,000 web pages and 700 books. An overwhelming retrieval with little in the way of conceptual organization and only the crudest of tools for measuring relevance. But, with the help of the LC Catalog and an electronic reference encyclopedia database, Mann manages to guide the student toward a manageable batch of about a dozen highly germane titles.
Summing up the problem, he recalls a charming old fable from India:

Most researchers – at any level, whether undergraduate or professional – who are moving into any new subject area experience the problem of the fabled Six Blind Men of India who were asked to describe an elephant: one grasped a leg and said “the elephant is like a tree”; one felt the side and said “the elephant is like a wall”; one grasped the tail and said “the elephant is like a rope”; and so on with the tusk (“like a spear”), the trunk (“a hose”) and the ear (“a fan”). Each of them discovered something immediately, but none perceived either the existence or the extent of the other important parts – or how they fit together.
Finding “something quickly,” in each case, proved to be seriously misleading to their overall comprehension of the subject.
In a very similar way, Google searching leaves remote scholars, outside the research library, in just the situation of the Blind Men of India: it hides the existence and the extent of relevant sources on most topics (by overlooking many relevant sources to begin with, and also by burying the good sources that it does find within massive and incomprehensible retrievals). It also does nothing to show the interconnections of the important parts (assuming that the important can be distinguished, to begin with, from the unimportant).

Mann believes that books will usually yield the highest quality returns in scholarly research. A search through a well tended library catalog (controlled vocabularies, strong conceptual categorization) will necessarily produce a smaller, and therefore less overwhelming quantity of returns than a search engine (books do not proliferate at the same rate as web pages). And those returns, pound for pound, are more likely to be of relevance to the topic:

Each of these books is substantially about the tribute payments – i.e., these are not just works that happen to have the keywords “tribute” and “Peloponnesian” somewhere near each other, as in the Google retrieval. They are essentially whole books on the desired topic, because cataloging works on the assumption of “scope-match” coverage – that is, the assigned LC headings strive to indicate the contents of the book as a whole….In focusing on these books immediately, there is no need to wade through hundreds of irrelevant sources that simply mention the desired keywords in passing, or in undesired contexts. The works retrieved under the LC subject heading are thus structural parts of “the elephant” – not insignificant toenails or individual hairs.

If nothing else, this is a good illustration of how libraries, if used properly, can still be much more powerful than search engines. But it’s also interesting as a librarian’s perspective on what makes the book uniquely suited for advanced research. That is: a book is substantial enough to be a “structural part” of a body of knowledge. This idea of “whole books” as rungs on a ladder toward knowing something. Books are a kind of conceptual architecture that, until recently, has been distinctly absent on the Web (though from the beginning certain people and services have endeavored to organize the Web meaningfully). Mann’s study captures the anxiety felt at the prospect of the book’s decline (the great coming blindness), and also the librarian’s understandable dread at having to totally reorganize his/her way of organizing things.
It’s possible, however, to agree with the diagnosis and not the prescription. True, librarians have gotten very good at organizing books over time, but that’s not necessarily how scholarship will be produced in the future. David Weinberg ponders this:

As an argument for maintaining human expertise in manually assembling information into meaningful relationships, this paper is convincing. But it rests on supposing that books will continue to be the locus of worthwhile scholarly information. Suppose more and more scholars move onto the Web and do their thinking in public, in conversation with other scholars? Suppose the Web enables scholarship to outstrip the librarians? Manual assemblages of knowledge would retain their value, but they would no longer provide the authoritative guide. Then we will have either of two results: We will have to rely on “‘lowest common denominator'”and ‘one search box/one size fits all’ searching that positively undermines the requirements of scholarly research”…or we will have to innovate to address the distinct needs of scholars….My money is on the latter.

As I think is mine. Although I would not rule out the possibility of scholars actually participating in the manual assemblage of knowledge. Communities like MediaCommons could to some extent become their own libraries, vetting and tagging a wide array of electronic resources, developing their own customized search frameworks.
There’s much more in this paper than I’ve discussed, including a lengthy treatment of folksonomies (Mann sees them as a valuable supplement but not a substitute for controlled taxonomies). Generally speaking, his articulation of the big challenges facing scholarly search and librarianship in the digital age are well worth the read, although I would argue with some of the conclusions.

the people’s card catalog (a thought)

New partners and new features. Google has been busy lately building up Book Search. On the institutional end, Ghent, Lausanne and Mysore are among the most recent universities to hitch their wagons to the Google library project. On the user end, the GBS feature set continues to expand, with new discovery tools and more extensive “about” pages gathering a range of contextual resources for each individual volume.
Recently, they extended this coverage to books that haven’t yet been digitized, substantially increasing the findability, if not yet the searchability, of thousands of new titles. The about pages are similar to Amazon’s, which supply book browsers with things like concordances, “statistically improbably phrases” (tags generated automatically from distinct phrasings in a text), textual statistics, and, best of all, hot-linked lists of references to and from other titles in the catalog: a rich bibliographic network of interconnected texts (Bob wrote about this fairly recently). Google’s pages do much the same thing but add other valuable links to retailers, library catalogues, reviews, blogs, scholarly resources, Wikipedia entries, and other relevant sites around the net (an example). Again, many of these books are not yet full-text searchable, but collecting these resources in one place is highly useful.
It makes me think, though, how sorely an open source alternative to this is needed. Wikipedia already has reasonably extensive articles about various works of literature. Library Thing has built a terrific social architecture for sharing books. There are a great number of other freely accessible resources around the web, scholarly database projects, public domain e-libraries, CC-licensed collections, library catalogs.
Could this be stitched together into a public, non-proprietary book directory, a People’s Card Catalog? A web page for every book, perhaps in wiki format, wtih detailed bibliographic profiles, history, links, citation indices, social tools, visualizations, and ideally a smart graphical interface for browsing it. In a network of books, each title ought to have a stable node to which resources can be attached and from which discussions can branch. So far Google is leading the way in building this modern bibliographic system, and stands to turn the card catalogue of the future into a major advertising cash nexus. Let them do it. But couldn’t we build something better?

emerging libraries at rice: day one

For the next few days, Bob and I will be at the De Lange “Emerging Libraries” conference hosted by Rice University in Houston, TX, coming to you live with occasional notes, observations and overheard nuggets of wisdom. Representatives from some of the world’s leading libraries are here: the Library of Congress, the British Library, the new Bibliotheca Alexandrina, as well as the architects of recent digital initiatives like the Internet Archive, and the Public Library of Science. A very exciting gathering indeed.
We’re here, at least in part, with our publisher hat on, thinking quite a lot these days about the convergence of scholarly publishing with digital research infrastructure (i.e. MediaCommons). It was fitting then that the morning kicked off with a presentation by Richard Baraniuk, founder of the open access educational publishing platform Connexions. Connexions, which last year merged with the digitally reborn Rice University Press, is an innovative repository of CC-licensed courses and modules, built on an open volunteer basis by educators and freely available to weave into curricula and custom-designed collections, or to remix and recombine into new forms.
Connexions is designed not only as a first-stop resource but as a foundational layer upon which richer and more focused forms of access can be built. Foremost among those layers of course is Rice University Press, which, apart from using the Connexions publishing framework will still operate like a traditional peer review-driven university press. But other scholarly and educational communities are also encouraged to construct portals, or “lenses” as they call them, to specific areas of the Connexions corpus, possibly filtered through post-publication peer review. It will be interesting to see whether Connexions really will end up supporting these complex external warranting processes or if it will continue to serve more as a building block repository — an educational lumber yard for educators around the world.
Constructive crit: there’s no doubt that Connexions is one of the most important and path-breaking scholarly publishing projects out there, though it still feels to me more like backend infrastructure than a fully developed networked press. It has a flat, technical-feeling design and cookie cutter templates that give off a homogenous impression in spite of the great diversity of materials. The social architecture is also quite limited, and what little is there (ways to suggest edits and discussion forums attached to modules) is not well integrated with course materials. There’s an opportunity here to build more tightly knit communities around these offerings — lively feedback loops to improve and expand entries, areas to build pedagogical tutorials and to collect best practices, and generally more ways to build relationships that could lead to further collaboration. I got to chat with some of the Connexions folks and the head of the Rice press about some of these social questions and they were very receptive.

*     *     *     *     *

Michael A. Keller of Stanford spoke of emerging “cybraries” and went through some very interesting and very detailed elements of online library search that I’m too exhausted to summarize now. He capped off his talk with a charming tour through the Stanford library’s Second Life campus and the library complex on Information Island. Keller said he ultimately doesn’t believe that purely imitative virtual worlds will become the principal interface to libraries but that they are nonetheless a worthwhile area for experimentation.
Browsing during the talk, I came across an interesting and similarly skeptical comment by Howard Rheingold on a long-running thread on Many 2 Many about Second Life and education:

I’ve lectured in Second Life, complete with slides, and remarked that I didn’t really see the advantage of doing it in SL. Members of the audience pointed out that it enabled people from all over the world to participate and to chat with each other while listening to my voice and watching my slides; again, you don’t need an immersive graphical simulation world to do that. I think the real proof of SL as an educational medium with unique affordances would come into play if an architecture class was able to hold sessions within scale models of the buildings they are studying, if a biochemistry class could manipulate realistic scale-model simulations of protein molecules, or if any kind of lesson involving 3D objects or environments could effectively simulate the behaviors of those objects or the visual-auditory experience of navigating those environments. Just as the techniques of teleoperation that emerged from the first days of VR ended up as valuable components of laparascopic surgery, we might see some surprise spinoffs in the educational arena. A problem there, of course, is that education systems suffer from a great deal more than a lack of immersive environments. I’m not ready to write off the educational potential of SL, although, as noted, the importance of that potential should be seen in context. In this regard, we’re still in the early days of the medium, similar to cinema in the days when filmmakers nailed a camera tripod to a stage and filmed a play; SL needs D.W. Griffiths to come along and invent the equivalent of close-ups, montage, etc.

Rice too has some sort of Second Life presence and apparently was beaming the conference into Linden land.

*     *     *     *     *

Next came a truly mind-blowing presentation by Noha Adly of the Bibliotheca Alexandrina in Egypt. Though only five years old, the BA casts itself quite self-consciously as the direct descendant of history’s most legendary library, the one so frequently referenced in contemporary utopian rhetoric about universal digital libraries. The new BA glories in this old-new paradigm, stressing continuity with its illustrious past and at the same time envisioning a breathtakingly modern 21st century institution unencumbered by the old thinking and constrictive legacies that have so many other institutions tripping over themselves into the digital age. Adly surveyed more fascinating-sounding initiatives, collections and research projects than I can possibly recount. I recommend investigating their website to get a sense of the breadth of activity that is going on there. I will, however, note that that they are the only library in the world to house a complete copy of the Internet Archive: 1.5 petabytes of data on nearly 900 computers.
olpckahle.jpg (Speaking of the IA, Brewster Kahle is also here and is closing the conference Wednesday afternoon. He brought with him a test model of the hundred dollar laptop, which he showed off at dinner (pic to the right) in tablet mode sporting an e-book from the Open Content Alliance’s children’s literature collection (a scanned copy of The Owl and the Pussycat)).
And speaking of old thinking and constrictive legacies, following Adly was Deanna B. Marcum, an associate librarian at the Library of Congress. Marcum seemed well aware of the big picture but gave off a strong impression of having hands tied by a change-averse institution that has still not come to grips with the basic fact of the World Wide Web. It was a numbing hour and made one palpably feel the leadership vacuum left by the LOC in the past decade, which among other things has allowed Google to move in and set the agenda for library digitization.
Next came Lynne J. Brindley, Chief Executive of the British Library, which is like apples to the LOC’s oranges. Slick, publicly engaged and with pockets deep enough to really push the technological envelope, the British Library is making a very graceful and sometimes flashy (Turning the Pages) migration to the digital domain. Brindley had many keen insights to offer and described a several BL experiments that really challenge the conventional wisdom on library search and exhibitions. I was particularly impressed by these “creative research” features: short, evocative portraits of a particular expert’s idiosyncratic path through the collections; a clever way of featuring slices of the catalogue through the eyes of impassioned researchers (e.g. here). Next step would be to open this up and allow the public to build their own search profiles.

*     *     *     *     *

That more or less covers today with the exception of a final keynote talk by John Seely Brown, which was quite inspiring and included a very kind mention of our work at MediaCommons. It’s been a long day, however, and I’m fading. So I’ll pick that up tomorrow.

feeling random

Following HarperCollins’ recent Web renovations, Random House today unveiled their publisher-driven alternative to Google: a new, full-text search engine of over 5,000 new and backlist books including browsable samples of select titles. The most interesting thing here is that book samples can be syndicated on other websites through a page-flipping browser widget (Flash 9 required) that you embed with a bit of cut-and-paste code (like a YouTube clip). It’s a nice little tool, though it comes in two sizes only — one that’s too small to read, and one that embedded would take up most of a web page (plus it keeps crashing my browser). Compare below with HarperCollins’ simpler embeddable book link:

Worth noting here is that both the search engine and the sampling widget were produced by Random House in-house. Too many digital forays by major publishers are accomplished by hiring an external Web shop, meaning of course that little ends up being learned within the institution. It’s an old mantra of Bob’s that publishers’ digital budgets would be better spent by throwing 20 grand at a bright young editor or assistant editor a few years out of college and charging them with the task of doing something interesting than by pouring huge sums into elaborate revampings from the outside. Random House’s recent home improvements were almost certainly more expensive, and more focused on infrastructure and marketing than on genuinely reinventing books, but they indicate a do it yourself approach that could, maybe, lead in new directions.

belgian news sites don cloak of invisibility

In an act of stunning shortsightedness, a consortium of 19 Belgian newspapers has sued and won a case against Google for copyright infringement in its News Search engine. Google must now remove all links, images and cached pages of these sites from its database or else face fines. Similar lawsuits from other European papers are likely to follow soon.
The main beef in the case (all explained in greater detail here) is Google’s practice of deep linking to specific articles, which bypasses ads on the newspapers’ home pages and reduces revenue. This and Google’s caching of full articles for search purposes, copies that the newspapers contend could be monetized through a pay-for-retrieval service. Echoes of the Book Search lawsuits on this side of the Atlantic…
What the Belgians are in fact doing is rendering their papers invisible to a potentially global audience. Instead of lashing out against what is essentially a free advertising service, why not rethink your own ad structure to account for the fact that more and more readers today are coming through search engines and not your front page? While you’re at it, rethink the whole idea of a front page. Or better yet, join forces with other newspapers, form your own federated search service and beat Google at its own game.

ecclesiastical proust archive: starting a community

(Jeff Drouin is in the English Ph.D. Program at The Graduate Center of the City University of New York)
About three weeks ago I had lunch with Ben, Eddie, Dan, and Jesse to talk about starting a community with one of my projects, the Ecclesiastical Proust Archive. I heard of the Institute for the Future of the Book some time ago in a seminar meeting (I think) and began reading the blog regularly last Summer, when I noticed the archive was mentioned in a comment on Sarah Northmore’s post regarding Hurricane Katrina and print publishing infrastructure. The Institute is on the forefront of textual theory and criticism (among many other things), and if:book is a great model for the kind of discourse I want to happen at the Proust archive. When I finally started thinking about how to make my project collaborative I decided to contact the Institute, since we’re all in Brooklyn, to see if we could meet. I had an absolute blast and left their place swimming in ideas!
Saint-Lô, by Corot (1850-55)While my main interest was in starting a community, I had other ideas — about making the archive more editable by readers — that I thought would form a separate discussion. But once we started talking I was surprised by how intimately the two were bound together.
For those who might not know, The Ecclesiastical Proust Archive is an online tool for the analysis and discussion of à la recherche du temps perdu (In Search of Lost Time). It’s a searchable database pairing all 336 church-related passages in the (translated) novel with images depicting the original churches or related scenes. The search results also provide paratextual information about the pagination (it’s tied to a specific print edition), the story context (since the passages are violently decontextualized), and a set of associations (concepts, themes, important details, like tags in a blog) for each passage. My purpose in making it was to perform a meditation on the church motif in the Recherche as well as a study on the nature of narrative.
I think the archive could be a fertile space for collaborative discourse on Proust, narratology, technology, the future of the humanities, and other topics related to its mission. A brief example of that kind of discussion can be seen in this forum exchange on the classification of associations. Also, the church motif — which some might think too narrow — actually forms the central metaphor for the construction of the Recherche itself and has an almost universal valence within it. (More on that topic in this recent post on the archive blog).
Following the if:book model, the archive could also be a spawning pool for other scholars’ projects, where they can present and hone ideas in a concentrated, collaborative environment. Sort of like what the Institute did with Mitchell Stephens’ Without Gods and Holy of Holies, a move away from the ‘lone scholar in the archive’ model that still persists in academic humanities today.
One of the recurring points in our conversation at the Institute was that the Ecclesiastical Proust Archive, as currently constructed around the church motif, is “my reading” of Proust. It might be difficult to get others on board if their readings — on gender, phenomenology, synaesthesia, or whatever else — would have little impact on the archive itself (as opposed to the discussion spaces). This complex topic and its practical ramifications were treated more fully in this recent post on the archive blog.
I’m really struck by the notion of a “reading” as not just a private experience or a public writing about a text, but also the building of a dynamic thing. This is certainly an advantage offered by social software and networked media, and I think the humanities should be exploring this kind of research practice in earnest. Most digital archives in my field provide material but go no further. That’s a good thing, of course, because many of them are immensely useful and important, such as the Kolb-Proust Archive for Research at the University of Illinois, Urbana-Champaign. Some archives — such as the NINES project — also allow readers to upload and tag content (subject to peer review). The Ecclesiastical Proust Archive differs from these in that it applies the archival model to perform criticism on a particular literary text, to document a single category of lexia for the experience and articulation of textuality.
American propaganda, WWI, depicting the destruction of Rheims CathedralIf the Ecclesiastical Proust Archive widens to enable readers to add passages according to their own readings (let’s pretend for the moment that copyright infringement doesn’t exist), to tag passages, add images, add video or music, and so on, it would eventually become a sprawling, unwieldy, and probably unbalanced mess. That is the very nature of an Archive. Fine. But then the original purpose of the project — doing focused literary criticism and a study of narrative — might be lost.
If the archive continues to be built along the church motif, there might be enough work to interest collaborators. The enhancements I currently envision include a French version of the search engine, the translation of some of the site into French, rewriting the search engine in PHP/MySQL, creating a folksonomic functionality for passages and images, and creating commentary space within the search results (and making that searchable). That’s some heavy work, and a grant would probably go a long way toward attracting collaborators.
So my sense is that the Proust archive could become one of two things, or two separate things. It could continue along its current ecclesiastical path as a focused and led project with more-or-less particular roles, which might be sufficient to allow collaborators a sense of ownership. Or it could become more encyclopedic (dare I say catholic?) like a wiki. Either way, the organizational and logistical practices would need to be carefully planned. Both ways offer different levels of open-endedness. And both ways dovetail with the very interesting discussion that has been happening around Ben’s recent post on the million penguins collaborative wiki-novel.
Right now I’m trying to get feedback on the archive in order to develop the best plan possible. I’ll be demonstrating it and raising similar questions at the Society for Textual Scholarship conference at NYU in mid-March. So please feel free to mention the archive to anyone who might be interested and encourage them to contact me at And please feel free to offer thoughts, comments, questions, criticism, etc. The discussion forum and blog are there to document the archive’s development as well.
Thanks for reading this very long post. It’s difficult to do anything small-scale with Proust!