Monthly Archives: December 2006

people-powered search (part 1)

Last week, the London Times reported that the Wikipedia founder, Jimbo Wales, was announcing a new search engine called “Wikiasari.” This search engine would incorporate a new type of social ranking system and would rival Google and Yahoo in potential ad revenue. When the news first got out, the blogosphere went into a frenzy, with many echoing inaccurate information, mostly out of excitement, and causing a lot of confusion. Some sites even printed dubious screenshots of what they thought was the search engine.
Alas, there were no real screenshots and there was no search engine… yet. Yesterday, unable to make any sense of what was going on by reading the blogs, I looked through the developer mailing list and found this post by Jimmy Wales:

The press coverage this weekend has been a comedy of errors. Wikiasari was not and is not the intended name of this project… the London Times picked that off an old wiki page from back in the day when I was working on the old code base and we had a naming contest for it. […] And then TechCrunch ran a screenshot of something completely unrelated, thus unfortunately perhaps leading people to believe that something is already built and about to be unveiled. No, the point of the project is to build something, not to unveil something which has already been built.

And in the Wikia search webpage he explains why:

Search is part of the fundamental infrastructure of the Internet. And, it is currently broken. Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency. Here, we will change all that.

So there is no Google-killer just yet, but something is brewing.
From the details that we have so far, we know that this new search engine will be funded by Wikia Inc, Wales’ for-profit and ad-driven MediaWiki hosting company. We also know that the search technology will be based on Nutch and Lucene – the same technology that powers Wikipedia’s search. And we also know that the search engine will allow users to directly influence search results.
I found it interesting that on the Wikia “about” page, Wales suggests that he has yet to make up his mind on how things are going to work, so suggestions appear to be welcome.
Also, during the frenzy, I managed to find many interesting technologies that I think might be useful in making a new kind of search engine. Now that a dialog appears to be open and there is good reason to believe a potentially competitive search engine could be built, current experimental technologies might play an important role in the development of Wikia’s search. Some questions that I think might be useful to ponder are:
Can current social bookmarking tools provide a basis for determining “high quality” sites? Will using Wikipedia and its external site citing engine make sense for determining “high quality” links? Will using a Digg-like rating system produce spamless results or simply lowbrow ones? Will a search engine dependent on tagging, but with no spider, be useful? But the question I am most interested in is whether large-scale manual indexing could lay the foundation for what could turn into the Semantic Web (Web 3.0)? Or maybe just Web 2.5?
The most obvious and most difficult challenge for Wikia, besides coming up with a good name and solid technology, will be dealing with the sheer size of the internet.
I’ve found that open-source communities are never as large or as strong as they appear. Wikipedia is one of the largest and most successful online collaborative projects, yet just over 500 people make over 50% of all edits and about 1400 make about 75% of all edits. If Wikia’s new search engine does not attract a large group of users to help index the web early on, the project will not survive. A strong online community, possibly of a magnitude we’ve never seen before, might be necessary to ensure that people-powered search is of any use.

back to the future

John Walter, a graduate student at St. Louis University, wrote to the TechRet list the other day to announce the launch of the Walter Ong Collection, a digital archive based at SLU. I went to the site and downloaded a PDF of an early version of one of Ong’s more famous essays, “The Writer’s Audience Is Always a Fiction.” In this particular essay, Ong, who made his name analyzing the difference between oral and written communication, explores how this shift changed the role of the reader. Ong makes the case that the role of the reader is quite different from the role of the “listener” in oral communication.

“The orator has before him an audience which is a true audience, a collectivity. ‘Audience’ is a collective noun. There is no such collective noun for readers, nor so far as I am able to puzzle out, can there be. ‘Readers’ is a plural. For readers do not form a collectivity acting here and now on one another, and on the one speaking to them, as members of an audience do.”

What’s so interesting here is that the age of networked reading and writing seems to promise to get us much closer to one of the crucial aspects of oral culture: the sense that the storyteller/author and the audience/reader are joined together in a collective enterprise where the actions of each will have a direct and noticeable impact on the other.

scholarpedia: sharpening the wiki for expert results

Eugene M. Izhikevich, a Senior Fellow in Theoretical Neurobiology at The Neurosciences Institute in San Diego, wants to see if academics can collaborate to produce a peer-reviewed equivalent to Wikipedia. The attempt is Scholarpedia, a free peer-reviewed encyclopedia, entirely open to public contributions but with editorial oversight by experts.
At first, this sounded to me a lot like Larry Sanger’s Citizendium project, which will attempt to add an expert review layer to material already generated by Wikipedia (they’re calling it a “progressive fork” off of the Wikipedia corpus). Sanger insists that even with this added layer of control the open spirit of Wikipedia will live on in Citizendium while producing a more rigorous and authoritative encyclopedia.
It’s always struck me more as a simplistic fantasy of ivory tower-common folk détente than any reasoned community-building plan. We’ll see if Walesism and Sangerism can be reconciled in a transcendent whole, or if intellectual class warfare (of the kind that has already broken out on multiple occasions between academics and general contributors on Wikipedia) — or more likely inertia — will be the result.
The eight-month-old Scholarpedia, containing only a few dozen articles and restricted for the time being to three neuroscience sub-fields, already feels like a more plausible proposition, if for no other reason than that it knows who its community is and that it establishes an unambiguous hierarchy of participation. Izhikevich has appointed himself editor-in-chief and solicited full articles from scholarly peers around the world. First the articles receive “in-depth, anonymous peer review” by two fellow authors, or by other reviewers who measure sufficiently high on the “scholar index.” Peer review, it is explained, is employed both “to insure the accuracy and quality of information” and “to allow authors to list their papers as peer-reviewed in their CVs and resumes,” a marriage of pragmatism and idealism in Mr. Izhikevich.
After this initial vetting, the article is officially part of the Scholarpedia corpus and is hence open to subsequent revisions and alterations suggested by the community, which must in turn be accepted by the author, or “curator,” of the article. The discussion, or “talk,” pages familiar from Wikipedia are here called “reviews.” So far, however, it doesn’t appear that many of the approved articles have received much of a public work-over since passing muster in the initial review stage. But readers are weighing in (albeit in modest numbers) in the public election process for new curators. I’m very curious to see if this will be treated by the general public as a read-only site, or if genuine collaboration will arise.
It’s doubtful that this more tightly regulated approach could produce a work as immense and varied as Wikipedia, but it’s pretty clear that this isn’t the goal. It’s a smaller, more focused resource that Izhikevich and his curators are after, with an eye toward gradually expanding to embrace all subjects. I wonder, though, if the site wouldn’t be better off keeping its ambitions concentrated, renaming itself something like “Neuropedia” and looking simply to inspire parallel efforts in other fields. One problem of open source knowledge projects is that they’re often too general in scope (Scholarpedia says it all). A federation of specialized encyclopedias, produced by focused communities of scholars both academic and independent — and with some inter-disciplinary porousness — would be a more valuable, if less radical, counterpart to Wikipedia, and more likely to succeed than the Citizendium chimera.

future of the filter

An article by Jon Pareles in the Times (December 10th, 2006) brings to mind some points that have been raised here throughout the year. One is the “corporatization” of user-generated content; the other is what to do with all the material resulting from the constant production/dialogue that is taking place on the Internet.
Pareles summarizes the acquisition of MySpace by Rupert Murdoch’s News Corporation and YouTube by Google with remarkable clarity:

What these two highly strategic companies spent more than $2 billion on is a couple of empty vessels: brand-named, centralized repositories for whatever their members decide to contribute.

As he puts it, this year will be remembered as the year in which old-line media, online media and millions of individual web users agreed. I wouldn’t use the term “agreed,” but they definitely came together as the media giants saw the financial possibilities of individual self-expression generated on the Web. As usually happens with independent creative products, much of the art originating on websites such as MySpace and YouTube borrows freely and gets distributed and promoted outside of the traditional for-profit mechanisms. As Pareles says, “it’s word of mouth that can reach the entire world.” Nonetheless, the new acquisitions will bring a profit for some while the rest will supply material for free. But problems arise when part of that production uses copyrighted material. While we have artists fighting immorally to extend copyright laws, we have Google paying copyright holders for material used in YouTube, but also fighting them.
The Internet has allowed for the democratization of creation and distribution; it has made the anonymous public while providing virtual meeting places for all groups of people. The flattening of the wax cylinder into a portable, engraved surface that produced sound when played with a needle brought the music hall, the clubs and cabarets into the home, but it also gave rise to the entertainment business. Now the CD burner, the MP3, and online tools have brought the recording studio into the home. Interestingly enough, far from promoting isolation, the Internet has generated dialogue. YouTube is not a place for merely watching dubious videos; it is also a repository of individual reactions. Something similar is happening with film, photography and books. But what to do with all that? Pareles sees the proliferation of blogs and user-generated playlists as a sort of filter from which the media moguls are profiting: “Selection, a time-consuming job, has been outsourced. What’s growing is the plentitude not just of user-generated content, but also of user-filtered content.” But he adds, “Mouse-clicking individuals can be as tasteless, in the aggregate, as entertainment professionals.” What is going to happen as private companies become the holders of those filters?

live, on the web, it’s the iraq study group report!

Since leaving Harper’s last spring, Lewis Lapham has been developing plans for a new journal, Lapham’s Quarterly, which will look at important contemporary subjects (one per issue) through the lens of history. Not long ago, Lewis approached the Institute about helping him and his colleagues to develop a web component of the Quarterly, which he imagined as a kind of unorthodox news site where history and source documents would serve as a decoder ring for current events — a trading post of ideas, facts, and historical parallels where readers would play a significant role in piecing together the big picture. To begin probing some of the possibilities for this collaboration, we came up with an exciting and timely experiment: we’ve taken the granular commenting format that we hacked together just a few weeks ago for Mitch Stephens’ paper and plugged in the Iraq Study Group Report. The Lapham crew, for their part, have taken their first editorial plunge into the web, using their broad print network to assemble an astonishing roster of intellectual heavyweights to collectively annotate the text, paragraph by paragraph, live on the site. Here’s more from Lewis:

As expected and in line with standard government practice, the report issued by the Iraq Study Group on December 6th comes to us written in a language that guards against the crime of meaning–a document meant to be admired as a praise-worthy gesture rather than understood as a clear statement of purpose or an intelligible rendition of the facts. How then to read the message in the bottle or the handwriting on the wall?
Lapham’s Quarterly and the Institute for the Future of the Book answers the question with a new form of discussion and critique– an annotated edition of the ISG Report on a website programmed to that specific purpose, evolving over the course of the next three weeks into a collaborative illumination of an otherwise black hole. What you have before you is the humble beginnings of that effort–the first few marginal notes and commentaries furnished by what will eventually be a large number of informed sources both foreign and domestic (historians, military officers, politicians, intelligence operatives, diplomats, some journalists), invited to amend, correct, augment or contradict any point in the text seemingly in need of further clarification or forthright translation into plain English.
As the discussion adds to the number of its participants so also it will extend the reach of its memory and enlarge its spheres of reference. What we hope will take shape on short notice and in real time is the publication of a report that should prove to be a good deal more instructive than the one distributed to the members of Congress and the major news media.

Being at the very beginning of the experiment, what you’ll see on the site today is more or less a blank slate. Our hope is that in the days and weeks ahead, a lively conversation will begin to bubble up in the pages of the report — a kind of collaborative marginalia on a grand scale — mounting toward Bush’s big Iraq strategy speech next month. Around that time, the Lapham’s editors will open up commenting to the public. Until then, here are just some of the people we expect to participate: Anthony Arnove, Helena Cobban, Joshua Cohen, Jean Daniel, Raghida Dergham, Joan Didion, Mark Danner, Barbara Ehrenreich, Daniel Ellsberg, Tom Engelhardt, Stanley Fish, Robert Fisk, Eric Foner, Christopher Hitchens, Rashid Khalidi, Chalmers Johnson, Donald Kagan, Kanan Makiya, William Polk, Walter Russell Mead, Karl Meyer, Ralph Nader, Gianni Riotta, M.J. Rosenberg, Gary Sick, Matthew Stevenson, Frances Stonor, Lally Weymouth, and Wayne White.
Not too shabby.
The form is still very much in the R&D phase, but we’ve made significant improvements since the last round. Add this to your holiday reading stack and watch how it develops.
(We strongly recommend viewing the site in Firefox.)

table of comments

Yesterday Bud Parr made the following suggestion regarding the design of Mitch Stephens’ networked paper, “The Holy of Holies”:

I think the site would benefit from something right up front highlighting the most recent exchange of comments and/or what’s getting the most attention in terms of comments.

It just so happens that we’ve been cooking up something that does more or less what he describes: a simple meta comment page, or “table of comments,” displaying a running transcript of all the conversations in the site filtered by section. You can get to it from a link on the front page next to the total comment count for the paper (as of this writing there are 93).
It’s an interesting way to get into the text: through what people are saying about it. Any other ideas of how something like this could work?
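For the curious, here is a rough sketch of the idea in Python. It is not the code running on the site, only one way such a page could be assembled: the `Comment` fields, the function names, and the sample comments are all invented for illustration. It groups comments by section, puts the most recently active section first, and also counts which sections are drawing the most comments, which is roughly what Bud asked for.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Comment:
    # All fields are hypothetical; a real commenting system may store more.
    section: str      # the part of the paper the comment is attached to
    author: str
    posted: datetime
    text: str


def table_of_comments(comments):
    """Return (section, thread) pairs, most recently active section first.

    Each thread is a list of comments in chronological order, so it reads
    like a running transcript of that section's conversation.
    """
    by_section = defaultdict(list)
    for comment in comments:
        by_section[comment.section].append(comment)
    for thread in by_section.values():
        thread.sort(key=lambda c: c.posted)
    # The last comment in each sorted thread is that section's latest activity.
    return sorted(by_section.items(),
                  key=lambda item: item[1][-1].posted,
                  reverse=True)


def most_discussed(comments, n=3):
    """Return the n sections with the most comments as (section, count) pairs."""
    return Counter(c.section for c in comments).most_common(n)


# Invented sample data, only to show the shape of the result.
SAMPLE = [
    Comment("Introduction", "reader_a", datetime(2006, 12, 1, 9, 0), "First thought."),
    Comment("Section 2", "reader_b", datetime(2006, 12, 2, 10, 0), "A reply."),
    Comment("Introduction", "reader_c", datetime(2006, 12, 3, 11, 0), "Latest note."),
    Comment("Section 2", "reader_a", datetime(2006, 12, 1, 8, 0), "An early question."),
    Comment("Introduction", "reader_b", datetime(2006, 12, 2, 12, 0), "A follow-up."),
]

table = table_of_comments(SAMPLE)
for section, thread in table:
    print(section)
    for c in thread:
        print(f"  {c.posted:%Y-%m-%d %H:%M}  {c.author}: {c.text}")
```

Sorting sections by their latest comment is just one choice; ordering them as they appear in the paper would serve a reader who wants to follow the argument rather than the chatter.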

report on scholarly cyberinfrastructure

The American Council of Learned Societies has just issued a report, “Our Cultural Commonwealth,” assessing the current state of scholarly cyberinfrastructure in the humanities and social sciences and making a series of recommendations on how it can be strengthened, enlarged and maintained in the future.
The definition of cyberinfrastructure they’re working with:

“the layer of information, expertise, standards, policies, tools, and services that are shared broadly across communities of inquiry but developed for specific scholarly purposes: cyberinfrastructure is something more specific than the network itself, but it is something more general than a tool or a resource developed for a particular project, a range of projects, or, even more broadly, for a particular discipline.”

I’ve only had time to skim through it so far, but it all seems pretty solid.
John Holbo pointed me to the link in some musings on scholarly publishing in Crooked Timber, where he also mentions our Holy of Holies networked paper prototype as just one possible form that could come into play in a truly modern cyberinfrastructure. We’ve been getting some nice notices from others active in this area such as Cathy Davidson at HASTAC. There’s obviously a hunger for this stuff.

2 – dimensions just aren’t enough

We’re burning way too much midnight oil this weekend trying to ready a networked version of the Iraq Study Group report for release next week. We’ll introduce the project itself in a few days, but right now I just want to mention that I think we’re about at the end of our ability to organize these very complex reading/writing projects using the 2-dimensional design constraints inherited from print. Ben came to the same conclusion in his recent post inspired by the difficulty of designing the site for Mitch Stephens’ paper, Holy of Holies. My first resolution for 2007 is to try an experiment building a networked book inside of Second Life or some other three-dimensional environment.

worth reading

In 2005, Jean-Noël Jeanneney, the President of the Bibliothèque Nationale (France’s equivalent of the Library of Congress), wrote one of the most trenchant critiques of Google’s intention to digitize millions of books from a number of major libraries. Jeanneney expanded on his original essay and this past October, The University of Chicago Press published a translation of Google and the Myth of Universal Knowledge: A View from Europe. In this December’s D-Lib Magazine, there’s a superb precis and analysis of Jeanneney’s book by Dave Bearman.

lessig’s “code v.2” released

A new edition of Larry Lessig’s classic Code and Other Laws of Cyberspace has emerged from the crucible of an extended public revision process. From the preface:

The genesis of the revisions found here was a wiki. Basic Books allowed me to post the original edition of the book in a wiki hosted by Jotspot, and a team of “chapter captains” helped facilitate a conversation about the text. There were some edits to the text itself, and many more valuable comments and criticisms. I then took that text as of the end of 2005 and added my own edits to produce this book. While I wouldn’t go as far as the musician Jeff Tweedy (“Half of it’s you, half is me”), an important part of this is not my work. In recognition of that, I’ve committed the royalties from this book to the nonprofit Creative Commons.

“Code v2” is available in paperback ($$) and PDF (free download). The original revision wiki is still up, and they’ve also set up a new one so that the book can continue to evolve.