people-powered search (part 1)

Last week, the London Times reported that the Wikipedia founder, Jimbo Wales, was announcing a new search engine called “Wikiasari.” This search engine would incorporate a new type of social ranking system and would rival Google and Yahoo in potential ad revenue. When the news first got out, the blogosphere went into a frenzy, with many sites echoing inaccurate information – mostly in excitement – and causing lots of confusion. Some sites even printed dubious screenshots of what they thought was the search engine.
Alas, there were no real screenshots and there was no search engine… yet. Yesterday, unable to make any sense of what was going on by reading the blogs, I looked through the developer mailing list and found this post by Jimmy Wales:

The press coverage this weekend has been a comedy of errors. Wikiasari was not and is not the intended name of this project… the London Times picked that off an old wiki page from back in the day when I was working on the old code base and we had a naming contest for it. […] And then TechCrunch ran a screenshot of something completely unrelated, thus unfortunately perhaps leading people to believe that something is already built about about to be unveiled. No, the point of the project is to build something, not to unveil something which has already been built.

And in the Wikia search webpage he explains why:

Search is part of the fundamental infrastructure of the Internet. And, it is currently broken. Why is it broken? It is broken for the same reason that proprietary software is always broken: lack of freedom, lack of community, lack of accountability, lack of transparency. Here, we will change all that.

So there is no Google-killer just yet, but something is brewing.
From the details that we have so far, we know that this new search engine will be funded by Wikia Inc, Wales’ for-profit and ad-driven MediaWiki hosting company. We also know that the search technology will be based on Nutch and Lucene – the same technology that powers Wikipedia’s search. And we also know that the search engine will allow users to directly influence search results.
I found it interesting that in the Wikia “about page”, Wales suggests that he has yet to make up his mind on how things are going to work, so suggestions appear to be welcome.
Also, during the frenzy, I managed to find many interesting technologies that I think might be useful in making a new kind of search engine. Now that a dialog appears to be open and there is good reason to believe a potentially competitive search engine could be built, current experimental technologies might play an important role in the development of Wikia’s search. Some questions that I think might be useful to ponder are:
Can current social bookmarking tools, like, provide a basis for determining “high quality” sites? Will using Wikipedia and its external site citing engine make sense for determining “high quality” links? Will using a Digg-like rating system result in spam-free or simply lowbrow results? Will a search engine dependent on tagging, but with no spider, be useful? But the question I am most interested in is whether large-scale manual indexing could lay the foundation for what could turn into the Semantic Web (Web 3.0)? Or maybe just Web 2.5?
The most obvious and most difficult challenge for Wikia, besides coming up with a good name and solid technology, will be dealing with the sheer size of the internet.
I’ve found that open-source communities are never as large or as strong as they appear. Wikipedia is one of the largest and one of the most successful online collaborative projects, yet just over 500 people make over 50% of all edits and about 1400 make about 75% of all edits. If Wikia’s new search engine does not generate a large group of users to help index the web early on, this project will not survive; a strong online community, possibly of a magnitude we’ve never seen before, might be necessary to ensure that people-powered search is of any use.

wikipedia-britannica debate

The Wall Street Journal the other day hosted an email debate between Wikipedia founder Jimmy Wales and Encyclopedia Britannica editor-in-chief Dale Hoiberg. Irreconcilable differences, not surprisingly, were in evidence. But one thing that was mentioned, which I had somehow missed recently, was a new governance experiment just embarked upon by the German Wikipedia that could dramatically reduce vandalism, though some say at serious cost to Wikipedia’s openness. In the new system, live pages will no longer be instantaneously editable except by users who have been registered on the site for a certain (as yet unspecified) length of time, “and who, therefore, [have] passed a threshold of trustworthiness” (CNET). All edits will still be logged, but they won’t be reflected on the live page until that version has been approved as “non-vandalized” by more senior administrators. One upshot of the new German policy is that Wikipedia’s front page, which has long been completely closed to instantaneous editing, has effectively been reopened, at least for these “trusted” users.
In general, I believe that these sorts of governance measures are a sign not of a creeping conservatism, but of the growing maturity of Wikipedia. But it’s a slippery slope. In the WSJ debate, Wales repeatedly assails the elitism of Britannica’s closed editorial model. But over time, Wikipedia could easily find itself drifting in that direction, with a steadily hardening core of overseers exerting ever tighter control. Of course, even if every single edit were moderated, it would still be quite a different animal from Britannica, but Wales and his council of Wikimedians shouldn’t stray too far from what made Wikipedia work in the first place, and from what makes it so interesting.
In a way, the exchange of barbs in the Wales-Hoiberg debate conceals a strange magnetic pull between their respective endeavors. Though increasingly seen as the dinosaur, Britannica has made small but not insignificant moves toward openness and currency on its website (Hoiberg describes some of these changes in the exchange), while Wikipedia is to a certain extent trying to domesticate itself in order to attain the holy grail of respectability that Britannica has long held. Think what you will about Britannica’s long-term prospects, but it’s a mistake to see this as a clear-cut story of violent succession, of Wikipedia steamrolling Britannica into obsolescence. It’s more interesting to observe the subtle ways in which the two encyclopedias cause each other to evolve.
Wales certainly has a vision of openness, but he also wants to publish the world’s best encyclopedia, and this includes releasing something that more closely resembles a Britannica. Back in 2003, Wales proposed the idea of culling Wikipedia’s best articles to produce a sort of canonical version, a Wikipedia 1.0, that could be distributed on discs and printed out across the world. Versions 1.1, 1.2, 2.0 etc. would eventually follow. This is a perfectly good idea, but it shouldn’t be confused with the goals of the live site. I’m not saying that the “non-vandalized” measure was constructed specifically to prepare Wikipedia for a more “authoritative” print edition, but the trains of thought seem to have crossed. Marking versions of articles as non-vandalized, or distinguishing them in other ways, is a good thing to explore, but not at the expense of openness at the top layer. It’s that openness, crazy as it may still seem, that has lured millions into this weird and wonderful collective labor.

wikipedia, lifelines, and the packaging of authority

In a nice comment in yesterday’s Times, “The Nitpicking of the Masses vs. the Authority of the Experts,” George Johnson revisits last month’s Seigenthaler smear episode and Nature magazine’s Wikipedia-Britannica comparison, and decides to place his long-term bets on the open-source encyclopedia:

It seems natural that over time, thousands, then millions of inexpert Wikipedians – even with an occasional saboteur in their midst – can produce a better product than a far smaller number of isolated experts ever could.

Reading it, a strange analogy popped into my mind: “Who Wants to Be a Millionaire.” Yes, the game show. What does it have to do with encyclopedias, the internet and the re-mapping of intellectual authority? I’ll try to explain. “Who Wants to Be a Millionaire” is a simple quiz show, very straightforward, like “Jeopardy” or “The $64,000 Question.” A single contestant answers a series of multiple choice questions, and with each question the money stakes rise toward a million-dollar jackpot. The higher the stakes the harder the questions (and some seriously overdone lighting and music is added for maximum stress). There is a recurring moment in the game when the contestant’s knowledge fails and they have the option of using one of three “lifelines” that have been allotted to them for the show.
The first lifeline (and these can be used in any order) is the 50:50, which simply reduces the number of possible answers from four to two, thereby doubling your chances of selecting the correct one — a simple jiggering of probabilities. The other two are more interesting. The second lifeline is a telephone call to a friend or relative at home who is given 30 seconds to come up with the answer to the stumper question. This is a more interesting kind of a probability, since it involves a personal relationship. It deals with who you trust, who you feel you can rely on. Last, and my favorite, is the “ask the audience” lifeline, in which the crowd in the studio is surveyed and hopefully musters a clear majority behind one of the four answers. Here, the probability issue gets even more intriguing. Your potential fortune is riding on the knowledge of a room full of strangers.
In most respects, “Who Wants to Be a Millionaire” is just another riff on the classic quiz show genre, but the lifeline option pegs it in time, providing a clue about its place in cultural history. The perceptive game show anthropologist would surely recognize that the lifeline is all about the network. It’s what gives “Millionaire” away as a show from around the time of the tech bubble in the late 90s — manifestly a network-era program. Had it been produced in the 50s, the lifeline option would have been more along the lines of “ask the professor!” Lights rise on a glass booth containing a mustached man in a tweed jacket sucking on a pipe. Our cliché of authority. But “Millionaire” turns not to the tweedy professor in the glass booth (substitute ivory tower) but rather to the swarming mound of ants in the crowd.
And that’s precisely what we do when we consult Wikipedia. It isn’t an authoritative source in the professor-in-the-booth sense. It’s more lifeline number 3 — hive mind, emergent intelligence, smart mobs, there is no shortage of colorful buzzwords to describe it. We’ve always had lifeline number 2. It’s who you know. The friend or relative on the other end of the phone line. Or think of the whispered exchange between students in the college library reading room, or late-night study in the dorm. Suddenly you need a quick answer, an informal gloss on a subject. You turn to your friend across the table, or sprawled on the couch eating Twizzlers: When was the Glorious Revolution again? Remind me, what’s the Uncertainty Principle?
With Wikipedia, this friend factor is multiplied by an order of millions — the live studio audience of the web. This is the lifeline number 3, or network, model of knowledge. Individual transactions may be less authoritative, pound for pound, paragraph for paragraph, than individual transactions with the professors. But as an overall system to get you through a bit of reading, iron out a wrinkle in a conversation, or patch over a minor factual uncertainty, it works quite well. And being free and informal it’s what we’re more inclined to turn to first, much more about the process of inquiry than the polished result. As Danah Boyd puts it in an excellently measured defense of Wikipedia, it “should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts.” Wikipedia advocates and critics alike ought to acknowledge this distinction.
So, having acknowledged it, can we then broker a truce between Wikipedia and Britannica? Can we just relax and have the best of both worlds? I’d like that, but in the long run it seems that only one can win, and if I were a betting man, I’d have to bet with Johnson. Britannica is bound for obsolescence. A couple of generations hence (or less), who will want it? How will it keep up with this larger, far more dynamic competitor that is already roughly equal in quality in certain crucial areas?
Just as the printing press eventually drove the monastic scriptoria out of business, Wikipedia’s free market of knowledge, with all its abuses and irregularities, its palaces and slums, will outperform Britannica’s centralized command economy, with its neat, cookie-cutter housing slabs, its fair, dependable, but ultimately less dynamic, system. But, to stretch the economic metaphor just a little further before it breaks, it’s doubtful that the free market model will remain unregulated for long. At present, the world is beginning to take notice of Wikipedia. A growing number are championing it, but for most, it is more a grudging acknowledgment, a recognition that, for better or for worse, what’s going on with Wikipedia is significant and shouldn’t be ignored.
Eventually we’ll pass from the current phase into widespread adoption. We’ll realize that Wikipedia, being an open-source work, can be repackaged in any conceivable way, for profit even, with no legal strings attached (it already has been on sites like and thousands — probably millions — of spam and link farms). As Lisa intimated in a recent post, Wikipedia will eventually come in many flavors. There will be commercial editions, vetted academic editions, handicap-accessible editions. Darwinist editions, creationist editions. Google, Yahoo and Amazon editions. Or, in the ultimate irony, Britannica editions! (If you can’t beat ’em…)
All the while, the original Wikipedia site will carry on as the sprawling community garden that it is. The place where a dedicated minority take up their clippers and spades and tend the plots. Where material is cultivated for packaging. Right now Wikipedia serves best as an informal lifeline, but soon enough, people will begin to demand something more “authoritative,” and so more will join in the effort to improve it. Some will even make fortunes repackaging it in clever ways for which people or institutions are willing to pay. In time, we’ll likely all come to view Wikipedia, or its various spin-offs, as a resource every bit as authoritative as Britannica. But when this happens, it will no longer be Wikipedia.
Authority, after all, is a double-edged sword, essential in the pursuit of truth, but dangerous when it demands that we stop asking questions. What I find so thrilling about the Wikipedia enterprise is that it is so process-oriented, that its work is never done. The minute you stop questioning it, stop striving to improve it, it becomes a museum piece that tells the dangerous lie of authority. Even those of us who do not take part in the editorial gardening, who rely on it solely as lifeline number 3, we feel the crowd rise up to answer our query, we take the knowledge it gives us, but not (unless we are lazy) without a grain of salt. The work is never done. Crowds can be wrong. But we were not asking for all doubts to be resolved, we wanted simply to keep moving, to keep working. Sometimes authority is just a matter of packaging, and the packaging bonanza will soon commence. But I hope we don’t lose the original Wikipedia — the rowdy community garden, lifeline number 3. A place that keeps you on your toes — that resists tidy packages.

can there be great textbooks without great authors?

Jimmy Wales believes that the Wikibooks project will do for the textbook what Wikipedia did for the encyclopedia: replacing costly printed books with free online content developed by a community of contributors. But will it? Or, more accurately, should it? The open source volunteer format works for encyclopedia entries, which don’t require deep knowledge of a particular subject. But the sustained examination and comprehensive vision required to understand and contextualize a particular subject area is out of reach for most wiki contributors. The communal voice of the open source textbook is also problematic, especially for humanities texts, as it lacks the power of an inspired authoritative narrator. This is not to say that I think open source textbooks are doomed to failure. In fact, I agree with Jimmy Wales that open source textbooks represent an exciting, liberating and inevitable change. But there are some real concerns that we need to address in order to help this format reach its full potential, including how to create a coherent narrative out of a chorus of anonymous voices, how to prevent plagiarism, and how to ensure superior scholarship.
To illustrate these points, I’m going to pick on a Wikibook called: Art History. This book won the distinction of “collaboration of the month” for October, which suggests that, within the purview of wikibooks, it represents a superior effort. Because space is limited, I’m only going to examine two passages from Chapter One, comparing the wikibook to similar sections in a traditional art history textbook. Below is the opening paragraph, framing the section on Paleolithic Art and cave paintings, which begins the larger story of art history.

Art has been part of human culture for millenia. Our ancient ancestors left behind paintings and sculptures of delicate beauty and expressive strength. The earliest finds date from the Middle Paleolithic period (between 200,000 and 40,000 years ago), although the origins of Art might be older still, lost to the impermanence of materials.

Compare that to the introduction given by Gardner’s Art Through the Ages (seventh edition):

What Genesis is to the biblical account of the fall and redemption of man, early cave art is to the history of his intelligence, imagination, and creative power. In the caves of southern France and of northern Spain, discovered only about a century ago and still being explored, we may witness the birth of that characteristically human capability that has made man master of his environment–the making of images and symbols. By this original and tremendous feat of abstraction upper Paleolithic men were able to fix the world of their experience, rendering the continuous processes of life in discrete and unmoving shapes that had identity and meaning as the living animals that were their prey.
In that remote time during the last advance and retreat of the great glaciers man made the critical breakthrough and became wholly human. Our intellectual and imaginative processes function through the recognition and construction of images and symbols; we see and understand the world pretty much as we were taught to by the representations of it familiar to our time and place. The immense achievement of Stone Age man, the invention of representation, cannot be exaggerated.

As you can see, the wikibook introduction seems rather anemic and uninspired when compared to Gardner’s. Gardner’s introduction also sets up a narrative arc, placing the art of this era in the context of an overarching story of human civilization.
I chose Gardner’s Art Through the Ages because it is the classic “Intro to Art History” textbook (75 years old, in its eleventh edition). I bought my copy in high school and still have it. That book, along with my brilliant art history teacher Gretchen Whitman, gave me a lifelong passion for visual art and a deep understanding of its significance in the larger story of western civilization. My tattered but beloved Gardner’s volume still serves me well, some 20-odd years later. Perhaps it is the beauty of the writing, or the solidity of the authorial voice, or the engaging manner in which the “story” of art is told.
Let’s compare another passage; this one describes pictorial techniques employed by stone age painters. First the wikibook:

Another feature of the Lascaux paintings deserves attention. The bulls there show a convention of representing horns that has been called twisted perspective, because the viewer sees the heads in profile but the horns from the front. Thus, the painter’s approach is not strictly or consistently optical. Rather, the approach is descriptive of the fact that cattle have two horns. Two horns are part of the concept “bull.” In strict optical-perspective profile, only one horn would be visible, but to paint the animal in that way would, as it were, amount to an incomplete definition of it.

And now Gardner’s:

The pictures of cattle at Lascaux and elsewhere show a convention of representation of horns that has been called twisted perspective, since we see the heads in profile but the horns from a different angle. Thus, the approach of the artist is not strictly or consistently optical–that is, organized from a fixed-viewpoint perspective. Rather, the approach is descriptive of the fact that cattle have two horns. Two horns would be part of the concepts “cow” or “bull.” In a strict optical-perspective profile only one horn would be visible, but to paint the animal in such a way would, as it were, amount to an incomplete definition of it.

This brings up another very serious problem with open-source textbooks–plagiarism. If the first page of the wikibook-of-the-month blatantly rips off one of the most popular art history books in print and nobody notices, how will Wikibooks be able to police the other 11,000-plus textbooks it intends to sponsor? What will the consequences be if poorly written, plagiarized, open-source textbooks become the runaway hit that Wikibooks predicts?