Category Archives: wikipedia

smarter links for a better wikipedia

As Wikipedia continues its evolution, smaller and smaller pieces of its infrastructure come up for improvement. The latest piece to step forward to undergo enhancement: the link. “Computer scientists at the University of Karlsruhe in Germany have developed modifications to Wikipedia’s underlying software that would let editors add extra meaning to the links between pages of the encyclopaedia.” (full article) While this particular idea isn’t totally new (at least one previous attempt has been made: platypuswiki), SemanticWiki is using a high profile digital celebrity, which brings media attention and momentum.
What’s happening here is that under the Wikipedia skin, the SemanticWiki uses an extra bit of syntax in the link markup to inject machine readable information. A normal link in Wikipedia is coded like this [[link to a wiki page]] or [http://www.someothersite.com link to an outside page]. What more do you need? Well, if by “you” I mean humans, the answer is: not much. We can gather context from the surrounding text. But our computers get left out in the cold. They aren’t smart enough to understand the context of a link well enough to make semantic decisions of the form “this link is related to this page this way”. Even among search engine algorithms, where PageRank rules them all, every link counts as a vote that increases the linked page’s value. Even PageRank isn’t bright enough to understand that you might link to something to refute or denigrate its value. When we write, we rely on judgement by human readers to make sense of a link’s context and purpose. The researchers at Karlsruhe, on the other hand, are enabling machine comprehension by inserting that contextual meaning directly into the links.
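To make the PageRank point concrete, here is a minimal sketch of the textbook power-iteration form of PageRank (not Google's production algorithm). The page names and the damping factor of 0.85 are illustrative assumptions. The link graph records only that one page links to another, never why, so a link written to refute a page raises its score just as an endorsement would.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share, whatever its intent.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical pages: "critic_page" links to "claim" only in order to refute it.
base = {"fan_page": ["claim"], "claim": [], "critic_page": []}
with_refutation = {"fan_page": ["claim"], "claim": [], "critic_page": ["claim"]}

before = pagerank(base)["claim"]
after = pagerank(with_refutation)["claim"]
print(after > before)  # the refuting link still counts as a vote
```

Nothing in the `links` dictionary can express "this link disagrees with its target," which is the gap typed links are meant to fill.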
SemanticWiki links look just like Wikipedia links, only slightly longer. They include info like

  1. categories: An article on Karlsruhe, a city in Germany, could be placed in the City Category by adding [[Category: City]] to the page.
  2. More significantly, you can add typed relationships. Karlsruhe [[:is located in::Germany]] would show up as Karlsruhe is located in Germany (the : before is located in saves typing). Other examples: in the Washington D.C. article, you can add [[is capital of:: United States of America]]. The types of relationships (“is capital of”) can proliferate endlessly.
  3. attributes, which specify simple properties related to the content of an article without creating a link to a new article. For example, [[population:=3,396,990]]
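The three kinds of annotation listed above are regular enough to pull apart with a few lines of code. The following is a rough sketch only, based on the syntax quoted in this post; it is not the parser the Karlsruhe software actually uses, and the function name, tuple layout, and "Example City" page are assumptions made for illustration.

```python
import re

# Matches anything written between double square brackets.
ANNOTATION = re.compile(r"\[\[(.+?)\]\]")


def parse_annotations(subject, wikitext):
    """Return (kind, subject, name, value) tuples for each annotated link."""
    facts = []
    for body in ANNOTATION.findall(wikitext):
        if "::" in body:
            # Typed relationship, e.g. [[is located in::Germany]].
            name, value = body.split("::", 1)
            facts.append(("relation", subject, name.lstrip(":").strip(), value.strip()))
        elif ":=" in body:
            # Attribute, e.g. [[population:=3,396,990]].
            name, value = body.split(":=", 1)
            facts.append(("attribute", subject, name.strip(), value.strip()))
        elif body.lower().startswith("category:"):
            # Category, e.g. [[Category: City]].
            facts.append(("category", subject, "Category", body.split(":", 1)[1].strip()))
        # Anything else is an ordinary link with no machine readable meaning.
    return facts


# Hypothetical page text that reuses the example annotations from the list above.
text = (
    "Example City [[:is located in::Germany]] has [[population:=3,396,990]] "
    "people. [[Category: City]] See also the [[Rhine]]."
)
facts = parse_annotations("Example City", text)
for fact in facts:
    print(fact)
```

The plain `[[Rhine]]` link yields no fact at all, which is exactly the situation ordinary wiki links leave a machine in.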

Adding semantic information to links is a good idea, and hewing closely to the current Wikipedia syntax is a smart tactic. But here’s why I’m not more optimistic: this solution combines the messiness of tagging with the bother of writing machine readable syntax. This combo reminds me of a great Simpsons quote, where Homer says, “Nuts and gum, together at last!” Tagging and semantics are not complementary functions – tagging was invented to put humans first, to relieve our fuzzy brains from the mechanical strictures of machine readable categorization; writing relationships in a machine readable format puts the machine squarely in front. It requires the proliferation of Wikipedia-type articles to explain each of the typed relationships and property names, which can quickly become unmaintainable by humans, exacerbating the very problem it’s trying to solve.
But perhaps I am underestimating the power of the network. Maybe the dedication of the Wikipedia community can overcome those intractable systemic problems. Through the quiet work of the gardeners who sleeplessly tend their alphanumeric plots, the fact-checkers and passers-by, maybe the SemanticWiki will sprout links with both human and computer sensible meanings. It’s feasible that the size of the network will self-generate consensus on the typology and terminology for links. And it’s likely that if Wikipedia does it, it won’t be long before semantic linking makes its way into the rest of the web in some fashion. If this is a success, I can foresee the semantic web becoming a reality, finally bursting forth from the SemanticWiki seed.
UPDATE:
I left off the part about how humans benefit from SemanticWiki type links. Obviously this better be good for something other than bringing our computers up to a second grade reading level. It should enable computers to do what they do best: sort through massive piles of information in milliseconds.

How can I search, using semantic annotations? – It is possible to search for the entered information in two different ways. On the one hand, one can enter inline queries in articles. The results of these queries are then inserted into the article instead of the query. On the other hand, one can use a basic search form, which also allows you to do some nice things, such as picture search and basic wildcard search.

For example, if I wanted to write an article on Acting in Boston, I might want a list of all the actors who were born in Boston. How would I do this now? I would count on the network to maintain a list of Bostonian thespians. But with SemanticWiki I can just add this: <ask>[[Category:Actor]] [[born in::Boston]]</ask>, and the inline query will be replaced with the desired list of actors.
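Conceptually, an inline query like this is just a filter over the annotations already stored on each page. Here is a minimal sketch of that idea, assuming a toy in-memory wiki; the `ask` function, the dictionary layout, and the page titles are all hypothetical and do not reflect how SemanticWiki stores or evaluates its data.

```python
def ask(wiki, category, relation, target):
    """Return titles of pages in `category` whose `relation` points at `target`."""
    return sorted(
        title
        for title, page in wiki.items()
        if category in page["categories"]
        and target in page["relations"].get(relation, set())
    )


# Hypothetical pages, each carrying the annotations its editors typed in.
wiki = {
    "Actor A": {"categories": {"Actor"}, "relations": {"born in": {"Boston"}}},
    "Actor B": {"categories": {"Actor"}, "relations": {"born in": {"Chicago"}}},
    "Painter C": {"categories": {"Painter"}, "relations": {"born in": {"Boston"}}},
}

# Roughly the question posed by [[Category:Actor]] [[born in::Boston]].
result = ask(wiki, "Actor", "born in", "Boston")
print(result)  # ['Actor A']
```

The list is only as complete as the annotations: an actor whose page lacks a `born in` relationship never shows up, however well known the fact is to human readers.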
To do a more straightforward search I would go to the basic search page. If I had any questions about Berlin, I would enter it into the Subject field. SemanticWiki would return a list of short sentences where Berlin is the subject.
But this semantic functionality is limited to simple constructions and nouns—it is not well suited for concepts like 'politics,' or 'vacation'. One other point: SemanticWiki relationships are bounded by the size of the wiki. Yes, digital encyclopedias will eventually cover a wide range of human knowledge, but never all. In the end, SemanticWiki promises a digital network populated by better links, but it will take the cooperation of the vast human network to build it up.

shirky (and others) respond to lanier’s “digital maoism”

Clay Shirky has written an excellent rebuttal of Jaron Lanier’s wrong-headed critique of collaborative peer production on the Internet: “Digital Maoism: The Hazards of the New Online Collectivism.” Shirky’s response is one of about a dozen just posted on Edge.org, which also published Lanier’s essay.
Shirky begins by taking down Lanier’s straw man, the cliché of the “hive mind,” or mob, that propels collective enterprises like Wikipedia: “…the target of the piece, the hive mind, is just a catchphrase, used by people who don’t understand how things like Wikipedia really work.”
He then explains how they work:

Wikipedia is best viewed as an engaged community that uses a large and growing number of regulatory mechanisms to manage a huge set of proposed edits. “Digital Maoism” specifically rejects that point of view, setting up a false contrast with open source projects like Linux, when in fact the motivations of contributors are much the same. With both systems, there are a huge number of casual contributors and a small number of dedicated maintainers, and in both systems part of the motivation comes from appreciation of knowledgeable peers rather than the general public. Contra Lanier, individual motivations in Wikipedia are not only alive and well, it would collapse without them.

(Worth reading in connection with this is Shirky’s well-considered defense of Wikipedia’s new “semi-protection” measures, which some have decried as the death of the Wikipedia dream.)
I haven’t finished reading through all the Edge responses, but was particularly delighted by this one from Fernanda Viegas and Martin Wattenberg, creators of History Flow, a tool that visualizes the revision histories of Wikipedia articles. Building History Flow taught them how to read Wikipedia in a more sophisticated way, making sense of its various “arenas of context”: the “talk” pages and massive edit trails underlying every article. In their Edge note, Viegas and Wattenberg show off their superior reading skills by deconstructing the facile opening of Lanier’s essay, the story of his repeated, and ultimately futile, attempts to fix an inaccuracy in his Wikipediated biography.

Here’s a magic trick for you: Go to a long or controversial Wikipedia page (say, “Jaron Lanier”). Click on the tab marked “discussion” at the top. Abracadabra: context!
These efforts can also be seen through another arena of context: Wikipedia’s visible, trackable edit history. The reverts that erased Lanier’s own edits show this process in action. Clicking on the “history” tab of the article shows that a reader — identified only by an anonymous IP address — inserted a series of increasingly frustrated complaints into the body of the article. Although the remarks did include statements like “This is Jaron — really,” another reader evidently decided the anonymous editor was more likely to be a vandal than the real Jaron. While Wikipedia failed this Jaron Lanier Turing test, it was seemingly set up for failure: would he expect the editors of Britannica to take corrections from a random hotmail.com email address? What he didn’t provide, ironically, was the context and identity that Wikipedia thrives on. A meaningful user name, or simply comments on the talk page, might have saved his edits from the axe.

Another respondent, Dan Gillmor, makes a nice meta-comment on the discussion:

The collected thoughts from people responding to Jaron Lanier’s essay are not a hive mind, but they’ve done a better job of dissecting his provocative essay than any one of us could have done. Which is precisely the point.

so you’ve got a discussion going — how do you use it?

Alan Wexelblat has some interesting thoughts up on Copyfight about the GAM3R 7H30RY approach to writing.

Writers, particularly new ones, are often encouraged and buoyed up by physical writer’s groups, in which people co-critique works in progress. Some writing workshops/groups also include lectures from established authors and related well-known people in publishing. In SF/Fantasy, the Clarion SF&F Writers’ Workshop is well known and has graduated a number of folk who have gone on to great success.
So, can this model work online? I’m dubious. One of the things that makes a good writers’ group, and that makes Clarion the success it has been, is a rigorous screening process. You get into these things not just by having good intentions or a lot to say but by having valuable experience and insights to contribute. It’s unclear to me how one filters the mass audience of the Web into something resembling useful wisdom.

This is not a trivial question. Already, it’s all Ken can do to keep a handle on the various feedback loops spinning through the site. Separating the wheat from the chaff requires a great amount of time and attention on top of that. If we had unlimited time and resources, it would be interesting to play with some sort of collaborative filtering system for comments. What if readers had a way of advancing through a series of levels (appropriate to the game theme), gaining credibility as a respondent with each new level attained (like karma in Slashdot)? These “advanced” readers would then have more authority to moderate other discussions, sharing some of the burden with the author.

On the other hand, perhaps a workshop is the wrong model. Maybe this is more like the writing of a massive wikipedia entry on games and game theory. One person writes most of it, but the audience participates in the edit and refinement process? It seems like that model might produce something more useful.

This is not headed for anything encyclopedic. Ken is still an individual voice and this book ultimately an expression of his unique critical view (the idea of writing any work of criticism collaboratively, the way one writes a Wikipedia article, is a little odd). But Ken is getting useful work out of his readers (who, among other things, are good at spotting typos). There’s definitely some of that wiki work ethic at play.
Another thing he’s after is good testimonials about what it feels like to play these games. We already got a fabulous little description of the experience of Katamari Damacy. Hopefully the first of many. So this is also another way of doing interviews for the book, in the setting most familiar to gamers talking about gaming: an online discussion forum.

wikipedia — mainstream media sighting

In his op-ed piece today, NY Times columnist Paul Krugman quotes from the Wikipedia to define conspiracy theory:

A conspiracy theory, says Wikipedia, “attempts to explain the cause of an event as a secret, and often deceptive, plot by a covert alliance.”

This is the first time I’ve seen the Wikipedia used as an authoritative reference in the Times or any other major media outlet.

defining the networked book: a few thoughts and a list

The networked book, as an idea and as a term, has gained currency of late. A few weeks ago, Farrar Straus and Giroux launched Pulse, an adventurous marketing experiment in which they are syndicating the complete text of a new nonfiction title in blog, RSS and email. Their web developers called it, quite independently it seems, a networked book. Next week (drum roll), the institute will launch McKenzie Wark’s “GAM3R 7H30RY,” an online version of a book in progress designed to generate a critical networked discussion about video games. And, of course, the July release of Sophie is fast approaching, so soon we’ll all be making networked books.


The institute will launch McKenzie Wark’s GAM3R 7H30RY Version 1.1 on Monday, May 15

The discussion following Pulse highlighted some interesting issues and made us think hard about precisely what it is we mean by “networked book.” Last spring, Kim White (who was the first to posit the idea of networked books) wrote a paper for the Computers and Writing Online conference that developed the idea a little further, based on our experience with the Gates Memory Project, where we tried to create a collaborative networked document of Christo and Jeanne-Claude’s Gates using popular social software tools like Flickr and del.icio.us. Kim later adapted parts of this paper as a first stab at a Wikipedia article. This was a good start.
We thought it might be useful, however, in light of recent discussion and upcoming ventures, to try to focus the definition a little bit more — to create some useful boundaries for thinking this through while holding on to some of the ambiguity. After a quick back-and-forth, we came up with the following capsule definition: “a networked book is an open book designed to be written, edited and read in a networked environment.”
Ok. Hardly Samuel Johnson, I know, but it at least begins to lay down some basic criteria. Open. Designed for the network. Still vague, but moving in a good direction. Yet already I feel like adding to the list of verbs “annotated” — taking notes inside a text is something we take for granted in print but is still quite rare in electronic documents. A networked book should allow for some kind of reader feedback within its structure. I would also add “compiled,” or “assembled,” to account for books composed of various remote parts — either freestanding assets on distant databases, or sections of text and media “transcluded” from other documents. And what about readers having conversations inside the book, or across books? Is that covered by “read in a networked environment”? — the book in a peer-to-peer ecology? Also, I’d want to add that a networked book is not a static object but something that evolves over time. Not an intersection of atoms, but an intersection of intentions. All right, so this is a little complicated.
It’s also possible that defining the networked book as a new species within the genus “book” sows the seeds of its own eventual obsolescence, bound, as we may well be, toward a post-book future. But that strikes me as too deterministic. As Dan rightly observed in his recent post on learning to read Wikipedia, the history of media (or anything for that matter) is rarely a direct line of succession — of this replacing that, and so on. As with the evolution of biological life, things tend to mutate and split into parallel trajectories. The book as the principal mode of discourse and cultural ideal of intellectual achievement may indeed be headed for gradual decline, but we believe the network has the potential to keep it in play far longer than the techno-determinists might think.
But enough with the theory and on to the practice. To further this discussion, I’ve compiled a quick-and-dirty list of projects currently out in the wild that seem to be reasonable candidates for networked bookdom. The list is intentionally small and ridden with gaps, the point being not to create a comprehensive catalogue, but to get a conversation going and collect other examples (submitted by you) of networked books, real or imaginary.

*     *     *     *     *

Everyone here at the institute agrees that Wikipedia is a networked book par excellence. A vast, interwoven compendium of popular knowledge, never fixed, always changing, recording within its bounds each and every stage of its growth and all the discussions of its collaborative producers. Linked outward to the web in millions of directions and highly visible on all the popular search indexes, Wikipedia is a city-like book, or a vast network of shanties. If you consider all its various iterations in 229 different languages it resembles more a pan-global tradition, or something approaching a real-life Hitchhiker’s Guide to the Galaxy. And it is only five years in the making.
But already we begin to run into problems. Though we are all comfortable with the idea of Wikipedia as a networked book, there is significant discord when it comes to Flickr, MySpace, Live Journal, YouTube and practically every other social software, media-sharing community. Why? Is it simply a bias in favor of the textual? Or because Wikipedia – the free encyclopedia — is more closely identified with an existing genre of book? Is it because Wikipedia seems to have an over-arching vision (free, anyone can edit it, neutral point of view etc.) and something approaching a coherent editorial sensibility (albeit an aggregate one), whereas the other sites just mentioned are simply repositories, ultimately shapeless and filled with come what may? This raises yet more questions. Does a networked book require an editor? A vision? A direction? Coherence? And what about the blogosphere? Or the world wide web itself? Tim O’Reilly recently called the www one enormous ebook, with Google and Yahoo as the infinitely mutable tables of contents.
Ok. So already we’ve opened a pretty big can of worms (Wikipedia tends to have that effect). But before delving further (and hopefully we can really get this going in the comments), I’ll briefly list just a few more experiments.
>>> Code v.2 by Larry Lessig
From the site:

“Lawrence Lessig first published Code and Other Laws of Cyberspace in 1999. After five years in print and five years of changes in law, technology, and the context in which they reside, Code needs an update. But rather than do this alone, Professor Lessig is using this wiki to open the editing process to all, to draw upon the creativity and knowledge of the community. This is an online, collaborative book update; a first of its kind.
“Once the project nears completion, Professor Lessig will take the contents of this wiki and ready it for publication.”

Recently discussed here, there is the new book by Yochai Benkler, another intellectual property heavyweight:
>>> The Wealth of Networks
Yale University Press has set up a wiki for readers to write collective summaries and commentaries on the book. PDFs of each chapter are available for free. The verdict? A networked book, but not a well executed one. By keeping the wiki and the text separate, the publisher has placed unnecessary obstacles in the reader’s path and diminished the book’s chances of success as an organic online entity.
>>> Our very own GAM3R 7H30RY
On Monday, the institute will launch its most ambitious networked book experiment to date, putting an entire draft of McKenzie Wark’s new book online in a compelling interface designed to gather reader feedback. The book will be matched by a series of free-fire discussion zones, and readers will have the option of syndicating the book over a period of nine weeks.
>>> The afore-mentioned Pulse by Robert Frenay.
Again, definitely a networked book, but frustratingly so. In print, the book is nearly 600 pages long, yet they’ve chosen to serialize it a couple pages at a time. It will take readers until November to make their way through the book in this fashion — clearly not at all the way Frenay crafted it to be read. Plus, some dubious linking made not by the author but by a hired “linkologist” only serves to underscore the superficiality of the effort. A bold experiment in viral marketing, but judging by the near absence of reader activity on the site, not a very contagious one. The lesson I would draw is that a networked book ought to be networked for its own sake, not to bolster a print commodity (though these ends are not necessarily incompatible).
>>> The Quicksilver Wiki (formerly the Metaweb)
A community site devoted to collectively annotating and supplementing Neal Stephenson’s novel “Quicksilver.” Currently at work on over 1,000 articles. The actual novel does not appear to be available on-site.
>>> Finnegans Wiki
A complete version of James Joyce’s demanding masterpiece, the entire text placed in a wiki for reader annotation.
>>> There’s a host of other literary portals, many dating back to the early days of the web: Decameron Web, the William Blake Archive, the Walt Whitman Archive, the Rossetti Archive, and countless others (fill in this list and tell us what you think).
Lastly, here’s a list of book blogs — not blogs about books in general, but blogs devoted to the writing and/or discussion of a particular book, by that book’s author. These may not be networked books in themselves, but they merit study as a new mode of writing within the network. The interesting thing is that these sites are designed to gather material, generate discussion, and build a community of readers around an eventual book. But in so doing, they gently undermine the conventional notion of the book as a crystallized object and begin to reinvent it as an ongoing process: an evolving artifact at the center of a conversation.
Here are some I’ve come across (please supplement). Interestingly, three of these are by current or former editors of Wired. At this point, they tend to be about techie subjects:
>>> An exception is Without Gods: Toward a History of Disbelief by Mitchell Stephens (another institute project).

“The blog I am writing here, with the connivance of The Institute for the Future of the Book, is an experiment. Our thought is that my book on the history of atheism (eventually to be published by Carroll and Graf) will benefit from an online discussion as the book is being written. Our hope is that the conversation will be joined: ideas challenged, facts corrected, queries answered; that lively and intelligent discussion will ensue. And we have an additional thought: that the web might realize some smidgen of benefit through the airing of this process.”

>>> Searchblog
John Battelle’s daily thoughts on the business and technology of web search, originally set up as a research tool for his now-published book on Google, The Search.
>>> The Long Tail
Similar concept, “a public diary on the way to a book” chronicling “the shift from mass markets to millions of niches.” By current Wired editor-in-chief Chris Anderson.
>>> Darknet
JD Lasica’s blog on his book about Hollywood’s war against amateur digital filmmakers.
>>> The Technium
Former Wired editor Kevin Kelly is working through ideas for a book:

“As I write I will post here. The purpose of this site is to turn my posts into a conversation. I will be uploading my half-thoughts, notes, self-arguments, early drafts and responses to others’ postings as a way for me to figure out what I actually think.”

>>> End of Cyberspace by Alex Soojung-Kim Pang
Pang has some interesting thoughts on blogs as research tools:

“This begins to move you to a model of scholarly performance in which the value resides not exclusively in the finished, published work, but is distributed across a number of usually non-competitive media. If I ever do publish a book on the end of cyberspace, I seriously doubt that anyone who’s encountered the blog will think, “Well, I can read the notes, I don’t need to read the book.” The final product is more like the last chapter of a mystery. You want to know how it comes out.
“It could ultimately point to a somewhat different model for both doing and evaluating scholarship: one that depends a little less on peer-reviewed papers and monographs, and more upon your ability to develop and maintain a piece of intellectual territory, and attract others to it– to build an interested, thoughtful audience.”


*     *     *     *     *

This turned out much longer than I’d intended, and yet there’s a lot left to discuss. One question worth mulling over is whether the networked book is really a new idea at all. Don’t all books exist over time within social networks, “linked” to countless other texts? What about the Talmud, the Jewish compendium of law and exegesis where core texts are surrounded on the page by layers of commentary? Is this a networked book? Or could something as prosaic as a phone book chained to a phone booth be considered a networked book?
In our discussions, we have focused overwhelmingly on electronic books within digital networks because we are convinced that this is a major direction in which the book is (or should be) heading. But this is not to imply that the networked book is born in a vacuum. Naturally, it exists in a continuum. And just as our concept of the analog was not fully formed until we had the digital to hold it up against, perhaps our idea of the book contains some as yet undiscovered dimensions that will be revealed by investigating the networked book.

learning to read

Two Girls Reading in a Garden by Renoir

Somebody interviewed Bob for a documentary a few months ago. I don’t remember who this was, because I was in the other room busy with something else, but I was half-listening to what was being discussed: how the book is changing, what precisely the Institute does, in short, what we discuss from day to day on this blog. One statement captured my ear: Bob offhandedly declared that “we don’t really know how to read Wikipedia yet”. I made a note of it at the time; since then I’ve been periodically pulling his statement out at idle moments and rolling it over and over in my mind like a pebble in my pocket, trying to decide exactly what it could mean.

There’s something appealing to me about the flatness of the statement: “We don’t really know how to read Wikipedia yet.” It’s obvious but revelatory: the reason that we find the Wikipedia frustrating is that we need to learn how to read it. (By we I mean the reading public as a whole. Perhaps you have; judging from the arguments that fly back and forth, it would seem that the majority of us haven’t.) The problem is, of course, that so few people actually bother to state this sort of thing directly and then to unpack the repercussions of it.

What’s there to learn in reading the Wikipedia? Let’s start with a sample sentence from the entry on Marcel Proust:

In addition to the grief that attended his mother’s death, Proust’s life changed due to a very large inheritance (in today’s terms, a principal of about $6 million, with a monthly income of about $15,000).

Criticizing the Wikipedia for being poorly written is like shooting fish in a barrel, but bear with my lack of sportsmanship for a second. Imagine that you found the above sentence in a printed reference work. A printed reference book that seems to be written in the voice of a sixth grade student deeply interested in matters financial might worry you. It would worry me. It’s worried many critics of the Wikipedia, who point out that this clearly isn’t the sort of manicured prose we’re used to reading in books and magazines.

But this prose is also conceptually different. A Wikipedia article is not constructed in the same way that a magazine article is written. Nor is the content of a Wikipedia article at one particular instant in time – content that has probably been different, and might certainly change – analogous to the content of a print magazine article, which is always, from the moment of printing, exactly the same. If we are to keep using the Wikipedia, we’ll have to get used to the solecisms endemic there; we’ll also need to readjust the way we give credence to media. (Right now I’m going to tiptoe around the issue of text and authority, which is of course an enormous can of worms that I’d prefer not to open right now.) But there’s a reason that the above quotation shouldn’t be that worrying: it’s entirely possible, and increasingly probable as time goes on, that when you click the link above, you won’t be able to find the sentence I quoted.

This faith in the long run isn’t an easy thing, however. When we read Wikipedia we tend to apply to it the standards of judgment that we would apply to a book or magazine, and it often fails by these standards, as might be expected. When we’re judging Wikipedia this way, we presuppose that we know what it is formally: that it’s the same sort of thing as the texts we know. This seems arrogant: why should we assume that we already know how to read something that clearly behaves differently from the text we’re used to? We shouldn’t, though we do: it’s a human response to compare something new to something we already know, but often when we do this, we miss major formal differences.

Horseless Carriage Land, 1961

This isn’t the best way to read something new. It’s akin to the “horseless carriage” analogy that Ben’s used: when you think of a car as a carriage without a horse, you miss whatever it is that makes a car special. But there’s a problem with that metaphor, in that it carries with it ideas of displacement. Evolution is often perceived as being transformative: one thing turns into, and is then replaced by, another, as the horse was replaced by the car for purposes of transportation. But it’s usually more of a splitting: there’s a new species as well as the old species from which it sprung. The old species may go extinct, or it may not. To finish that example: we still have horses.

Figuratively, what’s happened with the Wikipedia is that a new species of text has arisen and we’re still wondering why it won’t eat the apples we’re proffering it. The Wikipedia hasn’t replaced print encyclopedias; in all probability, the two will coexist for a while. But I don’t think we yet know how to read Wikipedia. We judge it by what we’re used to, and everyone loses. Were you to judge a car by a horse’s attributes, you wouldn’t expect to have an oil crisis in a century.

Perhaps a useful way to think about this: a few paragraphs of Proust, found on a trip through In Search of Lost Time with Bob’s statement bouncing around my head. The Guermantes Way, the third part of the book, feels like the longest: much of this volume is about failing to recognize how things really are. Proust’s hapless narrator alternately recognizes his own mistakes of judgment and makes new ones for six hundred pages, with occasional flashes of insight, like this reflection:

Thieves in the Night by Fromentin

. . . . There was a time when people recognized things easily when they were depicted by Fromentin and failed to recognize them at all when they were painted by Renoir.

Today people of taste tell us that Renoir is a great eighteenth-century painter. But when they say this they forget Time, and that it took a great deal of time, even in the middle of the nineteenth century, for Renoir to be hailed as a great artist. To gain this sort of recognition, an original painter or an original writer follows the path of the occultist. His painting or his prose acts upon us like a course of treatment that is not always agreeable. When it is over, the practitioner says to us, “Now look.” And at this point the world (which was not created once and for all, but as often as an original artist is born) appears utterly different from the one we knew, but perfectly clear. Women pass in the street, different from those we used to see, because they are Renoirs, the same Renoirs we once refused to see as women. The carriages are also Renoirs, and the water, and the sky: we want to go for a walk in a forest like the one that, when we first saw it, was anything but a forest – more like a tapestry, for instance, with innumerable shades of color but lacking precisely the colors appropriate to forests. Such is the new and perishable universe that has just been created. It will last until the next geological catastrophe unleashed by a new painter or writer with an original view of the world.

(The Guermantes Way, pp.323–325, trans. Mark Treharne.) There’s an obvious comparison to be made here, which I won’t belabor. Wikipedia isn’t Renoir, and its entry for poor Eugène Fromentin, whose paintings are probably better left forgotten, is cribbed from the 1911 Encyclopædia Britannica. But like the gallery-goers who needed to learn to look at Renoir, we need to learn to read Wikipedia, to read it as a new form that certainly inherits some traits from what we’re used to reading, but one that differs in fundamental ways. That’s a process that’s going to take time.

another round: britannica versus wikipedia

britannica-to-wikipediasm.jpg
The Encyclopedia Britannica versus Wikipedia saga continues. As Ben has recently posted, Britannica has been confronting Nature over its article, which found that the two encyclopedias were fairly equal in the accuracy of their science articles. Today, the editors and the board of directors of Encyclopedia Britannica took out a half-page ad in the New York Times (A19) to present an open letter to Nature requesting a public retraction of the article.
Several interesting things are going on here. Because Britannica chose to place an ad in the Times, it shifted the argument and debate away from the peer review / editorial context into one of rhetoric and public relations. Further, their conscious move to take the argument to the “public” or the “masses” with an open letter is ironic because the New York Times does not display its print ads online; access to the letter is therefore limited to the Times’s print readership. (Not to mention, the letter is addressed to the Nature Publishing Group, located in London. If anyone knows whether a similar letter was printed in the UK, please let us know.) Readers here can click on the thumbnail image to read the entire text of the letter. Ben raised an interesting question here today, asking where one might post a similar open letter on the Internet.
Britannica cites many important criticisms of Nature’s article, including: using text not from Britannica, using excerpts out of context, giving equal weight to minor and major errors, and writing a misleading headline. If their accusations are true, then Nature should redo the study. However, to harp upon Nature’s methods is to miss the point. Britannica cannot do anything to stop Wikipedia, except to try to discredit this study. Disproving Nature’s methodology will have a limited effect on the growth of Wikipedia. People do not mind that Wikipedia is not perfect. The JFK assassination / Seigenthaler episode showed that. Britannica’s efforts will only lead to more studies, which will inevitably show errors in both encyclopedias. They acknowledge in today’s letter that “Britannica has never claimed to be error-free.” In doing so, they are undermining their own authority, as people who never thought about the accuracy of Britannica are doing just that now. Perhaps people will not mind that Britannica contains errors as well. In their determination to show the world that both encyclopedias contain flaws, they are also advertising that, of the two, the free one has somewhat more errors.
In the end, I agree with Ben’s previous post that the Nature article in question has only marginal relevance to the bigger picture. The main point is that Wikipedia works amazingly well and contains articles that Britannica never will. It is a revolutionary way to collaboratively share knowledge. That we should give consideration to the source of the information we encounter, be it the Encyclopedia Britannica, Wikipedia, Nature or the New York Times, is nothing new.

britannica bites back (do we care?)

Www.wikipedia.org_screenshot.png britannica header.gif
Late last year, Nature Magazine let loose a small shockwave when it published results from a study that had compared science articles in Encyclopedia Britannica to corresponding entries in Wikipedia. Both encyclopedias, the study concluded, contain numerous errors, with Britannica holding only a slight edge in accuracy. Shaking, as it did, a great many assumptions of authority, this was generally viewed as a great victory for the five-year-old Wikipedia, vindicating its model of decentralized amateur production.
Now comes this: a document (download PDF) just published on the Encyclopedia Britannica website claims that the Nature study was “fatally flawed”:

Almost everything about the journal’s investigation, from the criteria for identifying inaccuracies to the discrepancy between the article text and its headline, was wrong and misleading.

What are we to make of this? And if Britannica’s right, what are we to make of Nature? I can’t help but feel that in the end it doesn’t matter. Jabs and parries will inevitably be exchanged, yet Wikipedia continues to grow and evolve, containing multitudes, full of truth and full of error, ultimately indifferent to the censure or approval of the old guard. It is a fact: Wikipedia now contains over a million articles in English, nearly 223 thousand in Polish, nearly 195 thousand in Japanese and 104 thousand in Spanish; it is broadly consulted, it is free and, at least for now, non-commercial.
At the moment, I feel optimistic that in the long arc of time Wikipedia will bend toward excellence. Others fear that muddled mediocrity can be the only result. Again, I find myself not really caring. Wikipedia is one of those things that makes me hopeful about the future of the web. No matter how accurate or inaccurate it becomes, it is honest. Its messiness is the messiness of life.

presidents’ day

Few would disagree that Presidents’ Day, though in theory a celebration of the nation’s highest office, is actually one of our blandest holidays — not so much about history as the resuscitation of commerce from the post-holiday slump. Yesterday, however, brought a refreshing change.

dolley madison.jpg
Daguerreotype of Dolley Madison

Spending the afternoon at the institute was Holly Shulman, a historian from the University of Virginia well known among digital scholarship circles as the force behind the Dolley Madison Project — a comprehensive online portal to the life, letters and times of one of the great figures of the early American republic. So, for once we actually talked about presidential history on Presidents’ Day — only, in this case from the fascinating and chronically under-studied spousal perspective.
Shulman came to discuss possible collaboration on a web-based history project that would piece together the world of America’s founding period, specifically as experienced and influenced by its leading women. The question, in terms of form, was how to break out of the mould of traditional web archives, which tend to be static and exceedingly hierarchical, and tap more fully into the energies of the network. We’re talking about something you might call open source scholarship: new collaborative methods that take cues from popular social software experiments like Wikipedia, Flickr and del.icio.us yet add new layers and structures that would better ensure high standards of scholarship. In other words: the best of both worlds.
Shulman lamented that the current generation of historians is highly resistant to the idea of electronic publication as anything more than supplemental to print. Even harder for them to swallow is the open ethos of Wikipedia, commonly regarded as a threat to the hierarchical authority and medieval insularity of academia.
Again, we’re reminded of how fatally behind the times the academy is in terms of communication — both communication among scholars and with the larger world. Shulman’s eyes lit up as we described the recent surge on the web of social software and bottom-up organizational systems like tagging that could potentially create new and unexpected avenues into history.
A small example that recurred in our discussion: Dolley Madison wrote eloquently on grief, mourning and widowhood, yet few would know to seek out her perspective on these matters. Think of how something like tagging, still in an infant stage of development, could begin to solve such a problem, helping scholars, students and general readers unlock the multiple facets of complex historical figures like Madison, and deepening our collective knowledge of subjects — like death and war — that have historically been dominated by men’s accounts. It’s a small example, but points toward something grand.

the value of voice

We were discussing some of the core ideas that circulate in the background of the Institute and flow in and around the projects we work on—Sophie, nexttext, Thinking Out Loud—and how they contrast with Wikipedia (and other open-content systems). We seem obsessed with Wikipedia, I know, but it presents us with so many points to contrast with traditional styles of authorship and authority. Normally we’d make a case for Wikipedia, the quality of content derived from mass input, and the philosophical benefits of openness. Now though, I’d like to step back just a little ways and make a case for the value of voice.

65986930_153b214708_m.jpg
A beautiful sunset by curiouskiwi. One individual’s viewpoint.

Presumably the proliferation of blogs and self-publishing indicates that the cultural value of voice is not in any danger of being swallowed by collaborative mass publishing. On the other hand, the momentum surrounding open content and automatic recombination is discernibly mounting to challenge the author’s historically valued perch.
I just want to note that voice is not the same as authority. We’ve written about the crossover between authorship and authority here, here, and here. But what we talked about yesterday was not authority—rather, it was a discussion about the different ethos that a work has when it is imbued with a recognizable voice.
Whether the devices employed are thematic, formal, or linguistic, the individual crafts a work that is centripetal, drawing together in your mind even if the content is wide-ranging. This is the voice, the persona that enlivens pages of text with feeling. At an emotional level, the voice is the invisible part of the work that we identify and connect with. At a higher level, voice is the natural result of the effort an author has put into researching and collating the information.
Open systems naturally struggle to develop the singular voice of highly authored work. An open system’s progress relies on rules to manage the continual process of integrating content written by different contributors. This gives open works a mechanical sensibility, which works best with fact-based writing and a neutral point of view. Wikipedia, as a product, has a high median standard for quality. But that quality is derived at the expense of distinctive voices.

50 people see the sunset
50 beautiful sunsets, programmatically collapsed into a single image. By brevity and flickr.

This is not to say that Wikipedia is without voice. I think most people would recognize a Wikipedia article (or, really, any encyclopedia article) by its broad brush strokes and purposeful disengagement with the subject matter. And this is the fundamental point of divide. An individual’s work is in intimate dialogue with the subject matter and the reader. The voice is the unique personality in the work.
Both approaches are important, and we at the Institute hope to navigate the territory between them by helping authors create texts equipped for openness, by exploring boundaries of authorship, and by enabling discourse between authors and audiences in a virtuous circle. We encourage openness, and we like it. But we should not underestimate the enduring value of individual voice in the infinite digital space.