
the trouble with wikis in china

I’ve just been reading about this Chinese online encyclopedia, modeled after Wikipedia, called “e-Wiki,” which last month was taken offline by its owner under pressure from the PRC government. Reporters Without Borders and The Sydney Morning Herald report that it was articles on Taiwan and the Falun Gong (more specifically, an article on an activist named James Lung with some connection to FG) that flagged e-Wiki for the censors.


Baidu: the heavy paw of the state.

Meanwhile, “Baidupedia,” the user-written encyclopedia run by leading Chinese search engine Baidu, is thriving, with well over 300,000 articles created since its launch in April. Of course, “Baidu Baike,” as the site is properly called, is heavily censored, with all edits reviewed by invisible behind-the-scenes administrators before being published.
Wikipedia’s article on Baidu Baike points out the following: “Although the earlier test version was named ‘Baidu WIKI’, the current version and official media releases say the system is not a wiki system.” Which all makes sense: to an authoritarian government, a wiki, or anything else that puts that much control over information in the hands of the masses, is anathema. Indeed, though I can’t read Chinese, browsing through Baidu Baike I see none of the customary “edit” links alongside sections of text. Instead, there’s a text entry field at the bottom of each page with what appears to be a submit button. There’s a big difference between a system in which edits are submitted for moderation and a totally open system where changes have to be managed, in the open, by the users themselves.
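The difference is easy to make concrete in code. Here is a minimal sketch, in Python, of the two publishing models (the class names are invented for illustration; Baidu Baike’s actual system is not public):

```python
# A minimal sketch (hypothetical classes, not any real wiki engine's
# code) contrasting the two publishing models described above.

class OpenWiki:
    """Open model: every edit is applied immediately, and the full
    revision history stays publicly visible."""

    def __init__(self):
        self.revisions = []  # public, append-only history

    def edit(self, user, text):
        self.revisions.append((user, text))

    def current_text(self):
        return self.revisions[-1][1] if self.revisions else ""


class ModeratedWiki:
    """Moderated model: submissions wait in a queue that only unseen
    administrators act on; a rejected edit leaves no public trace."""

    def __init__(self):
        self.published_text = ""
        self.pending = []  # invisible to readers

    def submit(self, user, text):
        self.pending.append((user, text))

    def review_next(self, approve):
        user, text = self.pending.pop(0)
        if approve:
            self.published_text = text  # rejection is silent
```

In the open model, the whole argument over an article is on the public record; in the moderated model, the record is whatever the moderators let through.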
All of which underscores how astonishingly functional Wikipedia is despite its seeming vulnerability to chaotic forces. Wikipedia truly is a collectively owned space. Seeing how China is dealing with wikis, or at least with their most visible cultural deployment, the collective building of so-called “reliable knowledge” in encyclopedias, throws the political implications of this oddly named class of web pages into sharp relief.
Dan, still reeling from three days of Wikimania, as well as other meetings concerning MIT’s One Laptop Per Child initiative, relayed the fact that the word processing software bundled into the 100-dollar laptops will all be wiki-based, putting the focus on student collaboration over mesh networks. This may not sound like such a big deal, but take a moment to ponder the implications of having all class writing assignments carried out on wikis, and the different sorts of skills and attitudes that collaborating on everything might nurture. There are a million things that could go wrong with the One Laptop Per Child project, but you can’t accuse its developers of lacking bold ideas about education.
But back to the Chinese. An odd thing noted on the talk page of the Wikipedia article is that Baidu Baike actually has an article about Wikipedia that includes more or less truthful information about Wikipedia’s blockage by the Great Firewall in October ’05, as well as other reasonably accurate, and even positive, descriptions of the site. Wikipedia contributor Miborovsky notes:

Interestingly enough, it does a decent explanation of WP:NPOV (Wikipedia’s Neutral Point of View policy) and paints Wikipedia in a positive light, saying “its activities precisely reflects the web-culture’s pluralism, openness, democratic values and anti-authoritarianism.”

But look for Wikipedia on Baidu’s search engine (or on Google, Yahoo and MSN’s Chinese sites for that matter) and you’ll get nothing. And there’s no e-Wiki to be found.

u.c. offers up stacks to google

The APT BookScan 1200. Not what Google and OCA are using (their scanners are human-assisted), just a cool photo.

Less than two months after reaching a deal with Microsoft, the University of California has agreed to let Google scan its vast holdings (over 34 million volumes) into the Book Search database. Google will undoubtedly dig deeper into the holdings of the ten-campus system’s 100-plus libraries than Microsoft, which, as a member of the more copyright-cautious Open Content Alliance, will focus primarily on books unambiguously in the public domain. The Google-UC alliance comes as major lawsuits against Google from the Authors Guild and the Association of American Publishers are still in the evidence-gathering phase.
Meanwhile, across the drink, the French publishing group La Martinière in June brought suit against Google for “counterfeiting and breach of intellectual property rights,” pretty much the same claim as the American industry plaintiffs’. Later that month, however, German publishing conglomerate WBG dropped a petition for a preliminary injunction against Google after a Hamburg court told them that they probably wouldn’t win. So what might the future hold? The European crystal ball is murky at best.
During this period of uncertainty, the OCA seems content to let Google be the legal lightning rod. If Google prevails, however, Microsoft and Yahoo will have a lot of catching up to do in stocking their book databases. But the two efforts may not be in such close competition as they might initially seem.
Google’s library initiative is an extremely bold commercial gambit. If it wins its cases, it stands to make a great deal of money, even after the tens of millions it is spending on scanning and indexing billions of pages, off a tiny commodity: the text snippet. But far from being the seed of a new literary remix culture, as Kevin Kelly would have us believe (and John Updike would have us lament), the snippet is simply an advertising hook for a vast ad network. Google’s not the Library of Babel; it’s the most sublimely sophisticated advertising company the world has ever seen (see this funny reflection on “snippet-dangling”). The OCA, on the other hand, is aimed at creating a legitimate online library, where books are not a means to profit but an end in themselves.
Brewster Kahle, the founder and leader of the OCA, has a rather immodest aim: “to build the great library.” “That was the goal I set for myself 25 years ago,” he told The San Francisco Chronicle in a profile last year. “It is now technically possible to live up to the dream of the Library of Alexandria.”
So while Google’s venture may be more daring, more outrageous, more exhaustive, more whatever you like, the OCA may, in its slow, cautious, more idealistic way, be building the foundations of something far more important and useful. Plus, Kahle’s got the Bookmobile. How can you not love the Bookmobile?

clifford lynch takes on computation and open access

Academic Commons mentions that Clifford Lynch has written a chapter entitled “Open Computation: Beyond Human-Reader-Centric Views of Scholarly Literatures” for an upcoming book on open access edited by Neil Jacobs of the Joint Information Systems Committee. The chapter, which is available online, looks at the computational analyses that could be performed on scholarly literature collected into a digital repository. These “large scholarly literature corpora” would be openly accessible and would open up new branches of research that are not currently possible.
He takes cues from current work in text mining and large-scale collections of scholarly documents, such as the Perseus Digital Library hosted by Tufts University. Lynch also acknowledges the skepticism many scholars hold toward the value of text-mining analysis in the humanities. Further, he discusses the limitations that current intellectual property regimes place on the creation of large, accessible scholarly corpora. Although many legal and technical obstacles exist, his proposal does seem more feasible than something like Ted Nelson’s Project Xanadu, because the corpora he describes have boundaries, as well as supporters who believe that these bodies of literature should be accessible.
Small-scale examples show the challenges Lynch’s proposal faces. I am reminded of the development of meta-analysis in the field of statistics. Although the term meta-analysis is much older, the contemporary usage refers to statistical techniques developed in the 1970s to aggregate results from a group of studies. These techniques are particularly popular in medical research and the public health sciences (often because individual studies have small data sets). Thirty years on, the methods are frequently used and their results published. However, they are still questioned in certain circles.
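For readers who haven’t run across the technique, the arithmetic at the core of its simplest (fixed-effect) variant is just inverse-variance weighting. A minimal sketch, with invented study numbers:

```python
# Fixed-effect meta-analysis by inverse-variance weighting: each study's
# effect estimate is weighted by 1/variance, so more precise studies
# count for more. The three studies below are invented for illustration.

from math import sqrt

# (effect_estimate, standard_error) for three hypothetical studies
studies = [(0.42, 0.21), (0.30, 0.12), (0.55, 0.30)]

weights = [1.0 / se**2 for _, se in studies]
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1.0 / sum(weights))

print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The controversy is not in this arithmetic but in whether the studies being pooled are commensurable at all, which is exactly the question Glass turns to below.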
Gene Glass gives a good overview of meta-analysis, concluding with a reflection on how criticisms of its use reveal fundamental problems with research in his field, education research. He notes that the “fundamental unit” of his research is the study, whereas the fundamental unit in physics is lower-level, accessible, and generalizable. Here, even taking a small step back reveals new insights into the fundamentals of his scholarship.
Lynch speculates on how the creation of corpora might play out, but he doesn’t dwell on the macro questions that we might investigate. Perhaps it is premature to think about these ideas, but the possible directions of inquiry are what lingered in my mind after reading Lynch’s chapter.
I am struck by the challenge of graphically representing the analysis of these corpora. Like the visualizations of the blogosphere, these technologies could analyze not only the network of citations but also word choice and textual correlations. Moreover, how does the body of literature change over time and space as ideas emerge or fall out of favor? In the humanities, can we graphically represent theoretical shifts from structuralist to post-structuralist thought, or the evolution from pre-feminist to feminist to post-feminist thought? What effect did each of these movements have on the others over time?
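As a toy illustration of that kind of diachronic question, here is a sketch that counts occurrences of tracked theoretical terms by publication year (the corpus is invented; a real study would need OCR’d full texts, metadata, and per-year normalization):

```python
# Toy diachronic analysis: count tracked terms per publication year.
# A real corpus study would normalize by corpus size per year and
# handle variant spellings, synonyms, and tokenization far better.

from collections import Counter, defaultdict

# (year, full_text) pairs -- invented stand-ins for scholarly articles
corpus = [
    (1965, "the structuralist reading of the myth proceeds ..."),
    (1972, "against the structuralist account of the sign ..."),
    (1981, "a post-structuralist critique of structuralist method ..."),
]

tracked = ["structuralist", "post-structuralist"]
counts_by_year = defaultdict(Counter)

for year, text in corpus:
    tokens = text.lower().split()
    for term in tracked:
        counts_by_year[year][term] += tokens.count(term)

for year in sorted(counts_by_year):
    print(year, dict(counts_by_year[year]))
```

Plotting those counts over time is the crude ancestor of the theoretical-shift visualizations imagined above.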
The opportunity also exists to explore ways of navigating corpora of this size. Using the metaphor of Google Earth, where one can zoom in from the entire planet down to a single home, what could we gain from being able to view the sphere of scholarly literature in such a way? Glass took one step back to analyze groups of studies and found insight into the nature of education research. What might we learn from viewing the entire corpus of scholarly knowledge from above?
Lynch describes expanding our analysis beyond the human scale. Even if his proposal never comes to fruition, his thought experiments revealed (at least to me) how knowledge acquisition occurs along a multidimensional spectrum. You can give a text a close reading or merely skim the first sentence of each paragraph. Likewise, you can read an encyclopedia entry on a field of study or spend a year reading 200 books to prepare for a doctoral qualifying exam. As people, however, we have limits to the amount of information we can comprehend and analyze.
Purists will undoubtedly frown upon the use of computation that cannot be replicated by humans in scholarly research. Computer-assisted proofs in mathematics, for example, are still controversial. The humanities will be no different, if not more resistant. A close reading of certain texts will always be important; however, the future Lynch offers may give that close reading an entirely new context and understanding. One of the great things about inquiry is that sometimes you do not know where you will end up until you get there.

jaron lanier’s essay on “the hazards of the new online collectivism”

In late May John Brockman’s Edge website published an essay by Jaron Lanier, “Digital Maoism: The Hazards of the New Online Collectivism.” Lanier’s essay caused quite a flurry of comment, both pro and con. Recently someone interested in the work of the Institute asked me my opinion. I thought that in light of Dan’s reportage from the Wikimania conference in Cambridge I would share my thoughts about Jaron’s critique of Wikipedia . . .
I read the article the day it was first posted on The Edge and thought it so significant and so wrong that I wrote Jaron asking if the Institute could publish a version in a form similar to Gamer Theory, one that would enable readers to comment on specific passages as well as on the whole. Jaron referred me to John Brockman (publisher of The Edge), who, although he acknowledged the request, never got back to us with an answer.
From my perspective there are two main problems with Jaron’s outlook.
a) Jaron misunderstands the Wikipedia. In a traditional encyclopedia, experts write articles that are permanently encased in authoritative editions. The writing and editing go on behind the scenes, effectively hiding the process that produces the published article. The standalone nature of print encyclopedias also means that any discussion about articles is essentially private and hidden from collective view. The Wikipedia is a quite different sort of publication, which frankly needs to be read in a new way. Jaron focuses on the “finished piece,” i.e. the latest version of a Wikipedia article. In fact what is most illuminating is the back-and-forth that occurs between a topic’s many author/editors (the sketch after point b shows how easy that process is to get at). I think there is a lot to be learned by studying the points of dissent; indeed the “truth” is likely to be found in the interstices, where different points of view collide. Network-authored works need to be read in a new way that allows one to focus on the process as well as the end product.
b) At its core, Jaron’s piece defends the traditional role of the independent author, particularly the hierarchy that renders readers as passive recipients of an author’s wisdom. Jaron is fundamentally resistant to the new emerging sense of the author as moderator — someone able to marshal “the wisdom of the network.”
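As promised in point a: the back-and-forth is not hidden away. Every article’s revision history is public and even machine-readable through MediaWiki’s standard query API. A minimal sketch (standard library Python; the article title is just an example):

```python
# Minimal sketch: fetch an article's recent revision history from the
# MediaWiki API -- the raw material for studying the back-and-forth
# behind the "finished" page. Standard library only.

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def revision_history(title, limit=20):
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    })
    req = urllib.request.Request(
        API + "?" + params,
        headers={"User-Agent": "wiki-history-sketch/0.1"},  # be polite
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # The API keys pages by page ID; we asked for a single title.
    page = next(iter(data["query"]["pages"].values()))
    return page.get("revisions", [])

# Print who changed what, with the edit summaries where dissent surfaces.
for rev in revision_history("Wikipedia"):
    print(rev["timestamp"], rev["user"], "-", rev.get("comment", ""))
```

The accompanying discussion, where the negotiation happens in prose, lives at the same title prefixed with “Talk:”.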
I also think it is interesting that Jaron titled his article “Digital Maoism,” hoping to tar the Wikipedia with the brush of bottom-up collectivism. My guess is that Jaron is unaware of Mao’s famous quote: “truth emerges in the course of struggle [around ideas]”. Indeed, what I prize most about the Wikipedia is that it acknowledges the messiness of knowledge and the process by which useful knowledge and wisdom accrete over time.

harpercollins takes on online book browsing

In general, people in the US do not seem to be reading a lot of books; one study found that 80% of US families did not buy or read a book last year. People are finding their information in other ways. Therefore it is not surprising that HarperCollins announced its “Browse Inside” feature, which allows people to view selected pages from books by ten leading authors, including Michael Crichton and C.S. Lewis. The company compares the feature with Google Book Search and Amazon’s “Search Inside.”
The feature is much closer to “Search Inside” than to Google Book Search, although Amazon.com has a nice feature, “Surprise Me,” that comes closer to replicating the experience of flipping randomly to a page of a book taken off the shelf. Google Book Search, of course, actually lets you search the book, and comes the closest to giving people the experience of browsing through books in a physical store.
In the end, HarperCollins’ feature is more like a movie trailer: readers get to view a handful of pages predetermined by the publisher. This is nothing like the experience of randomly opening a book, or of going to the index to make sure the book covers the exact information you need. The press release from HarperCollins states that the company will be rolling out additional features and content for registered users soon. For now, however, without any unique features, it is unclear to me why someone would go to the HarperCollins site to preview only that publisher’s books rather than go to Amazon and get previews across many more publishers.
This initiative is a small step in the right direction, but at the end of the day it’s a marketing tool, and it limits itself to that. Because they added links to various booksellers on the page, they can potentially reap the benefits of the long tail by helping readers find the more obscure titles in their catalogue. However, their focus is still on selling the physical book; they have specifically stated that they do not want to become booksellers. (Although through their “Digital Media Cafe,” they are experimenting with selling digital content through their website.)
As readers increasingly want to interact with their media and texts, a big question remains: are HarperCollins and the publishing industry ready to release the control they have traditionally held and reinterpret their purpose? With print-on-demand, search engines, and emergent communities, we are seeing new authors, filters, editors, and curators playing the roles that publishers once filled. It will be interesting to see how far HarperCollins goes with these initiatives. For instance, HarperCollins also intends to start working with MySpace and Facebook to add links to its books on those sites. Is it prepared for negative commentary associated with those links? Is it ready to allow people to decide which books get attention?
If traditional publishers do not provide media (including text) in ways we are increasingly accustomed to receiving it, their relevance is at risk. We see them slowly trying to adapt to the shifting expectations and behaviors of people. However, in order to maintain that relevance, they need to deeply rethink what a publisher is today.

controversy in an MMORPG

image source: confessions of an aca/fan
Henry Jenkins gives a fascinating account of an ongoing controversy in an MMORPG in the People’s Republic of China, the fastest-growing market for these online games. Operated by Netease, Fantasy Westward Journey (FWJ) has 22 million users and an average of over 400,000 concurrent players. Last month, game administrators locked down the account of an extremely high-ranking character for having an anti-Japanese name, as well as for leading a 700-member guild with a similarly offensive name. The character would be “jailed” and his guild dissolved unless he changed both names. The player didn’t back down and went public with accusations of ulterior motives on Netease’s part. Rumors flew across FWJ that the company had been purchased by a Japanese firm which was dictating policy decisions. A few days later, an alarming nationalist protest broke out, drawing 80,000 players to one of the game’s servers, four times the typical number.
The ongoing incidents are important for several reasons. First, they are another demonstration of how people (from any nation) bring their conceptions of the real world into virtual space. Sino-Japanese relations are historically tense; in particular, memories of war and occupation by Japan during World War II are still fresh and volatile in the PRC. In a society whose traditional calendar puts the current year at 4703, seventy years is a relatively short amount of time. Here, political and racial sentiment seamlessly interweaves the real and the virtual. However, these spaces, and the servers that house them, are privately owned.
The second point is that concentrations of economic and cultural production are being redistributed across the globe. The points where the real and virtual worlds become porous are likewise spreading throughout Asia. Coverage of these events outside Asia should therefore not be considered fringe; I see important reasons to track, report, and discuss them as I would local and regional phenomena.

wikimania: the importance of naming things

I’ll write up what happened on the second day of Wikimania soon – I saw a lot of talks about education – but a quick observation for now. Brewster Kahle delivered a speech after lunch entitled “Universal Access to All Knowledge”, detailing his plans to archive just about everything ever & the various issues he’s confronted along the way, not least Jack Valenti. Kahle learned from Valenti that it’s important to frame the terms of the debate. Valenti explained filesharing by declaring that it was Artists vs. Pirates, an obscuring dichotomy, but one that keeps popping up. Kahle was happy that he’d succeeded in creating a catch phrase in naming “orphan works” – a term no less loaded – before the partisans of copyright could.

Wikimania is dominated by Wikipedia, but it’s not completely about Wikipedia – it’s about wikis more generally, of which Wikipedia is by far the largest. There are people here using wikis to do immensely different things: creating travel guides, building repositories of lesson plans for K–12 teachers, maintaining repositories of information for the State Department. Many of these are built on MediaWiki, the software that runs Wikipedia, but by no means all; all sorts of platforms have been made to create websites that can be edited by their users. All of these fall under the rubric “wiki”. We could just as accurately refer to wikis as “collaboratively written websites”, the least common denominator of all of these sites. I’d argue that the word has something to do with the success of the model: nobody would feel any sense of kinship about making “collaboratively written websites” – that’s a nebulous concept – but when you slap the name “wiki” on it, you have something easily understood, a form about which people can become fanatical.

wikimania day 1: wrap up

There was something of a valedictory feeling around Wikimania yesterday, springing perhaps from Jimmy Wales’s plenary talk: the feeling that a magnificent edifice had been constructed, and all that remained was to convince people to actually use it. If we build it, they will come & figure it out. Wales declared that it was time to stop focusing on quantity in Wikipedia and to start focusing on quality: Wikipedia has pages for just about everything that needs a page, although many of those pages aren’t very good. I won’t disagree with that, but there’s something else that needs to happen: the negotiation involved as this new technology increasingly hits the rest of the world.

This was the narrative arc traced by Larry Lessig in his plenary: he spoke about how he became more and more enthusiastic about the potential of freely shared media before running into the brick wall of the Supreme Court. At that point, he realized, it was time to regroup and assess what would be politically & socially necessary to bring free media to the masses. Something similar is going on in the wiki community as a whole. It’s a tremendously fertile time technologically, but there are increasingly social issues that scream for engagement.

One of the most interesting presentations I saw yesterday afternoon was Daniel Caeton’s talk on negotiating truth, based on his upcoming book The Wild, Wild Wiki: Unsettling the Frontiers of Cyberspace. Caeton teaches writing at California State University, Fresno, where he experimented with having students explore & contribute to the Wikipedia. The issues that arose surprised him. His talk focused on the experiences of Emina, a Bosnian Muslim student: she looked at how Bosnian Muslims were treated in the Wikipedia and found immensely diverging opinions. She found herself in conversation with other contributors about the meaning of the word “Bosniak”, and in doing so found herself grappling with the core philosophy of Wikipedia: that truth is never objective, but always in negotiation. This sort of thinking needs to be taught just as much as wiki markup syntax, though it has had nowhere near as much attention.

Today there’s a whole track on using wikis in education: I’ll be following & reporting back from that.

transmitting live from cambridge: wikimania 2006

I’m at the Wikimania 2006 conference at Harvard Law School, from where I’ll be posting over the course of the three-day conference (schedule). The big news so far (as has already been reported in a number of blogs) came from this morning’s plenary address by Jimmy Wales, who announced that Wikipedia content is going to be included on the Hundred Dollar Laptop. Exactly what “Wikipedia content” means isn’t clear to me at the moment – Wikipedia content that’s not on a network loses a great deal of its power – but I’m sure details will filter out soon.

This move is obvious enough, perhaps, but it has interesting ramifications. Some of these came out during the audience question period of the next panel I attended, featuring Alex Halavais, who talked about the problem of evaluating Wikipedia’s topical coverage, and Jim Giles, the writer of the Nature study comparing the Wikipedia & the Encyclopædia Britannica. The subtext of both was the problem of authority and how it’s perceived. We measure the Wikipedia against five hundred years of English-language print culture, which the Encyclopædia Britannica represents to many. What happens when the Wikipedia is set loose in a culture that has no print or literary tradition? The Wikipedia might assume immense cultural importance. The obvious point of comparison is the Bible: one of the major forces behind the creation of Unicode – and of fonts to support the languages used in the developing world – is SIL, founded with the aim of printing the Bible in every language on Earth. It will be interesting to see if Wikipedia gets as far.

three glimpses at the future of television

1. When radio was the main electronic medium, families would gather around the set and listen to music, news, or entertainment programming, not unlike traditional television viewing. Today, radio listening habits have shifted; I only hear the radio in cars and offices. Television viewing (if you can even call it that) is experiencing a similar shift, as people multitask at home with the television playing in the background. With the rollout of Digital Multimedia Broadcasting (DMB) in South Korea last year, the use of television is starting to resemble radio even more. DMB is a digital radio transmission system that allows television signals to play on mobile devices. Since its 2005 debut, a slew of DMB-capable devices, such as GPS units and the PM80 PDA from LG, have been released in Korea. DMB systems are being planned throughout Europe and Asia, which may make mobile television viewing ubiquitous and the idea of a family sitting in front of a television at home seem quaint.
2. I recently posted on a partnership between YouTube and NBC, which will create a channel on the video-sharing site to promote new NBC shows this autumn. NBC seems to have taken the power of YouTube to heart and is producing new episodes of the failed WB pilot “Nobody’s Watching,” which never aired. The pilot was leaked to YouTube and viewed by over 450,000 people. I’m waiting to see how far NBC is willing to experiment proactively with YouTube and its community to create better programming.

3. In the US, the shift of television from large boxes in living rooms to desktops, laptops, and portable media players has often meant viewing pirated programming uploaded to video-sharing sites like YouTube or downloading files over BitTorrent. For those who don’t want to break the law, Jeff Jarvis reports that legal streamed and downloaded content got a boost from ABC’s announcement that 87% of viewers of its streamed video were able to recall its advertising, over three times the average recall for standard television advertising. While legal content is important, I hope it doesn’t kill remix culture or the “anyone can be a star” quality that YouTube provides.