Tag Archives: yahoo

why google and yahoo love wikipedia

From Dan Cohen’s excellent Digital Humanities Blog comes a discussion of the Wikipedia story that Cohen claims no one seems to be writing about: why Google and Yahoo give so much free server space and bandwidth to Wikipedia. Cohen points out that there’s more going on here than just the open source ethos of these tech companies. In fact, the two companies are becoming increasingly dependent on Wikipedia as a resource, both as something to repackage for commercial use (in sites such as Answers.com) and as a major component in the programming of search algorithms. Cohen writes:
Let me provide a brief example that I hope will show the value of having such a free resource when you are trying to scan, sort, and mine enormous corpora of text. Let’s say you have a billion unstructured, untagged, unsorted documents related to the American presidency in the last twenty years. How would you differentiate between documents that were about George H. W. Bush (Sr.) and George W. Bush (Jr.)? This is a tough information retrieval problem because both presidents are often referred to as just “George Bush” or “Bush.” Using data-mining algorithms such as Yahoo’s remarkable Term Extraction service, you could pull out of the Wikipedia entries for the two Bushes the most common words and phrases that were likely to show up in documents about each (e.g., “Berlin Wall” and “Barbara” vs. “September 11” and “Laura”). You would still run into some disambiguation problems (“Saddam Hussein,” “Iraq,” “Dick Cheney” would show up a lot for both), but this method is actually quite a powerful start to document categorization.
Cohen’s observation is a valuable reminder that all of the discussion of Wikipedia’s accuracy and usefulness as an academic tool is really only skimming the surface of how and why the open-source encyclopedia is reshaping the way knowledge is made and accessed. Ultimately, the question of whether or not Wikipedia should be used in the classroom might be less important than whether, or how, it is used in the boardroom, by companies whose function is to repackage, reorganize and return “the people’s knowledge” to the people at a tidy profit.
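For readers who want to see the mechanics, here is a rough sketch of the kind of categorization Cohen describes. It is only an illustration under some loud assumptions: the two reference strings are tiny hypothetical stand-ins for the real Wikipedia entries, and a naive "words found in only one entry" filter is swapped in for Yahoo’s Term Extraction service, which is not called here. The function names are made up for this example.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the two Wikipedia entries (assumption:
# real use would load the full article text instead).
REFERENCE_TEXTS = {
    "George H. W. Bush": "George Bush and Barbara Bush saw the Berlin Wall "
                         "fall. Saddam Hussein, Iraq, and Dick Cheney.",
    "George W. Bush": "George Bush and Laura Bush led the country after "
                      "September 11. Saddam Hussein, Iraq, and Dick Cheney.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distinctive_terms(reference_texts):
    """For each label, keep only tokens that appear in no other entry.

    Shared terms such as "iraq" or "cheney" drop out, which mirrors the
    disambiguation problem noted in the quote.
    """
    token_sets = {label: set(tokenize(text))
                  for label, text in reference_texts.items()}
    profiles = {}
    for label, tokens in token_sets.items():
        others = set().union(
            *(t for other, t in token_sets.items() if other != label)
        )
        profiles[label] = tokens - others
    return profiles


def classify(document, profiles):
    """Score a document against each profile by counting distinctive terms.

    Returns (label, scores); label is None when no profile matches or
    when the top score is tied.
    """
    counts = Counter(tokenize(document))
    scores = {label: sum(counts[term] for term in terms)
              for label, terms in profiles.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return None, scores
    return best, scores


profiles = distinctive_terms(REFERENCE_TEXTS)
label, scores = classify(
    "Bush spoke about the Berlin Wall with Barbara at his side.", profiles
)
print(label, scores)
```

A document that mentions only terms the two entries share (Iraq, Saddam Hussein, Dick Cheney) comes back unlabeled, which is the limitation Cohen flags. A real system would need better term weighting than this set difference provides.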

yahoo buys del.icio.us and takes on google?

Just as we were creating a del.icio.us account and linking it to our site, Yahoo announced the purchase of the company. This strategy of purchasing successful web service start-ups is nothing new for Yahoo (for example, Flickr and eGroups). Del.icio.us’s popularity has prompted lots of discussion across the internet, notably on Slashdot and in social software circles.
Del.icio.us started with the simple idea of putting bookmarks on the web. By making them public, it added a social networking component to the experience. Bookmarks, in a way, are an external representation of notable ideas in the mind of the owner.
Yahoo also announced a new partnership with Six Apart, the company that created Movable Type, although it did not purchase Six Apart. Six Apart has optimized its blogging software to work with Yahoo’s small business hosting service.
In the end, these strategies make sense for Yahoo and other large media companies, because they are buying proven technologies and a strong user base. Small companies are often more nimble in thought and speed, and are therefore better able to develop novel technology.
Interestingly, the online discussion seems to be framing this event in terms of Yahoo versus Google. Microsoft is noticeably absent from the discussion. Perhaps, as Lisa suggested, they are focused on gaming right now. With each new initiative and acquisition, the debates about the services and strategies of Yahoo and Google sound more like discussions about the competing fall line-ups of ABC, NBC and CBS.