Category Archives: search

fingerprinting text in the age of cut-and-paste

LexisNexis has installed new software for detecting plagiarism. As described on their site:

LexisNexis CopyGuard uses pattern-matching technology to identify suspect passages in submitted documents. An easy-to-read report underlines and color codes questionable sentences, with links to the original sources.

This could be an important tool for assuring integrity not only in professional journalism, but also in the emerging class of amateur reporters. But apply it to blogs and CopyGuard might overload and shut down. Bloggers are constantly recycling text, often without clear attribution, or obvious demarcation between quote and original commentary. The bounds of plagiarism seem a bit less clear when you consider that cutting and pasting is one of the main ways we converse online.
(NY Times has story)
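For a rough sense of how this kind of pattern matching can work, here is a minimal Python sketch. It is not CopyGuard's actual method, which the description above doesn't spell out. It assumes one common fingerprinting approach: hash every overlapping run of n words (a "shingle") and measure how many hashes two texts share. The function names and the five-word window are illustrative choices.

```python
import hashlib
import re


def shingles(text, n=5):
    """Return the set of hashed n-word windows ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        return set()  # too short to fingerprint at this window size
    return {
        hashlib.sha1(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def overlap(doc_a, doc_b, n=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 means near-verbatim copying; a blog post that quotes a sentence or two from its source would land somewhere in between, which is exactly where the bounds of plagiarism get murky.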

the meaning of life? can you find an answer on the web?

On October 10, 2004, I was sitting with my laptop at a cafe in New York City trying to avoid writing a paper for my first-year humanities class. In a moment of despair, I typed “what is the meaning of life?” into an online forum. Fifty thousand hits and two thousand answers later…
That’s the cover copy for David Seaman’s first book, “The Real Meaning of Life,” due out this September. The book is a print version of the impromptu networked book, generated online in response to his question. Aphorisms like “be grease not glue,” and “there is no point to life, and that is exactly what makes it so special,” came from Buddhists, born-again Christians, atheists, waitresses, students, and recovering heart attack patients.
The public platform that the web offers ordinary people introduces a new way to contemplate this perennial question. Typing “what is the meaning of life?” into Wikipedia yields an extensive entry with over 500 edits and a lively discussion page. Here is an excerpt:

The person who asks “What is the meaning of life?” is pondering life’s purpose, in the context “Why are we here?”, or is searching for a justification or goal as in “What should I do with my life?” Thus, we’ve separated the main query into two different questions: one about the objective purpose of life (“Why are we here?”), and the other about subjective purpose in life (“What should I do with my life?”). Many claim that life has an objective purpose, though they differ as to what this purpose is, or where it comes from. Others deny that an objective purpose of anything is possible. Purposes, they argue, are by their very nature purely subjective. Subjective purpose of course varies from person to person. In some ways the quandary is a circular argument, the enquirer is in the midst of life seeking to validate life, or be it the meaning of it.

Books have, traditionally, been vehicles for the contemplation of this circular question. Scripture, scholarly texts, poetry, novels, self-help books, and how-to books all grapple with the issue–“why are we here? And what should I do with my life?”–in various ways. It is interesting to see how the question plays out in the interactive space of the web.
Type “what is the meaning of life?” into the Google search engine and it yields 62,300 responses, including an “Ask Yahoo” page from 1998 in which Juan asks the Yahoo search team to find the meaning of life for him. The letter he gets back recommends a visit to the Yahoo meaning of life page. It also offers this advice:

Now, if you’re looking for the meaning of your life in particular, then we’re afraid we have to fall back on the somewhat predictable response: “It’s up to you.” Many people try to give lasting meaning to their lives by making the world a better place than when they entered it, either through scientific, philosophical, or artistic contributions. Others try by raising children that can themselves make contributions and preserve important societal and religious values for future generations.

There are also quite a few personal web pages that address the question. One particularly poignant example is JaredStory.com, a site by and about Jared High, a young boy who took his own life shortly after a violent beating by a school bully. This heartbreaking site is filled with biblical quotations, audio and video of Jared, information about suicide and bullying, and a transcription of the lawsuit filed by his grieving parents.
Taken together these online “answers” create a wonderful mosaic of humanity striving to know itself and to connect with the universe. The web gives us an opportunity to read this interlinked accumulation of wisdom on a scale never before possible.

“finally, I have a Memex!”

There’s an essay worth reading in the New York Times Book Review this past Sunday by Steven Johnson about a powerful semantic desktop management and search tool recently released for Macs. The software (called DEVONthink) not only helps organize and briskly sift through readings, clippings, quotes, and one’s own past writings, but assists in the mysterious mental processes that are at the heart of writing – associative trains, useful non sequiturs, serendipitous stumbles. In effect, we now have a tool resembling the Memex device described in the seminal 1945 essay “As We May Think” by visionary engineer Vannevar Bush. Working with the cutting edge technologies of his day – microfilm, thermionic tubes, and punch, or “Hollerith,” cards – Bush pondered how technology might help humanity to manage and make use of its vast systems of information. His recognition of the basic problem is no less relevant today: “Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing.” Fast forward to 2005. Now, the holy grail of search is the Semantic Web – moving beyond the artificiality of crude content-based queries and bringing meaning, relevance, and associations into the mix.
“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, “memex” will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.” – Vannevar Bush

It’s quite suggestive that DEVONthink’s semantic search function can to an extent be trained, taking the obnoxious little puppy on Windows search toward its full potential – a sleek, truffle-tuned hound. When Johnson loads his body of work onto the computer, the hound picks up the distinctive scent of his writing, which in turn suggests affinities, similarities, and connections to other materials – truffles – that will find their way into later works.
Says Johnson in his latest blog post, which goes into much greater detail than the Times piece:
“I have pre-filtered the results by selecting quotes that interest me, and by archiving my own prose. The signal-to-noise ratio is so high because I’ve eliminated 99% of the noise on my own.”
But it is significant that DEVONthink is not useful for searching entire books (the author’s own manuscripts notwithstanding). Currently, the tool is ideal for locating chunks of text that fall within the “sweet spot” of 50-500 words. If your archives include entire book-length texts, then the honing power is diminished. DEVONthink is optimal as a clip searcher. File searching remains a frustrating enterprise.
Johnson makes note of this:
“So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don’t quite have a word for: a chunk or cluster of text, something close to those little quotes that I’ve assembled in DevonThink. If I have an eBook of Manuel DeLanda’s on my hard drive, and I search for “urban ecosystem” I don’t want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I’ll still be assembling my research library by hand in DevonThink.”
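What Johnson is asking for, a search that returns paragraphs rather than files, can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how DEVONthink works: it splits a text on blank lines, treats each paragraph as a chunk, and ranks chunks by plain word-count cosine similarity to the query. All the names here are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(document):
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_chunks(document, query, top_k=5):
    """Return up to top_k (score, chunk) pairs, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(c)), q), c) for c in split_into_chunks(document)]
    hits = [(score, chunk) for score, chunk in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]
```

Calling `search_chunks(book_text, "urban ecosystem")` on a book-length string would hand back the handful of paragraphs that mention those words instead of flagging the whole book. Word matching of this kind is far cruder than the associative search Johnson describes, but it shows why the chunk, not the file, is the natural unit.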
Another point (from the Times piece) worth highlighting here, which relates to our discussion of the networked book:
“If these tools do get adopted, will they affect the kinds of books and essays people write? I suspect they might, because they are not as helpful to narratives or linear arguments; they’re associative tools ultimately. They don’t do cause-and-effect as well as they do ‘x reminds me of y.’ So they’re ideally suited for books organized around ideas rather than single narrative threads: more ‘Lives of a Cell’ and ‘The Tipping Point’ than ‘Seabiscuit.’”
And what about other forms of information – images, video, sound, etc.? These media will come to play a larger role in the writing process, given the ease of processing them in a PC/web context. Images and music trump language in their associative power (a controversial assertion, please debate it!), and present us with layers of meaning that are harder to dissect, certainly by machine. It is an inchoate hound to be sure.

tower of babel or trivial pursuit?

Read New York Times Article
In an article in yesterday’s NY Times, Alberto Manguel compares the Genesis story of Babel and the library at Alexandria with their alleged modern-day counterpart–Google’s commitment to digitize all human knowledge. Are we constructing a modern-day tower of Babel, a monument to the hubris of what might be possible if we could just get a little smarter? Will Google help us find answers to the big questions: where did we come from, and what’s the meaning of it all? I went online to find out. I Googled the question “What is the meaning of it all?” and got the following:

Continue reading

enter the cybrarian

The recent buzz surrounding Google’s library initiative has everyone talking about the future of research, which inevitably raises the question: how will the digitization of library collections change the role of the librarian? I would guess that, far from becoming obsolete, their role will in fact be elevated in importance, if not necessarily in status. They could very well come to be our indispensable guides through the labyrinth – if perhaps invisible ones, engineering behind the digital walls.
It’s also important to consider the question of visualization. When you run a search on Google you are given an enormous list, a format already deeply ingrained in the day-to-day business of finding information. These lists are basically the electronic equivalent of scrolls, with the items algorithmically determined to be most relevant placed at the top. Sooner or later we have to admit that using scrolls for this kind of business is ludicrous. There has to be a better way of arraying these vast harvests of information, one that allows the researcher to zoom across degrees of specificity and through associative chains of context and meaning. I see no reason why a search shouldn’t take place in some kind of virtual library, emulating the physical architecture of research settings, and allowing for some of the associative or accidental echoes that so often enrich a paper trail blazed through a brick-and-mortar library. Or could knowledge not resemble a tree, or an arterial matrix? Must we be bound to the scroll?
Returning to the question of the librarian’s role, I recalled this passage from James J. O’Donnell’s 1996 paper The Pragmatics of the New: Trithemius, McLuhan, Cassiodorus:
“The librarians of the world have, moreover, already led the way, for academics at least, into the new information environment, not least because they are caught between rising demand from their customers (faculty and students) and rising supply and prices from their suppliers, and so have already been making reality-based decisions about ownership versus access, print versus electronics, and so on. In short, they are just now our leading pragmatists. Can we imagine a time in our universities when the librarians are the well-paid principals and the teachers their mere acolytes in a distribution chain? I do not think we can or should rule out that possibility for a moment.”
Related articles:
“Questions and Praise for Google Web Library” – NY Times
“Google’s library plan ‘a huge help'” – USA Today
“Making books readable on computer proves trying task” – USA Today
Also, I found this on Searchblog. For a trip down memory lane, check out the original Google in the Stanford archives (click on picture to right). Unfortunately, although it seems interactive, a search just brings up a bunch of stylesheets.

google and big brother

Can Google remain true to its promise, “don’t be evil,” now that it has shareholders to worry about, advertisers to please, and an ever-increasing reach into the repositories of human knowledge? Google still gives you that warm and fuzzy feeling. It’s got the goofy name, those cute seasonal tailorings of its masthead, the lava lamps. And this is not to mention the various amusing pastimes – the “Google Whack” game in which you try to find two words that cohabit only one of the search engine’s eight billion web pages; or every writer’s guilty pleasure, the Googling of the self, the “auto-Google,” that delicious act of cyber-onanism.
But where might it lead? One day, when I open my fridge, might a sensor not read my searching eye and know that I am looking for milk? And knowing that I have run out, suggest an array of retailers who might be able to replenish my supply? Could Google come to mediate every exchange of information, no matter how inane, or how carnal?
Or could it come to resemble something like the Central Intelligence Corporation in Neal Stephenson’s Snow Crash – a cross between the CIA, the Library of Congress, and DARPA’s “Total Information Awareness” program?
MercuryNews.com | 12/14/2004 | Does Google move augur commercialization of libraries?