a dictionary in transition

oed.jpg
James Gleick had a fascinating piece in the Times Sunday magazine on how the Oxford English Dictionary is reinventing itself in the digital age. The O.E.D. has always had to keep up with a rapidly evolving English language. It took over 60 years and two major supplements to arrive at a second edition in 1989, around the same time Tim Berners-Lee and others at the CERN particle physics lab in Switzerland were coming up with the World Wide Web. Ever since then, the O.E.D. has been hard at work on a third edition but under radically different conditions. Now not only the language but the forms in which the language is transmitted are in an extreme state of flux:

In its early days, the O.E.D. found words almost exclusively in books; it was a record of the formal written language. No longer. The language upon which the lexicographers eavesdrop is larger, wilder and more amorphous; it is a great, swirling, expanding cloud of messaging and speech: newspapers, magazines, pamphlets; menus and business memos; Internet news groups and chat-room conversations; and television and radio broadcasts.

Crucial to this massive language research program is a vast alphabet soup known as the Oxford English Corpus, a growing database of more than a billion words, culled mostly from the web, which O.E.D. lexicographers analyze through various programs that compare and contrast contemporary word usages in contexts ranging from novels and academic papers to teen chat rooms and fan sites. Together this data comprises what the O.E.D. calls “the fullest, most accurate picture of the language today” (I’m curious to know how broadly they survey the world’s general adoption of English. I’m under the impression that it’s still largely an Anglo-American affair).
Marshall McLuhan famously summarized the shift from oral tradition to the written word as “an eye for an ear”: a general migration of thought and expression away from the folkloric soundscapes of tribal society toward encounters by individuals with visual symbols on a page, a movement that climaxed in the age of print, and which McLuhan saw at last reversed in the global village of electronic mass media. The curious thing that McLuhan did not live long enough to witness was the fusion of eye-ear cultures in the fast-moving textual traditions of cell phones and the Internet. Written language has acquired an immediacy and a malleability almost matching oral speech, and the effect is a disorienting blurring of boundaries where writing is almost the same as speaking, reading more like overhearing.
So what is a dictionary to do? Or be? Such fundamental change in the process of maintaining “the definitive record of the English language” must have an effect on the product. Might the third “edition” be its final never-ending one? Gleick again:

No one can say for sure whether O.E.D.3 will ever be published in paper and ink. By the point of decision, not before 20 years or so, it will have doubled in size yet again. In the meantime, it is materializing before the world’s eyes, bit by bit, online. It is a thoroughgoing revision of the entire text. Whereas the second edition just added new words and new usages to the original entries, the current project is researching and revising from scratch — preserving the history but aiming at a more coherent whole.

They’ve even experimented with bringing readers into the process, working with the BBC earlier this year to solicit public aid in locating first usages for a list of particularly hard-to-trace words. One wonders how far they’d go in this direction. It’s one thing to let people contribute at the edges — the 50 words in that list are all from the 20th century — but to open the full source code is quite another. It seems the dictionary’s challenge is to remain a sturdy ark for the English language during this period of flood, and to proceed under the assumption that we may have seen the last of the land.
(image by Kenneth Moyle)

what we talk about when we talk about books

I spent the past weekend at the Fourth International Conference on the Book, hosted by Emerson College in Boston this year. I was there for a conversation with Sven Birkerts (author of The Gutenberg Elegies) which happened to kick off the conference. The two of us had been invited to discuss the future of the book, which is a great deal to talk about in an hour. While Sven was cast as the old curmudgeon and I was the Young Turk, I’m not sure that our positions are that dissimilar. We both value books highly, though I think my definition of book is a good deal broader than his. Instead of a single future of the book, I suggested that we need to be talking about futures of the book.
This conciliatory note inadvertently described the conference as a whole, the schedule of which can be inspected here. The subjects discussed wandered all over the place, from people trying to carry out studies on how well students learned with an ebook device to a frankly reactionary presentation of book art. Bob Young of Lulu proclaimed the value of print on demand for authors; Jason Epstein proclaimed the value of print on demand for publishers. Publishers wondered whether the recent rash of falsified memoirs would hurt sales. Educators talked about the need for DRM to encrypt online texts. There was a talk on using animals to deliver books which I’m very sorry that I missed. A Derridean examination of the paratexts of Don Quixote pointed out that for Cervantes, the idea of publishing a book – as opposed to writing one – suggested death, perhaps what I’d been trying to argue last week.
Everyone involved was dealing with books in some way or another; a spectrum could be drawn from those who were talking about the physical form of the book to those who were talking about the content of the book entirely removed from the physical. These are two wildly different things, which made this a disorienting conference. The cumulative effect was something like what you would get if you convened a conference on people and had a session with theologians arguing about the soul in one room while in another room a bunch of body builders tried to decide who was the most attractive. Similarly, everyone at the Conference on the Book had something to do with books; however, many people weren’t speaking the same language.
This isn’t necessarily their fault. One of the most apt presentations was by Catherine Zekri of the Université de Montréal, who attempted to decipher exactly what a “book” was from usage. She noted the confusion between the object of the book and its contents, and pointed out that this confusion carried over into the electronic realm, where “ebook” can either mean a device (like the Sony Reader) or the text that’s being read on the device. A thirty-minute session wasn’t nearly long enough to suss out the differences being talked about, and I’ll be interested to read her paper when it’s finally published.
As an experiment paralleling Zekri’s, here are three objects:

threebooks.png

There are certain similarities all of these objects share: they’re all made of paper and have a cover and pages. Some similarities are only shared by some of the objects: what’s the best way of grouping these? Three relationships seem possible. Objects 1 & 2 were bought containing text; object 3 was blank when bought, though I’ve written in it since. Objects 2 & 3 are bound by staples; object 1 is bound by glue. Objects 1 & 3 were written by a single person (Maurice Blanchot in the case of 1, myself in the case of 3); object 2 was written by a number of people.
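For anyone who would rather see those three relationships laid out mechanically, here is a minimal sketch in Python. The attribute names (`bought_with_text`, `binding`, `single_author`) are labels I've made up for the distinctions above, not any standard vocabulary, and the attributes all three objects share (paper, cover, pages) are left out since they don't distinguish anything.

```python
from itertools import combinations

# The three objects, described only by the attributes on which they differ.
# The key names are hypothetical labels for this illustration.
objects = {
    1: {"bought_with_text": True, "binding": "glue", "single_author": True},
    2: {"bought_with_text": True, "binding": "staples", "single_author": False},
    3: {"bought_with_text": False, "binding": "staples", "single_author": True},
}


def shared_attributes(a, b):
    """Return the attribute names on which objects a and b agree."""
    return [key for key in objects[a] if objects[a][key] == objects[b][key]]


# Compare every pair of objects and record what each pair has in common.
pairings = {pair: shared_attributes(*pair) for pair in combinations(objects, 2)}

for pair, shared in pairings.items():
    print(pair, shared)
```

Each pair turns out to agree on exactly one attribute and no pair agrees on two, so the attributes alone don't favor any one grouping over the others.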
If we were to classify these objects, how would we do it? Linguistically, the decision has already been made: object 1 is a book, object 2 is a magazine, and object 3 is a notebook, which is, the Oxford English Dictionary says, “a small book with blank or ruled pages for writing notes in”. By the words we use to describe them, objects 1 & 3 are books. A magazine isn’t a book: it’s “a periodical publication containing articles by various writers” (the OED again). This much seems intuitive: a magazine isn’t a book. It’s a magazine.
But why isn’t a magazine a book, especially if a notebook is a book? If you look again at the relationships I suggested between the three objects above, the attributes shared by the book and the magazine seem more logical and important than those shared by the book and the notebook. Why don’t we think of a magazine as a book? To use the language of evolutionary biology, the word “book” seems to be a paraphyletic taxon, a group of descendants from a common ancestor that excludes other descendants from the same ancestor.
One answer might be that a single issue of a magazine isn’t complete; rather, it is part of a sequence in time, a sequence which can be called a magazine just as easily as a single issue can. I can say that I’ve read a book, which presumably means that I’ve read and understood every word in it. I can say the same thing about a particular issue of The Atlantic (“I read that magazine.”). I can’t say the same thing about the entire run of The Atlantic, which started long before I was born and continues today. A complete edition of The Atlantic might be closer to a library than a book. Or maybe the problem is time: the date on the cover foregrounds a magazine’s existence in time in a way that a book’s existence in time isn’t something we usually think about.
To expand this: I looked up these definitions in the online OED, where the dictionary exists as a database that can be queried. Is this a book? I have a single-volume OED at home with much the same content, though the online version has changed since the print edition: it points out that since 1983, the word “notebook” can also mean a portable computer. My copy of the OED at home is clearly a book; is the online edition, with its evolving content, also a book? (A stylistic question: we italicize the title of a book when we use it in text – do we italicize the title of a database?)
We’ve been calling things like Wikipedia, which goes even further than the online OED in terms of its mutability over time, a “networked book”. But even with much simpler online projects, issues arise: take Gamer Theory, for example. If much of the content that appears on the Gamer Theory website also appears in Harvard University Press’s version of the book, most people would agree that the online version is a book, or a draft of one. But what are the boundaries of this kind of book? Are the comments on the website part of the book? Is the forum part of the book? Are the spam comments that we deleted from the forum part of the book? This also has something to do with Bob’s post on Monday, where he wondered how sharply defined the authorial voice of a book needs to be to make it worthwhile as a book.
What we have here is a language problem: the forms that we can create are evolving faster than our language – and possibly our understanding – can keep up with them.