Category Archives: gutenberg

“the bookish character of books”: how google’s romanticism falls short

[image: Tristram Shandy in Google Book Search]
Check out, if you haven't already, Paul Duguid's witty and incisive exposé of the pitfalls of searching for Tristram Shandy in Google Book Search, an exercise which throws many of the inadequacies of the world's leading digitization program into relief. By Duguid's own admission, Laurence Sterne's legendary experimental novel is an idiosyncratic choice, but its many typographic and structural oddities make it a particularly useful lens through which to examine the challenges of migrating books successfully to the digital domain. This follows a similar examination Duguid carried out last year with the same text in Project Gutenberg, an experience which he said revealed the limitations of peer production in generating high-quality digital editions (also see Dan's own take on this in an older if:book post). This study focuses on the problems of inheritance as a mode of quality assurance, in this case the bequeathing of large authoritative collections by elite institutions to the Google digitization enterprise. Does simply digitizing these – books, imprimaturs and all – automatically result in an authoritative bibliographic resource?
Duguid suggests not. The process of migrating analog works to the digital environment in a way that respects the originals yet fully integrates them into the networked world is trickier than simply scanning and dumping into a database. The Shandy study shows in detail how Google's ambition to organize the world's books and make them universally accessible and useful (to slightly adapt Google's mission statement) is being carried out in a hasty, slipshod manner, leading to a serious deficit in quality in what could eventually become, for better or worse, the world's library. Duguid is hardly the first to point this out, but the intense focus of his case study is valuable and serves as a useful counterpoint to the technoromantic visions of Google boosters such as Kevin Kelly, who predict a new electronic book culture liberated by search engines, in which readers are free to find, remix and recombine texts in various ways. While this networked bibliotopia sounds attractive, it's conceived primarily from the standpoint of technology and not well grounded in the particulars of books. What works as snappy Web 2.0 buzz doesn't necessarily hold up in practice.
As is so often the case, the devil is in the details, and it is precisely the details that Google seems to have overlooked, or rather sprinted past. Sloppy scanning and the blithe discarding of organizational and metadata schemes meticulously devised through centuries of librarianship might indeed make the books "universally accessible" (or close to that), but the "and useful" part of the equation could go unrealized. As we build the future, it's worth pondering what parts of the past we want to hold on to. It's going to have to be a slower and more painstaking process than Google (and, ironically, the partner libraries who have rushed headlong into these deals) might be prepared to undertake. Duguid:

The Google Books Project is no doubt an important, in many ways invaluable, project. It is also, on the brief evidence given here, a highly problematic one. Relying on the power of its search tools, Google has ignored elemental metadata, such as volume numbers. The quality of its scanning (and so we may presume its searching) is at times completely inadequate. The editions offered (by search or by sale) are, at best, regrettable. Curiously, this suggests to me that it may be Google’s technicians, and not librarians, who are the great romanticisers of the book. Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don’t submit equally to a standard shelf, a standard scanner, or a standard ontology. Nor are their constraints overcome by scraping the text and developing search algorithms. Such strategies can undoubtedly be helpful, but in trying to do away with fairly simple constraints (like volumes), these strategies underestimate how a book’s rigidities are often simultaneously resources deeply implicated in the ways in which authors and publishers sought to create the content, meaning, and significance that Google now seeks to liberate. Even with some of the best search and scanning technology in the world behind you, it is unwise to ignore the bookish character of books. More generally, transferring any complex communicative artifacts between generations of technology is always likely to be more problematic than automatic.

Also take a look at Peter Brantley’s thoughts on Duguid:

Ultimately, whether or not Google Book Search is a useful tool will hinge in no small part on the ability of its engineers to provoke among themselves a more thorough, and less alchemic, appreciation for the materials they are attempting to transmute from paper to gold.

johannes who?

[image: the oldest existing book printed with metal movable type]
This is the oldest existing document in the world printed with metal movable type: an anthology of Zen teachings, Goryeo Dynasty, Korea… 1377. It's a little-known fact, at least in the West, that metal movable type was first developed in Korea circa 1230, over 200 years before that goldsmith from Mainz came on the scene. I saw this today in the National Library of Korea in Seoul (more on that soon). This book is actually a reproduction; the original resides in Paris and is the subject of a bitter dispute between the French and Korean governments.

google and the future of print

Veteran editor and publisher Jason Epstein, the man credited with introducing the trade paperback to American readers, discusses recent Google-related books (by John Battelle, Jean-Noël Jeanneney, David Vise, etc.) in the New York Review, and takes the opportunity to promote his own vision for the future of publishing. As if to reassure the Updikes of the world, Epstein insists that the "sparkling cloud of snippets" unleashed by Google's mass digitization of libraries will, in combination with a radically decentralized print-on-demand infrastructure, guarantee a bright future for paper books:

[Google cofounder Larry] Page’s original conception for Google Book Search seems to have been that books, like the manuals he needed in high school, are data mines which users can search as they search the Web. But most books, unlike manuals, dictionaries, almanacs, cookbooks, scholarly journals, student trots, and so on, cannot be adequately represented by Googling such subjects as Achilles/wrath or Othello/jealousy or Ahab/whales. The Iliad, the plays of Shakespeare, Moby-Dick are themselves information to be read and pondered in their entirety. As digitization and its long tail adjust to the norms of human nature this misconception will cure itself as will the related error that books transmitted electronically will necessarily be read on electronic devices.

Epstein predicts that in the near future nearly all books will be located and accessed through a universal digital library (such as Google and its competitors are building), and, when desired, delivered directly to readers around the world — made to order, one at a time — through printing machines no bigger than a Xerox copier or ATM, which you’ll find at your local library or Kinkos, or maybe eventually in your home.
[image: the Espresso Book Machine]
Predicated on the "long tail" paradigm of sustained low-amplitude sales over time (known in book publishing as the backlist), these machines would, according to Epstein, replace the publishing system that has been in place since Gutenberg, eliminating the intermediate steps of bulk printing, warehousing, and retail distribution, and reversing the recent trend of consolidation that has depleted print culture and turned the book business into a blockbuster market.
Epstein has founded a new company, OnDemand Books, to realize this vision, and earlier this year, they installed test versions of the new “Espresso Book Machine” (pictured) — capable of producing a trade paperback in ten minutes — at the World Bank in Washington and (with no small measure of symbolism) at the Library of Alexandria in Egypt.
Epstein is confident that, with a print publishing system as distributed and (nearly) instantaneous as the internet, the codex book will persist as the dominant reading mode far into the digital age.

mapping books

Gutenkarte, a project from MetaCarta, is an effort to map books. The website takes the text of books from Project Gutenberg, searches it for place names, and plots them on a map of the world using MetaCarta's own GeoParser API, creating an astonishing visualization of the world described in a text. Here, for example, is a map of Edward Gibbon's Decline and Fall of the Roman Empire:

[image: Gutenkarte map of Gibbon's Decline and Fall of the Roman Empire]

(Click on the picture to view the live map.) It’s not perfect yet: note that “china” is in the Ivory Coast, and “Asia” seems to be located just off the coast of Cameroon. But the map does give an immediate sense of the range of Gibbon’s book: in this case, the extent of the Roman world. The project is still in its infancy: eventually, users will be able to correct mistakes.
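
Under the hood, the idea is simple enough, even if doing it well at scale is not. Here is a minimal sketch of that kind of pipeline in Python: pull down a Project Gutenberg plain text, count the mentions of place names from a small hand-made gazetteer, and attach coordinates that a mapping tool could plot. The URL, the gazetteer, and the coordinates below are illustrative stand-ins, and the sketch does not use MetaCarta's actual GeoParser API, which handles extraction and disambiguation far more seriously.

```python
# Toy version of a Gutenkarte-style pipeline (illustrative only):
# fetch a Project Gutenberg text, count place-name mentions from a
# small hand-made gazetteer, and emit coordinates a map could plot.

import re
import urllib.request
from collections import Counter

# Hypothetical Gutenberg plain-text URL; substitute any real one.
TEXT_URL = "https://www.gutenberg.org/files/0000/0000-0.txt"

# A toy gazetteer; a real geoparser resolves names against millions
# of entries and disambiguates them in context.
GAZETTEER = {
    "Rome": (41.9, 12.5),
    "Constantinople": (41.0, 28.9),
    "Alexandria": (31.2, 29.9),
    "Carthage": (36.9, 10.3),
    "Antioch": (36.2, 36.2),
}

def fetch_text(url: str) -> str:
    """Download the book as plain text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="ignore")

def count_places(text: str) -> Counter:
    """Count occurrences of each known place name in the text."""
    counts = Counter()
    for name in GAZETTEER:
        counts[name] = len(re.findall(r"\b%s\b" % re.escape(name), text))
    return counts

if __name__ == "__main__":
    text = fetch_text(TEXT_URL)
    for name, n in count_places(text).most_common():
        lat, lon = GAZETTEER[name]
        # A real version would hand these to a mapping library or API.
        print(f"{name}: {n} mentions at ({lat}, {lon})")
```

Even this crude version makes the China-in-the-Ivory-Coast problem easy to imagine: matching a string is trivial; deciding which of the world's many places it actually names is not.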

Gutenkarte suggests ways of looking at texts not dissimilar to those of Franco Moretti, who argued in last year's Graphs, Maps, Trees: Abstract Models for Literary History (discussed by The Valve here) that mapping the places represented in literature could afford a new way of reading texts. Here, for example, is a map he constructed of Parisian love affairs in the novel, demonstrating that lovers were usually separated by the Seine:

[image: Moretti's map of Parisian love affairs in the novel]

(from the “Maps” chapter, online here if you have university access to the New Left Review.) Moretti constructed his maps by hand, with the help of grad student labor; it will be interesting to see if Gutenkarte will make this sort of visualization accessible to all.

a book is not a text: the noise made by people

[image: the frontispiece for Tristram Shandy]
Momus – a.k.a. Nick Currie, electronic folk musician, Wired columnist, and inveterate blogger – has posted an interesting short video on his blog, Click Opera. He's teaching a class on electronic music composition and narrative for Benetton's Fabrica in Venice. His video encourages students to listen for the environmental sounds that electronic instruments make: not the sounds they're designed to make, but the incidental noises – the clicking of keys on a PowerBook, for example – that we usually ignore as being just that, incidental. We ignore the fact that these noises are made directly by people, without the machine's intercession.

Momus's remarks put me in mind of something said by Jerome McGann at the Transliteracies conference in Santa Barbara last June – maybe the most important thing said at the conference, even if it didn't attract much attention at the time. What we tend to forget when talking about reading, he said, was that books – even regular old print books – are full of metadata. (Everybody was talking about metadata in June, just as they were talking about XML a couple of years ago – it was the buzzword everyone knew they needed to have an opinion about; those who didn't swung the word about feverishly in the hopes of hitting something.) McGann qualified his remarks by referring to Ezra Pound's idea of melopoeia, phanopoeia, and logopoeia – specific qualities in language that make it evocative:

. . . you can still charge words with meaning mainly in three ways, called phanopoeia, melopoeia, logopoeia. You can use a word to throw a visual image on to the reader’s imagination, or you charge it by sound, or you use groups of words to do this.

(The ABC of Reading, p.37) In other words, words aren’t always just words: when used well, they refer beyond themselves. This process of referring, McGann was claiming, is a sort of metadata, even if technologists don’t think about it this way: the way in which words are used provides the attuned reader with information about their composition beyond the meaning of the words themselves.

But thinking about McGann’s comments in terms of book design might suggest wider implications for the future of the book. Let’s take a quick excursion to the past of the book. Once it was true that you couldn’t judge a book by its cover. Fifty years ago, master book designer Jan Tschichold opined about book jackets:

A jacket is not an actual part of the book. The essential portion is the inner book, the block of pages . . . [U]nless he is a collector of book jackets as samples of graphic art, the genuine reader discards it before he begins.

(“Jacket and Wrapper,” in The Form of the Book: Essays on the Morality of Good Design) Tschichold’s statement seems bizarre today: nobody throws away book jackets, especially not collectors. Why? Because today we take it for granted that we judge books by their covers. The cover has been subsumed into our idea of the book: it’s a signifying part of the book. By looking at a cover, you, the prospective book-buyer, can immediately tell if a recently-published piece of fiction is meant to be capital-L Literature, Nora Roberts-style fluff, or somewhere in between. Contextual details like the cover are increasingly important.

Where does the electronic book fit into this, if at all? Apologists for the electronic book constantly harp on the need for an ideal device as the be-all and end-all: when we have e-Ink or e-Paper and a well-designed device that can be unrolled like a scroll, electronic books will suddenly take off. I don't think that's true, and the reason has something to do with the way people read books, something that hasn't been taken into account by soi-disant futurists, and something like what Jerome McGann was gesturing at. A book is not a text. It's more than a text. It's a text and a collection of information around that text, some of which we consciously recognize and some of which we don't.

A few days ago, I excoriated Project Gutenberg's version of Tristram Shandy. This is why: a library of texts is not the same thing as a library of books. A quick example: download, if you wish, the plain text or HTML version of Tristram Shandy, which you can get here. Look at the opening pages of the HTML version. Recognizing that this particular book needs to be more than plain old seven-bit ASCII, Project Gutenberg has included scans of the engravings that appear in the book (some by William Hogarth, like this; a nice explication of this quality of the book can be found here). What's interesting to me about these illustrations is how poorly done they are. These are – let's not beat around the bush – bad scans. The contrast is off; things that should be square look rectangular. The Greek on the title page is illegible.

Let's go back to Momus listening to the unintentional noises made by humans using machines: what we have here is the debris of another noisy computer, the noise of a key that we weren't supposed to notice. Something about the look of these scans is dated in a very particular way – half of the internet looked like this in 1997, before everyone learned to use Photoshop properly. Which is when, in fact, this particular document was constructed. In this ugliness we have, unintentionally, humanity. John Ruskin (not a name often conjured with when talking about the future) declared that one of the hallmarks of the Gothic as an architectural style was a perceived "savageness": it was not smoothed off as his Victorian contemporaries would have liked. But "savageness", for him, was no reproach: instead, it was a trace of the labor that went into the work, a trace of its humanity. Perfection, for him, was inhumane: humanity

. . . was not intended to work with the accuracy of tools, to be precise and perfect in all their actions. If you will have that precision out of them, and make their fingers measure degrees like cog-wheels, and their arms strike curves like compasses, you must unhumanize them . . .

(The Stones of Venice) What we have here is, I think, something similar. While Project Gutenberg is probably ashamed of the quality of these graphics, there’s something to be appreciated here. This is a text on its way to becoming a book; it unintentionally reveals its human origins, the labor of the anonymous worker who scanned in the illustrations. It’s a step in the right direction, but there’s a great distance still to go.