I just came across the pre-pub materials for a book, due out this November from the University of Chicago Press, by Jean-Noël Jeanneney, president of the Bibliothèque nationale de France and famous critic of the Google Library Project. You’ll remember that within months of Google’s announcement of partnership with a high-powered library quintet (Oxford, Harvard, Michigan, Stanford and the New York Public), Jeanneney issued a battle cry across Europe, warning that Google, far from creating a universal world library, would end up cementing Anglo-American cultural hegemony across the internet, eroding European cultural heritages through the insidious linguistic uniformity of its database. The alarm woke Jacques Chirac, who, in turn, lit a fire under all the nations of the EU, leading them to draw up plans for a European Digital Library. A digitization space race had begun between the private enterprises of the US and the public bureaucracies of Europe.
Now Jeanneney has funneled his concerns into a 96-page treatise called Google and the Myth of Universal Knowledge: A View from Europe. The original French version is pictured above. From U. Chicago:
Jeanneney argues that Google’s unsystematic digitization of books from a few partner libraries and its reliance on works written mostly in English constitute acts of selection that can only extend the dominance of American culture abroad. This danger is made evident by a Google book search the author discusses here–one run on Hugo, Cervantes, Dante, and Goethe that resulted in just one non-English edition, and a German translation of Hugo at that. An archive that can so easily slight the masters of European literature–and whose development is driven by commercial interests–cannot provide the foundation for a universal library.
Now I’m no big lover of Google, but there are a few problems with this critique, at least as summarized by the publisher. First of all, Google is just barely into its scanning efforts, so naturally, search results will often come up threadbare or poorly proportioned. But there’s more that complicates Jeanneney’s charges of cultural imperialism. Last October, when the copyright debate over Google’s ambitions was heating up, I received an informative comment on one of my posts from a reader at the Online Computer Library Center. They had recently completed a profile of the collections of the five Google partner libraries, and had found, among other things, that just under half of the books that could make their way into Google’s database are in English:
More than 430 languages were identified in the Google 5 combined collection. English-language materials represent slightly less than half of the books in this collection; German-, French-, and Spanish-language materials account for about a quarter of the remaining books, with the rest scattered over a wide variety of languages. At first sight this seems a strange result: the distribution between English and non-English books would be more weighted to the former in any one of the library collections. However, as the collections are brought together there is greater redundancy among the English books.
Still, the “driven by commercial interests” part of Jeanneney’s attack is important and on-target. I worry less about the dominance of any single language (I assume Google wants to get its scanners on all books in all tongues), and more about the distorting power of the market on the rankings and accessibility of future collections, not to mention the effect on the privacy of users, whose search profiles become company assets. France tends much further toward the enlightenment end of the cultural policy scale — witness what they (almost) achieved with their anti-DRM iTunes interoperability legislation. Can you imagine James Billington, of our own Library of Congress, asserting such leadership on the future of digital collections? The LOC’s feeble World Digital Library effort is a mere afterthought to what Google and its commercial rivals are doing (the project has even received funding from Google). Most public debate in this country is also of the afterthought variety. The privatization of public knowledge plows ahead, and yet few complain. Good for Jeanneney and the French for piping up.
I do think that the fear of English-dominated digital archives is very real. As I have argued, among my peers and elsewhere, academics who work in non-English languages – Arabic, Sanskrit, Thai, etc. – have a right and a responsibility to make sure that mass-digitization projects do not leave those archives behind.
The issue isn’t merely scanning such materials but incorporating correct metadata and search criteria. How does one input Arabic search terms? Display embedded ones?
The crucial thing that Jeanneney raises for me is “unsystematic digitization,” which I read as “unguided,” even. The onus, to be clear, is not necessarily on Google but on academics and libraries to give non-English archives the priority that they are NOT receiving at the moment.
That over 400 languages are represented sounds kinda cool, until one looks at the distribution and the percentages: predominantly European texts.
For what it’s worth, it’s not very hard to search in Arabic: widespread use of Unicode has made that simple. If you’re on a modern computer system, you can type in Arabic without too much trouble anywhere. The problem, I think, would be OCRing Arabic manuscripts: calligraphic writing is far more common in Arabic than in any Roman script, so after scanning, manual entry and cleanup of the text would take more time.
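To make that concrete, here is a minimal sketch in plain Python (no search engine involved; the titles and query are invented for illustration): once text is stored as Unicode, matching an Arabic query is the same string operation as matching an English one.

```python
# Minimal sketch: searching Unicode Arabic text with ordinary string
# operations. The catalog entries and the query are invented examples.

catalog = [
    "War and Peace",
    "ألف ليلة وليلة",    # "One Thousand and One Nights"
    "الحرب والسلم",      # "War and Peace", in Arabic
]

query = "ليلة"           # "layla" ("night")

# Substring matching works on Arabic exactly as it does on English.
print([title for title in catalog if query in title])
# -> ['ألف ليلة وليلة']
```

As the comment says, the hard part is upstream: getting accurate Unicode text out of scanned calligraphic pages in the first place.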
The Times Literary Supplement just published an article by Lawrence Venuti nicely summing up the “trade deficit” in culture between the U.S. and non-English speaking countries. This isn’t just a digital problem: it’s a broader cultural issue that deserves further exploration.
It does seem that the question of digitization has been defined, through West-centric media coverage and general myopia, within a transatlantic paradigm. Clearly, the cultural trade deficit is much broader and more dire, and, as you say, Dan, not just a digital problem.
I wonder if countries with weaker intellectual property enforcement are better poised to fix the imbalance through do-it-yourself efforts. I know that in Russia, for example, there is a vast trove of Russian literature, both classic and contemporary, that people have just typed up or scanned by the sweat of their brow. Copyright is not taken too seriously over there, so people just toss up whatever they can get their hands on without much concern. The problem is it’s totally unsystematic — utterly unguided. Metadata standards? Not likely.
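To make “metadata standards” concrete, here is a hypothetical minimal record in the Dublin Core element set, the kind of structured description these typed-up texts usually lack (the field names are standard Dublin Core; the values are invented):

```python
# A hypothetical minimal bibliographic record using Dublin Core element
# names. Values are invented for illustration.

record = {
    "dc:title":    "Отцы и дети",               # "Fathers and Sons"
    "dc:creator":  "Тургенев, Иван Сергеевич",  # Ivan Turgenev
    "dc:language": "ru",
    "dc:date":     "1862",
    "dc:type":     "Text",
    "dc:source":   "unspecified print edition",
}
```

Without even this much, a scanned or retyped text is findable only by accident.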
What we really need is global insurrection by pirate librarians.
Hmmm, but what you’re asking for is a coordinated insurrection by the pirate librarians, so that they’d have standards. A tricky proposition…
“For what it’s worth, it’s not very hard to search in Arabic: widespread use of Unicode has made that simple. If you’re on a modern computer system, you can type in Arabic without too much trouble anywhere”
Yes, I wish it were as easy as typing a Unicode query. Here, for example, is a query for “No” in Arabic – a common enough word. Result: nada. Here is one for harb, or “war.” Result: bupkis, again.
So, yeah.
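One plausible culprit, offered as a hypothetical sketch rather than a claim about how Google actually indexes Arabic: exact matching fails whenever the stored text and the query differ in diacritics (tashkeel) or letter variants, so an Arabic-aware index has to normalize both sides before comparing.

```python
import re
import unicodedata

# Hypothetical normalizer illustrating why naive Arabic matching fails:
# the same word can be stored with or without short-vowel marks, tatweel,
# or with different alef forms. (Invented example, not Google's pipeline.)

MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")  # tashkeel + tatweel

def normalize_arabic(text: str) -> str:
    text = unicodedata.normalize("NFC", text)
    text = MARKS.sub("", text)                             # strip vowel marks
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # unify alef forms
    return text

# "حَرْب" (harb, with vowel marks) vs. bare "حرب": an exact comparison
# fails, a normalized one succeeds.
print("حَرْب" == "حرب")                                       # False
print(normalize_arabic("حَرْب") == normalize_arabic("حرب"))   # True
```

If the index skips this step, a perfectly well-formed Unicode query can come back empty, which is consistent with the results above.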
If digital imaging of print libraries does not mirror the processes of collection development that enabled print libraries, then little issues of cultural bias are irrelevant. Screen presentation is a processing action that needs only tagged accumulation, without regard to collection scope, coherency, or quality. Even the collection items themselves are dissolved into word frequencies, search terms, and tagged images.
Perhaps the French are really concerned with democracy. Imaging print libraries for surrogate access raises the same issues as electronic voting, but electronic access to print risks more than a corrupted election: the arbitration, fragmentation, mutation, and censorship of knowledge can be invisible to the digital researcher.
And incidentally, the page-paced capture of retrospective print will never keep up. While publishers will issue simultaneous paper/screen editions, newly public-domain works will forever lag behind transmitted digital text; after the 70-year term lapses, only the paper version will be left. Oh, and don’t hold your breath on the retrospective capture of even the first 1% of print collections. And there is also the possibility that print is not obsolete.
Advocates enthused about dynamic change in the transmission of knowledge should consider the implications of their stance. In the knowledge-transmission business, print has already assimilated, and been driven by, the digital revolution, and is positioned for new behaviors of bionic reading beyond the screen.
the myth of universal knowledge 2: hyper-nodes and one-way flows
My post a couple of weeks ago about Jean-Noël Jeanneney’s soon-to-be-released anti-Google polemic sparked a discussion here about the cultural trade deficit and the linguistic diversity (or lack thereof) of digital collections. Around that time, Rüd…
Nobody is perfect, and you’ve got to start somewhere. Google and all the participating institutions deserve applause for being the first to “eat a crab.”
In the November issue of the Smart Libraries newsletter, Tom Peters wrote an article titled “Google Infiltrates Wisconsin, Spain, India, and News Archives.” It says: “…in mid-October, the University of Wisconsin joined the project, and approximately a month prior, Google announced that Complutense University of Madrid had joined too. Complutense has the second largest collection in Spain and the largest academic collection. In addition to obvious strengths in Spanish-language books, the Complutense collections are also strong in other languages.”
Don’t worry, it’s coming!