librarians, hold google accountable

I’m quite disappointed by this op-ed on Google’s library initiative in Tuesday’s Washington Post. It comes from Richard Ekman, president of the Council of Independent Colleges, which represents 570 independent colleges and universities in the US (and a few abroad). Generally, these are mid-tier schools, not the elite powerhouses Google has partnered with in its digitization efforts. Since Ekman is neither a publisher nor a direct representative of one of the cooperating libraries, I expected he might take a more measured approach to this issue, which usually elicits either ecstatic support or vociferous opposition. Alas, no.

Emmanuel d’Alzon Library, Assumption College, Worcester MA

To the opposition, namely the publishing industry, Ekman offers the usual rationale: Google, by digitizing the collections of six of the English-speaking world’s leading libraries (with more, presumably, to follow), is doing humanity a great service while still fundamentally respecting copyrights, so let’s not stand in its way. With Google, however, and with his own peers in education, he is less exacting.

“The nation’s colleges and universities should support Google’s controversial project to digitize great libraries and offer books online. It has the potential to do a lot of good for higher education in this country.”

Now, I’ve poked around a bit and located the agreement between Google and the U. of Michigan (freely available online), which affords a keyhole view onto these grand bargains. Basically, Google scans U. of M.’s books and gives the library the images and optical character recognition files (the texts gleaned from the scans) for use within its library system, while keeping the same for its own web services. In other words, both sides get a copy, both sides win.
If you’re not Michigan or Google, though, the benefits are less clear. Sure, it’s great that books now come up in web searches, and there’s plenty of good browsing to be done (and the public domain texts, available in full, are a real asset). But we’re in trouble if this is the research tool that is to replace, by force of market and by force of users’ habits, online library catalogues. That’s because no sane librarian would outsource their profession to an unaccountable private entity that refuses to disclose the workings of its system — in other words, how does Google’s book algorithm work, how are the search results ranked? And yet so many librarians are behind this plan. Am I to conclude that they’ve all gone insane? Or are they just so anxious about the pace of technological change, driven to distraction by fears of obsolescence and diminishing reach, that they are willing to throw their support uncritically behind the company, who, like a frontier huckster, promises miracle cures and grand visions of universal knowledge?

Allen Ginsberg Library, Naropa University, Boulder CO

We may be resigned to the steady takeover of college bookstores around the country by Barnes and Noble, but how do we feel about a Barnes and Noble-like entity taking over our library systems? Because that is essentially what is happening. We ought to consider the Google library pact as the latest chapter in a recent history of consolidation and conglomeratization in publishing, which, for the past few decades (probably longer, I need to look into this further) has been creeping insidiously into our institutions of higher learning. When Google struck its latest deal with the University of California, and its more than 100 libraries, it made headlines in the technology and education sections of newspapers, but it might just as well have appeared in the business pages under mergers and acquisitions.
So what? you say. Why shouldn’t leaders in technology and education seek each other out and forge mutually beneficial relationships, relationships that might yield substantial benefits for large numbers of people? Okay. But we have to consider how these deals among titans will remap the information landscape for the rest of us. There is a prevailing attitude today, evidenced by the simplistic public debate around this issue, that one must accept technological advances on the terms set by those making the advances. To question Google (and its collaborators) means being labeled reactionary, a dinosaur, or technophobic. But this is silly. Criticizing Google does not mean I am against digital libraries. To the contrary, I am wholeheartedly in favor of digital libraries, just the right kind of digital libraries.
What good is Google’s project if it does little more than enhance the world’s elite libraries and give Google the competitive edge in the search wars (not to mention positioning them in future ebook and print-on-demand markets)? Not just our little institute, but larger interest groups like the CIC ought to be voices of caution and moderation, celebrating these technological breakthroughs, but at the same time demanding that Google Book Search be more than a cushy quid pro quo between the powerful, with trickle-down benefits that are dubious at best. They should demand commitments from the big libraries to spread the digital wealth through cooperative web services, and from Google to abide by certain standards in its own web services, so that smaller librarians in smaller ponds (and the users they represent) can trust these fantastic and seductive new resources. But Ekman, who represents 570 of these smaller ponds, doesn’t raise any of these questions. He just joins the chorus of approval.

Main Library, Seeley G. Mudd Center, Oberlin College, Oberlin OH

What’s frustrating is that the partner libraries themselves are in the best position to make demands. After all, they have the books that Google wants, so they could easily set more stringent guidelines for how these resources are to be redeployed. But why should they be so magnanimous? Why should they demand that the wealth be shared among all institutions? If every student can access Harvard’s books with the click of a mouse, then what makes Harvard Harvard? Or Stanford Stanford?
Enlightened self-interest goes only so far. And so I repeat, that’s why people like Ekman, and organizations like the CIC, should be applying pressure to the Harvards and Stanfords, as should organizations like the Digital Library Federation, which the Michigan-Google contract mentions as a possible beneficiary, through “cooperative web services,” of the Google scanning. As stipulated in that section (4.4.2), however, any sharing with the DLF is left to Michigan’s “sole discretion.” Here, then, is a pressure point! And I’m sure there are others that a more skilled reader of such documents could locate. But a quick Google search (acceptable levels of irony) of “Digital Library Federation AND Google” yields nothing that even hints at any negotiations to this effect. Please, someone set me straight, I would love to be proved wrong.
Google, a private company, is in the process of annexing a major province of public knowledge, and we are allowing it to do so unchallenged. To call the publishers’ legal challenge a real challenge is to misidentify what really is at stake. Years from now, when Google, or something like it, exerts unimaginable influence over every aspect of our informated lives, we might look back on these skirmishes as the fatal turning point. So that’s why I turn to the librarians. Raise a ruckus.
UPDATE (8/25): The University of California-Google contract has just been released. See my post on this.

13 thoughts on “librarians, hold google accountable”

  1. bowerbird

    ben said:
    > That’s because no sane librarian would
    > outsource their profession to an unaccountable
    > private entity that refuses to disclose the workings
    > of its system — in other words, how does Google’s
    > book algorithm work, how are the search results ranked?
    ben, ben, ben, you used to be so right-on; what happened?
    first of all, if you dig a little bit, you will find that _many_
    librarian decision-makers have indeed outsourced lots of
    their operations to “unaccountable private entities” that
    not only refuse to disclose the workings of their systems,
    but which deliver systems that don’t really work too well.
    second, do you really expect google to reveal its methods?
    that’s so ridiculous it’s ludicrous. (sorry to be so frank.)
    i’m all in favor of librarians formulating search tools using
    an open-source methodology that’s fully transparent to all.
    in the absence of that, though, let’s not refuse everything else.
    and maybe i’m naive, but i believe the university of michigan
    (and the other libraries) will make their page-scans available
    to the general public, certainly for the public-domain books.
    after all, isn’t _availability_ exactly what librarians do?
    indeed, if i remember correctly, the official from the n.y.p.l.
    has already gone on-record as being committed to doing that.
    > Or are they just so anxious about the pace of technological change,
    > driven to distraction by fears of obsolescence and diminishing reach,
    > that they are willing to throw their support uncritically behind the company,
    > who, like a frontier huckster, promises miracle cures and grand visions
    > of universal knowledge?
    boy, the hyperbole is getting pretty deep in here.
    the university of michigan was working its own plan
    to digitize its library. they expected it to take 900 years.
    no, that’s not a typo. nine hundred years. 900. count ’em.
    dunno about you, but to me, that sounds like a long time.
    google offered to do it for them, for free, in about 5-10 years.
    to my mind, it would have been (completely) insane to say no.
    -bowerbird

  2. Gary Frost

    Ben,
    Magnificent stance! You have projected the first responsible commentary on the role of research libraries in the otherwise highly polarized contentions between paper based and screen based reading. Yes, the libraries will decide this thing and democratic, representative governance is an innocent bystander, either to survive or be extinguished depending on the responsible or irresponsible transmission of knowledge.
    Your position is a latent manifesto worthy of those interested in the future of the book!
    At least three foundations define the research library. These are the integrity of a conceptual work conveyed in book format, an academic classification of knowledge and professional assistance to scholarly research. All three of these foundations are now being dissolved and diluted by a delivery technology that should only enhance them. What’s gone wrong?
    Most media crazes of the recent past gravitated to non-library markets, but not this one. Dark ages once flourished outside of libraries; this one may be nurtured from within. The progressive and enlightened transmission of knowledge has distinguished librarians, but not lately.
    This time it’s going to take some post-digital direction such as yours.

  3. Michael Becker

    I have to agree with bowerbird and with the benefits that Google’s scanning service offers. Wishing that a purely independent, academic source could maintain our libraries is nice, but economic reality sets in. Google has the time and the initiative to digitize the books in a hurry, and we should be thankful for that.
    There is, of course, concern over how Google’s search algorithms work, and there should be. But the educated scholar who is using Google’s Book Search knows how to find what he or she is looking for, algorithms be damned. Rather than worry about the mysterious inner working of Google’s code, librarians should become intimately familiar with how to massage the search results, then pass those techniques on to library users.

  4. Gary Frost

    Don’t believe Google when they say they will scan, process and post out of print books in less than 900 years. They have done 100,000 so far and will probably give up before they reach a half million. The reason is that Google is not that interested in accessing books. Google is interested in assuming the role of research libraries. To do that in the screen based reading mode actually requires very little scope or coverage.
    Meanwhile they will have further divided the pie of resources that can be allocated to the acquisition, processing and accessing of print collections. And the libraries will be stuck with the expense of preserving the screen copies. From the library perspective digital libraries are not free.
    But this will not be the first time a massive reformatting agenda for print has proved costly. Remember microfilming? At least microfilm is relatively easy to preserve. The delivery of screen books will be the first time that a massive reformatting agenda is misrepresented as free.

  5. Jeff Ubois

    Good post. Digitization is one of the most exciting and challenging and promising developments in the library world, so to restrict librarians’ abilities to discuss digitization with each other, as Google’s NDAs require, sets a terrible precedent.
    Free discussion of pooling agreements, quality measurements, and the development of new finding aids is critical. Yet leading libraries are voluntarily opting out of that dialog by signing secret deals. Since when has it become chic and trendy for libraries, like UC’s, to require a FOIA request before sharing information about the disposition of assets purchased with public funds? That Microsoft, of all companies, is one of the good guys in book digitization speaks volumes about the quality of the deals Google is making.
    Bowerbird, it’s not necessary to require Google to disclose its operations. It is necessary that UC be free to cooperate with other libraries, and to use the scans it receives back from Google as it sees fit. Off campus sharing of Umich’s scans is contingent on approval from Google. The 900 year estimate from Umich should be seen in light of rapid and predictable technical progress; you know that in even ten years image processing will be faster, and disk storage will be cheaper (even if manual scanning isn’t). The libraries should resist being commodified, and recognize that their negotiating position is not so weak that they should be grateful for scraps from Google’s table.

  6. Joseph J. Esposito

    While we are discussing the various “responsibilities” libraries have, we should bear in mind that it is arguable that Google does not have the right to scan copyrighted material (this is what the current litigation is about); it is probable that libraries do not have the right to make the content of copyrighted books available to their institutions without a license, though this no doubt will end up in court; and it is highly improbable that libraries have the right to make this material available to other institutions or over the open Web. Thus the conversation here is taking the wrong turn. The question is not what is fair and whom it benefits. The question is who owns it. These are property rights. Whatever our views of the propriety of property rights (see John Willinsky on scholarly materials as a public good), it is as property rights that these matters will be resolved–no doubt in the courts.

  7. bowerbird

    gary said:
    > Don’t believe Google when they say they will
    > scan, process and post out of print books
    > in less than 900 years. They have done 100,000
    > so far and will probably give up
    > before they reach a half million.
    you’re wrong.
    ***
    jeff said:
    > The libraries should resist being
    > commodified, and recognize that their
    > negotiating position is not so weak that
    > they should be grateful for
    > scraps from Google’s table.
    “scraps”?
    the quality of the discussion here has
    taken a nose-dive. there are elements
    in the google-michigan agreement that
    could be the subject of serious debate,
    but nobody seems to be mentioning them.
    all i’m hearing is knee-jerk rhetoric…
    -bowerbird

  8. Gary Frost

    Ok, so say that Google can capture, process and post 10 million books in less than 900 years. Let’s say they do 3,000 per day for 365 days per year and finish in about ten years. Does anyone have an idea of where search engines will be in ten years? My guess is that the objectives of retrieval will have advanced far beyond the operational reading efficiency of print books. In fact advanced beyond the need for reference to books themselves.
    My guess is that Google will lose interest while research libraries will be left to clean up the mess including a generation of crippled scholars.

  9. bowerbird

    clean up the mess? crippled scholars?
    with 10 million books _quite_literally_
    at their fingertips (hey, here’s a clue,
    the book is the content, not the paper),
    the only thing i see that might “cripple”
    our scholars will be the fact that they
    don’t have to wander through the stacks
    looking for research materials any more,
    so their legs fall off for lack of use…
    -bowerbird

  10. Gary Frost

    “Expressing yourself in sentences and paragraphs rather than phrases and fragments was once understood as a mark of intellectual maturity. Like so many technological innovations (think of TV versus newsprint as a means of conveying accurate, or at least testable, information), PowerPoint in practice represents a regression rather than an advance. Much is lost in the regression. The reasoning that leads from one thought to the next, for instance, disappears in the flash between the end of one PowerPoint slide and the beginning of another.” Andrew Ferguson
    I have not met anyone who does not like to use PowerPoint, but few people who enjoy watching the presentations. PowerPoint format has taken over live presentation in much the same way that Google Book Search will take over book reading. Screen reading presents a string of bullets just like PowerPoint, crippling both assililation and comprehension.
    Let’s just revisit this issue ten years from now. Google has changed the name of this book screen service three times already. Processing to searchable text from innumerable fonts, capturing halftone illustrations without scanning artifacts, integrating 2 bit / 8 bit and color in single servable volumes and scaling foldouts are just a few of the challenges of screen drawing 19th century print. Remember, the book was the first multimedia format.

  11. bowerbird

    gary said:
    > Screen reading presents
    > a string of bullets just like PowerPoint,
    > crippling both assililation and comprehension.
    i do believe that some people have difficulty
    when it comes to reading and writing on-screen.
    especially those who grew up in a different age.
    but i believe that people who grow up in this age
    will have no difficulty using screens just as paper.
    i know i don’t have problems with it. (for instance,
    i noticed right away you misspelled “assimilation”.)
    and i’m no youngster either. heck, one reason that
    i prefer to read on-screen is because i can up-size
    the text so it’s easier to read for my old-man eyes.
    now, i _do_ have trouble understanding many of
    your points, gary; but i think that’s likely because
    you’re not doing a good job of explaining them.
    and i suspect that’s probably because there just
    isn’t that much truth-value in what you believe.
    but i’m happy to put this on hold for 10 years.
    meanwhile, as cory says, as more time goes by,
    more people are reading more and more material
    on-screen, and less and less off of paper…
    not that i’m rejecting the use of paper, mind you.
    for a lot of purposes, paper is extremely handy,
    and will remain so for a long time into the future.
    thus, my crystal ball says that the approach that
    maximizes the synergy between paper and screen
    will be the one that has the most long-term utility.
    that’s where my research interests have been for the
    last few years — fine-tuning that synergy.
    -bowerbird

  12. Gary Frost

    …er, Ok, mention three research methods that exemplify the synergy between paper and screen. Truly interdependent actions.
    I can name three. Say you were interested in early books printed in Peru (and you were in Peru). You could find a whole text from U Mich, an obscure 19th c. description of the collection at the 16th c. Recoleta convent library in Arequipa, as you were attempting to identify authentic Peruvian imprints on location. Let’s call this the hybrid navigator action. Then there is the reverse of this. Let’s say we read about sender oriented contrasted with recipient oriented communication and attempt to find examples in blog postings. We could call this cross mode confirmation. Then there is the point that you bring up about younger readers and older readers. (I happen to think this a timeless pair, one audio/visual directed and the other symbol/text directed.) We could ethnographically survey across modes.
    Of course I am just suggesting. What is the real taxonomy here? What are the actions of synergy between paper and screen? We are in complete accord; the synergy must occur and be sustained in the future between paper and screen.

  13. bowerbird

    gary said:
    > What are the actions of synergy
    > between paper and screen?
    are 10 years up already? :+)
    time flies when you’re having fun…
    by developing synergy between them,
    what i mean is that our electronic-books
    behave in similar ways to our paper-books,
    and in addition have all the qualities that
    accrue with digital objects in cyberspace,
    such as virtually cost-free reproduction
    and instant availability world-wide…
    i also mean a total fluidity between the
    two spaces, such that any e-book can be
    quickly and easily turned into a paper-book,
    and vice-versa, and the two are _identical_…
    (or at least can be made to _appear_ identical,
    which — since the paper-version is frozen —
    puts the entire burden on the e-version, which
    fortunately it is more than capable of handling,
    at least if we build the infrastructure correctly.)
    my current stance for “identical appearance” is
    against what most thinking people held earlier, that
    e-books are different creatures from p-books,
    and shouldn’t be “shackled by their traditions”.
    there is often some wisdom in those traditions,
    and it’s wise not to throw that out. and perhaps
    more importantly, if we make digital and physical
    work as similarly as possible, and make transition
    between the two effortless, then we can obtain the
    benefits of both, which ends up being a good thing.
    for a very good approximation to what i’m saying,
    see the “digital reprints” jose menendez has done:
    > http://www.ibiblio.org/ebooks/Mabie/Books_Culture.pdf
    > http://www.ibiblio.org/ebooks/Cather/Antonia/Antonia.pdf
    > http://www.ibiblio.org/ebooks/Einstein/Einstein_Relativity.pdf
    (the einstein book is especially fun, because jose has inserted
    corrections to some mistakes made by the original publisher.)
    the interesting thing you need to know about the “reprints” is
    a click on any page-number (in mabie and einstein) summons
    the _scan_ of that page — from google — for your reference,
    with the .pdf page side-by-side with the scan in your browser,
    this means there is a very tight coupling between the versions.
    it also means you can verify the two versions are highly similar
    (right down to the end-line hyphenated words being retained),
    and if you pull up a few pages you’ll see yourself it is true,
    except, of course, the .pdf doesn’t have any “broken type”, or
    crooked pages, or pictures of the scanner’s thumb in a corner.
    this gives some clue what i mean here by “identical appearance”.
    (pretty much self-explanatory, and means just what it says…)
    put one of these digital reprints on a machine like the “iliad”
    (assuming the iliad worked as well as we want such a machine
    to work, instead of the poor imitation which it currently is) —
    a form-factor that isn’t all that different from a paperback —
    and the synergy between the original p-book and the e-book is
    so tight that most people would consider them as “the same”…
    the difference, of course, is that the digital version can be
    distributed at virtually zero-cost, to anywhere on the planet.
    no trees wasted in printing. no oil wasted in transportation.
    when people can have any page of any book instantly delivered
    to a paperback-sized slim piece of plastic they carry all day,
    they won’t even _want_ a paper-book. “it’s too much to carry.”
    and when you tell them the digital version costs twenty cents,
    while the physical version costs twenty dollars, well, they’ll
    look at you like you’re crazy for even positing the “choice”…
    and anytime they want hard-copy of a specific page, they’ll be
    able to print it out, or just lay the iliad on a xerox machine.
    likewise, if they’ve got a piece of paper they need to “save”,
    they’ll digitize it (perhaps by laying it on a xerox machine,
    or by shooting it with their digital camera) and it will be
    shuttled to their iliad via a wire (usb) or wireless (wifi).
    so that’s what i’m talking about, a total seamlessness between
    the paper world and the digital world, with an ebb and a flow.
    indeed, at some point in time, the line between p-version and
    e-version will be so slippery that the “distinction” between
    the two will become erased into nothingness in people’s minds.
    that’s why any thinking that tears apart these two modalities
    and attempts to build a wall between them is fated to failure.
    back to the reprints, i discuss jose’s work at some length here:
    > http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-02-28,2
    jose’s “digital reprints” do a _very_ good job
    of replicating the paper-book. unfortunately,
    however, because they are .pdf, they fall down
    on a number of elements of being a good e-book,
    such as the ability for the text copied from it
    to retain its nice formatting, so that it can be
    “reused” and “remixed”… .pdf stinks at that…
    but even in its current state, as a .pdf, we can
    easily see how such reprints will be useful to us;
    just with the basics of filesize and digital text,
    it is head-and-shoulders above a pagescan .pdf.
    google’s pagescan version of mabie runs 2+ megs,
    and that’s pushing compression as far as we can.
    and remember, it has no text, so can’t be searched.
    jose’s version of the same book — replicating the
    appearance, remember, except with a better look —
    weighs in at less than 1 meg, so it’s half the size.
    and it _does_ have the text, so it could be searched.
    and once we have a better file-format than the .pdf,
    all the digital goodies will be available to us as well,
    including the ability to resize and reflow the text,
    and reformat it, and remix it, and reuse it too…
    and still we will have the congruence with paper.
    -bowerbird
