Category Archives: google_book_search

google books API

Good news. Google has finally released an API (of sorts) for Google Book Search:

Web developers can use the Books Viewability API to quickly find out a book’s viewability on Google Book Search and, in an automated fashion, embed a link to that book in Google Book Search on their own sites.
As an example of the API in use, check out the Deschutes Public Library in Oregon, which has added a link to “Preview this book at Google” next to the listings in their library catalog. This enables Deschutes readers to preview a book immediately via Google Book Search so that they can then make a better decision about whether they’d like to buy the book, borrow it from a library or whether this book wasn’t really the book they were looking for.

Tim Spalding of LibraryThing has some initial comments on limitations:

The GBS API is a big step forward, but there are some technical limitations. Google data loads after the rest of the page, and may not be instant. Because the data loads in your web browser, with no data “passing through” LibraryThing servers, we can’t sort or search by it, and all-library searching is impossible. You can get something like this if you create a Google Books account, which is, of course, the whole point.
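To make Spalding's point concrete, here is a rough sketch of what a viewability lookup involves. The API is JSONP: a page includes a script tag pointing at books.google.com with a list of ISBNs and a callback function, and Google returns the viewability data straight to the visitor's browser, never touching the host site's servers (hence no server-side sorting or searching). The endpoint and field names below (jscmd=viewapi, preview, preview_url) are my reading of the launch documentation and should be treated as assumptions; the fetch is done server-side in Python, with an arbitrary ISBN, purely for illustration.

import json
import urllib.request

def viewability(isbn):
    # Ask the Books Viewability API about one ISBN, requesting a JSONP
    # callback named "cb" (assumed endpoint and parameter names).
    url = ("https://books.google.com/books"
           "?jscmd=viewapi&bibkeys=ISBN:" + isbn + "&callback=cb")
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")
    # The response is JSONP, e.g. cb({"ISBN:...": {...}}); strip the
    # wrapper to get plain JSON.
    payload = body[body.index("(") + 1 : body.rindex(")")]
    return json.loads(payload)

for key, record in viewability("0670038601").items():
    # "preview" is reportedly "full", "partial" or "noview";
    # "preview_url" is the link a catalog would embed.
    print(key, record.get("preview"), record.get("preview_url"))

A catalog like Deschutes' presumably does the equivalent in the browser: if preview comes back as anything other than noview, it writes the "Preview this book at Google" link into the page.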

(via Peter Brantley)

how to keep google’s books open

Whip-smart law blogger Frank Pasquale works through his evolving views on digital library projects and search engines, proposing a compelling strategy for wringing some public good from the tangle of lawsuits surrounding Google Book Search. It hinges on a more expansive (though absolutely legally precedented) interpretation of fair use that takes the public interest and not just market factors into account. Recommended reading. (Thanks, Siva!)

au courant

Paul Courant is the University Librarian at the University of Michigan as well as a professor of economics. And he now has a blog. He leads off with a response to critics (including Brewster Kahle and Siva Vaidhyanathan) of Michigan’s book digitization partnership with Google. Siva responds on Googlization of Everything. Great to see a university librarian entering the public debate in this way.

“digitization and its discontents”

Anthony Grafton’s New Yorker piece “Future Reading” paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote sums it up well – a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization:

The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.

Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature On Native Grounds almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library – a reservoir of knowledge, free to all – and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article, but it would have been valuable for Grafton, whose piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism – “crowded public rooms where the sunlight gleams on varnished tables… millions of dusty, crumbling, smelly, irreplaceable documents and books” – which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We’re still compiling our notes but expect a report soon.

googlesoft gets the brush-off

This is welcome. Several leading American research libraries, including the Boston Public Library and the Smithsonian, have said no thanks to Google and Microsoft book digitization deals, opting instead for the more costly but less restrictive Open Content Alliance/Internet Archive program. The NY Times reports, and explains how private foundations like Sloan are funding some of the OCA partnerships.

siva podcast on the googlization of libraries

We’re just a couple of days away from launching what promises to be one of our most important projects to date, The Googlization of Everything, a weblog where Siva Vaidhyanathan (who’s a fellow here) will publicly develop his new book, a major critical examination of the Google behemoth. As an appetizer, check out this First Monday podcast conversation with Siva on the subject of Google’s book activities (mp3, transcript).
An excerpt:

Q: So what’s the alternative? Who are the major players, what are the major policy points?
SIVA: I think this is an important enough project where we need to have a nationwide effort. We have to have a publicly funded effort. Guided, perhaps led by the Library of Congress, certainly a consortium of public university libraries could do just as well to do it.
We’re willing to do these sorts of big projects in the sciences. Look at how individual states are rallying billions of dollars to fund stem cell research right now. Look at the ways the United States government, the French government, the Japanese government rallied billions of dollars for the Human Genome Project out of concern that all that essential information was going to be privatized and served in an inefficient and unwieldy way.
So those are the models that I would like to see us pursue. What saddens me about Google’s initiative is that it’s let so many people off the hook. Essentially we’ve seen so many people say, “Great, now we don’t have to do the digital library projects we were planning to do.” And many of these libraries involved in the Google project were in the process of producing their own digital libraries. We don’t have to do that any more because Google will do it for us. We don’t have to worry about things like quality because Google will take care of the quantity.
And so what I would like to see? I would like to see all the major public universities, public research universities, in the country gather together and raise the money or persuade Congress to deliver the money to do this sort of thing because it’s in the public interest, not because it’s in Google’s interest. If it really is this important we should be able to mount a public campaign, a set of arguments and convince the people with the purse strings that this should be done right.

e-book developments at amazon, google (and rambly thoughts thereon)

The NY Times reported yesterday that the Kindle, Amazon’s much speculated-about e-book reading device, is due out next month. No one’s seen it yet and Amazon has been tight-lipped about specs, but it presumably has an e-ink screen, a small keyboard and scroll wheel, and most significantly, wireless connectivity. This of course means that Amazon will have a direct pipeline between its store and its device, giving readers access to an electronic library (and the Web) while on the go. If they’d just come down a bit on the price (the Times says it’ll run between four and five hundred bucks), I can actually see this gaining more traction than past e-book devices, though I’m still not convinced by the idea of a dedicated book reader, especially when smart phones are edging ever closer toward being a credible reading environment. A big part of the problem with e-readers to date has been the missing internet connection and the lack of a good store. The wireless capability of the Kindle, coupled with a greater range of digital titles (not to mention news and blog feeds and other Web content) and the sophisticated browsing mechanisms of the Amazon library, could add up to the first more-than-abortive entry into the e-book business. But it still strikes me as transitional – a red herring in the larger plot.
A big minus is that the Kindle uses a proprietary file format (based on Mobipocket), meaning that readers get locked into the Amazon system, much as iPod users got shackled to iTunes (before they started moving away from DRM). Of course this means that folks who bought the cheaper (and from what I can tell, inferior) Sony Reader won’t be able to read Amazon e-books.
But blech… enough about ebook readers. The Times also reports (though does little to differentiate between the two rather dissimilar bits of news) on Google’s plans to begin selling full online access to certain titles in Book Search. Works scanned from library collections, still the bone of contention in two major lawsuits, won’t be included here; only titles formally sanctioned through publisher deals will be. The implications here are rather different from the Amazon news, since Google has no disclosed plans for developing its own reading hardware. The online access model seems geared more toward reference and research – a powerful supplement to print reading.
But project forward a few years… this could develop into a huge money-maker for Google: paid access (licensed through publishers) not only on a per-title basis, but to the whole collection – all the world’s books. Royalties could be distributed from subscription revenues in proportion to access. Each time a book is opened, a penny could drop in the cup of that publisher or author. By then a good reading device will almost certainly exist (more likely a next generation iPhone than a Kindle) and people may actually be reading books through this system, directly on the network. Google and Amazon will then in effect be the digital infrastructure for the publishing industry, perhaps even taking on what remains of the print market through on-demand services purveyed through their digital stores. What will publishers then be? Disembodied imprints, free-floating editorial organs, publicity directors…?
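To spell out the arithmetic of that royalty idea (the scheme and every number here are hypothetical, not anything Google or the publishers have proposed): if revenue is divided in proportion to opens, each open is worth the same fixed slice of the subscription pool.

# Hypothetical sketch: dividing a month of subscription revenue among
# rightsholders in proportion to how often their books were opened.
# All figures are invented for illustration.
monthly_revenue = 1_000_000.00          # total subscription dollars
opens = {
    "publisher_a": 600_000,             # opens across publisher A's titles
    "publisher_b": 300_000,
    "author_c": 100_000,                # a single self-published author
}

total_opens = sum(opens.values())
per_open = monthly_revenue / total_opens   # the "penny in the cup"

for holder, n in opens.items():
    print(holder, round(n * per_open, 2))
# publisher_a 600000.0 / publisher_b 300000.0 / author_c 100000.0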
Publishers’ recent attempts to develop their identities online through their own websites seem hopelessly misguided. A publisher’s website is like their office building: unless you have some direct stake in the industry, there’s little reason to know where it is. Readers are interested in books, not publishers. They go to a bookseller, on foot or online, and they certainly don’t browse by publisher. Who really pays attention to who publishes the books they read anyway, especially in this corporatized era where the difference between imprints is increasingly cosmetic, like the range of brands, from dish soap to potato chips, under Procter & Gamble’s aegis? The digital storefront model needs serious rethinking.
The future of distribution channels (Googlezon) is ultimately less interesting than this last question of identity. How will today’s publishers establish and maintain their authority as filterers and curators of the electronic word? Will they learn how to develop and nurture literate communities on the social Web? Will they be able to carry their distinguished imprints into a new terrain that operates under entirely different rules? So far, the legacy publishers have proved unable to grasp the way these things work in the new network culture, and in the long run this could mean their downfall as nascent online communities (blog networks, webzines, political groups, activist networks, research portals, social media sites, list-servers, libraries, art collectives) emerge as the new imprints: publishing, filtering and linking in various forms and time signatures (books being only one) to highly activated, focused readerships.
The prospect of atomization here (a million publishing tribes and sub-tribes) is no doubt troubling, but the thought of renewed diversity in publishing after decades of shrinking horizons through corporate consolidation is just as, if not more, exciting. But the question of a mass audience does linger, and perhaps this is how certain of today’s publishers will survive: as the purveyors of mass market fare. But with digital distribution and print on demand, the economies-of-scale rationale for big publishers’ existence takes a big hit, and with self-publishing services like Amazon CreateSpace and Lulu.com, and the emergence of more accessible authoring tools like Sophie (still a ways away, but coming along), traditional publishers’ services (designing, packaging, distributing) are suddenly less special. What will really be important in a chaotic jumble of niche publishers are the critics, filterers and the context-generating communities that reliably draw attention to the things of value and link them meaningfully to the rest of the network. These can be big companies or lightweight garage operations that work on the back of third-party infrastructure like Google, Amazon, YouTube or whatever else. These will be the new publishers, or perhaps it’s more accurate to say, since publishing is now so trivial an act, the new editors.
Of course social filtering and tastemaking is what’s been happening on the Web for years, but over time it could actually supplant the publishing establishment as we currently know it, and not just the distribution channels but the real heart of things: the imprimaturs, the filtering, the building of community. And I would guess that even as the digital business models sort themselves out (and it’s worth keeping an eye on interesting experiments like Content Syndicate, covered here yesterday, and on subscription and ad-based models), there will be a great deal of free content flying around, publishers having finally come to realize (or having gone extinct with their old conceits) that controlling content is a lost cause and out of sync with the way info naturally circulates on the net. Increasingly it will be the filtering, curating, archiving, linking, commenting and community-building – in other words, the network around the content – that will be the thing of value. Expect Amazon and Google (Google, btw, having recently rolled out a bunch of impressive new social tools for Book Search, about which more soon) to move into this area in a big way.

“the bookish character of books”: how google’s romanticism falls short

[image: Tristram Shandy in Google Book Search]
Check out, if you haven’t already, Paul Duguid’s witty and incisive exposé of the pitfalls of searching for Tristram Shandy in Google Book Search, an exercise which throws many of the inadequacies of the world’s leading digitization program into relief. By Duguid’s own admission, Laurence Sterne’s legendary experimental novel is an idiosyncratic choice, but its many typographic and structural oddities make it a particularly useful lens through which to examine the challenges of migrating books successfully to the digital domain. This follows a similar examination Duguid carried out last year with the same text in Project Gutenberg, an experience which he said revealed the limitations of peer production in generating high quality digital editions (also see Dan’s own take on this in an older if:book post). This study focuses on the problems of inheritance as a mode of quality assurance, in this case the bequeathing of large authoritative collections by elite institutions to the Google digitization enterprise. Does simply digitizing these – books, imprimaturs and all – automatically result in an authoritative bibliographic resource?
Duguid suggests not. The process of migrating analog works to the digital environment in a way that respects the originals but fully integrates them into the networked world is trickier than simply scanning and dumping into a database. The Shandy study shows in detail how Google’s ambition to organize the world’s books and make them universally accessible and useful (to slightly adapt Google’s mission statement) is being carried out in a hasty, slipshod manner, leading to a serious deficit in quality in what could eventually become, for better or worse, the world’s library. Duguid is hardly the first to point this out, but the intense focus of his case study is valuable and serves as a useful counterpoint to the technoromantic visions of Google boosters such as Kevin Kelly, who predict a new electronic book culture liberated by search engines in which readers are free to find, remix and recombine texts in various ways. While this networked bibliotopia sounds attractive, it’s conceived primarily from the standpoint of technology and not well grounded in the particulars of books. What works as snappy Web 2.0 buzz doesn’t necessarily hold up in practice.
As is so often the case, the devil is in the details, and it is precisely the details that Google seems to have overlooked, or rather sprinted past. Sloppy scanning and the blithe discarding of organizational and metadata schemes meticulously devised through centuries of librarianship might indeed make the books “universally accessible” (or close to that), but the “and useful” part of the equation could go unrealized. As we build the future, it’s worth pondering what parts of the past we want to hold on to. It’s going to have to be a slower and more painstaking process than Google (and, ironically, the partner libraries who have rushed headlong into these deals) might be prepared to undertake. Duguid:

The Google Books Project is no doubt an important, in many ways invaluable, project. It is also, on the brief evidence given here, a highly problematic one. Relying on the power of its search tools, Google has ignored elemental metadata, such as volume numbers. The quality of its scanning (and so we may presume its searching) is at times completely inadequate. The editions offered (by search or by sale) are, at best, regrettable. Curiously, this suggests to me that it may be Google’s technicians, and not librarians, who are the great romanticisers of the book. Google Books takes books as a storehouse of wisdom to be opened up with new tools. They fail to see what librarians know: books can be obtuse, obdurate, even obnoxious things. As a group, they don’t submit equally to a standard shelf, a standard scanner, or a standard ontology. Nor are their constraints overcome by scraping the text and developing search algorithms. Such strategies can undoubtedly be helpful, but in trying to do away with fairly simple constraints (like volumes), these strategies underestimate how a book’s rigidities are often simultaneously resources deeply implicated in the ways in which authors and publishers sought to create the content, meaning, and significance that Google now seeks to liberate. Even with some of the best search and scanning technology in the world behind you, it is unwise to ignore the bookish character of books. More generally, transferring any complex communicative artifacts between generations of technology is always likely to be more problematic than automatic.

Also take a look at Peter Brantley’s thoughts on Duguid:

Ultimately, whether or not Google Book Search is a useful tool will hinge in no small part on the ability of its engineers to provoke among themselves a more thorough, and less alchemic, appreciation for the materials they are attempting to transmute from paper to gold.

cornell joins google book search

…offering up to 500,000 items for digitization. From the Cornell library site:

Cornell is the 27th institution to join the Google Book Search Library Project, which digitizes books from major libraries and makes it possible for Internet users to search their collections online. Over the next six years, Cornell will provide Google with public domain and copyrighted holdings from its collections. If a work has no copyright restrictions, the full text will be available for online viewing. For books protected by copyright, users will just get the basic background (such as the book’s title and the author’s name), at most a few lines of text related to their search and information about where they can buy or borrow a book. Cornell University Library will work with Google to choose materials that complement the contributions of the project’s other partners. In addition to making the materials available through its online search service, Google will also provide Cornell with a digital copy of all the materials scanned, which will eventually be incorporated into the university’s own digital library.

the open library

A little while back I was musing on the possibility of a People’s Card Catalog, a public access clearinghouse of information on all the world’s books to rival Google’s gated preserve. Well, thanks to the Internet Archive and its offshoot the Open Content Alliance, it looks like we might now have it – or at least the initial building blocks. On Monday they launched a demo version of the Open Library, a grand project that aims to build a universally accessible and publicly editable directory of all books: one wiki page per book, integrating publisher and library catalogs, metadata, reader reviews, links to retailers and relevant Web content, and a menu of editions in multiple formats, both digital and print.

Imagine a library that collected all the world’s information about all the world’s books and made it available for everyone to view and update. We’re building that library.

The official opening of Open Library isn’t scheduled till October, but they’ve put out the demo now to prove this is more than vaporware and to solicit feedback and rally support. If all goes well, it’s conceivable that this could become the main destination on the Web for people looking for information in and about books: a Wikipedia for libraries. On the presentation of public domain texts, they already have Google beat, even with recent upgrades to the GBS system including a plain text viewing option. The Open Library provides TXT, PDF, DjVu (a high-res visual document browser), and its own custom-built Book Viewer tool, a digital page-flip interface that presents scanned public domain books in facing pages that the reader can leaf through, search and (eventually) magnify.
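As a rough picture of what “one wiki page per book” might mean as data, here is a hypothetical record; the field names are mine, invented for illustration, and not the Open Library’s actual schema.

# Hypothetical sketch of the "one wiki page per book" idea: a single
# editable record aggregating catalog data, editions in multiple
# formats, and web context. Field names are invented, not OL's schema.
book_page = {
    "title": "Tristram Shandy",
    "authors": ["Laurence Sterne"],
    "catalog_sources": ["library MARC records", "publisher feeds"],
    "metadata": {"first_published": 1759, "language": "eng"},
    "editions": [
        {"format": "print", "publisher": "Penguin Classics"},
        {"format": "txt", "url": "https://..."},   # placeholder URLs
        {"format": "pdf", "url": "https://..."},
        {"format": "djvu", "url": "https://..."},
    ],
    "reviews": [],            # reader-contributed, wiki-editable
    "retail_links": [],       # booksellers
    "revision_history": [],   # every edit tracked, wiki-style
}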
Page-turning interfaces have been something of a fad recently, appearing first in the British Library’s Turning the Pages manuscript preservation program (specifically cited as inspiration for the OL Book Viewer) and later proliferating across all manner of digital magazines, comics and brochures (often through companies that you can pay to convert a PDF into a sexy virtual object, complete with draggable page corners that writhe when tickled with a mouse and a paper-like rustling sound every time a page is turned).
This sort of reenactment of paper functionality is perhaps too literal, opting for imitation rather than innovation, but it does offer some advantages. Having a fixed frame for reading is a relief in the constantly scrolling space of the Web browser, and there are some decent navigation tools that gesture toward the ways we browse paper. To either side of the open area of a book are thin vertical lines denoting the edges of the surrounding pages. Dragging the mouse over the edges brings up scrolling page numbers in a small pop-up. Clicking on any of these takes you quickly and directly to that part of the book. Searching is also neat. Type a query and the book is suddenly interleaved with yellow tabs, with keywords highlighted on the page, like so:
[screenshot: an Open Library search, the book interleaved with yellow tabs marking keyword hits]
But nice as this looks, functionality is sacrificed for the sake of fetishism. Sticky tabs are certainly a cool feature, but not when they’re at the expense of a straightforward list of search returns showing keywords in their sentence context. These sorts of references to the feel and functionality of the paper book are no doubt comforting to readers stepping tentatively into the digital library, but there’s something that feels disjointed about reading this way: that this is a representation of a book but not a book itself. It is a book avatar. I’ve never understood the appeal of those Second Life libraries where you must guide your virtual self to a virtual shelf, take hold of the virtual book, and then open it up on a virtual table. This strikes me as a failure of imagination, not to mention tedious. Each action is in a sense done twice: you operate a browser within which you operate a book; you move the hand that moves the hand that moves the page. Is this perhaps one too many layers of mediation to actually be able to process the book’s contents? Don’t get me wrong, the Book Viewer and everything the Open Library is doing is a laudable start (cause for celebration in fact), but in the long run we need interfaces that deal with texts as native digital objects while respecting the originals.
What may be more interesting than any of the technology previews is a longish development document outlining ambitious plans for building the Open Library user interface. This covers everything from metadata standards and wiki templates to tagging and OCR proofreading to search and browsing strategies, plus a well thought-out list of user scenarios. Clearly, they’re thinking very hard about every conceivable element of this project, including the sorts of things we frequently focus on here such as the networked aspects of texts. Acolytes of Ted Nelson will be excited to learn that a transclusion feature is in the works: a tool for embedding passages from texts into other texts that automatically track back to the source (hypertext copy-and-pasting). They’re also thinking about collaborative filtering tools like shared annotations, bookmarking and user-defined collections. All very very good, but it will take time.
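Transclusion in particular is easier to see than to describe, so here is a toy sketch of the idea; the names and mechanics are my own construction, since the development document doesn’t specify an implementation.

from dataclasses import dataclass

# Toy sketch of transclusion: an embedded passage that keeps a live
# pointer back to its source instead of being a flat copy-and-paste.
# All identifiers here are invented for illustration.
@dataclass
class Transclusion:
    source_book: str   # stable identifier of the quoted work
    edition: str       # which edition the character offsets refer to
    start: int         # offset where the passage begins
    end: int           # offset where it ends

SOURCES = {
    ("tristram-shandy", "vol1-1759"): (
        "I wish either my father or my mother, or indeed both of them, "
        "as they were in duty both equally bound to it, had minded what "
        "they were about when they begot me..."
    ),
}

def render(t):
    # Resolve the passage from the source at display time, so the quoting
    # page always tracks back (and stays faithful) to the original.
    text = SOURCES[(t.source_book, t.edition)]
    return '"%s" [from %s, %s]' % (text[t.start:t.end], t.source_book, t.edition)

print(render(Transclusion("tristram-shandy", "vol1-1759", 0, 37)))
# "I wish either my father or my mother," [from tristram-shandy, vol1-1759]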
Building an open source library catalog is a mammoth undertaking and will rely on millions of hours of volunteer labor, and like Wikipedia it has its fair share of built-in contradictions. Jessamyn West of librarian.net put it succinctly:

It’s a weird juxtaposition, the idea of authority and the idea of a collaborative project that anyone can work on and modify.

But the only realistic alternative may well be the library that Google is building: a proprietary database full of low-quality digital copies; a public domain that is only semi-accessible, prohibitively difficult to use or repurpose outside the Google reading room; a balkanized landscape of partner libraries and institutions left in its wake, each clutching its small slice of the digitized pie while the whole belongs only to Google; all of it geared ultimately not to readers, researchers and citizens but to consumers. Construed more broadly to include not just books but web pages, videos, images, maps etc., the Google library is a place built by us but not owned by us. We create and upload much of the content, we hand-make the links and run the search queries that program the Google brain. But all of this is captured and funneled into Google dollars and AdSense. If passive labor can build something so powerful, what might active, voluntary labor be able to achieve? Open Library aims to find out.