Category Archives: google
do you remember the first time?
Siva Vaidhyanathan, the Institute’s fellow, is busy writing a book about Google, to be titled The Googlization of Everything. He’s working in public, and right now, he’s interested in hearing stories about how people – that means you! – began to use Google:
Do you remember the first time you used Google? When was it? How did you hear about Google? What was your first impression?
Please use the comments over on The Googlization of Everything to tell me stories.
As Mudbone (Richard Pryor’s character) used to say, “you only remember two times, your first and your last.”
There are a lot of interesting comments there already . . .
kerfuffle at britannica.com
I got a note from someone at Britannica online telling me about a discussion prompted by Clay Shirky’s riposte to Nicholas Carr’s Atlantic article, “Is Google Making Us Stupid?”
The conversation on the Britannica site, and the related posts on John Brockman’s EDGE, remind me as much as anything of the conversational swordplay typical of TV pundits, who are so enamored of their own words that they can barely be bothered to listen to or read each other’s ideas, much less respond sincerely.
(Can it possibly be a coincidence that all the players in this drama are male? Get a grip guys! This is not about scoring points. You’re dealing with issues central to the future of the species and the planet.)
And as long as we’re dealing with missing persons, I was stunned to realize that not one of these media gurus references McLuhan, who, as far as I’m concerned, not only asked more profound questions about the effect of media on humans and their society, but provided first-pass answers which we would still do well to heed.
Of the myriad posts and pages that now make up the Britannica Carr/Shirky discussion, three are of particular interest.
The first is from the critic Sven Birkerts, whom many people consider conservative. I don’t. Rather, I see Birkerts as the most eloquent voice on behalf of what we are losing as we shed the culture of the Gutenberg age. Birkerts doesn’t entreat us to stop time or throw wrenches in the wheels of change. He’s just asking us to be conscious of what’s good about the present.
Another is from George Dyson, who writes in a way that, in my worst nightmares, I fear is prescient:
Nicholas Carr asks a question that all of us should be asking ourselves:
“What if the cost of machines that think is people who don’t?”
It’s a risk. “The ancestors of oysters and barnacles had heads. Snakes have lost their limbs and ostriches and penguins their power of flight. Man may just as easily lose his intelligence,” warned J. B. S. Haldane in 1928.
The third is a comment by Blair Boland, which appears as a comment to Nicholas Carr’s response to Shirky. Not only does Boland provide a taut history lesson, setting the record straight on the Luddites, but he states a fundamental issue of our time more clearly than anyone else: “who controls technology and for what ends?”
What both critiques share in common and take for granted is a smugly false and typically misleading disparagement of so-called Luddism. The original, much maligned Luddites are commonly dismissed as cranks, or worse still, “murderous thugs” and the “essential fact” of Luddite “complaint” twisted to serve the ends of propagandists for capital. Ned Ludd and his followers were not necessarily opposed to technological ‘change’ or ‘progress’ per se but the social context in which it occurred and the economic consequences it presaged. As Ludd expressed it, “we will never lay down our arms…[’til] the House of Commons passes an act to put down all machinery Hurtful to Commonality”. They realized that these changes were being undertaken undemocratically for the benefit of a narrow class of economic elites. Luddite anxieties were well founded, as was their understanding of the implications for the working class in general, even though they couldn’t have foreseen all of the consequences fully. Their protests and resistance were met with the most aggressive and “murderous” suppression by the British government of the day. Thousands of troops were dispatched to put down the rebellion, not only succeeding in ruthlessly exterminating the Luddite uprising but also serving notice to workers in general of the close bonds between the state and industrialists, and the means that could be employed to discipline intractable workers. The dire conditions of the working class in the new “industrial age” that ensued proved Luddite premonitions largely prophetic. These conditions still exist in many parts of the world. So while it’s fine to fret over the impact of the net on the reading habits of the affluent, the concerns of the Luddites still haven’t gone away. The important principle then as now, is who controls technology and for what ends?
Taylor’s time/motion practices further tightened the hold of the owners of production technology over the wage serfs operating that technology, again in a very undemocratic and restrictive way, “hurtful to commonality”. These, as noted, are the same principles that guide much technological development today and are among the most worrisome aspects of its ultimate applications. “And now we’re facing a similar challenge”, to see that the latent democratizing abundance of the net is not “shaped” into the greatest expansion of social control and commercial concentration of power the world has ever known.
google, digitization and archives: despatches from if:book
In discussing with other Institute folks how to go about reviewing four years’ worth of blog posts, I’ve felt torn at times. Should I cherry-pick ‘thinky’ posts that discuss a particular topic in depth, or draw out narratives from strings of posts each of which is not, in itself, a literary gem but which cumulatively form the bedrock of the blog? But I thought about it, and realised that you can’t really have one without the other.
Fair use, digitization, public domain, archiving, the role of libraries and cultural heritage are intricately interconnected. But the name that connects all these issues over the last few years has been Google. The Institute has covered Google’s incursions into digitization of libraries (amongst other things) in a way that has explored many of these issues – and raised questions that are as urgent as ever. Is it okay to privatize vast swathes of our common cultural heritage? What are the privacy issues around technology that tracks online reading? Where now for copyright, fair use and scholarly research?
In-depth coverage of Google and digitization has helped to draw out many of the issues central to this blog. Thus, to draw forth the narrative of if:book’s Google coverage is, by extension, to watch a political and cultural stance emerging. So in this post I’ve tried to have my cake and eat it – to trace a story, and to give a sense of the depth of thought going into that story’s discussion.
In order to keep things manageable, I’ve kept this post to a largely Google-centric focus. Further reviews covering copyright-related posts, and general discussion of libraries and technology will follow.
2004-5: Google rampages through libraries, annoys Europe, gains rivals
In December 2004, if:book’s first post about Google’s digitization of libraries gave the numbers for the University of Michigan project.
In February 2005, the head of France’s national libraries raised a battle cry against the Anglo-centricity implicit in Google’s plans to digitize libraries. The company’s seemingly relentless advance brought Europe out in force to find ways of forming non-Google coalitions for digitization.
In August, Google halted book scans for a few months to appease publishers angry at encroachments on their copyright. But this was clearly not enough, as in October 2005, Google was sued (again) by a string of publishers for massive copyright infringement. However, undeterred either by European hostility or legal challenges, the same month the company made moves to expand Google Print into Europe. Also in October 2005, Yahoo! launched the Open Content Alliance, which was joined by Microsoft around the same time. Later the same month, a Wired article put the case for authors in favor of Google’s searchable online archive.
In November 2005 Google announced that from here on in Google Print would be known as Google Book Search, as the ‘Print’ reference perhaps struck too close to home for publishers. The same month, Ben savaged Google Print’s ‘public domain’ efforts – then recanted (a little) later that month.
In December 2005 Google’s digitization was still hot news – the Institute did a radio show/podcast with Open Source on the topic, and covered the Google Book Search debate at the American Bar Association. (In fact, most of that month’s posts are dedicated to Google and digitization and are too numerous to do justice to here).
2006: Digitization spreads
By 2006, digitization and digital archives – with attendant debates – were spreading. From January through March, three posts – ‘The book is reading you’, parts 1, 2 and 3 – looked at privacy, networked books, fair use, downloading and copyright around Google Book Search. Also in March, a further post discussed Google and Amazon’s incursions into publishing.
In April, the Smithsonian cut a deal with Showtime making the media company a preferential media partner for documentaries using Smithsonian resources. Jesse analyzed the implications for open research.
In June, the Library of Congress and partners launched a project to make vintage newspapers available online. Google Book Search, meanwhile, was tweaked to reassure publishers that the new dedicated search page was not, in fact, a library. The same month, Ben responded thoughtfully to a French book attacking Google, and by extension America, for cultural imperialism. The debate continued with a follow-up post in July.
In August, Google announced downloadable PDF versions of many of its public-domain books, and the publication of Google’s contract with UCAL’s library prompted some debate. In October we reported on Microsoft’s growing book digitization list, and some criticism of the same from Brewster Kahle. The same month, we reported that the Dutch government was pouring millions into a vast public digitization program.
In December, Microsoft launched its (clunkier) version of Google Books, Microsoft Live Book Search.
2007: Google is the environment
In January, former Netscape player Rich Skrenta crowned Google king of the ‘third age of computing’: ‘Google is the environment’, he declared. Meanwhile, having seemingly forgotten 2005’s tussles, the company hosted a publishing conference at the New York Public Library. In February the company signed another digitization deal, this time with Princeton; in August, this institution was joined by Cornell, and the Economist compared Google’s databases to the banking system of the information age. The following month, Siva’s first Monday podcast discussed the Googlization of libraries.
By now, while Google remains a theme, commercial digitization of public-domain archives is a far broader issue. In January, the US National Archives cut a digitization deal with Footnote, effectively paywalling digital access to a slew of public-domain documents; in August, a deal followed with Amazon for commercial distribution of its film archive. The same month, two major audiovisual archiving projects launched.
In May, Ben speculated about whether some ‘People’s Card Catalog’ could be devised to rival Google’s gated archive. The Open Archive launched in July, to mixed reviews – the same month that the ongoing back-and-forth between the Institute and academic Siva Vaidhyanathan bore fruit. Siva’s networked writing project, The Googlization Of Everything, was announced (this would be launched in September). Then, in August, we covered an excellent piece by Paul Duguid discussing the shortcomings of Google’s digitization efforts.
In October, several major American libraries refused digitization deals with Google. By November, Google and digitization had found their way into the New Yorker; the same month the Library of Congress put out a call for e-literature links to be archived.
2008: All quiet?
In January we reported that LibraryThing now interfaces with the British Library, and in March on the launch of an API for Google Books. Siva’s book found a print publisher the same month.
But if Google coverage has been slighter this year, that’s not to suggest a happy ending to the story. Microsoft abandoned its book scanning project in mid-May of this year, raising questions about the viability of the Open Content Alliance. It would seem as though Skrenta was right. The Googlization of Everything continues, less challenged than ever.
google books API
Good news. Google has finally released an API (?) for Google Book Search:
Web developers can use the Books Viewability API to quickly find out a book’s viewability on Google Book Search and, in an automated fashion, embed a link to that book in Google Book Search on their own sites.
As an example of the API in use, check out the Deschutes Public Library in Oregon, which has added a link to “Preview this book at Google” next to the listings in their library catalog. This enables Deschutes readers to preview a book immediately via Google Book Search so that they can then make a better decision about whether they’d like to buy the book, borrow it from a library or whether this book wasn’t really the book they were looking for.
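For readers curious what wiring a catalog to this API actually involved: the Viewability API worked as a JSONP call, where the catalog page requests a URL listing one or more bib keys and Google responds with a JavaScript callback wrapping per-book viewability data. Here is a minimal Python sketch of that request format and response parsing; the endpoint, parameter names, and response shape follow the format described at launch, but treat the specifics (and the sample payload) as illustrative assumptions rather than current, supported API details.

```python
import json
import re
from urllib.parse import urlencode

API_BASE = "https://books.google.com/books"

def viewability_url(isbn, callback="handleViewability"):
    """Build the JSONP request URL for one ISBN, as a catalog page might."""
    params = {"jscmd": "viewapi", "bibkeys": f"ISBN:{isbn}", "callback": callback}
    return f"{API_BASE}?{urlencode(params)}"

def parse_jsonp(payload):
    """Strip the JSONP callback wrapper and return the JSON object inside."""
    match = re.match(r"^[\w.]+\((.*)\);?\s*$", payload, re.DOTALL)
    if not match:
        raise ValueError("not a JSONP payload")
    return json.loads(match.group(1))

# A sample response of the kind the API returned: one entry per bib key,
# with a 'preview' field such as 'full', 'partial', or 'noview'.
sample = ('handleViewability({"ISBN:0451526538": {"bib_key": "ISBN:0451526538", '
          '"preview": "full", "preview_url": "https://books.google.com/..."}});')

books = parse_jsonp(sample)
print(books["ISBN:0451526538"]["preview"])  # -> full
```

A catalog like Deschutes’ would make this call from the browser and, depending on the `preview` value, render a “Preview this book at Google” link next to the listing.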
Tim Spalding of Library Thing has some initial comments on limitations:
The GBS API is a big step forward, but there are some technical limitations. Google data loads after the rest of the page, and may not be instant. Because the data loads in your web browser, with no data “passing through” LibraryThing servers, we can’t sort or search by it, and all-library searching is impossible. You can get something like this if you create a Google Books account, which is, of course, the whole point.
(via Peter Brantley)
a few rough notes on knols
Think you’ve got an authoritative take on a subject? Write up an article, or “knol,” and see how the Web judgeth. If it’s any good, you might even make a buck.
Google’s new encyclopedia will go head to head with Wikipedia in the search rankings, though in format it more resembles other ad-supported, single-author info sources like About.com or Squidoo. The knol-verse (how the hell do we speak of these things as a whole?) will be a Darwinian writers’ market where the fittest knols rise to the top. Anyone can write one. Google will host it for free. Multiple knols can compete on a single topic. Readers can respond to and evaluate knols through simple community rating tools. Content belongs solely to the author, who can license it in any way he/she chooses (all rights reserved, Creative Commons, etc.). Authors have the option of having contextual ads run to the side, revenues from which are shared with Google. There is no vetting or editorial input from Google whatsoever.
Except… Might not the ads exert their own subtle editorial influence? In this entrepreneurial writers’ fray, will authors craft their knols for AdSense optimization? Will they become, consciously or not, shills for the companies that place the ads (I’m thinking especially of high impact topic areas like health and medicine)? Whatever you may think of Wikipedia, it has a certain integrity in being ad-free. The mission is clear and direct: to build a comprehensive free encyclopedia for the Web. The range of content has no correlation to marketability or revenue potential. It’s simply a big compendium of stuff, the only mention of money being a frank electronic tip jar at the top of each page. The Googlepedia, in contrast, is fundamentally an advertising platform. What will such an encyclopedia look like?
In the official knol announcement, Udi Manber, a VP for engineering at Google, explains the genesis of the project: “The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal.” You can see embedded in this statement all the trademarks of Google’s rhetoric: a certain false humility, the pose of incorruptible geek integrity and above all, a boundless confidence that every problem, no matter how gray and human, has a technological fix. I’m not saying it’s wrong to build a business, nor that Google is lying whenever it talks about anything idealistic; it’s just that time and again Google displays an astonishing lack of self-awareness in the way it frames its services – a lack that becomes especially obvious whenever the company edges into content creation and hosting. They tend to talk as though they’re building the library of Alexandria or the great Encyclopédie, but really they’re describing an advanced advertising network of Google-exclusive content. We shouldn’t allow these very different things to become as muddled in our heads as they are in theirs. You get a worrisome sense that, like the Bushies, the cheerful software engineers who promote Google’s products on the company’s various blogs truly believe the things they’re saying. That if we can just get the algorithm right, the world can bask in the light of universal knowledge.
The blogosphere has been alive with commentary about the knol situation throughout the weekend. By far the most provocative thing I’ve read so far is by Anil Dash, VP of Six Apart, the company that makes the Movable Type software that runs this blog. Dash calls out this Google self-awareness gap, or as he puts it, its lack of a “theory of mind”:
Theory of mind is that thing that a two-year-old lacks, which makes her think that covering her eyes means you can’t see her. It’s the thing a chimpanzee has, which makes him hide a banana behind his back, only taking bites when the other chimps aren’t looking.
Theory of mind is the awareness that others are aware, and its absence is the weakness that Google doesn’t know it has. This shortcoming exists at a deep cultural level within the organization, and it keeps manifesting itself in the decisions that the company makes about its products and services. The flaw is one that is perpetuated by insularity, and will only be remedied by becoming more open to outside ideas and more aware of how people outside the company think, work and live.
He gives some examples:
Connecting PageRank to economic systems such as AdWords and AdSense corrupted the meaning and value of links by turning them into an economic exchange. Through the turn of the millennium, hyperlinking on the web was a social, aesthetic, and expressive editorial action. When Google introduced its advertising systems at the same time as it began to dominate the economy around search on the web, it transformed a basic form of online communication, without the permission of the web’s users, and without explaining that choice or offering an option to those users.
He compares the knol enterprise with GBS:
Knol shares with Google Book Search the problem of being both indexed by Google and hosted by Google. This presents inherent conflicts in the ranking of content, as well as disincentives for content creators to control the environment in which their content is published. This necessarily disadvantages competing search engines, but more importantly eliminates the ability for content creators to innovate in the area of content presentation or enhancement. Anything that is written in Knol cannot be presented any better than the best thing in Knol. [his emphasis]
And lastly concludes:
An awareness of the fact that Google has never displayed an ability to create the best tools for sharing knowledge would reveal that it is hubris for Google to think they should be a definitive source for hosting that knowledge. If the desire is to increase knowledge sharing, and the methods of compensation that Google controls include traffic/attention and money/advertising, then a more effective system than Knol would be to algorithmically determine the most valuable and well-presented sources of knowledge, identify the identity of authorities using the same journalistic techniques that the Google News team will have to learn, and then reward those sources with increased traffic, attention and/or monetary compensation.
For a long time Google’s goal was to help direct your attention outward. Increasingly we find that they want to hold onto it. Everyone knows that Wikipedia articles place highly in Google search results. Makes sense then that they want to capture some of those clicks and plug them directly into the Google ad network. But already the Web is dominated by a handful of mega sites. I get nervous at the thought that www.google.com could gradually become an internal directory, that Google could become the alpha and omega, not only the start page of the Internet but all the destinations.
It will be interesting to see just how and to what extent knols start creeping up the search results. Presumably, they will be ranked according to the same secret metrics that measure all pages in Google’s index, but given the opacity of their operations, who’s to say that subtle or unconscious rigging won’t occur? Will community ratings factor into search rankings? That would seem to present a huge conflict of interest. Perhaps top-rated knols will be displayed in the sponsored links area at the top of results pages. Or knols could be listed in order of community ranking on a dedicated knol search portal, providing something analogous to the experience of searching within Wikipedia as opposed to finding articles through external search engines. Returning to the theory of mind question, will Google develop enough awareness of how it is perceived and felt by its users to strike the right balance?
One last thing worth considering about the knol – apart from its being possibly the worst Internet neologism in recent memory – is its author-centric nature. It’s interesting that in order to compete with Wikipedia Google has consciously not adopted Wikipedia’s model. The basic unit of authorial action in Wikipedia is the edit. Edits by multiple contributors are combined, through a complicated consensus process, into a single amalgamated product. On Google’s encyclopedia the basic unit is the knol. For each knol (god, it’s hard to keep writing that word) there is a one-to-one correspondence with an individual, identifiable voice. There may be multiple competing knols, and by extension competing voices (you have this on Wikipedia too, but it’s relegated to the discussion pages).
Viewed in this way, Googlepedia is perhaps a more direct rival to Larry Sanger’s Citizendium, which aims to build a more authoritative Wikipedia-type resource under the supervision of vetted experts. Citizendium is a strange, conflicted experiment, a weird cocktail of Internet populism and ivory tower elitism – and by the look of it, not going anywhere terribly fast. If knols take off, could they be the final nail in the coffin of Sanger’s awkward dream? Bryan Alexander wonders along similar lines.
While not explicitly employing Sanger’s rhetoric of “expert” review, Google seems to be banking on its commitment to attributed solo authorship and its ad-based incentive system to lure good, knowledgeable authors onto the Web, and to build trust among readers through the brand-name credibility of authorial bylines and brandished credentials. Whether this will work remains to be seen. I wonder… whether this system will really produce quality. Whether there are enough checks and balances. Whether the community rating mechanisms will be meaningful and confidence-inspiring. Whether self-appointed experts will seem authoritative in this context or shabby, second-rate and opportunistic. Whether this will have the feeling of an enlightened knowledge project or of sleazy intellectual link farming (or something perfectly useful in between).
The feel of a site – the values it exudes – is an important factor though. This is why I like, and in an odd way trust Wikipedia. Trust not always to be correct, but to be transparent and to wear its flaws on its sleeve, and to be working for a higher aim. Google will probably never inspire that kind of trust in me, certainly not while it persists in its dangerous self-delusions.
A lot of unknowns here. Thoughts?
sparkles from the wheel
Walt Whitman’s poem “Sparkles from the Wheel” beautifully captures the pleasure and exhilaration of watching work in progress:
1
WHERE the city’s ceaseless crowd moves on, the live-long day,
Withdrawn, I join a group of children watching – I pause aside with them.
By the curb, toward the edge of the flagging,
A knife-grinder works at his wheel, sharpening a great knife;
Bending over, he carefully holds it to the stone – by foot and knee,
With measur’d tread, he turns rapidly – As he presses with light but firm hand,
Forth issue, then, in copious golden jets,
Sparkles from the wheel.
2
The scene, and all its belongings – how they seize and affect me!
The sad, sharp-chinn’d old man, with worn clothes, and broad shoulder-band of leather;
Myself, effusing and fluid – a phantom curiously floating – now here absorb’d and arrested;
The group, (an unminded point, set in a vast surrounding;)
The attentive, quiet children – the loud, proud, restive base of the streets;
The low, hoarse purr of the whirling stone – the light-press’d blade,
Diffusing, dropping, sideways-darting, in tiny showers of gold,
Sparkles from the wheel.
I was reminded of this the other day while reading a brief report in Library Journal on Siva’s recent cross-blog argument with University of Michigan Librarian Paul Courant about Google book digitization contracts. These sorts of exchanges are not new in themselves, but blogs have made it possible for them to occur much more spontaneously and, in Siva’s case, to put them visibly in the context of a larger intellectual project. It’s a nice snapshot of the sort of moment that can happen along the way when the writing process is made more transparent – seeing an argument crystallize or a position get clarified. And there’s a special kind of pleasure and exhilaration that comes from reading this way, seeing Siva sharpening his knife – or argument – and the rhetorical sparks that fly off the screen. Here’s that Library Journal bit:
Discussion of Google Scan Plan Heats Up on Blogs:
Now this is why we love the Blogosphere. In launching his blog, University of Michigan’s (UM) dean of libraries Paul Courant recently offered a spirited defense of UM’s somewhat controversial scan plan with Google. That post drew quite a few comments, and a direct response from Siva Vaidhyanathan, the author, blogger, and University of Virginia professor currently writing The Googlization of Everything online at the Institute for the Future of the Book; that of course drew a response from Courant. The result? A lively and illuminating dialog on Google’s book scanning efforts.
how to keep google’s books open
Whip-smart law blogger Frank Pasquale works through his evolving views on digital library projects and search engines, proposing a compelling strategy for wringing some public good from the tangle of lawsuits surrounding Google Book Search. It hinges on a more expansive (though absolutely legally precedented) interpretation of fair use that takes the public interest and not just market factors into account. Recommended reading. (Thanks, Siva!)
au courant
Paul Courant is the University Librarian at the University of Michigan as well as a professor of economics. And he now has a blog. He leads off with a response to critics (including Brewster Kahle and Siva Vaidhyanathan) of Michigan’s book digitization partnership with Google. Siva responds back on Googlization of Everything. Great to see a university librarian entering the public debate in this way.
“digitization and its discontents”
Anthony Grafton’s New Yorker piece “Future Reading” paints a forbidding picture of the global digital library currently in formation on public and private fronts around the world (Google et al.). The following quote sums it up well – a refreshing counterpoint to the millenarian hype we so often hear w/r/t mass digitization:
The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.
Grafton begins and ends in a nostalgic tone, with a paean to the New York Public Library and the critic Alfred Kazin: the poor son of immigrants, City College-educated, who researched his seminal study of American literature, On Native Grounds, almost entirely with materials freely available at the NYPL. Clearly, Grafton is a believer in the civic ideal of the public library – a reservoir of knowledge, free to all – and this animates his critique of the balkanized digital landscape of search engines and commercial databases. Given where he appears to stand, I wish he could have taken a stab at what a digital public library might look like, and what sorts of technical, social, political and economic reorganization might be required to build it. Obviously, these are questions that would have required their own article. But Grafton’s piece is one of those occasional journalistic events that moves the issue of digitization and the future of libraries out of the specialist realm into the general consciousness, and it would have been valuable for him to have connected the threads. Instead Grafton ends what is overall a valuable and intelligent article with a retreat into print fetishism – “crowded public rooms where the sunlight gleams on varnished tables….millions of dusty, crumbling, smelly, irreplaceable documents and books” – which, while evocative, obscures more than it illuminates.
Incidentally, those questions are precisely what was discussed at our Really Modern Library meetings last month. We’re still compiling our notes but expect a report soon.