{"id":1258,"date":"2008-06-16T15:35:05","date_gmt":"2008-06-16T15:35:05","guid":{"rendered":"\/ifbookblog\/?p=1258"},"modified":"2008-06-16T15:35:05","modified_gmt":"2008-06-16T15:35:05","slug":"google_digitization_and_archiv","status":"publish","type":"post","link":"https:\/\/futureofthebook.org\/blog\/2008\/06\/16\/google_digitization_and_archiv\/","title":{"rendered":"google, digitization and archives: despatches from if:book"},"content":{"rendered":"<p>In discussing with other Institute folks how to go about reviewing four years&#8217; worth of blog posts, I&#8217;ve felt torn at times. Should I cherry-pick &#8216;thinky&#8217; posts that discuss a particular topic in depth, or draw out narratives from strings of posts each of which is not, in itself, a literary gem but which cumulatively form the bedrock of the blog? But I thought about it, and realised that you can&#8217;t really have one without the other.<br \/>\nFair use, digitization, public domain, archiving, the role of libraries and cultural heritage are intricately interconnected. But the name that connects all these issues over the last few years has been Google. The Institute has covered Google&#8217;s incursions into digitization of libraries (amongst other things) in a way that has explored many of these issues &#8211; and raised questions that are as urgent as ever. Is it okay to privatize vast swathes of our common cultural heritage? What are the privacy issues around technology that tracks online reading? Where now for copyright, fair use and scholarly research?<br \/>\nIn-depth coverage of Google and digitization has helped to draw out many of the issues central to this blog. Thus, to draw forth the narrative of if:book&#8217;s Google coverage is, by extension, to watch a political and cultural stance emerging. 
So in this post I&#8217;ve tried to have my cake and eat it &#8211; to trace a story, and to give a sense of the depth of thought going into that story&#8217;s discussion.<br \/>\nIn order to keep things manageable, I&#8217;ve kept this post to a largely Google-centric focus. Further reviews covering copyright-related posts, and general discussion of libraries and technology will follow.<br \/>\n<strong>2004-5: Google rampages through libraries, annoys Europe, gains rivals<\/strong><br \/>\nIn December 2004, if:book&#8217;s <a href=\"\/blog\/archives\/2004\/12\/google_takes_on_u_of_michigan.html\">first post about Google&#8217;s digitization of libraries<\/a> gave the numbers for the University of Michigan project.<br \/>\nIn February 2005, the head of France&#8217;s national libraries <a href=\"\/blog\/archives\/2005\/02\/non_merci.html\">raised a battle cry<\/a> against the Anglo-centricity implicit in Google&#8217;s plans to digitize libraries. The company&#8217;s seemingly relentless advance <a href=\"\/blog\/archives\/2005\/05\/europe_aims_canon_at_google.html\"> brought Europe out in force<\/a> to find ways of forming non-Google coalitions for digitization.<br \/>\nIn August, Google <a href=\"\/blog\/archives\/2005\/08\/google_halts_book_scans_until.html\">halted book scans for a few months<\/a> to appease publishers angry at encroachments on their copyright. But this was clearly not enough, as in October 2005, Google <a href=\"\/blog\/archives\/2005\/10\/google_is_sued_again.html\"> was sued (again) by a string of publishers for massive copyright infringement<\/a>. However, undeterred either by European hostility or legal challenges, the same month the company <a href=\"\/blog\/archives\/2005\/10\/google_expands_bookscanning_pr.html\">made moves to expand Google Print into Europe<\/a>. Also in October 2005, Yahoo! 
<a href=\"\/blog\/archives\/2005\/10\/yahoo_announces_bookscanning_p.html\">launched the Open Content Alliance<\/a>, which was <a href=\"\/blog\/archives\/2005\/10\/microsoft_joins_open_content_a.html\">joined by Microsoft<\/a> around the same time. Later the same month, <a href=\"\/blog\/archives\/2005\/10\/\">a Wired article<\/a> put the case for authors in favor of Google&#8217;s searchable online archive.<br \/>\nIn November 2005 Google <a href=\"\/blog\/archives\/2005\/11\/google_print_is_no_more.html\"> announced<\/a> that from here on in Google Print would be known as Google Book Search, as the &#8216;Print&#8217; reference perhaps struck too close to home for publishers. The same month, Ben <a href=\"\/blog\/archives\/2005\/11\/google_prints_notsopublic_doma.html\">savaged Google Print&#8217;s &#8216;public domain&#8217; efforts<\/a> &#8211; then <a href=\"\/blog\/archives\/2005\/11\/having_browsed_google_print_a.html\">recanted (a little) later that month<\/a>.<br \/>\nIn December 2005 Google&#8217;s digitization was still hot news &#8211; the Institute <a href=\"\/blog\/archives\/2005\/12\/google_libraries_podcast_now_a.html\">did a radio show\/podcast with Open Source<\/a> on the topic, and <a href=\"\/blog\/archives\/2005\/12\/google_book_search_debated_at.html\">covered the Google Book Search debate at the American Bar Association<\/a>. (In fact, most of that month&#8217;s posts are dedicated to Google and digitization and are too numerous to do justice to here).<br \/>\n<strong>2006: Digitization spreads<\/strong><br \/>\nBy 2006, digitization and digital archives &#8211; with attendant debates &#8211; are spreading. 
From January through March, three posts &#8211; &#8216;The book is reading you&#8217; parts <a href=\"\/blog\/archives\/2006\/01\/the_book_is_reading_you.html\">1<\/a>, <a href=\"\/blog\/archives\/2006\/03\/google_buys_writely_the_book_is_reading_you.html\">2<\/a> and <a href=\"\/blog\/archives\/2006\/03\/the_book_is_reading_you_part_3.html\">3<\/a> &#8211; looked at privacy, networked books, fair use, downloading and copyright around Google Book Search. Also in March, a further post <a href=\"\/blog\/archives\/2006\/03\/googlezon_and_the_publishing_i.html\">discussed<\/a> Google and Amazon&#8217;s incursions into publishing.<br \/>\nIn April, the Smithsonian cut a deal with Showtime making the media company a preferential media partner for documentaries using Smithsonian resources. Jesse <a href=\"\/blog\/archives\/2006\/04\/corporate_creep.html\">analyzed the implications for open research<\/a>.<br \/>\nIn June, the Library of Congress and partners <a href=\"\/blog\/archives\/2005\/06\/web_news_as_gated_community.html\">launched a project to make vintage newspapers available online<\/a>. Google Book Search, meanwhile, was <a href=\"\/blog\/archives\/2005\/06\/google_print_gets_its_own_addr.html\">tweaked<\/a> to reassure publishers that the new dedicated search page was not, in fact, a library. The same month, Ben <a href=\"\/blog\/archives\/2006\/06\/google_and_the_myth_of_univers.html\">responded thoughtfully<\/a> to a French book attacking Google, and by extension America, for cultural imperialism. The debate continued with <a href=\"\/blog\/archives\/2006\/07\/the_myth_of_universal_knowledg.html\">a follow-up post<\/a> in July.<br \/>\nIn August, Google <a href=\"\/blog\/archives\/2006\/08\/googles_window_on_the_public_d.html\">announced<\/a> downloadable PDF versions of many of its public-domain books. 
Also in August, the publication of Google&#8217;s contract with UCAL&#8217;s library <a href=\"\/blog\/archives\/2006\/08\/showtiming_our_libraries.html\">prompted some debate<\/a>. In October we reported on <a href=\"\/blog\/archives\/2006\/10\/microsoft_steps_up_book_digiti.html\">Microsoft&#8217;s growing book digitization list<\/a>, and <a href=\"\/blog\/archives\/2006\/10\/literary_zeitgest_googlestyle.html\">Google released its literary Zeitgeist data<\/a>. November saw further commentary on Google Book Search improvements, and <a href=\"\/blog\/archives\/2006\/11\/brewster_kahle_on_the_google_b.html\">some criticism of the same from Brewster Kahle<\/a>. The same month, we reported that the Dutch government is pouring millions into a vast public digitization program.<br \/>\nIn December, Microsoft <a href=\"\/blog\/archives\/2006\/12\/microsoft_launches_live_search.html\">launched<\/a> its (clunkier) version of Google Books, Microsoft Live Book Search.<br \/>\n<strong><br \/>\n2007: Google is the environment<\/strong><br \/>\nIn January, former Netscape player Rich Skrenta <a href=\"\/blog\/archives\/2007\/01\/has_google_already_won.html\">crowned Google king of the &#8216;third age of computing&#8217;<\/a>: &#8216;Google is the environment&#8217;, he declared. Meanwhile, having seemingly forgotten 2005&#8217;s tussles, the company <a href=\"\/blog\/archives\/2007\/01\/unbound_google_pulishing_confe.html\">hosted a publishing conference at the New York Public Library<\/a>. In February the company <a href=\"\/blog\/archives\/2007\/02\/google_library_dominoes.html\">signed another digitization deal, this time with Princeton<\/a>; in August, this institution <a href=\"\/blog\/archives\/2007\/08\/cornell_joins_google_book_sear.html\">was joined by Cornell<\/a>, and the Economist <a href=\"\/blog\/archives\/2007\/08\/jp_google.html\">compared Google&#8217;s databases to the banking system of the information age<\/a>. 
The following month, Siva&#8217;s first Monday podcast <a href=\"\/blog\/archives\/2007\/09\/siva_podcast_on_the_googlizati.html\">discussed the Googlization of libraries<\/a>.<br \/>\nBy now, while Google remains a theme, commercial digitization of public-domain archives is a far broader issue. In January, <a href=\"\/blog\/archives\/2007\/01\/national_archives_sell_out.html\">the US National Archives cut a digitization deal with Footnote<\/a>, effectively paywalling digital access to a slew of public-domain documents; in August, <a href=\"\/blog\/archives\/2007\/08\/privatizing_public_goods_our_t.html\">a deal followed with Amazon<\/a> for commercial distribution of its film archive. The same month, <a href=\"\/blog\/archives\/2007\/08\/audiovisual_heritage_double_pl.html\">two major audiovisual archiving projects launched<\/a>.<br \/>\nIn May, Ben <a href=\"\/blog\/archives\/2007\/05\/the_peoples_card_catalog_a_tho.html\">speculated<\/a> about whether some &#8216;People&#8217;s Card Catalog&#8217; could be devised to rival Google&#8217;s gated archive. The Open Library <a href=\"\/blog\/archives\/2007\/07\/the_open_library.html\">launched in July<\/a>, to mixed reviews &#8211; the same month that the ongoing back-and-forth between the Institute and academic Siva Vaidhyanathan bore fruit. Siva&#8217;s networked writing project, The Googlization Of Everything, was announced (this would be launched in September). Then, in August, we <a href=\"\/blog\/archives\/2007\/08\/the_bookish_character_of_books.html\">covered an excellent piece by Paul Duguid discussing the shortcomings of Google&#8217;s digitization efforts<\/a>.<br \/>\nIn October, several major American libraries refused digitization deals with Google. 
By November, Google and digitization had <a href=\"\/blog\/archives\/2007\/11\/digitization_and_its_discontents.html\">found their way into the New Yorker<\/a>; the same month <a href=\"\/blog\/archives\/2007\/11\/library_of_congress_to_archive.html\">the Library of Congress put out a call<\/a> for e-literature links to be archived.<br \/>\n<strong><br \/>\n2008: All quiet? <\/strong><br \/>\nIn January we <a href=\"\/blog\/archives\/2008\/01\/no_longer_separated_by_a_commo.html\">reported<\/a> that LibraryThing interfaces with the British Library, and in March on <a href=\"\/blog\/archives\/2008\/03\/google_books_api.html\">the launch of an API for Google Books<\/a>. Siva&#8217;s book <a href=\"\/blog\/archives\/2008\/03\/googlization_of_everything_now.html\">found a print publisher<\/a> the same month.<br \/>\nBut if Google coverage has been slighter this year, that&#8217;s not to suggest a happy ending to the story. Microsoft <a href=\"http:\/\/www.newser.com\/article\/d90s0aco0.html\">abandoned its book scanning project in mid-May of this year<\/a>, raising questions about the viability of the Open Content Alliance. It would seem as though Skrenta was right. The Googlization of Everything continues, less challenged than ever.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In discussing with other Institute folks how to go about reviewing four years&#8217; worth of blog posts, I&#8217;ve felt torn at times. 
Should I cherry-pick &#8216;thinky&#8217; posts that discuss a particular topic in depth, or draw out narratives from strings of posts each of which is not, in itself, a literary gem but which cumulatively [&hellip;]<\/p>\n","protected":false},"author":23,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84,498,755,1059,1602],"tags":[3125,3215,2475,3309,3429],"class_list":["post-1258","post","type-post","status-publish","format-standard","hentry","category-archive","category-digitization","category-google","category-libraries","category-review","tag-archive","tag-digitization","tag-google-2","tag-libraries","tag-review"],"_links":{"self":[{"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/posts\/1258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/comments?post=1258"}],"version-history":[{"count":0,"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/posts\/1258\/revisions"}],"wp:attachment":[{"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/media?parent=1258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/categories?post=1258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/futureofthebook.org\/blog\/wp-json\/wp\/v2\/tags?post=1258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}