Author Archives: ben vershbow

no longer separated by a common language

LibraryThing now interfaces with the British Library and loads of other UK sources:

The BL is a catch in more than one way. It’s huge, of course. But, unlike some other sources, BL data isn’t normally available to the public. To get it, our friends at Talis, the UK-based library software company, have granted us special access to their Talis Base product, an elephantine mass of book data. In the case of the BL, that’s some twelve million unique records, two copies of the Gutenberg Bible and two copies of the Magna Carta.

NEA reading debate round 2: an exchange between sunil iyengar and nancy kaplan

Last week I received an email from Sunil Iyengar of the National Endowment for the Arts responding to Nancy Kaplan’s critique (published here on if:book) of the NEA’s handling of literacy data in its report “To Read or Not to Read.” I’m reproducing the letter followed by Nancy’s response.
Sunil Iyengar:
The National Endowment for the Arts welcomes a “careful and responsible” reading of the report, To Read or Not To Read, and the data used to generate it. Unfortunately, Nancy Kaplan’s critique (11/30/07) misconstrues the NEA’s presentation of Department of Education test data as a “distortion,” although all of the report’s charts are clearly and accurately labeled.
For example, in Charts 5A to 5D of the full report, the reader is invited to view long-term trends in the average reading score of students at ages 9, 13, and 17. The charts show test scores from 1984 through 2004. Why did we choose that interval? Simply because most of the trend data in the preceding chapters–starting with the NEA’s own study data featured in Chapter One–cover the same 20-year period. For the sake of consistency, Charts 5A to 5D refer to those years.
Dr. Kaplan notes that the Department of Education’s database contains reading score trends from 1971 onward. The NEA report also emphasizes this fact, in several places. In 2004, the report observes, the average reading score for 17-year-olds dipped back to where it was in 1971. “For more than 30 years…17-year-olds have not sustained improvements in reading scores,” the report states on p. 57. Nine-year-olds, by contrast, scored significantly higher in 2004 than in 1971.
Further, unlike the chart in Dr. Kaplan’s critique, the NEA’s Charts 5A to 5D explain that the “test years occurred at irregular intervals,” and each test year from 1984 to 2004 is provided. Also omitted from the critique’s reproduction are labels for the charts’ vertical axes, which provide 5-point rather than the 10-point intervals used by the Department of Education chart. Again, there is no mystery here. Five-point intervals were chosen to make the trends easier to read.
Dr. Kaplan makes another mistake in her analysis. She suggests that the NEA report is wrong to draw attention to declines in the average reading score of adult Americans of virtually every education level, and an overall decline in the percentage of adult readers who are proficient. But the Department of Education itself records these declines. In their separate reports, the NEA and the Department of Education each acknowledge that the average reading score of adults has remained unchanged. That’s because from 1992 to 2003, the percentage of adults with postsecondary education increased and the percentage who did not finish high school decreased. “After all,” the NEA report notes, “compared with adults who do not complete high school, adults with postsecondary education tend to attain higher prose scores.” Yet this fact in no way invalidates the finding that average reading scores and proficiency levels are declining even at the highest education levels.
“There is little evidence of an actual decline in literacy rates or proficiency,” Dr. Kaplan concludes. We respectfully disagree.
Sunil Iyengar
Director, Research & Analysis
National Endowment for the Arts
Nancy Kaplan:
I appreciate Mr. Iyengar’s engagement with issues at the level of data and am happy to acknowledge that the NEA’s report includes a single sentence on pages 55-56 with the crucial concession that over the entire period for which we have data, the average scale scores of 17-year-olds have not changed: “By 2004, the average scale score had retreated to 285, virtually the same score as in 1971, though not shown in the chart.” I will even concede the accuracy of the following sentence: “For more than 30 years, in other words, 17-year-olds have not sustained improvements in reading scores” [emphasis in the original]. What the report fails to note or account for, however, is that there actually was a period of statistically significant improvement in scores for 17-year-olds from 1971 to 1984. Although I did not mention it in my original critique, the report handles data from 13-year-olds in the same way: “the scores for 13-year-olds have remained largely flat from 1984-2004, with no significant change between the 2004 average score and the scores from the preceding seven test years. Although not apparent from the chart, the 2004 score does represent a significant improvement over the 1971 average – a four-point increase” (p. 56).
In other words, a completely accurate and honest assessment of the data shows that reading proficiency among 17-year-olds has fluctuated over the past 30 years, but has not declined over that entire period. At the same time, reading proficiency among 9-year-olds and 13-year-olds has improved significantly. Why does the NEA not state the case in the simple, accurate and complete way I have just written? The answer Mr. Iyengar proffers is consistency, but that response may be a bit disingenuous.
Plenty of graphs in the NEA report show a variety of time periods, so there is at best a weak rationale for choosing 1984 as the starting point for the graphs in question. Consistency, in this case, is surely less important than accuracy and completeness. Given the inferences the report draws from the data, then, it is more likely that the sample of data the NEA used in its representations was chosen precisely because, as Mr. Iyengar admits, that sample would make “the trends easier to read.” My point is that the “trends” the report wants to foreground are not the only trends in the data: truncating the data set makes other, equally important trends literally invisible. A single sentence in the middle of a paragraph cannot excuse the act of erasure here. As both Edward Tufte (The Visual Display of Quantitative Information) and Jacques Bertin (Semiology of Graphics), the two most prominent authorities on graphical representations of data, demonstrate in their seminal works on the subject, selective representation of data constitutes distortion of that data.
Similarly, labels attached to a graph, even when they state that the tests occurred at irregular intervals, do not substitute for representing the irregularity of the intervals in the graph itself (again, see Tufte and Bertin). To do otherwise is to turn disinterested analysis into polemic. “Regularizing” the intervals in the graphic representation distorts the data.
The NEA report wants us to focus on a possible correlation between choosing to read books in one’s leisure time, reading proficiency, and a host of worthy social and civic activities. Fine. But if the reading scores of 17-year-olds improved from 1971 to 1984 but there is no evidence that during the period of improvement these youngsters were reading more, the case the NEA is trying to build becomes shaky at best. Similarly, the reading scores of 13-year-olds improved from 1971 to 1984 but “have remained largely flat from 1984-2004 ….” Yet during that same period, the NEA report claims, leisure reading among 13-year-olds was declining. So what exactly is the hypothesis here – that sometimes declines in leisure reading correlate with declines in reading proficiency but sometimes such a decline is not accompanied by a decline in reading proficiency? I’m skeptical.
My critique is aimed at the management of data (rather than the ahistorical definition of reading the NEA employs, a somewhat richer and more potent issue joined by Matthew Kirschenbaum and others) because I believe that a crucial component of contemporary literacy, in its most capacious sense, includes the ability to understand the relationships between claims, evidence and the warrants for that evidence. The NEA’s data need to be read with great care and its argument held to a high scientific standard lest we promulgate worthless or wasteful public policy based on weak research.
I am a humanist by training and so have come to my appreciation of quantitative studies rather late in my intellectual life. I cannot claim to have a deep understanding of statistics, yet I know what “confounding factors” are. When the NEA report chooses to claim that the reading proficiency of adults is declining while at the same time ignoring the NCES explanation of the statistical paradox that explains the data, it is difficult to avoid the conclusion that the report’s authors are not engaging in a disinterested (that is, dispassionate) exploration of what we can know about the state of literacy in America today but are instead cherry-picking the elements that best suit the case they want to make.
Nancy Kaplan, Executive Director
School of Information Arts and Technologies
University of Baltimore

quiet

We’re all spending some time away from our computers, so things will be pretty quiet round here till after New Year’s. Happy holidays, everyone.
Btw, if:book just turned 3!

anatomy of a debate

The New York Times continues to do quality interactive work online. Take a look at this recent feature that allows you to delve through video and transcript from the final Democratic presidential candidate debate in Iowa (Dec. 13, ’07). It begins with a lovely navigation tool that allows you to jump through the video topic by topic. Clicking text in the transcript (center column) or a topic from the list (right column) jumps you directly to the corresponding place in the video.
nytimesdebate1.jpg
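Under the hood, a feature like this amounts to little more than a list of transcript segments keyed to video timestamps. Here is a minimal sketch of the click-to-seek behavior in TypeScript, assuming an HTML5 video element and a hypothetical Segment shape (the Times’ actual implementation isn’t public):

// Hypothetical data shape: each transcript segment records who is
// speaking and where it begins in the video.
interface Segment {
  topic: string;      // e.g. "Iraq"
  speaker: string;
  startTime: number;  // seconds from the start of the video
  text: string;
}

// Render the transcript; clicking any paragraph seeks the video to
// the start of that segment.
function renderTranscript(
  segments: Segment[],
  video: HTMLVideoElement,
  container: HTMLElement
): void {
  for (const seg of segments) {
    const p = document.createElement("p");
    p.textContent = `${seg.speaker}: ${seg.text}`;
    p.addEventListener("click", () => {
      video.currentTime = seg.startTime;
      video.play();
    });
    container.appendChild(p);
  }
}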
The second part is a “transcript analyzer,” which gives a visual overview of the debate. The text is laid out in miniature in a simple, clean schematic, navigable by speaker. Click a name in the left column and the speaker’s remarks are highlighted on the schematic. Hover over any block of text and that detail of the transcript pops up for you to read. You can also search the debate by keyword and see word counts and speaking times for each candidate.
nytimesdebate2.jpg
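The word counts and speaking times fall out of the same segment data almost for free. A sketch under the same assumptions, treating each segment as running until the next one begins:

// Per-speaker totals (hypothetical shapes, as above). Each segment is
// assumed to run until the next segment starts, or until the video ends.
function speakerStats(
  segments: { speaker: string; startTime: number; text: string }[],
  videoEnd: number // total video length, in seconds
): Map<string, { words: number; seconds: number }> {
  const stats = new Map<string, { words: number; seconds: number }>();
  segments.forEach((seg, i) => {
    const end = i + 1 < segments.length ? segments[i + 1].startTime : videoEnd;
    const entry = stats.get(seg.speaker) ?? { words: 0, seconds: 0 };
    entry.words += seg.text.split(/\s+/).filter(Boolean).length;
    entry.seconds += end - seg.startTime;
    stats.set(seg.speaker, entry);
  });
  return stats;
}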
These are fantastic tools – if only they were more widely available. They would make amazing extensions to CommentPress.

a few rough notes on knols

Think you’ve got an authoritative take on a subject? Write up an article, or “knol,” and see how the Web judgeth. If it’s any good, you might even make a buck.
knol.jpg
Google’s new encyclopedia will go head to head with Wikipedia in the search rankings, though in format it more resembles other ad-supported, single-author info sources like About.com or Squidoo. The knol-verse (how the hell do we speak of these things as a whole?) will be a Darwinian writers’ market where the fittest knols rise to the top. Anyone can write one. Google will host it for free. Multiple knols can compete on a single topic. Readers can respond to and evaluate knols through simple community rating tools. Content belongs solely to the author, who can license it in any way he/she chooses (all rights reserved, Creative Commons, etc.). Authors have the option of having contextual ads run to the side, revenues from which are shared with Google. There is no vetting or editorial input from Google whatsoever.
Except… Might not the ads exert their own subtle editorial influence? In this entrepreneurial writers’ fray, will authors craft their knols for AdSense optimization? Will they become, consciously or not, shills for the companies that place the ads (I’m thinking especially of high-impact topic areas like health and medicine)? Whatever you may think of Wikipedia, it has a certain integrity in being ad-free. The mission is clear and direct: to build a comprehensive free encyclopedia for the Web. The range of content has no correlation to marketability or revenue potential. It’s simply a big compendium of stuff, the only mention of money being a frank electronic tip jar at the top of each page. The Googlepedia, in contrast, is fundamentally an advertising platform. What will such an encyclopedia look like?
In the official knol announcement, Udi Manber, a VP for engineering at Google, explains the genesis of the project: “The challenge posed to us by Larry, Sergey and Eric was to find a way to help people share their knowledge. This is our main goal.” You can see embedded in this statement all the trademarks of Google’s rhetoric: a certain false humility, the pose of incorruptible geek integrity and, above all, a boundless confidence that every problem, no matter how gray and human, has a technological fix. I’m not saying it’s wrong to build a business, nor that Google is lying whenever it talks about anything idealistic; it’s just that time and again Google displays an astonishing lack of self-awareness in the way it frames its services – a lack that becomes especially obvious whenever the company edges into content creation and hosting. They tend to talk as though they’re building the library of Alexandria or the great Encyclopédie, but really they’re describing an advanced advertising network of Google-exclusive content. We shouldn’t allow these very different things to become as muddled in our heads as they are in theirs. You get a worrisome sense that, like the Bushies, the cheerful software engineers who promote Google’s products on the company’s various blogs truly believe the things they’re saying. That if we can just get the algorithm right, the world can bask in the light of universal knowledge.
The blogosphere has been alive with commentary about the knol situation throughout the weekend. By far the most provocative thing I’ve read so far is by Anil Dash, VP of Six Apart, the company that makes the Movable Type software that runs this blog. Dash calls out this Google self-awareness gap, or as he puts it, its lack of a “theory of mind”:

Theory of mind is that thing that a two-year-old lacks, which makes her think that covering her eyes means you can’t see her. It’s the thing a chimpanzee has, which makes him hide a banana behind his back, only taking bites when the other chimps aren’t looking.
Theory of mind is the awareness that others are aware, and its absence is the weakness that Google doesn’t know it has. This shortcoming exists at a deep cultural level within the organization, and it keeps manifesting itself in the decisions that the company makes about its products and services. The flaw is one that is perpetuated by insularity, and will only be remedied by becoming more open to outside ideas and more aware of how people outside the company think, work and live.

He gives some examples:

Connecting PageRank to economic systems such as AdWords and AdSense corrupted the meaning and value of links by turning them into an economic exchange. Through the turn of the millennium, hyperlinking on the web was a social, aesthetic, and expressive editorial action. When Google introduced its advertising systems at the same time as it began to dominate the economy around search on the web, it transformed a basic form of online communication, without the permission of the web’s users, and without explaining that choice or offering an option to those users.

He compares the knol enterprise with GBS:

Knol shares with Google Book Search the problem of being both indexed by Google and hosted by Google. This presents inherent conflicts in the ranking of content, as well as disincentives for content creators to control the environment in which their content is published. This necessarily disadvantages competing search engines, but more importantly eliminates the ability for content creators to innovate in the area of content presentation or enhancement. Anything that is written in Knol cannot be presented any better than the best thing in Knol. [his emphasis]

And lastly concludes:

An awareness of the fact that Google has never displayed an ability to create the best tools for sharing knowledge would reveal that it is hubris for Google to think they should be a definitive source for hosting that knowledge. If the desire is to increase knowledge sharing, and the methods of compensation that Google controls include traffic/attention and money/advertising, then a more effective system than Knol would be to algorithmically determine the most valuable and well-presented sources of knowledge, identify the identity of authorities using the same journalistic techniques that the Google News team will have to learn, and then reward those sources with increased traffic, attention and/or monetary compensation.

For a long time Google’s goal was to help direct your attention outward. Increasingly we find that they want to hold onto it. Everyone knows that Wikipedia articles place highly in Google search results. Makes sense then that they want to capture some of those clicks and plug them directly into the Google ad network. But already the Web is dominated by a handful of mega sites. I get nervous at the thought that www.google.com could gradually become an internal directory, that Google could become the alpha and omega, not only the start page of the Internet but all the destinations.
It will be interesting to see just how and to what extent knols start creeping up the search results. Presumably, they will be ranked according to the same secret metrics that measure all pages in Google’s index, but given the opacity of their operations, who’s to say that subtle or unconscious rigging won’t occur? Will community ratings factor into search rankings? That would seem to present a huge conflict of interest. Perhaps top-rated knols will be displayed in the sponsored links area at the top of results pages. Or knols could be listed in order of community ranking on a dedicated knol search portal, providing something analogous to the experience of searching within Wikipedia as opposed to finding articles through external search engines. Returning to the theory of mind question, will Google develop enough awareness of how it is perceived and felt by its users to strike the right balance?
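To make that conflict of interest concrete, imagine a toy scoring function that blends textual relevance with a community rating and a bonus for pages the search engine itself hosts. Every name and weight here is invented for illustration; this says nothing about how Google actually ranks pages:

// Purely illustrative toy ranking – not Google’s actual algorithm.
interface Page {
  relevance: number;       // 0..1, textual match to the query
  communityRating: number; // 0..1, e.g. a normalized knol star rating
  selfHosted: boolean;     // is the page hosted by the search engine?
}

function score(p: Page): number {
  const ratingWeight = 0.2;                  // hypothetical weight
  const hostBonus = p.selfHosted ? 0.05 : 0; // hypothetical bonus
  // Even a small rating weight or host bonus quietly tilts results
  // toward the engine's own content.
  return p.relevance * (1 - ratingWeight)
       + p.communityRating * ratingWeight
       + hostBonus;
}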
One last thing worth considering about the knol – apart from its being possibly the worst Internet neologism in recent memory – is its author-centric nature. It’s interesting that in order to compete with Wikipedia Google has consciously not adopted Wikipedia’s model. The basic unit of authorial action in Wikipedia is the edit. Edits by multiple contributors are combined, through a complicated consensus process, into a single amalgamated product. On Google’s encyclopedia the basic unit is the knol. For each knol (god, it’s hard to keep writing that word) there is a one-to-one correspondence with an individual, identifiable voice. There may be multiple competing knols, and by extension competing voices (you have this on Wikipedia too, but it’s relegated to the discussion pages).
Viewed in this way, Googlepedia is perhaps a more direct rival to Larry Sanger’s Citizendium, which aims to build a more authoritative Wikipedia-type resource under the supervision of vetted experts. Citizendium is a strange, conflicted experiment, a weird cocktail of Internet populism and ivory tower elitism – and by the look of it, not going anywhere terribly fast. If knols take off, could they be the final nail in the coffin of Sanger’s awkward dream? Bryan Alexander wonders along similar lines.
While not explicitly employing Sanger’s rhetoric of “expert” review, Google seems to be banking on its commitment to attributed solo authorship and its ad-based incentive system to lure good, knowledgeable authors onto the Web, and to build trust among readers through the brand-name credibility of authorial bylines and brandished credentials. Whether this will work remains to be seen. I wonder… whether this system will really produce quality. Whether there are enough checks and balances. Whether the community rating mechanisms will be meaningful and confidence-inspiring. Whether self-appointed experts will seem authoritative in this context or shabby, second-rate and opportunistic. Whether this will have the feeling of an enlightened knowledge project or of sleazy intellectual link farming (or something perfectly useful in between).
The feel of a site – the values it exudes – is an important factor though. This is why I like, and in an odd way trust, Wikipedia. Trust not always to be correct, but to be transparent and to wear its flaws on its sleeve, and to be working for a higher aim. Google will probably never inspire that kind of trust in me, certainly not while it persists in its dangerous self-delusions.
A lot of unknowns here. Thoughts?

a safe haven for fan culture

The Organization for Transformative Works is a new “nonprofit organization established by fans to serve the interests of fans by providing access to and preserving the history of fanworks and fan culture in its myriad forms.”
Interestingly, the OTW defines itself – and by implication, fan culture in general – as a “predominately female community.” The board of directors is made up of a distinguished and, diverging from fan culture norms, non-anonymous group of women academics spanning film studies, English, interaction design and law, and chaired by the bestselling fantasy author Naomi Novik (J.K. Rowling is not a member). In comments on his website, Ethan Zuckerman points out that

…it’s important to understand the definition of “fan culture” – media fandom, fanfic and vidding, a culture that’s predominantly female, though not exclusively so. I see this statement in OTW’s values as a reflection on the fact that politically-focused remixing of videos has received a great deal of attention from legal and media activists (Lessig, for instance) in recent years. Some women who’ve been involved with remixing television and movie clips for decades, producing sophisticated works often with incredibly primitive tools, are understandably pissed off that a new generation of political activists are being credited with “inventing the remix”.

In a nod to Virginia Woolf, next summer the OTW will launch “An Archive of Our Own,” a space dedicated to the preservation and legal protection of fan-made works:

An Archive Of Our Own’s first goal is to create a new open-source software package to allow fans to host their own robust, full-featured archives, which can support even an archive on a very large scale of hundreds of thousands of stories and has the social networking features to make it easier for fans to connect to one another through their work.
Our second goal is to use this software to provide a noncommercial and nonprofit central hosting place for fanfiction and other transformative fanworks, where these can be sheltered by the advocacy of the OTW and take advantage of the OTW’s work in articulating the case for the legality and social value of these works.

OTW will also publish an academic journal and a public wiki devoted to fandom and fan culture history. All looks very promising.

ghost story

02138, a magazine aimed at Harvard alumni, has a great article about the widespread practice among professors of using low-wage student labor to research and even write their books.

…in any number of academic offices at Harvard, the relationship between “author” and researcher(s) is a distinctly gray area. A young economics professor hires seven researchers, none yet in graduate school, several of them pulling 70-hour work-weeks; historians farm out their research to teams of graduate students, who prepare meticulously written memos that are closely assimilated into the finished work; law school professors “write” books that acknowledge dozens of research assistants without specifying their contributions. These days, it is practically the norm for tenured professors to have research and writing squads working on their publications, quietly employed at stages of co-authorship ranging from the non-controversial (photocopying) to more authorial labor, such as significant research on topics central to the final work, to what can only be called ghostwriting.

Ideally, this would constitute a sort of apprentice system – one generation of scholars teaching the next through joint endeavor. But in reality the collaborative element, though quietly sanctioned by universities (the article goes into this a bit), receives no direct blessing or stated pedagogical justification. A ghost ensemble works quietly behind the scenes to keep up the appearance of heroic, individual authorship.

cinematic reading

Random House Canada underwrote a series of short videos, produced by the slick Toronto studio Crush Inc., riffing on Douglas Coupland’s new novel The Gum Thief. These were forwarded to me by Alex Itin, who described watching them as a kind of “cinematic reading.” Watch, you’ll see what he means. There are three basic storylines, each consisting of three clips. This one, from the “Glove Pond” sequence, is particularly clever in its use of old magazines:

All the videos are available here at Crush Inc. Or on Coupland’s YouTube page.

flight paths: a networked novel

I’d like to draw your attention to an exciting new project: Flight Paths, a networked novel in progress by Kate Pullinger and Chris Joseph, co-authors most recently of the lovely multimedia serial “Inanimate Alice.” The Institute is delighted to be a partner on this experiment (along with the Institute of Creative Technologies at De Montfort University and Arts Council England), which marks our first foray into fiction. A common thread with our past experiments is that this book will involve its readers in the writing process. The story begins:

sainsburys.jpg
“I have finished my weekly supermarket shop, stocking up on provisions for my three kids, my husband, our dog and our cat. I push the loaded trolley across the car park, battling to keep its wonky wheels on track. I pop open the boot of my car and then for some reason, I have no idea why, I look up, into the clear blue autumnal sky. And I see him. It takes me a long moment to figure out what I am looking at. He is falling from the sky. A dark mass, growing larger quickly. I let go of the trolley and am dimly aware that it is getting away from me but I can’t move, I am stuck there in the middle of the supermarket car park, watching, as he hurtles toward the earth. I have no idea how long it takes – a few seconds, an entire lifetime – but I stand there holding my breath as the city goes about its business around me until…
He crashes into the roof of my car.”
The car park of Sainsbury’s supermarket in Richmond, southwest London, lies directly beneath one of the main flight paths into Heathrow Airport. Over the last decade, on at least five separate occasions, the bodies of young men have fallen from the sky and landed on or near this car park. All these men were stowaways on flights from the Indian subcontinent who had believed that they could find a way into the cargo hold of an airplane by climbing up into the airplane wheel shaft. It is thought that none could have survived the journey: killed by the tremendous heat generated by the airplane wheels on the runway, crushed when the landing gear retracts into the plane after takeoff, or frozen to death once the airplane reaches altitude.
‘Flight Paths’ seeks to explore what happens when lives collide – an airplane stowaway and the fictional suburban London housewife, quoted above. This project will tell their stories.
Through the fiction of these two lives, and the cross-connections and contradictions they represent, a larger story about the way we live today will emerge. The collision between the unknown young man, who will be both memorialised and brought back to life by the piece, and the London woman will provide the focus and force for a piece that will explore asylum, immigration, consumer culture, Islam and the West, as well as the seemingly mundane modern day reality of the supermarket car park itself. This young man’s death/plummet will become a flight, a testament to both his extreme bravery and the tragic symbolism of his chosen route to the West.

Here the authors explain the participatory element:

The initial goal of this project is to create a work of digital fiction, a ‘networked book’, created on and through the internet. The first stage of the project will include a web iteration with, at its heart, this blog, opening up the research process to the outside world, inviting discussion of the large array of issues the project touches on. As well as this, Chris Joseph and Kate Pullinger will create a series of multimedia elements that will illuminate various aspects of the story. This will allow us to invite and encourage user-generated content on this website and any associated sites; we would like to open the project up to allow other writers and artists to contribute texts – both multimedia and more traditional – as well as images, sounds, memories, ideas. At the same time, Kate Pullinger will be writing a print novel that will be a companion piece to the project overall.

We’re very curious/excited to see how this develops. Go explore the site, which is just a preliminary framework right now, and get involved. And please spread the word to other potential reader/participants. A chance to play a part in a new kind of story.

generation gap?

A pair of important posts, one by Siva Vaidhyanathan and one by Henry Jenkins, call for an end to generationally divisive rhetoric like “digital immigrants” and “digital natives.”
From Siva:

Partly, I resist such talk because I don’t think that “generations” are meaningful social categories. Talking about “Generation X” as if there were some discernable unifying traits or experiences that all people born between 1964 and pick a year after 1974 [share] is about as useful as saying that all Capricorns share some trait or experience. Yes, today one-twelfth of the world will “experience trouble at work but satisfaction in love.” Right.
Invoking generations invariably demands an exclusive focus on people of wealth and means, because they get to express their preferences (for music, clothes, electronics, etc.) in ways that are easy to count. It always excludes immigrants, not to mention those born beyond the borders of the United States. And it excludes anyone on the margins of mainstream consumer or cultural behavior.
In the case of the “digital generation,” the class, ethnic, and geographic biases could not be more obvious.

From Jenkins:

In reality, whether we are talking about games or fan culture or any of the other forms of expression which most often get associated with digital natives, we are talking about forms of cultural expression that involve at least as many adults as youth. Fan culture can trace its history back to the early part of the 20th century; the average gamer is in their twenties and thirties. These are spaces where adults and young people interact with each other in ways that are radically different from the fixed generational hierarchies affiliated with school, church, or the family. They are spaces where adults and young people can at least sometimes approach each other as equals, can learn from each other, can interact together in new terms, even if there’s a growing tendency to pathologize any contact on line between adults and youth outside of those familiar structures.
As long as we divide the world into digital natives and immigrants, we won’t be able to talk meaningfully about the kinds of sharing that occurs between adults and children and we won’t be able to imagine other ways that adults can interact with youth outside of these cultural divides. What once seemed to be a powerful tool for rethinking old assumptions about what kinds of educational experiences or skills were valuable, which was what excited me about Prensky’s original formulation [pdf], now becomes a rhetorical device that short-circuits thinking about meaningful collaboration across the generations.