More fun with book metadata. Hot on the heels of Bkkeepr comes Booklert, an app that lets you keep track of the Amazon rank of your (or anyone else’s) book. Writer, thinker and social media maven Russell Davies speculated that he’d love to have such a thing for keeping track of his book. No sooner was this said than MCQN had built it; so far it has few users, but fairly well-connected ones.
Reading MCQN’s explanation I get a picture of Booklert as a time-saving tool for hypercompetitive and stat-obsessed writers, or possibly as a kind of masochistic entertainment for publishers morbidly addicted to seeing their industry flounder. Then again, perhaps I’m being uncharitable: if you accept the (deeply dodgy) premise that the only meaningful book sales are those conducted through Amazon, Booklert – or something similar – could be used to create personalized bestseller lists, adding a layer of market data to the work of trusted reviewers and curators. I’d be interested to find out which were the top-selling titles in the rest of the Institute’s personal favourites list; I’d also be interested to find out what effect a few weeks’ endorsement by a high-flying member of the digerati might have on a handful of books.
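The personalized-bestseller idea needs nothing more than a curator’s list of ISBNs and some way of looking up each title’s current Amazon rank. Here’s a minimal sketch of the sorting step; the rank lookup itself is left as a hypothetical stand-in for whatever Booklert or Amazon’s product data service would actually provide.

```python
# Illustrative only: `fetch_rank` is a hypothetical stand-in for whatever
# Booklert (or Amazon's own product data service) exposes.

def personal_bestseller_list(curated_isbns, fetch_rank):
    """Sort a curator's recommended titles by their live Amazon sales rank
    (a lower rank means better selling)."""
    ranked = [(isbn, fetch_rank(isbn)) for isbn in curated_isbns]
    return sorted(ranked, key=lambda pair: pair[1])

# e.g. with a stub in place of a real lookup:
# personal_bestseller_list(["9780143038580", "9780151010264"],
#                          fetch_rank=lambda isbn: hash(isbn) % 100000)
```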
But whether or not it is, as Davies asserts, “exactly the sort of thing a major book business could have thought of, should have thought of, but didn’t”, Booklert illustrates the extent to which, in the context of the Web, most of the key developments around the future of the book do not concern the form, purposes or delivery mechanism of the book. They concern metadata: how it is collected, who owns it, who can make use of it. Whether you’re talking DRM, digitization, archiving, folksonomies or feeds, the Web brings a tendency – because an ability – to see the world less in terms of static content than in terms of dynamic patterns, flows and aggregated masses of user-generated behavior. When books are thus measured as units in a dynamic system, what they actually contain is only of secondary importance. What does this say about the future of serious culture in the world of information visualization?
google flirts with image tagging
Ars Technica reports that Google has begun outsourcing, or “crowdsourcing,” the task of tagging its image database by asking people to play a simple picture labeling game. The game pairs you with a randomly selected online partner, then, for 90 seconds, runs you through a sequence of thumbnail images, asking you to add as many labels as come to mind. Images advance whenever you and your partner hit upon a match (an agreed-upon tag), or when you agree to take a pass.
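The mechanic is simple enough to sketch in a few lines. Here’s a toy model – not Google’s implementation – of the match-or-pass rule for a single image, given the two players’ streams of suggested labels:

```python
def first_agreed_label(labels_a, labels_b):
    """Toy model of the labeling game: walk two players' label streams for
    one image and return the first tag both have offered, or None if the
    streams run out without a match (i.e. the pair effectively passes)."""
    seen_a, seen_b = set(), set()
    for a, b in zip(labels_a, labels_b):
        seen_a.add(a.strip().lower())
        seen_b.add(b.strip().lower())
        agreed = seen_a & seen_b
        if agreed:
            return agreed.pop()   # a match: the image advances
    return None                   # no match: pass

# first_agreed_label(["dog", "grass", "frisbee"],
#                    ["puppy", "park", "frisbee"])  ->  "frisbee"
```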
I played a few rounds but quickly grew tired of the bland consensus that the game encourages. Matches tend to be banal, basic descriptors, while anything tricky usually results in a pass. In other words, all the pleasure of folksonomies — splicing one’s own idiosyncratic sense of things with the usually staid task of classification — is removed here. I don’t see why they don’t open the database up to much broader tagging. Integrate it with the image search and harvest a bigger crop of metadata.
Right now, it’s more like Tom Sawyer tricking the other boys into whitewashing the fence. Only, I don’t think many will fall for this one, because there’s no real incentive to participate beyond a halfhearted points system. For every matched tag, you and your partner score points, which accumulate in your Google account the more you play. As far as I can tell, though, points don’t actually earn you anything apart from a shot at ranking in the top five labelers, which Google lists at the end of each game. Whitewash, anyone?
In some ways, this reminded me of Amazon’s Mechanical Turk, an “artificial artificial intelligence” service where anyone can take a stab at various HITs (human intelligence tasks) that other users have posted. Tasks include anything from checking business hours on restaurant web sites against info in an online directory, to transcribing podcasts (there are a lot of these). “Typically these tasks are extraordinarily difficult for computers, but simple for humans to answer,” the site explains. In contrast to the Google image game, with the Mechanical Turk, you can actually get paid. Fees per HIT range from a single penny to several dollars.
I’m curious to see whether Google goes further with tagging. Flickr has fostered the creation of a sprawling user-generated taxonomy for its millions of images, but the incentives to tagging there are strong and inextricably tied to users’ personal investment in the production and sharing of images, and the building of community. Amazon, for its part, throws money into the mix, which (however modest the sums at stake) makes Mechanical Turk an intriguing, and possibly entertaining, business experiment, not to mention a place to make a few extra bucks. Google’s experiment offers neither, so it’s not clear to me why people should invest.
if not rdf, then what?: part II
I had an exchange about my previous post with an RDF expert who explained to me that APIs are not like RDF and it would be incorrect to try to equate them. She’s right – APIs do not replace the need for RDF, nor do they replicate the functionality of RDF. APIs do provide access to data, but that data can be in many forms, including RDF serialized as XML. This is one of the pleasures and privileges of writing on this blog: the audience contributes at a very high level of discourse, and is endowed with extremely deep knowledge about the topics under discussion.
I want to reiterate my point with a new inflection. By suggesting that APIs were an alternative to RDF, I was trying to get at a point that had more to do with adoption than functionality. I admit, I did not make the point well. So let me make a second attempt: APIs are about data access, and that, currently (and in my anecdotal experience), is where the value proposition lies for the new breed of web services. You have your data in someone’s database. That data is accessible to developers to manipulate and represent back to you in new, innovative, and useful ways. Most of the attention in the webdev community is turning towards the development of new interfaces – not towards the development of new tools to manage and enrich the data (again, anecdotal evidence only). Yes, there are people still interested in semantic data; we are indebted to them for continuing to improve the way our systems interact at a data level. But the focus of development has shifted to the interface. APIs make the gathering of data as simple as setting parameters, leaving only the work of designing the front-end experience.
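That “setting parameters” point is easy to see in code. A minimal sketch of the pattern, assuming a hypothetical REST endpoint (the URL and its q/limit/format parameters are made up): the developer’s work reduces to asking for data and reshaping it for whatever front end they’re building.

```python
# Sketch of the API-as-data-access pattern. The endpoint and parameters are
# hypothetical; real services differ, but the shape is the same: set a few
# parameters, get structured data back, spend your effort on the interface.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def fetch_books(query: str, limit: int = 10):
    params = urllib.parse.urlencode({"q": query, "limit": limit, "format": "xml"})
    url = f"https://api.example.org/books?{params}"   # hypothetical endpoint
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    # Reshape the service's data for the presentation layer.
    return [
        {"title": book.findtext("title"), "author": book.findtext("author")}
        for book in tree.iter("book")
    ]
```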
Another note on RDF from my exchange: it was pointed out that practitioners of RDF prefer not to read it in XML, but instead use Notation 3 (N3), which is undeniably easier to read than XML. I don’t know enough about N3 to make a proper example, but I think you can get the idea if you look at the examples here and here.
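For a rough sense of the difference, here’s a small sketch using the rdflib library to serialize the same two statements, first as RDF/XML and then as N3. The book URI and title are invented; the contrast in readability is the point.

```python
# The same tiny graph in RDF/XML and in N3, using rdflib.
# The resource URI and literals are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")
book = URIRef("http://example.org/books/future-of-the-book")

g = Graph()
g.bind("dc", DC)
g.add((book, DC.title, Literal("The Future of the Book")))
g.add((book, DC.creator, Literal("A. Nonymous")))

print(g.serialize(format="xml"))   # verbose RDF/XML
print(g.serialize(format="n3"))    # roughly:
# @prefix dc: <http://purl.org/dc/elements/1.1/> .
# <http://example.org/books/future-of-the-book>
#     dc:creator "A. Nonymous" ;
#     dc:title "The Future of the Book" .
```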
google on the air
Open Source’s hour on the Googlization of libraries was refreshingly light on the copyright issue and heavier on questions about research, reading, the value of libraries, and the public interest. With its book-scanning project, Google is a private company taking on the responsibilities of a public utility, and Siva Vaidhyanathan came down hard on one of the company’s chief legal reps for the mystery shrouding their operations (scanning technology, algorithms and ranking system are all kept secret). The rep reasonably replied that Google is not the only digitization project in town and that none of its library partnerships are exclusive. But most of his points were pretty obvious PR boilerplate about Google’s altruism and gosh darn love of books. Hearing the counsel’s slick defense, your gut tells you it’s right to be suspicious of Google and to keep demanding more transparency, clearer privacy standards and so on. If we’re going to let this much information come into the hands of one corporation, we need to be very active watchdogs.
Our friend Karen Schneider then joined the fray and as usual brought her sage librarian’s perspective. She’s thrilled by the possibilities of Google Book Search, seeing as it solves the fundamental problem of library science: that you can only search the metadata, not the texts themselves. But her enthusiasm is tempered by concerns about privatization similar to Siva’s and a conviction that a research service like Google can never replace good librarianship and good physical libraries. She also took issue with the fact that Book Search doesn’t link to other library-related search services like Open Worldcat. She has her own wrap-up of the show on her blog.
Rounding out the discussion was Matthew G. Kirschenbaum, a cybertext studies blogger and professor of English at the University of Maryland. Kirschenbaum addressed the question of how Google, and the web in general, might be changing, possibly eroding, our reading practices. He nicely put the question in perspective, suggesting that scattershot, inter-textual, “snippety” reading is in fact the older kind of reading, and that the idea of sustained, deeply immersed involvement with a single text is largely a romantic notion tied to the rise of the novel in the 18th century.
A satisfying hour, all in all, of the sort we should be having more often. It was fun brainstorming with Brendan Greeley, Open Source’s “blogger-in-chief,” on how to put the show together. Their whole bit about reaching out to the blogosphere for ideas and inspiration isn’t just talk. They put their money where their mouth is. I’ll link to the podcast when it becomes available.
image: Real Gabinete Português de Leitura, Rio de Janeiro – Claudio Lara via Flickr
online retail influencing libraries
The NY Times reports on new web-based services at university libraries that are incorporating features such as personalized recommendations, browsing histories, and email alerts – the sort of thing developed by online retailers like Amazon and Netflix to recreate some of the experience of browsing a physical store. Remember Ranganathan’s fourth law of library science: “save the time of the reader.” The reader and the customer are perhaps becoming one and the same.
It would be interesting if a social software system were emerging for libraries that allowed students and researchers to work alongside librarians in organizing the stacks. Automated recommendations are just the beginning. I’m talking more about value added by the readers themselves (Amazon already does this with reader reviews, Listmania, and So You’d Like To…). Imagine a social card catalogue with a tagging system and other reader-supplied metadata, where readers could leave comments and breadcrumb trails between books – each card catalogue entry with its own blog and wiki to create a context for the book. Books are not just surrounded by other volumes on the shelves; they are surrounded by people, other points of view, affinities – the kinds of things that up to this point were too vaporous to collect. This goes back to David Weinberger’s comment on metadata and Google Book Search.
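A hedged sketch of what one record in such a social card catalogue might hold. The field names are invented; the idea is simply that reader-supplied metadata – tags, comments, links to related works – lives alongside the bibliographic record itself.

```python
# Illustrative data model for a "social card catalogue" entry. Field names
# are invented; the point is that reader-supplied metadata sits alongside
# the bibliographic record.
from dataclasses import dataclass, field

@dataclass
class Comment:
    reader: str
    text: str

@dataclass
class CatalogueEntry:
    isbn: str
    title: str
    author: str
    call_number: str
    tags: dict = field(default_factory=dict)      # tag -> set of readers who applied it
    comments: list = field(default_factory=list)  # list of Comment objects
    related: set = field(default_factory=set)     # ISBNs readers have linked to

    def add_tag(self, tag: str, reader: str) -> None:
        self.tags.setdefault(tag.lower(), set()).add(reader)

    def link(self, other_isbn: str) -> None:
        self.related.add(other_isbn)
```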
the book in the network – masses of metadata
In this weekend’s Boston Globe, David Weinberger delivers the metadata angle on Google Print:
…despite the present focus on who owns the digitized content of books, the more critical battle for readers will be over how we manage the information about that content – information that’s known technically as metadata.
…we’re going to need massive collections of metadata about each book. Some of this metadata will come from the publishers. But much of it will come from users who write reviews, add comments and annotations to the digital text, and draw connections between, for example, chapters in two different books.
As the digital revolution continues, and as we generate more and more ways of organizing and linking books – integrating information from publishers, libraries and, most radically, other readers – all this metadata will not only let us find books, it will provide the context within which we read them.
The book in the network is a barnacled spirit, carrying with it the sum of its various accretions. Each book is also its own library by virtue not only of what it links to itself, but of what its readers are linking to, of what its readers are reading. Each book is also a milk crate of earlier drafts. It carries its versions with it. A lot of weight for something physically weightless.
premature burial, or, the electronic word in time and space
We were talking yesterday (and Bob earlier) about how to better organize content on if:book – how to highlight active discussion threads, or draw attention to our various categories. Something more dynamic than a list of links on the sidebar, or a bunch of hot threads advertised at the top. A significant problem with blogs is the tyranny of the vertical column, where new entries call out for attention on a stack of rapidly forgotten material, much of which might still be worth reading even though it was posted back in the dark ages (i.e. three days ago). Some of the posts that get buried still have active discussions stemming from them. Just today, “ways of seeing, ways of writing” – posted nearly two weeks ago – received another comment. The conversation is still going. (See also Dan’s “blog reading: what’s left behind”.)
This points to another thorny problem, still unsolved nearly 15 years into the world wide web, and several years into the blogging craze: how to visualize asynchronous conversations – that is, conversations in which time lapses between remarks. If the conversation is between only two people, a simple chronological column works fine – it’s a basic back-and-forth. But consider the place where some of the most dynamic multi-person asynchronous conversations are going on: in the comment streams of blog entries. Here you have multiple forking paths, hopping back and forth between earlier and later remarks, people sticking close to the thread, people dropping in and out. But again, you have the tyranny of the vertical column.
We’re using an open source platform called Drupal for our NextText project, which has a blog as its central element but can be expanded with modular units to do much more than we’re able to do here. The way Drupal handles comments is nice. You have the usual column arranged chronologically, with comments streaming downward, but readers have the option of replying to specific comments, not just to the parent post. Replies to specific comments are indented slightly, creating a sort of sub-stream, and the fork can keep on going indefinitely, indenting rightward.
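The indent-on-reply behavior is easy to sketch. A toy model – not Drupal’s actual code, and the sample thread is invented – of a comment tree rendered chronologically, with each reply indented one level further right:

```python
# Toy model of threaded comments like the Drupal behavior described above:
# replies to a specific comment are indented one level deeper than their parent.
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    text: str
    replies: list = field(default_factory=list)

def render(comments, depth=0):
    """Print a comment stream, indenting each sub-thread rightward."""
    for c in comments:
        print("    " * depth + f"{c.author}: {c.text}")
        render(c.replies, depth + 1)

thread = [
    Comment("ben", "how do we visualize asynchronous conversation?",
            replies=[Comment("dan", "maybe as a terrain, not a column?",
                             replies=[Comment("ben", "a topography of the thread")])]),
    Comment("bob", "the outline is still a print paradigm"),
]
render(thread)
```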
This handles forks and leaps fairly well, but offers at best only a partial solution. We’re still working with a print paradigm: the outline. Headers, sub-headers, bullet points. These distinguish areas in a linear stream, but they don’t handle the non-linear character of complex conversations. There is always the linear element of time, but this is extremely limiting as an organizing principle. Interesting conversations make loops. They tangle. They soar. They sag. They connect to other conversations.
But the web has so far been dominated by time as an organizing principle, new at the top and old at the bottom (or vice versa), and this is one of the most-repeated complaints people have about it. The web favors the new, the hot, the immediate. But we’re dealing with a medium that can also handle space, or at least the perception of space. We need not be bound to lists and outlines, we need not plod along in chronological order. We could be looking at conversations as terrains, as topographies.
The electronic word finds itself in an increasingly social context. We need to design a better way to capture this – something that gives the sense of the whole (the big picture), but allows one to dive directly into the details. This would be a great challenge to drop into a design class. Warren Sack developed a “conversation map” for news groups in the late 90s. From what I can tell, it’s a little overwhelming. I’m talking about something that draws people right in and gets them talking. Let’s look around.