clifford lynch takes on computation and open access

Academic Commons mentions that Clifford Lynch has written a chapter entitled “Open Computation: Beyond Human-Reader-Centric Views of Scholarly Literatures” for an upcoming book on open access edited by Neil Jacobs of the Joint Information Systems Committee. His chapter, which is available online, looks at the computational analyses that could be performed once scholarly literature is collected into digital repositories. These “large scholarly literature corpora” would be openly accessible and could support new branches of research that are not currently possible.
He takes cues from current work in text mining and from large-scale collections of scholarly documents, such as the Perseus Digital Library hosted by Tufts University. Lynch also acknowledges the skepticism that many scholars hold toward the value of text mining analysis in the humanities. Further, he discusses the limitations that current intellectual property regimes place on the creation of large, accessible scholarly corpora. Although many legal and technical obstacles exist, his proposal does seem more feasible than something like Ted Nelson’s Project Xanadu because the corpora he describes have boundaries, as well as supporters who believe that these bodies of literature should be accessible.
Small-scale examples show the challenges Lynch’s proposal faces. I am reminded of the development of meta-analysis in the field of statistics. Although the term meta-analysis is much older, the contemporary usage refers to statistical techniques developed in the 1970s to aggregate results from a group of studies. These techniques are particularly popular in medical research and the public health sciences (often because their data sets are small). Thirty years on, these methods are frequently used and their results published. However, the methods are still questioned in certain circles.
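To make "aggregating results from a group of studies" a little more concrete, here is a minimal sketch of one common approach, fixed-effect inverse-variance pooling, in which more precise studies count for more. This is only one of several meta-analytic techniques and is not drawn from Lynch or Glass; the function name and the numbers in the usage note are hypothetical, chosen purely for illustration.

```python
import math


def pooled_effect(effects, std_errors):
    """Combine per-study effect estimates with inverse-variance weights.

    effects:    one effect estimate per study (hypothetical inputs)
    std_errors: the standard error of each estimate, in the same order
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    # A study's weight is the inverse of its variance, so a smaller
    # standard error means a larger say in the pooled result.
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return estimate, pooled_se
```

Calling `pooled_effect([0.2, 0.4], [0.1, 0.1])` on two made-up studies of equal precision gives a pooled estimate midway between them, with a standard error smaller than either study's alone. That gain in precision is the attraction when individual data sets are small, and the assumption that the studies all measure the same underlying effect is one reason the method is still questioned.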
Gene Glass gives a good overview of meta-analysis, concluding with a reflection on how the criticisms of its use reveal fundamental problems with research in his own field, education research. He notes the difference between the “fundamental unit” of his research, which is a study, and that of physics, which is lower level, accessible, and generalizable. Here, even taking a small step back reveals new insights into the fundamentals of his scholarship.
Lynch speculates on how the creation of corpora might play out, but he doesn’t dwell on the macro questions that we might investigate. Perhaps it is premature to think about these ideas, but the possible directions of inquiry are what lingered in my mind after reading Lynch’s chapter.
I am struck by the challenge of graphically representing the analysis of these corpora. Like the visualizations of the blogosphere, these technologies could analyze not only the network of citations, but also word choice and textual correlations. Moreover, how does the body of literature change over time and space, as ideas and thoughts emerge or fall out of favor? In the humanities, can we graphically represent theoretical shifts from structuralist to post-structuralist thought, or the evolution from pre-feminist to feminist to post-feminist thought? What effect did each of these movements have on the others over time?
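As a very rough sketch of what tracking word choice over time might look like, the snippet below computes, for each year, the share of documents that mention a given term. Everything here is an assumption made for illustration: the function name, the idea of representing a corpus as (year, text) pairs, and the crude letters-only tokenizer. A real corpus would need far more careful text handling, and a single word is a poor stand-in for a theoretical movement.

```python
import re
from collections import defaultdict


def term_share_by_year(documents, term):
    """Return {year: fraction of that year's documents containing term}.

    documents: iterable of (year, text) pairs (a hypothetical format)
    term:      a single word, matched case-insensitively as a whole word
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = term.lower()
    for year, text in documents:
        totals[year] += 1
        # Crude tokenizer: lowercase the text and keep runs of letters.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if needle in words:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

Plotting the resulting shares for a handful of terms against the year would give one simple picture of ideas emerging or falling out of favor, though it says nothing about how the movements influenced one another.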
There is also an opportunity to explore possible ways of navigating corpora of this size. Using the metaphor of Google Earth, where one can zoom in from the entire Earth down to a single home, what can we gain from being able to view the sphere of scholarly literature in such a way? Glass took one step back to analyze groups of studies, and found insight into the nature of education research. What insights might we gain from viewing the entire corpus of scholarly knowledge from above?
Lynch describes expanding our analysis beyond the human scale. Even if his proposal never reaches fruition, his thought experiments revealed (at least to me) how knowledge acquisition occurs over a multidimensional spectrum. You can have a close reading of a text or merely skim the first sentence of each paragraph. Likewise, you can read an encyclopedia entry on a field of study or spend a year reading 200 books to prepare for a doctoral qualifying exam. However, as people, we have limits to the amount of information we can comprehend and analyze.
Purists will undoubtedly frown upon the use in scholarly research of computation that cannot be replicated by humans. A comparable example is the use of computation in mathematical proofs, which is still controversial. The humanities will be no different, and may prove even more resistant. A close reading of certain texts will always be important; however, the future that Lynch offers just may give that close reading an entirely new context and understanding. One of the great things about inquiry is that sometimes you do not know where you will end up until you get there.