In my role as a student researcher at Digital Scholarship Services, I worked through Text Analysis with R for Students of Literature, a workbook by Matthew L. Jockers, a professor of Data Analytics at Washington State University. Having completed the workbook, I turned to Jockers’s book Macroanalysis for more context about analyzing trends and patterns in literature. Macroanalysis is not a quick avenue to analysis; it isn’t a foolproof book that solves all of the problems of literary scholarship, and it isn’t trying to be. Although critics of distant reading accuse the practice of being too simplistic, Jockers goes through each technique methodically, weighing its flaws and strengths. Jockers is a true statistician, in that he takes care to explain his results in context. However, it is fair to criticize how few meaningful conclusions he is able to draw.
Jockers says that the macroanalytic techniques he describes are best used to inform close readings. This in itself dispels a common criticism of Jockers: that he is trying to replace and discourage traditional literary analysis. In fact, he is doing the opposite; he is trying to create methods that support a holistic approach to literary analysis.
One method Jockers describes is analyzing metadata in order to find trends in genre, location, and other attributes. “We will better understand the context in which individual texts exist and thereby better understand those individual texts” (27), Jockers writes. He plots Irish American fiction in terms of how many works were published, broken down by gender, region, and year. However, I’m not convinced by this analysis. The sample is limited to writers who were published; what about writers who were not? A limited sample means a limited conclusion.
What does interest me is Jockers’s analysis of the texts themselves. He examines the number of texts by Irish American authors that use ethnic markers for Irish characters. To me this is a more compelling topic than what was being published. Instead of trying to generalize about an entire population of Irish writers, this argument doesn’t rely on defining a large population in order to be meaningful. The scope of arguments based on textual analysis is smaller and easier to define than that of arguments based on entire populations that are impossible to sample accurately. Other examinations Jockers makes based on explorations of the text include lexical richness, comparisons of the vocabularies (particularly pronouns) of female and male authors, and topic modeling.
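To make "lexical richness" concrete, here is a minimal sketch of one common measure, the type-token ratio: the number of distinct words divided by the total number of words. Whether this is the exact measure used in Macroanalysis is an assumption on my part, and the function names and sample sentences below are invented purely for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Return distinct words / total words; 0.0 for an empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented sample passages, not drawn from any real corpus.
sample_a = "the sea the sea the grey and endless sea"
sample_b = "a quiet harbor town wakes beneath pale autumn light"

ttr_a = type_token_ratio(sample_a)  # repeats words, so the ratio is lower
ttr_b = type_token_ratio(sample_b)  # every word is distinct, so the ratio is 1.0
```

A known caveat is that the ratio tends to fall as texts get longer, so comparisons are usually made on samples of equal length.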
However, text analysis can get muddled and confusing. For example, Jockers tries to classify an author’s writing style using algorithms that analyze a corpus by computing the Euclidean distance between books, in order to attribute an anonymous text to an author. This, to me, has little practical use. As a data scientist, when do you find a book or text whose author you don’t know? Perhaps a book would be mentioned in archived correspondence, or perhaps an untitled manuscript might exist in an archival collection. But Jockers’s algorithm works best with a large corpus by a published author. It may be interesting that Jockers is capable of identifying the author of an anonymous text, but he doesn’t offer a practical example.
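The general idea of labeling an anonymous text by distance can be sketched as follows. This is a simplified illustration, not Jockers’s actual pipeline: it assumes each text is reduced to the relative frequencies of a handful of common words, and that the anonymous text is assigned to whichever known author lies closest in Euclidean distance. The feature list, author names, and sample texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a few very common function words.
FEATURE_WORDS = ["the", "and", "of", "to", "a", "in"]


def relative_frequencies(text, features=FEATURE_WORDS):
    """Turn a text into a vector of relative frequencies for the feature words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[word] / total for word in features]


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_author(unknown_text, known_texts):
    """Return the closest author and the distance to every candidate."""
    unknown_vec = relative_frequencies(unknown_text)
    distances = {
        author: euclidean_distance(unknown_vec, relative_frequencies(text))
        for author, text in known_texts.items()
    }
    return min(distances, key=distances.get), distances


# Invented toy texts standing in for each author's body of work.
known = {
    "Author A": "the cat and the dog and the bird sat in the sun",
    "Author B": "to go to a fair is to find a way to a prize",
}
unknown = "the fox and the hen ran in the yard"

best, distances = nearest_author(unknown, known)
```

With texts this short the result means very little, which mirrors the point above: the approach depends on having a large body of writing for each candidate author.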
The discussion in Macroanalysis of topic modeling, a buzzworthy area of digital scholarship, is helpful. However, I came away slightly confused. Jockers explains that topics in books are determined by an algorithm and are not human-generated, yet he checks these topics across multiple books. That doesn’t quite make sense to me, but I might be misinterpreting what he is actually doing. Regardless, I do like the idea of comparing books by certain themes (clusters of words). This is something that can help give a close reading more context, which is where the strength of macroanalysis lies.
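One way to picture comparing books by a cluster of words is sketched below. It does not perform topic modeling: a hand-written word list stands in for the top words a topic model might produce, and the code only measures what share of each book’s words fall inside that cluster. The topic, book titles, and texts are hypothetical, and a real topic model assigns probabilities rather than simple membership.

```python
import re

# Hypothetical top words for one topic; in practice an algorithm would supply these.
SEA_TOPIC = {"sea", "ship", "captain", "sail", "harbor"}


def topic_share(text, topic_words):
    """Return the fraction of a text's tokens that belong to the word cluster."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_words)
    return hits / len(tokens)


# Invented one-sentence "books" for illustration only.
books = {
    "Book One": "the captain watched the ship sail from the harbor",
    "Book Two": "she walked through the quiet village to the market",
}

# The same cluster is measured in every book, so the books become comparable.
shares = {title: topic_share(text, SEA_TOPIC) for title, text in books.items()}
```

Under these assumptions, a cluster discovered once by an algorithm can still be checked book by book, since each book simply gets its own score for that cluster.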
I appreciate that Jockers is consistent in understanding the limits of his methods and the importance of expanding the sample size of the corpora he uses. Although some of his outcomes are quite confusing and possibly irrelevant, I trust his process because he makes his intentions clear. Much of the criticism I make is based on the practicality of the methods described, not the methods themselves. The thoroughness of his practices is clear, and I do believe his results might be accurate. My question is, how much do they reveal to us that we otherwise wouldn’t know? My assessment is that when Jockers focuses on working with text, new analyses that have practical purposes are revealed. When he is working with broader metadata that depend on a full sample, I am less convinced.