Julia Crawford '19 reflects on her work with the Alan Justiss Project.
The Alan Justiss Project team is made up of students and faculty at Lafayette College. Team members in neuroscience, psychology, linguistics, art, and digital humanities combine their knowledge to analyze a rich source of data: a collection of over 13,000 poems by Alan Justiss, a late poet based in Jacksonville, Florida. Because the poems were written on a typewriter between 1992 and 2010, the initial goal of the project was to digitize the collection.
On our first day, we were greeted by mountainous stacks of poems to sort through; our first task was to organize them chronologically. While Justiss was organized in creating the poetry (he dated and numbered nearly all of the poems very methodically), the collection reached us in considerable disorder. Because the pages were in random order, we first had to paperclip together all the poems that shared a date. In one stack, we would find a random week in October 2002, and then all of a sudden we would turn to July 1997. In addition to grouping the poems by date, we arranged each day's poems in the order in which he wrote them (e.g. the last poem he wrote in a series would be on the top, and number one would be on the bottom). Sometimes poems would be missing from a certain day, and sometimes entire days or months would not appear in our collection at all. Several thousand paperclips and roughly four months later, the poems were chronologically organized into file boxes.
Once all the poems were organized, we started a spreadsheet to inventory them. Each year had its own file box, and we would take the poems out and record the date, the number of poems written that day, and the number of poems we physically had on hand (since we were missing some, and sometimes had extra poems that were undated and unnumbered). Doing the inventory after organizing the poems chronologically let us double-check that nothing had been filed incorrectly. After the poems were inventoried, we began to scan them into PDFs.
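The inventory check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the team worked in a spreadsheet, not a script, and the field names (`date`, `numbered`, `on_hand`), the helper functions, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InventoryRow:
    date: str      # date typed on the poems, in ISO form (assumed format)
    numbered: int  # number of poems written that day, per Justiss's numbering
    on_hand: int   # number of pages physically present in the file box


def find_discrepancies(rows):
    """Return (date, difference) pairs where the two counts disagree.

    A negative difference means poems are missing; a positive one
    means extra (for example, unnumbered) pages were found.
    """
    return [(r.date, r.on_hand - r.numbered) for r in rows if r.on_hand != r.numbered]


def is_chronological(rows):
    """Check that the rows are in date order (ISO dates sort as plain strings)."""
    dates = [r.date for r in rows]
    return dates == sorted(dates)


# Hypothetical sample data, not taken from the real collection.
rows = [
    InventoryRow("1997-07-03", numbered=5, on_hand=5),
    InventoryRow("2002-10-14", numbered=8, on_hand=6),
    InventoryRow("2002-10-15", numbered=4, on_hand=5),
]

for date, diff in find_discrepancies(rows):
    print(date, "missing" if diff < 0 else "extra", abs(diff))
```

Keeping the written count and the physical count as separate fields mirrors the two columns the inventory recorded, so missing and extra poems both show up as a simple difference.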
At first, we were not sure how we wanted to scan the poems. We had two options: a state-of-the-art scanner, the BookEye, or a more ordinary method using a regular copy machine. We ran several tests with the BookEye, but it could scan only one page at a time, so we decided that we wanted quantity over quality. We used a scanner that could handle around 50 pages at a time, which was helpful given that we had over 13,000 poems to record digitally. The scanning took three months in all.
We are currently converting the PDF files (which serve as digital copies of the original physical works) into text files. We convert the poems from PDFs to Word documents through a program called ABBYY (ABBYY is the name of both the software company and the program). It converts the scanned images of the typed letters into editable text. However, it presents some issues, as it has to be trained to recognize the font. In our case, since the poems were written on a typewriter, there are ink splotches and other imperfections it has to become accustomed to. There is also the issue of letter replacement. Occasionally a certain letter would break on Justiss’s keyboard, and he would replace it with a different symbol. Most notably, he replaced Es with dashes (e.g. th-) for a period of time, and Rs with 5s at another time. These replaced letters have to be entered manually into the RTFs, because training ABBYY to read the symbol as the missing letter would introduce errors wherever that symbol (such as the hyphen) was used in its original sense. For some years the typescript is less faded than for others, and ABBYY has an easy time converting the poems. In other cases we have to switch to hand-transcribing the poems because ABBYY cannot read them.
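To illustrate why the replaced letters call for human judgment, here is a minimal Python sketch that only flags words containing one of the two substitutions mentioned above and proposes a restored spelling for a person to accept or reject. It is not part of the project's actual workflow, which relies on manual entry; the function name, the regular expression, and the sample line are assumptions made for the example.

```python
import re

# The two substitutions described in the essay: a dash typed for "e"
# and a 5 typed for "r". Case is ignored in this simplified sketch.
SUBSTITUTIONS = {"-": "e", "5": "r"}

# A candidate token is a run of letters, hyphens, and the digit 5.
TOKEN = re.compile(r"[A-Za-z5-]+")


def flag_candidates(line):
    """Return (token, suggestion) pairs for a human to review.

    Nothing is replaced automatically, because a hyphen or a 5
    may be exactly what the poet meant to type.
    """
    flagged = []
    for token in TOKEN.findall(line):
        # Skip bare symbols such as a lone dash or a lone 5.
        if not any(ch.isalpha() for ch in token):
            continue
        if any(symbol in token for symbol in SUBSTITUTIONS):
            suggestion = token
            for symbol, letter in SUBSTITUTIONS.items():
                suggestion = suggestion.replace(symbol, letter)
            flagged.append((token, suggestion))
    return flagged


# Hypothetical line, not a quotation from the collection.
for token, suggestion in flag_candidates("th- rain f-ll on th- 5oof"):
    print(token, "->", suggestion, "(confirm by hand)")
```

A genuinely hyphenated word such as "well-known" would also be flagged, with the nonsensical suggestion "welleknown", which is the same ambiguity that makes a blanket replacement unsafe.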
The converted files serve as our corrected and annotated versions of the poems. If we find something written that we cannot decipher, we underline it. We also correct grammatical errors and misspelled words. We do this so that text analysis programs such as LIWC and WordSmith are able to “read” the poems; the poems have to be in correct English for that to work effectively, and our analysis goals depend on it. Once we have all the corrected files, they can easily be converted into plain text files, the format that the text analysis software requires. From there, LIWC and WordSmith can help us find patterns in Justiss’s written word, whether they be linguistic, literary, psychological, or otherwise.