Mining Text

One thing that scholars have been doing with all of the historical sources and literary works now online is applying tools of linguistic analysis to the texts. For instance, if you take Woodrow Wilson’s Congressional Government (1885) and look at its weighted word frequencies through what is called a word cloud, you can see what sort of language he was using.

[Image: word cloud of Woodrow Wilson’s Congressional Government]

Working from scans of the old book is not perfect: you can see that the cloud picks up fragments such as “–tion” from words split across line breaks. Still, one can notice a clear focus on the particular details of organizing government, and those details relate to concrete elements of life such as power, business, and economics.
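For readers curious about what sits behind a word cloud, here is a minimal sketch in Python. It is not the WWPL’s actual workflow, and the stop-word list and sample text are made-up assumptions. It first rejoins words hyphenated across line breaks (the source of fragments like “–tion”), then counts the remaining words. A word cloud simply draws each word at a size proportional to counts like these.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list (an assumption for illustration);
# real projects use a much fuller list.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "it", "by", "as", "be", "are"}

def rejoin_hyphenated(text: str) -> str:
    """Rejoin a word split as 'organiza-' at a line end and 'tion' on the next line."""
    return re.sub(r"(\w+)-[ \t]*\n\s*(\w+)", r"\1\2", text)

def word_frequencies(text: str) -> Counter:
    """Count lowercase words after rejoining split words, skipping stop words and single letters."""
    words = re.findall(r"[a-z]+", rejoin_hyphenated(text).lower())
    return Counter(w for w in words if w not in STOP_WORDS and len(w) > 1)

# Made-up sample with two words broken across line breaks, as in a scanned book.
sample = "The power of the govern-\nment and the organiza-\ntion of business.\nPower is business."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

One caveat of this simple approach: a genuinely hyphenated compound that happens to fall at a line end (such as “well-known”) would also be joined into one word.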

If we look more generally at Wilson’s style of communication, the picture is a bit different. The structure of our digital archive here at WWPL allows us to pull out the text of the scanned letters. Across the roughly one thousand letters in our collection that Woodrow Wilson wrote over the course of his lifetime, we see him still discussing power and government, but also food, people, and thoughts.

Things get a bit more interesting when we can track changes over time or make direct comparisons. It takes a bit more work, but if we compare how often Woodrow Wilson wrote about schools in his first year as president with how often he referred to business, we see that the former university administrator clearly had his attention on the nation’s economy in 1913.

[Image: references to schools compared with references to business in Wilson’s 1913 letters]
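The “bit more work” mostly means filtering letters by date before counting. The Python sketch below shows one way to do that. The record layout, the `term_counts` function, and the letter texts are hypothetical placeholders, not real WWPL data or tools. Terms are matched as word prefixes so that “school” also catches “schools”.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical records of (date written, transcribed text); placeholder text, not real letters.
letters = [
    (date(1913, 3, 10), "Sample text mentioning business and more business."),
    (date(1913, 6, 2), "Sample text mentioning schools and business."),
    (date(1914, 1, 5), "Sample text mentioning schools."),
]

def term_counts(letters, year, terms):
    """Count mentions of each lowercase term in letters written in `year`.

    Terms match as word prefixes, so "school" also counts "schools".
    """
    counts = Counter({term: 0 for term in terms})
    for written, text in letters:
        if written.year != year:
            continue  # skip letters from other years
        lowered = text.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\w*"
            counts[term] += len(re.findall(pattern, lowered))
    return counts

result = term_counts(letters, 1913, ["school", "business"])
print(result)
```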

When we compare the first-year correspondence to the letters President Wilson wrote concerning World War I, war has, unsurprisingly, become much more prominent among the terms mentioned. What jumps out first, though, is how business remained an important topic even as the country entered the war, or at least a much more important one than farmers or schools.

[Image: terms in Wilson’s first-year letters compared with his World War I correspondence]
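Because two groups of letters rarely contain the same number of words, raw counts can mislead when comparing them. A common remedy is to express each term as a rate, such as mentions per 1,000 words. The Python sketch below assumes that approach; the `per_thousand` function and both text snippets are hypothetical placeholders, not the method or data behind the chart.

```python
import re
from collections import Counter

def per_thousand(text, terms):
    """Mentions of each lowercase term per 1,000 words; terms match as word prefixes."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    rates = {}
    for term in terms:
        hits = sum(n for word, n in counts.items() if word.startswith(term))
        rates[term] = 1000 * hits / total if total else 0.0
    return rates

# Placeholder stand-ins for the two groups of letters; not real WWPL text.
first_year = "business tariff business schools banks"
war_letters = "war business war navy war business farmers schools"

terms = ["war", "business", "farmer", "school"]
before = per_thousand(first_year, terms)
during = per_thousand(war_letters, terms)
for term in terms:
    print(f"{term:<10}{before[term]:>8.1f}{during[term]:>8.1f}")
```

Prefix matching keeps the sketch short, but it is crude: “war” would also match “warm” or “ward”, so a real comparison would want a curated list of word forms for each term.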

This kind of work never replaces research in the archives, but it can give you a general idea of what is going on in massive amounts of text without your having to read millions of pages. For instance, Google Books can show us that interest in Woodrow Wilson grew much faster than interest in other politicians of his time.

Perhaps more interesting, though, is when scholars who know the texts well and have read many of them find patterns they cannot explain, or at least new correlations that give them ideas for further research.