Introduction: What is Voyant Tools and Why Should I Care?
I’d like to introduce you to Voyant Tools because Voyant is still forgetting to introduce itself. In Alexis Priestley’s earlier tutorial on the site, she noted that a link on Voyant’s homepage for further information was broken. That is still the case, but other resources are available on Voyant’s documentation site if you wish to learn more, including background information about the site’s development.
Voyant Tools is an open source, web-based “suite of analysis and exploration tools for digital texts.” The site allows users to upload a corpus or use one of Voyant’s pre-existing text collections, such as the works of Shakespeare. After uploading a corpus, users may apply various tools that reveal information such as word frequency and collocation.
The site was developed by Stéfan Sinclair, Associate Professor of Digital Humanities at McGill University, and Geoffrey Rockwell, Professor of Philosophy and Humanities Computing at the University of Alberta, Canada. The About page lists additional contributors to the site.
Now, onto an important question you may be asking:
How can I use Voyant Tools in my own research?
To answer this question, I’ll turn to the Examples Gallery in Voyant’s documentation site—which lists and links to a number of projects that have used Voyant Tools—in the hope that these examples can serve as models or inspiration.
A Sample from the Gallery: Voyant Tools in Scholarship
The gallery lists and links to examples from several categories: Examples of Voyant in Research, Critical Approaches to Digital Humanities and Voyant Tools, Conferences and Workshops, and Examples of Voyant in Teaching. The most recent examples are from the spring of 2013, so it seems the site managers may have stopped compiling them.
The most recent example from the Research section comes from the British Library’s Digital Scholarship Blog, which includes digital scholarship “contributions from colleagues across the Library and special guests.” The blog post, titled “On metadata and cartoons” and written by James Baker, describes a project in which Baker gathered metadata for a large number of British cartoons from the 1960s and 70s and analyzed it using a variety of tools. Baker used Voyant for topic modeling, which Miriam Posner describes as “a method for finding and tracing clusters of words (called ‘topics’ in shorthand) in large bodies of texts.” (Interact with Baker’s corpus here.)
Baker anticipates the question often posed about the results of text mining: “[W]hat did I actually discover in the data?” Examining a word cloud derived from Voyant’s Cirrus tool, Baker suggests that “[t]he themes of the cartoons in the corpus track the politics of the day.” He argues that “textual content within cartoons during the same period tended toward natural language.” Baker also uses Voyant’s Word Trends graphs to track the relative frequencies of key terms in the cartoons under examination.
While Baker derives useful information from Voyant Tools, he also acknowledges its limitations, including that it “can only handle text not text vs. date” and that he mostly “discovered what [he] expected to discover” from the data. In spite of these limitations, Baker calls Voyant “perhaps the easiest data discovery tool for newcomers to get to grips with.”
Baker’s project and others from the gallery indicate the kinds of projects to which text mining, and Voyant in particular, is suited. Ideally, one should have a large corpus of texts. In addition to topic modeling, text mining can be used to visualize citation patterns in academic publications, track word frequencies across a corpus, and classify texts. (Here is a list of examples of nonacademic applications of text mining.)
What You’ll Find Ahead…
I present here a tutorial on the use of Voyant’s “Knots” tool, which can be found in the Tools Index. As Knots’ documentation indicates, it “represents a corpus as a series of twisted [colored] lines.” The number of times the lines intersect represents the corresponding words’ degree of linkage in the corpus, and I can add or remove terms from the visualization as desired.
For a very helpful primer on Voyant Tools’ basic or default functionalities, please see Alexis Priestley’s guide for beginners. Before describing the new features I’ll be demonstrating, I’ll give a quick overview of the site and the services it provides.
For this tutorial, I’ll be using part of a corpus I’m compiling for a larger research project. This project examines recent landmark cases of digital image manipulation in scientific publications and the language used to discuss image manipulation in specialist venues (i.e., scientific journals) and nonspecialist venues. Here I’m using a set of image submission guidelines and editorials about image manipulation published by scientists and intended primarily for other scientists.
Introduction Part II: Voyant Basics
The homepage, shown below, provides a text box in which the user can paste either the text to be analyzed or URLs linking to the documents in a corpus. The site also accepts uploads in several file formats, including plain text (.txt), HTML (.html), XML (.xml), MS Word (.doc, .docx), RTF (.rtf), and PDF (.pdf).
For this demonstration, I’ll explain the process of uploading files rather than pasting text directly into the text box. To do this, I first click on the “Upload” button in the lower left-hand corner, which brings up this interface:
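Before uploading, it can help to check which files in a folder are in one of the formats listed above. Here is a minimal Python sketch of that check. It is my own helper, not part of Voyant; the file names are hypothetical, and the extension list is simply copied from the paragraph above.

```python
from pathlib import Path

# File extensions listed on Voyant's homepage, per the paragraph above.
ACCEPTED_EXTENSIONS = {".txt", ".html", ".xml", ".doc", ".docx", ".rtf", ".pdf"}


def split_by_format(filenames):
    """Return (accepted, rejected) lists based on each file's extension."""
    accepted, rejected = [], []
    for name in filenames:
        # suffix is the final extension, e.g. ".pdf"; lower() handles ".PDF"
        if Path(name).suffix.lower() in ACCEPTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


# Hypothetical file names, for illustration only.
accepted, rejected = split_by_format(
    ["guidelines.PDF", "editorial.docx", "figure1.png", "notes.txt"]
)
print("Ready to upload:", accepted)
print("Convert first:", rejected)
```

Anything in the second list would need to be converted to one of the accepted formats before it could join the corpus.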
To add files to my corpus, I click on the “Add…” button, which opens the file manager on my computer. I then select the first file I want to add to my corpus. Unfortunately, it seems I can add only one file at a time. After adding several files to the corpus, I see the following list. (Note that, after adding about nine files, you can’t scroll down to view all the files in the list.)
After I’ve uploaded all of my files, I click the “Reveal” button, which brings me to Voyant’s main interface:
Cirrus (Word Clouds) and the Role of Stop Words in Text Mining
The above interface includes Voyant’s default analysis tools. Although these are not the focus of this tutorial, I’ll use Voyant’s word cloud feature, Cirrus, to demonstrate the effect of applying stop words to a corpus. Word clouds enable easy and intuitive visualization of the most frequently occurring words in a corpus and can help identify the most important or prominent terms for topic modeling or citation pattern mapping.
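To make concrete what a word cloud summarizes, here is a minimal Python sketch that counts word frequencies with and without a stop list. It is only an illustration of the idea, not Voyant’s implementation: the sample sentence, the tiny stop list, and the function names are all my own assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stop list; Voyant's built-in lists are much longer.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, stop_words=frozenset()):
    """Count tokens, skipping any that appear in stop_words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


sample = "The manipulation of the image and the integrity of the data are in question."

# Without a stop list, function words such as "the" dominate the counts.
print(word_frequencies(sample).most_common(3))

# With the stop list applied, only content words remain.
print(word_frequencies(sample, STOP_WORDS).most_common(3))
```

A word cloud sizes each word by counts like these, which is why unfiltered function words can crowd out the terms I actually care about.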
Before applying stop words, I can view the enlarged Cirrus display by clicking here:
In the enlarged version of the word cloud, I can clearly see a number of common words that are not relevant to my analysis:
Applying a stop word list to a corpus excludes certain words from appearing in visualizations like Cirrus. Including common words, like “the,” which do not contribute useful information to the analysis, can skew visualizations and obscure more interesting results. To omit these words from my analysis, I return to Voyant’s main interface and click here:
This brings up the following menu:
I click on the drop-down menu, which opens a list of stop word lists:
For my analysis, I’ll choose the English (Taporware) stop word list.
However, this list still leaves some undesired words behind. If I return to Cirrus, these words become apparent.
I want to exclude the terms “http,” “et,” “al,” and “doi” as well, so I must manually add them to the English (Taporware) stop word list to create a custom list. Back in the stop words menu, I click on “Edit Stop Words.”
This brings up the following menu, which displays all stop words in the list. I’ve added a title for the modified list I will create in the “Save List as” box at the bottom.
I then navigate to the bottom of the list, where I can enter additional words to be excluded from analysis.
After saving my modified list, I return to the previous menu, where I check “Apply Stop Words Globally” to exclude the selected words across the corpus and across different tools.
Now, when I return to the Cirrus tool, my word cloud contains only those words relevant to my analysis.
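The custom list built in the steps above amounts to a base list plus a few extra terms. A rough Python sketch of that logic follows. The base list here is a short stand-in I made up, not the actual English (Taporware) list, and the sample sentence is hypothetical; only the four extra terms come from this tutorial.

```python
import re
from collections import Counter

# Stand-in for a built-in list such as English (Taporware); the real list is far longer.
BASE_STOP_WORDS = {"the", "of", "and", "a", "in", "to", "see"}

# The four extra terms excluded in this tutorial.
EXTRA_STOP_WORDS = {"http", "et", "al", "doi"}

# The custom list is simply the union of the two sets.
custom_stop_words = BASE_STOP_WORDS | EXTRA_STOP_WORDS

# Hypothetical sentence containing citation and URL residue.
sample = "See Smith et al., doi 10.1000/example, and the image guidelines at http://example.org"
tokens = re.findall(r"[a-z]+", sample.lower())

base_counts = Counter(t for t in tokens if t not in BASE_STOP_WORDS)
custom_counts = Counter(t for t in tokens if t not in custom_stop_words)

print(sorted(base_counts))    # still includes "al", "doi", "et", "http"
print(sorted(custom_counts))  # those four terms are gone
```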
Now that I have uploaded a corpus and applied stop words, I can examine some of Voyant’s other tools.
Digging Deeper into Voyant’s Tools Index
To find a list of Voyant Tools’ other features, simply navigate to the Tools Index.
During my exploration of the Tools Index, the screenshot for the “Knots” tool struck me as a particularly interesting and unconventional method of visualizing text.
Loading a Corpus into a New Tool
I open Knots by clicking the “use it” link, which opens the following page.
Notice that the page is identical to Voyant’s homepage except for the addition of the /tool/Knots extension in the URL. At this point, you may be letting out an exasperated sigh, thinking you have to upload your corpus again. Not to worry, though. While you certainly can begin from this page and upload a corpus, you can also pull a preexisting corpus into this tool (and any other tool on Voyant). To do this, I return to the main interface—the one I saw just after uploading and revealing my corpus.
Now I can click on the button labeled “Export” in the upper right-hand corner.
This brings up several options for navigating to the current corpus.
For this tutorial, I’ll use the URL option, which generates the following.
I have thus generated a URL that I can bookmark and use to easily access the same corpus in other tools. To transfer this corpus to another tool, I select and copy the portion of the URL that follows “.org/”.
Now, if I navigate back to Knots, I can access my corpus by pasting the copied portion, which contains the corpus ID, at the end of the Knots URL, like so.
Notice that the URL contains a “stopList” ID so that I do not have to recreate my custom list in Knots. After entering the URL, I press Enter to arrive at the Knots interface.
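The copy-and-paste step above can also be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than a description of Voyant’s URL scheme: the example URL, the placeholder IDs, the “corpus” parameter name, and the trailing slash after the tool name are all mine. Only the /tool/Knots path and the “stopList” ID are mentioned in this tutorial.

```python
from urllib.parse import urlsplit, urlunsplit


def tool_url_from_export(export_url, tool_name):
    """Reuse the query string of an exported corpus URL under /tool/<tool_name>/."""
    parts = urlsplit(export_url)
    # Keep scheme, host, and query (corpus and stop list IDs); swap only the path.
    return urlunsplit(
        (parts.scheme, parts.netloc, f"/tool/{tool_name}/", parts.query, "")
    )


# Placeholder IDs (assumed format); a real exported URL carries the IDs Voyant generated.
export_url = "http://voyant-tools.org/?corpus=EXAMPLE_CORPUS_ID&stopList=EXAMPLE_LIST_ID"
print(tool_url_from_export(export_url, "Knots"))
```

The same function could point the corpus at any other tool from the Tools Index by changing the second argument, provided that tool follows the same URL pattern.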
As Knots’ documentation indicates, each of the colored lines represents the word of the corresponding color in the upper left-hand corner of the interface. The number of times the lines intersect represents the corresponding words’ degree of linkage in the corpus.
I can modify the terms to be analyzed as desired. For instance, I can remove some of the terms from the visualization by clicking on them.
I can also add terms, perhaps using my previously generated word cloud as a reference. To do this, I enter the terms into the box labeled “Find Term,” as shown below, and press Enter.
After I add terms, they appear as new colored lines. I can then activate or deactivate whichever terms I want to include or exclude from my analysis by clicking on them in the upper left.
I can also change three other attributes of the visualization: “Build Speed,” “Starting Angle,” and “Tangles.”
Changing Attributes of the Visualization
Build speed modifies how quickly the lines form and may help the user better see exactly where overlaps occur. Counterintuitively, the higher I place the slider, the slower the lines form. In the demonstration below, I’ve placed the slider at about the middle.
Starting angle determines the angle at which the lines form. It seems the best starting angle depends on the number of terms under consideration. With a smaller set of terms, a smaller angle can make it easier to see closely related terms. A larger angle can prevent visual clutter when examining a larger set.
Finally, Tangles changes the number of twists in the lines. Unfortunately, Knots’ documentation does not seem to make clear what non-overlapping tangles signify. They may represent parts of a document in which a term appears frequently in quick succession.
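One way to test the guess above on my own texts is to look for places where a term repeats in quick succession. The following Python sketch does that. To be clear, this is not how Knots draws tangles, which I do not know; the sample text, the function names, and the gap threshold are all my own assumptions.

```python
import re


def term_positions(text, term):
    """Return the token index of every occurrence of term (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [i for i, token in enumerate(tokens) if token == term.lower()]


def bursts(positions, max_gap=6):
    """Group positions into runs whose neighbors are at most max_gap tokens apart."""
    runs = []
    for pos in positions:
        if runs and pos - runs[-1][-1] <= max_gap:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    # A single isolated occurrence is not a burst.
    return [run for run in runs if len(run) > 1]


# Hypothetical passage: "image" clusters early, then reappears once much later.
sample = (
    "Image editing is common. An image may be cropped, and the image may be resized. "
    "Much later in the text, after many other words about journals and policy, "
    "one more image appears."
)

positions = term_positions(sample, "image")
print(positions)
print(bursts(positions, max_gap=6))
```

If the bursts found this way line up with the tangles Knots displays for the same term, that would lend some support to the interpretation above.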
What Can Knots Tell Me?
Knots can reveal helpful information about collocates, which can be useful for visualizing citation patterns, for example. In the example I’ve used above, and in this example using one of Voyant’s preexisting texts (Jane Austen’s Persuasion), Knots can simply show which terms tend to occur together and separately. Much as James Baker observes of the tools on Voyant’s main interface, Knots may simply help you identify useful directions for further investigation or contradict a hypothesis about the texts under consideration. For instance, I would’ve expected a greater degree of collocation between the words “image” and “manipulation” in my corpus, but Knots seems to contradict that assumption.
Although Knots is an interesting and unusual way to visualize text, its results are perhaps less intuitive and clear-cut to interpret than those of the tools on Voyant’s main interface. As you will likely find if you explore the Tools Index more, difficulty interpreting Voyant’s tools often derives from insufficient or ambiguous documentation. At the end of each tool’s documentation page, there is a form in which users can leave comments. Providing feedback for the developers may lead to improvements and enrich the variety of available tools.
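A surprising result like this one is worth checking outside of Knots. Here is a minimal Python sketch that counts how often one term appears within a few words of another. It is a simple window-based measure of my own choosing, not the method Knots uses, and the sample text and window size are assumptions.

```python
import re


def cooccurrences(text, term_a, term_b, window=3):
    """Count occurrences of term_a that have term_b within `window` tokens on either side."""
    tokens = re.findall(r"[a-z]+", text.lower())
    term_a, term_b = term_a.lower(), term_b.lower()
    count = 0
    for i, token in enumerate(tokens):
        if token == term_a:
            # Tokens just before and just after this occurrence of term_a.
            nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
            if term_b in nearby:
                count += 1
    return count


# Hypothetical passage: the two terms are adjacent once and apart once.
sample = (
    "Image manipulation is a concern for editors. "
    "Journals publish guidelines on how an image may be adjusted. "
    "Inappropriate manipulation of a figure can mislead readers."
)

print(cooccurrences(sample, "image", "manipulation", window=3))
```

Comparing this count with the total number of times “image” appears gives a rough sense of how tightly the two terms travel together, and widening the window shows how sensitive that impression is to the definition of “near.”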