The Digital Toolbox – What’s the Right Tool for the Job?

My last three posts cover three distinct digital tools I learned how to use for my course in Digital Public Humanities (DPH). I practiced with each of these tools using a set of files provided by my instructor, drawn from interviews conducted in the 1930s with former African American slaves. These files contained data (names, dates, topics, biographical information), metadata (categories used to define the data), georeferencing (longitude and latitude, street locations), and the full text of the interviews. My use of these tools revolved around exploring these documents, and it demonstrated how they might be used effectively (and ineffectively) to discover new perspectives on the same set of data.

In this post, I will offer a brief description of each tool, how I used it, and what I discovered about my dataset (a collection of data that can also be manipulated as individual units). I will then offer more direct comments on how these tools can be used together.

Voyant Tools

Voyant Tools is a web-based program that hosts a collection of text analysis and text mining tools with basic visualization capabilities. I previously wrote a post offering a brief guide on how to get started with these tools. Essentially, Voyant Tools lets the user compose a corpus (body of documents) of text that is then run through computational analysis, a practice known as “distant reading.” Once the corpus has been processed, five default tools appear to present the analysis for interpretation: a word cloud, a reader view of the full text, a word trends graph, a document summary, and a context reader.

When using this program on my data, which largely consisted of text, it became clear very quickly how relevant these functions were to my research. The word cloud gave me a simple visualization of the most frequent meaningful words (common stop words excluded) across my corpus, the reader let me view keywords within the full text, the trends graph charted the rate and frequency of word usage, the summary organized the lengths of the documents and highlighted distinctive words, and the context reader expanded on the immediate use of selected words. This allowed me to pinpoint trends across the 17 documents I had uploaded within a few hours, as opposed to the days or even weeks a close reading of those documents could have taken.
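
To make the idea of distant reading a little more concrete, here is a minimal sketch of the kind of frequency counting that sits behind a word cloud. It is not how Voyant works internally, only an illustration; the folder name and the stop word list are placeholders I made up.

```python
# A rough approximation of the word-cloud step: count word frequencies
# across a small corpus of plain-text files after dropping common stop words.
# The folder name and stop word list are placeholders, not Voyant internals.
import re
from collections import Counter
from pathlib import Path

STOP_WORDS = {
    "the", "and", "a", "an", "of", "to", "in", "i", "he", "she", "it", "was",
    "were", "is", "that", "they", "we", "you", "on", "for", "with", "at",
    "his", "her", "my", "me", "had", "have", "be", "but", "so", "all", "when",
}

def word_frequencies(folder: str) -> Counter:
    """Tokenize every .txt file in `folder` and count tokens that are not stop words."""
    counts = Counter()
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        tokens = re.findall(r"[a-z']+", text)
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

if __name__ == "__main__":
    # "interviews/" stands in for wherever the transcript files live.
    for word, n in word_frequencies("interviews").most_common(25):
        print(f"{word:<15}{n}")
```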

To describe this tool as being used for analysis is accurate. Surfacing trends brought out by the algorithm is helpful because it can offer leads on research questions or support research that produces novel, nuanced perspectives on the data that might otherwise go unnoticed. However, to say it does “mining” is also appropriate. The former ability suits a researcher searching for something they don’t yet know they’re searching for; the latter suits the researcher who already knows exactly what they want. Using these tools when you know what you’re searching for is quite powerful, because they will examine every document you include in the analysis and locate the desired information, often turning up more of it than you expected.

Speaking specifically about my documents, a key takeaway was how contextual the interview texts were. For example, certain words were more prevalent in some interviews than in others, depending on the location and experience of the interviewees. The word “Cherokees” was used more often in interviews from the state of Oklahoma than in those from any other state. This makes sense: by the time these interviews were conducted, and even while the Civil War was occurring, the Cherokee Nation had been relocated to and established in Oklahoma. The text analysis made it possible to see these trends, compare them with others, and make informed assumptions about the dataset.
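
As a rough illustration of that kind of comparison, the snippet below computes how often a term appears per 1,000 words in one group of documents versus another. The texts and the grouping are invented stand-ins, not the actual interview files.

```python
# Compare how often a term appears, per 1,000 words, in one group of
# documents versus another (e.g., Oklahoma interviews vs. other states).
# The strings below are obviously synthetic stand-ins for full transcripts.
def rate_per_1000(term: str, text: str) -> float:
    words = text.lower().split()
    return 1000 * words.count(term.lower()) / max(len(words), 1)

oklahoma_docs = [
    "placeholder transcript text mentioning Cherokees",
    "placeholder transcript text mentioning Cherokees and the Cherokee Nation",
]
other_docs = [
    "placeholder transcript about entirely different topics",
    "another placeholder with no mention of that term",
]

ok_avg = sum(rate_per_1000("Cherokees", t) for t in oklahoma_docs) / len(oklahoma_docs)
other_avg = sum(rate_per_1000("Cherokees", t) for t in other_docs) / len(other_docs)
print(f"Oklahoma interviews: {ok_avg:.1f} mentions per 1,000 words")
print(f"Other interviews:    {other_avg:.1f} mentions per 1,000 words")
```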

Kepler.gl

The kepler.gl program is a geographic information system (GIS) for geospatial analysis, that is, analysis of data tied to geographic locations or spaces. It plots datasets onto a base map to construct computational cartographic visualizations (maps), and those maps can be manipulated to display, categorize, and reorganize the data according to a specified locality.

My post here details how to get started with this program and its general functions. This tool provides the platform needed to start building meaningful models from your datasets. It allows you to plot data with a variety of markers, connect points with lines or arcs, add or remove geographic and topographic features that might inform or hinder a human analysis of the plotted data, compare maps side by side, view spaces in 3D, and apply filters to refine the data being displayed.
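
For a sense of what the input looks like, here is a minimal sketch of shaping a few interview records into a CSV point layer that a GIS such as kepler.gl can ingest; in my experience the tool can generally detect coordinate columns named this way and draw a point layer, with a field like the year available for filtering. The names, coordinates, and column headers are hypothetical placeholders, not values from the course-provided files.

```python
# Shape a few interview records into a CSV point layer for a GIS such as
# kepler.gl. Names, coordinates, and column headers here are hypothetical
# placeholders, not values from the course-provided files.
import csv

records = [
    {"interviewee": "Person A", "latitude": 35.4676, "longitude": -97.5164, "year": 1937},
    {"interviewee": "Person B", "latitude": 36.1540, "longitude": -95.9928, "year": 1938},
]

with open("interview_points.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["interviewee", "latitude", "longitude", "year"])
    writer.writeheader()
    writer.writerows(records)
```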

Using this program with my provided materials further deepened my sense of how I could not only reinterpret my initial assumptions about the material, but also discover new ways of understanding and approaching it. For example, plotting my data gave me a geospatial frame of reference to see exactly where these interviews took place, where the interviewees came from, what spaces they gathered in, the time frame in which the interviews occurred, and how frequently interviews happened across the target area. The questions I could imagine were no longer limited to inquiries about the textual corpus. There was now a visual corpus to accompany my conceptualization, a more observable model that brought the interviews into perspective with the rest of the world (or perhaps brought me back into their world).

A specific example of this new layered perspective, drawn from the documents I was using, concerns distance and travel. The map allowed me to plot not only where the interviews took place, but also where the interviewees had originally been enslaved. This meant I could trace where they came from (at least the places they were held before emancipation) to where they ended up for the eventual interviews. This raised several interesting questions: What motivated the interviewees to move to the locations they did? Is there a correlation between where they came from and where they chose to go? Did they intentionally gather into pocketed areas and form their own communities? Could we layer more data to plot a more literal migration pattern? A visual aid such as the map provided by kepler.gl makes these questions easier to ask, because the text can now be seen and not just read.
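
To put one concrete handle on the distance question, here is a small sketch that computes the great-circle (haversine) distance between two points, say a place of enslavement and the later interview location. The coordinates are illustrative values I chose, not points from the actual dataset.

```python
# Compute the great-circle (haversine) distance between two points, e.g. a
# place of enslavement and the later interview location. The coordinates are
# illustrative values, not points taken from the actual dataset.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Distance in kilometers between two latitude/longitude pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km is an average Earth radius

origin = (33.7490, -84.3880)      # hypothetical point of enslavement (Georgia)
interview = (35.4676, -97.5164)   # hypothetical interview location (Oklahoma)
print(f"Approximate distance traveled: {haversine_km(*origin, *interview):.0f} km")
```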

Palladio

With the Palladio program, a user has several options. The tool creates both cartographic and network visualizations from a dataset, placing markers (points plotted on a map) and nodes (points on a network graph) that are joined by edges (connections, often drawn as a line from one node to another) between related data. In this post, I provide a brief guide to using the program and a more in-depth take on how I used it for my assignment.

This tool, while certainly not my favorite, did provide another new perspective on my datasets of ex-slave interviews. As I will note later, there is a proper tool for each job, and that holds for Palladio. The program lets users create both layered maps, similar to kepler.gl, and network graphs. The key to understanding it, and what it is good for, is knowing how it works and what we mean when we talk about networks. That is more computer science than I am prepared to speak on, but to summarize: networks are built by algorithms that process the data we provide, and the data is visualized as nodes connected by edges. The computational methods behind these networks are highly specialized and typically designed for specific functions, so we want to be aware of the purposes a tool is built for. Palladio, as I understand it, is meant to convey relational information about the datasets we feed it. In other words, it does have a map function, but we have better tools for that; the network function is what we want if we are going to explore the relationships between points in our data.

When data is entered into this tool, it creates cluster markers to represent source data and target data. These two types of data can be filtered, defined, and even manipulated within the window view to expand or limit the amount of information associated with each marker. The edges then show us where the algorithm sees a connection based on our settings. For example, when I selected the interviewers as my source and the interviewees as my target, the resulting diagram showed an individual point for each interviewer and each interviewee, with lines connecting each interviewee to the interviewer’s dot, so that everyone interviewed by that person appeared to “orbit” the interviewer. We are seeing the relationship between the two as detected by the parameters we set for the algorithm.
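
To make the source/target idea more tangible, here is a toy sketch of the same pairing: each (interviewer, interviewee) row becomes an edge, and grouping the edges by interviewer reproduces the “orbit” pattern described above. The names are invented placeholders, not people from the dataset.

```python
# A toy version of a source/target pairing: each (interviewer, interviewee)
# row becomes an edge, and grouping edges by interviewer reproduces the
# "orbit" pattern. Names are invented placeholders.
from collections import defaultdict

edges = [
    ("Interviewer 1", "Interviewee A"),
    ("Interviewer 1", "Interviewee B"),
    ("Interviewer 2", "Interviewee C"),
]

network = defaultdict(list)  # source node -> list of connected target nodes
for source, target in edges:
    network[source].append(target)

for interviewer, interviewees in network.items():
    print(f"{interviewer} -> {', '.join(interviewees)}")
```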

In my own research, this did add another layer of nuance to my dataset. The tool really drove home the need to understand what we are using to interpret our data. Though the network could account for geographic information, that wasn’t useful here because it wasn’t plotted against any spatial reference. But when I used qualitative elements, such as characteristics of the people involved (age, gender, former slave position), the diagram showed lines I was not always expecting. For example, one version of the diagram suggested that certain interviewers had interviewed only men or only women, never both. That raises questions about why and how this occurred. Did those interviewers travel to areas where former slaves of only one gender lived? Or did they intentionally avoid a particular gender when conducting their interviews? Did the interviewers even have access to areas where former slaves of either gender were located?

The Toolbox

A few years ago, I was on a different career path. I was a union carpenter’s apprentice and I was out building sheds, apartment complexes, skyscrapers, and footstools. I learned how to work with a variety of hand and power tools. I learned that every task has a proper tool to accomplish the job. Now that I am no longer in that field, I find that this sentiment still holds true.

Each of these tools can be used to coordinate research efforts and enhance the results garnered from the others. But in order to make effective use of these tools so that they complement each other, we must be mindful not only of how we use them, but of how we perceive them.

Text analysis tools are good for getting a sense of patterns, trends, and overlooked distinctions across large bodies of literature or data. But they cannot substitute for the meaning gained from close reading. The information they present needs to be contextualized so that it makes sense alongside everything that happens outside the algorithmic model generated by the computer. Text analysis does not provide the answers to research per se, but it allows us to conduct further research, ask questions in new ways, and develop novel interpretations.

Mapping software gives form and structure to the abstract so we can increase our spatial awareness of our research, particularly if we are not located in those areas. But it cannot recreate lived experiences or the authenticity of place, even when that place is mapped historically. Maps also need to be contextualized within larger social dynamics to paint an accurate picture of the past and of different geographic localities, while working to undo things such as the colonial narratives that have been embedded in cartography for many years. Essentially, maps cannot always be taken at face value.

And network graphing is interesting in that it comes close to achieving the goals of Humanists who seek to include and document the human experience within all systems, even computational ones. This type of tool allows us to visualize the abstract relationships we have founded our interpretations on and draw new connections that might have previously gone unnoticed. But it cannot (yet) graph the complexity of life or give meaning to the relationships identified by the computer. We cannot rely on the models to tell us fully what we need to know when it comes to the relationship between two entities and how they “weigh” against each other in terms of their importance.

Still, we can certainly use these tools to complement each other. Because we are using them to discover new ways of looking at existing (or new) data, we can better refine the approaches we take. Finding trends through textual analysis means we can plot those trends onto maps to get at the spatial reasoning behind them. Moving that to a network graph can suggest the relationships between the spaces that generated the trend in the first place. Using the ex-slave narrative datasets for all of my assignments brought together a fuller picture: their experiences were unique to each of them and among their contemporaries (even with so much experience shared through oppression), they gathered in spaces close to one another, and their relationships were identifiable not just through their hardships but through the very things they spoke about. These computational tools thus have the capability, a potential already being utilized, to further expand our ability to account for the experiences of those involved with these narratives.

I believe the comparison of these tools demonstrates that we can push forward with qualitative means of doing computational work and uncover other aspects, with varying layers, that better contextualize our research. It is not only us, right now, who are discovering new things, but those who will come after us too. They have yet to discover what has been learned here, and I believe these tools are bridging a gap that has widened over the years, one that treats qualitative approaches to research as a thing of the past, misguided, and outdone by the digital sphere. Through these means, we can reach a bigger audience than we may even know is there.

With these things in mind, let us always be wise with the tools we use. As we continue to commit ourselves to the work of Humanists, remembering the “human” aspect will help us stay alert to both the pitfalls and the benefits of digital technologies.

About the author

Kyle
