Mining HDMB
Tomorrow is the 10th anniversary of Hurricane Katrina, and I wrote a post for the RRCHNM blog discussing the history and legacies of the Hurricane Digital Memory Bank (HDMB), which I helped to create.
One advantage of collecting online first is that we have thousands of sources available for close reading that are discoverable by browsing or keyword searches. My colleague Mills Kelly has highlighted some of those individual stories on his blog today.
HDMB also provides content ready for computational analysis, so research can be done at scale. I hadn’t had the time to do any myself until this week. I started with the HDMB databases in phpMyAdmin and downloaded a few tables as CSV files.
At first, I wanted to do some basic calculations and summaries of the contributors to include in the About section of the project. For researchers, it is important to know whose voices are speaking and represented. As a project creator, I wanted to see where we succeeded and failed in outreach efforts. We asked each online contributor a few optional demographic questions so that we could better track who was sharing stories, photos, and other digital materials with us. What I found is that few contributors shared any of that demographic information with us (gender, race, year of birth, occupation). We also asked contributors to share the location or zip code of where they were during the storms, and then after the storms, to get a broader sense of the migrations. All of these questions were optional, intentionally.
The project team debated these issues intensively. As historians, we wanted to collect some general demographic information about the contributors. We also did not want a long list of required questions in an online form to discourage someone from submitting a story. Balancing out those needs was tough and we decided that collecting the reflection or the photograph was a priority.
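The kind of summary described above, how many contributors answered each optional question, can be sketched in a few lines of Python. This is only an illustration: the sample data and the column names (`gender`, `race`, `birth_year`, `occupation`) are assumptions, not the actual HDMB table schema.

```python
import csv
import io

# Hypothetical sample standing in for an exported contributors table.
# The column names are assumptions, not the real HDMB schema.
SAMPLE_CSV = """id,gender,race,birth_year,occupation
1,F,,1970,teacher
2,,,,
3,M,,,
4,,,,student
"""


def response_rates(csv_file, fields):
    """Return the share of rows with a non-empty value for each field."""
    rows = list(csv.DictReader(csv_file))
    total = len(rows)
    rates = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        rates[field] = filled / total if total else 0.0
    return rates


rates = response_rates(
    io.StringIO(SAMPLE_CSV),
    ["gender", "race", "birth_year", "occupation"],
)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%}")
```

With a real export, the `io.StringIO` object would be replaced by an opened CSV file, and the field list by whatever the table actually calls those columns.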
The next table I examined contained the full list of all items, and I extracted the descriptions. In the second iteration of the site, completed in 2006, we mapped the text of “stories,” along with descriptions of images and other files, to the Dublin Core description field.
I uploaded that CSV into Voyant Tools to surface word patterns and trends across the 25,000+ digital items. In addition to the ubiquitous word cloud, I could also see word frequencies and view relationships of terms in context with others.
Not surprisingly, place names featured prominently in individual contributions. To look beyond the names of cities, parishes, or states, it is possible to create a list of “stop words” that removes those terms from the analysis. Without place names, it is possible to see that HDMB’s contributors frequently mentioned “people,” “home,” “house,” and “family.” By examining those keywords in context, it is possible to see how mentions of “house” relate to descriptions of physical damage and destruction, while usages of “home” often discuss the emotions of leaving or returning to a damaged house or city. It is possible to identify other emotional terms, such as “loss” and “angry,” and see that “hope” is invoked as a verb and a noun more often than both “loss” and “angry.”
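For readers who want to try the same two moves outside of Voyant, here is a minimal sketch of counting words after removing stop words and then viewing a keyword in context. It does not reproduce how Voyant works internally; the sample descriptions, the stop word list, and the function names are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical descriptions and stop words, for illustration only.
descriptions = [
    "We left our home in New Orleans and the house was flooded.",
    "My family hopes to return home. The house lost its roof.",
]
stop_words = {
    "we", "our", "in", "and", "the", "was", "my", "to", "its",
    "new", "orleans",  # place names added to the stop list
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts, stop_words):
    """Count every token that is not in the stop word list."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in stop_words)
    return counts


def keyword_in_context(texts, keyword, window=3):
    """Return each occurrence of keyword with `window` tokens on either side."""
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                start = max(0, i - window)
                hits.append(" ".join(tokens[start:i + window + 1]))
    return hits


freqs = word_frequencies(descriptions, stop_words)
print(freqs.most_common(3))
print(keyword_in_context(descriptions, "house"))
```

The context window is what makes it possible to separate “house” as a damaged structure from “home” as a place one leaves or returns to.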
In thinking through what other patterns might become visible, I decided to run that corpus through a lightweight topic modeling tool.
At first, I ran all item descriptions and asked for 20 words for 20 topics.
As I ran the text, I continued to refine the stop word list. I noticed that contractions were split, so that “don”, “ll”, and “ve” were coming through.
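The stray “don”, “ll”, and “ve” are what one would expect from a tokenizer that treats the apostrophe as a word break. The sketch below assumes that behavior (I have not checked how the tool actually tokenizes) and shows the fragments being dropped. The fragments beyond those three are my own additions to the list.

```python
import re


def split_on_non_letters(text):
    """Tokenize by treating every non-letter, including apostrophes, as a break."""
    return re.findall(r"[a-z]+", text.lower())


# Fragments left behind when contractions are split at the apostrophe.
# "don", "ll", and "ve" come from the post; the rest are assumed extras.
contraction_fragments = {"don", "ll", "ve", "t", "s", "d", "re", "m"}


def drop_fragments(tokens, fragments=contraction_fragments):
    """Remove contraction fragments from a list of tokens."""
    return [t for t in tokens if t not in fragments]


tokens = split_on_non_letters("We don't know if we'll return; we've lost so much.")
print(tokens)
print(drop_fragments(tokens))
```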
Once I created a good stop word list, I decided to run a CSV with the Story item type only, and reduced the number of topics and terms per topic.
I was able to get a good hint at who some of the contributors were, as the terms “students” and “school” featured prominently in the stories. Students, teachers, and/or parents discussed how the storms affected their school years.
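Narrowing the export to a single item type can be done in a spreadsheet, but it is also a short script. The sketch below assumes an `item_type` column and a `Story` label; both names, and the sample rows, are placeholders rather than the real HDMB export.

```python
import csv
import io

# Hypothetical export; column names and type labels are assumptions.
SAMPLE_CSV = """id,item_type,description
1,Story,We evacuated to Houston before the storm.
2,Still Image,Photo of a flooded street.
3,Story,School reopened in January.
"""


def filter_by_type(csv_file, item_type, out_file):
    """Copy rows whose item_type matches into out_file; return how many were kept."""
    reader = csv.DictReader(csv_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row["item_type"] == item_type:
            writer.writerow(row)
            kept += 1
    return kept


output = io.StringIO()
kept = filter_by_type(io.StringIO(SAMPLE_CSV), "Story", output)
print(kept)
print(output.getvalue())
```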
It is possible to see that this digital collection would be useful for someone interested in reading first-hand accounts about evacuations; life in temporary housing, such as in shelters or hotels; relief efforts; the challenges of returning home to deal with damages; the emotional challenges faced in the recovery process and the roles of families; the financial burdens faced by storm survivors; and the impact of local, state, and federal government in a disaster.
With these topic strings identified, I then drilled down and read individual texts. In the earliest years of the project, I read many of the contributions, but 10 years later I had forgotten so much of what I had read.
This exercise allowed me to rediscover some of the resources in the site. I also did a lot of old-fashioned browsing through the collections of photographs.
As I turned up topics, I really wanted to discuss these trends with Michael and Roy, both of whom are now gone. I found that researching in HDMB was surprisingly emotional for me. I can imagine that this anniversary has been difficult for the millions of people who were intimately affected, and who are still feeling personal losses of varying degrees ten years later.