The Many Forms of Endangered Data: Notes from Michigan

This post is one of several we are publishing about this year’s Endangered Data Week. This contribution comes from Justin Schell, Director of the Shapiro Design Lab at the University of Michigan Libraries.

On Saturday, March 3rd, I participated in a panel discussion organized by the Wayne State University (WSU) National Digital Stewardship Alliance (NDSA) & Association for Information Science and Technology (ASIS&T) about the many different forms that endangered data can take, and what is being done to address the resulting challenges across many different fields. The panel consisted of Kimberly Schroeder (Coordinator of the Archival Program and Lecturer at WSU); Dr. Katherine Akers (Biomedical Research Data Specialist at the Shiffman Medical Library at WSU); Dr. Laura Sheble (Assistant Professor at WSU’s School of Information Sciences); and myself, Director of the Shapiro Design Lab at the University of Michigan Library and member of EDGI, the Environmental Data & Governance Initiative.

I’ll briefly summarize our conversation here, but you can view the full video (which consists of our slides and audio) at the bottom of this post.

Each panelist took a different approach to the idea of endangered data, the means of addressing the challenges it poses, and how it intersected with their own work.

I went first and gave an overview and background of the larger “Data Rescue” movement that began in December of 2016 and has evolved through 40 different archiving events and a broad coalition of librarians, archivists, data professionals, and concerned citizens helping to preserve access to public federal information. I discussed some lessons learned over the past year, ways that people can stay involved (for example, by using Chrome and Firefox extensions to automatically save pages to the Internet Archive’s Wayback Machine), and a number of new directions that have emerged out of these events, which are summarized in this one-year review document.

Katherine Akers discussed a project she did with her WSU Library colleagues to better understand what kind of federal data was highly used and/or valued by WSU faculty. Much of this came from the biomedical field, and the data source that faculty most frequently reported was the National Center for Biotechnology Information. The challenge, as Akers noted, is how to judge the vulnerability of these datasets, with some having mirrors and large-scale preservation plans, while others are more vulnerable due to a lack of funding and/or infrastructure. Her presentation ended with a call to the library and archives community to better understand and address this vulnerability question, in order to better ensure researchers’ continued access to important public data.

Kimberly Schroeder focused on format obsolescence in her presentation, and the challenges of preserving (and sometimes even loading or playing) content that is either born-digital or from an earlier moment in the digital era (think floppy disks and Jaz drives). A great many archives, she argues, are not set up to handle these kinds of materials, and we often don’t know what we’re missing because we can’t get them to load or play, much less assess the contents and their importance. However, this isn’t just a question of archaic formats; compact discs can vary widely in quality, from gold discs to the cheapest recordable CDs, some of which will no longer play. With information trapped on these pieces of media, Schroeder voiced a concern shared by many in the digital preservation and cultural heritage field: that future generations of scholars will be severely limited as they try to study the eras documented within such increasingly inaccessible information.

Laura Sheble took a broader philosophical view on the question of endangered data, asking not just about endangered data, but also about data that endangers people in its collection and use. She illustrated much of this in a discussion of a quasi-viral (and inaccurate) statistic that nearly 50% of people in Detroit couldn’t read, which originated in a 1992 study, was picked up again in 2011, and was seen again as recently as 2017. Sheble argues that, beyond the damage that such a false claim does to the people of Detroit, data hasn’t been collected well enough to actually measure literacy rates successfully. In a parallel vein, Sheble discussed the changing data practices within the health field, including electronic medical records that were built around billing practices (and the consequences of that for how data is gathered, saved, and made available) and what the shift to both personalized and incentivized medicine (Fitbit and other activity trackers serving as proxies for health) means for the accessibility of data to doctors, researchers, and patients.

You can view the full discussion, along with the slides of each presenter, in the video below.