Collaboration, Openness, and Preservation: An NDSA Interview with Dave Rice
We are very excited to talk with Dave Rice. He was awarded an Innovation Award from the National Digital Stewardship Alliance (NDSA) in 2016 for his creative work in bringing together people and organizations from different communities to produce useful standards and practices. Follow Dave’s work at dericed.com, and you can find all of our interviews with the NDSA Innovation Award winners here.
We learned more about Dave and his work in the following interview:
You were selected for your work in advocating for a new working group and for the FFV1 and Matroska standards with the Internet Engineering Task Force (IETF). The IETF is sometimes ascribed nearly magical abilities for its success in working on the standards that hold the internet infrastructure together. It is not common for members of the digital preservation community to work directly with the IETF and its related groups. How did you go down that path, and did you find anything surprising about it?
Firstly, thank you for selecting me for the Innovation Award. I feel honored to be associated with the NDSA community in this way.
I should explain a bit of background to the work on FFV1 and Matroska. The Library of Congress’s Sustainability of Digital Formats defines several sustainability factors for digital formats and lists “disclosure” as the first consideration, which includes the “degree to which complete specifications and tools for validating technical integrity exist and are accessible.” In 2014 the PREFORMA project began to develop conformance checkers for a select list of open formats and included FFV1 and Matroska as their audiovisual selections. I worked within the PREFORMA audiovisual team on a conformance checker for audiovisual files, called MediaConch, and as part of the planning we saw that the quality of the specification work for FFV1 and Matroska would hinder development of a conformance checker for those formats. Thus at that point it made sense to coordinate users and developers of FFV1 and Matroska to approach an open standards organization about addressing further development of specifications building off the existing progress made by communities working on these formats.
The IETF (Internet Engineering Task Force) appeared to be the most suitable standards organization to work with. Developers from both Matroska and FFmpeg (where FFV1 was initially developed) were well familiar with the IETF from their work on other open video formats such as Ogg, VP8, Opus, and Daala. Furthermore, the values of the IETF align particularly well with Free Software principles, the LOC’s Sustainability Factors, and other digital preservation guidelines in that the IETF’s work and procedure is open, transparent, and participatory. In addition to the resulting specification documents being open, credible, and clear, the entire process is open to view including listserv discussions, chatroom transcripts, and recordings of meetings. Thus not only are the specifications of the IETF open but we can study the process, discussions, and debate that formed them.
For us, the path involved a lot of collaboration and learning along the way. We discussed the idea within the FFmpeg and Matroska communities to determine willingness, interest, and method to proceed with standardization. Initially we collaborated with the IETF Dispatch working group to determine the best way to proceed. We also met with IETF members at conferences, such as FOSDEM and VDD, to seek advice and refine our methodology. With this assistance, we drafted a charter for a working group and Tessa Fallon presented a proposal at an IETF conference. The charter was debated and adjusted, put to a vote, and ultimately approved. At this point, the CELLAR working group became active.
Can you explain how the IETF working group functions? Can you explain what you think will happen in this area in the next few years?
The IETF’s CELLAR working group has a page on the IETF data tracker. There you will find links to our mailing list, historical information about the group, its charter, and its active documents. Although active documents are listed on the working group’s website, the efforts to revise the specifications happen on GitHub, with related conversation on the listserv. Presently there are GitHub repositories for Matroska, for EBML (a binary XML format on which Matroska is based), and for FFV1.
The working group charter includes a timeline; though the working group is currently behind schedule, we are making active progress. The objective of the working group is to achieve the goals outlined in the charter, namely to submit specifications for FFV1, Matroska, and FLAC to the IESG (Internet Engineering Steering Group) for approval. Anyone is welcome to join the working group’s listserv and participate.
What do you think the digital preservation community can learn from what you did with the IETF?
Although it is not common for members of the digital preservation community to work directly with the IETF, it is in the interest of the digital preservation field to foster more involvement in related standardization work. Rather than waiting for the creation of standards to adopt in preservation, it is in the interest of the preservation community to represent and advocate within standardization efforts to ensure that adequate attention is given to sustainability qualities required in long term format preservation. Often this is the case with the development of metadata standards (such as PBCore and PREMIS), but there is opportunity for more involvement from the community in the standardization of the formats that we will eventually steward. I consider the work produced by the Library of Congress on AS-07 and by the DPF Manager on TIFF as good examples of the digital preservation community’s active involvement in standardization efforts for file formats.
The world of libraries, archives and museums has many standards groups and standards of their own. Do you have any thoughts about how standards are most effectively developed within and across communities? How do you think innovation relates to standards?
Engaging with numerous stakeholder communities is critical to the sustainability of standards. As an example, last year I helped organize the No Time to Wait symposium in Berlin, which focused on the standardization efforts for FFV1 and Matroska. During the symposium, Reto Kromer and Kieran O’Leary presented on using those formats in film scanning and provided proposals and research on storing color data from the film scanning process in those formats. Additionally, Michael Bradshaw from Google presented on YouTube’s ongoing efforts to support the vast technical variety of incoming media in order to document and render color data effectively. It was very revealing to see that those managing the newest audiovisual media (YouTube uploads) and those managing the oldest audiovisual media (film formats) shared a common interest in standardizing management of comprehensive colorspace data within these formats and could collaborate on proposals.
Additionally, since the IETF working groups operate in open online spaces where those interested may join to watch or participate, the environment is welcoming to collaboration between communities with shared interests. Frequently in the working group I’ve seen contribution of expertise in areas where I wasn’t aware such expertise existed. Some standards organizations are closed or require subscription membership, limiting participation to a targeted community. These closed systems might stymie the potential for more diverse and innovative contributions.
In the preservation community, the existence or presumption of a ‘standard’ may sometimes discourage innovation in that potential participants view the work as already complete or are concerned that additional development might compete with an established standard in a way that compromises its adoption. For example, the recommended practices for the storage of analog media moved from one format to another as technological advancement offered new opportunities; however, in the migration from analog to digital formats there is sometimes less acceptance in the adoption and integration of these new formats. Best practices based upon technology should be considered to have expiration dates and be approached more skeptically as they age. A best practice should not be considered as the edge to our innovation, but a reference point from which improvements can be made. Working within the context of a standards organization ensures that the work to improve a standard develops in a controlled environment thus protecting standards from tumultuous changes while simultaneously maintaining an environment of transparency and consensus.
On your website you say your work has been focused on “independent media”. Can you talk about that term?
I learned this term at my first full-time job as an archivist at Democracy Now!, a daily, independent news program. Over the last few decades, media consolidation has led to fewer and fewer companies controlling significant parts of the media, and those companies often have financial stakes in sectors beyond media. For instance, a company may own a news network that covers climate change while also profiting from environmental deregulation, or a company may own a news network that covers threats of war while also profiting from the sale of military hardware. Independent media is less subject to pressure to maximize profits or distort reality. For independent media organizations, the focus is more wholly on providing media as an offering to the public, as opposed to providing the public as an offering to advertisers.
On your website (dericed.com), you label yourself as an “archivist” and “technologist”. How do you use the term “technologist”? Is “technologist” a term we should be expanding in the digital preservation community?
I think my consideration of this term comes from my education at the Selznick School of Film Preservation. The education there gave a strong impression that meeting preservation challenges depends not only on following practices but also on understanding and controlling the technology involved. This becomes particularly important when the technology we require for preservation is obsolete and debugging becomes a central part of the process. There are many areas where the technology available to us is not sufficient and where those in preservation need to discover and create their own technologies. So I think I use the term to mean someone who both knows how to use certain technologies as tools and knows how or when to create such tools.
In your first blog post on your website, you say (in bold) “Unplayable and broken digital media may be fixed just as an unplayable film print may be fixed.” Why did you put this in bold? Can you talk about the parallels between analog and digital in your work? Can you talk about the challenges of perceptions in this area?
This is one area where I wish my education had been different, as it focused on the differences between analog and digital formats when I now think there are more parallels than are generally realized. At the time I was in school I was more in tune with film preservation communities than with digital preservation communities, and there was a lot of skepticism towards digital formats and a feeling of security in analog formats. I think perspectives like this slowed the progress of the community, as so much time was spent trying to avoid or stall digital workflows rather than innovating in digital environments.
I had gotten a side job doing audiovisual restoration for a producer who recorded video on a camera that wrote digital files onto a solid-state card. He had accidentally erased the card, and when he tried data recovery services on the files he could recover only portions of the QuickTime files, with none of the file headers that are needed to decode them. I worked to discover the encoded contents (MPEG-2 video and PCM audio in this case) and developed a process to chisel the audio and video out of these broken files so that I could recover the recording. It was thrilling to take a pile of malformed data and recover a presentation from it. This seemed very similar to my work at school prying through decomposing nitrate film and repairing edge damage and splices. Although the tools are different, there are more analogous opportunities in audiovisual preservation, whether analog or digital, than I think many realize.
Can you explain what you see as the relationship between computer hacking and archiving, and how this has benefited organizations such as AMIA (Association of Moving Image Archivists)?
Audiovisual archiving is so dependent on obsolete, unsupported hardware (video machines, etc.) that we must hack to support it ourselves. I think the need is clearer with analog formats, and our field is accustomed to opening video decks and tinkering in order to improve preservation possibilities. I worked with an engineer who modified a U-matic video player to have an option to disable the sensor that detects the end of the tape, so that tapes with extreme shedding could cautiously be played back without triggering the machine to rewind. I’ve seen projects to convert video decks into cleaners or to sand down sprocket wheels in film transfers to accommodate shrunken film. Analog audiovisual hardware was not created with the expectation of handling media in a highly deteriorated state, and I think we should celebrate the analog tinkering and hacking done to better preserve media. On the other hand, the digital equivalent is sometimes regarded as problematic or not credible, but I consider it essential that the community support its own hackers, working with analog and digital forms, so that we aren’t unnecessarily hindered by our own technology.
Do you think the challenges and problems you are working on will be different in 5 years or 10 years? In the next generation?
Yes, our challenges and problems change as technology progresses. Perhaps 15 years ago an archivist may have felt that copying audio onto Gold CD-R discs was in the interest of long-term preservation, but nowadays an archivist may examine a collection of Gold CD-Rs and determine that it’s a priority to migrate them to more suitable storage. I remember trying to make long term plans in my early days as an archivist and in retrospect much of the intent of those plans goes in the right direction, but the details and priorities obviously change. Furthermore, many of the challenges felt in archiving 10 years ago are different because we’ve improved solutions for them. Online collaborative technology spaces such as GitHub have really helped archivists collaborate and support each other to address challenges.
I find that acknowledging that our systems are temporary helps in long term planning. The collections and the metadata about the collections should be the core of what we work to sustain, describe, and make accessible. The systems that we use to manage those collections and metadata should be replaced or improved upon as needed. Although the collections may need to be permanent, the systems do not need to be.
Based on your work and areas of interest, what kinds of work would you like to see the digital preservation and stewardship community take on?
I would like to see more adoption of and support for open source tools within preservation workflows, particularly digitization. I know that the digital preservation community has worked to contribute to, sponsor, or integrate open source tools to facilitate access to digital collections; however, in many areas we still use proprietary or closed systems for digitization that we have limited control or understanding over. I’d like to see more advocacy from the community for open software development kits for digitization hardware (such as scanners and audiovisual digitization cards) and support for open source digitization software that accounts for preservation principles.
For audiovisual digitization in particular, the community has often adopted production tools for videotape digitization, such as Final Cut 7 and Live Capture Plus. Since videotape digitization is no longer part of most production workflows, the communities that support such software have dropped support and moved on, all while the preservation community is more urgently in need of such tools.
On another note, I’m glad to see more and more digital archives using microservice approaches to design and implement workflows for processing digital collections, rather than creating workflows wholly from the options provided by a monolithic system. I’d like to encourage more discussion on describing how archival packages are organized and how we may better create microservices that are interoperable rather than system-specific. Dinah Handel wrote an excellent blog post on this topic at http://ndsr.nycdigital.org/check-your-aip-before-you-wreck-your-aip/.
Can you suggest other people who are doing interesting or innovative work that you think might be of interest to the digital preservation community?
Overall I think interesting and innovative work is becoming much more accessible to the digital preservation community in that more people are working in environments that encourage collaboration or working in open, online spaces. There are several active projects in the AMIA Open Source GitHub account that reflect innovative work of the audiovisual archiving community, such as ffmprovisr, vrecord, and open-workflows.
I’d also recommend following the NDSR program. The project’s mission is to “build a dedicated community of professionals who will advance our nation’s capabilities in managing, preserving, and making accessible the digital record of human achievement.” I think the focus on developing “capabilities” is an urgent need and the program is doing well to support and empower residents to focus on preservation challenges and to research and innovate accordingly.