NDSA Member Profile: Sally Vermaaten of the Gates Archive

NDSA Member Profiles is a new collaborative series from the Interest Groups of the NDSA. The series is inspired by past NDSA traditions such as Insights Interviews, and aims to build on and expand these types of interviews with featured NDSA members to allow for better shared communication and collaboration around the work of digital stewardship and preservation. Topics range from member questions and insights for the NDSA community to sharing failures, discoveries, and anything else in between. If you or your institution is interested in being featured, please contact Lauren Work (lw2cd@virginia.edu) or Sibyl Schaefer (sschaefer@ucsd.edu). 

Sally Vermaaten is the Manager of Archive Solutions at the Gates Archive, where she leads a team that designs, implements, and maintains the technology and business solutions that support the work of the organization. The Gates Archive has been an NDSA member since 2016. Sally joined the Archive in March 2017 and has become involved with the NDSA’s Infrastructure Interest Group.

Describe your position, and how you spend most of your working time

As the Manager of Archive Solutions, I lead a team responsible for the organization’s systems, infrastructure, asset management, digitization, reformatting of audiovisual materials, as well as strategic program management. My primary focus is on making sure that my fantastic team – which includes library and archive professionals, technologists, and a program manager – have the tools and support they need to do their jobs.

Another portion of my role is planning and executing programs of work to implement new technology, improve process workflows, and ensure we are maintaining core infrastructure including storage and archival systems.

Fostering a positive organizational culture – one where collaboration and professional respect are the norm and the team feels empowered to identify and make improvements – is also a part of my role. Work at the Archive is fast-paced, which makes a culture of trust and open communication especially important.

Do you have an ongoing or finished digital stewardship project that you are particularly proud of and would like to share?

One project I am proud of is an email analysis pilot currently underway. I am working with Archivists Kate Stratton and Martin Gengenbach, Systems Engineer Julio Lopez, and Application Developer Erik Hauck to test tools and develop workflows for analyzing and appraising email. We are particularly interested in methods for screening personally identifiable information (PII) and sensitive content that can scale to large email collections. Tools we are testing include Forensic Toolkit (FTK), ePADD, Microsoft Advanced eDiscovery, and AccessData’s Summation.

The project has required experimentation, but the team has also made good use of the growing body of professional literature about email in archives, including webinars from the SAA Electronic Records Roundtable, resources from the University of Illinois System’s Processing Capstone Email Using Predictive Coding project, and the draft consultation report of the Task Force on Technical Approaches to Email Archives. Email is a key component of modern archival collections, so it is great to see the profession sharing information and exploring how archivists might use email ‘power tools’ that are actively being developed for system administrators, information security, and legal teams.

What are your current challenges working in digital preservation?

As part of ongoing management of our infrastructure’s health, we are revisiting the architecture of our ‘digital stacks’ storage. We have just kicked off a project to refresh our storage projections and requirements and to evaluate potential storage providers. Setting up more robust storage policies and monitoring mechanisms is also an important part of this work: it helps ensure that collection materials are in the right types of storage and that we follow sound data management practices, e.g., deleting working copies and packaging materials consistently.

What have you found most beneficial from the NDSA community, and where do you think the NDSA has room to improve?

I have only recently become active in the NDSA. I attended the Digital Preservation conference for the first time in Pittsburgh this fall and was impressed by the outputs of the NDSA’s groups; one highlight for me was a walkthrough of the results of the Storage, Fixity, and Staffing surveys. As those who conducted the surveys know, cross-institutional and longitudinal data on the state of digital preservation is valuable in many ways, including benchmarking one’s own organization’s practices and gaining a more concrete understanding of the current needs of the field.

Being able to connect with colleagues about the nitty-gritty of digital preservation work is an obvious but key benefit of the NDSA. As I learned from my involvement with a smaller professional group in New Zealand, the Digital Preservation Practical Implementers’ Guild, institutions charged with long-term preservation face comparable challenges, but often in a different sequence depending on the needs of their users and collections. This means there are opportunities to learn from institutions that have already developed models to handle similar use cases. The Guild also proved to be a great place to discuss computing trends, such as the decline of the file, that directly impact current and future digital preservation practice.

What recent digital stewardship discovery have you made that you would love to share with the NDSA community?

I am excited about the rapidly evolving area of image analysis and automated keyword extraction services such as Microsoft Computer Vision, Amazon Rekognition, Google Cloud Vision API, Clarifai, and Imagga. My colleagues Ryan Edge (Digital Production & Metadata Lead), Jonathan Steinberg (Asset Management Specialist), and Erik Hauck (Application Developer) have done some testing of these tools. They are finding that the output is far from perfect (I love the hilariously incorrect examples of automatically generated captions in this blog post) but can be accurate enough to hold significant promise as a complement to human analysis and description, particularly for basic, bulk extraction of metadata from large sets of digital images that would otherwise remain ‘hidden’ due to a lack of metadata.

Do you have an example of a digital preservation or stewardship failure you would like to share? 

In my role at Statistics New Zealand, I managed a project to implement a new centralized system for metadata about the organization’s statistical data, which ranged from census data to unemployment, GDP, and CPI data. Implementation of the system delivered many benefits, including improved long-term stewardship: it facilitated the capture of detailed metadata essential to data re-use and enabled more efficient transfer of data to the internal Data Archive. The repository, public-facing website, and metadata authoring client saw strong adoption. Once the system was in place, we strove to set up integrations with existing statistical systems. The idea was that users of those systems could stay within their existing interfaces and seamlessly capture metadata as they worked, through added controlled fields that pushed and pulled information from the metadata system via an API. We made some progress towards these goals, in particular harvesting information from those systems, but due to competing priorities we did not achieve full integration. Our integration plans were ambitious and we were reliant on the schedules of very busy teams, but I wish I had been even more dogged in advocating for these integrations to be built. This experience taught me the importance of seizing windows of opportunity and running hard and fast when you have the attention of potential collaborators.

What topics or issues do you wish the digital preservation community offered more expert guidance or robust documentation for?

I would like to see more guidance and work on many of the digital preservation topics Chela Scott Weber identified in her recent OCLC Position Paper, Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. A few topics that particularly resonated with me are the need for: 1) more appraisal tools and frameworks to help curators appraise digital collections both before and after transfer to the archive; 2) models and practical guidance on the roles librarians, archivists, and technologists can fill in open source software (OSS) projects, and better ways to understand the ‘total cost of ownership’ for OSS; and 3) standardized metrics and data collection strategies to help archives assess their programs and drive decision-making.

What does the next year of digital stewardship hold for you and your institution? What are you working on next?

There are a number of projects on the docket for the Archive this coming year. One highlight is the work mentioned above to revisit our digital collections storage architecture. Another major focus will be an internal access layer that allows archivists and internal users to browse and search our digital and physical collections and securely access or request access to digital objects. Meg Tuomala, Archivist, and I are currently leading a review of several technology options and we will then move into the implementation phase.