This post was written by Annamarie Klose, Scott Goldstein, Greer Martin, Rachel White, and Elliot Williams of the DLF AIG MWG Tools Subgroup
Introduction
Metadata assessment is often aided by scripting and other automation tools. Catalogers and metadata librarians employ the adage “Work smarter, not harder” to work at scale on hundreds, thousands, or even millions of records at a time. Within the community, some tools are widely known and used, while others are less familiar. The DLF Assessment Interest Group’s Metadata Working Group created the Tools Repository as a way of sharing information about metadata tools within the Libraries, Archives, and Museums community. The repository is publicly accessible through the DLF AIG MWG website’s tools webpage (http://dlfmetadataassessment.github.io/tools).
History of the Group
The original Tools Repository was an offshoot of the Environmental Scan project, begun in 2016, soon after the birth of the Metadata Working Group itself. The environmental scan collected information on the use, status, and application of 21 tools, with plans made to test them during the next working year. A visual breakdown of the types of tools and varieties of metadata work they supported, as well as a list of citations related to the use of tools in metadata work, accompanied this initial list. In the early days, the Google suite of products was used to capture documentation and the ideas generated during collaborative working group meetings. One concern was how to use the tools within the DLF Metadata Assessment Framework in terms of completeness, accuracy, accessibility, conformance to expectations, consistency, timeliness, provenance, and trends. In 2017, with preliminary organization achieved and tools selected, the working group began to consider not only deliverables, but also the platforms that would organize and display projects to their best effect. Working documents such as meeting notes and agendas would remain in the Googleverse, while the environmental scan and tools repository would be staged on GitHub, where issue tracking is readily available. More static documents, such as annual working plans and completed documentation, would find a home on the DLF AIG wiki and OSF repositories.
Getting to GitHub proved to be a bit of a challenge for the Tools Subgroup. Throughout 2017 and 2018, the subgroup continued collecting its research, reviews, and results of tests in a shared Google Sheet. In 2019, members of the Tools Subgroup liaised with the Website Subgroup in an attempt to publicize what was then a spreadsheet, and using Jekyll was first considered at that time. Rather than making readers suffer through the infinite two-way scroll of a spreadsheet, a wiki format would let them easily choose the categorized tools that interested them most. Then, as now, Tools Subgroup members relied heavily upon the tech skills of the Website Subgroup to display their work.
Planned changes
Since 2018, the tools repository has listed the information in the wiki section of the underlying GitHub repository. This made adding and updating tools easy, but the wiki presentation was not ideal for giving the viewer a “bird’s-eye view” of the entire software landscape. Starting this year, the website links out to a Google Sheet. We are currently setting up the GitHub Pages website so the tools data can be (re)incorporated as HTML. To make this easier, we will set up a “collection” in the underlying Jekyll framework. A collection is a feature of Jekyll that creates a content type, or class of pages that are all structured the same way. Our collection will have fields for the title of the software tool, the programming language the tool is written in, whether the tool is open source, and so on. Adding a tool will then simply be a matter of adding a record to the tools directory. Using a Jekyll collection will not only allow us to display the tools in a consistent way, but it will also enable dynamic filtering. For example, one could filter only the records with a description containing “MARC” or with the programming language field containing “Python.”
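The filtering idea described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the site's implementation: the actual website would do this within Jekyll or in the browser. The Tool fields mirror those mentioned in the paragraph, while the sample records, the field names, and the filter_tools function are hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """One record in the tools collection (field names are assumptions)."""
    title: str
    language: str
    open_source: bool
    description: str


# Hypothetical sample records, made up purely for illustration.
TOOLS = [
    Tool("Example MARC Editor", "Java", False,
         "Edits and validates MARC records in batches."),
    Tool("Example Harvester", "Python", True,
         "Retrieves Dublin Core records over OAI-PMH."),
    Tool("Example Profiler", "Python", True,
         "Reports field completeness for MARC and MODS exports."),
]


def filter_tools(tools, description_contains=None, language_contains=None):
    """Return the tools that match every filter supplied.

    Matching is a case-insensitive substring test; a filter left as None
    is ignored.
    """
    results = []
    for tool in tools:
        if (description_contains
                and description_contains.lower() not in tool.description.lower()):
            continue
        if (language_contains
                and language_contains.lower() not in tool.language.lower()):
            continue
        results.append(tool)
    return results


if __name__ == "__main__":
    # List the titles of tools whose description mentions MARC.
    for tool in filter_tools(TOOLS, description_contains="MARC"):
        print(tool.title)
```

Because every record shares the same fields, a single function can serve any combination of filters, which is the main benefit a consistently structured collection offers over free-form wiki pages.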
Categories
In reimagining the tools repository, the subgroup took inspiration from the Preserving Digital Objects With Restricted Resources (Digital POWRR) Tool Grid, a collection of tools for digital preservation organized into categories based on the OAIS Reference Model. The subgroup thought that introducing categories would aid navigation of the tools repository and summarize the tools’ functionality.
The group began identifying functional categories of metadata work by reviewing and grouping the tasks carried out by the current list of tools. Choosing labels, and sometimes definitions, proved difficult. For instance, “extraction” was considered as a category label for tools that retrieve data from an external source, but this term is commonly used to refer to the extraction of embedded metadata from digital files. “Retrieval” was settled on instead, and “extraction/embedding” became the category label for tools that interact with embedded data. Some of these challenges were due to the brevity of the labels. For instance, does “transform” refer to tools that transform metadata from one schema to another, or from one format to another, or that transform the metadata values themselves? Or does it refer to tools that do all three? It was tempting to make the category labels more specific, but we decided to keep the categories broad in order to align with the Digital POWRR model, and to add category definitions for the labels.
After consulting with the larger DLF Metadata Working Group, the subgroup settled on seven task categories: Creation, Editing, Validation, Transformation, Extraction/Embedding, Retrieval, and Analysis.
Limitations
Early on in the creation of the Tools Repository, each tool was rigorously tested to determine whether to include it in the repository. This meant that a lot of information was known about each tool, but it put a significant limit on how many tools would be included. In order to include more tools and make the repository a more useful resource, the subgroup shifted away from testing each tool and now relies on each tool’s documentation and community knowledge. This has enabled us to build a larger list of tools, but with the trade-off that the subgroup has less first-hand experience with each tool and cannot verify that each works as expected. We actively welcome community input to help us provide more accurate and complete information about the tools listed!
Because the group does not perform extensive testing or evaluation of each tool, we cannot vouch for the stability or security of any of the tools listed. For example, older versions of OpenRefine may be affected by the Log4j security vulnerability that was disclosed last year. Inclusion in the Tools Repository should not be seen as an endorsement of the tool, and individuals should exercise the usual level of caution when downloading and using any of the tools listed.
Like all projects, the Tools Repository is also influenced by the background of the people who created it. Within the subgroup, our members have varied experience with MARC cataloging, non-MARC metadata, programming, and more. Each of us brings those perspectives to our analysis of these tools, so our understanding and use of these tools may not be exactly the same as those of people with other backgrounds and needs.
Community Input
Do you know of tools that could be added or changes that should be made to the Tools Repository? The Tools Group can always use contributors. You can suggest additions with the Tools Submission form and changes with the Tool Correction form. If you would like to join the group, you can contact Annamarie Klose (klose.16 AT osu.edu).
Contributors
Current members of the subgroup are Annamarie Klose, Scott Goldstein, Greer Martin, Rachel White, and Elliot Williams. Special thanks to the many prior DLF AIG MWG members and prior Tools Subgroup members, without whose work this would not have been possible.