HathiTrust: Sharing the Care and Feeding of the Elephant

Session Type: Presentation/Panel

Session Description:
HathiTrust integrates existing content, processes, and workflows from a variety of extant systems and an ever-increasing number of partners. This federation of institutions and systems calls for a different organizing principle, at both the administrative and data levels, than most single-institution systems use. The University of Michigan has played a significant role in shaping the principles and guidelines for content placed into HathiTrust. Michigan staff will share their experience of working with their library’s materials in HathiTrust, which may be useful to any institution considering integration or long-term preservation of digital projects in any repository.

The integration of material in HathiTrust makes differences in institutional practice more apparent, and choosing one among several competing policies is difficult. Even within standard formats like MARC, there are many different ways of describing the same item. For older materials without standard identifiers such as ISBNs or even LCCNs, how can you identify, let alone normalize, this variation? But without normalization, how can you ensure your volumes are grouped with others from the same set? And as individually created projects are ingested, differences in practice can be magnified. These projects often lack the kind of administrative metadata available for more recent digitization processes, which makes it harder to tell variation that is meaningful and intentional from variation that is incidental and unintentional.

In short, HathiTrust faces a series of interesting challenges as content, processes, and workflows from multiple partners and extant systems are integrated. We will discuss these challenges, and provide relevant examples, for:
– managing updates to information maintained in separate systems
– recognizing and managing normalization across the variation (cataloging, quality, collections) inherent in a large system
– organizing the administrative and data levels around principles for metadata and data that take a long-term perspective and assume federation.

Session Leaders:
Kat Hagedorn, University of Michigan
Christina Powell, University of Michigan
John Weise, University of Michigan

Session Notes:
View the community reporting Google doc for this session!
