Metadata First: Using Structured Data Markup and the Google Custom Search API to Outsource Your Digital Collection Search Index

Session Type: Presentation

Session Description
Discussions of building the digital library have centered on using tools like CONTENTdm and Solr/Blacklight to build local search engines for our digital content. This model has served us well and by many measures remains an industry best practice. However, as discussions about making our digital collections findable on the open web have evolved, the work of optimizing data and following SEO best practices has introduced new workflows and pressures into digital library development. In this session, we will explore how this new search-oriented workflow can be used to provide an effective and efficient local search engine for our digital collections.

This session’s focus will be Google Custom Search, a search tool that offers powerful and flexible indexing for a range of content, including the images, documents, and videos that comprise digital collections. Building on Google’s search algorithm with Custom Search provides an achievable alternative to other more complex indexing tools such as Solr/Blacklight. We will consider how investing in structured data to create indexable content that can be consumed via a Google Custom Search API client has further advantages over other local search implementations. Specifically, speakers in the session will discuss:

  • Business cases for outsourcing your search index
  • Gains in SEO and discoverability
  • Preparing collections for a search engine index (e.g., sitemaps, schema.org HTML markup)
  • Using the analytics available through Google Custom Search to understand the use and indexability of your content
  • Applying the Google Custom Search JSON API to power your collection search
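
As a rough illustration of the last point, the sketch below shows one way a client might query the Custom Search JSON API and reduce the response to the fields a collection results page needs. It is a minimal sketch, not the MSU Library's implementation: the function names are hypothetical, and the API key and search engine ID (the `key` and `cx` parameters) are placeholders that you would obtain from your own Google account.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the Google Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1, num=10):
    """Build the request URL for one page of results.

    api_key and engine_id are placeholders for your own credentials.
    """
    params = {
        "key": api_key,    # API key
        "cx": engine_id,   # custom search engine ID
        "q": query,        # the user's search terms
        "start": start,    # 1-based index of the first result
        "num": num,        # results per page (the API allows at most 10)
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_results(payload):
    """Keep only the title, link, and snippet of each returned item."""
    return [
        {
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])
    ]


def search(query, api_key, engine_id, start=1):
    """Fetch one page of results (requires network access and valid credentials)."""
    with urlopen(build_search_url(query, api_key, engine_id, start)) as response:
        return parse_results(json.load(response))
```

Separating URL construction and response parsing from the network call keeps the sketch easy to test offline; a local search page would call `search()` and render the returned list.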

The “metadata first” approach to digital library development has brought several benefits at the MSU Library. Our experience with Google Custom Search has increased our understanding of how the commercial search engines that crawl and index our content operate. We have also been able to align digital library personnel efficiently around the work of building interoperable, indexable structured data markup with schema.org and RDFa. Finally, the approach lowers a significant barrier to creating a searchable collection index and brings efficiencies in optimizing digital collections data for commercial search engines.
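
To make the idea of structured data markup concrete, the following sketch renders a metadata record as an HTML fragment annotated with RDFa Lite attributes (`vocab`, `typeof`, `property`) and the schema.org vocabulary. It is an assumption-laden illustration only: the function name, the sample record, and the choice of `CreativeWork` as the default type are hypothetical and do not describe the markup templates actually used at MSU.

```python
from html import escape


def item_to_rdfa(item, schema_type="CreativeWork"):
    """Render a dict of schema.org property names and values as RDFa Lite HTML.

    The keys of `item` are assumed to be valid schema.org property names
    for the chosen type; no validation is performed here.
    """
    lines = [f'<div vocab="https://schema.org/" typeof="{escape(schema_type)}">']
    for prop, value in item.items():
        # Escape values so metadata containing &, <, or quotes stays valid HTML.
        lines.append(f'  <span property="{escape(prop)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Hypothetical record for a digitized photograph.
record = {
    "name": "Homestead near Bozeman",
    "description": "Black & white photograph of a homestead.",
}
print(item_to_rdfa(record, schema_type="ImageObject"))
```

Markup of this kind, embedded in each item page and listed in a sitemap, gives a crawler both the page and explicit statements about what the page describes.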

At the center of this presentation will be a demonstration of how the MSU Library is using the Google Custom Search API to build our local collections search (arc.lib.montana.edu/digital-collections/). The session will include a discussion of how an investment in structured data and indexable content, rather than local search engine development, is a viable path forward for digital libraries.

Session Leaders
Kenning Arlitsch, Montana State University Library
Jason A. Clark, Montana State University Library
Scott W. Young, Montana State University Library
Patrick O’Brien, Montana State University Library
