HathiTrust Database Developer

Job Summary
The Library Information Technology (LIT) division provides comprehensive technology support and guidance for the University of Michigan Library system, including significant work in support of the operations of the HathiTrust Digital Library (a collaboration of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future).

The Library Systems Office, a part of LIT, develops and manages systems and resources that support traditional library services (metadata creation and management, acquisition and circulation of materials, discovery interfaces, etc.), as well as a variety of other systems and functions that depend on bibliographic and descriptive metadata.

NOTE: This is a term-limited appointment funded through December 31, 2017, with the possibility of renewal.

Responsibilities
The Library Systems Office is currently seeking a talented, resourceful software developer to take the lead in managing and enhancing a database of print holdings data collected from all of the 80+ HathiTrust partner institutions. This large (some tables contain hundreds of millions of rows), rapidly evolving collection of data requires regular updates that rely on a mix of relational database operations, external scripts, and cloud-based calculations. The database maps individual institutions’ print holdings data to items in the HathiTrust digital repository, and serves as the basis for calculating costs of partnership, determining special access privileges, and answering a number of other useful queries. The print holdings database is a fundamental component of the dynamic HathiTrust Digital Library platform, and it requires a skilled and agile developer to manage and guide its implementation as HathiTrust continues to grow.

Project tasks and goals include:

  • Working closely with other HathiTrust developers and members of the Library Systems, Core Services, and the Digital Library Production Service staff to develop and improve the print holdings database, including parsing, normalizing and ingesting holdings data submitted by HathiTrust partners
  • Building and running custom reports, queries, and data processing scripts and routines
  • Accommodating new data types and functionalities
  • Developing a regular, automated update strategy
  • Documenting apparent data anomalies and processes that could be used to minimize them
  • Potentially, building a web-based interface to the database

The successful candidate will be encouraged to conceive of and implement new processes that go beyond the current system functionality to improve automated matching of records, to provide new ways of addressing ambiguous data, etc.

It is anticipated that the successful candidate will also assist with other HathiTrust special projects, including consultation and assistance in the development of the HathiTrust Government Documents Registry, a registry of metadata intended to describe the full corpus of U.S. federal government documents.

Required Qualifications

  • Bachelor’s degree in Computer Science or a related field and 3 to 5 years of work experience, or an equivalent combination of education and experience
  • Demonstrated proficiency with relational database technologies such as MySQL, including experience with design and implementation of very large databases.
  • Demonstrated programming skills in at least one modern programming language
  • Demonstrated experience processing and mining very large files of textual data.
  • Facility with Linux-based operating systems
  • Strong analytical and troubleshooting skills
  • Excellent written and verbal communication
  • Ability to creatively improve workflows and processes
  • Ability to function independently in a dynamic multicultural/collaborative environment.

Desired Qualifications

  • Experience with Ruby on Rails and Perl.
  • Knowledge of library metadata standards (MARC21/RDA, Dublin Core, etc.)
  • Basic web application development experience
  • Experience using version control systems in software development
  • Familiarity with batch file processing techniques on the command line
  • Background and experience with cloud computing, MapReduce (Hadoop, Pig), and/or NoSQL technologies.

Application Deadline
Job openings are posted for a minimum of seven calendar days. This job may be removed from posting boards and filled at any time after the minimum posting period has ended.

U-M EEO/AA Statement
The University of Michigan is an equal opportunity/affirmative action employer.

To apply: http://umjobs.org/job_detail/82708/hathitrust_database_developer