Community Spotlight: ScholarSphere

Mike Giarlo
Patricia Hswe

This Community Spotlight was provided by Mike Giarlo and Patricia Hswe, Penn State University.


The story of ScholarSphere, Penn State’s institutional repository (IR) service, is a multilayered, community-driven narrative. As such, it did not happen overnight. Before it was a gleam in any developer’s eye, ScholarSphere was nurtured by a strategic commitment between the University Libraries and Information Technology Services (ITS) to the overall stewardship of digital content and data, with dedicated attention to preservation for access, use, and reuse. Having refrained from jumping on the IR bandwagon in the early 2000s, Penn State was an outlier among its R1 university peers. The delay, however, allowed us to benefit from lessons learned by first-generation IRs, especially the need to involve users early.

In prioritizing use and users, engagement with the university community (largely librarians, faculty, and students) was critical. Listening to their needs would be mandatory. Equally critical was that the technology framework supporting a key system in our digital stewardship program also be community-based. A blend of philosophy and experience, described in detail below, informed this stance as well as our decision to go with the Hydra technology stack. By being part of the Hydra community, which makes code a communal, transparent resource, Penn State is contributing to the sustainability of Hydra technology. The emergence of a practitioner-based community, CURATEcamp, was also instrumental: it offered outlets for discussion, hacking, and idea exchange, activities that led to new and continuing collaborations.

The timing of a certain mandate also worked in ScholarSphere’s favor. With the National Science Foundation’s (NSF) 2010 announcement of an imminent data management plan (DMP) requirement, the Libraries and ITS had a potent catalyst for a repository service, one that could take care of research data over the long term and optimize them for public discovery and sharing. Following recent pronouncements from the Office of Science and Technology Policy calling for more public access to data, the trend toward open data has gained sizable traction, producing a climate that affirms the value of a service like ScholarSphere at Penn State. As a tool for fulfilling the DMP requirement, ScholarSphere also gave liaison librarians an important hook for engaging their faculty, mustering greater local investment in the service.

Thus, the story of ScholarSphere that follows is one of community building, both at Penn State and beyond. Embracing a variety of people, roles, and use cases, that community building has come to define the way the Libraries and ITS work together in launching and sustaining a service.

Setting a Foundation: Penn State’s Content Stewardship Program

In 2008, joint strategic planning between ITS and the Libraries led to the creation of a shared program for cyberinfrastructure, digital content, and data stewardship, or “Content Stewardship Program” for short. The program brought together the strengths of a large IT organization with those of a large knowledge management organization in order to offer centrally supported services around digital stewardship. It was driven both by anticipated federal funding agency requirements for data management plans and by faculty looking for better support for e-research and e-science initiatives.

The program’s first action was to fill two complementary positions, viewed as cornerstones of the program’s success: a digital library architect (Mike Giarlo), reporting up through ITS, and a digital content strategist (Patricia Hswe), reporting up through the Libraries. The positions are strategically placed. The strategist reports to an Associate Dean and the architect to a Senior Director of IT, which gives us privileged access to both the Dean of Libraries and the Vice Provost for Information Technology. These direct-report hires signaled that the Content Stewardship Program was an important new strategic initiative.

E-Content Delivery Platform Review

The first project chartered by the Content Stewardship Program was an assessment of the Libraries’ strategy for delivering digital content.

First, we selected a rubric for evaluating the delivery platforms in use. The rubric, adapted from work that Purdue University and the University of Wisconsin-Madison did comparing repository software, contained technical measures such as access control, metadata standards, and file format support, as well as general information regarding level of adoption, development community, and customer support model. With the rubric in hand, we interviewed nearly 35 people, including staff, faculty, and students from across the Libraries, ITS, and campus departments, to learn what they thought of the platforms they regularly used: what they liked, what they didn’t like, what they wanted to be able to do with content on those platforms, what workflows were in place, and so on.

Through the review, we found that three of the four applications the Libraries used to deliver digital content were moribund, and none of the four offered content owners or curators the tools they needed to manage their content over time. That work was instead happening in other environments, often coordinated through desktop spreadsheets, with copies of files littered across mapped network drives or burned to CDs and stuffed into cabinets. In short, there was a huge opportunity to better preserve the Libraries’ investment in digital content.

Pilots and Prototypes

The review set us down the path of looking into tools and platforms for digital curation and digital preservation.

Eager to advance our digital preservation efforts, the Content Stewardship Program chartered a new project to gather use cases documenting the curatorial work being done within the Libraries and the University Archives. In parallel, we learned more about available technologies, tinkered with them, and sparked numerous discussions at events such as CURATEcamp, on mailing lists, and on social networks.

When we finished gathering use cases, we had learned not only how our curators were managing their work but also that they didn’t consider themselves curators and hardly realized they had a community of peers within their own organization. Although our digitization decisions embraced key lifecycle actions such as conceptualization, appraisal, and selection, the work of getting a collection online, discoverable, and usable had gaps and inefficiencies born of disparate platforms and, in turn, disparate processes. The fragmentation exposed by our use cases resulted from a “one-off,” rather than architectural, approach to digital stewardship, which also meant that certain personnel or entire departments were effectively siloed.

With the curators’ use cases documented, we built them a prototype tool for deposit, description, auditing, and version management. The prototyping experience was a positive one, and despite its short life and narrow scope, the prototype had two lasting impacts. First, we successfully raised awareness of “siloization” by connecting curators with one another across structural and technological boundaries. Second, we presented a poster on the prototype during a well-attended session at Open Repositories 2011, where the excellent comments and questions from those who stopped by planted seeds about our ultimate technical direction. We decided that, for scale and sustainability, it was more important for our institution to align its digital preservation efforts with those of a larger, more active community.

Hydra: A Community Fabric for Repository Services

There were compelling reasons to align our efforts at Penn State with those of an established and growing community. The community we refer to is the Hydra Project, a collaborative of largely cultural heritage institutions founded in 2008 to develop repository services based on a common architecture and common open-source software. Hydra’s focus on community was (and remains) attractive to Penn State primarily for sustainability purposes: we could have built a repository entirely in-house or subscribed to a vended product, but we were more confident about sustaining this work knowing that a growing number of institutions were adopting Hydra and, furthermore, committing to sustaining the project, its community, and its software products.

Because Hydra already had an established base in the form of a core platform and common software, software developers at Penn State could spend their time building web-based interfaces for our customers rather than building infrastructure. That is, we could scale up user-facing services more quickly (not to mention more sustainably) than if we had decided to build from the ground up or to base our work on less mature or less widely adopted technologies.

Choosing Hydra required us to train our technical staff in new technologies. It has also compelled us to work in largely new ways:

  • We develop out in the open, with many eyes on our source code, since Hydra is an open-source project;
  • We consider the impact of our code on other Hydra institutions, which means that we not only open our code to others but also contribute back to core components, giving back to the community; and
  • We coordinate our development with other Hydra partners so that we can divide work among institutions, align timelines, and share resources.

Few institutions have the resources to do application development alone in a way that scales sufficiently and can be sustained. Penn State, at least, does not, and so the shared development that occurs in the Hydra community allows Penn State to do more with less.

ScholarSphere: Community-Centered Development

The work of bringing ScholarSphere into production also exemplified an internal community-building experience. ScholarSphere was planned as a self-deposit repository service, and the Libraries and ITS were committed to learning about, and addressing, user needs explicitly. Before beginning the project, we solicited participation from stakeholders: initially, these were mostly liaison librarians with strong ties to faculty and students; in time, as development progressed, stakeholders came to include more faculty and students.

Following the example of the earlier pilot project, in which we reached out to curators for use cases, the ScholarSphere team collected and documented use cases from our ScholarSphere stakeholders. We all met biweekly to discuss scenarios of use, drawing out possible roles, features, and functions to prioritize. Soon our meetings included demonstrations by the developers, who showed the evolving service and gathered feedback. Because liaison librarians had started talking about the upcoming service with their constituents, we were able to connect with faculty about ScholarSphere even before the service was rolled out.

The dedicated attention to users during the pre-production development phase paid off immensely when it came time to do usability testing. Across eight scheduled usability testing sessions, we were able to draw approximately 40 librarians, faculty, and students to be test users. Developers also attended the testing sessions, which gave them a chance to hear feedback, including questions, directly from users. These interactions resulted in iterative enhancements to the service during the testing period, such that when we launched ScholarSphere in September 2012, it already reflected user-driven feature decisions.

Extending Community through Sufia

The community’s reaction to ScholarSphere has been extremely positive. In fact, a group of Hydra developers extracted functionality from ScholarSphere into a new open-source component called Sufia, so that other institutions could more easily bootstrap their own institutional repository applications. Sufia, based on the work we started for ScholarSphere, is now in production at five institutions, and around ten more institutions are adopting it or planning to adopt it.

In 2013, we are in a very different position than we were a few short years ago, when Penn State was more or less invisible in the digital preservation community. Much of this success is due to our alignment with a growing community: in the time since Penn State joined the Hydra Project as a partner in June 2012, the community has more than doubled in size, from nine partner institutions to twenty, with more coming. The mileage we have gained from prioritizing community-based efforts continues to grow. Putting community ahead of other concerns, whether at the local, organizational, or national level, has become a fundamental strategy for sustaining digital preservation and stewardship services at Penn State.

[Image caption: Three historical social science datasets deposited in ScholarSphere by one of Penn State’s liaison librarians for the social sciences. Before ScholarSphere, there was no place for him to host them.]

