Catching up with past NDSA Innovation Awards Winners: DataUp


Nominations are now being accepted for the NDSA 2020 Innovation Awards.

(Screenshot of the DataUp interface)

DataUp from the California Digital Library won a 2013 Innovation Award in the Project category. DataUp was recognized for creating an open-source tool built specifically to help individuals preserve research datasets. It guided them through the digital stewardship workflow, from dataset creation and description to the deposit of their datasets into public repositories. The following individuals are recognized for their contributions to DataUp and its successor projects, and for their responses to this Q&A.

The original CDL DataUp team included:

  • Stephen Abrams, then CDL Associate Director of the UC Curation Center, currently Head of Digital Preservation, Harvard Library
  • Patricia Cruse, then CDL Director of the UC Curation Center, subsequently Executive Director of DataCite, and now retired
  • John Kunze, CDL Identifier Systems Architect
  • Carly Strasser, then CDL Data Curation Project Manager, currently Program Manager for Open Science at the Chan Zuckerberg Initiative

Current CDL staff responsible for the successor Dash and Dryad projects are:

  • John Chodacki, CDL Director of the UC Curation Center
  • Daniella Lowenberg, CDL Research Data Specialist and Product Manager

What has DataUp been doing since receiving an NDSA Innovation Award?

DataUp was conceived by the University of California Curation Center (UC3) at the California Digital Library (CDL) as an immediate response to the needs of researchers for an intuitive, effective, and self-service data curation platform.  DataUp initially targeted support for tabular datasets via an easy-to-use UI accessible to researchers themselves, rather than requiring mediation by librarians or archivists.  At the same time, CDL was engaged in other related initiatives, including the DataShare open data publication system.  Over time, the curatorial intentions and functional capabilities of both systems began to overlap considerably.  Consequently, in 2014 CDL decided to converge the two systems into a common technical platform under the Dash name.  More recently, similar synergies were recognized between Dash and the Dryad research data repository, which led to the integration of the Dash system as the new Dryad technology platform.  Throughout this multi-year evolution, the core principles and goals of the original DataUp project have remained steadfast: providing the best possible support to the scholarly community for the long-term curation, publication, and reuse of critical research data.

What did receiving the NDSA Award mean to you?

Receiving the NDSA Innovation Award was very gratifying: it was a public affirmation, by a significant stakeholder community, of the value and beneficial impact of the DataUp vision, project, product, and service.  While the DataUp team was convinced of that value from the start, it is always nice to have those beliefs recognized and confirmed by colleagues and peers.

What efforts, advances, or ideas over the last few years have you been impressed with or admired in the area of digital stewardship?

Tremendous strides have been made in digital stewardship over the past several years.  This progress has been driven in large part by a shared recognition among all the stakeholders involved (scholars, administrators, librarians, archivists, and funders) of their common problems and needs, and of the necessity for a coordinated response.  Positive outcomes have followed when each has openly contributed its own perspectives and strengths to collaborative efforts.  For example, the success of the DataUp/DataShare/Dash/Dryad effort relied on the active participation, over many years, of the CDL, the University of California Libraries, the DataONE network, Microsoft Research, the Gordon and Betty Moore Foundation, the Alfred P. Sloan Foundation, DataCite, the Make Data Count initiative, and the Dryad community.  Looking towards the future, there are very promising avenues of exploration in applying big data and machine learning techniques to the proactive curation of research data and of other forms and genres of digital content deserving long-term stewardship.

The DataUp project began in 2011 – nearly a decade ago! Various challenges of preserving and providing access to research data sets continue to be discussed, and have been addressed in the 2014, 2015, and 2020 NDSA Agendas for Digital Stewardship. Where do we go from here?

The guiding tenets originally embodied by DataUp and its DataShare, Dash, and Dryad successors are fully consistent with the NDSA Agenda’s recommendations for organizing and ensuring long-term access to scientific data sets, including support for at-scale curation, promotion of the FAIR principles, and collaborative attention to innovation and sustainability (https://osf.io/7sfc6/, p. 26).  Three specific concerns seem particularly challenging and call for concerted attention.  First, the academy as a whole needs to continue developing more flexible and sustainable financial practices for the curation of all legitimate research outputs, including research data, so as not to discourage or impede widespread adoption of effective RDM tools and practices.  Second, greater automation and more intuitive self-service operation are still needed for contributing research data to managed curation environments such as Dryad.  Ideally, these actions would be automatic side effects of other, more primary activities and workflows in which scholars and researchers are already engaged.  And third, more can be done to create actionable linkages between research publications, research data, and research software, all of which interact within a cohesive and co-dependent web of scholarly activity and communication.  We feel that DataUp was a pioneering attempt to address these issues, and we look forward to continued progress towards these important goals.