IDCC: Data Publication and Citation Workshops

This Conference Update was provided by John Kratz, Postdoctoral Fellow, Data Curation, California Digital Library.

Research Data Publication in Principle and Practice

The day before last week’s 9th International Digital Curation Conference (IDCC) proper began, I attended the associated workshop on data publication. The workshop featured discussions and presentations organized around three viewpoints: scenes and paradigms, repositories and funders, and publishers and service providers. I took two major things away from the day:

1. Data papers are taking off.

As a mechanism for data publication, data papers (short articles that describe dataset collection methods and rationale but don’t perform analysis or draw conclusions) are proliferating faster than any other model. All of the presenters in the publisher portion of the program represented journals that publish data papers:

  • Rebecca Lawrence (slides here) focused on F1000Research’s efforts to tightly incorporate underlying data into traditional articles as well as data papers, and on review of data.
  • Ruth Wilson (slides here) discussed Nature Publishing Group’s forthcoming data journal, Scientific Data, which begins publishing in May.
  • Brian Hole from Ubiquity Press explained that their approach is to set a low barrier to data paper authorship: article processing charges for data papers are cheap ($40), the peer review process is meant to be quick and straightforward, and (my favorite) the online authoring tool uses small boxes to encourage short papers.

Data papers even bled over into the repository program when Ingrid Dillo discussed the Dutch Data Archiving and Networked Services’ (DANS) data journal initiative. She echoed Brian Hole in emphasizing the need to make data publication easy for researchers. The DANS data journal model sounded similar to Ubiquity’s, with the important distinction that Ubiquity works with third-party repositories, while DANS is both publisher and repository.

2. Data review won’t be settled anytime soon.

Questions about how and when to assess dataset quality came up repeatedly throughout the day. Some examples of approaches being taken or proposed:

  • F1000Research takes a two-step approach to data review: an internal technical check, followed by scientific peer review of the data along with the paper. Rebecca Lawrence reported the results of a survey in which, encouragingly, 95% of their peer reviewers said they did actually examine the data, and 80% said that the data influenced their decision to accept or reject the paper.
  • Ubiquity Press aims for a quick “objective” review of each dataset according to well-defined standards.
  • In a talk based on his 2013 article, Mark Parsons (Research Data Alliance) asserted that data review is inherently different from article review, on the grounds that “literature is an argument; data is (nominally) a fact.” Parsons went on to argue that rather than trying to assess quality (an inherently subjective property), we would do better to track context: how the dataset is used or reused.
  • DANS has set up a structured post-publication review system whereby users rate the datasets they have worked with in six categories (e.g., data quality, quality of the documentation) on a 5-star scale.
  • The last activity of the day for me was a group discussion of data peer review. The discussion was notable to me not for any consensus reached (there was none), but for the diversity of models suggested: multiple levels of review, dataset “levels of service” with quality implications, accounting for the quality and diversity of post-publication uses, and many others.

Data Citation Principles: A Synthesis

The morning after IDCC ended, I attended the workshop organized by the FORCE11 Data Citation Synthesis Group to introduce 8 consensus data citation principles derived from a number of sets of principles and guidelines issued by various organizations. The session opened with an overview of the finalized Joint Declaration of Data Citation Principles. You can read and, if you wish, endorse the declaration here.

Joan Starr, representing DataCite (slides here), and Anita de Waard, of the Elsevier Advanced Technology Group (slides here), delivered presentations focused on how DataCite and journal publishers, respectively, can support the principles. Christine Borgman argued that because researchers value citations to publications more than citations to data, many don’t want direct data citation, and that consequently we should focus on data citation’s potential to enhance discovery.

We spent the second half of the workshop discussing implementation. The conversation swung intermittently into technical issues, but we spent most of the time on policy and incentives. The general conclusion seemed to be that in the short term, funders, managers, and publishers are likely to be interested in data citation, while researchers simply are not. However, as Puneet Kishor put it, “we need to acknowledge the truth of the situation today, but allow for a future that may be different.”
