Notes from the Research Data Alliance (RDA) Plenary

Inna Kouper

This event report is by Inna Kouper, Data to Insight Center, Indiana University / RDA Engagement Interest Group

Many conversations are happening around data. The US government encourages wider and open availability of scientific and government data (http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research, http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-) and makes strategic investments in big data (http://www.whitehouse.gov/blog/2013/04/18/unleashing-power-big-data). Universities, academic libraries and archives are grappling with an existing or anticipated data deluge and are developing infrastructure in support of data curation and sharing.

The success of data efforts depends on trust, coordination, collaboration and hard work. The Research Data Alliance (RDA, https://rd-alliance.org) is an international initiative created to provide a platform for this kind of collaboration. Supported by the US, EU and Australian governments and funded through a $2.5 million grant from the National Science Foundation, RDA provides a forum for interested individuals and organizations to identify problems or roadblocks and work on removing them. The goal is to develop standards, services, software and policies and connect as many people and pieces of infrastructure as possible.

The RDA is a combination of structured top-down leadership and a broader bottom-up foundation (http://rd-alliance.org/organisation.html). The upper divisions provide administrative, technical and organizational oversight and make decisions with regard to the grassroots initiatives, which currently consist of birds-of-a-feather (BoF), interest and working groups. The working groups are expected to have clear outcomes (deliverables) within a relatively short 12-18 month period. The interest groups form around a particular interest and may be seen as more discussion-oriented.

The RDA vision of infrastructure and data exchanges is socio-technical. Among many sources, it draws inspiration from the Internet Engineering Task Force (http://www.ietf.org/) and the report “Understanding infrastructures: Dynamics, Tensions, and Designs” by P. Edwards, S. Jackson, G. Bowker, and C. Knobel (link to pdf http://deepblue.lib.umich.edu/bitstream/2027.42/49353/3/UnderstandingInfrastructure2007.pdf, see also my notes http://inkouper.blogspot.com/2012/08/cyberinfrastructures.html). The Edwards et al. report, in turn, draws significantly on the historical and ethnographic work on infrastructures that demonstrates the importance of understanding complex social assemblages that facilitate transitions from technical visions and designs to switches and gateways that work. Successful infrastructure efforts need technical “wizards,” who envision and create the system; “maestros,” who orchestrate the organizational, financial, and marketing aspects of the system; “champions,” who stimulate interest in the project, promote it and generate adoption; and, finally, users and user communities.

Similar themes of system building, championing, adoption and engagement were prominent during the second bi-annual RDA plenary (September 16-18, 2013, Washington, DC, https://rd-alliance.org/future-events). There were about 400 participants from 22 countries – quite an impressive turnout for a new initiative. The plenary featured keynotes from Tom Kalil, Deputy Director for Technology and Innovation, White House Office of Science and Technology Policy; John Wilbanks, Chief Commons Officer, Sage Bionetworks; and Carole Palmer from the UIUC School of Library and Information Science. The keynotes were recorded and can be found online (https://www.rd-alliance.org/programme.html), but a few points are worth reiterating.

John Wilbanks emphasized the need to make data open and machine-readable. This will not only help us support the emerging system of knowledge production based on formulas and calculations applied to data, but will also create new value for data. It will make data a generative system, i.e., a system that has the “capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.” (The quote comes from the book “The future of the Internet” by J. Zittrain, http://futureoftheinternet.org/.) One takes a dataset because it’s open, connects it to other datasets because they are all machine-readable and produces something new and important, something that is of greater value to researchers and the public. In order to support that, we have to think seriously about licensing, privacy, metadata, and provenance as well as about ease of use (consumption) versus mastery and manipulation (production). And we have to look into the future and try to support the potential value of data, i.e., anticipate that anything can become data.

Carole Palmer emphasized the differences between various disciplines and epistemic cultures – something that has a long tradition of research in the literature on social and cultural aspects of science. How can we foster a common culture across disciplinary fields? One of the examples she brought up was a shift from individual research to a shared collective responsibility centered around artifacts, sites or problems. Thus, researchers who use Yellowstone National Park come from a variety of disciplines to study microbes, trees, water, soil, rocks, and so on. Scientifically significant sites are sites where many individuals and teams can work together and benefit from each other’s data collecting efforts. The shared data will ultimately benefit society in its attempts to make decisions based on complex heterogeneous evidence. This new evidential culture requires better documentation and provenance and more effective techniques of data collection, sharing and re-use. And how do we make a shift toward the new culture? Unfortunately, it is still not clear. As Carole Palmer said, “the social part is hard.” Perhaps this is a social scientist’s job: making connections between infrastructure teams and users, identifying and smoothing tensions, and “championing” certain decisions.

The keynotes stimulated a lot of discussion about adoption and engagement. As a co-chair of an engagement interest group within RDA (https://rd-alliance.org/internal-groups/engagement-group-ig.html), I couldn’t be happier. After all, the members of the engagement group recognized the need for championing and for connections among various stakeholders right from the beginning. On the second day of the plenary, many BoF, working and interest groups talked about this in their breakout sessions. There were 5 BoF groups, 6 working groups, and 18 interest groups meeting during the plenary!

The engagement session went quite well: we had about 20 attendees, with a few more people coming and going. We had three great presentations followed by a lively discussion about engagement strategies and the need to support long-tail science. We also talked about mutual expectations between participants and RDA. The RDA can provide influence, recommendations / best practices, awareness of important issues and funding, but most importantly, a platform to build trust and transparency. The engagement interest group can help with domain-specific or stakeholder-specific events, training and outreach materials and showcases of successes and failures.

We also had some interesting conversations with the members of the RDA TAB (Technical Advisory Board) and the Council about our group being “in-between” other working and interest groups and possibly facilitating more connections within and beyond RDA.

The most immediate next steps for the engagement group include drafting a group charter to replace an outdated case statement, updating the group webpage with notes and presentations and working on the community engagement wiki. A longer-term activity would be to develop a template for success and failure showcases and collect stories.

As I said in a three-minute report on the progress of our breakout session, engagement is a tough topic – it’s messy and not very outcome-oriented (all group reports were recorded and can be viewed here http://www.tvworldwide.com/events/rda/130916/globe_show/default_go_archive.cfm?gsid=2352&type=flv&test=0&live=0). It would be great to receive more support and contributions from communities interested in promoting data exchanges, including the DLF community. We plan to organize a BoF session on RDA and community engagement at the DLF Forum to talk about theories and practice of data sharing, stakeholders and, of course, community engagement. Everyone is welcome to join!

Other online posts from or about the RDA plenary:

RDA Plenary, http://discoverygarden.ca/post.php?type=blog_entry&id=19

Carly Strasser, RDA Meeting Part 1 (http://datapub.cdlib.org/2013/09/20/rda-meeting-part-one-the-rda-organization/) and RDA Meeting Part 2 (http://datapub.cdlib.org/2013/09/24/rda-meeting-part-2-the-meeting-in-dc/)

Research Data Sharing without barriers…get involved? http://infteam.jiscinvolve.org/wp/2013/09/20/research-data-sharing-without-barriersget-involved/