Reproducibility, Research Objects and FAIR Data Realities
Van Leeuwenhoek Lecture on BioScience.
Carole Goble (CBE FREng FBCS CITP) is Professor of Computer Science at the University of Manchester. Over the past 25 years she has pursued research interests in the acceleration of scientific innovation through: distributed computing, workflows and automation; software management and the Semantic Web; social, virtual environments; software engineering for scientific software; and new models of scholarship for data-intensive science. Her current research interests include Grid computing, the Semantic Grid, the Semantic Web, Ontologies, e-science, medical informatics, Bioinformatics, and Research Objects. She applies advances in knowledge technologies and workflow systems to solve information management problems for life scientists and other scientific disciplines. She has received many awards and honours. She was appointed Commander of the Order of the British Empire (CBE) in the 2014 New Year Honours for services to science. She was appointed a Fellow of the Royal Academy of Engineering in 2010.
Over the past 5 years we have seen a change in expectation for the management of all the outcomes of research, that is, the "assets" of data, models, codes, SOPs and workflows. The "FAIR" (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes pre- and post-publication. It all sounds very laudable and straightforward. But ...
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments depends on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in data-driven computational life sciences through examples and stories from initiatives I am involved in (and Leiden is involved in too), including:
* FAIRDOM (fair-dom.org), which has built a Commons (fairdomhub.org) for Systems and Synthetic Biology projects, with an emphasis on standards smuggled in by stealth and efforts to affect sharing practices using behavioural interventions.
* ELIXIR (elixir-europe.org), the EU Research Data Infrastructure, and its efforts to exchange workflows.
* Bioschemas (bioschemas.org), an ELIXIR-NIH-Google effort to support the finding of bio-assets through exploiting web-strength search infrastructure.