Policies and Standards for Reproducible Research: from theory to practice
This month GigaScience co-hosted a session at the Genomic Standards Consortium meeting in Shenzhen on “Policies and Standards for Reproducible Research: from theory to practice”. The session brought together speakers with different roles in the production, dissemination and use of data to discuss the role of policies and standards in enabling reproducible research and data sharing. Co-chaired by our Editorial Board member Susanna-Assunta Sansone, the panel represented the different stakeholders involved in the data-production cycle, including those able to enforce data policies such as research funders and journal editors, users and managers of databases, data producers, and facilitators of all of these processes.
Susanna opened proceedings and set the scene with an overview of BioSharing, of which BioMed Central and GigaScience are both members (slides here). Her talk surveyed the evolving portfolio of data-sharing enablers, and the session was a very relevant forum for it, as BioSharing aims to strengthen collaborations between exactly the groups present and to discourage redundant (accidental) competition between standards-generating groups.
The second scene-setting talk was from our editor Scott Edmunds, covering the issues and additional incentives needed to enable and encourage data dissemination (see the slides below and video here). He covered work that GigaScience and the BGI have done to release datasets with citable DOIs. The utility of releasing genomes pre-publication was nicely shown by the resulting crowd-sourced analysis of the deadly 2011 E. coli O104:H4 outbreak genome, which was sequenced by the BGI (and partners in Hamburg and Birmingham) and released by us in this manner. A further recent development is a new study in PLoS One that used draft unassembled genome sequence data to directly develop a targeted bactericidal agent to kill O104-positive E. coli. For data citation to become a recognized form of credit and a viable incentive for faster data dissemination, journals need to allow data to be cited in the references in the same way as articles. Scott then presented many of the recent examples highlighted in this blog of journals, including Genome Biology and Nature Biotechnology (both represented at this meeting), that are now doing this.
The perspective of funders, who are key gatekeepers able to enforce and influence data policies and standards, was covered next. Paula Olsiewski represented the Alfred P. Sloan Foundation, and Professor Rita Colwell drew on her wealth of experience as former director of the NSF to highlight the elephants in the room that still need to be addressed.
A group of talks then covered “Breaching the Bio-Domain”, providing a more hands-on point of view from data producers, curators and database managers. Philippe Rocca-Serra brought a ‘data commoning’ perspective and presented the ISA-Commons system, on which we have recently collaborated on a publication. Folker Meyer talked about his experience running MG-RAST, highlighting that of the 41,000 datasets in the database only a minority were publicly accessible, and appealing to funders to insist more strongly on public release. Srikrishna Subramanian (Institute of Microbial Technology, India) gave a similarly open-data-focused talk, offering examples and a structural genomics perspective on data sharing (for an example see TOPSAN). Yong Zhang gave a final “data-producer” and BGI perspective, outlining the scale of the challenges ahead and giving a preview of some of the work underway to build the biobanks and datacenters that are intended to become the China National Genebank.
The session ended with final perspectives from journal editors, with Genome Biology editor Clare Garvey and Craig Mak from Nature Biotechnology giving overviews of their journal policies and examples of their publishers’ schemes for encouraging and aiding data sharing and standardization.
Beyond this session, the meeting as a whole was well attended and well covered on Twitter and Google+ (a first for us), and for a genomics-focussed meeting participants were remarkably restrained in dropping any “genomics bingo” buzzwords. The organizers hope to post slides on the conference wiki, and we have also archived some (such as the opening address from GSC president Dawn Field) on our SlideShare account. There is further coverage on the BGI news page and pictures on their Flickr page, and videos are currently being posted on the BGI YouTube and GSC SciVee pages.
GSC13 special series: call for papers highlighting best practice in genomics research
As mentioned in a previous posting, to tie in with the meeting we are launching a call for submissions to a thematic series of discussion and research articles, from the conference and the wider community, highlighting best practice in genomics research. We are currently reviewing a number of candidate submissions. BGI is generously covering the open-access article-processing charges for the journal’s first year, so please contact us at editorial@gigasciencejournal.com if you have related work you would like to submit to this series or journal, or submit a manuscript here.
References
1. Rohde, H. et al. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med. 365(8):718-24. (2011)
2. Scholl, D. et al. Genome Sequence of E. coli O104:H4 Leads to Rapid Development of a Targeted Antimicrobial Agent against This Emerging Pathogen. PLoS ONE. 7(3):e33637. (2012)
3. Sansone, S-A. et al. Toward interoperable bioscience data. Nature Genetics. 44(2). (2012)
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience Slides