Open Bioinformatics in The Irish Free Software State

Open is the New Black
While the internet might still be raging over Tim Hunt’s #distractinglysexy comments about gender issues in the lab, and to a lesser extent Lior Pachter’s recent provocative blog on the “myths” of bioinformatics code availability and licensing, here in Dublin this year’s BOSC meeting was as clear as ever about where it stands on both issues. Appropriately for a meeting held in the capital of the first country to legalise marriage equality by popular vote, this year’s theme was “Diversity”. This explicitly had the goal of opening the door even wider to participants who have historically been under-represented in the world of open source bioinformatics. On top of race, gender and sexuality, this call for an increase in diversity was also extended to more traditional biologists, and even taxa, with Holly Bik providing probably the first BOSC keynote to cover worm identification and marine nematodes (slides here). As well as holding a panel discussion on the topic (“Open Source, Open Door: increasing diversity in the bioinformatics open source community”), BOSC and OBF (its parent organisation, the Open Bioinformatics Foundation) have taken many steps over the last year: introducing a code of conduct (now implemented across ISMB), offering more travel fellowships, and adopting the popular idea from Michael Crusoe of allowing questions via index cards and Twitter (inspired by this blog from Valerie Aurora).

Regarding the ongoing licensing arguments, while Ewan Birney circumvented the topic somewhat in his great keynote focussing on transparency, the OBF members and the large proportion of attendees representing the many open source bioinformatics communities beginning with “Bio” (regulars like Biopython being joined on the program by newer kids on the block like BioJS) know on which side of the licensing debate their open source soda bread is buttered. With BOSC coming immediately after the Galaxy meeting we attended in Norwich, there was heavy Galaxy representation (the first day having 4 talks in a row), and in common with the Galaxy meeting, BOSC was heavy with talks mentioning Docker (check out the bioboxes approach, which is trying to standardise container publishing). Our favourite part of the meeting over the last few years has been the sessions on Open Science and Reproducibility, and on top of projects promoting reproducibility of pipelines and workflows such as Galaxy, iPlant, and COPO, there were also interesting presentations pushing the boundaries of openness in science, such as OpenSNP. As we are a publisher that focusses on reproducibility and re-usability as our primary criteria (we insist upon all data being CC0 and all code being under an OSI-compliant license), this all fits very nicely into our ethos and scope, and we were proud to be one of the official sponsors of the meeting this year.

A Model Publisher
As we are regular attendees of ISMB and its many SIG (special interest group) meetings such as BOSC (see write-ups here), you may have read our previous posting about the “What Bioinformaticians need to know about digital publishing beyond the PDF” workshop we co-organised with members of the Research Object, Nanopublication and ISA communities. We presented there the early results of a reproducibility case study we were working on with the workshop organisers, and it was timely that the results were finally published in PLOS ONE this week, giving Alejandra González-Beltrán from the ISA team the opportunity to present them at the meeting a few days later (slides here). Applying these research models to our SOAPdenovo2 publication and implementing all of the workflows on our server showed that we could recreate the results of the paper, and use all of these models to structure the information in the paper and explicitly declare elements of experimental design, variables, and findings. We also archived snapshots of the relevant ISA-TAB, linkedISA, nanopubs and Research Object files in GigaDB. This example demonstrated that the models served as guides in the curation of the scientific information in the paper, and surprised us by detecting a number of inconsistencies in what we thought was a highly reproducible and scrutinised piece of work. As a result, the authors have just issued an Erratum article, which has helped correct the scientific record. Norman Morrison also presented at BOSC on behalf of Research Object, and used this work as an in-the-wild example of their use (slides here).

For more on the meeting, Brad Chapman has again shared his notes in his blog, and, the attendees being a Twitter-savvy bunch, there are Storifys of both days of the meeting put together by Peter Cock. The main ISMB-ECCB meeting is now underway, so look out for further updates from us and others using the hashtags #ISMB2015 & #ISMBECCB2015.

Further Reading
González-Beltrán A, Li P, Zhao J, Avila-Garcia MS, Roos M, Thompson M, et al. (2015) From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS ONE 10(7): e0127612. doi:10.1371/journal.pone.0127612

Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. (2015) Erratum: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 4:30. doi:10.1186/s13742-015-0069-2