Join the Open Data club.
Every year we catch up with fellow enthusiasts of open data, open source and open science at BOSC, the Bioinformatics Open Source Conference, which is a Special Interest Group of the ISCB’s annual ISMB conference. This year the location provided an interesting juxtaposition, with the meeting being held at the Disney World resort in Orlando. While our and BOSC’s open ethos is about breaking down barriers and maximizing re-use of scientific outputs by putting them in the commons, Disney has form in restricting the public domain through promoting the Copyright Term Extension Act. Through this act, also known as the “Mickey Mouse Protection Act”, their lobbying extended copyright protection (originally a statutory term of 14 years) up to 120 years. Being in the Magical Kingdom prompted many of the speakers to be open in sharing childhood pictures (and even some adult ones) of their previous visits. But it wasn’t a particularly conducive atmosphere for some of BOSC’s traditional activities such as the Codefest hackathon, which this year was hosted at the FamiLAB hacker space.
The location in the Magical Kingdom also provoked mixed emotions, coming so soon after the tragic Pulse nightclub shooting, and the organizers made a conscious effort to acknowledge and pay tribute to the victims. On top of moving tributes to the 49 at the beginning and end of the meeting, and the presentation of a new rainbow BOSC pear logo, BOSC tried to leave a practical legacy by holding their first BOF on activism in the professional world. Aiming to create a space to share thoughts on how we respond to matters at the intersection of our personal and professional interests, this was a perfect continuation of last year’s “Diversity” theme and the efforts to implement an ISMB-wide code of conduct (write-up here).
The BOSC keynotes were perfectly pitched to demonstrate the advantages of open bioinformatics. Jennifer Gardy’s keynote on “The Open-Source Outbreak” really put across the life and death consequences of closed source, and how open data and genomics will save us all from the next horrible infectious disease pandemic (see her slides and the video). This is one of our favourite topics, as the first dataset we disseminated via GigaDB was the genome of the pathogen responsible for the deadly E. coli O104:H4 outbreak in Europe in 2011. She ended her talk with this example, which nostalgically brought back memories of the global crowdsourcing efforts it helped kickstart (see our blog from the time on this).
Steven Salzberg’s keynote was a broader overview of why science moves faster in an open world, covering open source, open data and open access publishing. Refreshingly, he didn’t pull any punches, naming and shaming (via a wall of shame) projects that had unnecessary and unhelpful embargo policies. Particular ire was reserved for the Broad Institute and their closed source GATK tool, which came out of a $45 million NHGRI grant. On top of the keynotes, the advantages of open were also demonstrated via a great panel on growing and sustaining open source communities, with insight from active and productive communities such as Galaxy, OpenSNP, Mozilla Science, and Project Jupyter.
Reproducibility rather than subjective impact is the key criterion we focus on at GigaScience, and one of our key interests in this area is the publishing and sharing of workflows and containers (see our blog on Docker publications and our Galaxy series for examples). We were very excited to see that BOSC this year added a new session on Workflows. Much of this covered the Common Workflow Language (CWL), with overviews from Michael Crusoe, the lead community engineer developing this common format for bioinformatics tool and workflow execution, as well as examples of its use in the field. CWL has history at BOSC, having come out of the 2014 Codefest and been highlighted through a talk in last year’s program. Seeing a wider range of users presenting it across most of a track shows the promise of the standard’s growth and uptake. Version 1.0 launched just before the meeting, so watch this space to see how these efforts will help standardize the sharing of our published workflows.
BOSC are at the bleeding edge in these areas, again having a specific session on Open Science and Reproducibility, but the wider ISMB community are not far behind. Computational biology has been built on open software and open data, and the society has issued statements expressing their deep concern about the restrictive and potentially damaging opinions voiced by the New England Journal of Medicine describing data users as “Research Parasites”. Now that the main ISMB meeting is underway, we’ll be continuing to hang around with the giant copyright-protected cartoon characters for a few days longer, and also handing out some of our new “Game of Omes” t-shirts featuring a cartoon character of our own (GigaPanda – now Mother of Data). Look out for further updates from us and others using the hashtag #ISMB16. We’ll be sticking around in Orlando after that for The Allied Genetics Conference (TAGC 2016), so follow tweets from that via #TAGC16.
There is nice coverage of BOSC in Brad Chapman’s blog, and archived storifies of the prolific tweets (part one and part two). The F1000 BOSC channel will also shortly be uploading all the slides and videos.
UPDATE 18/7/16: Embedded Steven Salzberg’s keynote video now that it has been posted to YouTube. We also have a follow-up blog on the full meeting (and 4th birthday celebrations).