Enabling bioinformatics tools to smoke the peace pipe together

Scientific workflow software such as Taverna, Knime and Pipeline Pilot can overcome interoperability issues relating to tool access and data format conversion. The data processing pipeline handles these tasks automatically when the workflow software enacts it. Galaxy is another workflow system, and its annual conference was held last month at the University of Chicago. The growing popularity of Galaxy was plain to see, with over 200 delegates in attendance. This number was up from 148 delegates at last year’s Galaxy conference in the Netherlands, which was itself an increase from the 69 participants at the 2010 conference at Cold Spring Harbor Laboratory in the US.

In a change from previous conferences, a training day preceded the main conference, giving delegates an opportunity to learn about new features and enhancements made to Galaxy. The sessions catered for every kind of Galaxy user. For those new to the software, an introduction to Galaxy was provided. More expert users could learn about using Galaxy for RNA-Seq and variant analysis of NGS data. Tool developers, meanwhile, learnt about the architecture of Galaxy and its API for extending the functionality of the software. Training materials are available online and, handily, can be used via virtual machine images.

Presentations at the main conference could be grouped into three themes: Galaxy developments, data and tool integration, and applications of Galaxy. A number of ingenious customisations of Galaxy were presented. I personally enjoyed learning about the Windows2Galaxy tool, presented by Liram Vardi from Agilent Labs. It uses virtual machine technology to enable software built for the Windows operating system to be used from within a Galaxy instance installed on a Linux platform. This is great news for those of you using software supplied by vendors of laboratory machines such as mass spectrometers and microarray scanners, since that software tends to be Microsoft-based. I also enjoyed listening to Ira Cooke and Jeremy Goecks showing the impressive features they had developed in Galaxy for proteomics data analysis and NGS data visualization, respectively.

Among the excellent presentations by the core Galaxy development team, Greg von Kuster reported on enhancements to the Galaxy Tool Shed, including the ability to track tool versions. One benefit of this feature is the ability to check whether data remain reproducible when new versions of tools become available in a Galaxy instance. Data reproducibility is a key focus of GigaScience: we want the results reported in research papers published in our journal to be reproducible, and we are using Galaxy as the platform for this. The data processes described in our papers will be implemented as Galaxy workflows (see the slides from our collaborator Tin-Lap Lee’s talk at the meeting for a preview of this). They will be made downloadable and shareable from myExperiment, a repository of data pipelines, which is developed by David De Roure at the University of Oxford.

Next year’s Galaxy conference will be held at the University of Oslo in Norway. The adoption of Galaxy within the NGS community is gaining momentum. With the core team’s development work being reciprocated by the efforts of its users around the world, there is every reason to believe that Galaxy will gain more users and that we will see it applied in other branches of post-genomic science in 2013.

Peter Li, GigaScience
