Galaxy shines in the land of the midnight sun
Immediately after beer, mussels and genomics at ICG-Europe in Ghent (see BGI’s write-up of the event), last week was a blur of eye-wateringly expensive alcohol, brown cheese and reproducible research at the Galaxy Community Conference in Oslo. Now in its 4th year, and the second year we have attended (see the blog and Genome Biology meeting report from last year’s meeting in Chicago), GCC 2013 was bigger and better than ever. Having tripled in size since the first meeting, it drew more than 200 attendees, who took part in a full day of training followed by a packed two days of talks on reproducible research and updates from Galaxy users and the core Galaxy team.
The goals of Galaxy in democratizing computational research and making it more reproducible align very closely with ours at GigaScience. While the Galaxy team were keen to stress that Galaxy is much more than a workflow management system, we’ve been using our GigaGalaxy data analysis platform to present workflows associated with our papers, allowing them to be shared, tested and more easily reproduced. We presented a poster at the conference showing an example of how this works for the SOAPdenovo2 genome assembler paper we published, and if you weren’t in Oslo you can see the poster here, as well as the tutorial on our GigaGalaxy server.
This year’s meeting was a great opportunity to see how far things have come in the decade that Galaxy has morphed and grown from the Perl-based GALA platform, and everyone was very excited by the growth in the number of users of the platform, with 36,000 current users being supplemented by 1,300 new converts a month. With nearly 1PB of public data hosted and, in this data-driven age, increasingly large analyses, there have been obvious scalability issues to tackle. Growing numbers using Galaxy in the cloud with Cloudman, pushing more of the new tools into the Toolshed, and encouraging distribution to the now more than 30 public Galaxy instances (including GigaGalaxy) have together led to a plateauing at 120,000 jobs per month on the main Galaxy server. Further measures were presented to allow the distribution and growth in new users to continue, with BioTeam presenting their Slipstream Galaxy Server, which for $20,000 provides a ready-installed and optimized plug-and-play server solution (for more see the write-up in Bio-IT World). Anton Nekrutenko also presented improvements to the Toolshed that will allow hosting and sharing of “suites” of tools, data, and workflows. Jessica Kissinger presented work they have done to make it much easier to open and integrate external web applications into your workflows; this had to be coupled with the addition of a semantic suggestion engine to enable annotation of the enormous number of potential resources, as finding services is much easier when people tell you what they do.
On top of the updates from the core team, and keynotes and talks from Victoria Stodden and Ross Lazarus on why the growing replication gap is such an important issue (or as Ross put it, the growing irreproducible “dark script matter” messing up automated pipelines), the most inspiring talks were from the growing number of Galaxy users around the globe, particularly from researchers taking it out of its original “genomics toolbench” comfort zone. The diversity was particularly impressive this year, with groups using it for electronic medical records, chemoinformatics, and proteomics. The Galaxy-P platform presented by John Chilton particularly interested us, given our ongoing NERC-funded metabolomics collaboration with Birmingham University, which also aims to wrap up a suite of tools for handling mass-spec data (see the press release and GenomeWeb). Ravi Madduri mentioned in his talk on Globus Online that the ANL in Chicago were using the platform in cosmology, so it’s kind of appropriate that the scope of Galaxy is now truly intergalactic!
Future directions for Galaxy: integrating with publishing, and our special Galaxy series
In the state of Galaxy talks, the Galaxy team outlined a roadmap for the coming year that would tackle a number of issues. On top of further investments in the Toolshed and cloud, redesigns to the user interface, and work on the visualization framework, API and federated infrastructure, they expressed an interest in working much more closely with journals. The potential of Galaxy to make data-intensive research much more reproducible and transparent is exactly the reason GigaScience launched its own GigaGalaxy server, to enable the hosting and implementation of Galaxy-based workflows and methods associated with our papers. Tied in with the meeting, we and the Galaxy team have announced a special thematic series focused on studies utilizing large-scale datasets and workflows in GigaGalaxy. On top of presentations from the meeting, any work utilizing and covering Galaxy is eligible for consideration in the series, and by working with the Galaxy team we will ensure that peer review is coordinated, thorough and timely.
BGI has been generously covering the open-access article-processing charges for the journal’s launch, and this offer will be extended to all submissions from the 2013 conference, as well as any other related papers, until the end of this year. This series will remain open in a similar manner to our Genomic Standards Consortium and beyond: best practice in genomics research series, so related work utilizing Galaxy can continue to be added to the virtual issue. Please contact us (firstname.lastname@example.org) if you have any questions, or you can submit through the GigaScience website, and our curators and data managers will get in touch about how they can help you with your data and workflows.
If you were not fortunate enough to be there in Norway, you can still follow the huge number of tweets from the meeting via Peter Cock’s storifies of each session. All of the slides are already available from the conference page, and videos will hopefully also be posted shortly. The next meeting will be in Baltimore in 2014, but if you can’t wait until then, many of the speakers and some of the Galaxy team will also be presenting at BOSC and ISMB later this month in Berlin, where we will be co-organising a workshop, “What Bioinformaticians need to know about digital publishing beyond the PDF”, that will also cover reproducible research and workflows. We’d like to thank all of the organisers and the core Galaxy team (particularly Dave Clements, the hardest working man in Galaxy). Hope to see many of you in Berlin and Baltimore.