Visualizations are becoming increasingly important for graphically illustrating, understanding, and gleaning insight from the explosion of ever-larger datasets in this supposed era of “big data”. Microbial ecology and the study of the microbiome are revolutionizing how we look at health, microorganism diversity and ecological interactions, but researchers in these fields are finding it challenging to deal with the ever-expanding numbers of specimens sampled. New analysis tools are required to relate the distribution of microbes across these datasets, and to integrate rich and standardized contextual metadata in order to understand the biological factors driving these relationships. Ambitious “megasequencing” projects such as the Earth Microbiome Project (EMP), which aims to construct a microbial biomap of the planet by collecting hundreds of thousands of samples across multiple environment types, and the Human Microbiome Project (HMP), an NIH-sponsored microbiome project testing how changes in the human microbiome are associated with human health or disease, have driven the need to be able to share the terabytes of data produced in an easily understandable and scalable manner.
The democratization of microbiome research
The massive interest in this area has even led to direct-to-consumer-style microbiome projects such as American Gut and uBiome, which have both raised over $300,000 through crowdfunding and recruited thousands of volunteers, allowing comparisons of gut microbiota across global populations. We were fortunate to have Zachary Apte, one of the founders of uBiome, in our “open source genomics” session at the ICG meeting last month (see pictured, and in the write-up here), and it was impressive to see the project already managing to collect even larger datasets than the HMP (3,500 participants and 6,500 samples from over 40 countries to date). There has been a huge amount of interest in the role of the microbiome in human health, with TED talks, articles in the New York Times, and high-profile studies investigating the role of the microbiome in type-2 diabetes (with data hosted in our GigaDB database), obesity, and even psychiatric disorders. While there has been promising work in trying to restore healthy flora through stool infusion (“Fecal Microbiota Transplants”), and billions of dollars of sales of probiotic products, many have criticized these studies (see Jonathan Eisen’s “overselling the microbiome” award), arguing that too many of them hype and overstate the links to human disease and their potential to lead to cures. Being able to easily share, integrate and visualize the supporting data and results of the growing number of microbiome studies is an important step towards discovering meaningful patterns amongst the potential noise, and should help provide more confidence in the conclusions of the better-designed studies.
One thing many of the large-scale microbiome projects have in common is the involvement of the Rob Knight Lab in Boulder, Colorado, and the use of their popular QIIME (‘quantitative insights into microbial ecology’) open source software package, which enables comparison and analysis of microbial communities. QIIME has been integrated with a molecular graphics viewer, and by framing visualized outputs around the data generated by the HMP, this has enabled production of fascinating moving pictures of the microbiome (see here for an example). Good examples include visualizations of recovery from serious Clostridium difficile infection after fecal transplants, and a particularly striking example of how a newborn child’s gut microbiome develops over its first few years of life, which we saw presented at BGI’s ICG-Europe meeting this summer.
Presenting the EMPeror
There has been a need to better support the workflows of the modern microbial ecologist and to produce more lightweight and easy-to-share outputs, and in a new Technical Note just published in GigaScience, Yoshiki Vázquez-Baeza and colleagues from the Knight lab present a new tool aiming to do just this: EMPeror. By allowing the user to color experimental metadata dynamically and to separate coloring from visibility, EMPeror encourages interactive exploration, understanding and analysis, and helps bring out the patterns hidden in the data, making those insights much easier to obtain.
EMPeror brings a set of customizations and modifications that can be integrated into any QIIME-compliant dataset. With lightweight data files and hardware-accelerated graphics, it provides the state of the art for analyzing N-dimensional data. The outputs can be as small as 1.3% of the size of their original files, and are lightweight and simple enough to be manipulated even on a mobile device. As we are always keen to provide insight and transparency into the publication process and our papers, the following clip shows a member of the Knight lab recreating figure 1-B2 of the manuscript on their iPhone in just a few swipes.
As with other papers in GigaScience, to aid transparency and reusability, all of the review history is available from the review history page. The version of the software at the point of publication, together with the example results used in the paper to illustrate its functionality, is available to download from our GigaDB database, and the latest version of EMPeror is also available from the project’s GitHub page.
2. Vázquez-Baeza, Y; Pirrung, M; Gonzalez, A; Knight, R (2013): Example files and supporting material for “EMPeror: An interactive analysis and visualization tool for high throughput microbial ecology datasets.” GigaScience Database. http://dx.doi.org/10.5524/100068