GigaScience Pushes Metabolomics Open Data & Training

With our focus on large-scale biological data, mass spectrometry has been one area of particular interest and attention. Metabolomics involves the detection and quantification of small molecules (metabolites) in living organisms using mass spectrometers. The measurements made from these sophisticated instruments are analysed using computational programs to determine the abundances of metabolites, the results of which can provide an indication of an organism’s cellular condition and health. These data can be stored and shared through public repositories such as EBI’s MetaboLights database, which launched in 2012. However, data sharing has some way to go yet to keep pace with the publication of scientific papers in the field.

To address this, a new partnership between the EBI, the Universities of Birmingham, Manchester and Oxford, The Sainsbury Laboratory and TGAC, together with BGI and GigaScience, has received funding from the UK’s BBSRC to support the sharing of data and analyses in metabolomics. This is our second BBSRC UK-China collaboration grant; the first helped with the organization of our “Bring your own Data” hackathon in China last year (see our write-up). It also follows collaborations (and NERC funding) to work on open metabolomics workflows with the University of Birmingham.

The award of £30,000 from the BBSRC will enable the consortium to host training workshops to support scientists in the UK and China in managing and sharing their metabolomics data and analyses. Such computational skills have been highlighted by the BBSRC as being essential for furthering the impact of science on society and the economy. The consortium will work with Software Carpentry, Data Carpentry, ELIXIR and the Galaxy Project: four international networks dedicated to building computational and bioinformatics skills capacity. Dr Christoph Steinbeck, team leader of Cheminformatics and Metabolism at the EBI, comments: “There is already a lot of commitment in metabolomics research community to data sharing and reuse – our main challenge is simply in training people how best to incorporate this into their regular working practices. The BBSRC has recognised that this area of molecular biology is growing more quickly than any other, and that we need to do everything we can to train and support scientists in sharing data. That will lead to better quality data, more efficient research and shorter time to discovery.”

Seeding new data sharing ecosystems
With MetaboLights soon to publish its 100th public dataset, it is encouraging to see the growing volumes of mass spectrometry data in the public domain, but there is still a long way to go before this community opens up and shares its raw datasets at levels similar to those of the genomics community, for example. This is particularly true for clinical and reference datasets, as well as for less conventional applications. With the aim of incentivizing the release of more biological data into the public domain, in the last month we have published our first metabolomics datasets and the Data Note articles describing them. The first of these describes and releases data from 180 plasma samples from healthy maternal pregnancy. This publication is the first to come from our BYOD party in Hong Kong last August, held in collaboration with the ISA Team at the University of Oxford e-Research Centre as part of a BBSRC UK-China partnering award (BB/J020265/1). The data is hosted in MetaboLights (MTBLS146), and also mirrored in our repository.

The second Data Note describes the first publicly available mass spectrometry imaging dataset. Imaging MS is an extremely useful, if data-intensive, technique used to visualize the spatial distribution of chemical compositions by their molecular masses. Making up close to 100GB of data, the release includes four 3D MALDI imaging MS datasets, consisting of millions of spectra from murine tissue, human tissue and microbial colony samples, and a 3D DESI imaging MS dataset from human colorectal adenocarcinoma tissue. The supporting data in GigaDB also includes scripts for interacting with the imzML format files, and can be downloaded via faster Aspera access. This dataset is also currently undergoing the final touches of curation in MetaboLights (MTBLS176), and the team there has produced a novel ISA specification for MS imaging data that will hopefully be adopted by the community as we strive for greater standards and interoperability.
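For readers new to imaging MS, the idea of mapping molecules by mass can be sketched in a few lines of Python. This is only an illustrative sketch, not the scripts shipped with the dataset: the `ion_image` function, the `(x, y, mzs, intensities)` tuple layout, the zero-based pixel coordinates and the toy values are all assumptions made for the example, and real imzML files would first need to be read with a dedicated parser.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# One pixel of an imaging MS run: (x, y) position plus its mass spectrum.
# This tuple layout is a hypothetical one for illustration, not the imzML schema.
Spectrum = Tuple[int, int, Sequence[float], Sequence[float]]


def ion_image(spectra: Iterable[Spectrum], mz_target: float, tol: float) -> List[List[float]]:
    """Sum intensities within mz_target +/- tol at each pixel.

    Returns a 2D grid indexed as image[y][x]; pixels with no signal stay at 0.0.
    Assumes zero-based, non-negative pixel coordinates.
    """
    totals: Dict[Tuple[int, int], float] = {}
    for x, y, mzs, intensities in spectra:
        # Keep only the peaks that fall inside the chosen m/z window.
        signal = sum(i for mz, i in zip(mzs, intensities) if abs(mz - mz_target) <= tol)
        totals[(x, y)] = totals.get((x, y), 0.0) + signal
    if not totals:
        return []
    width = max(x for x, _ in totals) + 1
    height = max(y for _, y in totals) + 1
    image = [[0.0] * width for _ in range(height)]
    for (x, y), value in totals.items():
        image[y][x] = value
    return image


# Toy 2x2 "tissue section" with made-up m/z values and intensities.
toy = [
    (0, 0, [100.0, 200.0], [5.0, 1.0]),
    (1, 0, [100.1, 200.0], [7.0, 2.0]),
    (0, 1, [150.0, 200.0], [9.0, 3.0]),
    (1, 1, [99.9, 100.05], [4.0, 6.0]),
]
image = ion_image(toy, mz_target=100.0, tol=0.2)
for row in image:
    print(row)
```

Each row of the result is one line of pixels, so plotting the grid as a heat map shows where the selected mass is most abundant. A 3D dataset of the kind described above repeats this across a stack of serial sections.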

These releases provide useful data for the community, both for algorithm building and for training. This is especially true of the 3D imaging data, a novel and exciting approach that could do with community support for the development of tools and pipelines. GigaScience is pleased to support the MetaboLights repository with early data-release, mirroring, DOIs for citation tracking, analytics storage, and of course extra metadata and contextual information via the Data Note articles.

Following from these first clinical and animal datasets, GigaScience is also seeking data and submissions for a new series on “Plant Metabolomics: approaches, applications, and challenges” guest edited by Ute Roessner and Ruth Welti. Please contact us if you have potential submissions or interest in participating in future BYOD parties and workshops.


1. Luan, H; Meng, N; Liu, P; Feng, Q; Lin, S; Fu, J; Chen, X; Rao, W; Chen, F; Jiang, H; Xu, X; Cai, Z; Wang, J (2015): Nontargeted metabolomics and lipidomics HPLC-MS data from maternal plasma of 180 healthy pregnant women. GigaScience Database.

2. Luan H, Meng N, Liu P, Fu J, Chen X, Rao W, Jiang H, Xu X, Cai Z, Wang J. Non-targeted metabolomics and lipidomics LC-MS data from maternal plasma of 180 healthy pregnant women. GigaScience. 2015 4:16.

3. Oetjen, J; Veselkov, K; Watrous, J; McKenzie, JS; Becker, M; Hauberg-Lotte, L; Strittmatter, N; Mróz, AK; Hoffmann, F; Trede, D; Kobarg, JH; Palmer, A; Schiffler, S; Steinhorst, K; Aichler, M; Goldin, R; Guntinas-Lichius, O; von Eggeling, F; Thiele, H; Maedler, K; Walch, A; Maass, P; Dorrestein, P; Takats, Z; Alexandrov, T (2015): Supporting materials for “Benchmark datasets for 3D MALDI- and DESI-Imaging Mass Spectrometry”. GigaScience Database.

4. Oetjen J, Veselkov K, Watrous J, McKenzie JS, Becker M, Hauberg-Lotte L, Kobarg JH, Strittmatter N, Mróz AK, Hoffmann F, Trede D, Palmer A, Schiffler S, Steinhorst K, Aichler M, Goldin R, Guntinas-Lichius O, von Eggeling F, Thiele H, Maedler K, Walch A, Maass P, Dorrestein PC, Takats Z, Alexandrov T. Benchmark datasets for 3D MALDI- and DESI-imaging mass spectrometry. GigaScience. 2015 4:20.