As mentioned in our Happy New Year blog post, one of the highlights of 2017 was a second GigaScience hackathon workshop, which we held in November last year in our Hong Kong BGI office. Funding for this workshop came from a project called CUDDEL, which was awarded a grant from the BBSRC China Partnering Awards scheme (see our previous write-up). Thanks to this award, 19 friends from the UK, USA, Australia and China joined us for a week-long event in Hong Kong. Where our first metabolomics hackathon was a data-centric Bring Your Own Data (BYOD) party, this one focussed on the computational pipelines and tools that process and analyse that data, making it our first BYO Workflow party.
Metabolomics case study
Since the main focus of the CUDDEL workshop was reproducible metabolomics, we investigated how a metabolomics data analysis can be made as reproducible as possible using publicly available online tools. Work towards this aim began with a case study using an unpublished liquid chromatography-mass spectrometry (LC-MS) metabolomics dataset. Eva Li from Govita Laboratory, the owner of the dataset, introduced it and explained how it was generated from blood plasma samples collected by GlaxoSmithKline (GSK) from male volunteers undergoing a longitudinal study involving exercise and food intake interventions. Eva described how the metabolites in the samples were measured using LC-MS, and presented her initial results, from multivariate and univariate statistical analyses, on the levels of metabolites affected by exercise and food.
Eva’s original analysis of the GSK dataset was described in her PhD thesis and involved MATLAB and R scripts, and Taverna workflows. For consistency, we at GigaScience have been using our data skills to re-implement the MATLAB data pre-treatment and cleaning scripts in the R programming language. The output of the R script was then used by Saravanan Dayalan from Melbourne University to develop an analysis that compares the statistical results from combinations of the interventions to look at the effects of exercise and food intake.
Reza Salek from EBI developed a Galaxy pipeline to annotate peak features with their metabolite identities, whilst Ram Shrestha from Sainsbury Laboratory worked on a Galaxy tool wrapper for MetaX, an R package used in the data analysis for correcting batch effects in the LC-MS dataset.
Further development of ISA tools
A major focus of the CUDDEL grant is the continued development of the ISA tools to facilitate the reporting of scientific metadata. Technical work on ISA tools at the Hong Kong workshop involved Philippe Rocca-Serra, Alejandra Beltran-Gonzalez, Susanna Sansone and David Johnson from Oxford e-Research Centre, and Ralf Weber and Thomas Lawson from Birmingham University, as well as Ram Shrestha and Saravanan Dayalan. They worked with us on representing complex designs in the ISA format, creating a Galaxy tool for creating ISA documents, integrating the mzml2ISA tool into the ISA API, and adding semantic support for the STATO statistics ontology to the ISA API.
Public seminar on FAIR data
With so many reproducibility and research data experts in Hong Kong, we also carried out some wider outreach. Our Executive Editor at GigaScience, Scott Edmunds, introduced Susanna Sansone, who gave a global policy and practices-oriented seminar on how to make better use of research data.
We are at an @opendata_hk event talking research #opendata policies and practices with @SusannaASansone taking #FAIRprinciples pic.twitter.com/SGsJTHJD0r
— GigaScience (@GigaScience) November 20, 2017
Organised by Knowledge Dialogues and Open Data Hong Kong, the presentation was given at the Innocentre in Kowloon and was attended by representatives from Hong Kong universities, Hong Kong Science Park companies and the Taylor and Francis publishing group. In her presentation, Susanna showed how the global science community is working on the FAIR data initiative to make data Findable, Accessible, Interoperable and Re-usable. With EU open science programs, the Go-FAIR initiative and the NIH Big Data to Knowledge program in place, what does Hong Kong need to do to keep up with these global policy movements? This talk was very pertinent to Hong Kong since its research universities produce data that are not easily accessible (see the white paper on open science in Hong Kong that seminar hosts Waltraut Ritter and Scott Edmunds have written on the topic). The seminar was also covered in the OeRC blog.
Tutorial and presentation on Common Workflow Language
A number of staff from GigaScience’s parent company, BGI, travelled from Shenzhen to Hong Kong to attend the CUDDEL workshop and learn about the Common Workflow Language (CWL) from Michael Crusoe. On the final day of the CUDDEL workshop, Michael also took part in a workshop organised by the CNGB department of BGI, where he gave a presentation on CWL to an audience that included representatives from Aliyun, Huawei and Baidu as well as BGI.
Visit to BGI Mass Spectrometry Initiative in BGI-Shenzhen
We arranged for Reza Salek, Ralf Weber and Thomas Lawson to visit the BGI Mass Spectrometry Initiative (MSI) team led by Guixue Hou at CNGB on the final day of the CUDDEL workshop. Reza provided troubleshooting support to Chunwei Zeng from MSI on uploading metabolomics data to the MetaboLights database. The MetaX package in R was also discussed, since BGI are its developers and the analysis of the GSK dataset uses it for data processing.
A metabolomics-eye view of BGI – taking @metaboknight & @ralf_weber around the @BGI_Events mass-spec floor after our #CUDDEL workshop pic.twitter.com/wIaPsD5ICS
— GigaScience (@GigaScience) November 24, 2017
Thanks everyone for visiting us in Hong Kong!
Day 4 @BBSRC #CUDDEL workshop, tomorrow visits @BGI_Genomics Shenzhen – huge thanks to @GigaScience team & Peter Li! #FAIRdata #openscience pic.twitter.com/Cx6wcXQO3O
— FAIRlady (@SusannaASansone) November 23, 2017