Publishing our first virtual box of delights to aid the fight against heart disease

Sheer Heart Attack
Diagnosis is key to beginning treatment for preventing coronary heart disease, the most common cause of heart attacks. One useful tool in the fight against this leading killer is magnetic resonance imaging, which allows the direct examination of blood flow to the myocardium of the heart. However, for this perfusion analysis technique to be most effective, the breathing motion of the patient must be compensated for, which is done using complex image processing methods. Thus, there is a need to improve these tools and algorithms. The key to achieving this is the availability of large public MRI datasets to allow testing, optimization and development of new methods.

In our latest paper in GigaScience, researchers from Universidad Politécnica de Madrid in Spain and the National Institutes of Health in the USA provide a fantastic example of open data sharing to help build these exact tools: a wealth of patient imaging data. Even better: to enable reproducible comparisons between new tools, the researchers and journal have taken the unusual step of publishing and packaging the data alongside tools, scripts and the software required to run the experiments. This is available to download from our GigaDB database as a “virtual hard disk” that will specifically allow researchers to directly run the experiments themselves and to add their own annotations to the data set.

Black Box (Ride on Time)?
With our goal of publishing research objects in different shapes from the traditional static paper, on top of publishing Galaxy workflows and packages to recreate papers in Knitr, this new paper provides another great example of a more reproducible and interactive way of disseminating scientific research. This approach makes it much easier for reviewers and future users to immediately run applications: reproducing the experiments described in a paper by launching a virtual machine means there is no need to install complex, version-sensitive and inter-dependent prerequisite software components. While this ability to quickly and easily recreate the results of a study would seem to address many issues of reproducibility, there are arguments from some (including Titus Brown) that this reproducibility is achieved in a not very useful way, and that virtual machines are essentially black boxes that aren’t really reusable for remixing or mashing up the code. Helping to address these arguments somewhat, the data (MRI scans from a series of ten patients considered clinically to have a stress perfusion defect) and the scripts and software required to run the experiments are also provided separately, so that interested parties can run the experiments directly themselves. The scripts to run the motion compensation algorithms are also available on SourceForge, and as with all of our papers everything is available under open source licenses.

As one potential user of these resources, Professor Alistair Young, Technical Director of the Auckland Magnetic Resonance Research Group (and GigaScience Editorial Board Member) commented: “Very large amounts of medical imaging data are now becoming available through registries and large population studies. Well validated, automated methods are required to derive maximum benefit from such resources. The paper by Wollny and Kellman exemplifies how data and algorithm sharing can advance the field by providing a platform by which existing methods can be tested and new methods validated against existing benchmarks. Such benchmarking datasets are essential to advance the field through objective metrics and standards.”

Having everything wrapped up in a virtual machine also made things simpler during the scientific peer-review and publication process, as the settings, packages and file locations were already set up in a working configuration. One of the people carrying out this testing process was GigaScience Data Scientist Rob Davidson, who stated “Actually testing the code during review is sadly almost a novel concept and one that needs to roll out as a standard. But even more: if it’s easy for the reviewers, it’s easy for the community to use too.”

As well as this work being important for improving the diagnosis of the number one cause of death worldwide, the continuing rise in retractions of published scientific articles makes the addition of direct means to improve article reproducibility essential, both for being able to trust current findings (on which future studies are built) and to prevent the public losing confidence in the research community they fund. Publishing a virtual machine as an interactive and executable publication provides the scientific community with an example and test case demonstrating a potential new type of scholarly output.

While this is the first myocardial MRI data we’ve published, we have recently published a large series of MRI scans of 98 sea urchin species, as well as a functional MRI dataset for motor, language and spatial attention function. We are currently collecting papers for an fMRI thematic series edited by our board member Russell Poldrack, so please contact us if you have data or papers you are interested in publishing.


1. Wollny, G; Kellman, P: Free breathing myocardial perfusion data sets for performance analysis of motion compensation algorithms. GigaScience 2014 3:23

2. Wollny, G; Kellman, P (2014): Supporting material for: “Free breathingly acquired myocardial perfusion data sets for performance analysis of motion compensation algorithms”. GigaScience Database.
