Here we present a guest blog by our Editorial Board Member Russell Poldrack, Professor of Psychology at Stanford University, who highlights the challenges and opportunities surrounding imaging data to enable the neuroscience community to “stand on the shoulders of giants”.
The sharing of neuroimaging data is an idea whose time has finally come, but many challenges remain. Foremost is the incentive problem: Why should a researcher take the time to organize their data for sharing when they could spend the same time working on a new study or paper? Related is the credit problem: How will a researcher receive credit for having shared their data? In addition, there are technical challenges that focus on how best to share data so that they will be useful for other researchers, which is made difficult by the lack of an agreed-upon standard for data organization. Given the pivot point at which we now find ourselves, between the desire to share data and the inherent challenges, I am working with GigaScience on a thematic series to focus on data sharing in this field. This series centers on functional MRI (fMRI) research and big-data analysis.
There is now an increasing amount of neuroimaging data openly available online; we (Poldrack & Gorgolewski, Nature Neuroscience, 2014) recently estimated that there are now more than 8,000 MRI datasets available for sharing. One welcome development is the move towards completely open sharing. Whereas older projects, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI), required co-authorship as a condition of access to the data, most newer data sharing projects now make the data available with no restrictions on authorship. At the forefront of this move is the Human Connectome Project (HCP), which has made available the complete raw datasets for more than 500 individuals to date. It’s important to note that open sharing does not necessarily mean open data availability; access to some of the HCP data is restricted, because of the potential for identification of individuals due to the family structure in the data. For this reason, access to the full dataset requires a more stringent set of certifications. However, once the data have been accessed, there are no requirements for co-authorship or approval of resulting manuscripts.
A common question about open sharing is how researchers will get credit for their work in generating the shared data if they are not included as co-authors. The “data paper” offers a potential solution to this challenge by providing a citable reference for the dataset. A number of journals, including GigaScience, are now publishing data papers, which will hopefully help spur even more open sharing of datasets. As an example, Gorgolewski et al. (2013) recently published a data note in GigaScience that described a test-retest fMRI dataset for motor, language and spatial attention tasks. This dataset not only allows others to replicate the results from the authors’ previous empirical paper, but also allows testing of new questions about test-retest reliability across multiple tasks (e.g., for researchers developing new analysis methods).
As the number of available datasets increases, there are also increased opportunities to apply new methods that span across multiple datasets, which will allow the identification of more complex function-structure associations than are possible with a single task dataset. An upcoming paper by Varoquaux and Thirion, which is part of the GigaScience fMRI series, outlines how these kinds of techniques are changing the face of fMRI data analysis, and how they have the potential to uncover new aspects of functional brain organization.
Another grand hope for shared neuroimaging data is that they begin to provide new insights into neuropsychiatric disorders, which remain particularly difficult to treat. As another upcoming paper by Turner in the fMRI series points out, the amount of shared data relevant to these disorders has increased greatly over the last decade, though the open sharing of disease-relevant data still lags behind the sharing of data from healthy individuals. With sufficient data, we can begin to ask whether neuroimaging can provide the means to reconceptualize neuropsychiatric disorders. Given the substantial momentum behind the NIMH RDoC project, which aims to classify psychiatric disorders based on biology rather than symptomatology, the development of larger databases of disease-relevant data would be very timely.
Shared data are only useful if their structure is transparent, and a major remaining challenge for sharing of neuroimaging data lies in the need for standard frameworks to clearly describe the structure of the data and the associated metadata. In our OpenfMRI project, we have found that it can sometimes take tens of hours of curator time to reconstruct and reformat a dataset for processing using a standard data analysis pipeline. The Neuroimaging Data Model (NI-DM) is an emerging standard in the field of neuroimaging that is being developed by the INCF Neuroimaging Data Sharing Task Force. While still under development, this framework is already being integrated into some of the major fMRI analysis packages, including Statistical Parametric Mapping (SPM). Once this integration is complete, the addition of shared datasets into databases will be much more straightforward.
The adoption of a formal data model by fMRI analysis packages could also help improve the reproducibility of fMRI research, by allowing others to reproduce the full analysis pathway based on a formal representation of the pipeline rather than reconstructing the analysis from the textual description in the methods section (which often lacks key details). Data sharing in neuroimaging is finally hitting its stride, and the papers in the upcoming and ongoing GigaScience fMRI series highlight both the progress and the remaining challenges.
There is still time to submit papers for this series, to share fMRI data and findings, and to work toward overcoming the challenges we face in making our research accessible and reusable to the entire neuroscience community.
“fMRI: advances and challenges in big data analysis” is a new thematic series from GigaScience. This cutting-edge series aims to explore and highlight new advances and ongoing challenges and to improve data sharing and reproducibility with fMRI data. We encourage the submission of Research, Technical Notes, and Data Notes, where interesting datasets are described, curated and hosted in our database, GigaDB (see this test-retest description and dataset for example). We also consider thought-provoking commentary and reviews in this area.
GigaScience’s Editor-in-Chief, Laurie Goodman, will be attending the upcoming Society for Neuroscience (SfN) 2014 meeting and will be at the BioMed Central stand, so please drop by booth 104 for more information.
Submit your manuscript soon to take advantage of this year’s FREE APCs (ending January 1, 2015) along with free data curation and hosting for both the series and other submissions, thanks to generous support from BGI. This is a savings of up to £1,250 GBP.