The Importance of Data Sharing in Neuroscience – GigaScience at INCF 2022

The International Neuroinformatics Coordinating Facility (INCF) held its annual Neuroinformatics Assembly from 12th to 16th September 2022. As INCF represents the most forward-thinking and Open Science-friendly members of the neuroscience community, we’ve been long-term attendees and supporters of the meeting (see GigaBlog write-ups of the 2015, 2019 and 2021 editions). As with last year’s meeting, the INCF 2022 Assembly was a virtual conference, and 220 people participated. There was great interest this year in FAIR Data Management, which is necessary to ensure neuroscience data is Findable, Accessible, Interoperable, and Reusable. This was topical, as the Chair of the INCF Board (and GigaScience Ed Board Member) Maryann Martone published a commentary for our 10th birthday this summer on the importance of community organisations such as INCF for open and FAIR efforts in neuroinformatics. GigaScience Press Data Scientist Chris Armit attended the virtual conference, and reports below on some of the highlights.

FAIR Workflows for Neuroinformatics
In her talk entitled “The Green Valley where the FAIR Work flows”, Xiaoli Chen (DataCite) presented the “FAIR Workflows Project” and highlighted its core aims, which are to capture FAIR entities, practices, supporting structures and outputs. As Xiaoli explained, the FAIR Workflows Project aims to provide “an exemplar FAIR and Open workflow based on the reality of an entire research lifecycle.” The core interests of the FAIR Workflows Project are detailed below.

  • FAIR Entities: Uniquely identified resources associated with a project, including: researcher as identified by ORCID ID; research organisation as identified by ROR ID; and funding agency / funding streams as identified by ROR ID (funding body) and Grant ID.
  • FAIR Practices: Sharing various types of interim output, including: Data Management Plan, Preprint, Code, and Dataset.
  • FAIR Supporting Structures: Tools and platforms that integrate Persistent Identifiers (PIDs) and metadata workflows. This can include Data Management Plan (DMP) authoring tools.
  • FAIR Outputs: Assigning PIDs to outputs with rich metadata annotation. This is especially useful if the metadata utilises reporting guidelines to provide a uniform structure, and ontologies to provide controlled terms. This can also include domain-specific metadata.
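To make the idea of uniquely identified FAIR entities concrete, here is a minimal sketch of a project record that links a researcher, an organisation, a funder and a grant by their identifiers. It is not part of the FAIR Workflows Project: the class and field names are hypothetical, the ROR and grant values are placeholders, and only the ORCID iD is checked, using the ISO 7064 MOD 11-2 check digit that ORCID iDs carry. The iD used in the example is the sample iD that appears in ORCID's own documentation.

```python
from dataclasses import dataclass


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the ORCID iD has 16 characters and a matching check digit."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


@dataclass(frozen=True)
class FairEntities:
    """Hypothetical record of the identifiers attached to one project."""
    researcher_orcid: str
    organisation_ror: str  # treated as an opaque string in this sketch
    funder_ror: str        # treated as an opaque string in this sketch
    grant_id: str          # funder-specific, so not validated here

    def __post_init__(self) -> None:
        if not is_valid_orcid(self.researcher_orcid):
            raise ValueError(f"Invalid ORCID iD: {self.researcher_orcid}")


# Placeholder values: only the ORCID iD is a real (documentation sample) identifier.
project = FairEntities(
    researcher_orcid="0000-0002-1825-0097",
    organisation_ror="https://ror.org/<organisation-id>",
    funder_ror="https://ror.org/<funder-id>",
    grant_id="<grant-id>",
)
```

Validating identifiers when the record is created is one way to catch typos before they propagate into a Data Management Plan or dataset metadata.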

Xiaoli highlighted the utility of domain-specific metadata templates in capturing these details and “ensuring that metadata follows a standard recommended by the target research community of the data.” She also described additional outputs of the FAIR Workflows Project, which include an API to enable computational access, and a user interface / dashboard to allow visualisation of linked DMPs and research outputs.

A schematic of FAIR Workflows, as presented by Xiaoli Chen, highlighting the incorporation of Platforms & Metadata, PID Graph & API, and User Interface / Dashboard.

So what does the Neuroinformatics community think of this project? Overall, it was very well received. However, when Xiaoli and her colleague Zefan Zheng posed the question, “Do you have any concerns about engaging in FAIR practices”, the INCF community raised a number of issues: “implementing FAIR” when projects have already begun; researchers wishing to “keep data private for future studies”, a point that was echoed in additional talks throughout the conference; and the “embarrassment” of early release of poor quality code. As members of DataCite and publishers of neuroscience data, we found the concerns of the INCF community illuminating, and they highlighted a need for FAIRification to manage neuroscience data more efficiently.

Xiaoli Chen and Zefan Zheng chair a discussion on FAIR practices.

For more details on the FAIR Workflows Project, please see Xiaoli’s blog.

Towards an Understanding of Natural Cognition with EEG
In the INCF 2022 session entitled “EEG: the interface between neuroinformatics and clinical/basic science research”, chair Pedro Valdes-Sosa introduced the challenges that the EEG and neuroimaging community faces in this role. EEG is low-cost and has excellent temporal resolution, and Pedro was swift to point out that EEG research can be used to address global health problems in “countries that don’t have so many resources”. Pedro additionally highlighted that “there was concern whether people are actually going to share data”, and this was a recurring theme of the conference.

Michael Milham (Child Mind Institute & Nathan S. Kline Institute, New York) delivered an enlightening talk entitled “The NKI-Rockland Sample II: Scaling Multimodal Data Acquisition to Study Brain, Behaviour and Cognition Across the Lifespan”. Michael explained how EEG characterisation of the brain has enabled psychiatric phenotyping, and that simultaneous, multimodal characterisation, combined with ecological sampling, is allowing researchers to explore natural cognition. The core concept of ecological sampling / ecological momentary assessment (EMA) is repeated sampling of people across the day, on multiple days, as a means of finding correlates of motor function and cognition. To obtain these correlates, comprehensive physiological phenotypes are sampled using a range of motor and cognitive tasks. It is known that specific tasks, such as handwriting in children with ADHD or in patients with Parkinson’s disease, can be used to highlight motor and cognitive phenotypes. Michael additionally highlighted the Archimedes Spiral Drawing Task, and explained that, “in spiral drawing you can find associations with psychiatric status as well as medication effects”.

Motor and Cognitive Tasks used by NKI-RS II.

Michael additionally introduced the highly ambitious Nathan Kline Institute-Rockland Sample (NKI-RS) II, which aims to create a large-scale community sample of participants across the lifespan. The study includes 500 participants aged between 9 and 75 years, and measures include a wide array of physiological and psychological assessments. Importantly, anonymised data generated from this study are to be shared openly and prospectively, “on a quarterly basis” according to the NKI-RS website.

Michael further explained that this initiative was originally to include 1000 participants, but due to logistical difficulties the number of participants has been halved. This is understandable, especially given the various Data Collection challenges that Michael said the NKI-RS II has encountered, which include: participant burden/compliance; experimenter/staff burden; proprietary formats; equipment malfunction; and supply chain issues. There are also issues relating to Data Sharing, such as ensuring participant privacy. The NKI-RS II is an immensely significant and highly challenging project, and Michael is to be commended for coordinating such an invaluable research initiative.

The Challenge of Brain Image Registration
Lydia Ng (Allen Institute for Brain Science), in her talk entitled “Registration of single cell morphology data to the Allen CCF”, highlighted the challenges of registering 3D neuroimage data to the Allen Mouse Brain Common Coordinate Framework. Using the Allen Developing Mouse Brain Atlas as an example, Lydia highlighted the complexity of spatially mapping fluorescence micro-optical sectioning tomography (fMOST) Atlas source images plus new images to an average template. The average template is then registered onto the anatomical atlas of the developing mouse brain to obtain a Common Coordinate Framework aligned image with detailed anatomical annotation.

An overview of the fMOST Registration Framework.

Anatomical annotation (colours) provides the necessary context to understand fMOST brain images (grayscale).

fMOST data can appear striped, so as a preprocessing step notch filters are applied in the frequency domain to remove the stripes.

To remove artifacts, preprocessing steps are applied to the image frequency domain.
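As a rough illustration of frequency-domain notch filtering, the sketch below removes horizontal stripes from a synthetic image using NumPy. It is not the Allen Institute pipeline: the function name, the parameters and the assumption that the stripes have a single known spatial frequency are all choices made for this example.

```python
import numpy as np


def remove_stripes(image, stripe_cycles, half_width=1):
    """Suppress horizontal stripes with a frequency-domain notch filter.

    stripe_cycles is the number of stripe periods across the image height
    (assumed known here); half_width widens the notch around that frequency.
    """
    rows, cols = image.shape
    spectrum = np.fft.fft2(image)

    # Integer frequency index (cycles per image) of every FFT row and column.
    ky = np.rint(np.fft.fftfreq(rows) * rows).astype(int)
    kx = np.rint(np.fft.fftfreq(cols) * cols).astype(int)

    # Horizontal stripes vary along y only, so their energy sits near
    # ky = +/- stripe_cycles and kx = 0. Zero just that small region.
    row_hit = np.abs(np.abs(ky) - stripe_cycles) <= half_width
    col_hit = np.abs(kx) <= half_width
    mask = np.ones((rows, cols))
    mask[np.ix_(row_hit, col_hit)] = 0.0

    return np.real(np.fft.ifft2(spectrum * mask))


# Synthetic test image: a smooth pattern plus stripes with 8 cycles per image.
size = 64
y, x = np.mgrid[0:size, 0:size]
background = np.cos(2 * np.pi * y / size) * np.cos(2 * np.pi * 3 * x / size)
stripes = 0.5 * np.sin(2 * np.pi * 8 * y / size)
striped = background + stripes
cleaned = remove_stripes(striped, stripe_cycles=8)
```

Because the notch only zeroes a narrow band of frequencies, image content away from the stripe frequency passes through unchanged. In real data the stripe frequency would need to be estimated from the spectrum first.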

The results are very impressive, and this image registration approach enables anatomical annotation to be overlaid on brain image data at micro-anatomical resolution.

An additional highlight was the presentation by Harry Carey (University of Oslo) on DeepSlice. DeepSlice is a 3D image registration tool that utilises a Deep Neural Network to register mouse brain histology images and block-face images to a volumetric atlas. A core strategy used by DeepSlice is to find three corner coordinates, in 3D, as the best fit for automatic registration of mouse brain images. As Harry explained, DeepSlice utilises an iterative process “involving hundreds of thousands of training cycles”, and it has been applied successfully to Allen Gene Expression Atlas Data (131k slide-mounted histological sections) and Allen Connectivity Atlas Data (443k sections obtained by block-face imaging).

Allen Connectivity Atlas Data, with multichannel images of viral reporter expression, can be automatically registered using DeepSlice.
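To see why three corner coordinates are enough to place a 2D section in a 3D atlas, consider the sketch below. It is an illustration of the geometry only, not DeepSlice code: the function name, the choice of top-left, top-right and bottom-left as the three corners, and the example coordinates are all assumptions. Once those corners are known in atlas space, every pixel in the section maps to an atlas position by linear interpolation along the two edges.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def section_to_atlas(
    top_left: Point3D,
    top_right: Point3D,
    bottom_left: Point3D,
    s: float,
    t: float,
) -> Point3D:
    """Map a normalised section position to atlas coordinates.

    s runs from 0 (left edge) to 1 (right edge) and t from 0 (top edge)
    to 1 (bottom edge) of the section image.
    """
    # Edge vectors spanning the section plane in atlas space.
    u = tuple(b - a for a, b in zip(top_left, top_right))
    v = tuple(b - a for a, b in zip(top_left, bottom_left))
    return tuple(o + s * ui + t * vi for o, ui, vi in zip(top_left, u, v))


# Hypothetical corners of a coronal section (arbitrary atlas voxel units).
top_left = (0.0, 100.0, 0.0)
top_right = (400.0, 100.0, 0.0)
bottom_left = (0.0, 100.0, 300.0)

centre = section_to_atlas(top_left, top_right, bottom_left, 0.5, 0.5)
bottom_right = section_to_atlas(top_left, top_right, bottom_left, 1.0, 1.0)
```

The fourth corner follows from the other three, which is why predicting three corners fully determines the position and cutting angle of a flat section.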

The DeepSlice software tool is immensely impressive, and a preprint with more details of this software tool is available on bioRxiv.

Global Survey on Data-Sharing Barriers in Neuroscience
A major focus of the INCF 2022 Assembly was Data Sharing. To this end, the INCF Infrastructure Committee is running a brief anonymous survey to identify barriers to data sharing and reuse among neuroscience researchers worldwide. The results will be made public, and will be used by INCF and collaborators to develop strategies and activities for supporting the global neuroscience community.

To have real impact, the survey needs to reach the broadest possible range of neuroscience researchers. If you are a neuroscience researcher, please complete the survey here.

We look forward to the INCF 2023 Meeting.

Further Reading
Martone ME. A decade of GigaScience: the importance of community organizations for open and FAIR efforts in neuroinformatics. GigaScience, Volume 11, 2022, giac060.