Open imaging data to assist AI in the COVID-19 fight. Q&A with Dominic Cushnan

National COVID-19 Chest Imaging Database

Published today in GigaScience is a Data Note describing the National COVID-19 Chest Imaging Database (NCCID), a centralised database containing chest X-rays, Computed Tomography (CT) and MRI scans from patients across the UK. The database draws on the UK National Health Service’s unique position as the world’s single largest integrated healthcare system, and the benefits of collecting chest imaging data at this scale are extensive and already being realised by doctors and the research community. The database is already supporting the development of Artificial Intelligence (AI)-powered image processing software and diagnostic products, as well as models being used to predict COVID-19 mortality in the UK. It also has the potential to become a long-term resource for teaching radiologists. These efforts have the potential to enable faster patient assessment in Accident and Emergency, save clinicians time, and increase the safety and consistency of care across the UK.

The NCCID training data is available to users anywhere in the world, including software developers, academics and clinicians, via a rigorous Data Access Request process that we tested during our Open Peer Review process. To explain more about the process of creating this resource and how to use it, we have one of our author Q&As with first author Dominic Cushnan. Dominic is Head of Artificial Intelligence (Imaging) for NHSX, the UK Government unit with responsibility for setting national policy and developing best practice for National Health Service (NHS) technology and data, including data sharing and transparency.

Can you give us an impression of the size and scale of this database, and how much effort was it to collect this data?

The National Covid Chest Imaging Database (NCCID) is one of the largest datasets of its kind and it is continuously growing, with over 27 contributing hospitals and imaging data from over 10,000 patients in the NCCID training set, available upon request. The NCCID includes three types of medical images (X-rays, MRIs and CT scans) from hospitals across England and Wales, with future integration of data from Scotland planned. This represents a country-wide effort: radiologists and IT teams in NHS sites, the British Society of Thoracic Imaging (BSTI), Royal Surrey County Hospital (RSCH) and the NHS AI Lab have collaborated at a time when all teams have little capacity for long-term projects.

What is the reuse potential for these COVID-19 chest imaging scans and datasets? What are people who reanalyse them currently using them for, and what is the potential in the future? Have you seen any positive examples and applications of reuse yet?

There is a lot of potential in using the NCCID for the study of Covid. We would like to share three examples of where NCCID COVID-19 Chest Imaging data has been used:

  • Cambridge University has developed an open-source AI tool, AIX-COV-NET, which supports the rapid diagnosing and triaging of patients with COVID-19 in the UK.
  • The NCCID has also played a role in the publication of another project called LUCAS (developed by a consortium of the universities of Brighton, Oxford, Glasgow, Lincoln and Sheffield), which is used to predict Covid patient mortality risk from patient variables (for example lymphocyte count, urea, CRP, age, sex).
  • The NHS AI Lab itself has been building AI models to help us better understand the AI algorithm development and training process, giving us a better insight into how to collect and clean data so it is safe and appropriate to use, how to train algorithms and facilitate long-term research in AI and healthcare.

With the vaccination programme deployed very quickly, the UK is hopefully moving to a post-pandemic phase, so what utility does the National COVID-19 Chest Imaging Database have going forward?

There is significant value in both understanding and advancing our research and knowledge on COVID-19 and other diseases. AI technologies, such as those developed using the NCCID, present a mechanism by which to augment clinical identification and diagnosis of diseases. For example, an AI algorithm could be used to track how a disease has evolved or mutated through reviewing medical images and clinical data. The database could also be used by the UK research community for investigating and learning how different organs, such as the heart, have been affected by COVID. The NCCID facilitates the creation of AI-enabled diagnostic tools which can augment and accelerate the clinical diagnosis of diseases (such as COVID) and also support longitudinal studies on the effect of disease on patients and population management (such as PHOSP-COVID).

The paper outlines the instructions for requesting access to this data, and as an open peer review journal using named reviewers (see our previous post on how we review restricted data), we were able to get a peer reviewer to assess this procedure and scrutinise the data. How did you find this process?

This process certainly stood out from traditional journal or conference reviews as we had some specific needs due to NCCID being available exclusively upon request. As we are storing patients’ data, we need to make sure we only give access to the data to researchers who have a good and valuable reason. On top of the usual process where the paper was revised in a feedback loop with the reviewers, we also had reviewers looking into the storage method and the database’s structure and images to assure future readers that the database does store real and trustworthy information. With the increasing number of datasets out there, this flexibility was more than welcome, as it allowed a sufficient level of scrutiny, which should become a standard.

Further Reading:
Cushnan D, Bennett O, Berka R et al. An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis. GigaScience. 2021. doi:10.1093/gigascience/giab076