Secure Genetic Data Moves into the Fast Lane of Discovery

Take a ride down chromosome highways with a novel web-based platform that allows sharing of private genetic data while maintaining privacy through a colourful dynamic visualization tool.

The Human Genome Project offered new hope that discovering the genetic determinants of chronic disease susceptibility would lead to new avenues for drug development and targeted therapy, yet over a decade later there is disappointment that very few gene screens have made it into the clinic to inform and improve treatment. Getting the most out of the explosion of human genetic data has been difficult because of the massive size of the data produced, and because of the challenge of balancing the need to protect patient privacy with the need to independently inspect, replicate and build upon discoveries. GWATCH (Genome-Wide Association Tracks Chromosome Highway), a new tool just published in GigaScience, aims to address some of these issues. It is a web-based platform that provides visualization tools for identifying disease-associated genetic markers from privacy-protected human data without risk to patient privacy. This dynamic online tool, developed by an international team of researchers from Russia, Australia, Canada, and the US, facilitates disease gene discovery by automating analysis and presenting the results through intuitive data visualization tools. By displaying results in three dimensions on a scrolling (Guitar Hero-like) chromosome highway, it gives an extremely useful, visually appealing bird’s-eye view of positive disease-association results, while all sensitive information and raw data remain secure behind firewalls.

Identifying the genes that underlie deadly complex diseases, such as heart disease, cancer and diabetes, and infections, including HIV-AIDS, papilloma virus, and hepatitis B and C, is extremely difficult, as it requires a huge amount of genetic information from large numbers of patients and healthy controls. The advent of cheaper and faster ways to sequence whole genomes (with over 200,000 human genomes likely to be sequenced this year) has made producing this amount of data effectively a non-issue; however, concerns over patient security and data access severely limit researchers’ use of these amazing resources. Identifying genes, replicating findings and independently validating results from ‘potentially’ available data is made nearly impossible by the necessarily complex and time-consuming processes researchers must go through to obtain access to protected data. Thus, only a very small percentage of the data in protected databases is likely ever used. To take full advantage of these data and uncover ways to treat or prevent the ~20 million deaths per year worldwide from the most common complex diseases, researchers need new, secure methods to access and share them.

As always, the peer reviews of the article are available from the pre-publication history. One of the peer reviewers, Dr Lachlan Coin from the University of Queensland, noted the importance of having such a tool, saying “The discovery of novel genetic variants associated with complex disease has necessitated the formation of large global research consortia to meta-analyse data from very large sample sizes. However, sharing of this data has always been problematic. GWATCH provides an innovative web-platform to facilitate sharing of summary data from GWAS, which will enable researchers to more quickly identify and validate disease-associated genetic variation.”

Sharing Disease Data
GWATCH allows investigators who were not involved in the original study to access disease-associated genetic variation results from GWAS (using whole genome sequence or SNP-arrays) rather than the raw data that can be used to identify individuals. Its colourful, dynamic and user-friendly visualization tool enables researchers to effectively ‘drive down chromosome highways’ and easily see regions that associate with their disease of interest. They can also zoom in for greater detail on variation patterns, and view and compare different stages of disease (e.g., HIV infection, AIDS progression and treatment outcome). The authors developed and tested GWATCH using a large, often-requested NIH dataset of association data from more than 6,000 patients at risk for HIV-AIDS, which was used to discover CCR5-∆32, the most powerful and useful AIDS resistance gene discovered so far. GWATCH, however, can be used for any complex disease study by importing that study’s association results.
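To make the distinction between raw data and shareable association results concrete, here is a minimal sketch in Python. It is an illustration only and not GWATCH's own code or its actual statistical tests: the names `SnpSummary`, `allelic_chi_square_p` and `summarize` are hypothetical, and the simple allele-count chi-square test is just one common way to test a single marker in a case-control study. The point is that individual genotypes go in, and only one summary row per marker comes out.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SnpSummary:
    """Hypothetical per-marker record: holds no individual-level data."""
    snp_id: str
    chromosome: str
    position: int
    p_value: float
    neg_log10_p: float


def allelic_chi_square_p(case_alt: int, case_ref: int,
                         ctrl_alt: int, ctrl_ref: int) -> float:
    """P-value of a 1-df chi-square test on a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    denom = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
             * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if denom == 0:
        return 1.0  # an empty row or column makes the test uninformative
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # For 1 degree of freedom, the upper tail of chi-square is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def summarize(snp_id: str, chromosome: str, position: int,
              case_genotypes: Sequence[int],
              ctrl_genotypes: Sequence[int]) -> SnpSummary:
    """Reduce genotypes (0, 1 or 2 alternate alleles per person) to one row."""
    case_alt = sum(case_genotypes)
    case_ref = 2 * len(case_genotypes) - case_alt
    ctrl_alt = sum(ctrl_genotypes)
    ctrl_ref = 2 * len(ctrl_genotypes) - ctrl_alt
    p = allelic_chi_square_p(case_alt, case_ref, ctrl_alt, ctrl_ref)
    # Guard against a p-value that underflows to zero.
    neg_log10_p = -math.log10(p) if p > 0 else math.inf
    return SnpSummary(snp_id, chromosome, position, p, neg_log10_p)
```

In this sketch, calling `summarize("rs_example", "3", 12345, case_genotypes, ctrl_genotypes)` (with a made-up marker name and position) would run behind the firewall, and only the resulting `SnpSummary` rows would be exported for display. The genotype lists themselves, which could identify individuals, never leave the protected environment.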

The source code for GWATCH is freely available on GitHub, an archived snapshot of the code used in this paper is available from a DOI in our GigaDB repository, and ongoing updated versions of GWATCH are freely available at the GWATCH website.

Further Reading
1. Svitin et al. GWATCH: a web platform for automated gene association discovery analysis. GigaScience 2014, 3:18
2. Svitin A, Malov S, Cherkasov N, Geerts P, Rotkevich M, Dobrynin P, Shevchenko A, Guan L, Troyer J, Hendrickson-Lambert S, Hutcheson Dilks H, Oleksyk TK, Donfield S, Gomperts E, Jabs DA, Sezgin E, Van Natta M, Harrigan PR, Brumme ZL, O’Brien SJ. Software and supporting material for: GWATCH: a web platform for automated gene association discovery analysis (2014) GigaScience Database