Diversity, Ancestry, and the Tenacious Concept of Race: GigaScience at GA4GH

The Global Alliance for Genomics and Health (GA4GH) is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework. GigaScience is an organisational member, and the GA4GH meeting often coincides with the American Society of Human Genetics meeting, in which we sometimes participate (see a write up in GigaBlog). This year's GA4GH 8th Plenary Meeting was a virtual conference, held on 29th and 30th September 2020, with over 230 attendees from over 60 different countries. GigaScience Data Scientist Chris Armit attended the talks and reports on why diversity of genomic data is such a hot and contentious topic. With diversity such a key topic at this year's plenary, holding the meeting virtually was a particularly useful way to increase voices and participation from across the globe.

Charles Rotimi of NIH, who is also the founder of the African Society of Human Genetics, delivered an exceptional opening keynote talk entitled “Diversity in Genome Science: A Scientific & Social Justice Imperative”. As Charles explained, when exploring diversity, “Africa is a good place to start”. “About 99% of human evolutionary experience, as a species, was spent in Africa before some migrated out to populate the rest of the world about 100,000 years ago. This common history is why humans all over the world share over 99% of our genetic inheritance”. Charles further explained the bottleneck effect that was a key feature of this Out-of-Africa migration, whereby a parent population with significant genetic variation remained in Africa. Charles emphasised that “Africa is the only place to study these variants” and that human genetics will gain immensely from the sampling of African populations.

In his keynote talk, Charles Rotimi highlighted the genomic diversity of African populations.

In a fascinating study of pathogenic and clinically relevant variation in Africa, Charles highlighted disease as a strong driver of genome-level population variation. To illustrate with one example, sickle cell disease is an autosomal recessive disorder prevalent in African populations. The concept here is that of a genetic trade-off, as heterozygous carriers of the sickle cell trait have a protective advantage against malaria. This explains why the trait is common in malaria-endemic areas. As Charles explained, there are other examples, with variation in the gene encoding the serum factor APOL1 being associated with kidney disease in African populations. APOL1 lyses unicellular trypanosomes, and protects against the lethal form of African sleeping sickness. This would appear to be another genetic trade-off, as the APOL1 variant SNP rs73885319 that prevails in Yoruba populations is associated with a high risk of kidney disease. Charles further explained the societal impact of variation in the APOL1 locus in the context of transplants, as an individual with two variant copies of the APOL1 gene will also have a high risk of failure as a kidney donor.

As an additional example of how genetics can help us understand African tropical disease, Charles offered a case study of podoconiosis. This neglected disease is a non-filarial elephantiasis that is not based on infection by a pathogen, but is rather a reaction to long-term barefoot exposure to red clay soil derived from volcanic rock. Podoconiosis has a high degree of heritability, and genome-wide association studies (GWAS) that link the condition to variants in HLA Class II loci suggest that it may be a T-cell mediated inflammatory disease. These are invaluable insights that will help us understand disease susceptibility, and also gene flow, in African populations.

The Importance of Diversity in Genomic Data

This keynote talk was followed by a panel discussion on “The Importance of Diversity in Genomic Data”. Alice Popejoy (Stanford University) explained that there is “no standard definition of diversity measures”, and further explained the problematic concepts of Race, Ethnicity and Ancestry. Geneticists and Biomedical Scientists are primarily interested in ancestry, which is the biological inheritance of DNA and can be traced through the genome using genotype data. However, in the USA, patients are often invited to declare their racial affiliation, which is a socio-political classification mechanism and which, as Alice explains, is “often tied to status and power”. In European countries, patients are often invited to declare their ethnicity which, from a genomics and health perspective, is also a troublesome classification as it is, as Alice explains, “a cultural construct often linked to community, religion, language”. In both instances, ancestry is being sidelined.

In her panelist presentation, Alice Popejoy highlighted the need for standardised diversity measures based on ancestry.

This message was echoed by Consuelo Wilkins (Vanderbilt University), who explained that “race and ethnicity often serve as surrogates for other constructs that explain differences in health”. Consuelo named socioeconomic status, education, and access to health care as some of the constructs that are often associated with race. Indeed, as Consuelo explained, “race is one of the most imprecise variables we use in human research”. There is a clear need to replace this outdated concept of race with the biological concepts of diversity and ancestry.

In her panelist presentation, Consuelo Wilkins explained how race and ethnicity are often used as surrogates for other constructs that explain differences in health.

Giorgio Sirugo (University of Pennsylvania) used an example of dizygotic (fraternal) twins to highlight the problematic concept of race. In a striking example of UK-born fraternal twins, one with dark skin and dark brown hair and the other with alabaster skin and reddish hair, Giorgio made the point that closely related individuals can look quite different. Consequently, if we tend to define race based on how people ‘look’, this too is problematic because there are clear examples of even very closely related individuals looking quite different with regard to having iconic “racial” features. It would be intriguing to know whether siblings with divergent phenotypes such as these show greater diversity in their genetic makeup, or whether other epigenetic factors are responsible for the observed variability in phenotypic penetrance.

In his panelist presentation, Giorgio Sirugo highlighted the phenotypic diversity that can be observed in siblings and even fraternal twins.

Giorgio further explored the transferability of genetic findings across diverse populations with reference to monogenic (single gene) diseases, oligogenic diseases (complex traits), and polygenic diseases (multiple genes), and highlighted that linkage disequilibrium (LD) – whereby common causative variants may be tagged by different SNPs in different populations – will have a major effect on transferring our understanding of polygenic diseases across diverse populations.
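
To make the LD point concrete, here is a minimal sketch (not taken from the talk) that computes the standard r² measure of linkage disequilibrium between a tag SNP and a causal variant. The haplotype and allele frequencies are hypothetical values chosen purely for illustration, and the function name `ld_r_squared` is our own.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (tag SNP)
          and allele B (causal variant)
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies: same allele frequencies in both populations,
# but the tag and causal alleles co-occur far more often in population 1.
r2_pop1 = ld_r_squared(p_ab=0.28, p_a=0.30, p_b=0.30)
r2_pop2 = ld_r_squared(p_ab=0.12, p_a=0.30, p_b=0.30)

print(f"population 1: r^2 = {r2_pop1:.2f}")
print(f"population 2: r^2 = {r2_pop2:.2f}")
```

With these made-up numbers the tag SNP is a strong proxy for the causal variant in the first population and a weak one in the second, which is the kind of difference that can stop a SNP-based finding from transferring between populations.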

Continuing the subject of diversity, Laura Paglione (Spherical Cow Group) took a deep dive into the core objective of GA4GH in her keynote talk entitled “Building an Intentional Community for Standards Development” and explained that “inclusion must be a priority for all of us”. Laura offered an insightful example of the issues associated with a lack of inclusion by highlighting a case in the US where an imbalance of females to males in clinical trials led to misleading results. Laura coined the phrase “teams that look alike, arrive at solutions that look alike”, and she strongly advocates for inclusion and diversity in the genomic health community as a means of improving how genomic data are collected, managed and accessed.

In her keynote talk, Laura Paglione detailed the benefits of inclusion and the importance of building an intentional community in Genomics and Health.

Real World Implementation of GA4GH Tools and Standards

Day 2 of the Plenary Meeting explored real-world implementations of tools and standards. The Global Alliance team detailed their plan for Genomic and Health-Related Data Sharing. The core concept is that data are distributed across a federated network, and individual web services become registered as nodes. By using service registries, researchers can query multiple datasets simultaneously, ensuring fast and efficient federated query. Retrieval of datasets can then be performed via API or, alternatively, Cloud-based analysis with provenance of the various analytical steps. The latter is exceptionally important as it ensures that a researcher can check that their bioinformatic analyses were performed in accordance with their expectations.
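
As a rough illustration of the federated idea described above, the following toy sketch registers a few nodes in a service registry and fans a single query out to all of them in parallel. It is an assumption-laden mock-up, not the GA4GH implementation: `Node`, `ServiceRegistry`, `federated_query` and `make_node` are hypothetical names, the nodes are in-memory stand-ins for real web services, and the variant and dataset identifiers are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """A registered node; `query` stands in for a call to its web service."""
    name: str
    query: Callable[[str], List[str]]


class ServiceRegistry:
    """Hypothetical registry that keeps track of the nodes in the network."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def register(self, node: Node) -> None:
        self._nodes[node.name] = node

    def nodes(self) -> List[Node]:
        return list(self._nodes.values())


def federated_query(registry: ServiceRegistry, variant: str) -> Dict[str, List[str]]:
    """Send the same query to every registered node and collect the answers."""
    nodes = registry.nodes()
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = list(pool.map(lambda n: n.query(variant), nodes))
    return {node.name: hits for node, hits in zip(nodes, results)}


def make_node(name: str, holdings: Dict[str, List[str]]) -> Node:
    """Build an in-memory node that reports which datasets contain a variant."""
    return Node(name, lambda variant: holdings.get(variant, []))


registry = ServiceRegistry()
registry.register(make_node("node_a", {"variant_x": ["dataset_1"]}))
registry.register(make_node("node_b", {}))

hits = federated_query(registry, "variant_x")
print(hits)
```

The data themselves never leave the nodes in this sketch; only the answer to the query comes back, which mirrors the emphasis on distributed rather than centralised data.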

Elixir Director Niklas Blomberg asked the question “How far away are we from a full implementation / demonstrator of the full research data access cycle?” In reply, Michael Baudis (University of Zurich) explained that we should not think of the proposed GA4GH data sharing mechanism as a complete platform, but rather as a suite of tools, and that a major factor in the success of this project is the degree to which researchers align themselves with the various tools developed by GA4GH. On this note, the panel discussions on “Real-world implementation of tools and standards” were especially informative, with Shaikh Farhan Rashid (CanDIG) highlighting the need for a “visa model for authorisation” and Augusto Rendon (Genomics England) declaring that “our data is read only…you cannot download primary data”.

From the real-world implementations described by the various institutes in Europe, Canada, Brazil, Japan, and Australia, there was universal emphasis on the confidentiality of genomic and health-related data, with the various institutes further addressing the need for a mechanism by which researchers can access the genomic data that are required for their study. One of the main benefits of GA4GH is that it will enable researchers to use genomic data from other countries in their research. In her talk entitled “GEM Japan – Real World Implementation”, Takako Takai (Japan Agency for Medical Research and Development) provided what could be the mission statement of the GA4GH, namely that “we could bring a new research harmonisation that was hardly realised before”.

The GA4GH Work Stream Managers discuss how GA4GH standards can support a Global Learning Health System.

At GigaScience, we look forward to the fruits of the research that will be made possible by GA4GH, and as organisational members we’ll watch the development of tools and standards and try to implement them when we can.