Updates from sequencing the tree of life. Biodiversity Genomics 2020

“Extinction is forever – so our action must be immediate.”

– Sir David Attenborough, Sept 30th 2020

Biodiversity Genomics 2020 aimed to bring together researchers from across the world to celebrate global achievements in genome sequencing in an effort to “sequence life for the future of life”. This virtual conference took place on 5th-9th September 2020, and the GigaScience team attended the talks and participated. Here we report back on how genomics is helping us understand biodiversity, and how whole genome sequencing data is providing the toolkit needed to conserve endangered species. We’ve attended previous incarnations of this conference, such as the G10K and VGP meetings, but going virtual meant that this year 3,000+ registrants were able to attend, logging in from 89 countries on 18,267 devices. Those present got to see an unprecedented gathering of experts exploring all corners of the tree of life, both from a taxonomic perspective (e.g. Bat1K, Bird10K, 10KP plants and the Global Invertebrate Genomics Alliance all had strong showings, and in some cases their own tracks) and a geographic one (with sessions on projects covering species from various continents). And while we could not meet people in person, it was great to see so many of our Editorial Board Members, authors and reviewers, as well as published projects (e.g. the amazing giant squid genome presented by Rute da Fonseca).

Jose Victor Lopez, coordinator for Biodiversity Genomics 2020, introduced this year’s conference.

Earth BioGenome Project
Starting from an overview of the whole tree, plenary speaker Harris Lewin (UC Davis) presented on the Herculean task that is the Earth BioGenome Project. Born out of the G10K consortium, this project aims to sequence all eukaryotic life on the planet. So how does one go about a task of such magnitude? As Harris explains, “the Earth BioGenome Project is a confederated international network-of-networks that have the common goal of sequencing and annotating the genomes of all 1.5 million known species of eukaryotes in 10 years”. Harris provided an overview of the accomplishments to date, which are highly impressive. Phase I, which aims to produce an annotated reference genome for one representative of each taxonomic family of eukaryotes (approximately 9,300 species), is now mostly complete. Phase II, which aims to provide a reference genome for one representative species per genus, has now commenced. Phase II is, in itself, a vast project, as it aims to generate reference genomes for 180,000 species. This will be followed by Phase III, which aims to provide a reference genome for each of the remaining approximately 1.32 million species on the planet. Harris sees the long-term goal of this initiative as the ability to “preserve and restore entire ecosystems”.

Do not adjust your set. Not a Hawkwind live stream, but conference chair Mark Blaxter announcing Harris Lewin.

Genomics and Food Security: An African Perspective
The individual networks that Harris refers to are national and regional projects from Africa, Asia, the Americas, and Europe. These were all represented at this year’s conference. The African consortia focused on genomics and food security, with ThankGod Ebenezer of EMBL-EBI presenting on behalf of DAISEA (Digital Information in Africa for a Sustainable Agri-Environment). The overarching goal of this project is to sequence the genomes of endemic organisms – both crops and livestock – in Africa’s agri-environments. In a panel discussion, Simplice Nouala, who is Head of the Agricultural and Food Security Division at the African Union Commission, highlighted that the sequencing effort will lead to “sustainable production” and will improve the nutritional quality of milk and crops in the diverse biomes of the African continent. It will additionally lead to a marked improvement in disease resistance in food crops (see the African Orphan Crops Consortium that we’ve previously covered) and livestock.

Biodiversity in the Americas
The American consortia reported on biodiversity hotspots, with a presentation by Guilherme Oliveira (Instituto Tecnológico Vale) on the Carajás National Forest in the Amazon deforestation arc of Brazil being of special interest. Guilherme showcased how genomic sequencing has allowed researchers to identify connected populations of invertebrates in different cave systems in this region, showing that species can migrate from one cave to another. An additional benefit of genomic sequencing is the ability to follow gene flow between the migrating populations. This is especially helpful for conservation genomics, as it enables researchers to identify wildlife corridors where species populations are connected. Furthermore, a deep understanding of gene flow can help inform decisions made by rewilding initiatives, whereby conservationists restore core wilderness areas that have been ecologically damaged.

In addition, I was intrigued by the presentation by Rodrigo Gutierrez (Center for Genome Regulation, Chile) entitled “Phylogenomics and Systems Biology approaches reveal conserved adaptive processes in Atacama Desert plants”. As Rodrigo explained, the Atacama Desert is “one of the oldest and driest deserts on the planet” and survival is tough in this part of the world. Rodrigo pointed out that 22 stations in the Atacama have been monitored for the last 10 years, which has allowed environmental parameters to be captured in a variety of Atacama biomes, including the high-altitude steppe (grasslands), the dry puna, which is enriched in shrubs, and the extremely dry lower-altitude pre-puna. So what features of Atacama species allow them to survive? Rodrigo explains that “phylogenomic reconstruction and positive selection analysis uncovers key genes for plant survival”. Functional categories that were enriched in the genomic analysis of Atacama plant species included chaperone proteins, which are known to stabilise newly synthesised polypeptide chains in stress conditions and therefore contribute to survival.

Furthermore, the presentation by Brad Shaffer (UCLA) on “The California Conservation Genomics Project (CCGP)” highlighted that project’s important contribution to our understanding of biodiversity. Thus far, the CCGP has generated 125 reference genomes of various Californian species, and the consortium intends to deliver another hundred reference genomes over the following years. I was extremely interested to hear that the CCGP has also resequenced 18,750 genomes of plants and animals across the State of California. The chromosome-level reference genomes serve as the genetic foundations that enable CCGP researchers to assemble and analyse the resequenced genomes. From a conservation genomics perspective, this is an incredibly powerful resource, as it allows researchers to explore genomic variation in multiple species and, for each species, to understand genome-wide associations with the various Californian biomes and microclimates. In addition, by studying gene flow in this large cohort of wildlife species, there may even be the possibility of discovering hitherto unknown wildlife corridors that connect species populations and aid their survival.

Genomics of the Avian Vocal Learning System
Plenary speaker Erich Jarvis (Rockefeller University) delivered a fascinating talk on spoken language, which he explains is a complex trait found in 5 groups of mammals – humans, dolphins/whales, bats, elephants, seals – and 3 groups of birds – parrots, songbirds, and hummingbirds. Erich explained that these mammals and birds can imitate learned vocalisations, and highlighted the fabulous case of ‘Disco’ the parakeet who has learned to produce up to 400 words and can recombine them.

Erich wondered, “What is the genetics of this trait?” and, with our Editorial Board Members Guojie Zhang and Tom Gilbert (University of Copenhagen), formed the Avian Phylogenomics Project, which was able to address this question. The Avian Phylogenomics Project consortium sequenced genomes of 48 representative species of birds, including vocal learners, and was able to identify the closest non-vocal learner relatives of these species. It’s a project we know very well, having helped to coordinate the data release in GigaDB and published a number of companion papers. Interestingly, Erich explained that prior to the generation of this genome-scale phylogenetic tree (the data of which was published as a Data Note in GigaScience), scientists believed vocal learning birds evolved independently, but some doubted this based on neuroscience findings. There are observable neuroanatomical similarities in the vocal learning system of birds, and so the possibility of an avian common ancestor with vocal learning capacity was a topic of controversy in the scientific community. However, the new genome-scale phylogenetic tree generated by the Avian Phylogenomics Project consortium and published by Erich and colleagues highlighted that the common-ancestor inference was incorrect, and that vocal learners such as hummingbirds were phylogenetically very far apart from parrots and songbirds. The genomic data more strongly supports a model of convergent evolution in which the vocal learning system in birds has evolved independently on multiple occasions.

Essential Genes and the Strange Case of the Missing paired Gene in the Mosquito
One of the major insights to come out of a genomic understanding of biodiversity is the core set of essential genes that are conserved across various clades. To illustrate with one example, the paired gene is a pair-rule gene that is important in the development of segmentation in insects, such as the fruit fly Drosophila. One of the key roles of paired is to ensure a striped pattern of the segment-polarity gene engrailed, which further subdivides each segment into anterior and posterior regions. paired is an essential gene for segment patterning in Diptera (flies) such as Drosophila melanogaster, and also in the beetle Tribolium castaneum, and I had thought of this as a highly conserved gene that is present in all insects. Consequently, in the session entitled “Functional Genomics” I was thoroughly surprised to hear Alys Cheatle Jarvela (University of Maryland) explain that the paired gene is missing from the mosquito Anopheles stephensi. By gene expression analysis, Alys was able to infer that a transcription factor gene called gooseberry, which can also upregulate engrailed, evolved a pair-rule pattern early in mosquito evolution, and consequently paired became redundant in this lineage and was lost. This study highlights the limitations of inferring gene regulatory networks from model organisms such as Drosophila melanogaster. It also highlights the need for a deeper understanding of the cell, molecular, and developmental biology of non-model organisms relevant to human disease, such as the mosquito Anopheles stephensi, which is a known vector in the transmission of malaria.

The end of the workflow
On top of the geographic and taxonomic tracks, there were also sessions on methods (sequencing, assembly, functional genomics), Diversity and Inclusion, ELSI, Funding, and Annotation and Databases. We participated in the Annotation and Databases track, giving the last talk in the last session before the closing plenary, with Scott presenting on our new publishing workflow for rapid dissemination of genomes using GigaByte and GigaDB (see slides).

Chaired by our Ed Board Member Paul Flicek, it felt an appropriate slot to talk about the final step in all of these genome projects: dissemination. The publication process is not keeping pace with the other steps in the production of global-scale genome projects, and our new sister journal GigaByte aims to streamline this process and help it scale. Our talk showcased the first genome published using this new process, that of the Eastern Banjo frog, which was one of the first 101 genomes announced by the Genome10K a decade ago after their very first meeting. It demonstrated the incentives and value gained in packaging a genome up as a short Data Release article (see the author video).

If you have genomes from the many corners of the tree of life that need disseminating in a similar way, please get in touch, as article processing charges are waived until the 28th February 2021.