Plant and Animal Genomes (PAG) Comes Storming Back!


After two cancellations due to the pandemic, the Plant and Animal Genomes conference returned to its in-person format in January with its 30th edition. A number of the talks focused on the effects of climate change on either biodiversity or crop development. It seems only fitting, then, that the weather provided a suitable demonstration, with California experiencing one of its longest periods of sustained rainfall in many years. The river running just behind the Town and Country conference centre burst its banks and flooded the surrounding area, including some of the lower-level rooms at the newly refurbished hotel! Luckily, the water did not reach the conference centre itself, so the meeting continued as planned.

Town and Country hotel car park flooded by storm during PAG30.

Over 3,200 participants from 66 different countries were in attendance, which made for a vibrant and exciting conference. Presentations spanned the entire taxonomic scope of life on earth, including some extinct species! In total there were 2,351 presentations (including 7 plenary lectures) in over 200 sessions, with as many as 15 sessions running concurrently. As always at PAG, one of the hardest things to do is decide which sessions to attend. Inevitably there are many great talks that one has to miss due to overlaps, so this blog covers only a small fraction of what was on offer; there was much more that I was unable to see.

Day 1

I started my 4th PAG experience on Friday at the Vertebrate Genomes Project (VGP) community workshop hosted by Erich Jarvis (Rockefeller University). It featured talks from Erich himself, Giulio Formenti (Rockefeller University), Benedict Paten (UC Santa Cruz), Fergal Martin (EMBL-EBI), Camila Mazzoni (BeGenDiv), Kerstin Howe (Wellcome Sanger, talking about genome curation), Ntanga Mapholi (University of South Africa), Max Kaller (Stanford University) and Eriona Hysolli (Harvard Medical School). The VGP is a sub-project of the Earth BioGenome Project (EBP), itself a collaboration of a number of slightly smaller projects including, but not limited to, the Darwin Tree of Life (focused on UK species), the European Reference Genome Atlas (ERGA, focused on European species) and the African BioGenome Project (ABP, focused on African species), all of which provided project updates in the session. The scope of each of those projects is determined by geographic location rather than taxonomy, so they cover a broader range of taxa, but all do include vertebrate species.

Also on Friday was the AgBioData workshop. AgBioData is a collaborative group of researchers whose common thread is the sharing of agricultural data. After surveying the 32 collaborating databases, the working groups recommended, among other things, that they need to:

  • Identify solutions to funding issues for data sharing
  • Provide data sharing training for database personnel
  • Educate stakeholders on the benefits of data sharing
  • Focus on improvements to phenotypic data sharing
  • Continue working on relevant data standards

Day 2

One day down, four and a half more to go! Day 2 (Saturday) kicked off at 8 am with a bamboozling choice of 15 different workshops. There were 10 more sessions in the second time slot, then lunch, followed by 13 more concurrent workshops, then another 9. Just when you thought the day was done at 6 pm, BAM! One more 2-hour session with a choice of another dozen options! As you may be able to tell, planning is key when attending PAG: you need to know what you want to see and when, so that you don’t spend the entire day looking at the conference app trying to work out where to go next.

My pre-selected route through the day consisted of the Bioenergy Grass Biology, Microbiome and Plant Health, Single-Cell Genomics, Tripal Database Network and UCSC Genome Browser sessions. All excellent choices, if I may say so. One highlight of the day for me was Sharifa Crandall (Penn State) talking about “Full Steam ahead: understanding microbiome recovery after steam disinfection”. Steam disinfection is routinely used in nursery settings to sterilise the soil before planting, but what effect does it have on the development of the microbiome? Currently, there is no baseline metagenomics data available on the microbiome of steamed soil. Unfortunately, this study didn’t fully address that gap, as the team only performed amplicon-based microbiome sequence analysis, but they noted that full metagenomic sequencing is required. Fingers crossed they carry out that study and publish it in our functional metagenomics series.

Day 3

The conference eventually “opened” on the Sunday, with the opening reception held in the exhibition hall in the evening, complete with drinks, nibbles, posters and the exhibitors. But before we even got to that, there was another full day with a packed agenda to navigate.

Honey bees are not native to the American continent. Over the last two hundred years a relatively small number of seed colonies have been brought in to provide the basis of honey production on the continent.

In Canada, honey bees have an estimated annual value of 5.5 billion dollars, but last year alone there was a 50% loss of colonies over winter in Ontario, and roughly half of those losses are from unknown causes. With funding from the Canadian Genome Project, Amro Zayed (York University, Toronto) presented a project using biomarkers of stress in honey bees. The project compares the responses to induced stress in controlled experiments on drone bees in order to identify the causes of those unexplained losses. There is evidence that the pattern of response is distinct for each type of stress. Further studies are required to find more and better biomarkers to fine-tune the tool for commercial use in early identification, which it is hoped could lead to appropriate interventions to save colonies in the future.

A beekeeper in protective equipment tending to bees in an apiary at Wynn Farm in North Salem, Aug. 2, 2021. (U.S. Department of Agriculture NRCS photo by Brandon O’Connor. License: Public Domain)

Also on the Sunday afternoon, there was a hands-on workshop all about the new JBrowse2 genome browser. The developers introduced a plethora of new features, as well as the ability for users to write their own plug-ins. I rounded out the day in the Big Data session, dramatically sub-titled “Manage your data before your data kills you”, where Fiona McCarthy (University of Arizona) gave an impassioned presentation outlining eight things not to do when dealing with Big Data. Anyone who deals with data on a daily basis will be familiar with most of these, but I’ll summarise them here anyway:

  1. Treat your data unequally. Why do some researchers think that their own Data Management Plan (DMP) only applies to certain parts of their project? For example: sequence data yes, but the analysis results no. Data is data, and it all deserves to be treated equally.
  2. The “file.unicorn” problem. Using incorrect or inappropriate file formats and/or file extensions causes everyone (including your future self) a headache when trying to decipher them.
  3. Think that context is less important than data. Actually, most data is pretty useless without its metadata, so include all the relevant metadata WITH the data. This problem can often be seen in the SRA, with sequence data completely devoid of sample metadata!
  4. Short-term hires for long-term solutions. Don’t get the grad student to deal with the data that will outlive their time in the lab. When they’re gone you’ll have no idea where the data is.
  5. One more thing before I finish… Don’t put off the data submission until the very end! No one is going to steal it, but if you’re worried then just set an embargo on the release.
  6. It’s complicated. Well, try using file names that follow a convention that is EASY to understand and meaningful, and include a README file so that your future self can remember the convention.
  7. It’s not the destination, it’s the journey. The “how” you arrived at your results is important, so include software versions, parameters and all the input filenames in your methods.
  8. Keep your Data Management Plan (DMP) a secret. The PI writes the DMP, but no one else in the team ever sees it, so the grad student doing the data management doesn’t follow the plan because they don’t know it!
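Point 7 in particular lends itself to automation. As a minimal sketch (not something shown in the talk), the Python below writes a small provenance record alongside a set of results, capturing software versions, parameters and the input files with their checksums. The function names, the JSON layout and the example tool names are all assumptions made up for illustration.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(tool_versions, parameters, input_paths):
    """Collect the 'how' of an analysis into one JSON-serialisable dict.

    tool_versions and parameters are plain dicts supplied by the caller;
    this hypothetical layout is not a community standard.
    """
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "tool_versions": dict(tool_versions),
        "parameters": dict(parameters),
        "inputs": [
            {"file": Path(p).name, "sha256": sha256_of(p)}
            for p in input_paths
        ],
    }


def write_provenance(record, out_path):
    """Write the record as indented JSON so that humans can read it too."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

A call such as `write_provenance(build_provenance({"my_aligner": "1.2.3"}, {"threads": 4}, ["reads.fastq"]), "provenance.json")`, where the tool name and file names are placeholders, leaves a record that can sit next to the README from point 6.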

Day 4

For me, the highlight of the Monday talks was saved until the end, with the Genomics and Biodiversity session hosted by Parwinder Kaur (University of Western Australia). It started off with a whirlwind presentation of the entire history of Hi-C and its applications for understanding chromosome structures across the entire tree of life. I think Erez Lieberman-Aiden (Baylor College of Medicine) managed to successfully cram two hours’ worth of lectures into 20 minutes! He started with “Noah’s Ark: A Critique”, which was apt given the levels of rainfall seen in San Diego recently, and went right through to the use of Hi-C in the DNA Zoo project, which is creating reference genome assemblies for as many species as possible, as quickly as possible.

The pace barely slowed for the rest of the session, with 5 more great talks covering: genebanks and the cost of sequencing thousands of lines of a single species; ancient DNA assembly, and how Hi-C improved the mammoth genome to chromosome-scale scaffolds; the differences between black and white swan genomes; finding symbiont genomes within single-species datasets; and examining the genomes in the largest genus of trees. The session ended with the chair talking about the West Australian genome atlas, the aim of which is to sequence every native species, dead or alive.

Day 5

Two plenary speakers started the day on Tuesday: Ian Godwin (University of Queensland) with “New breeding technologies to deliver better sorghums: What could possibly go wrong?”, and Viviane Slon (Tel Aviv University) with “Digging for DNA”. Ian’s opening gambit was that you can’t have a good title without a colon, which set the tone for a talk delivered with a great deal of humour. Unfortunately, this did mean I slightly lost the details of the content, but the general idea was that they can now produce sorghums with a higher protein or oil content, larger grain sizes and so on, but that field trials of the product as a feed led to some unexpected results.

Dr Viviane Slon gave a fascinating talk showing how recent advances in sequencing technology have made it possible to sequence ancient hominid DNA from sediment, even in places where there is no visual evidence of hominid remains. For example, a typical excavation of a prehistoric site will generally turn up lots of tools and animal bones, but hominid remains rarely appear in those digs. The sediment might hold the key: maybe you can identify hominid DNA from it? Using extreme precautions to avoid modern-day human DNA contamination, the team took sediment samples from archaeological sites known to contain hominid remains, to see if they could find evidence of those hominids in the sediment. An amplicon approach looking for mtDNA worked, but how could they be sure it was ancient and not recent DNA? The observation that ancient DNA has changes of cytosine to uracil at the ends of fragments provided the means to distinguish between ancient and modern DNA sequences.
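To make that last point concrete: a sequencer reads uracil as thymine, so the damage shows up as an excess of C-to-T mismatches against the reference near the ends of reads. The toy Python sketch below measures that signal at the first few positions of a set of reads. It is a simplified illustration under stated assumptions (gap-free, pre-aligned reference/read string pairs, 5′ end only), not the pipeline used in the work presented, and the function name is invented for this example.

```python
def c_to_t_rate_by_position(aligned_pairs, max_pos=5):
    """Fraction of reference C bases read as T at each 5' read position.

    aligned_pairs: iterable of (reference, read) strings of equal length,
    already aligned without gaps (a simplifying assumption).
    Returns a list of length max_pos; an entry is None when no reference
    C was observed at that position.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in aligned_pairs:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [
        c_to_t[i] / c_total[i] if c_total[i] else None
        for i in range(max_pos)
    ]


# Toy data: two of the three reads show a C-to-T change at the first base.
pairs = [("CACGT", "TACGT"), ("CCAGT", "TCAGT"), ("CAGCT", "CAGCT")]
rates = c_to_t_rate_by_position(pairs)
```

In this toy data the rate is highest at the first position and zero further in, which is the shape of curve expected for damaged, ancient fragments; modern contaminant DNA would be expected to show a flat, low rate instead.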

Applied to a wide range of sampling sites, this technique could massively increase our knowledge of both hominids and other ancient species. Watch this space!

The Bioinformatics session featured several GigaScience Press authors, including the session chair Aleksey Zimin (Johns Hopkins University) and speakers Benedict Paten and Fritz Sedlazeck (Baylor College of Medicine), as well as Katie Jenike (of the Schatz lab). Katie presented the tool PANAGRAM, an interactive, alignment-free pangenome browser, i.e. it can visualise how multiple closely related genomes compare, both at the global scale and in detail at the gene level. Everything is pre-computed, so the end product is very fast and can run on a small local desktop or laptop. FASTA plus annotation files are used as input. They are processed to identify k-mer mappings, which can then be quickly mapped onto the genome in the browser using a tabix index.
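For readers unfamiliar with the alignment-free idea, here is a deliberately tiny Python sketch of it: for every k-mer along one "anchor" genome, count how many genomes in the set contain that same k-mer. It does not reproduce PANAGRAM's actual implementation (there is no pre-computed index, no tabix, no annotation and no reverse-complement handling), and the function names and toy sequences are assumptions for illustration only.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_conservation(genomes, anchor, k):
    """Count, per k-mer start position in the anchor genome, how many
    genomes (including the anchor itself) contain that k-mer.

    genomes: dict mapping a genome name to its sequence string.
    anchor:  name of the genome used as the coordinate system.
    """
    kmer_sets = [kmers(seq, k) for seq in genomes.values()]
    anchor_seq = genomes[anchor]
    return [
        sum(anchor_seq[i:i + k] in s for s in kmer_sets)
        for i in range(len(anchor_seq) - k + 1)
    ]


# Three toy "genomes"; real inputs would be FASTA files.
toy = {"A": "ACGTAC", "B": "ACGTAT", "C": "GGGTAC"}
counts = kmer_conservation(toy, anchor="A", k=3)
```

Positions where the count equals the number of genomes are shared by all of them, while low counts flag sequence that is specific to a subset. A per-position track of this kind is the sort of signal that can be plotted along a genome without ever computing an alignment.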

Day 6

The final day of PAG30 was opened by another friend of GigaScience, Oliver Ryder (San Diego Zoo Institute for Conservation Research). His plenary, entitled “Emerging options for conserving biodiversity”, highlighted a number of success stories in conservation that have already benefited from genome sequencing efforts. It also brought hope to other efforts by shedding light on the presence of genetic diversity in remaining populations, or even in cryobanks such as the Frozen Zoo.

Many of the topics touched on during Dr Ryder’s keynote were followed up in the Wildlife Genomics session later that day, including the work presented by Marisa Korody (San Diego Zoo Wildlife Alliance) on the efforts to bring the Northern White Rhino back from the brink of extinction.

The International Rice Informatics Consortium (IRIC) session was the last one I attended. It was chaired by another friend of GigaScience, Ken McNally (IRIC). Speakers brought the audience up to speed with the tools and data being made available, including a tantalising glimpse of the soon-to-be-released version 2 of the Rice SNPSeek platform, which already includes Rice Galaxy and will also have JBrowse2 integration, among other features.

The final day, as always, ended with a chance for everyone to let their hair down and party the night away on the dancefloor; see, for example, the tweet by fellow PAG-goer @bluesherpa. It never disappoints to see normally reserved senior scientists showing off their dance moves from yesteryear alongside the younger rising stars of the field.