OPTIMising Genome Assembly
This month brings new additions to our exciting and ongoing Optical Mapping series. Because of the short read lengths of the sequencing technologies that have underpinned genome assembly, we lack, outside of a handful of key genomes, reference genomes finished to a standard high enough to support comprehensive analyses. The rise of Optical Mapping (OM) has therefore been timely, and we have been promoting it through the nine papers published in the series to date. The latest are from the Nagarajan and Hillmer groups at A*STAR, who have just published OPTIMA, a new open source genome alignment method that is the first able to create indexes for continuous-valued mapping data while accounting for mapping errors, as well as an accompanying Data Note comprising related validation data. On top of aiding reproducibility, the supporting data in our GigaDB repository, from HapMap and colorectal cancer cell lines aligned to the human reference by OPTIMA, provide an excellent, high-quality resource for genome structure analyses.
In the 18 months since the series was announced, and the 12 months since the series page launched, the uptake of this technology has progressed in leaps and bounds. When the series launched, few publications used the technique for anything beyond assembling microbial genomes; since then there has been a huge increase in the number of groups and published studies using these and related methods to produce more accurate and “finished” assemblies of larger and more complicated genomes. Our Editor in Chief Laurie attended the Plant and Animal Genomes (PAG) conference in San Diego earlier this month and in 2015 to promote the series, and the amount of work showcasing these technologies has exploded. For those wanting to know more and dip their toes into optical mapping, many of the papers in our series provide handy starter guides on how to use it in the fields of plant comparative genomics or vertebrate genomes, on the state of play of computational methods utilizing it, and on image processing techniques for handling these massive datasets. Combining data and tools, a pipeline showcasing optical mapping approaches on a novel yeast (Dekkera bruxellensis) genome has also been published alongside its data.
Even before the series launched we showcased a number of BGI Optical Mapping datasets in our GigaDB repository. The Avian Phylogenomics Project has been a bit of a testbed for new sequencing approaches, and Data Notes from it have been published for the Ostrich and Budgerigar genomes. Because there is no established OM repository equivalent to the INSDC databases for sequencing data, these datasets are already picking up a lot of use, validating the utility of GigaDB. We have seen requests from researchers to add additional intermediate and raw datasets (see this plum data for example), and have worked with data producers to make these available.
Nice Genomes Finish Last Later
Being technology independent, we have showcased publications and data not only from the OpGen Argus Optical Mapping System but also from the BioNano Irys platform. Beyond mapping technologies, interest in finishing genomes has also been boosted by the rapid development of single-molecule and nanopore sequencing technologies with very long reads. We have published five papers with Oxford Nanopore data to date (the latest just out), including the LINKS method, which makes use of the sequence properties of nanopore and other error-containing sequence data to scaffold high-quality genome assemblies without the need for read alignment or base correction. Hybrid assemblies that combine cheap short reads with only modest amounts of long-read data offer the possibility of ultra-cheap, easy-to-assemble, high-quality finished genomes. The Bacteroides fragilis genome we recently published, assembled on commodity computing hardware by undergraduate students, demonstrates the democratizing potential of these technologies. PacBio has shown particular growth in recent times, and new Data Notes for Phytophthora, Chinese sage, and the mitten crab all showcase this data. We have much more in review and in press, so look out for optical mapping, long reads, and more “finished” quality genomes increasingly becoming the norm.
We extend a big thank you to our Editorial Board member and series Editor, Optical Mapping pioneer Prof David C. Schwartz, for his help in making the series such a success.
1. Verzotto D, Teo ASM, Hillmer AM, Nagarajan N. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis. GigaScience. 2016 5:2. doi:10.1186/s13742-016-0110-0.
2. Teo AS, Verzotto D, Yao F, Nagarajan N, Hillmer AM. Single-molecule optical genome mapping of a human HapMap and a colorectal cancer cell line. GigaScience. 2015 4:65. doi:10.1186/s13742-015-0106-1.
See the series page for the current content, and follow it as future papers are added: