Guest posting: Optical Mapping allows comprehensiveness and scalability that modern sequencing cannot provide

To shed light on what the Optical Mapping System can provide for genome analysis, we present a guest posting from optical mapping pioneer and developer (and GigaScience Editorial Board Member) David C. Schwartz, Professor of Chemistry and Genetics at the University of Wisconsin-Madison.


Taking the Google Maps approach: providing comprehensive, scalable worldviews

We use maps in our daily lives to get around town and to explore new places, and Google mapping software has almost perfected the ways we can do this. One appeal of Google Maps is that you can seamlessly scale the resolution to suit the type of journey you’re plotting: use the street view for getting about town, perhaps to check buildings and store fronts, and check out the coarser highway maps for travelling between states or countries; you can even view the entire planet. Amazingly enough, everything visible seems to be there, resulting in truly comprehensive maps.

Is it possible to have Google Maps for the genomes that we sequence and study?

Short read lengths hobble modern sequencing approaches, so that de novo assembly of reference genomes, especially for animals and plants, is difficult. Dispersed and tandem repeats plague our analysis of complex genomes. Moreover, comparative analysis that is comprehensive in a way akin to a “molecular karyotype”, where sequence information spans from telomere-to-telomere at the resolution of a single nucleotide, is nearly impossible. Outside of a handful of complex genomes, we lack reference genomes that are actually finished to high standards (such as what we enjoy for the human) that can support comprehensive comparisons amongst populations.

What are we missing?

Comprehensive structural information. Aside from single nucleotide variants, small indels and copy number alterations (in lieu of structural information), it is rare during the course of most comparative studies that we completely reconstruct entire genomes from only sequence data. Consequently, the genomes analyzed in most comparative studies are incompletely known entities. Although such lack of comprehensiveness is regrettable for germline studies, the complete characterization of somatic alterations, as found in cancers, remains even more elusive. With solid tumors presenting constellations of complex genotypes, this problem becomes even more intractable. We now know from ENCODE and various other studies that regulatory functionalities are harbored within large, complex, and extensive non-coding regions of genomes, and these are also hot-spots for genomic rearrangements.

Optical Mapping enables long-range information—scalable genome analysis

Very long DNA molecules (200–2,000 kb) intrinsically yield long-range information, which captures and elucidates “dark matter,” or repeat-saturated regions within complex genomes. Such molecules exist in free solution as random coils and require fluidic manipulation to stretch them out for analysis either by direct imaging (fluorescence microscopy), or electronic detection. The Optical Mapping System harnesses these virtues and constructs de novo genome-wide restriction maps from the analysis of large datasets comprising very long, genomic DNA molecules, each one bearing a restriction map. Using map assembly algorithms and pipelines, similar to those used for sequencing, genomes are assembled, and/or analyzed for discovering structural variants.  Thus, entire chromosomes are covered, with few gaps from telomere-to-telomere. Scalable, comprehensive analysis emerges when genome-wide optical maps are combined with sequence data, in ways that begin to approach what Google Maps now offers.
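To make the idea of a molecule "bearing a restriction map" concrete, here is a minimal sketch in Python. It is an illustration only, not part of the Optical Mapping System or its pipelines: the function name `restriction_map`, the `cut_offset` parameter and the toy sequence are all assumptions made for this example. It performs an in silico digest, turning a sequence into the ordered list of fragment lengths that an optical map would, in principle, measure along a single stretched molecule.

```python
from typing import List


def restriction_map(sequence: str, site: str, cut_offset: int = 0) -> List[int]:
    """Return ordered fragment lengths (bp) after cutting at every `site`.

    `cut_offset` is the cut position relative to the start of the
    recognition site (a hypothetical parameter for this sketch).
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect every cut position, scanning left to right.
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


# Toy molecule with two EcoRI sites (GAATTC, cut after the G).
toy = "AAAA" + "GAATTC" + "AAAAAA" + "GAATTC" + "AA"
print(restriction_map(toy, "GAATTC", cut_offset=1))  # [5, 12, 7]
```

The ordered fragment sizes act like a barcode. Real optical maps are measured from images of far longer molecules and carry sizing error and missed or spurious cuts, which is why dedicated map assembly algorithms are needed rather than exact matching of the kind shown here.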

As a pioneer of optical mapping – can you give us a little insight into the history of this approach?

As an undergraduate, and later graduate student, I invented Pulsed Field Gel Electrophoresis (PFGE), and I then wanted to know exactly how it worked. At a Cold Spring Harbor Quantitative Biology Meeting (1982), Yanagida (Kyoto) showed a movie of single DNA molecules visualized by fluorescence microscopy. I decided that I would use this approach to visualize DNA gel electrophoresis (1985; Carnegie Institution of Washington; Dept. of Embryology). A few years later I did, but discovered that this analytical approach, imaging and manipulating single DNA molecules, was far more powerful than PFGE. Later, the human genome initiative was just starting and I thought that I could create restriction maps from single molecules as a way to map the entire human genome.

What triggered you to create this technology?

Back in 1987, I wanted to build human and mouse artificial chromosomes and thought I would first have to understand the structure of mammalian centromeres, which are repeat-ridden structures.  To understand these structures, I tried to develop a single molecule approach that would allow me to stretch individual molecules out for restriction digestion. I would then image these products through fluorescence microscopy.

In what ways has optical mapping proved useful for biology?

Two main contributions:

  1. As an independent means to validate and finish reference genomes
  2. Discovery and characterization of structural variants that escape other means of detection

For what future applications do you envision optical mapping being especially useful?

As throughput continues to increase, we will see OM approaches being used for large-scale population studies. We will also see OM used in the clinic, as a molecular approach that may supplant karyotyping, CNV chips, FISH, and possibly some applications of sequencing. With this said, at some point OM and sequencing will likely merge into a single platform. A major driving force here might be the simplification of computational analysis, facilitated by analysis of very large molecules, required to quickly render confident diagnoses for individuals.

Are there specific biological fields, disease studies, and/or biological processes that you feel would particularly benefit from having optical mapping as a part of a researcher’s standard ‘tool box’ of technologies?

Here’s a quick list:

  • Chromatin—epigenetic studies might benefit from OM-type approaches.
  • Cancer genomes, particularly those found in solid tumors, suffer from very partial knowledge of their genomic make-up.
  • Metagenomics—although sequencing can remarkably parse out many organisms and genic variations, we do not get a good sense of the structure of these collective genomes. Here, OM can power large-scale assembly of such microbial communities.
  • Single cell genomics—OM is a single molecule approach, which is ideally suited for single-cell studies.

So optical mapping has been around for over a decade; however, interest in using this method seems to have grown only over the last few years. Why do you think this is so?

Actually, OM has been around since 1993. I think we are now in the post-genic era, not the post-genomic one. As such, given all of the sequencing data we have recently accumulated, a large community is now empowered to consider genome structure and the vast non-genic portions that mark mammalian and plant genomes. Unfortunately, sequencing and copy number analysis can only partially elucidate this genomic space that is now apparent to many of us in our daily research activities, and this is why I think many are now looking to OM for answers.

Is there any area that you have been thinking it would be cool to use optical mapping technology for that is still in a formative stage?

I think that OM will probably become “EM”, or Electronic Mapping at some point. New electronic modalities, aside from blockade-type measurement schemes, are appearing that would obviate imaging.  Also, I think OM modalities could be repurposed for constructing synthetic genomes, or crafting novel types of materials comprising very long polymer molecules that are then “composited” at the single molecule level.

“Optical mapping: new applications, advances and challenges” is a new thematic series from GigaScience and is guest edited by David C. Schwartz. This cutting-edge series aims to shed light on new advances, applications, and challenges, and to improve data sharing and reproducibility in research utilizing this relatively new approach, aiding the development of new tools and standards. We encourage the submission of Research Articles and Technical Notes, as well as Data Notes, which are papers that focus on the description of interesting datasets, curated and hosted in our database, GigaDB. See the recent Data Note of the Budgerigar genome and accompanying optical mapping data in GigaDB as an example. We also consider thought-provoking Commentary and Reviews in this area.

Be sure to take advantage of this year’s waiver of the open-access article-processing charges, thanks to generous support from BGI. The waiver covers the series, as well as all submissions until January 2015, with savings of up to £1,250. Papers submitted prior to Sept 15 may be selected to be included in a highlight collection of papers at the January 2015 Plant and Animal Genome Conference.

References

  1. Teague B, Waterman MS, Goldstein S, Potamousis K, Zhou S, Reslewic S, Sarkar D, Valouev A, Churas C, Kidd JM, Kohn S, Runnheim R, Lamers C, Forrest D, Newton MA, Eichler EE, Kent-First M, Surti U, Livny M, Schwartz DC: High-resolution human genome structure by single molecule analysis. Proc. Natl. Acad. Sci. USA 2010, 107: 10848-10853.
  2. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B et al.: The B73 maize genome: complexity, diversity, and dynamics. Science 2009, 326: 1112-1115.
  3. Jo K, Dhingra DM, Odijk T, de Pablo JJ, Graham MD, Runnheim R, Forrest D, Schwartz DC: A single-molecule barcoding system using nanoslits for DNA analysis. Proc. Natl. Acad. Sci. USA 2007, 104: 2673-2678.
  4. Valouev A, Schwartz D, Zhou S, Waterman MS: An algorithm for assembly of ordered restriction maps from single DNA molecules. Proc. Natl. Acad. Sci. USA 2006, 103: 15770-15775.
  5. Dimalanta ET, Lim A, Runnheim R, Lamers C, Churas C, Forrest DK, de Pablo JJ, Graham MD, Coppersmith SN, Schwartz DC: A microfluidic system for large DNA molecule arrays. Anal. Chem. 2004, 76: 5293-5301.
  6. Ganapathy G, et al.: High-coverage sequencing and annotated assemblies of the budgerigar genome. GigaScience. 2014, 3:11. https://doi.org/10.1186%2F2047-217X-3-11