DNA Day 2025: Where we are in the T2T Genomes Era, Pt 1

T2T genomes series

Today is International DNA Day, commemorating the day in 1953 when Crick, Watson, Wilkins and Franklin published their papers on the structure of DNA. Fifty years later, with the Human Genome Project declared close to completion, the US Senate and House of Representatives designated 25th April 2003 as the first DNA Day, and ever since many (including ourselves, through many blogs) have marked the date. In 2003 a draft, partially completed genome was deemed worth celebrating, as “the remaining tiny gaps were considered too costly to fill”. Fast forward to 2022, when the Telomere-to-Telomere (T2T) consortium finally published the complete sequence of a human genome.

Regular readers will have seen the launch of our T2T Series last year, showcasing new T2T methods and genomic data sets. The series has grown rapidly: it started with some methods papers but very quickly began publishing many plant and animal genomes, particularly of species of agricultural interest. This demonstrates both the rapid progress of genomics and the embrace of complete genome assemblies where they can be achieved. This DNA Day also coincides with the publication of the T2T Hong Kong Bauhinia genome (see blog). This explosion in purported T2T genomes has led to scrutiny and criticism of what really counts as a complete genome, and of whether T2T is an appropriate name for them. We have tried to address this issue by querying our series editors Jue Ruan and Fritz J Sedlazeck, alongside many of our Ed Board Members and other experts involved in the T2T consortium and T2T genomics. Our Ed Board Member Richard Durbin summarised the opinions of many: “My view is that T2T is a triumph, and very nice, but not essential for outstanding genomic science.”

Founders of T2T Genomes

The T2T Consortium, which declared “Mission Accomplished” for the complete human reference genome in 2022.

We asked a lot of technical questions on the metrics, benchmarks and thresholds we should require, and in the process gained a huge amount of insight into what the experts think: whether T2T is a suitable label, what the minimum requirements should be for a genome to truly be considered complete, and where genomics is likely to go next now that we have reached this supposed completion point. As a DNA Day treat, we are starting to share these comments in a few blogs so this insight can be more widely shared, and to provide some transparency on where our thresholds and policies are being set.

We received masses of feedback from many of the top experts, so, following our long-running series of Q&As, we are splitting it into a few different blogs. For part 1 we start with a methodologist's perspective, combining the comments of our Ed Board Members Guojie Zhang (Zhejiang University) and Jue Ruan (Agricultural Genomics Institute at Shenzhen) with those of T2T algorithmic expert Heng Li (Harvard Medical School). To learn about best practices for benchmarking T2T genomes, it's great to start here with the authors of many of the best tools being used for this purpose (see compleasm and GCI).

 

The T2T Consortium paper on the complete sequence of a human genome was published 3 years ago. In that time “T2T” has become a buzzword for complete genome references, which have become increasingly ubiquitous not just for humans but for other animal and plant species as well. Why do you think this move to sequence complete genomes has been so rapidly embraced, and do you feel “T2T” is a good label for the end products?

Heng Li: I think T2T is a good label if one can actually achieve that.

Guojie Zhang: The reasons are quite obviously advances in long-read sequencing and algorithmic breakthroughs. Standardized workflows for producing T2T genomes have significantly reduced technical barriers. But I would like to stress another important reason for pursuing T2T genomes: the biological insights they can unlock. Studies of centromere and epigenetic organization, complex regions like the MHC, sex chromosomes such as the human Y, and retrotransposon proliferation have all benefited greatly from T2T genomes.

Ruan Jue: Most people understand T2T to mean a complete genome assembly, so it is fine.

Guojie Zhang: Though T2T has become a trend and I fully appreciate the value of T2T genomes for every single species, I would like to point out that it should not be considered the only standard or a prerequisite for a genomic project. Scientific projects should prioritize question-driven design over assembly perfectionism. Many critical biological questions can be addressed effectively with a draft genome, rendering T2T unnecessary for numerous applications. In practice, T2T requirements impose prohibitive constraints: the need for megabase-long DNA excludes small organisms and archival specimens, while sequencing and computational costs could divert resources from broader sampling or multi-omics integration. Mandating T2T risks delaying conservation genomics for endangered species and disadvantaging researchers in resource-limited settings, inadvertently stifling methodological innovation.

Is “T2T” adequate as a label?

While catchy, “T2T” oversimplifies by emphasizing termini over other critical features (e.g., centromeres). A more precise term like “Complete Genome Assembly” (CGA) better reflects the inclusion of all chromosomal regions, but “T2T” remains pragmatically useful for its recognizability. Hybrid phrasing (e.g., “T2T-CGA”) could balance accuracy and branding.

Ruan Jue: I agree. Most people understand T2T to mean a complete genome assembly, so it is fine.

Over a decade ago the Assemblathon and Assemblathon 2 competitions highlighted the high degree of variability between genome assemblies, and helped kickstart more standardised ways to benchmark and assess reference genome quality. Do you think we are at a similar inflection point in genomics again, and what do you think are the minimal requirements necessary for a true T2T genome?

Guojie Zhang: Producing and evaluating a truly complete genome is undoubtedly challenging. All T2T genomes produced to date, including the human CHM13 genome, contain numerous problematic regions that require further refinement.

A T2T-CGA should represent a fully complete genome assembly. Compromising this definition by producing a semi-T2T genome is not meaningful. It is essential to recognize that a single standard should not be universally applied to all genome projects. Instead, quality metrics should be tailored based on the intended use of the genome, accommodating varying standards of quality. Again, T2T-CGA quality reflects technical capability, not scientific merit. Please don’t evaluate the quality of the study based on this metric.

Ruan Jue: T2T genomes aim to reveal the evolutionary rules governing highly repetitive regions, and their functional effects.

Heng Li: Assemblathon is outdated. Near T2T assemblies are supposed to have similar quality.

In a strict sense, even CHM13 is not T2T because the rDNAs are not assembled. I usually prefer “near T2T”. In my opinion, an assembly labeled near T2T should have all chromosomes in T2T scaffolds and some in T2T contigs. The number of gaps plus noticeable misassemblies should be below one per chromosome, I think.
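Heng Li's working definition of “near T2T” can be written down as a simple check. The sketch below is a minimal illustration, assuming per-chromosome gap and misassembly counts have already been obtained (for example by manual inspection or with tools such as Nucfreq or GCI). The `Chromosome` record and `classify` function are hypothetical names, we read “below one per chromosome” as total gaps plus misassemblies being fewer than the number of chromosomes, and unassembled rDNA arrays are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Chromosome:
    name: str
    t2t_scaffold: bool  # both telomeres present on one scaffold (gaps allowed)
    gaps: int           # unresolved gaps remaining in that scaffold
    misassemblies: int  # noticeable misassemblies found by inspection

    @property
    def t2t_contig(self) -> bool:
        # A T2T scaffold with no gaps is a single T2T contig.
        return self.t2t_scaffold and self.gaps == 0


def classify(chroms: list) -> str:
    """Label an assembly 'T2T', 'near T2T' or 'not T2T' (illustrative rule only)."""
    n = len(chroms)
    if all(c.t2t_contig and c.misassemblies == 0 for c in chroms):
        return "T2T"  # note: rDNA completeness is NOT checked here
    all_scaffolds = all(c.t2t_scaffold for c in chroms)
    some_contigs = any(c.t2t_contig for c in chroms)
    # Assumed reading of "below one per chromosome": total defects < n.
    defects = sum(c.gaps + c.misassemblies for c in chroms)
    if all_scaffolds and some_contigs and defects < n:
        return "near T2T"
    return "not T2T"
```

Under this reading, an assembly of three chromosomes, all in T2T scaffolds with a single gap between them, would be labeled near T2T.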

How big a problem is closing gaps, and what do you find as the best way to handle these?

Heng Li: Gap patching is case-dependent. If the assembly graph is contiguous, we can manually inspect and trace paths through the graph. We may align contigs around the gap to see if they overlap, or map ultra-long reads to see whether some of them go through the gap. These approaches may introduce misassemblies, so the results need to be validated carefully.
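The read-based route Heng Li describes can be sketched roughly as follows: locate gaps as runs of N in a scaffold, then look for ultra-long reads whose alignments anchor on both sides of a gap. This is a minimal illustration under stated assumptions: alignments are given as (read name, target start, target end) tuples for a single scaffold, such as columns 1, 8 and 9 of a PAF file; strand and ordering consistency are not checked; and the `window` size is arbitrary. The function names are hypothetical, and the reads reported are only candidates that, as he stresses, still need careful validation.

```python
import re


def find_gaps(seq: str) -> list:
    """0-based, half-open (start, end) coordinates of each run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def reads_bridging_gap(gap, alignments, window=5_000):
    """Names of reads that may go through a gap.

    gap: (start, end) as returned by find_gaps.
    alignments: iterable of (read_name, target_start, target_end) on the
    gapped scaffold. A read qualifies if one alignment covers the whole gap,
    or if it has one alignment ending within `window` bp before the gap and
    another starting within `window` bp after it. Strand and order
    consistency are NOT checked in this sketch.
    """
    start, end = gap
    left, right, through = set(), set(), set()
    for name, t_start, t_end in alignments:
        if t_start < start and t_end > end:
            through.add(name)  # a single alignment spans the gap
        if start - window <= t_end <= start:
            left.add(name)     # alignment ends just before the gap
        if end <= t_start <= end + window:
            right.add(name)    # alignment starts just after the gap
    return sorted(through | (left & right))
```

Reads anchored on only one side of a gap are not reported, because they cannot tell us what lies inside it.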

Guojie Zhang: This is a technical question I would leave to Ruan Jue.

Ruan Jue: Closing gaps is the biggest problem in T2T assembly. Most tools depend on advances in sequencing technology, but there has been little progress on sequence alignment and variant calling in tandem repeats. So current T2T genomes are the lucky ones.

Can you recommend any specific metrics for T2T genome quality, and are there any specific benchmarking methods that you recommend should be used for this purpose?

Heng Li: Lower BUSCO and compleasm scores can be excused if there is a well-reasoned explanation. Nucfreq and GCI are important for inspecting misassemblies, and I think their plots should be included as supplementary figures. I haven’t run these tools myself, so I don’t know which metrics to use.
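Nucfreq-style inspection rests on a simple signal: when reads from a single haplotype (or a haploid genome such as CHM13) are mapped back to the assembly, they should largely agree on one base at each position, so a well-supported second base hints at a collapsed repeat or a misassembly. The sketch below is a simplified illustration of that idea, not Nucfreq's actual implementation or interface. It assumes per-position base counts from read alignments are already available, and the function names and thresholds (`min_fraction`, `min_count`, `max_distance`) are illustrative assumptions.

```python
def flag_secondary_sites(counts, min_fraction=0.1, min_count=3):
    """Return positions where the second most common base is well supported.

    counts: sequence of dicts mapping base -> number of aligned reads
    supporting that base at each assembly position.
    """
    flagged = []
    for pos, base_counts in enumerate(counts):
        depth = sum(base_counts.values())
        if depth == 0:
            continue  # zero coverage is a separate problem, not scored here
        ranked = sorted(base_counts.values(), reverse=True)
        second = ranked[1] if len(ranked) > 1 else 0
        if second >= min_count and second / depth >= min_fraction:
            flagged.append(pos)
    return flagged


def merge_positions(positions, max_distance=500):
    """Merge sorted flagged positions into (start, end) candidate regions."""
    regions = []
    for pos in positions:
        if regions and pos - regions[-1][1] <= max_distance:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos])
    return [tuple(r) for r in regions]
```

Merged regions, rather than isolated sites, are what one would then inspect visually, which is where plots like those Heng Li recommends come in.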

Guojie Zhang: A true T2T-CGA should meet all of the following:

Telomere-to-telomere chromosomal spans
Zero gaps (number of contigs equal to the number of chromosomes)
Phased haplotypes with validated ploidy, particularly for sex chromosomes
Base error rate <0.01% (Q40)
Centromeric/telomeric repeats resolved
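The telomere and gap criteria in this list lend themselves to a quick sequence-level check. The sketch below is a minimal illustration, assuming the vertebrate telomere motif TTAGGG (many plants carry TTTAGGG instead, so `motif` is a parameter) and using arbitrary values for how much sequence to examine at each end (`end_len`) and what fraction of it must be telomeric repeat (`min_density`). Because sequences are read on the forward strand, the left end is expected to show the reverse complement of the motif (CCCTAA for TTAGGG). The function names are hypothetical, and the check says nothing about centromeres, phasing or base accuracy.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def repeat_density(window: str, motif: str) -> float:
    """Fraction of window covered by non-overlapping copies of motif."""
    if not window:
        return 0.0
    return window.count(motif) * len(motif) / len(window)


def t2t_status(seq: str, motif: str = "TTAGGG",
               end_len: int = 10_000, min_density: float = 0.5) -> dict:
    """Check one chromosome-scale sequence for telomeres at both ends and N gaps."""
    seq = seq.upper()
    left_ok = repeat_density(seq[:end_len], revcomp(motif)) >= min_density
    right_ok = repeat_density(seq[-end_len:], motif) >= min_density
    n_gaps = len(re.findall(r"N+", seq))  # each run of N counts as one gap
    return {
        "left_telomere": left_ok,
        "right_telomere": right_ok,
        "gaps": n_gaps,
        "t2t_contig": left_ok and right_ok and n_gaps == 0,
    }
```

Because reverse-complementing a whole chromosome maps the CCCTAA end onto TTAGGG and vice versa, the check gives the same answer whichever orientation the sequence is stored in.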

Ruan Jue: Base error rate ≤ 10⁻⁶; potential structural errors < 10 per chromosome, and structural errors may be more of a problem than base errors.
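The base-accuracy thresholds quoted here are easiest to compare on the Phred scale, where a per-base error rate P corresponds to a quality value Q = −10 log10(P): Q30 is an error rate of 0.1%, Q40 is 0.01%, and an error rate of 10⁻⁶ is Q60. The small helper functions below (hypothetical names) do the conversion and estimate how many base errors a chromosome of a given length would be expected to carry, which helps put a threshold such as fewer than 10 structural errors per chromosome into perspective.

```python
import math


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error rate (0 < rate <= 1)."""
    return -10 * math.log10(error_rate)


def qv_to_error_rate(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def expected_base_errors(length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a sequence of length_bp."""
    return length_bp * error_rate
```

For example, a 150 Mb chromosome at an error rate of 10⁻⁶ would be expected to carry about 150 base errors, while the same chromosome at Q40 would carry about 15,000.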

Now we have T2T-resolved human reference genomes (and a growing number of non-human ones); what is next for the field?

Guojie Zhang: Applying T2T-CGA to humans could really boost personalized genomic medicine. Everyone should have their own T2T-CGA in the future. And if producing a T2T-CGA for every single species is too ambitious, having one for a representative species of each taxon (order/family/genus) would be a highly valuable goal for biodiversity genomics research.

Heng Li: Automatic T2T assembly.

 

Watch this space for future parts of this series of posts, and please check out the T2T series page to follow how these efforts progress:

https://academic.oup.com/gigascience/pages/t2t-series-closing-the-gaps-from-telomere-to-telomere

References:

Nurk S, et al. The complete sequence of a human genome. Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987.

Bradnam KR, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013 Jul 22;2(1):10. doi: 10.1186/2047-217X-2-10.

Huang N, Li H. compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics. 2023 Oct 3;39(10):btad595. doi: 10.1093/bioinformatics/btad595.

Chen Q, Yang C, Zhang G, Wu D. GCI: a continuity inspector for complete genome assembly. Bioinformatics. 2024 Nov 1;40(11):btae633. doi: 10.1093/bioinformatics/btae633.
