A paper published in Nature Biotechnology today presents the most comprehensive catalogue of genes in any single microbiome to date. While the roughly 20,000 genes in the human genome have been available for over a decade, the gene catalogue of the microbiome, our much larger “other genome”, has so far been far less well understood and characterized. The team, including multiple authors from our host institution BGI, reports a staggering 9.8 million non-redundant genes across roughly 1,250 human gut microbiomes sampled worldwide.
The researchers combined metagenomic sequences from several previous large studies (MetaHIT, HMP and a type 2 diabetes study from BGI) with nearly 250 newly sequenced human gut samples, and added the complete genome sequences of 500 known gut-associated microbes.
The cohort includes European, American and Asian individuals, but notably lacks any samples from Africa or from other more isolated populations. This gap suggests that even more genes remain to be added to the catalogue as such samples are sequenced.
The authors report that an individual sample contains on average 762,665 genes, of which only 469 (again on average) are unique to that sample. This can be read as roughly 99.94% of the gene content of a typical sample being shared with at least one other sample. They also estimate that approximately one third of gene content is shared between any two samples in the cohort.
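For readers who want to check where the 99.94% figure comes from, the short sketch below simply divides the two reported averages. Note that this is an approximation: the ratio of two averages is not necessarily the same as the average of the per-sample ratios, and the variable names are illustrative only.

```python
# Averages reported in the paper (per sample).
mean_genes_per_sample = 762_665  # genes found in a typical sample
mean_unique_genes = 469          # of those, genes found in no other sample

# Approximate fraction of a sample's genes seen in at least one other sample.
# Dividing averages is a simplification of the true per-sample mean ratio.
unique_fraction = mean_unique_genes / mean_genes_per_sample
shared_fraction = 1 - unique_fraction

print(f"unique: {unique_fraction:.4%}")
print(f"shared: {shared_fraction:.2%}")
```

The shared fraction comes out at about 99.938%, which rounds to the 99.94% quoted above.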
GigaDB is hosting not only the complete reference catalogue, as both nucleotide sequences and amino acid translations, but also each individual sample's assembly and open reading frame predictions from which the catalogue was generated, along with links to the original raw sequence reads in the SRA for every sample processed. Together with the methods published in the paper, this should make the research reproducible by anyone with the compute power and desire to do so. It joins other useful resources for microbiome researchers in GigaDB, such as the data from the original type 2 diabetes study, and the software and example files for the EMPeror microbial ecology visualization tool.
Chris Hunter, Lead Biocurator, GigaDB/GigaScience
1. Li J, et al. An integrated catalog of reference genes in the human gut microbiome. Nature Biotechnology (2014). doi:10.1038/nbt.2942
2. Li J, et al. Supporting data for the paper: “An integrated catalog of reference genes in the human gut microbiome”. GigaScience Database (2014). http://dx.doi.org/10.5524/100064