The practice of data citation has taken another incremental step forward: this week, Nature Biotechnology included one of our dataset DOIs in its references for the first time. In "Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome", Zhiyu Peng et al. present a new pipeline that filters and compares RNA-Seq transcriptome and whole genome sequencing data to detect RNA-editing events. Much of the supporting data has been released pre-publication and hosted in our GigaDB database and, as RNA editing is still quite a controversial phenomenon, the greater transparency enabled by making all of this data publicly available is very welcome.
This "RNA-editome" is the latest "ome" (apologies to Jonathan Eisen) to come from the Yanhuang (YH) Genome project – named after the two emperors thought to be the ancestors of China’s largest ethnic group (hence the blog title and picture). After the publication of the YH reference Asian diploid genome in 2008, a peripheral blood mononuclear cell methylome and now RNA-editome have been released from the same anonymous Chinese donor. All raw data and assemblies have been made available through NCBI, and this has been complemented by these and additional datasets from the whole genome, epigenome and transcriptome being made publicly available in a citable form from our GigaDB database.
With the assistance of the British Library and DataCite consortium we have been releasing datasets (many pre-publication) with DOIs since the launch of our database last year, and we have already written much about the issues surrounding this relatively new form of data release in GigaBlog. Things have been hotting up in the data publishing field in the last few months, and while editorial policies regarding pre-publication data release in this manner are still unclear for many publishers, the wonderful people at the newly launched F1000 Research have been compiling a very useful list of journals that have now drafted policies.
On top of journals allowing data to be disseminated in this way, one of the key steps to making data citation work and be trackable is to actually cite the data in the references. While GigaScience data DOIs have previously been included in publications in Nature Biotechnology (two macaque genomes) and Science (the genome of an Aboriginal Australian individual), these were not listed in the references. Following on from the recent inclusion of data from the sorghum genome in the references of a Genome Biology paper, this is the first time we have managed to get DOIs listed in the references of a Nature journal. We’d like to thank the authors of the manuscript for making their data available in this way, and the editorial and production teams at Nature Biotechnology for working with us to include the DOIs.
1. Peng Z et al., Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome. Nat Biotech 2012, advance online publication.
2. Tian Z et al., (2011): Transcriptome from a lymphoblastoid cell line taken from the YH Han Chinese individual.
3. Hayden EC. Evidence of altered RNA stirs debate. Nature. 2011 26;473(7348):432.
4. Wang J et al., The diploid genome sequence of an Asian individual. Nature. 2008 Nov 6;456(7218):60-5.
5. Li Y et al., The DNA methylome of human peripheral blood mononuclear cells. PLoS Biol. 2010 Nov 9;8(11):e1000533.
6. J et al., (2011): Genome sequence of YH: the first diploid genome sequence of a Han Chinese individual.
7. Y et al., (2011): DNA methylome of human peripheral blood mononuclear cells from the YH Han Chinese individual. GigaScience. http://dx.doi.org/10.5524/100014
8. Yan G et al., Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat Biotech 2011 advance online publication.
9. Rasmussen M et al., An Aboriginal Australian Genome Reveals Separate Human Dispersals into Asia. Science 2011 Oct 7;334(6052):94-8.
10. LY et al., Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol. 2011 Nov 21;12(11):R114.