Below, in a guest blog, Professor Huanming Yang presents his thoughts on the next method that could be used to get around this issue, using human genome sequencing information as a mechanism to eliminate the need for those two steps. Prof Yang is one of the founders of BGI (our co-publisher), and has an interest in synthetic biology, previously publishing a guest blog with us on synthetic genomes.
An in silico system for storage of information using the human genome sequence
So far, all the proposed methods or ideas for DNA information storage require the following: 1) An in silico system with relevant programs to encode and decode the information. This converts information stored in binary code (the 0/1 of digital information) into DNA sequence (the A/T/C/G of nucleotides) and vice versa. 2) A DNA synthesizer to synthesize a series of DNA nucleotides, stored either in vitro or in vivo. 3) A DNA sequencer to read the DNA sequences, followed by a final step that converts the DNA sequence back into the machine-readable binary code from which it began. While this is a completely feasible system using current technologies, both steps 2 and 3 are very expensive and laborious, limiting the feasibility of this approach for broad use.
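As an illustration of step 1, here is a minimal Python sketch of converting binary data to DNA sequence and back. The two-bits-per-nucleotide mapping (00=A, 01=C, 10=G, 11=T) and the function names are assumptions made for this example; the post does not specify a particular encoding, and published schemes use more elaborate ones.

```python
# Assumed mapping for illustration only: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Convert bytes to a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Convert a DNA string produced by bytes_to_dna back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Hi".encode("utf-8")
dna = bytes_to_dna(message)          # "CAGACGGC" under the assumed mapping
restored = dna_to_bytes(dna).decode("utf-8")
print(dna, restored)
```

Each byte becomes four nucleotides, so the round trip is lossless as long as both sides agree on the mapping.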
Here, my colleagues at BGI and I propose a fully in silico system for storing information and archiving data using DNA sequence that will be universally applicable, reliable, convenient, and inexpensive. The major advantage of our proposed system is that it eliminates the need for both DNA synthesizers and sequencers. The basic premise of the protocol is to store information in the human genome sequence.
Our proposed system contains the following elements: 1) The information to store, which might be any text, images or video. This would be converted to a machine-readable binary code, followed by further conversion into DNA sequence. (This part is the same as in other proposed methods or ideas.) 2) A “template” sequence, such as the entire human genome DNA sequence or selected regions that are without known variations, e.g. a cDNA sequence. Here, each nucleotide, A/T/C/G, in the “template” sequence is labeled with a “position code”. 3) A computer program to (a) select each required nucleotide (N=1) or a string of several consecutive nucleotides (N>1) from the “template” sequence to form the “information” sequence (Fig. 1), (b) convert the “information” sequence into the binary code, and (c) reconvert the binary code back to the initial information to store. The selection in (a) relies on a relatively simple scheme composed of a “position code”, which indicates the position of a nucleotide in the “template” sequence, and a “grammar code”, which indicates, for example, spaces between words, the beginning or end of sentences, and other grammatical components.
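The “position code” idea in element 3 can be sketched in a few lines of Python. This is only one possible reading of the proposal: the toy template, the function names, the 0-based positions and the choice of the first matching position are all assumptions made for this example, and the “grammar code” is not modeled.

```python
from typing import List

# Hypothetical toy template; the post proposes the human genome or a cDNA.
TEMPLATE = "ACGATTGCAGGC"


def encode_positions(info_seq: str, template: str, n: int = 1) -> List[int]:
    """Map each n-nucleotide chunk of info_seq to a 0-based template position."""
    if len(info_seq) % n != 0:
        raise ValueError("information sequence length must be a multiple of n")
    positions = []
    for i in range(0, len(info_seq), n):
        chunk = info_seq[i:i + n]
        pos = template.find(chunk)  # first occurrence; any occurrence would do
        if pos == -1:
            raise ValueError(f"chunk {chunk!r} does not occur in the template")
        positions.append(pos)
    return positions


def decode_positions(positions: List[int], template: str, n: int = 1) -> str:
    """Rebuild the information sequence by reading n nucleotides per position."""
    return "".join(template[pos:pos + n] for pos in positions)


info_seq = "CAGACGGC"
codes = encode_positions(info_seq, TEMPLATE, n=2)
recovered = decode_positions(codes, TEMPLATE, n=2)
print(codes, recovered)
```

In this reading, only the list of positions needs to be stored or transmitted, and anyone holding the same template can recover the “information” sequence. Larger values of N mean fewer position codes per message, but each chunk must then actually occur somewhere in the template.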
In addition, the “template” sequence could also be any other reference genome or a human-designed synthetic sequence. An “internationally standardized template sequence” could be adopted and stored in DNA molecules either in vivo as a few kb DNA insert in a vector replicating in a living organism, or in vitro as synthesized oligonucleotides in lyophilized aliquots for convenient manipulation and long-term storage.
It is worth noting that this system could also be used to transfer protected or secret information at a high level of security.
In summary, this proposed system has several advantages: 1) it eliminates the need for expensive and laborious DNA synthesizers and sequencers, and 2) its capacity allows an unlimited amount of information to be stored. Finally, we specifically recommend use of the human genome sequence information because it is freely available, following the HGP Spirit of “Owned by All, Done by All, and Shared by All” proposed by us and endorsed by the HGP community, and because this DNA sequence will always be present in every human being, forever and everywhere. It is fitting that all of the accumulated knowledge of the human race can be stored in the material that encodes the information to build every human.
Feel free to add your feedback and comments via the blog commenting system, and communications should be addressed to <email@example.com>.