GigaScience, Giga-database and now GigaBlog: new resources for the big-data community

As biological data is now produced faster than it can easily be handled and stored, the dissemination of this data has become a major bottleneck. GigaScience, a new type of journal from BioMed Central and BGI, starts taking submissions today with the goal of addressing many of the issues surrounding "big data". As the world's largest genomics center, BGI is no stranger to these issues. Much of the rationale for, and many of the features of, the GigaScience journal and its associated database are presented on our website. Our scope covers any "large-scale" biological or biomedical data, and the "(Giga)n" refers to gigantic rather than to a specific number. So one important question is how exactly we define "large-scale". Unfortunately, the answer is: it depends.

What makes something big data varies greatly from field to field, and it also changes rapidly with technological developments. This is a question we will regularly be asking our editorial board and scientists in different research communities. Rather than constantly changing this information in our instructions for authors, we feel a blog is a better forum for this type of open-ended discussion and a better way to keep our readers and authors updated. We also want to hear your thoughts on what constitutes "big" data. This is especially true for areas not generally thought of as having large-scale data resources, such as cellular development, with its myriad imaging data types; neuroscience and electrophysiology; and cohort studies, whose metadata raise many permissions issues that still need to be discussed and resolved.

With this first post, published as a guest on the BMC blog, we'd like to welcome you. We hope our future blog discussions will supplement and enhance the content of the journal. Upcoming posts will provide updates on the journal's progress up to its formal launch in November, introduce the editors and editorial board, and report on conferences. They will also cover the many current issues surrounding the handling and use of large-scale data and high-throughput biology. The blog will also highlight interesting datasets deposited in our database and new types of large-scale data from different, potentially unexpected, biological fields.

As part of our prelaunch activities, GigaScience has just released its first datasets. Each is marked with a citable DOI and has no restrictions on use. These datasets include the sequence and assembly data for the strain behind the recent deadly E. coli O104 outbreak, from BGI and the University Medical Centre Hamburg-Eppendorf. They also include data for seven large vertebrates sequenced for the Genome10K project, a worldwide collaborative effort to sequence 10,000 vertebrate genomes: the Giant Panda, the Chinese Rhesus and Crab-Eating (Cynomolgus) Macaques, the Polar Bear, the Emperor and Adelie Penguins, and the Domestic Pigeon. The E. coli O104 data show the usefulness of this novel approach of releasing data rapidly, before a manuscript is published. Because the data were released while they were still being generated, the research community immediately began to "crowd-source" their analysis, and this has already aided the fight against the outbreak.

We want to give special thanks to the international group of researchers who took this important step toward balancing the wider community's need for access to the data with their own need to obtain credit for their work. We would also like to thank BGI and BMC for their support and help in setting up this venture. Our appreciation goes to DataCite and the British Library for working to provide DOIs for our associated datasets, and to the ISA-Tab team for helping to standardize our data-submission system and make it more adaptable and ISA-Tab compliant. We'd also like to thank our growing editorial board for their present and future support.

We are excited about this new endeavor and look forward to working with the entire community to speed research, promote open access, and help make these important resources permanently available for use and reuse.

Laurie Goodman, Editor-in-Chief
Scott Edmunds, Editor
Alexandra Basford, Assistant Editor


