In this era of “big data” and supposed data tsunamis, being able to sift through vast swathes of information and find what you are looking for will become more and more important. Key to this process will be the use of controlled vocabularies and ontologies: properly tagging and labelling the data makes it findable. Because this is so relevant to our scope, GigaScience is attending the Bio-Ontologies Special Interest Group at the ISMB meeting in Vienna.
Making the data in our database and associated with our articles as searchable and usable as possible is a key issue we would like to focus on, and to that end we have been collaborating with the ISA-tab community and the Biosharing network to optimize our database and editorial policies.
Both are represented by our editorial board member (and co-organiser of the meeting) Susanna-Assunta Sansone, who explained how the Biosharing network grew out of the need to catalogue the hundreds of data standards and minimum reporting guidelines now out there. The ISA-tab project was also touched upon, and an interactive session was organized around the tools that we have also adopted to help authors report and manage their experimental metadata.
Several speakers presented new ontologies, as well as different approaches to using the semantic web and annotation techniques. Paolo Ciccarese presented DOMEO, a Document Metadata Organiser that allows semantic annotation of digital resources and, potentially, the sharing of those annotations by the community. Crowdsourcing is the potential goal of many of these projects, and is obviously one of the ways to deal with the vast amounts of data out there. Despite recent successes by BGI and others in crowdsourcing the data from the recent E. coli outbreak, this has so far been hard to achieve in practice for anything other than high-profile datasets. One notable exception has been the Gene Wiki project, and Andrew Su gave the day's keynote on cultivating and mining Gene Wiki for crowdsourced gene annotation. Because it links directly into Wikipedia, the project inherits all of Wikipedia's advantages (user base) and disadvantages (reliability), but it aims to harness the “long tail” of scientists (the many people who contribute only occasionally) for annotation. For a gene such as Fibronectin there can be 28,000 articles in PubMed, so taking advantage of the 10,000 gene “stubs” in Wikipedia and using community writing to turn these into detailed yet digestible review articles is potentially very powerful (see the page for Reelin as a great example of this).
Despite the advantages of the enormous user base, the drawback of a Wikipedia-based system is its lack of semantic representation. Using Semantic MediaWiki to add this externally may solve some of this issue, and Ben Good followed this talk by showing how genes can be linked to diseases through a wiki mashup of SNPedia with Gene Wiki. The fact that these applications share the same API makes this much easier in this case, and the growing arsenal of these types of tools will hopefully lead to further success in this increasingly important area.