Lessons From the “Data Publication Spring”: DataCite Summer 2012

Readers of this blog will be well versed in our and others' work using DataCite Digital Object Identifiers (DOIs) to cite data. This month's DataCite summer meeting in Copenhagen was a good opportunity to take stock of the many recent developments in data publication, the last six months having been particularly busy with announcements of new data platforms and data journals. On top of the many new data journals already highlighted in our blog (see this posting), Wiley-Blackwell has just entered the data-publication arena with Geoscience Data Journal (promoted at the meeting by their Royal Meteorological Society partners). Further data-publishing examples were also on show, with Vishwas Chavan from GBIF talking about the use of DOIs and data publication for biodiversity research, including GBIF's work with the publisher Pensoft on the launch of a number of data journals including ZooKeys (for more see the related BMC Bioinformatics special issue and video of his talk here).

Since last year's meeting (for our previous report see here) there have also been notable successes in the adoption of data citation and its acceptance by journals. In the last month, announcements have been made regarding the adoption of the DOI as an ISO standard (see here), and additional data platforms such as Figshare are now using DOIs. During the meeting the STM Association also signed a joint statement with DataCite to encourage publishers and data centers to link articles and their underlying data. The last year has seen many improvements and additions to the resources provided by DataCite, such as the metadata search engine and content resolvers. These functionalities and APIs are already proving useful: Andrew Treloar from the Australian National Data Service presented interesting examples of using the DataCite API to provide “related external dataset” functionality for their data portal (see video and slides).

Having been invited to speak in the “Different Flavours of Use” session, we took the opportunity to discuss GigaScience's experiences releasing and citing datasets through the GigaDB database. It is now a year since the release of our first dataset with a DOI, the genome of the nasty E. coli strain that led to so many deaths in Germany last year (the subsequent crowdsourcing of this data was just cited by the Royal Society “Science as an Open Enterprise” report as an example of “The power of intelligently open data”), so it was also a good time to look back at the downstream consequences and use of this data. Following on from our recent correspondence covering many of these issues, published in the BMC Research Notes “Data standardization, sharing and publication” series (also covered in our blog here), our presentation on “Adventures in Data Citation” talked through the topics covered in the paper in more detail. It also gave the very broad audience some perspective, from working at BGI, on the issues surrounding data sharing in genomics (slides are available here, and the video is also up).

Many of the speakers raised the point that data citation has the potential to give authors additional credit and incentives to invest the time and effort needed to make data available with sufficient metadata. One of the key concluding issues raised in our talk, however, was that this has so far been hindered by the major citation indexes not tracking datasets (for a good perspective on this see Heather Piwowar’s recent blog post). The announcement last week that Thomson Reuters are finally unveiling a data citation index is therefore a very well-timed response, and will hopefully remove one of the last obstacles holding back data citation from becoming standard practice.

In the final talk of the meeting, our editorial board member Susanna-Assunta Sansone also covered the issues relating to insufficient metadata and data interoperability, and presented the ISA-commons metadata tracking framework, which aims to help data curators and data submitters with these issues. Looking to the future, she highlighted an exemplar article in our forthcoming first issue next month that, for the first time, has been submitted to the journal with the metadata for all of its associated data files in ISA-Tab format (see her slides and video for more). We are pleased that the proofing of our first papers is almost over, and we will have a number of exciting papers and work to show very shortly. Watch this space for announcements over the next few weeks.

Further talks from the meeting can be viewed on the DataCite YouTube page, and additional write-ups and notes have been posted by Susanna Sansone and Sarah Callaghan.


Further Reading

Scott C Edmunds, Tom J Pollard, Brian Hole, & Alexandra Basford (2012). Adventures in data citation: sorghum genome data exemplifies the new gold standard. BMC Res Notes 5:223. DOI: 10.1186/1756-0500-5-223