Software Citation Comes of Age


Since the very start of GigaScience we’ve been strong proponents of Data Citation, helping to promote and practice the procedure of affording data the same importance in the scholarly record as citations of other research objects such as publications (see examples of this in GigaBlog and BMC Res Notes). These efforts by the wider community culminated in the Joint Declaration of Data Citation Principles, which came out of the FORCE11 community meetings and workshops that we regularly attend, and we endorsed these principles and included them in our author guidelines as soon as they were published in 2014. As we also promote and publish open software, we’ve asked authors to cite the software they use in their references too, but the specifics of how to do this have been less clear. Software is more dynamic than other citable research objects and can be referenced through many different sources, such as multiple URLs to the project website or code repository, user manuals, and publications that describe or introduce the software.

The FORCE11 Software Citation Working Group (of which we again were members) published the Software citation principles in 2016, written to encourage broad adoption of a consistent policy for software citation across disciplines and venues. The FORCE11 Software Citation Implementation Working Group followed this up with checklists for authors and developers of software. To bring this all together and really drive adoption of the practice, there was a need for more broadly applicable guidance on software citation for journals such as ours, and earlier this year we were involved in the publication of “Recognizing the value of software: a software citation guide” in F1000 Research. This peer-reviewed guidance document includes brief instructions on how software can be made citable, alongside a recommended format for software citation, and we have updated the author guidelines of the GigaScience and Gigabyte journals based on it.

Update your reference managers, a new SAMtools is in town
The main change these new guidelines bring for us is the inclusion of the software version in the citation, together with a recommendation that if an article describing the software exists, it should be cited as an additional reference on top of citing the software itself. The first examples we published using these guidelines are now listed in our Example reference style guidelines. As they come from a widely used project with a suite of tools and many versions, they are appropriately illustrative of how this should work. The SAMtools suite for manipulating sequencing data is one of the most ubiquitous in bioinformatics, acting as the “glue” holding together most genomics pipelines. GigaScience published the first update in 12 years, with two papers on what’s new in SAMtools that also describe for the first time the associated BCFtools and the HTSlib software library.
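As a rough illustration of the two points above (name the version, and cite a describing article in addition to the software itself), here is a minimal Python sketch. The `SoftwareRecord` fields, the `build_references` helper, the ordering of the citation elements and the "ExampleTool" entry are all assumptions made up for this example; they are not the journal's reference style, so check the Example reference style guidelines for the actual format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoftwareRecord:
    """Hypothetical metadata for one software release (field names are assumptions)."""
    authors: str
    name: str
    version: str
    year: int
    identifier: str  # e.g. a DOI or repository URL for the release that was used
    describing_article: Optional[str] = None  # pre-formatted reference, if one exists


def build_references(record: SoftwareRecord) -> List[str]:
    """Return the software citation, plus the describing article when there is one."""
    if not record.version.strip():
        # The key change discussed above: the version used must be stated.
        raise ValueError("a software citation should name the version used")
    # Element order here is illustrative only, not an official style.
    software_ref = (
        f"{record.authors}. {record.name} (version {record.version}). "
        f"{record.year}. {record.identifier}"
    )
    references = [software_ref]
    if record.describing_article:
        # The article is an additional reference, not a substitute for the software.
        references.append(record.describing_article)
    return references


if __name__ == "__main__":
    # Entirely made-up tool and article, used only to show the output shape.
    example = SoftwareRecord(
        authors="Doe J, Roe R",
        name="ExampleTool",
        version="1.2.0",
        year=2021,
        identifier="https://example.org/exampletool",
        describing_article="Doe J, Roe R. ExampleTool: a hypothetical tool. Example Journal. 2021.",
    )
    for reference in build_references(example):
        print(reference)
```

Keeping the software entry and the article entry as separate references means each can be counted and credited on its own, which is the point of the recommendation.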

Users may want to discuss or cite these resources in many different ways, and the new software citation guidelines and our reference style sheet provide insight and a common vocabulary for doing so. This applies whether you are discussing the original SAM format and first iteration of SAMtools presented in the first paper; the spin-off BCFtools and HTSlib software library that came out afterwards; highlighting the project website; citing the use of one of the many specific versions of the code; or discussing some of the bigger-picture themes covered in the 12th anniversary overview. The many options shown here will hopefully provide useful guidance on how authors should credit this and other software tools they use.

Software Citation examples

Don’t Bore Us, Get to the CHORUS
On top of providing detailed guidance, the working group has done extensive outreach to make sure it has been reviewed and adopted by a broad range of both commercial and society publishers. As part of this, CHORUS, a network of agencies and publishers working on open access policies and reporting, has created a centralized index of these journal policies with links to each publisher’s site. This follows a similar approach to their cataloguing of journal Data Availability Policies, which has become a useful resource for checking journal policies on data sharing. To get the word out to authors and beyond, they’ve also published guest posts in venues such as Scholarly Kitchen. We’ve carried out this outreach ourselves, promoting the guidance in a number of conference talks, including our “Recognizing the value of software in the COVID-19 era” keynote lecture at the LanBix 2021 translational bioinformatics meeting in Sri Lanka.

As with data, there is growing awareness that simply being available is not sufficient, and that shared research software needs to be FAIR: Findable, Accessible, Interoperable and Reusable (see the new FAIR4RS – FAIR Principles for Research Software). Software citation plays a part in this more-FAIR process by improving the transparency of research and supporting proper attribution and credit, reproducibility, collaboration and reuse, all of which encourage researchers to build on the work of others. These features align with the ethos and aims of GigaScience Press, and so we wholeheartedly encourage and endorse these efforts. We hope that this blog, alongside our new author guidelines, helps to further educate and promote them.

Further Reading
Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. 2016. Software citation principles. PeerJ Computer Science 2:e86 https://doi.org/10.7717/peerj-cs.86

Katz DS, Chue Hong NP, Clark T et al. Recognizing the value of software: a software citation guide [version 2; peer review: 2 approved]. F1000Research 2021, 9:1257 https://doi.org/10.12688/f1000research.26932.2