Archiving blogs: Rogue Scholar and lessons from Biome

GigaBlog is now archived in Rogue Scholar, a new service that provides what it calls “science blogging on steroids” by offering full-text search, long-term archiving, DOIs and metadata for science blogs such as ours.

While this July at ISMB in Lyon we celebrated the 11th anniversary of the launch of our first articles, it was actually the 12th anniversary of the launch of GigaBlog, the blog of GigaScience Press. As outlined in our first ever GigaBlog post, the hope was that the blog would be a forum to supplement and enhance the content of the journal: “Postings will provide updates on the progress of the journal, introduce the GigaScience team and Editorial Board, report on conferences, and provide news on the many current issues surrounding the handling and use of large-scale data and high-throughput biology”. Once the journal launched we had plenty of interesting content to highlight in our long-running series of author Q&As, as well as guest posts from authors, reviewers and other interesting experts with something relevant to get off their chests. We also used the blog as a platform to expand upon important Editorial policy issues and changes, and as a forum for announcements and updates from open science projects. With over 300 postings in 12 years we have collected many favourites, which we will highlight in a future post. We’ve built quite a readership over the years, with many posts picking up thousands of accesses and receiving positive feedback and comments from readers. Eating our own Open Access dogfood, since 2013 we’ve made sure our blog content is also under an open CC-BY license requiring only attribution, encouraging and streamlining re-use and quotation of our content (which also came in handy in streamlining the integration of our content into Rogue Scholar).

We now know from experience that keeping a blog live and running for 12 years is not an easy task, especially with multiple blog platform migrations in that time requiring a lot of work fixing formatting, links and images. In the first 5 years of our history GigaScience was co-published by BioMed Central, and when they changed blogging platforms many of the links in our posts broke and had to be fixed manually. When we moved to Oxford University Press, as the blog was hosted in WordPress, we were able to export the WordPress XML file and deploy it to our own WordPress installation, so we wouldn’t have to depend on third parties and could control the blog ourselves. Again, many links and images were broken despite BMC helpfully providing many redirects, and the move introduced inconsistencies in tags and other metadata that required a lot of fixing. If we were less motivated to preserve the efforts of all of our authors, it would have been easier to let the blog die and maybe start again. These travails made the launch of the fantastic Rogue Scholar by Martin Fenner so attractive as a solution to the blog stability issue.

Treating blog posts like journal articles and giving them DOIs means these links are stable and can be redirected in the future whenever the underlying posts migrate somewhere else. Rogue Scholar also includes features like Fediverse integration, and GigaBlog will be getting its own Mastodon feed (to join the GigaScience and GigaByte feeds already set up).

Linking the DOIs to important metadata makes the posts citable and trackable via ingestion into CrossRef. As well as the blog links and metadata, the text itself is archived and indexed, offering full-text search and capture of tags and images, with any references also shared with CrossRef. Rogue Scholar does all of this by harvesting our content and metadata via RSS, and Martin seemingly effortlessly handled all of the setup for us. The service is currently free for up to 50 posts per year, and the only extra we had to do was pay $1 per post for our back catalogue of archival posts. Currently nearly 50 science blogs are hosted, including many we’ve been following over the years from active scientists, scholarly comms experts and other journal blogs such as ours. If you have a blog you’d like to archive, contact Martin and join the growing Rogue Scholar network.

How not to maintain blog content: the biome experience
Maintaining a blog is not easy work, but the continuing popularity of the medium shows that there is value in this form of communication, even if science blogging networks have risen and fallen (goodbye Guardian Science Blog Network) and many informative and hilarious science blogs we loved to follow have bitten the dust (please come back Opinionomics and The Science Web). Science blogging provides a space for scientists and others involved in scholarship to better communicate what they are doing and how the scientific process works, addressing a key issue of improving openness and inclusivity in science. This need was highlighted by UNESCO in their Open Science Recommendation (see our Editorial on this), which said that this type of Open Science “opens the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community”.

There’s obviously value in these efforts, and as publishers claim their role is to “add value”, many have risen to the challenge and set up a plethora of blogs and blogging networks covering their journals. While many of these blogs are just going through the motions, reposting press releases and making short administrative announcements, some put in the effort to write and commission insightful posts that build upon their published content and add context and background to it. One such effort was our former co-publisher BioMed Central’s “biome”, which leveraged their blogging platform but was positioned as a journal that “brings together a selection of new insights from across the entire spectrum of biology and medicine”.

This included research synopses, author spotlights and videos (e.g. our Editorial Board Member Carole Goble on Open Source software), featured articles and more. As BioMed Central (now just “BMC”) has an evolving portfolio of some 300 peer-reviewed journals, this provided a great means of highlighting important and interesting content that might otherwise get buried in the firehose of hundreds of articles published each day.

As one of the journals in their portfolio when biome launched in 2013, we embraced the chance to use this excellent forum, and we, our authors and our Editorial Board Members contributed lots of content. Biome ran for two years, was promoted with print versions at conferences, and definitely helped engage and widen the readership of our journal and other BMC journals. Then in 2015, without warning or explanation, biome was pulled by BMC and deleted from the web. As this was carried out suddenly and without announcement, there was no way to salvage and copy over any of the content, and it is now lost in the void (or at least everywhere but the Internet Archive Wayback Machine) forever.

That was the biome that was. Historic screencap of how it looked before deletion. CC-BY BMC

While there is no legal obligation for publishers to maintain science blogs in the same manner as their scientific articles (for example, BMC and GigaScience Press subscribe to CLOCKSS to enable dark archiving of content in case of journal closure), it’s still bad practice that the supposed guardians of the scientific “version of record” cannot maintain and archive their own records and back catalogue of blog content. This is especially true as this was branded as “biome journal” rather than a plain vanilla blog, and we and others put a lot of time and effort into providing content for it. While the reasons for closure were never given, BioMed Central had been acquired by a much larger publisher contending with large amounts of debt, the profit-margin requirements of the private equity firms that own it, and multiple failed IPOs, so its priorities in how much additional “value” it needs to add to the scientific community are probably now different from when it was a smaller publisher. See also the sorry tale of their previously “turtley cool” OA mascot Gulliver Turtle.

OA mascot Gulliver was a popular attendee at scientific conferences, but after BMC’s acquisition there were many rumours about his sudden disappearance.

If Rogue Scholar had existed in 2015 this might have been a different story, as it provides a great example of what should be the standard for science blogs now, and Martin has also provided a much more streamlined and simple method of carrying out the archiving and DOI registration process. We’d like to thank Martin for his excellent work here, and would encourage other science blogs to use Rogue Scholar (and also support his efforts to provide archival DOIs for the many useful blogs he’s already included). Going forward, if readers ever want to link to or cite our posts, we would recommend using the Rogue Scholar DOIs rather than the blog URLs to future-proof their long-term accessibility.

References

Anderson G. GigaScience, Giga-database and now GigaBlog: new resources for the big-data community. GigaBlog. 2011 Jul 6. https://doi.org/10.59350/r3pva-55v87

Edmunds SC, et al. A Decade of GigaScience: Milestones in Open Science. GigaScience. 2022 Jul 12;11:giac067. https://doi.org/10.1093/gigascience/giac067