Curators Capture Cambridge. Biocuration 2019

The 12th International Biocuration Conference was held in Cambridge, UK from April 7-10, 2019. We are regular participants, and you can read our write-ups of the meeting going back to 2012. The conference is a forum for biocurators and developers to discuss their work and to promote collaboration. GigaScience had a visible presence at Biocuration 2019, which included an oral presentation by GigaScience Lead Biocurator Chris Hunter, an engaging and thought-provoking workshop on ‘Equality, Diversity, and Inclusion’ chaired by GigaScience Data Editor and ISB Executive Committee member Mary Ann Tuli, and poster presentations by GigaScience Database Developer Xiao Si Zhe and GigaScience Data Scientist Chris Armit.

From Knowledgebases to Micropublications
Paul Sternberg, Director of Caltech’s Center for Biological Circuit Design and a Principal Investigator of WormBase, delivered his keynote talk on knowledgebases and highlighted that “articles are too complex to read, write, and ensure reproducibility. Data and knowledge does not flow into knowledgebases.” An intriguing solution, according to Paul, is the micropublication, as exemplified by the Micropublication Biology journal. This breaks up the research cycle in a similar manner to the way we at GigaScience give DOIs to more granular Research Objects such as data, protocols and workflows. In this scenario, authors are invited to complete a template, equivalent to a curation form, as part of the micropublication submission process. A feature of this submission process is that the methods for each publication take longer to complete. This is intentional, as it is hoped that more detailed methods will be captured per experiment than are found in regular journal articles. As an example of a micropublication that we are all familiar with, Paul showcased the 1953 Watson and Crick Nature paper, which introduced the famous DNA double helix in a single diagrammatic figure. Paul added the quip that “there is no data in this paper. To access the data, you would have to ask Rosalind Franklin.”

Elixir: for the many or for the few
This was followed by presentations on the Global Data Coalition by EMBL-EBI Director Rolf Apweiler, and on Core Data Resources by Elixir Director Niklas Blomberg. A main point that came across from this session was that reference datasets are the foundation of what Niklas refers to as “the data ecosystem”. However, with a funding crisis emerging, University of Cambridge’s Paul Schofield asked Rolf and Niklas an excellent question: “is there danger of government funding large resources at the expense of smaller resources?” This is a salient issue, as scientific research requires in-depth knowledge and expertise, and many of the smaller database resources were created to cater to the needs of a specific research community. These smaller resources are also the ones most under threat from funding cuts at present. Niklas acknowledged that “small databases have to obtain funding at national level, which is difficult” and that “Elixir helps small databases connect with larger databases.” However, I did not hear an explanation of how Elixir will financially support smaller database resources, or whether this is indeed its intention. Given the existing funding crisis, we have to consider this issue thoroughly unresolved.

Niklas Blomberg and Rolf Apweiler discuss Core Data Resources. This session was chaired by Sandra Orchard.

The challenge of choosing the correct Data License
The workshop on ‘Diverse Perspectives on Data Licensing’, chaired by Andrew Su, Monica Munoz-Torres and Raja Mazumder, was a welcome addition to this year’s Biocuration meeting. Licensing is an area of interest for us, as we strictly mandate CC0 public domain waivers for our datasets. Andrew Su explained that, from a data integrator’s point of view, data license restrictions impede progress and are ineffective. Specifically, the problem relates to a general misuse of licenses, whereby researchers add a CC-BY license to their data because they want attribution, and not because they actually want any rights over who can reuse their dataset. Francis Davey of Anderson Law LLP additionally presented the legal perspective on ownership of databases, and highlighted that the CC-BY license, which many academic institutes opt for by default, is a complex license as it includes a lot of different options. Niklas Blomberg added that a copyright-free CC0 license would stimulate “an undergrowth ecosystem of small companies”. The solution, according to Andrew Su, is a new type of license that he describes as CC0 (+BY), where there is an expectation of attribution rather than legal enforcement. This is effectively what we are doing at GigaScience, where our CC0 datasets have a terms of use page requesting attribution via research etiquette and good conduct rather than legal means.

Equality, Diversity, and Inclusion (EDI) in Biocuration
The Biocuration 2019 workshop on ‘Equality, Diversity, and Inclusion’ was chaired by GigaScience’s Mary Ann Tuli. The introductory slides explained what these terms mean and how they are being embraced by scientific institutes in different countries. This was followed by a more in-depth and very informative presentation by the invited speaker Dr Saher Ahmed, head of EDI at the Wellcome Sanger Institute, Cambridge, UK. Dr Ahmed discussed gender discrepancies in the workplace, and highlighted some efforts at Sanger to address these issues, such as pay transparency, changes to their leave policies, and creating a family-friendly workplace.

The remaining time was spent on lively discussion among the 30 attendees on issues such as the gender pay gap; maternity, paternity and carers’ leave; cultural differences in working practices; and accessibility. As an outcome of this workshop, attendees agreed that there is a need for the ISB to create an EDI subcommittee and that this workshop should be held at subsequent Biocuration meetings. The EDI subcommittee is currently being formed, and its exact roles are yet to be defined, but it will address issues including a code of conduct for the Society as a whole and for conferences, and accessibility at conferences and for ISB activities.

Equality, Diversity, and Inclusion at Biocuration 2019. Image credit: George Georghiou.

GigaScience leads the way in Discoverable Data
During the Database session, GigaScience Lead Biocurator Chris Hunter presented on the ‘Increased interactivity and improvement to the GigaScience Database’ (slides here) and highlighted how web tools that have recently been incorporated into the web interface are making data easier to access. The work presented has just been published in Database Journal’s Biocuration 2019 virtual issue, and covers the latest improvements to our GigaDB repository. Chris also joined Lynn Schriml of the University of Maryland School of Medicine in a discussion on ‘Expanding MIxS Genomic Minimal Information Standards’. If you are not aware of the Genomic Standards Consortium (GSC), or have not seen our series of papers from them, the GSC is a ‘grass roots’ voluntary effort that aims to make genomic data discoverable by developing metadata standards that render datasets computationally comparable. Chris Hunter is on the GSC Board and ensures that GigaScience adheres to these metadata standards.

Chris Hunter presents improvements to the GigaScience Database. Image courtesy of Dr Sara El-Gebali, EBI.

FAIRsharing and the importance of Data Standards
On a related note, keynote speaker and GigaScience Editorial Board Member Susanna-Assunta Sansone delivered a presentation on FAIRsharing, which is a registry of interlinked meta(data) standards, repositories, knowledgebases, and policies, plus a set of tools and services that enable discovery and visualisation of these resources. Covering the new FAIRsharing community network that we wrote about last month, Susanna highlighted that private sharing of data is more common than public sharing of data, and that this is to the detriment of the research community as a whole. Of interest, according to Susanna’s metrics, Poland and Germany were significantly better at public data sharing (75% or greater), whereas the UK and USA did not score as highly (<60%). From a global perspective, I was interested in one of the points raised by Jiawei Cui of the Chinese Academy of Sciences (CAS), who delivered a talk during the following session on ‘Data Standards and Ontologies: Making data FAIR’, and who concluded her presentation with the following statement:

“Compared with foreign countries, the development of biomedical scientific data metadata standards is not systematic, and the application has not reached expectations. Many standards are still in the exploration stage. Therefore, in the future, we should increase cooperation with foreign frontier institutions to expand the construction and application of metadata standards in the biomedical field.”

This coming together in the spirit of cooperation is the very essence of why the International Society of Biocuration exists, and we look forward to more collaboration of this nature guided by a growing international community of biocurators and developers.

The 13th International Biocuration Conference will be held in Bar Harbor, Maine, USA from May 17-20, 2020. We look forward to seeing you there.

Further Reading
Xiao SZ, Armit C, Edmunds S, Goodman L, Li P, Tuli MA, Hunter CI. Increased interactivity and improvements to the GigaScience database, GigaDB. Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz016.