Opening a Cabinet of Curiosities in Montreal
Readers of this blog will know that every summer the GigaScience Press team heads to ISMB (the International Conference on Intelligent Systems for Molecular Biology), where the great and good of computational biology gather for the largest bioinformatics conference of the year. This is the meeting where GigaScience launched in 2012, which is why we always celebrate our birthday here. The heart of the scientific program is the many parallel Communities of Special Interest (COSI) meetings, and of these BOSC, the Bioinformatics Open Source Conference, is a particular fave of ours. We were again silver sponsors and participated in this track.
Another Giga-Birthday Ball, in Montreal
This year ISMB2024 was in Montreal, and the organisers (including our Ed Board Member Francis Ouellette) did a fantastic job putting together a program with plenty of French-Canadian flavour, with poutine stations at the opening reception, the 25th anniversary party for the Bioinformatics.ca community, and an Explore Montreal event on the Saturday evening. We got into the Québécois spirit, providing birthday maple-flavoured beignets on the OUP booth, and giving away GigaPanda t-shirts inspired by local boy Denis Villeneuve at our birthday party in Winnie’s bar in Old Montreal. It was another good turnout this year, and great to catch up with old and new friends to celebrate turning 12 (see the photo album of the trip).
Another famous Montreal export is Cirque du Soleil, and this year the show had a topical theme for the nearly two thousand computational biologists coming to the city. The show, KURIOS: Cabinet of Curiosities, covered blue-skies exploration and innovation, with a scientist and his team creating a machine that leads to a place where the craziest ideas and the grandest dreams lie waiting. Talking of crazy-idea-generating machines, as with last year Machine Learning was a dominant theme of the ISMB program, and on top of a number of keynotes and tracks in the other COSIs there was also a dedicated “Machine Learning in Computational and Systems Biology” COSI running.
Another Yearly Tech Check, in Quebec
On top of the usual community, standards and open science talks, BOSC had a heavy ML and LLM focus this year, with Mélanie Courtot and Andrew Su both covering the topic in their keynotes, and a panel featuring them, Larry Hunter, and Thomas Mboa on “Open Source AI/ML: A Game Changer for Bioinformatics?”. On the issue of responsible AI, there was an interesting debate between Larry and Andrew on the effectiveness of open versus closed source models. Larry argued that we need to be patient: the open models are going to lag in utility, but the closed models will eventually be skewed by advertising and will not be suitable for good science.
Increasing Transparency in AI/ML Research
On the second day of the BOSC COSI there was a specific session on “Open approaches to AI/ML”, and we participated in this, with our Data Scientist Chris Armit presenting on our trial using the DOME-ML standards and annotation wizard in our publishing process for Machine Learning submissions. You can see Chris’s slides in Zenodo, and the talk was our first public announcement on how we are using the DOME-ML standards to help us deal with the exponential increase of ML publications in biology. DOME stands for data, optimization, model and evaluation, which are the key components of an ML implementation, and DOME-ML proposes a list of minimal requirements that, when followed, help to assess the quality and reliability of reported methods more faithfully. This is coupled with a registry and an annotation wizard that help authors collect this key information and share it with reviewers and readers, who can then scrutinize it and find the key information, inputs and outputs they need. As Chris pointed out in his talk, if you submit this type of work to GigaScience you will see this workflow in practice, and there are already a number of papers that have passed through this process. In this example (“Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning–driven data analysis”) you can see that the DOME-ML annotations are highlighted and linked in the associated GigaDB entry, with the end products eventually curated and hosted in the DOME-wizard and DOME-registry (see this example). This trial is still underway and evolving, so watch this space for further updates and announcements from GigaScience and the DOME-ML consortium.
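For readers who like to see things in code, here is a minimal sketch of the idea behind a DOME-style completeness check. It is not the DOME wizard or registry itself: the four section names come straight from the DOME acronym, but the function name, the field names and the example values are all hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch: the four section names come from the DOME acronym;
# the field names inside each section are illustrative assumptions only.
DOME_SECTIONS = ("data", "optimization", "model", "evaluation")


def missing_dome_sections(annotation):
    """Return the DOME sections that are absent or have no filled-in fields."""
    missing = []
    for section in DOME_SECTIONS:
        fields = annotation.get(section) or {}
        # A section counts as filled if at least one field has non-blank text.
        if not any(str(value).strip() for value in fields.values()):
            missing.append(section)
    return missing


example_annotation = {
    "data": {"source": "public repository", "splits": "80/20 train/test"},
    "optimization": {"algorithm": "random forest"},
    "model": {"availability": ""},  # left blank by the (imaginary) author
    # the "evaluation" section is not provided at all
}

print(missing_dome_sections(example_annotation))  # ['model', 'evaluation']
```

A check like this only tells you that something has been written in each section; judging whether what is written is adequate is still a job for reviewers and readers.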
In the rest of the program there was a huge amount of interesting material to catch up on. The opening keynote from Fiona Brinkman really set the conference on an Open Science path, covering “Sensitive Sustainable Science” and drawing on her microbial genomics and infectious disease background to show how open science is evolving and building upon the FAIR principles and the CARE Principles for Indigenous Data Governance. As Open Science is also baked into the mission of GigaScience, talks like this remind us why ISMB is our favourite meeting. There was also a great keynote from Tandy Warnow on progress in large-scale phylogenomic estimation methods, presenting examples from many of the taxa-specific phylogenomics projects that she worked on and that we published the data for, including the Avian Phylogenomic Project and the 1000 Plant transcriptomes initiative (1KP).
On the last day of the meeting the ISCB Publication Committee organised a dedicated session on “Demystifying the World of Scientific Publishing”, as to most researchers the “under the hood” processes of how publishing works are every bit as much a black box as Machine Learning is. Co-hosted by our Editorial Board Member Francis Ouellette and Ragothaman Yennamalli, the session covered essential topics such as navigating the publication process, ethical considerations, different publishing models, and managing rejections, with Editors participating and presenting from many Computational Biology journals such as PLOS Computational Biology, Bioinformatics Advances, Database Journal, and ourselves. It covered many of our favourite topics, with Patricia Palagi presenting on reproducible publishing and FAIR, Alex Bateman again covering the use of LLMs in writing papers (see our recent policies updates on the topic), and our Editor-in-Chief Scott Edmunds covering the tricky topic of challenges with reviewers: reviewing fatigue, acknowledgement, and dealing with delays and rejections. He made a depressing topic a bit lighter by presenting it through the medium of films starring Canadian icon John Candy, and also provided suggestions for making the review process more scalable through applying the FAIR principles to it, making peer reviews more interoperable and reusable (see for example our preprint review and Sciety integration). The session then concluded with a panel, with lots of audience questions demonstrating how this is a topic badly in need of demystification.
Next year ISMB will be located in Liverpool, and we look forward to some further BOSC, Beatles and Bioinformatics experiences in 2025.
References
Carpenter EJ et al. Access to RNA-sequencing data from 1,173 plant species: The 1000 Plant transcriptomes initiative (1KP). GigaScience. 2019 Oct 1;8(10):giz126. doi:10.1093/gigascience/giz126.
Jarvis ED et al. Avian Phylogenomics Consortium. Phylogenomic analyses data of the avian phylogenomics project. GigaScience. 2015 Feb 12;4:4. doi:10.1186/s13742-014-0038-1.
Armit C. (2024). Trust and Transparency in Reporting Machine Learning: The DOME-GigaScience Press Trial. Bioinformatics Open Source Conference 2024 (BOSC2024), Montreal. Zenodo. https://doi.org/10.5281/zenodo.12752392
Walsh I et al. DOME: recommendations for supervised machine learning validation in biology. Nat Methods. 2021 Oct;18(10):1122-1127. doi:10.1038/s41592-021-01205-4.
Akshay A et al. Machine Learning Made Easy (MLme): a comprehensive toolkit for machine learning-driven data analysis. GigaScience. 2024 Jan 2;13:giad111. doi:10.1093/gigascience/giad111.