Publishing Citizen Science Data: Q&A with the Hong Kong Jellyfish Project

May 20, 2024

Today we publish a new Data Release presenting a dataset of jellyfish sightings collected by citizen scientists from 2021 through 2023 within Hong Kong waters. This is the first example where our curation team have worked with a Citizen Science project to share their observations in the GBIF biodiversity database. Here we have a Q&A with lead author John Terenzini on why the Hong Kong Jellyfish Project is important, and how the process of harmonising, curating and sharing the data went.

Problems with scientific data collection are hindering efforts to halt mass extinction and biodiversity loss. We require this data to produce accurate models, create policies based on these models to address this loss, and also to determine and stop those who are responsible. The GBIF (Global Biodiversity Information Facility) platform has become the go-to home for this type of data, growing massively since it was set up following an OECD recommendation in 1999 to now host close to 3 billion records. Because it is scalable, Citizen Science has become GBIF’s biggest source of data, particularly through the massively popular smartphone app-driven eBird and iNaturalist projects. These provide large volumes of data to GBIF but also introduce biases in the types of data collected (birds are heavily emphasized, for example). There are also some smaller and mid-sized citizen science projects sharing their biodiversity data in GBIF, but more of these are needed to fill persisting spatial and taxonomic biodiversity gaps. And there are challenges in teaching these many projects the best practices in biodiversity informatics so they are able to submit using the Darwin Core standards. We’ve recently published iNaturalist Data Release descriptions in GigaByte (see more in GigaBlog), but Citizen Science data can be collected in many other ways and via many other platforms, and also extends beyond biodiversity to include environmental and other datasets.

With generous support of the WHO, targeted data publishing has been an approach we’ve collaborated with GBIF on recently to incentivise more submissions of data on vectors of human disease. This included a few examples of getting Citizen Science data into GBIF, such as the Mosquito Alert project from Spain, and the Kissing Bugs and Chagas disease in the United States Community Science Program. See the recent Editorial from the vectors task group marking the end of the second phase of submissions for more insight into these efforts. As part of this process our curation team gained experience of handling and peer reviewing GBIF data, and GigaScience Press has also gained experience as a GBIF publisher, so we can help host data for projects that may not have an obvious regional node and publisher to work with.

The GigaScience Press Hong Kong Jellyfish Project GBIF landing page

To take this approach wider we’ve started to work with Citizen Science projects directly, and the first fruits of this have just been published in GigaByte: the observation data from the Hong Kong Jellyfish Project, together with a human-readable Data Release article describing it. It’s been noted that Citizen Science is biased toward vertebrates and terrestrial ecosystems and, even within each taxon, toward particular species groups that are easier to observe or have big communities of hobbyists spotting them. Bottom of the heap then are marine creatures and invertebrates, and this project is a perfect example of filling these gaps in under-represented taxa and parts of the world. We also have a soft spot for jellyfish (see a previous blog) and are happy to help boost the profile of these under-appreciated indicators of marine pollution and climate change.

To explain more we have one of our GigaBlog Q&As with author and founder of the Hong Kong Jellyfish Project John Terenzini, who will give some insight into the process of working with us to curate and archive this precious biodiversity data.

John carrying out a jellyfish survey in HK

Tell us a bit about Hong Kong Jellyfish Project. Why did you feel getting citizens involved in surveying jellyfish in Hong Kong was useful?

Citizen science has been shown to be effective in monitoring biodiversity at broad geographic and temporal scales. Because jellyfish occurrences can be infrequent or irregular, having “many eyes” looking for them across the breadth of Hong Kong is the most efficient way of discovering what jellyfish we have in local waters. Also due to the difficulties in conducting marine research, from equipment needs to higher costs than terrestrial research, using citizen scientists’ observations is a cost-effective method, with a low barrier to participation due to the prevalence of smartphones used by the general public.

Gathering data from this wider range of sources, are there any new findings that have come out of it?

It has been really exciting to have citizen scientists share their discoveries with the Hong Kong Jellyfish Project (HKJP). People’s curiosity about jellyfish really drives the whole project and through their photographs and reports, the HKJP has been able to publish several new species records for Hong Kong, including two new species records of box jellyfish to complement the new jellyfish species discovery in Mai Po by different researchers. A forthcoming paper by the HKJP will use citizen scientists’ reports and a review of the scientific literature to show that the jellyfish diversity of Hong Kong is far more extensive than previously known.

You’ve collected data from several different sources (your website, iNaturalist, and social media or email submissions), so what have been the challenges in bringing all this data together?

Due to the different nature of each of these data sources, the available information provided may be different and need to be organized in a similar manner. On the website, there is a form to use detailing exactly what information is requested (time, date, location, species, etc.), however on social media or through email, an observer may provide only one or two pieces of this information and requests for more information may be needed. On the website, the form is in English and has been translated to Traditional Chinese, allowing observers to use whichever language they feel most comfortable with. I hope this lowers any barriers to participation by making people comfortable sharing their observations in the language they prefer. However, as my main language is English, any observations in Traditional Chinese need to be translated, especially if there are additional comments.

Once data is collected, hopefully with photos and/or videos, it needs to be compiled into a similar format for analysis and any gaps in the data (e.g., a missing location) need to be addressed if possible. Jellyfish need to be identified to the lowest taxonomic level, preferably species, using the available photos/videos and information provided. Only after the dataset has been compiled and all gaps addressed can the data be used for analysis. Individual observations of new species records, for example, may require in-depth research into the existing literature or online resources. So, a great deal of time and effort is required to compile the observations into a complete dataset.
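As a rough illustration of the compiling step John describes, the sketch below renames fields from two differently shaped reports to Darwin Core terms and lists which key terms are still missing. It is not the HKJP's actual workflow: the Darwin Core term names (eventDate, decimalLatitude, decimalLongitude, scientificName, basisOfRecord) are real, but the source field names, the mapping table and the sample values are hypothetical and chosen only for this example.

```python
# Darwin Core terms that every compiled record should carry in this sketch.
REQUIRED_TERMS = ("eventDate", "decimalLatitude", "decimalLongitude", "scientificName")

# Hypothetical mapping from each source's own field names to Darwin Core terms.
FIELD_MAPS = {
    "website": {"date": "eventDate", "lat": "decimalLatitude",
                "lon": "decimalLongitude", "species": "scientificName"},
    "email": {"when": "eventDate", "latitude": "decimalLatitude",
              "longitude": "decimalLongitude", "jellyfish": "scientificName"},
}


def harmonise(source, raw):
    """Rename one raw report's fields to Darwin Core terms."""
    mapping = FIELD_MAPS[source]
    # Start with every required term empty so gaps are easy to spot later.
    record = {term: None for term in REQUIRED_TERMS}
    for field, value in raw.items():
        term = mapping.get(field)
        # Skip unmapped fields and treat empty strings as missing values.
        if term is not None and value not in (None, ""):
            record[term] = value
    # Sightings reported by people are human observations in Darwin Core.
    record["basisOfRecord"] = "HumanObservation"
    return record


def missing_terms(record):
    """List the required Darwin Core terms that still need following up."""
    return [term for term in REQUIRED_TERMS if record[term] is None]


# Hypothetical reports: a complete website form and a sparse email.
web = harmonise("website", {"date": "2022-07-01", "lat": 22.28,
                            "lon": 114.16, "species": "Aurelia"})
mail = harmonise("email", {"when": "2022-07-02", "jellyfish": ""})
print(missing_terms(web))
print(missing_terms(mail))
```

A list of missing terms like this is one way to decide which observers need a follow-up request for more information before a record can be added to the dataset.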

This is your first time submitting data to GBIF, and our curators and the Asia Regional Support Team have worked with you to get all of the data submitted there. How did you find this, and how much have you learnt in the process? And is it going to change the way you collect data in the future?

It has been quite a learning process to be a part of getting this data onto GBIF! A huge thank you to your curators, the GBIF Regional Support Team, and everyone involved in working out how best to execute this process and make the final result publicly available. It certainly required a team of people to weave the many strands together, from what to put into the dataset, to correctly formatting it, to getting it onto GBIF. There were so many things I did not know at the start of this process and learned from everyone involved. Knowing what is required to get data onto GBIF, especially given the different information received from the data sources you mention above, has prompted a rethink of how the data can be collected and compiled. I hope to implement these changes in future data processing.

How do you hope people will use this data? What sort of scientific questions does it help answer?

Jellyfish in general are an under-studied group of organisms and it would be great to know that my data is making a small contribution to illuminating a small part of our corner of the world. As the importance of jellyfish is increasingly recognized across the marine realm, it would be great to see this type of data used not only to advance our scientific understanding of local marine ecosystems, through jellyfish roles in food webs and ecosystem services, but also to educate the general public about these often-feared organisms as beautiful and essential components of the world’s oceans. It would be a key goal for this information to inform management practices in sectors as diverse as fisheries, industry, tourism and recreation.

As is so often true in science, this data only engenders more questions. Discovering what jellyfish are present locally is only the beginning of a much longer process of understanding what ecological roles they play, how they affect a broad swathe of the marine realm, and how their effects can be managed, especially in the context of Hong Kong’s marine policy.

What’s next for the HKJP and what do you plan to do with future datasets?

As the project continues, I hope to continue to develop the bigger picture about what jellyfish are present in Hong Kong, improving on existing datasets. Countries around the South China Sea are known to have even greater jellyfish diversity, so we certainly do not know everything present in Hong Kong. I will keep promoting jellyfish to the general public to increase knowledge and acceptance of these fascinating creatures. I also hope to advocate for greater recognition of jellyfish and improved research opportunities from tertiary institutions and government bodies.

Find out more about the project in this video with John, and if you are interested in participating or learning more see the Hong Kong Jellyfish Project website: https://www.hkjellyfish.com/

And if you have citizen science data of your own that you need help getting into an appropriate permanent archive such as GBIF please also get in touch with the GigaScience Press Editors and curators.

References
Južnič-Zonta Ž et al. Mosquito alert: leveraging citizen science to create a GBIF mosquito occurrence dataset. GigaByte. 2022 May 30;2022:gigabyte54. https://doi.org/10.46471/gigabyte.54

Shimabukuro P et al. Bridging Biodiversity and Health: The Global Biodiversity Information Facility’s initiative on open data on vectors of human diseases. GigaByte. 2024 Apr 11;2024:gigabyte117. https://doi.org/10.46471/gigabyte.117

Soares FM et al. Citizen science data on urban forageable plants: a case study in Brazil. GigaByte. 2024. https://doi.org/10.46471/gigabyte.107

Terenzini J et al. Jellyfish in Hong Kong: a citizen science dataset. GigaByte. 2024. https://doi.org/10.46471/gigabyte.125

Terenzini J, Fan Y, Liu M J, Falkenberg L J (2024). Jellyfish in Hong Kong: a citizen science dataset. Version 1.3. GigaScience Press. Occurrence dataset https://doi.org/10.15468/s4qwyk accessed via GBIF.org on 2024-05-20.

Scott Edmunds