As the ecology community expands, it is adopting new, more integrative ways of making sense of the plethora of data produced by diverse approaches, including ocean research, eco-genomics, limnology, and macrosystems ecology – improving our understanding of biology in a broader sense. This also presents new challenges, including data hosting and sharing, both to improve collaboration between the multiple disciplines within ecology and to improve reproducibility and data reuse.
We always emphasize that data sharing is essential to making research more reproducible and reusable in the broadest sense. Camera trap data in particular have been difficult to share, owing to their extremely large size – the authors of the Serengeti camera trap dataset previously highlighted that there were no venues that could handle such large datasets – until now.
In new work published in GigaScience’s Data-Intensive Ecology series, the authors deployed several camera traps in the Carrizo Plain National Monument (San Joaquin Desert, California, USA) and captured thousands of images of several species, including jackrabbits, squirrels, and the endangered blunt-nosed leopard lizard. The entire 248 GB dataset is freely available in our repository, GigaDB. Here we ask co-author Taylor Noble and project lead Chris Lortie (also guest editor of our Data-Intensive Ecology series) why they’re so interested in ecology and large-scale data sharing.
Can you tell us a bit more about yourselves, and the work you do at the Department of Biology at York University?
I am a Master’s student at York University. I study interactions between shrubs and lizards, specifically the blunt-nosed leopard lizard. I am interested in habitat use and behavior of the lizards in and around the shrubs.
What attracted you to work in integrative ecology, and what made you want to start using camera traps in your research?
I’ve always been interested in animals. My family keeps several camera traps on our land to see what is in the area so I had been exposed to camera traps prior to using them in the study.
People usually think of camera traps collecting data on elusive mammals and birds, so how did you end up using them to film shrubs in the desert? What were you hoping to capture by doing this?
Camera traps have been used to survey for all types of animals, including reptiles. After reviewing the literature, we decided they would be a good technique to use in Carrizo, allowing us to continuously survey for lizards as well as any other animals in the area.
What was so special about the Carrizo Plain National Monument that made you study the biodiversity of the San Joaquin Desert?
Carrizo Plain National Monument is the largest remaining patch of the San Joaquin Desert. So much of this area has been developed or disturbed in some way that this is one of the few places where you can see what California was like in the past. For many San Joaquin species, like leopard lizards and kit foxes, this is the largest piece of habitat they have left.
In particular, you aimed to observe the blunt-nosed leopard lizard – why are you so interested in this species? Are there challenges in capturing footage of cold-blooded species like reptiles? Before you set up your cameras, how likely did you think it was that you would capture footage of them?
As an endangered species, the blunt-nosed leopard lizard needs all the help it can get to survive. Reptiles have been captured by camera traps in several other studies, and other members of our lab have captured images of them before, so we were not worried about the technique. The real question was how large the population of lizards in the area was. We knew there were leopard lizards in the area but didn’t know how many. We thought we would get images but weren’t sure how many.
The amount of data collected reached over 400,000 camera trap images and about 250 GB once compressed. In your author video you say it takes a long time to process these types of images, so can you explain a little what this process currently involves? Once you have the data processed, what have the challenges been in analyzing it?
The process is very simple: look at the picture and find the tan lizard against the tan background. The challenging part is how long it takes. Going through 400,000 images takes a long time, and you have to take your time – you don’t want to miss any animals, and most of the species in the area are pretty cryptic. Once you have determined which images are positive hits, you have presence/absence data for your species. Depending on the design of your study and what other data you have collected, you can look at habitat use, species distribution, and in some cases behavior.
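The workflow Taylor describes – tagging each image as a positive hit, then collapsing those tags into presence/absence records per camera – could be sketched roughly as follows. This is a hypothetical illustration, not the authors’ actual pipeline; the camera IDs, species names, and data layout are all invented for the example.

```python
from collections import defaultdict

# Hypothetical manual tags: each image that was a "positive hit" is
# recorded as (camera_id, species). Images with no animal in them
# simply never appear in this list.
positive_hits = [
    ("cam01", "blunt-nosed leopard lizard"),
    ("cam01", "jackrabbit"),
    ("cam03", "squirrel"),
]

cameras = ["cam01", "cam02", "cam03"]
species = ["blunt-nosed leopard lizard", "jackrabbit", "squirrel"]

def presence_absence(hits, cameras, species):
    """Collapse per-image hits into a camera x species 0/1 table."""
    seen = defaultdict(set)
    for cam, sp in hits:
        seen[cam].add(sp)
    return {cam: {sp: int(sp in seen[cam]) for sp in species}
            for cam in cameras}

table = presence_absence(positive_hits, cameras, species)
# cam02 produced no positive hits, so its row is all zeros
```

A table like this is the starting point for the habitat-use and species-distribution analyses mentioned above, since each camera’s location and surrounding habitat can then be joined onto its row.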
What about the challenges in sharing it? Researchers behind other attempts to share large-scale camera trap datasets have complained that there are no archiving systems available for storing the raw images, but this example shows it is possible. What do you hope others will do with this data?
The challenges in sharing it are making sure the images are organized in a fashion that is understandable to others, and the size of the files, which affects upload time (long) and storage needed (a lot).
We hope this data will be useful to anyone studying similar species or doing research in the same area. Anything that makes this dataset’s findings more useful or helps to protect these species and this area is welcome.
Citizen science volunteers have been a useful resource for processing the data from these types of studies, most visibly through crowdsourced science projects on the Zooniverse research portal such as Penguin Watch and Snapshot Serengeti. With the explosion of these types of studies and the resulting data, do you think there is a saturation point and a limit to what can be done by the public in this area?
If the crowdsourcing project is done well, then citizen science volunteers can be extremely helpful. However, depending on the time and training required, these projects can be limited. Identifying large mammals from an image is a very different task from identifying insects. I would say that crowdsourcing can be very useful but can’t be relied upon in all situations.
Seeing all the efforts people are putting into Pokemon Go, is gamification part of the answer to keep people engaged and interested in this data?
This can be useful to a certain point as long as the quality of the data is maintained.
Any thoughts on what people working on automatic processing algorithms could do with this camera trap data?
We’d love to see them find animals in our pictures for us, or use our dataset as practice. I am very interested in the development of this technology, but not being a computer guy or having any idea how automatic processing algorithms work, that’s all I can say.
Where do you think the field of integrative ecology is moving as the use of camera traps grows more popular?
I think we’re going to see more big open science projects like this one, whether they are imagery datasets or other types of surveying. As they become more and more common, it will become easier to tie new projects into them, which will increase the value of both the new and old datasets. We also suspect that many novel uses for cameras will continue to evolve. Imagery is a precious form of evidence, and there is every reason to expect that machine learning and other developments in data science and programming will feed back into ecology and other disciplines if we archive and share our data streams. We will also likely deploy capture devices more extensively in natural systems, form more extended sensor webs, and measure new species.
What are your thoughts on data sharing within the ecology community – is it something of interest or are people still wary about it?
I think it is something people are interested in. People may wait until they publish and make the dataset available alongside the paper, but more and more data is being shared. It helps everyone; someone who is interested in your dataset is likely interested in your papers as well. These concerns are generally evaporating. Synthesis is so important in every discipline of research, ecology included.
Here Taylor shares his thoughts on the re-use of their unique dataset.