Great Lakes of Data: More with Pat Soranno on the challenges of data-integration in ecology

Ecology data in the 21st Century: multiscaled, and with “big” potential
At GigaScience we are always promoting and finding new ways to foster open data, open science and reproducibility. Our broad scope covers the entire spectrum of life and biomedical sciences – also encompassing ecology and the numerous research communities within it. Macrosystems ecology is an emerging discipline which involves data-intensive methods of viewing ecosystems, integrating the finest to the broadest scales. The ecology community has considerable site-based data for individual ecosystems or groups of ecosystems, but these data are disparate, come in different formats, and remain largely inaccessible. This also applies to data at a broader scale, such as geospatial data of land and water that originate from different sources at different temporal and spatial resolutions. Broader-scale, open science ecology methods will be needed to tackle global conservation problems.

This week, a new paper published by Pat Soranno and colleagues in our new “Data Intensive Ecology” series presents a major step forward for reproducible research and public data-sharing in ecology. In order to help harmonise and integrate the many disparate types of ecological data and foster more reproducible research in the field, the authors have created a new database, LAGOS (LAke multi-scaled GeOSpatial and temporal database). Collecting hundreds of heterogeneous datasets, LAGOS includes data compiled from tens of thousands of lakes across 17 states of the U.S., and combines lake chemistry with multi-thematic geospatial data including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography data measured across a range of spatial and temporal extents. In their new paper the LAGOS team present a description of the general approach to data-integration in the environmental sciences, the challenges and solutions for data integration, and the detailed technical documentation for LAGOS.

In a recent Q&A in the On Biology blog, Pat Soranno and her collaborators share their thoughts on this work, and give some general insights into macrosystems ecology research. As GigaBlog has a more data-centric audience and focus, we expand on this in a follow-up Q&A with a few more questions on some of the practical details and challenges of working with this data.

Pat is a Professor at Michigan State University’s Department of Fisheries and Wildlife. As a landscape limnologist, her research involves integrating and studying multi-scaled spatial and temporal drivers of aquatic chemistry and biology. She is also a macrosystems ecologist and a pioneer working with several multidisciplinary groups to develop concepts, approaches, and datasets needed to foster the development of data-intensive approaches in ecology.

Can you tell us a bit about the types of data collected? As ecology has traditionally been composed of small, heterogeneous datasets, how is it being scaled up in the “big data” era to look at regional, continental and global-scale systems?
We think ecologists are lucky in that there are quite a few ways that we can collect or compile data to try to study broad-scale systems. In fact, we need more than one way to do this type of research because each method has its strengths and weaknesses, and the best method depends on the research question. For example, data have been collected across the globe from satellites for over 30 years, and new sensors are being launched, with even better capabilities for measuring the Earth. Another option is for different research teams to work together as a ‘network’, using common methods for either observing ecosystems (using sensors placed on the ground in different sites) or conducting the same experiment, across regions or continents, allowing for a more complete picture of what controls ecosystem variation globally. Yet another option is to conduct ‘big science’ through large government programs, such as the US-NEON program to understand ecological change at the continental scale. Finally, another option is to collate many small, heterogeneous datasets that ecologists have collected over time to build a database that spans regional to continental scales and to integrate those datasets with other datasets that are available at continental scales, such as from remotely sensed imagery. This latter option is the approach that we took in the research project described here. However, depending on the research question, many of these options are valid approaches for studying macrosystems from regions to continents and the globe.

The ecology community is very broad, yet to get a whole picture of an ecosystem researchers need to be able to integrate and share data, and to improve reproducibility. Is there a need for more databases such as LAGOS and open-science/open data approaches?
Absolutely. We have learned a lot of important lessons by building LAGOS, one of them being that it takes a lot of thought at the very beginning of the effort to make a database that is reproducible and usable by other scientists.

We also knew early on that we wanted the database to be extensible – i.e., allowing others to build onto it with new data in a variety of different ways. In this way, other researchers can use our database to answer research questions that we hadn’t yet thought of, and we can speed up the pace of science by providing others with the work that we put into LAGOS as the stepping off point for new research questions and directions. In fact, we are also planning on continuing to add to LAGOS ourselves, to expand LAGOS to other regions and continents and to eventually create a network of such large, integrated lake databases to address continental to global questions for freshwater systems.

We are fortunate in that there are quite a few other groups and networks of lake scientists, such as GLEON and GloboLakes, that have similar visions, that are complementary to our efforts, and that we hope to work with.

Has the greater limnology community always been pro open science/open data? Has it been difficult to get people to contribute data in an open manner? What are the incentives for people to contribute?
As with all disciplines, there is a segment of limnologists who are totally on board with sharing their data and actively do so; there are some who are opposed to these practices for a variety of reasons; and there are those who have not thought much about open science because they haven’t needed to since the status-quo is closed science. But, science and publishing are changing, with sharing of science ‘products’ becoming more commonplace. So, it is likely that scientists’ attitudes about such behaviors are changing too since data sharing and open-access publications (and open-science more generally) are on the rise.

There are many benefits of data sharing for science and society, such as an increased ability to address many grand-challenges in science, increased reproducibility, increased speed of scientific progress, and as some of my colleagues and I have argued in a recent article, increased scientific inclusivity. But, currently most scientists are rewarded professionally mainly for the research articles that they lead or co-lead. Individual scientists are not often rewarded when they share their data or when other scientists use their data (although several research articles have shown that sharing one’s data results in higher citation rates for the scientist who shared their data). Changes in institutions and reward structures are needed to fix this disconnect between the above benefits of data sharing for the broader scientific and societal good, and the perceived lack of benefits of data sharing for individual researchers’ careers.

What types of things do you hope others will do with this data?
There are a lot of different potential directions that we think others can take LAGOS. For example, we started by building a database that focuses mainly on lake nutrients and water chemistry. Other researchers may extend LAGOS by integrating biotic data and asking questions about how complex interactions between lake nutrients, land use, and climate change may affect species distributions. Others might be interested in doing a comparative study between our study region and one on another continent. We think that there are a lot of other potential opportunities as well.

References

Heffernan, JB et al. Macrosystems ecology: Understanding ecological pattern and process at continental scales. Frontiers in Ecology and the Environment 2014, 12(1). http://www.esajournals.org/doi/full/10.1890/130017

Soranno, PA et al. Cross-scale interactions: Quantifying multi-scaled cause-effect relationships in macrosystems. Frontiers in Ecology and the Environment 2014, 12(1). http://www.esajournals.org/doi/full/10.1890/120366

This and our Ocean Sampling Day consortium paper are our first papers in our new “Data Intensive Ecology” series edited by Christopher Lortie, Noah Lottig, Mark Schildhauer and Xin Zhou. Please let us know if you have similar work to highlight, and keep checking the series page for more big data ecology papers to come: https://academic.oup.com/gigascience/pages/data_intensive_ecology