An ABS2024 Workshop on Data Past, Present and Future Concludes: Lots of Work for our Animal Behavior Data Future
On June 25, 2024, we conducted a pre-conference workshop on why, how and where to preserve data generated in the course of behavioral research. The motivation for this workshop was several-fold. The global challenges of changing environments have made documenting past conditions increasingly critical for understanding current effects or projecting the future; thus there is renewed interest in data sets from years ago. Behavioral biologists join ecologists, conservation biologists, and many others in wrestling with the problems of finding, rescuing, and interpreting historic data sources that may originate from different socio-cultural contexts. At the same time, we must manage the increasingly varied forms of data we now produce. Behavioral research poses special and unsolved challenges both in leveraging past data and in making our current results shareable, accessible and reusable in fifty years.
First, we present an overview of the workshop, especially the lively discussion that ensued in its second half. But beyond that, we hope to highlight the real outcome of the workshop: We, as behavioral biologists, must and can take an active part in solving our data preservation problems. As established scientists, newly starting researchers, and teachers mentoring the students who will continue after us, we have a responsibility to participate, motivate and even innovate in producing these solutions.
Our workshop was divided into two parts: a more formal presentation¹, the goal of which was to highlight the different aspects of the “future-proofing” problem and assess the current state of solutions; and an informal, participant-driven discussion² of particular problems together with the strengths and weaknesses of the current options for addressing these problems. We briefly summarize points from these two halves of the workshop here. We then conclude by highlighting special challenges for behavioral biologists and making a few broad suggestions for how the Society as a unit might play a part in future-proofing and reusing our animal behavior data.
The workshop presentation began with “WHICH Data”, an overview of the types of data that behavioral biologists are likely to produce in different research programs. These include experimental results, to be sure, but also field observations, audio and vibrometric recordings, video and other rich media, biometrics and genetic samples from individual subjects, and environmental data or context recorded as part of specimen collection. In short, archivable data is generated at nearly every stage of the collection and analysis process, and which data will be most valuable in the future is not predictable. Throughout the workshop, we emphasized the FAIR principles³ that ideally guide all data archiving today, i.e., once archived or published, data should be Findable, Accessible, Interoperable and Reusable.
We then addressed the problem of WHERE to put a data set, discussing the considerations for choosing where to deposit your data. These include what data to save, which formats are most likely to be usable in the future, and how to treat data reused from other researchers. We emphasized that the FAIR principles of Findability and Accessibility matter in deciding where data is held, as do cost, longevity, and capacity. We evaluated several classes of digital storage options, ranging from university repositories to large, established ones like Dryad, and then showed some informal “search test” results suggesting that Findability is already a big problem for data sets in existing repositories.
Next came some nuts and bolts of HOW data could be made more Findable, especially the role of metadata in Findability and Reusability. Metadata include all the information anyone would need to understand each type of data one deposits. Constructing metadata adequate to support this goal is one area where researchers and repositories need to collaborate. We highlighted the role of controlled vocabularies and ontologies in creating metadata: their terms allow users to interpret the variables within a data set, and they can also serve as keywords that let others identify the contents of the data set. We finished the presentation with a discussion of issues related to rich media (e.g., video, audio), ranging from the best long-term formats to the lack of media repositories. There are various format choices, some of which are currently to be recommended over others. We noted that researchers should not ignore sources of help, including the advisory role your local (digital) librarian can play. Whatever repository is chosen, appropriate metadata are particularly important for identifying the main content of media files. At the same time, video and audio are “raw” data that could be reused in ways very different from the purpose for which they were collected, provided their contents and context are well documented.
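As a rough illustration of the point about metadata and controlled vocabularies, the sketch below shows one way a lab might describe a media file and check its keywords against an agreed list of terms. It is only a sketch under stated assumptions: the field names, the vocabulary terms and the example record are all hypothetical and were not presented at the workshop, and a real project would take its terms from a published ontology and follow the metadata schema of its chosen repository.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical controlled vocabularies. A real project would draw these
# terms from a published ontology rather than a hand-written set.
BEHAVIOR_TERMS = {"foraging", "courtship", "aggression", "vocalization"}
MEDIA_TYPES = {"video", "audio", "image"}


@dataclass
class MediaRecord:
    """Metadata for one deposited media file (all field names are illustrative)."""
    file_name: str
    media_type: str
    species: str
    recorded_on: str  # ISO 8601 date, e.g. "2024-06-25"
    location: str
    behaviors: list = field(default_factory=list)
    notes: str = ""


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if record.media_type not in MEDIA_TYPES:
        problems.append(f"unknown media type: {record.media_type}")
    if not record.behaviors:
        problems.append("no behavior keywords given")
    for term in record.behaviors:
        if term not in BEHAVIOR_TERMS:
            problems.append(f"behavior term not in vocabulary: {term}")
    return problems


def to_json(record):
    """Serialize the record so it can be stored alongside the media file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # A made-up example record, for illustration only.
    clip = MediaRecord(
        file_name="clip001.mp4",
        media_type="video",
        species="Taeniopygia guttata",
        recorded_on="2024-06-25",
        location="aviary A",
        behaviors=["courtship"],
    )
    print(validate(clip))
    print(to_json(clip))
```

Because every keyword must come from the shared list, the same terms that explain a file's contents can also be used to search for it later, which is the link between metadata and Findability described above.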
After a break, the participants chose to discuss topics as a committee of the whole. By vote, we selected three topics, all of which focused on the future as much as on current solutions:
- How should or can we prepare for the future use of AI?
The discussion of the future implications of AI in extracting data from images and video focused on the variability of animal behavior (between and within species or even studies) as well as of locations and contexts. Participants emphasized that we need to offer AI well-documented but varied material on which to learn, and a theme of “save everything” emerged. The utility of AI will depend on our own consistency and role in training.
- What incentives and social changes will motivate our community towards responsible preservation of data of all kinds, especially more FAIR data preservation?
The problem of bringing about social change in researcher behavior was perceived as related both to the stressors on researchers (publication pressure) and to inadequate training or guidelines, starting with undergraduate courses and extending to assistance in data management for beginning graduate students and early-stage researchers. Journals can play a role, and possible ways that Animal Behaviour could assist were discussed, e.g., by providing guidelines, possibly standardized data sheets, etc., for both submitting researchers and reviewers to use. Social change would also speed up with undergraduate training in the goals of this research world: use of ontologies or vocabularies, goals of standardization when collaborating, etc. The incentives include registered protocols and publication assurance, as well as increased collaboration opportunities when data sets are going to be small.
- What recommendations can we make for behavior researchers just starting out, even as early as undergraduate and graduate training?
The issues for researchers starting out included many of the incentives and social change issues, but the discussion stressed early opportunities (as grad students) to learn the how-to and best practices of whole-lab data management: everything from digitizing notebooks and wise use of cloud deposits to standardization, or small but critical decisions such as carrying duplicate drives in separate vehicles in the field!
In short, our discussion revealed that the participants, most of them graduate students, were deeply concerned about how to solve the problems highlighted by the workshop. They, like all of us, know we need help with this goal of FAIRly future-proofing, sharing and repurposing our community’s data, and we want to stress that our participants truly had lofty goals. As workshop organizers who spent much time combing the literature to add to what we already knew, we see the need to find solutions beyond the generosity and time investment of researchers who are already pushed to accomplish so much individually. Progress will require collaborations with societies, publishers, funders, repositories, and companies or governments.
We organizers and, we believe, the participants came away from the workshop both optimistic and humbled: optimistic about how concerned and energized we all are about keeping our data safe in ways that make it available now and in the future to our colleagues and our future selves; humbled by how much there is to be done to that end. The problems are, in many ways, general to all fields of knowledge. But, speaking now as the organizers of the workshop and leaning on the participants’ discussion and feedback from a post-workshop survey, there seem to be two big challenges specifically for Animal Behavior, or Behavioral Biology writ large. The first is the development of stable repositories and maintenance strategies for our rich media. The second, related challenge is to produce and require a standardized set of terms, and methods of using them, that will facilitate finding data sources that relate to specific research questions. Importantly, the Animal Behavior Society was perceived as potentially playing a guiding role. It could be the starting point for young researchers seeking training in best practices for their own data, it could overtly motivate and reward efforts to share and reuse data, and its journal could model ways in which editors and reviewers could help authors do the best possible job of preserving their data.
1 See “Workshop presentation..” in https://drive.google.com/drive/folders/1NPo27auhrM3gLKBTpW86AVR0jniEIkZ_?usp=sharing
2 See “Discussion summary….” in https://drive.google.com/drive/folders/1NPo27auhrM3gLKBTpW86AVR0jniEIkZ_?usp=sharing
3 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18