GSA 2025 Workshop

Open Science, Collaboration, and Reproducibility in Paleontology

Saturday, October 18, 2025
8:00 AM - 5:30 PM

Henry B. González Convention Center

Welcome

Since the development of large paleontological datasets from the 1970s onwards, paleontologists have increasingly adopted computational approaches to address questions about the history of life on Earth. This initiated a “Golden Age” of paleontology, in which extensive datasets of various formats are used to test macroevolutionary and macroecological hypotheses. In parallel, the broader scientific community has been pushing for science to become more transparent, equitable, collaborative, and reproducible under the umbrella of “Open Science”. This push culminated in the White House Office of Science and Technology Policy designating 2023 the “Year of Open Science”.
This short course will bridge these two movements by introducing the tenets of Open Science and showing how they can be incorporated into existing and future paleontological research workflows. First, we will introduce collaboration, version control, and data storage using tools and services such as Git, GitHub, Zenodo, and figshare. We will then build on this foundation by exploring the R programming language. R is one of the most popular languages in data science and has been widely adopted by the paleontological community to clean, analyze, and plot data. General familiarity with R allows users to expand the potential of their research and automate routine tasks. We will introduce a suite of R packages designed to standardize and streamline various parts of paleontological workflows (e.g., data cleaning). As part of this, we will briefly introduce existing paleontological databases (e.g., the Paleobiology Database) and show how they can be accessed from within R. Finally, we will discuss how visualizations can be developed efficiently and effectively to increase the transparency and equitability of paleontological research.
This short course will provide a great opportunity for attendees to work with different researchers and gain experience collaborating in R to produce reproducible research. Further, we hope that this short course will bring the community together to share resources, work towards agreed standards, and improve reproducibility in paleontological research. We anticipate that this short course will be of value to paleontologists at all career stages.

Arrival

The event starts at XXX on the XXX and will take place at XXX.

Schedule

Time Event
8:00 AM Welcome and introduction
8:30 AM Setting up a reproducible workflow (GitHub, GitHub Desktop, RStudio)
10:00 AM Coffee break
10:15 AM Data acquisition
10:45 AM Data processing I: Data exploration and cleaning
12:00 PM Lunch break 🥪
1:30 PM Data processing II: Data visualization and synthesis
3:15 PM Coffee break
3:30 PM Open Science presentation/breakout (reporting, archiving, and publishing)
4:45 PM Closing remarks

Instructors

This edition of the workshop is organised and led by the following members and friends of the Palaeoverse team.

Will Gearty
Syracuse University, USA

Erin Dillon
Smithsonian Tropical Research Institute

Mark Nikolic
Stanford University

Brok Kokesh
University of California, Berkeley

Pedro M. Monarrez
Virginia Tech

Installation

Please ensure that you have the latest version of R installed before the workshop; it can be downloaded from the Comprehensive R Archive Network (CRAN). We also recommend installing the latest version of RStudio Desktop, which can be downloaded from Posit. To minimize installation issues during the workshop, please also install the following R packages:

install.packages()

Acknowledgements

This event is run by the Palaeoverse team and supported by the Paleontological Society.