Reproducible and Collaborative Workflows

Using RStudio and GitHub

Goals

  • Understand the importance of reproducible workflows and version control
  • Learn how to set up a project in RStudio
  • Explore version control with GitHub and GitHub Desktop
  • Fork, edit, and publish a GitHub repository

Why reproducible workflows are important

  • It's good science
  • Improves uptake of methods
  • Eases collaboration
  • Funders love Open Science

What makes a workflow reproducible?

  • Works on any machine
  • Yields identical results every time
  • Can be understood by others (and by your future self)
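
One way to get identical results every time is to fix the random seed before any stochastic step. The following is a minimal sketch, not a complete recipe: the seed value 42 and the variable names are arbitrary choices made for illustration.

```r
# Fix the random seed so that random draws are the same on every run.
# The seed value (42) is arbitrary; any integer works.
set.seed(42)
first_run <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws.
set.seed(42)
second_run <- rnorm(5)

print(identical(first_run, second_run))

# Record the R version and attached packages alongside your results,
# so others can see the environment that produced them.
print(sessionInfo())
```

Saving the output of sessionInfo() with your results helps others (and your future self) work out why numbers differ between machines.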

Workspace Hygiene

Start from a clean slate!

Your workspace is your laboratory - keep it free of contamination!

Tools > Global Options > General

In the “Basic” tab

  • Uncheck “Restore .RData into workspace at startup”
  • Set “Save workspace to .RData on exit” to “Never”

Workspace Hygiene

Absolute and Relative Paths

Your code should run on any machine!

# BAD PRACTICE: path and data do not exist on other machines
read.csv("C:/path/to/important/raw/data.csv")
# BAD PRACTICE: works only on one specific machine
setwd("C:/path/to/folder/that/only/exists/on/my/machine")
# BAD PRACTICE: requires manual work
# to run this script, go to
# Session -> Set Working Directory -> To Source File Location
# to set your working directory correctly
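
By contrast, a path written relative to the working directory can travel with the project folder. This is a minimal sketch: the folder name "data" and the file name "raw_data.csv" are hypothetical, so the code checks whether the file exists before reading it.

```r
# BETTER PRACTICE: build the path relative to the working directory.
# "data" and "raw_data.csv" are hypothetical names used for illustration.
data_path <- file.path("data", "raw_data.csv")

if (file.exists(data_path)) {
  # Works on any machine where the project folder contains data/raw_data.csv
  raw_data <- read.csv(data_path)
  print(head(raw_data))
} else {
  # Show where R looked, which helps when the working directory is wrong
  message("File not found: ", normalizePath(data_path, mustWork = FALSE))
}
```

file.path() joins the pieces with a forward slash, which R accepts on Windows, macOS, and Linux alike.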

Introducing: RProjects

  • Project files that set the working directory automatically when the project is opened
  • Whole project folder can be passed between machines and people
  • Specific to RStudio

Working with RProjects

File > Open Project > Select the .Rproj file

Work in RStudio as usual

All paths are set relative to the .Rproj file
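
A quick way to confirm that the project was opened correctly is to check the working directory from the console. This sketch uses only base R and assumes nothing about your project's name.

```r
# Where is R currently working? Inside an RProject this is the project folder.
print(getwd())

# List any RStudio project files in the working directory.
# The result is empty if the session was not started from a project folder.
rproj_files <- list.files(pattern = "\\.Rproj$")
print(rproj_files)
```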

Advanced RProjects

RProjects are critical infrastructure for R code

Version history

gitGraph
    commit id: "Initial commit"
    commit id: "Add data"
    commit id: "Add analysis script"
    commit id: "Update analysis script"

  • Tracked changes
  • Allows for rollbacks
  • Easy collaboration, but with potential conflicts
  • Software determines save points
  • No experimentation
  • Can lead to feature breakage

vs

Version control

gitGraph
    commit id: "Initial commit"
    commit id: "Add data"
    commit id: "Add analysis script"
    branch will
    commit id: "Add viz script"
    checkout main
    commit id: "Update analysis script"
    merge will
    

  • Tracked changes
  • Allows for rollbacks
  • Easier collaboration without direct conflicts
  • Users determine save points
  • Allows for experimentation
  • Prevents feature breakage

Version control and collaboration

Version control with git and GitHub

Repositories

https://github.com/willgearty?tab=repositories

Repositories (cont.)

https://github.com/palaeoverse/2025-CPEG-workshop

Forking a repository

You can easily make your own copy of a repository by “forking” it

Cloning your new repository

Now we’re going to “clone” this repo to our local machine using GitHub Desktop. You can also do this in RStudio, but we will use GitHub Desktop for this example.

Making a new R project

Now let’s make a new R project based on this cloned repository…

Git in RStudio

Making and committing changes

Now let’s make some changes to the repository:

  1. Open the README.md file in RStudio and add some text to it, such as your name or today’s date. Make sure to save the file.
  2. You’ll now see that the README.md file has been modified in the Git panel in RStudio.
  3. Click the checkbox next to the README.md file in the Git panel to stage the changes.
  4. Click the “Commit” button in the Git panel to open the commit window, then write a brief description of your changes in the “Commit message” box (e.g., “Added my name to README”).
  5. Click the “Commit” button in that window to commit the changes to your local repository.
  6. Click the “Push” button in the Git panel to upload your changes to GitHub.
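
The buttons above map onto ordinary git commands. The sketch below runs the stage and commit steps from R in a throwaway folder, so it does not touch your real repository. It assumes git is installed and on your PATH. The helper name run_git, the demo identity, and the README text are all made up for illustration.

```r
# Sketch: stage and commit from R using the git command line.
git_available <- nzchar(Sys.which("git"))

if (git_available) {
  # Work in a temporary folder rather than in a real project
  repo <- tempfile("demo-repo-")
  dir.create(repo)

  # Hypothetical helper: run a git command inside the demo repository
  run_git <- function(...) {
    system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
  }

  run_git("init")

  # Step 1: edit and save a file
  writeLines(c("# Demo", "", "Edited by: Your Name"),
             file.path(repo, "README.md"))

  # Step 3: stage the change (the checkbox in the Git panel)
  run_git("add", "README.md")

  # Steps 4-5: commit with a message; a demo identity is supplied
  # so the sketch also runs where git has not been configured yet
  run_git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
          "commit", "-m", shQuote("Added my name to README"))

  # Inspect the history: one line per commit
  commit_log <- run_git("log", "--oneline")
  print(commit_log)

  # Step 6 would be run_git("push"), which needs a remote such as your fork
} else {
  message("git was not found on the PATH; skipping the demo")
}
```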

Advanced GitHub

Some other topics that you can explore on your own:

  • Branching and Merging: Manage changes to your codebase with branches and merge them back into the main branch
  • GitHub Issues: Track bugs and feature requests (yes, this applies to research code!)
  • Pull Requests: Propose changes to a repository and collaborate with others
  • GitHub Actions: Automate workflows
  • GitHub Pages: Host websites directly from your GitHub repository