An intro to data visualization

by William Gearty

Presented at the
North American Paleontological Convention 2024

Schedule


10 - 10:15 AM: Theory of data visualization

10:15 - 10:45 AM: Practical data visualization

Theory of data visualization

  1. What is data visualization?
  2. Basic design principles (color & font)
  3. Picking a graph type
  4. Common mistakes
  5. File types

What is data visualization?

The representation of data through the use of graphs and figures.

The data are often complex, but the representation should be approachable and easy to understand.

For example, here is some data:
(even this is a data visualization)

And here is a data visualization of that data:

Here’s another example. What data is being represented?


Basic design principles

  • Color

  • Type (aka font)

  • Positioning (R takes care of most of this)

Color

Lots of colors exist!

Picking colors

  • Value (lightness/darkness)
  • Temperature (cool vs warm)
  • Saturation (intensity of color)
  • Palettes (collections of colors that work well together)

Declaring colors in R
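
As a minimal sketch of what declaring colors can look like in base R (the specific colors are chosen here only for illustration), a color can be given by name, by HEX code, or by its red/green/blue channels:

```r
# Three equivalent ways to declare the same color in base R.
by_name <- "red"                                 # one of the names listed by colors()
by_hex  <- "#FF0000"                             # HEX code in the form #RRGGBB
by_rgb  <- rgb(255, 0, 0, maxColorValue = 255)   # red, green, blue channels

# All three resolve to the same RGB values.
col2rgb(c(by_name, by_hex, by_rgb))

# Colors are passed to plotting functions through arguments such as `col`.
barplot(c(3, 5, 2), col = c(by_name, "steelblue", "#228B22"))
```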

Getting HEX codes
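
One possible way to look up HEX codes without leaving R is to convert a named color with col2rgb() and rgb(). The helper name name_to_hex() is hypothetical and used only for this sketch:

```r
# Convert named colors to HEX codes (name_to_hex is a hypothetical helper).
name_to_hex <- function(color_name) {
  channels <- col2rgb(color_name)  # 3 x n matrix: red, green, blue (0-255)
  rgb(channels[1, ], channels[2, ], channels[3, ], maxColorValue = 255)
}

hex_codes <- name_to_hex(c("white", "black", "red"))
print(hex_codes)
```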

Or use premade color palette packages!
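
As a sketch that runs without installing anything, the example below uses the premade palettes bundled with base R (hcl.colors() needs R 3.6.0 or later, palette.colors() needs R 4.0.0 or later). Dedicated palette packages work in a similar way: ask for n colors from a named palette and get back a vector of HEX codes.

```r
# Five colors from the "viridis" palette bundled with base R.
viridis_5 <- hcl.colors(5, palette = "viridis")

# The Okabe-Ito palette, designed to be distinguishable under color blindness.
okabe_ito <- palette.colors(palette = "Okabe-Ito")

# List the names of all bundled palettes.
hcl.pals()
palette.pals()

# Preview a palette as colored bars.
barplot(rep(1, 5), col = viridis_5, axes = FALSE)
```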

Color accessibility is important!

The more color contrast, the better!

Don’t forget about color blindness!

Check your graphs with a color blindness simulator (e.g., ColourSimulations or ColorOracle).

Your graphs should even be legible in grayscale.
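
A rough way to preview a palette in grayscale is to convert each color to a luminance-weighted gray. This is only a sketch: the helper name to_grayscale() is hypothetical, and the channel weights (ITU-R BT.601) are one common choice assumed here, not something prescribed by the slides.

```r
# Approximate how colors look in grayscale using a weighted sum of the
# RGB channels (to_grayscale is a hypothetical helper).
to_grayscale <- function(colors) {
  channels <- col2rgb(colors)
  luminance <- 0.299 * channels[1, ] + 0.587 * channels[2, ] + 0.114 * channels[3, ]
  # Clamp to [0, 1] to guard against floating-point overshoot.
  gray(pmin(pmax(luminance / 255, 0), 1))
}

palette_in  <- c("red", "green", "blue")
palette_out <- to_grayscale(palette_in)

# Compare the original palette with its grayscale version.
par(mfrow = c(1, 2))
barplot(rep(1, 3), col = palette_in, main = "Original")
barplot(rep(1, 3), col = palette_out, main = "Grayscale")
```

If two colors end up as nearly the same gray, they will be hard to tell apart in a grayscale print.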


Font

There are lots of fonts/types, too!

We want easily readable fonts:

Tips

  • One font is usually enough (max two)
  • Make sure font size is big enough (who is the audience?)
  • Use bold for emphasis, but avoid italics and underlines
  • Left-aligned text is most readable
  • Use font size to establish a visual hierarchy
  • Use 1.5 line spacing for better readability

Picking a graph type

Graph type depends on data type
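
To illustrate the idea, here is a small base R sketch using the built-in iris dataset. The pairings of data type and graph type are common conventions assumed for this example, not an exhaustive guide, and the output file name is made up.

```r
# Write four example plots to a temporary PDF, one per data-type situation.
out_file <- file.path(tempdir(), "graph_types.pdf")
pdf(out_file, width = 8, height = 8)
par(mfrow = c(2, 2))

# One continuous variable: histogram
hist(iris$Sepal.Length, main = "Continuous", xlab = "Sepal length (cm)")

# One categorical variable: bar plot of counts
barplot(table(iris$Species), main = "Categorical", ylab = "Count")

# Two continuous variables: scatterplot
plot(iris$Sepal.Length, iris$Petal.Length, main = "Continuous vs continuous",
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")

# Continuous by categorical: box plot
boxplot(Sepal.Length ~ Species, data = iris, main = "Continuous by category",
        ylab = "Sepal length (cm)")

dev.off()
```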

Common mistakes

  • Our brains are poor at comparing angles, so avoid pie charts

  • Volumes and perspective can be tough too, so best to avoid 3D plots

  • Never truncate your X/Y axes or zoom on one part of them without showing the overall pattern as well
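
One way to zoom in responsibly is to show the full pattern and the zoomed region side by side. The sketch below uses simulated data and base R; the variable names, the zoom window, and the output file name are all made up for illustration.

```r
# Simulated data for illustration only.
set.seed(1)
x <- 1:100
y <- 50 + 0.05 * x + rnorm(100, sd = 0.5)

out_file <- file.path(tempdir(), "zoom_with_context.pdf")
pdf(out_file, width = 8, height = 4)
par(mfrow = c(1, 2))

# Left panel: full x range, with the y axis starting at zero.
plot(x, y, type = "l", ylim = c(0, max(y)), main = "Overall pattern")
abline(v = c(40, 60), lty = 2)  # mark the region shown in the right panel

# Right panel: zoom on x = 40 to 60, labelled clearly as a zoom.
in_zoom <- x >= 40 & x <= 60
plot(x[in_zoom], y[in_zoom], type = "l", main = "Zoom: x = 40 to 60",
     xlab = "x", ylab = "y")

dev.off()
```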

  • Don’t overcomplicate your graphs!
    • Is the main point of the graph clear?
    • Think about how much data to show

Image and file types

Image types

Raster                                      | Vector
Scale-dependent                             | Not scale-dependent
Larger files                                | Smaller files
Not easily editable                         | Easily editable
Can have lots of detail                     | Usually fewer details/textures
Cannot be easily converted to vector files  | Can easily be converted to raster files
(e.g., made in Photoshop)                   | (e.g., made in Illustrator)

File formats

You can export plots from R in many file formats (we’ll mostly use ggsave()):

File format | Image type | Notes
.jpg        | Raster     | Can’t have transparent parts
.png        | Raster     | Can have transparent parts
.svg        | Vector     | Can edit in Inkscape/Illustrator
.pdf        | Vector     | Not actually an image file type, but can be used
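
A minimal sketch of exporting with ggsave(), assuming {ggplot2} is installed. The plot, file names, and dimensions are made up for illustration. ggsave() picks the output format from the file extension; saving .svg typically requires the additional {svglite} package, so it is left out here.

```r
library(ggplot2)

# A simple example plot from the built-in mtcars dataset.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

out_dir <- tempdir()
png_file <- file.path(out_dir, "scatter.png")
pdf_file <- file.path(out_dir, "scatter.pdf")

# Raster output: resolution (dpi) matters.
ggsave(png_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)

# Vector output: scales without losing quality.
ggsave(pdf_file, plot = p, width = 6, height = 4, units = "in")
```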

Visualization in R

http://cran.nexr.com/web/views/Graphics.html

R already has a lot of visualization functionality built in:

  • plot()
  • barplot()
  • hist()
  • boxplot()
  • axis()
  • legend()
  • lines(), segments(), rect(), text(), etc.
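
Several of these functions are usually combined to build one figure step by step. The sketch below is one way to do that; the data, labels, and output file name are made up for illustration.

```r
x <- seq(0, 2 * pi, length.out = 50)

out_file <- file.path(tempdir(), "base_graphics.pdf")
pdf(out_file, width = 6, height = 4)

# Set up an empty plotting region, then add each element in turn.
plot(x, sin(x), type = "n", axes = FALSE, xlab = "x", ylab = "value")
axis(1)                                       # x axis
axis(2)                                       # y axis
lines(x, sin(x), col = "steelblue")           # first curve
lines(x, cos(x), col = "firebrick", lty = 2)  # second curve
text(pi, 0.15, "sin(pi) = 0")                 # annotation
legend("topright", legend = c("sin", "cos"),
       col = c("steelblue", "firebrick"), lty = c(1, 2))

dev.off()
```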

Many packages build on these ‘base’ graphics:

  • {plotrix}
  • {rgl}: for 3D interactive graphics
  • {gplots}
  • {scatterplot3d}: 3D scatterplots
  • {palaeoverse}!

Then ‘grid’ graphics came along:

  • More complex layouts
  • Scaling is maintained on resizing
  • Nested graphs and more interactivity

Many packages build on these ‘grid’ graphics:

  • {lattice}: trellis graphics
  • {vcd}: for categorical data
  • {ggplot2}: “grammar of graphics”
  • {hexbin}: hexagonal bins
  • {patchwork}: combine (ggplot2) plots
  • {deeptime}!
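
As a brief taste of the "grammar of graphics" approach, here is a minimal {ggplot2} sketch, assuming the package is installed. A plot is built by adding layers: data and aesthetic mappings, a geometry, a color scale, labels, and a theme. The dataset and styling choices are arbitrary.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +              # geometry: one point per observation
  scale_color_viridis_d() +   # discrete viridis color scale
  labs(x = "Sepal length (cm)", y = "Petal length (cm)") +
  theme_classic()             # a simple built-in theme

print(p)
```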

Packages for spatial visualization:

  • {sf}: basic objects and methods for vector data
  • {terra}: basic objects and methods for raster data
  • {ggplot2}: works for plotting spatial data, too
  • {raster}: plotting raster data

Data visualization books


  • The Visual Display of Quantitative Information by Edward R. Tufte
  • Better Data Visualizations by Jonathan Schwabish
  • Fundamentals of Data Visualization by Claus O. Wilke
  • Building Science Graphics by Jen Christiansen

Practical data visualization

1. Data visualization using ggplot2

2. Data visualization of spatial data