Resources for dplyr and ggplot2

Today we introduced the R packages 'dplyr' and 'ggplot2'. We only had time for a few brief demos, but these packages are very powerful and you may be using them quite a bit!

More about dplyr

dplyr is useful for manipulating data. Most of the time you'll be using one of five main functions:

filter(): subsets rows based on a condition

select(): selects specific columns

mutate(): creates a new column (or modifies an existing one)

group_by(): groups data by some variable

summarize(): returns a new data frame with the specified summary statistics

These aren't the only functions in the dplyr package, but you can get pretty far with your data manipulation using only these five!
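Here is a minimal sketch of how the five functions can be chained together. It assumes the built-in mtcars dataset as stand-in data, and it uses the %>% pipe (which dplyr makes available) to pass the result of one step to the next. The column names wt_kg, mean_mpg, and n_cars are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it is only a stand-in dataset for this sketch
result <- mtcars %>%
  filter(mpg > 20) %>%                 # keep rows where mpg is above 20
  select(mpg, cyl, wt) %>%             # keep only these three columns
  mutate(wt_kg = wt * 453.592) %>%     # wt is in 1000 lbs; add a kg column
  group_by(cyl) %>%                    # group rows by number of cylinders
  summarize(
    mean_mpg = mean(mpg),              # average mpg within each group
    n_cars   = n()                     # number of cars in each group
  )

print(result)
```

Each step takes a data frame and returns a data frame, which is why the steps can be stacked in any order that makes sense for your question.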

Some dplyr resources:

Main dplyr webpage

dplyr cheatsheet (includes some functions from package tidyr)

Tutorial from RStudio

Tutorial with biological data

More about ggplot2

You can make plots using base R (i.e. with no additional packages loaded), but some plots are a challenge to build that way. ggplot2 is one of the most widely used graphics packages in R: it's very powerful and you can make great visualizations with it.

To best understand ggplot2, you'll need to start thinking in terms of "the grammar of graphics." It can take a while to get the hang of the ggplot2 syntax, particularly the aesthetics, or aes(), portion, but with it you can really fine-tune your graphics.
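As a rough illustration of the aes() idea, here is a small sketch that again assumes the built-in mtcars dataset. Anything inside aes() maps a column of the data to a visual property (position, color), while anything outside aes() is a fixed setting applied to every point.

```r
library(ggplot2)

# Inside aes(): columns of the data are mapped to x, y, and color
# Outside aes(): size = 3 is a constant applied to all points
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  )

print(p)
```

Moving color = factor(cyl) outside of aes() would stop it from working as a mapping, which is a common source of confusion when starting out.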

Some ggplot2 resources:

Main ggplot2 website

ggplot2 cheatsheet

Tutorial using social science data

Tutorial on making different kinds of plots

Graphics gallery (with code)

Another graphics gallery (with code)

Welcome to the tidyverse!

This is also our first introduction to the "tidyverse": a collection of R packages for data science that are designed to work well together. You can learn more about the idea and implementation of the R tidyverse here.
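To give a sense of what "work well together" can look like, here is a short sketch where a dplyr summary is handed straight to ggplot2. It assumes the built-in mtcars dataset, and the names avg_mpg and mean_mpg are just illustrative. Loading library(tidyverse) would attach both packages at once; they are loaded separately here so the example only needs these two.

```r
library(dplyr)
library(ggplot2)

# Step 1 (dplyr): average mpg for each cylinder count
avg_mpg <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Step 2 (ggplot2): the summary is an ordinary data frame, so it can be plotted directly
p <- ggplot(avg_mpg, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean miles per gallon")

print(p)
```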