Introduction to R - A Roadmap

Author: Alli N. Cramer

This is an attempt to orient you around R: a roadmap of sorts to help you find your way while learning. The following text will show you:

  • What R is
  • What RStudio is
  • How to do basic math in R
  • How to add, save, and bring in data
  • How to read basic R syntax
  • How to make a linear model and a basic plot
  • How to add packages
  • What a dplyr pipe is
  • How to plot with ggplot

This is, of course, just the tip of the iceberg as far as R goes, but hopefully it orients you so that you can follow along in future R sessions.

Once you've mastered the skills in this post I suggest you download and do the Lab 1 Walkthrough using the Floral Diversity Dataset created by our very own Rachel Olsson. This is an in-depth walkthrough and a great way to familiarize yourself with R terms and capabilities.

What is R?

R is free software for statistics. It is also useful for data cleaning and making graphics. R runs in what is called the "console". If you open R (not in RStudio) you will see a simple box for code input, perhaps with some red or blue colors. This is basic R.

What is RStudio?

RStudio is a wrapper for R, or an IDE (integrated development environment). It runs the R Console within it, normally at the bottom left. The top left is the script pane. The Environment pane on the top right shows you what data R is remembering and keeping track of. The bottom right pane has help (super useful), shows plots, lets you point and click to install packages, and more.

The Script Pane lets you write scripts, but R will not necessarily do anything until you send the code to the R Console. To send code from a script to the console, you can highlight it and press the green "run" arrow at the top right of the script pane. You can also use some shortcuts to run code and to move back and forth between the console and script pane:

Some shortcuts for RStudio:

  • ctrl + Enter (runs the highlighted code chunk, or whatever line your cursor is on)
  • ctrl + 1 (go to script pane)
  • ctrl + 2 (go to R Console)
  • alt + - (shortcut to make "<-" symbol)

Using Basic R

R as a calculator

First, let's explore R like any simple computer program - as a calculator. If we simply type "2+3" into the script pane and then send it to the R Console:

2+3

## [1] 5

We can see that R understood the code and gave us the right answer. We can also see, however, by looking at the Environment pane on the top right, that R doesn't remember this data. To make R remember, we need to create an object (something for R to keep track of) and assign it a value.

Assign values to objects

To assign a value we use the "<-" symbol and a name. Let's use x and y. We can also assign a value to the answer; let's use z:

#use alt + - for arrow shortcut
x <- 2

y <- 3

z <- x + y

We can see that R now keeps track of these objects by looking in the Environment pane.
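The same check can be done from code rather than by looking at the Environment pane. A minimal sketch, recreating the x, y, and z objects from above so it runs on its own: ls() lists the objects R is tracking, and typing an object's name prints its value.

```r
# recreate the objects from above so this runs on its own
x <- 2
y <- 3
z <- x + y

# list the objects R is currently keeping track of
ls()

# typing an object's name prints its value
z
```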

Working with Data

Using R as a calculator is great, but let's work with some data. We will start by working with built in data. This is data that comes pre-loaded in R to assist teaching. We will work with the "iris" data set. Other good built in data sets, which work with this lesson, are "AirPassengers" and "ChickWeight". There is also the BACups data set, a data set of Bronze Age cup dimensions. To see the iris data set, type "iris".

#built in data
iris

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

Iris is a relatively small dataset, but it is large enough to be annoying to look at. Imagine if it were even bigger! To deal with this, we can look at the top or the bottom of the data with the head() and tail() commands.

#look at top of data (defaults to first 6 rows)
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

#look at tail of data (defaults to last 6 rows)
tail(iris)

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

#can change the number of rows we see
head(iris, n = 10)

##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1           5.1         3.5          1.4         0.2  setosa
## 2           4.9         3.0          1.4         0.2  setosa
## 3           4.7         3.2          1.3         0.2  setosa
## 4           4.6         3.1          1.5         0.2  setosa
## 5           5.0         3.6          1.4         0.2  setosa
## 6           5.4         3.9          1.7         0.4  setosa
## 7           4.6         3.4          1.4         0.3  setosa
## 8           5.0         3.4          1.5         0.2  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 10          4.9         3.1          1.5         0.1  setosa
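head() and tail() are not the only ways to size up a data set. As a small sketch using other base R functions (dim(), names(), and summary() are additions here, not part of the walkthrough above), you can ask for the shape, the column names, and per-column summaries:

```r
# number of rows and columns
dim(iris)

# column names
names(iris)

# quick summary statistics for every column
summary(iris)
```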

Changing the Data

We're going to add a column to iris, so the first thing we need to do is copy iris into a new object with a new name. This is a data best practice (NEVER CHANGE RAW DATA!) and will let R keep track of what we're doing. In this case let's call our new data "ire".

#assign the built in data to a new object name
#so that we can change things about it
#without changing the underlying data
ire <- iris

Now, let's add an area column. R syntax for columns is the "$". The dataset name is on the left, the column name on the right. In this case, we are going to make a Sepal.Area column. First, we tell R that we are making the new column by typing the new column name on the left of the assign arrow, then the math to make the column on the right.

#adding an "area" column to iris
# $ references a column in a data frame

ire$Sepal.Area <- ire$Sepal.Length * ire$Sepal.Width

Let's look at these new data:

#top few rows
head(ire)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
## 1          5.1         3.5          1.4         0.2  setosa      17.85
## 2          4.9         3.0          1.4         0.2  setosa      14.70
## 3          4.7         3.2          1.3         0.2  setosa      15.04
## 4          4.6         3.1          1.5         0.2  setosa      14.26
## 5          5.0         3.6          1.4         0.2  setosa      18.00
## 6          5.4         3.9          1.7         0.4  setosa      21.06

#structure of the dataset
str(ire)

## 'data.frame':    150 obs. of  6 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Sepal.Area  : num  17.8 14.7 15 14.3 18 ...
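The "$" symbol works for reading a column as well as creating one. A short sketch, rebuilding ire so it runs on its own; mean() and range() are base R functions chosen here only for illustration:

```r
# rebuild the data from above
ire <- iris
ire$Sepal.Area <- ire$Sepal.Length * ire$Sepal.Width

# pull one column out as a vector and look at the first few values
head(ire$Sepal.Area)

# a column can be passed straight to other functions
mean(ire$Sepal.Area)
range(ire$Sepal.Area)
```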

Saving Data

Now we will save the data. First, we need to determine where the file will go. By default it will go to the current working directory, so let's check where we currently are, then change it to where we want the file to go:

#where is the file going!?!?!?

#where am I currently?
getwd()

#change location to new file path, if necessary :

setwd("your_file_path/full_path")

Now, we save the data as a .csv file. For any R command, we can either name each argument explicitly using the equals sign, or we can supply the values in the assumed order. See the help section for a command to learn its assumed order. Knowing that this is possible can help you follow along when watching or reading others' R code.

#write out data
write.csv(x = ire, file = "Irisdata.csv", row.names = FALSE)

#writing out the data using the same command, but the "assumed order", so no "x = " etc. needed
#row.names is left at its default (TRUE) here, so this file gains a column of row numbers
write.csv(ire, "Irisdata2.csv")
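To make the named versus "assumed order" distinction concrete, here is a small sketch with round(), a base R function picked only because it has two simple arguments, x and digits. The object names a, b, and c1 are just for illustration. args() shows the order a function expects:

```r
# see the argument names and their order
args(round)

# named arguments: the order does not matter
a <- round(x = 3.14159, digits = 2)
b <- round(digits = 2, x = 3.14159)

# assumed order: x first, then digits
c1 <- round(3.14159, 2)

# all three give the same result
identical(a, b)
identical(a, c1)
```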

To bring in the data, we use the read.csv() command. Remember to assign the data to an object, or R won't be able to do anything with it!

#bringing in data
#we need to assign to an object so we can use the dataset
#stringsAsFactors = TRUE reads Species in as a factor (the default before R 4.0)
dat <- read.csv("Irisdata.csv", stringsAsFactors = TRUE)

head(dat)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
## 1          5.1         3.5          1.4         0.2  setosa      17.85
## 2          4.9         3.0          1.4         0.2  setosa      14.70
## 3          4.7         3.2          1.3         0.2  setosa      15.04
## 4          4.6         3.1          1.5         0.2  setosa      14.26
## 5          5.0         3.6          1.4         0.2  setosa      18.00
## 6          5.4         3.9          1.7         0.4  setosa      21.06
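If you just want to try the write-then-read round trip without touching your working directory, one option (not part of the walkthrough above) is tempfile(), a base R function that returns a throwaway file path. The names tmp and check below are only for illustration:

```r
# a throwaway file path, so nothing lands in the working directory
tmp <- tempfile(fileext = ".csv")

# write the built in iris data out, then read it straight back in
write.csv(iris, file = tmp, row.names = FALSE)
check <- read.csv(tmp)

# same number of rows and columns as the original
dim(check)

# remove the temporary file when finished
unlink(tmp)
```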

Now, let's clean up our workspace and remove the old objects we don't care about. Currently, R is remembering all of them, which takes up valuable memory on our computer.

#remove objects we don't care about using rm()
rm(x, y, z, ire)

#check what objects remain using ls()
ls()

##  [1] "answer"             "d"                  "dat"
##  [4] "mod"                "nonsensecolumndata" "ob"
##  [7] "ob1"                "p"                  "p1"
## [10] "pg"

Modeling and Visualization

This is the fun part!

Let's make a basic plot by plotting the new Area column against Petal Width:

#plotting and a simple model
#plot(y~x, data)

plot(Sepal.Area ~ Petal.Width, data = dat)

#can plot with or without color.
p <- plot(Sepal.Area ~ Petal.Width, data = dat, col = Species)

#there are lots of plot options - see help section for more.

Plots are cool, but what about models? Let's do a simple linear regression using the lm() command, or "linear model" command. We are going to name our linear model "mod".

#linear model
mod <- lm(Sepal.Area ~ Petal.Width, data = dat)
mod

##
## Call:
## lm(formula = Sepal.Area ~ Petal.Width, data = dat)
##
## Coefficients:
## (Intercept)  Petal.Width
##      15.872        1.627

#how to see p-values etc.
summary(mod)

##
## Call:
## lm(formula = Sepal.Area ~ Petal.Width, data = dat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -7.4986 -2.0099  0.0646  1.9034 10.8946
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  15.8718     0.4784  33.176  < 2e-16 ***
## Petal.Width   1.6268     0.3370   4.828 3.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.135 on 148 degrees of freedom
## Multiple R-squared:  0.136,  Adjusted R-squared:  0.1302
## F-statistic: 23.31 on 1 and 148 DF,  p-value: 3.41e-06
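Individual pieces of the model can also be pulled out and used in code. A rough sketch, refitting the same model from the built in iris data so it runs on its own; predict() and the new_flower data frame are additions for illustration, not part of the walkthrough above:

```r
# rebuild the data and model from above
dat <- iris
dat$Sepal.Area <- dat$Sepal.Length * dat$Sepal.Width
mod <- lm(Sepal.Area ~ Petal.Width, data = dat)

# R-squared is stored inside the summary object
summary(mod)$r.squared

# predict Sepal.Area for a hypothetical flower with Petal.Width = 1
new_flower <- data.frame(Petal.Width = 1)
predict(mod, newdata = new_flower)
```

With Petal.Width = 1, the prediction is simply the intercept plus the slope.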

Now, let's add this linear model to the plot. To do this in basic R (which is what we've used so far) we use the abline() command. Normally R doesn't care about spaces - all that matters is running the code to the next parentheses. abline() is different: it draws on whatever plot is currently open, so it needs to be run right after the plot command.

#add model line to the plot
plot(Sepal.Area ~ Petal.Width, data = dat, col = Species)
abline(mod)

Ooooooo! A fancy plot with a linear model.

Gettin' fancy with packages

Up until this point we have only used the base capabilities of R. This is like painting with only black and white - there are so many more colors! To get more colors, or capabilities, from R we need to use packages.

Packages are sets of algorithms, functions, or mini-programs that can be added to R. These packages range from statistically focused to graphics focused. As R is open source, packages come from R users within the community. Packages that have passed a set of quality checks are hosted on "CRAN", the same place you downloaded R from. These packages are more reliable than others; however, you can also get packages from places like GitHub, from zip files, or even write your own.

Let's install two packages, dplyr and ggplot2, to explore our data more. We can do this two ways: by using the install.packages() command, or by point and click.

#install packages using install.packages()

install.packages("dplyr")
install.packages("ggplot2")

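Installing by Point and Click: open the Packages tab in the bottom right pane of RStudio, click "Install", and type the package name.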

To use the functions within the packages, we need to tell R to load them using library(). When doing this, you may get some warnings. Read the warnings to understand what is happening. Most of them are simply telling you that a function in the new package has the same name as a function R already knows about. If you load a package using library() and it has functions with the same names as those in previously loaded packages, R will by default use the version from the most recently loaded package. See the example below with the "filter" function when we load dplyr. The stats package (which comes pre-loaded) also has a filter function:

#to use functions in packages, we need to load the packages using library()
library(dplyr)

#you may get some warnings
#you can use stats function filter using stats::filter()
#otherwise R will use dplyr::filter() by default

library(ggplot2)
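The package::function() form always reaches a specific package's version, whatever has been loaded since. As a sketch that needs no extra packages, the example below masks base R's mean() with a deliberately silly function of our own (an illustration only, not something to do in real code) and then reaches the original with base::mean():

```r
# define our own function with the same name as a base R function
mean <- function(x) "masked!"

# R now finds our version first
mean(c(1, 2, 3))

# the :: syntax still reaches the original
base::mean(c(1, 2, 3))

# remove our version so mean() behaves normally again
rm(mean)
mean(c(1, 2, 3))
```

The same idea applies to dplyr::filter() and stats::filter(): the prefix removes any doubt about which one runs.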

dplyr

dplyr is an R package that is extremely useful for data manipulation. dplyr has two primary capabilities, moving data and the "pipe" command.

The Pipe Command

The pipe command, %>%, is used in modern R code even when people don't need to manipulate data. It works with other functions and packages and is extremely useful for chaining commands. The pipe feeds the result of one function into the input of the next, without requiring a separate object to be assigned for each intermediate result. You can think of it like an assembly line.
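As a rough sketch of that assembly line, the two versions below compute the same thing: the mean petal length in the built in iris data, rounded to one decimal place. It assumes dplyr is installed; the object names nested and piped are just for illustration.

```r
library(dplyr)

# nested: read from the inside out
nested <- round(mean(iris$Petal.Length), 1)

# piped: read from top to bottom, like an assembly line
piped <- iris$Petal.Length %>%
  mean() %>%
  round(1)

# both approaches give the same answer
identical(nested, piped)
```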

Now, let's use dplyr to group our data by species and explore some patterns. Notice that when we use the pipe command, we indent the code until the pipe is done. This is best practice for readability and troubleshooting.

#speedin' it up with dplyr <- take data and add new columns
#dplyr is great for manipulating data
#example: we want to group by Species
dat %>%
  group_by(Species) %>%
  summarize(avg = mean(Petal.Length))

## # A tibble: 3 x 2
##      Species   avg
##
## 1     setosa 1.462
## 2 versicolor 4.260
## 3  virginica 5.552

# %>%  is the "pipe" - it lets you chain together commands
#think of it like a funnel, funneling down your results

#be sure to assign output if you want to use it later on

#main dplyr commands: group_by(), summarize(), filter(), select(), mutate()

#group_by() groups
#summarize() summarizes
#filter() selects row subsets based on condition
#select() selects column subsets
#mutate() creates new columns

#as above, instead of doing df$col <- df$oldcol * df$othercol
nonsensecolumndata <- dat %>%
  mutate(nonsense = Petal.Length - Petal.Width)
#seeing our new nonsense column
head(nonsensecolumndata)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
## 1          5.1         3.5          1.4         0.2  setosa      17.85
## 2          4.9         3.0          1.4         0.2  setosa      14.70
## 3          4.7         3.2          1.3         0.2  setosa      15.04
## 4          4.6         3.1          1.5         0.2  setosa      14.26
## 5          5.0         3.6          1.4         0.2  setosa      18.00
## 6          5.4         3.9          1.7         0.4  setosa      21.06
##   nonsense
## 1      1.2
## 2      1.2
## 3      1.1
## 4      1.3
## 5      1.2
## 6      1.3

#after the pipe, next line indents automatically
#best practice to put each section on new line, for readability
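filter() and select() are listed above but not demonstrated. A minimal sketch, using the built in iris data directly so it runs on its own and assuming dplyr is installed; the 5.5 cutoff and the name long_setosa are arbitrary choices for illustration:

```r
library(dplyr)

# keep only setosa rows with long sepals, then keep two columns
long_setosa <- iris %>%
  filter(Species == "setosa", Sepal.Length > 5.5) %>%
  select(Species, Sepal.Length)

long_setosa
```

filter() works on rows and select() works on columns, so the result is a much smaller data frame than the one that went in.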

ggplot

Now, let's make a fancier plot than the one we did before. Additionally, let's see if we can break things up by species more.

Similarly to dplyr, ggplot uses a special symbol to add commands to an object. It doesn't chain commands, but it does let multiple commands act on one object. For ggplot this is the "+" symbol.

GG syntax

ggplot uses slightly different syntax than the simple plot syntax we used earlier. This is very common for packages - if you are reading code and you see syntax you don't recognize, it is probably from a package you are unfamiliar with.

ggplot's syntax can be summarized like this: "first show me the data, then tell me what to do". To show ggplot the data we use the ggplot command, then tell it the data and the axes.

#fancyin' it up with ggplot
#aes() is our aesthetics - things from our data we want mapped to something specific
#like x, y, color, width of lines, groups - anytime we're 'mapping' to a data frame column
p1 <- ggplot(data = dat, aes(x = Petal.Width, y = Sepal.Area, color = Species))
p1

This gives us an empty plot! This is because while we've shown it the data, we haven't told it what to do. To do that we need to tell it what form, or "geometry", to put on the graph.

p1 <- ggplot(data = dat, aes(x = Petal.Width, y = Sepal.Area, color = Species)) +
  geom_point()

p1

Now we have the plot we expected!

Let's play with ggplot more to explore its facet option. This will create a separate graph for each species:

#can build extra layers onto p1 from above using +

#have a separate plot for each species
p1 +
  facet_wrap(~Species)

p1 +
  facet_wrap(~Species, nrow = 3)

p1 +
  facet_grid(Species ~ .)

Lastly, let's add that linear model back in! First, we will need to understand our linear model a little more. ggplot cannot just add a model line the way abline() does, but it CAN use model coefficients.

#we check out mod parts using the structure command, str(mod)

#notice this uses the same "$" symbol as the data$column syntax
mod$coefficients

## (Intercept) Petal.Width
##   15.871801    1.626792

head(mod$residuals)

##         1         2         3         4         5         6
##  1.652841 -1.497159 -1.157159 -1.937159  1.802841  4.537483

#We can also use this coef() command to just pull out the model coefficients. This is what we need to add that line to our graph.
coef(mod)

## (Intercept) Petal.Width
##   15.871801    1.626792

Adding in the line:

#add model information
p1 +
  facet_wrap(~Species) +
  geom_abline(intercept = coef(mod)["(Intercept)"],
              slope = coef(mod)["Petal.Width"])

#same model line - our above model wasn't split by groups
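If you would rather have a separate line for each species, one option is ggplot2's geom_smooth() with method = "lm". It is not used in the walkthrough above, so treat this as a sketch rather than the original approach. It rebuilds the data from the built in iris set so it runs on its own, and p2 is just an illustrative name:

```r
library(ggplot2)

# rebuild the data from above
dat <- iris
dat$Sepal.Area <- dat$Sepal.Length * dat$Sepal.Width

# color = Species in aes() makes geom_smooth() fit one line per species
p2 <- ggplot(data = dat, aes(x = Petal.Width, y = Sepal.Area, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p2
```

Here se = FALSE hides the shaded confidence band, which keeps the plot comparable to the single line drawn with geom_abline().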

Wrapping Up

When you're done with your work session, it's best practice to clear your workspace. You can do this by clicking the broom icon in the Environment tab. To leave R, just type q(). In general, when RStudio asks you, do NOT save your workspace image.