A Tour of R Markdown

By Matt Brousil

Graduate students and other researchers often find themselves pasting figures or tables into Microsoft Word or other word processing software in order to share them with collaborators or PIs. Anyone who has tried to format text and images in the same Word document knows that this is a harrowing experience. Luckily, the rmarkdown package in R allows us to avoid this altogether. R Markdown lets us combine narrative text (e.g., an Intro, Methods, Discussion) with code (R or several other languages), figures, and even interactive elements. Not only is this often more reliable than using software like Word, it is also more reproducible, and it lets us explain the thinking behind our scripts in the same file that contains the code.

R Markdown is also very versatile. You can generate HTML, PDF, or Word reports; websites (like this blog post!); slideshows; and interactive documents that use Shiny.

In this walkthrough we'll take a look at the basics of R Markdown and finish up by generating an example report. It will look something like this:

1. Install

The first thing you'll need to do is install some packages. The process is described by Yihui Xie here, but if you are an RStudio user, the general process is:

  1. Install the rmarkdown and tinytex packages.
  2. Use the function tinytex::install_tinytex() to install TinyTeX, a lightweight LaTeX distribution that lets R Markdown generate PDF documents.
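If you prefer typing to clicking, the two steps above can also be run from the R console. This is a minimal sketch: it assumes you're installing from CRAN (the cloud.r-project.org mirror is just one reasonable choice), and the checks simply skip anything you already have.

```r
# Packages needed for knitting R Markdown documents
pkgs <- c("rmarkdown", "tinytex")

# Install only the packages that aren't already available
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) {
  install.packages(pkgs[!is_installed], repos = "https://cloud.r-project.org")
}

# TinyTeX is the LaTeX distribution used for PDF output.
# This downloads it once and is skipped if TinyTeX is already installed.
if (!tinytex::is_tinytex()) {
  tinytex::install_tinytex()
}
```

You only need to do this once per R installation, not every time you knit.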

2. Basic overview

Before getting started, note that I am writing this walkthrough with RStudio users in mind. If you don't use RStudio, you should refer to Yihui Xie's instructions again.

To get started making an R Markdown document, you can go to File > New File > R Markdown in RStudio. This will generate a pop-up that looks like this:

Provide it with a title for your document, the name of the author, and the type of document you'd like to produce. In my experience HTML is typically the most reliable output, and it is often the best at formatting large tables and R console output.

Click OK and you should have a document that looks like this:

Note that there are three parts to this document.

The top of the document has a header that looks like this:

---
title: "Untitled"
author: "Matt Brousil"
date: "September 11, 2019"
output: html_document
---

This YAML text specifies some of the header info and formatting of your output document. There are guidelines on editing it here.
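The header is just a set of YAML key: value pairs, which you can confirm by parsing it with rmarkdown's yaml_front_matter() function. A small sketch that writes the header above into a throwaway file (header_demo.Rmd is a made-up name) and reads back only the part between the --- lines:

```r
library(rmarkdown)

# Write a minimal .Rmd containing the header above to a temporary file
rmd_path <- file.path(tempdir(), "header_demo.Rmd")
writeLines(c(
  "---",
  'title: "Untitled"',
  'author: "Matt Brousil"',
  'date: "September 11, 2019"',
  "output: html_document",
  "---",
  "",
  "Narrative text goes here."
), rmd_path)

# yaml_front_matter() parses just the YAML header into a named list
header <- yaml_front_matter(rmd_path)
str(header)
```

Output options can be nested under output:, for example html_document: on its own line with toc: true indented beneath it to add a table of contents.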

Then there is plain text in the main document:

This is how you include narrative text within your document. It will be rendered much as you see it in the final document, but you can also bold text, add links and images, and so on.

Lastly, there are code chunks. These will be in the R language for us, but you can include several other languages as well (e.g., Python, SQL, or Bash).

R runs these chunks when you knit (compile) your final document. While you're working, you can also run an individual chunk using the green arrow at the top right of the chunk.

Feel free to delete everything in your new R Markdown document except the YAML header for our example. Then go ahead and save. You'll notice that the file extension for an R Markdown document is .Rmd.

How do you create the output HTML or PDF file?

At any time, you can have R generate the HTML or PDF file you're hoping to create by clicking the Knit button. The button also has a dropdown menu that lets you switch between PDF, HTML, and Word outputs.
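The Knit button is essentially a front end for rmarkdown::render(), so you can also knit from the console or from a script. A sketch, assuming a hypothetical file called my_report.Rmd written to a temporary folder; render() needs pandoc (bundled with RStudio), hence the pandoc_available() check:

```r
library(rmarkdown)

# Write a tiny example document to a temporary folder (hypothetical file name)
rmd_path <- file.path(tempdir(), "my_report.Rmd")
ticks <- strrep("`", 3)  # chunk fences, built here to keep the strings readable
writeLines(c(
  "---",
  'title: "Untitled"',
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(ticks, "{r}"),
  'print("My first R Markdown code")',
  ticks
), rmd_path)

# Knit it to HTML; render() returns the path of the output file
if (pandoc_available()) {
  html_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)
  print(html_path)
}
```

Passing "word_document" or "pdf_document" instead gives the other outputs from the Knit dropdown (PDF relies on the LaTeX setup from step 1). One difference worth knowing: the Knit button renders in a fresh R session, whereas render() called from the console runs the chunks in your current environment by default.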


Basic syntax

There are a few key things you'll want to know how to do in R Markdown.

2.1. Make headers

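In R Markdown you can make headers using the # symbol. There are six tiers, from # (the biggest) to ###### (the smallest). Leave a space after the # symbols and a blank line before the header: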


 # The biggest header 

The biggest header


 ## The second biggest header 

The second biggest header


 ###### The smallest header 

The smallest header

2.2. Write basic text

To write normal text, just type within the main document as you normally would in a word processing program. For example:


The following section contains the results of my statistical analysis. After
performing this analysis I found a significant relationship between my variable of
interest and the experimental treatments.

The following section contains the results of my statistical analysis. After performing this analysis I found a significant relationship between my variable of interest and the experimental treatments.


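Note that a single line break in your raw text does not create a line break in the output: as above, the lines are joined into one paragraph. To force a line break, end the line with two spaces; to start a new paragraph, leave a blank line. The trailing spaces are easily overlooked.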
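One way to see this rule in action is to convert two snippets to HTML. The sketch below uses the commonmark package, a separate Markdown parser rather than the pandoc converter R Markdown actually uses, on the assumption that both handle trailing spaces the same way in this simple case:

```r
library(commonmark)

# No trailing spaces: the two lines end up in one paragraph, no break
no_break <- markdown_html("First line\nSecond line")

# Two trailing spaces: a hard line break (<br />) is inserted
with_break <- markdown_html("First line  \nSecond line")

cat(no_break)
cat(with_break)
```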

2.3. Format text

You can bold and italicize text using asterisks (*):


 **bold text** 

bold text


 *italic text* 

italic text


You can also do other things, such as adding superscripts:

 super^script^ 

super^script^

and adding links:

 [the text to display](https://www.google.com) 

the text to display
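To see what these marks turn into, here's a small sketch with the commonmark package (a separate Markdown parser, standing in here for pandoc, which R Markdown actually uses). It also shows why a link should include https://: without it, the target is kept as a relative path, so clicking it would look for a page called www.google.com next to your document.

```r
library(commonmark)

# Asterisks become <strong> and <em> tags
formatted <- markdown_html("**bold text** and *italic text*")

# A full URL produces an external link
full_link <- markdown_html("[the text to display](https://www.google.com)")

# Without a scheme, the href is kept as-is and treated as a relative path
bare_link <- markdown_html("[the text to display](www.google.com)")

cat(formatted, full_link, bare_link, sep = "")
```

The ^superscript^ syntax is a pandoc extension rather than standard CommonMark, so commonmark would leave it as plain text.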


2.4. Add in code!

Here's the fun part! Now that we know how to insert and format text, we can also add chunks of R code. In RStudio, go to Code > Insert Chunk (or press Ctrl+Alt+I, Cmd+Option+I on a Mac). You can also type a chunk out manually: open it with three backticks followed by {r} to indicate the R language, and close it with three backticks.

```{r}
print("My first R Markdown code")
```

By default, the knitted document shows both the code and its result:

print("My first R Markdown code")
## [1] "My first R Markdown code"

In some cases you might want to include code but not print it out in the final document, e.g. including only the resulting figure or result. You can hide the raw code using echo = FALSE:

```{r echo=FALSE}
print("My first R Markdown code")
```
## [1] "My first R Markdown code"
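To check exactly what ends up in the document, you can knit a file to plain Markdown with knitr::knit(), which runs the chunks without needing pandoc. In this sketch (echo_demo.Rmd is a made-up file name), the first chunk uses the default options and the second uses echo = FALSE:

```r
library(knitr)

ticks <- strrep("`", 3)  # chunk fences
rmd_path <- file.path(tempdir(), "echo_demo.Rmd")
md_path <- file.path(tempdir(), "echo_demo.md")

# Two chunks: one with default options, one with echo = FALSE
writeLines(c(
  paste0(ticks, "{r}"),
  'print("shown code")',
  ticks,
  "",
  paste0(ticks, "{r echo=FALSE}"),
  'print("hidden code")',
  ticks
), rmd_path)

# knit() runs the chunks and writes the results as Markdown
invisible(knit(rmd_path, output = md_path, quiet = TRUE))
md_lines <- readLines(md_path)
cat(md_lines, sep = "\n")
```

The resulting .md contains both the code and the output of the first chunk, but only the output of the second. Other chunk options work the same way: eval = FALSE shows code without running it, and include = FALSE runs the code but shows nothing.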

Code can also be inserted into your narrative text (e.g., in the middle of a sentence) by wrapping it in single backticks that begin with the letter r and a space:

 The iris dataset contains the following column names: `r names(iris)`. 

The iris dataset contains the following column names: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species.
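Inline code is handy for reporting numbers that would otherwise go stale when the data change. A sketch using the built-in iris data; the variable names are just examples:

```r
# Values computed in an earlier chunk, ready to be cited in a sentence
n_flowers <- nrow(iris)
mean_sepal <- round(mean(iris$Sepal.Length), 2)
species_list <- paste(levels(iris$Species), collapse = ", ")

n_flowers     # 150
mean_sepal    # 5.84
species_list  # "setosa, versicolor, virginica"
```

With those defined, a sentence such as The dataset contains `r n_flowers` flowers from three species (`r species_list`), with a mean sepal length of `r mean_sepal` cm. knits to The dataset contains 150 flowers from three species (setosa, versicolor, virginica), with a mean sepal length of 5.84 cm.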


3. Report example

Now let's put the topics from the above section together! Below I provide a very brief template (and its knitted output) showing how you might write up a post-analysis report to share with an advisor or PI. There's plenty to customize, but this is a skeleton that might be of use. You can download this script as a .Rmd file here along with the associated dataset.


The script:

# Update on my thesis progress

```{r echo=FALSE, message=FALSE, warning=FALSE}
# This chunk won't be printed (echo=FALSE), and message=FALSE/warning=FALSE hide the packages'
# startup messages: loading packages isn't something we want taking up space in the final doc.
library(tidyverse)
library(ggpubr)
library(knitr)
```

## New data  
I recently finished collecting additional data for my research project. In this
document I will load it in and explore it.

1. Load the data
```{r}
new_data <- read.csv(file = "new_data.csv")
```

2. Here's how the newly collected data are formatted
```{r}
str(new_data)

sample_n(tbl = new_data, size = 20)
```

3. Visualize the new data
```{r}
ggplot(data = new_data) +
  geom_point(aes(x = season, y = growth, color = season))
```

4. I ran a statistical test on the new data. Here are the results
```{r}
new_anova <- lm(formula = growth ~ season, data = new_data)

anova(new_anova) %>% kable()
```

5. Now I plot the significance to illustrate the results
```{r}
ggplot(data = new_data,
       aes(x = season, y = growth, fill = season)) +
  geom_boxplot() +
  stat_compare_means(comparisons = list(c("spring", "summer"),
                                        c("spring", "fall"), 
                                        c("summer", "fall"))) +
  stat_compare_means(method = "anova")
```

The knitted output:

Update on my thesis progress

New data

I recently finished collecting additional data for my research project. In this document I will load it in and explore it.

  1. Load the data
new_data <- read.csv(file = "new_data.csv")
  2. Here's how the newly collected data are formatted
str(new_data)
## 'data.frame':	99 obs. of  2 variables:
##  $ growth: num  100.9 96.3 97.3 102.5 101.3 ...
##  $ season: chr  "spring" "spring" "spring" "spring" ...
sample_n(tbl = new_data, size = 20)
##       growth season
## 1  103.16409 summer
## 2   91.36536   fall
## 3   97.43616   fall
## 4   91.59732   fall
## 5  103.16286 summer
## 6  102.25230   fall
## 7   99.32137 summer
## 8   96.80343   fall
## 9  100.50303 summer
## 10  97.32140 spring
## 11 109.53290 summer
## 12  94.64687 summer
## 13 101.69457 spring
## 14 100.36823   fall
## 15 101.03978   fall
## 16 100.43983   fall
## 17  97.98659 spring
## 18  99.34483 spring
## 19 100.62673   fall
## 20 108.72072 summer
  3. Visualize the new data
ggplot(data = new_data) +
  geom_point(aes(x = season, y = growth, color = season))

  4. I ran a statistical test on the new data. Here are the results
new_anova <- lm(formula = growth ~ season, data = new_data)

anova(new_anova) %>% kable()

|          | Df|    Sum Sq|   Mean Sq|  F value|   Pr(>F)|
|:---------|--:|---------:|---------:|--------:|--------:|
|season    |  2|  233.8816| 116.94082| 5.015241| 0.008479|
|Residuals | 96| 2238.4406|  23.31709|       NA|       NA|

  5. Now I plot the significance to illustrate the results
ggplot(data = new_data,
       aes(x = season, y = growth, fill = season)) +
  geom_boxplot() +
  stat_compare_means(comparisons = list(c("spring", "summer"),
                                        c("spring", "fall"), 
                                        c("summer", "fall"))) +
  stat_compare_means(method = "anova")

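The template's first chunk hides its code and the package-loading messages with echo=FALSE, message=FALSE, warning=FALSE. If you wanted every chunk to behave that way, knitr's opts_chunk$set() changes the defaults for the whole document. It is usually called in a first chunk marked {r setup, include=FALSE} so that the chunk itself leaves no trace in the output. A minimal sketch:

```r
library(knitr)

# Defaults for every later chunk in the document
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Check the current default; individual chunks can still override it, e.g. {r echo=TRUE}
opts_chunk$get("echo")  # FALSE
```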

4. References:

Cheat sheets:

Additionally, there are some great cheat sheets available for R Markdown: