Introduction to mapping in R
By Matt Brousil
In this post I'll walk through some of the basics for getting started importing, formatting, and creating maps with spatial data in R. Many students and researchers I know (myself included) have taken GIS coursework and are familiar with tools like ArcMap, but want to work with their data in R. R provides a good alternative to costly and potentially less reproducibility-friendly GIS tools, but it can be challenging to get started! This walkthrough is by no means exhaustive, but hopefully it will provide enough to give you a foothold and begin learning independently. I'll assume you know a bit about GIS to begin with.
In order to follow along with this walkthrough you'll want to install and load the following packages:
1library(tidyverse)
2library(sf)
3library(cowplot)
4library(ggspatial)
1. Quickly plot data for coordinate points
One of the more straightforward things you might want to do with mapping in R is to plot some points.
The built-in quakes
dataset in R is an easy place for us to start. This dataset provides info on earthquakes near Fiji.
Take a look:
1head(quakes)
2
3## lat long depth mag stations
4## 1 -20.42 181.62 562 4.8 41
5## 2 -20.62 181.03 650 4.2 15
6## 3 -26.00 184.10 42 5.4 43
7## 4 -17.97 181.66 626 4.1 19
8## 5 -20.42 181.96 649 4.0 11
9## 6 -19.68 184.31 195 4.0 12
We have a column for the latitude of earthquake observations, one for the longitude, and then additional ones for the depth, magnitude and the number of stations reporting the quake.
Without even loading a package, you would be able to make a very basic map of these points using the plot
function:
1plot(quakes$long, quakes$lat)
However, if you're familiar with ggplot2
you can plot them a little more elegantly with a little bit more code:
1ggplot(quakes) +
2 geom_point(aes(x = long, y = lat))
Perhaps you'd like to make the size of the points dependent on the magnitude of the earthquake:
1ggplot(quakes) +
2 geom_point(aes(x = long, y = lat, size = mag, alpha = 0.4))
Note that right now ggplot
isn't considering this spatial data yet, but just some points with coordinates.
From here you could take things further and format with different colors, plot themes, and plenty more. We'll dive into some of this in the following sections. But let's look at how to import our own data, plot it, and format the plot.
2. Build a map using external point and polygon files and add an inset
For the rest of this walkthrough I'll be using a dataset by Pagano et al.(2019) containing polar bear satellite locations obtained from GPS collars. It's available here. If you're following along, you'll want to download and unzip this folder.
We'll be working mostly with the sf
package now. There are several good vignettes on the package provided here and an explanation of the purpose of the package provided in this publication. Some of the basics are:
-
sf
is built around "simple features". Simple features are a standard for storing spatial data -
sf
was authored by some of the same people as the earlier spatial packagesp
and allows us to work with spatial data using simple features much more effectively -
One major benefit is that
sf
has been written to coexist well with packages in thetidyverse
-
sf
doesn't work well with time series or raster data (see thestars
andraster
packages) -
sf
stores spatial data in data frames (see tidyverse comment above)
2.1 Import, format, test
Let's dive in. To start, we'll load in the dataset. It contains GPS collar data from several adult female polar bears.
1bears <- read_csv(file = "../data/polarBear_argosGPSLocations_alaska_pagano_2014.csv") %>%
2 select(Bear:LocationQuality)
3
4## Warning: Missing column names filled in: 'X7' [7], 'X8' [8], 'X9' [9]
5
6##
7## -- Column specification -----------------------------------------------------------------
8## cols(
9## Bear = col_double(),
10## Type = col_character(),
11## Date = col_datetime(format = ""),
12## latitude = col_double(),
13## longitude = col_double(),
14## LocationQuality = col_character(),
15## X7 = col_logical(),
16## X8 = col_logical(),
17## X9 = col_datetime(format = "")
18## )
You'll notice I included the select
statement above. When reading this file in I found that it contained extra empty columns. A good reminder that it's important to look over your data when you import it into R.
What do the data look like?
1head(bears)
2
3## # A tibble: 6 x 6
4## Bear Type Date latitude longitude LocationQuality
5## <dbl> <chr> <dttm> <dbl> <dbl> <chr>
6## 1 1 GPS 2014-04-10 02:00:38 70.4 -146. G
7## 2 1 Argos 2014-04-10 02:19:12 70.4 -146. L3
8## 3 1 Argos 2014-04-10 02:50:59 70.4 -146. LB
9## 4 1 Argos 2014-04-10 02:53:19 70.4 -146. L1
10## 5 1 GPS 2014-04-10 03:00:38 70.4 -146. G
11## 6 1 GPS 2014-04-10 04:00:38 70.4 -146. G
Just like the earthquake example earlier, we can plot these to get a sense for how they're arranged spatially.
1ggplot(bears) +
2 geom_point(aes(x = longitude, y = latitude))
Knowing that there are multiple bears present in the dataset, we might want to color their points by their ID number. We'll do so by converting the Bear
column to a factor so that R doesn't treat it like an ordinal or continuous variable when assigning colors.
1ggplot(bears) +
2 geom_point(aes(x = longitude, y = latitude, color = as.factor(Bear)))
Another option would be to create a separate map for each bear in the study, and to indicate the progression of time using color. We'll facet by bear ID now, and color by date
1ggplot(bears) +
2 geom_point(aes(x = longitude, y = latitude, color = Date)) +
3 facet_wrap(~ Bear)
It's very handy to be able to plot almost directly from a .csv file like this, but we could also convert this dataset into a more spatial-friendly sf
object. Note that we provide a CRS, or coordinate reference system here. 4326
corresponds to WGS84.
1bears_sf <- st_as_sf(x = bears,
2 coords = c("longitude", "latitude"),
3 # WGS84
4 crs = 4326)
NCEAS has a helpful info sheet about working with CRS in R here if you want to learn more.
What's the result of us creating this new object?
1bears_sf
2
3## Simple feature collection with 9583 features and 4 fields
4## geometry type: POINT
5## dimension: XY
6## bbox: xmin: -158.9622 ymin: 70.0205 xmax: -140.4845 ymax: 75.33249
7## geographic CRS: WGS 84
8## # A tibble: 9,583 x 5
9## Bear Type Date LocationQuality geometry
10## * <dbl> <chr> <dttm> <chr> <POINT [°]>
11## 1 1 GPS 2014-04-10 02:00:38 G (-146.3468 70.39948)
12## 2 1 Argos 2014-04-10 02:19:12 L3 (-146.332 70.4)
13## 3 1 Argos 2014-04-10 02:50:59 LB (-146.328 70.423)
14## 4 1 Argos 2014-04-10 02:53:19 L1 (-146.366 70.395)
15## 5 1 GPS 2014-04-10 03:00:38 G (-146.3469 70.39932)
16## 6 1 GPS 2014-04-10 04:00:38 G (-146.3455 70.39947)
17## 7 1 Argos 2014-04-10 04:31:44 L3 (-146.351 70.396)
18## 8 1 Argos 2014-04-10 04:32:30 L2 (-146.364 70.397)
19## 9 1 GPS 2014-04-10 05:00:38 G (-146.3459 70.39945)
20## 10 1 GPS 2014-04-10 08:00:01 G (-146.3419 70.40024)
21## # ... with 9,573 more rows
We can see that the object is a tibble (data frame) but also has spatial data associated with it. The coordinate data is now stored in the geometry
list column. If you want to pull the coordinates out of that object you can access them with st_coordinates()
.
Plotting sf
objects with ggplot
is remarkably simple using a geom_sf()
layer:
1ggplot(bears_sf) +
2 geom_sf(aes(color = as.factor(Bear)))
2.2 More complex plotting
Let's focus our efforts on plotting the GIS data for just one bear. We'll filter for bear 2's data just like we would if we were doing this operation on a normal data frame.
1bear_two <- bears_sf %>%
2 filter(Bear == 2)
I'd also like to include shapefiles in my map. In this instance, we could use these to add polygon layers for the state of Alaska and for sea ice at the time the bear was being tracked. We can use the st_read()
function to bring in shapefiles.
1# Multipolygon geometry
2alaska <- st_read(dsn = "../data/cb_2018_us_state_5m/cb_2018_us_state_5m.shp") %>%
3 filter(NAME == "Alaska")
4
5## Reading layer `cb_2018_us_state_5m' from data source `C:\Users\matthew.brousil\Documents\R-talks\intro_to_mapping\data\cb_2018_us_state_5m\cb_2018_us_state_5m.shp' using driver `ESRI Shapefile'
6## Simple feature collection with 56 features and 9 fields
7## geometry type: MULTIPOLYGON
8## dimension: XY
9## bbox: xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
10## geographic CRS: NAD83
11
12# Sea ice from the first month that the bear was tracked
13ice <- st_read(dsn = "../data/extent_N_201404_polygon_v3.0/extent_N_201404_polygon_v3.0.shp")
14
15## Reading layer `extent_N_201404_polygon_v3.0' from data source `C:\Users\matthew.brousil\Documents\R-talks\intro_to_mapping\data\extent_N_201404_polygon_v3.0\extent_N_201404_polygon_v3.0.shp' using driver `ESRI Shapefile'
16## Simple feature collection with 145 features and 1 field
17## geometry type: POLYGON
18## dimension: XY
19## bbox: xmin: -3225000 ymin: -4950000 xmax: 3425000 ymax: 5700000
20## projected CRS: NSIDC Sea Ice Polar Stereographic North
Now these files have been brought in largely ready-to-use. Let's preview:
1ggplot(ice) +
2 geom_sf()
As a quick sidenote, the tigris
package for R is an alternative resource for polygon data for US boundaries that you could use. It accesses shapefiles from the US Census and loads them as sf
objects.
Now that we have three separate layers loaded in R it's worth looking to see if they have matching coordinate reference systems. For example, if you wanted to do any direct comparisons of the layers to one another for spatial analysis you'd want to make sure they were compatible. st_crs()
will return each layer's CRS for us.
1st_crs(bear_two)
2
3## Coordinate Reference System:
4## User input: EPSG:4326
5## wkt:
6## GEOGCRS["WGS 84",
7## DATUM["World Geodetic System 1984",
8## ELLIPSOID["WGS 84",6378137,298.257223563,
9## LENGTHUNIT["metre",1]]],
10## PRIMEM["Greenwich",0,
11## ANGLEUNIT["degree",0.0174532925199433]],
12## CS[ellipsoidal,2],
13## AXIS["geodetic latitude (Lat)",north,
14## ORDER[1],
15## ANGLEUNIT["degree",0.0174532925199433]],
16## AXIS["geodetic longitude (Lon)",east,
17## ORDER[2],
18## ANGLEUNIT["degree",0.0174532925199433]],
19## USAGE[
20## SCOPE["unknown"],
21## AREA["World"],
22## BBOX[-90,-180,90,180]],
23## ID["EPSG",4326]]
24
25st_crs(ice)
26
27## Coordinate Reference System:
28## User input: NSIDC Sea Ice Polar Stereographic North
29## wkt:
30## PROJCRS["NSIDC Sea Ice Polar Stereographic North",
31## BASEGEOGCRS["Unspecified datum based upon the Hughes 1980 ellipsoid",
32## DATUM["Not specified (based on Hughes 1980 ellipsoid)",
33## ELLIPSOID["Hughes 1980",6378273,298.279411123064,
34## LENGTHUNIT["metre",1]]],
35## PRIMEM["Greenwich",0,
36## ANGLEUNIT["degree",0.0174532925199433]],
37## ID["EPSG",4054]],
38## CONVERSION["US NSIDC Sea Ice polar stereographic north",
39## METHOD["Polar Stereographic (variant B)",
40## ID["EPSG",9829]],
41## PARAMETER["Latitude of standard parallel",70,
42## ANGLEUNIT["degree",0.0174532925199433],
43## ID["EPSG",8832]],
44## PARAMETER["Longitude of origin",-45,
45## ANGLEUNIT["degree",0.0174532925199433],
46## ID["EPSG",8833]],
47## PARAMETER["False easting",0,
48## LENGTHUNIT["metre",1],
49## ID["EPSG",8806]],
50## PARAMETER["False northing",0,
51## LENGTHUNIT["metre",1],
52## ID["EPSG",8807]]],
53## CS[Cartesian,2],
54## AXIS["easting (X)",south,
55## MERIDIAN[45,
56## ANGLEUNIT["degree",0.0174532925199433]],
57## ORDER[1],
58## LENGTHUNIT["metre",1]],
59## AXIS["northing (Y)",south,
60## MERIDIAN[135,
61## ANGLEUNIT["degree",0.0174532925199433]],
62## ORDER[2],
63## LENGTHUNIT["metre",1]],
64## USAGE[
65## SCOPE["unknown"],
66## AREA["World - N hemisphere - north of 60°N"],
67## BBOX[60,-180,90,180]],
68## ID["EPSG",3411]]
69
70st_crs(alaska)
71
72## Coordinate Reference System:
73## User input: NAD83
74## wkt:
75## GEOGCRS["NAD83",
76## DATUM["North American Datum 1983",
77## ELLIPSOID["GRS 1980",6378137,298.257222101,
78## LENGTHUNIT["metre",1]]],
79## PRIMEM["Greenwich",0,
80## ANGLEUNIT["degree",0.0174532925199433]],
81## CS[ellipsoidal,2],
82## AXIS["latitude",north,
83## ORDER[1],
84## ANGLEUNIT["degree",0.0174532925199433]],
85## AXIS["longitude",east,
86## ORDER[2],
87## ANGLEUNIT["degree",0.0174532925199433]],
88## ID["EPSG",4269]]
Alas, the CRS of the three layers don't match.
Because the three layers all do have a CRS already, we'll want to do transformations to put them in the same system. For this we'll use st_transform()
. If instead you wanted to assign a CRS to an object that did not have one you would use st_crs()
.
1bear_two_pcs <- st_transform(x = bear_two, crs = 3467)
2
3alaska_pcs <- st_transform(x = alaska, crs = 3467)
4
5ice_pcs <- st_transform(x = ice, crs = 3467)
Note that these layers would have plotted successfully even if we didn't transform their CRS. But the plotting would not have been pretty.
With transformation:
1ggplot() +
2 geom_sf(data = alaska_pcs) +
3 geom_sf(data = ice_pcs) +
4 geom_sf(data = bear_two_pcs)
Without transformation:
1ggplot() +
2 geom_sf(data = alaska) +
3 geom_sf(data = ice) +
4 geom_sf(data = bear_two)
Next we'll want to zoom in a bit. It would help to know the rough spatial boundaries of the polar bear data in order to do this well. We can use st_bbox()
to get this
information in the form of a bounding box, save it, then use it to define the boundaries of our plot. I've also added colors to differentiate the sea ice from Alaska's land area.
1limits <- st_bbox(bear_two_pcs)
2
3bear_plot <- ggplot() +
4 geom_sf(data = alaska_pcs, fill = "seashell", color = "snow3") +
5 geom_sf(data = ice_pcs, fill = "lightcyan", color = "snow3") +
6 geom_sf(data = bear_two_pcs, alpha = 0.3) +
7 coord_sf(xlim = c(limits["xmin"], limits["xmax"]),
8 ylim = c(limits["ymin"], limits["ymax"])) +
9 theme_bw()
Now we have a plot that is starting to look usable:
1bear_plot
You might want to add things like a north arrow or a scale bar. The ggspatial
package provides some options for doing this with the functions annotation_scale()
and annotation_north_arrow()
.
1bear_plot <- bear_plot +
2 # Place a scale bar in top left
3 annotation_scale(location = "tr") +
4 annotation_north_arrow(location = "tl",
5 # Use true north
6 which_north = "true",
7 height = unit(0.75, "cm"),
8 width = unit(0.75, "cm"))
9
10bear_plot
For the next step we'll add an inset map to show the geographic context of where this study took place. This map will be zoomed out and show just the state of Alaska for reference. In order to do this we'll need to create a second ggplot
object and then use cowplot::ggdraw()
to combine our two maps.
First, create the inset map. We want something pretty basic. The limits
object we made from st_bbox()
earlier can be plotted if we run it through st_as_sfc()
to create a polygon from it.
1inset_map <- ggplot(data = alaska_pcs) +
2 geom_sf() +
3 geom_sf(data = st_as_sfc(limits), fill = "red") +
4 theme_bw() +
5 xlab("") +
6 ylab("") +
7 theme(axis.title.x = element_blank(),
8 axis.title.y = element_blank(),
9 panel.grid = element_blank(),
10 axis.text = element_blank(),
11 axis.ticks = element_blank(),
12 plot.background = element_blank())
13
14inset_map
Now that we've successfully made a map that we want to place as an inset it's time to combine our two ggplot
objects into one using ggdraw()
. You'll want to play around with the arguments to your inset ggdraw()
call until you've arranged it in a way you like.
1ggdraw() +
2 draw_plot(bear_plot) +
3 draw_plot(plot = inset_map,
4 x = 0.13, # x location of inset placement
5 y = 0.009, # y location of inset placement
6 width = .455, # Inset width
7 height = .26, # Inset height
8 scale = .66) # Inset scale
2.3 Taking this further
There are a couple other places you might want to make changes to the map we've created. For one, you might want to use color to code the bear's path by date to show how long it lingered in some locations.
We first will need to create breakpoints that we can feed to ggplot
for this. We'll use pretty()
from base R to generate equally spaced dates and corresponding labels for the legend.
1date_breaks <- tibble(dates_num = as.numeric(pretty(bear_two_pcs$Date)),
2 dates_lab = pretty(bear_two_pcs$Date))
3
4date_breaks
5
6## # A tibble: 7 x 2
7## dates_num dates_lab
8## <dbl> <dttm>
9## 1 1396310400 2014-04-01 00:00:00
10## 2 1398902400 2014-05-01 00:00:00
11## 3 1401580800 2014-06-01 00:00:00
12## 4 1404172800 2014-07-01 00:00:00
13## 5 1406851200 2014-08-01 00:00:00
14## 6 1409529600 2014-09-01 00:00:00
15## 7 1412121600 2014-10-01 00:00:00
You can use scale_color_viridis_c()
(or another scale_color_
option if you want) and provide it dates_num
as the color breaks and dates_lab
as the break labels. Note that I also add in a fill color for panel.background
to give the the appearance of water.
1bear_time_plot <- ggplot() +
2 geom_sf(data = alaska_pcs, fill = "seashell", color = "snow3") +
3 geom_sf(data = ice_pcs, fill = "lightcyan", color = "snow3") +
4 geom_sf(data = bear_two_pcs, aes(color = as.numeric(Date)), alpha = 0.3) +
5 coord_sf(xlim = c(limits["xmin"], limits["xmax"]),
6 ylim = c(limits["ymin"], limits["ymax"])) +
7 scale_color_viridis_c(name = "Date",
8 breaks = date_breaks$dates_num,
9 labels = date_breaks$dates_lab) +
10 theme_bw() +
11 theme(panel.background = element_rect(fill = "lightseagreen"),
12 panel.grid = element_blank()) +
13 # Place a scale bar in top left
14 annotation_scale(location = "tr") +
15 annotation_north_arrow(location = "tl",
16 # Use true north
17 which_north = "true",
18 height = unit(0.75, "cm"),
19 width = unit(0.75, "cm"))
20
21bear_time_plot
And lastly, you might want to represent the GPS data as a line instead of a series of points. (Read here for a discussion on converting to LINESTRINGS
). You'll need to convert the points first and then you can plot as normal.
1# Convert points to LINESTRING
2bear_two_line <- bear_two_pcs %>%
3 summarise(do_union = FALSE) %>%
4 st_cast("LINESTRING")
5
6bear_time_plot +
7 geom_sf(data = bear_two_line, color = "black") +
8 coord_sf(xlim = c(limits["xmin"], limits["xmax"]),
9 ylim = c(limits["ymin"], limits["ymax"]))
10
11## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
1ggdraw() +
2 draw_plot(bear_time_plot +
3 geom_sf(data = bear_two_line, color = "black") +
4 coord_sf(xlim = c(limits["xmin"], limits["xmax"]),
5 ylim = c(limits["ymin"], limits["ymax"]))) +
6 draw_plot(plot = inset_map,
7 x = 0.03, # x location of inset placement
8 y = 0.009, # y location of inset placement
9 width = .455, # Inset width
10 height = .26, # Inset height
11 scale = .66) # Inset scale
12
13## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
References
-
Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. [Monthly ice extent ESRI shapefile, April 2014]. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. [Accessed 2021-02-07].
-
Pagano, A. M., Atwood, T. C. and Durner, G. M., 2019, Satellite location and tri-axial accelerometer data from adult female polar bears (Ursus maritimus) in the southern Beaufort Sea, April-October 2014: U.S. Geological Survey data release, https://doi.org/10.5066/P9VA5I0M.
-
Pebesma E (2018). "Simple Features for R: Standardized Support for Spatial Vector Data." The R Journal, 10(1), 439–446. doi: 10.32614/RJ-2018-009, https://doi.org/10.32614/RJ-2018-009.
-
2018 TIGER/Line Shapefiles (machinereadable data files) / prepared by the U.S. Census Bureau, 2018