Date and Time data in R
Author: Stephanie Labou
Getting started
First, let's read in the date/time data created for this lesson. We'll read in the data with strings as character data, not factors.
dat_orig <- read.csv("date_time_examples.csv", stringsAsFactors = FALSE)
What does this data look like? What structure is it?
dat_orig

##   dates_only         date_times
## 1  12/5/2017  5-12-2017 5:23:10
## 2 12/13/2017 13-12-2017 3:10:45
## 3   1/9/2018  9-1-2018 14:15:10
## 4  2/15/2018 15-2-2018 19:25:05
## 5  2/21/2018  21-2-2018 8:55:35

str(dat_orig)

## 'data.frame': 5 obs. of 2 variables:
## $ dates_only: chr "12/5/2017" "12/13/2017" "1/9/2018" "2/15/2018" ...
## $ date_times: chr "5-12-2017 5:23:10" "13-12-2017 3:10:45" "9-1-2018 14:15:10" "15-2-2018 19:25:05" ...
We'll make a new dataframe so we can compare changes we make to what we read in originally.
dat <- dat_orig
We want to convert these character date formats into something R will recognize as dates.
The "Date" format
R has built-in functions that can convert character data to a date format. The most popular of these is as.Date.
For example, if I want to turn the "dates_only" column into a recognizable date format, I would run:
dat$dates_only <- as.Date(dat$dates_only, format = "%m/%d/%Y")
Now when we look at our data, we see that it is a "Date" class and the dates are all in a new standard format.
str(dat)

## 'data.frame': 5 obs. of 2 variables:
## $ dates_only: Date, format: "2017-12-05" "2017-12-13" ...
## $ date_times: chr "5-12-2017 5:23:10" "13-12-2017 3:10:45" "9-1-2018 14:15:10" "15-2-2018 19:25:05" ...
We also see that our date data has been restructured to be year-month-day.
dat

##   dates_only         date_times
## 1 2017-12-05  5-12-2017 5:23:10
## 2 2017-12-13 13-12-2017 3:10:45
## 3 2018-01-09  9-1-2018 14:15:10
## 4 2018-02-15 15-2-2018 19:25:05
## 5 2018-02-21  21-2-2018 8:55:35
The "format" argument in as.Date() specifies what format the data are in originally. In this case, we have a "month/day/year" format. Since date data comes in all kinds of formats, built-in codes include:
%d | Day of the month (decimal number) |
%m | Month (decimal number) |
%b | Month (abbreviated) |
%B | Month (full name) |
%y | Year (2 digit) |
%Y | Year (4 digit) |
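As a minimal sketch of how these codes combine, the example below parses a few made-up date strings (they are not part of the lesson data). Parsing month names with %b and %B depends on the session locale, so an English locale is assumed for the last two calls.

```r
# Hypothetical example strings, not part of the lesson data
d1 <- as.Date("05.12.17", format = "%d.%m.%y")           # day.month.2-digit year
d2 <- as.Date("2017/12/05", format = "%Y/%m/%d")         # 4-digit year first
d3 <- as.Date("5 Dec 2017", format = "%d %b %Y")         # abbreviated month name (English locale assumed)
d4 <- as.Date("December 5, 2017", format = "%B %d, %Y")  # full month name (English locale assumed)

d1
d2
d3
d4
```

In an English locale, all four calls should return the same Date, 2017-12-05. The separators in the format string (dots, slashes, spaces, commas) have to match the input exactly.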
It is very important to specify the date format of the character data you are trying to convert to a "Date" format! If you don't specify a format, things can go very wrong with your data:
# Make new data frame
dat2 <- dat_orig

# Use as.Date() without specifying format
dat2$dates_only <- as.Date(dat2$dates_only)

# Check out the result
dat2

##   dates_only         date_times
## 1 0012-05-20  5-12-2017 5:23:10
## 2       <NA> 13-12-2017 3:10:45
## 3 0001-09-20  9-1-2018 14:15:10
## 4       <NA> 15-2-2018 19:25:05
## 5       <NA>  21-2-2018 8:55:35
Something went awry here with no error. Not good!
In some cases, you may encounter actual errors instead of behind-the-scenes shenanigans. For example:
as.Date("2.4.2017")

## Error in charToDate(x): character string is not in a standard unambiguous format
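One way to guard against the silent failures shown above is to check for values that turned into NA during conversion. The sketch below uses a hypothetical helper, parse_dates_strict(), which is not part of base R or of this lesson; it wraps as.Date() and stops if any non-missing input failed to parse.

```r
# Hypothetical helper: convert with an explicit format and fail loudly
# if any non-missing input could not be parsed
parse_dates_strict <- function(x, format) {
  out <- as.Date(x, format = format)
  failed <- is.na(out) & !is.na(x)
  if (any(failed)) {
    stop("Could not parse: ", paste(x[failed], collapse = ", "))
  }
  out
}

# Correct format: both strings convert
good <- parse_dates_strict(c("12/5/2017", "12/13/2017"), format = "%m/%d/%Y")
good

# Wrong format: the second string has no valid month 13, so we get an error
bad <- tryCatch(
  parse_dates_strict(c("12/5/2017", "12/13/2017"), format = "%d/%m/%Y"),
  error = function(e) conditionMessage(e)
)
bad
```

A check like this only catches strings that fail to parse. A string such as 12/5/2017 read as day/month/year would still convert (to 12 May), so the check complements knowing the true format of your data rather than replacing it.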
POSIXct format
POSIXct stands for "Portable Operating System Interface calendar time." The name is a mouthful, but POSIXct is useful when you have time data along with date data, as we do in the second column of our data.
##   dates_only         date_times
## 1  12/5/2017  5-12-2017 5:23:10
## 2 12/13/2017 13-12-2017 3:10:45
## 3   1/9/2018  9-1-2018 14:15:10
## 4  2/15/2018 15-2-2018 19:25:05
## 5  2/21/2018  21-2-2018 8:55:35
For instance, what happens when we try to use as.Date() with date time data?
## [1] "2017-12-05" "2017-12-13" "2018-01-09" "2018-02-15" "2018-02-21"
Even if we try to specify a day/month/year hour:minute:second format, we lose the time information.
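The loss of time information can be reproduced with a small sketch. The vector x below is a stand-in holding two strings in the same layout as the date_times column; it is an assumption for illustration, not the lesson's own code.

```r
# Two date-time strings in the same layout as the date_times column
x <- c("5-12-2017 5:23:10", "13-12-2017 3:10:45")

# The format string describes the full input, including the time...
d <- as.Date(x, format = "%d-%m-%Y %H:%M:%S")

# ...but a Date has no place to store hours, minutes, or seconds
d
class(d)
unclass(d)  # whole numbers of days, nothing finer
```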
The equivalent function as.POSIXct can handle time data as well as date data.
# New data frame; dat3 is assumed to be a fresh copy of the original data, like dat2 and dat4
dat3 <- dat_orig

as.POSIXct(dat3$date_times, format = "%d-%m-%Y %H:%M:%S")

## [1] "2017-12-05 05:23:10 PST" "2017-12-13 03:10:45 PST"
## [3] "2018-01-09 14:15:10 PST" "2018-02-15 19:25:05 PST"
## [5] "2018-02-21 08:55:35 PST"
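Note that this defaulted to Pacific Standard Time, the local time zone of the computer that ran the code; as.POSIXct() uses the system time zone unless told otherwise. We can change this by specifying which time zone we want with the tz argument.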
as.POSIXct(dat3$date_times, format = "%d-%m-%Y %H:%M:%S", tz = "GMT")

## [1] "2017-12-05 05:23:10 GMT" "2017-12-13 03:10:45 GMT"
## [3] "2018-01-09 14:15:10 GMT" "2018-02-15 19:25:05 GMT"
## [5] "2018-02-21 08:55:35 GMT"
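The tz argument changes which instant a clock reading refers to, not just the label that is printed. Here is a minimal sketch of that difference, assuming the time zone name "America/Los_Angeles" is available on your system (it is the usual name for the zone that prints as PST in winter).

```r
fmt <- "%d-%m-%Y %H:%M:%S"

# The same clock reading interpreted in two different time zones
x_gmt <- as.POSIXct("5-12-2017 5:23:10", format = fmt, tz = "GMT")
x_la  <- as.POSIXct("5-12-2017 5:23:10", format = fmt, tz = "America/Los_Angeles")

# Same clock reading, different instants: Los Angeles is 8 hours behind GMT in December
offset_hours <- as.numeric(difftime(x_la, x_gmt, units = "hours"))
offset_hours

# Display the GMT instant as Los Angeles clock time without changing the stored value
shown_in_la <- format(x_gmt, tz = "America/Los_Angeles", usetz = TRUE)
shown_in_la
```

So if your data were recorded in a known time zone, it is safer to pass that zone explicitly than to rely on whatever zone the computer happens to be set to.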
If we assign the date_times column to be a date using POSIXct, we can see this reflected in the data structure.
# Change date_times column to be POSIXct date
dat3$date_times <- as.POSIXct(dat3$date_times, format = "%d-%m-%Y %H:%M:%S", tz = "GMT")

str(dat3)

## 'data.frame': 5 obs. of 2 variables:
## $ dates_only: chr "12/5/2017" "12/13/2017" "1/9/2018" "2/15/2018" ...
## $ date_times: POSIXct, format: "2017-12-05 05:23:10" "2017-12-13 03:10:45" ...
So far, this seems pretty similar to as.Date() output, except we can include hours/minutes/seconds. On the surface, this is true, but differences appear when we look closer.
Let's check out what's going on behind-the-scenes with Date objects.
unclass(dat$dates_only)

## [1] 17505 17513 17540 17577 17583
Unexpected! Looking closer at the Dates documentation, we see that "dates are represented as the number of days since 1970-01-01, with negative values for earlier dates."
What about as.POSIXct?
unclass(dat3$date_times)

## [1] 1512451390 1513134645 1515507310 1518722705 1519203335
## attr(,"tzone")
## [1] "GMT"
Behind the scenes, POSIXct stores data as the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).
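To make the two storage schemes concrete, the sketch below rebuilds the first date and the first date-time from the raw numbers shown above, then does a little arithmetic. The variable names d and dt are just for illustration.

```r
# Rebuild the first values from the stored numbers
d  <- as.Date(17505, origin = "1970-01-01")                      # days since 1970-01-01
dt <- as.POSIXct(1512451390, origin = "1970-01-01", tz = "GMT")  # seconds since 1970-01-01 00:00:00 UTC
d
dt

# Arithmetic follows the storage unit
d + 1     # one day later
dt + 60   # 60 seconds (one minute) later
```

This is why adding 1 to a Date moves it forward by a day, while adding 1 to a POSIXct value moves it forward by only a second.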
POSIXlt format
POSIXlt stands for "Portable Operating System Interface local time." The syntax (as.POSIXlt) and output look extremely similar to POSIXct.
# New dataframe
dat4 <- dat_orig

# Use POSIXlt for dates
as.POSIXlt(dat4$date_times, format = "%d-%m-%Y %H:%M:%S")

## [1] "2017-12-05 05:23:10 PST" "2017-12-13 03:10:45 PST"
## [3] "2018-01-09 14:15:10 PST" "2018-02-15 19:25:05 PST"
## [5] "2018-02-21 08:55:35 PST"
As with Date and POSIXct, the difference is behind the scenes: POSIXlt stores data as a list of day, month, year, hour, minute, second, and attributes.
# Format dates using POSIXlt
dat4$date_times <- as.POSIXlt(dat4$date_times, format = "%d-%m-%Y %H:%M:%S")

# Check data structure
str(dat4)

## 'data.frame': 5 obs. of 2 variables:
## $ dates_only: chr "12/5/2017" "12/13/2017" "1/9/2018" "2/15/2018" ...
## $ date_times: POSIXlt, format: "2017-12-05 05:23:10" "2017-12-13 03:10:45" ...

# Unclass to see behind the scenes
unclass(dat4$date_times)

## $sec
## [1] 10 45 10  5 35
##
## $min
## [1] 23 10 15 25 55
##
## $hour
## [1]  5  3 14 19  8
##
## $mday
## [1]  5 13  9 15 21
##
## $mon
## [1] 11 11  0  1  1
##
## $year
## [1] 117 117 118 118 118
##
## $wday
## [1] 2 3 2 4 3
##
## $yday
## [1] 338 346   8  45  51
##
## $isdst
## [1] 0 0 0 0 0
##
## $zone
## [1] "PST" "PST" "PST" "PST" "PST"
##
## $gmtoff
## [1] NA NA NA NA NA
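Because POSIXlt is a list, individual components can be pulled out with $. The sketch below uses a single made-up value (the first row of the lesson data, parsed with tz = "GMT" so the result does not depend on the local time zone). Note the offsets visible in the output above: months count from 0 and years count from 1900.

```r
lt <- as.POSIXlt("5-12-2017 5:23:10", format = "%d-%m-%Y %H:%M:%S", tz = "GMT")

lt$hour          # hour of day
lt$mon + 1       # months are stored as 0-11, so add 1 for the calendar month
lt$year + 1900   # years are stored as years since 1900
lt$wday          # day of week, where 0 is Sunday
lt$yday + 1      # day of year is stored starting from 0
```

For this value the components work out to hour 5, month 12, year 2017, weekday 2 (Tuesday), and day 339 of the year.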
Packages for date/time data
Everything we've done so far has been using built-in R functions. Let's say we now have each column in the right format for its type of data.
# Make useable dataframe
dat_use <- dat_orig

# Format dates_only to be Date objects
dat_use$dates_only <- as.Date(dat_use$dates_only, format = "%m/%d/%Y")

# Format date_times to be POSIXct objects
dat_use$date_times <- as.POSIXct(dat_use$date_times, format = "%d-%m-%Y %H:%M:%S")

# Check out structure
str(dat_use)

## 'data.frame': 5 obs. of 2 variables:
## $ dates_only: Date, format: "2017-12-05" "2017-12-13" ...
## $ date_times: POSIXct, format: "2017-12-05 05:23:10" "2017-12-13 03:10:45" ...

# Check out data
dat_use

##   dates_only          date_times
## 1 2017-12-05 2017-12-05 05:23:10
## 2 2017-12-13 2017-12-13 03:10:45
## 3 2018-01-09 2018-01-09 14:15:10
## 4 2018-02-15 2018-02-15 19:25:05
## 5 2018-02-21 2018-02-21 08:55:35
The package lubridate provides convenience functions for working with POSIXct objects, and it also works with Date objects.
Once lubridate is installed, use library() to make the lubridate functions accessible.
library(lubridate)
Note that this masks "date" from the base package, which may not be what you want for certain analyses.
Lubridate is useful for wrangling date data, or extracting data subsets.
For instance, extract months:
# Use month() to get months from date
month(dat_use$dates_only)

## [1] 12 12  1  2  2
or years
# Use year() to get years from date
year(dat_use$date_times)

## [1] 2017 2017 2018 2018 2018
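Lubridate can also do the parsing step itself. The sketch below is not part of the lesson's own workflow; it uses lubridate's mdy() and dmy_hms() helpers, whose names spell out the order of the components in the input string, on the first row of the lesson data. It assumes lubridate is installed.

```r
library(lubridate)

# Parsing helpers are named after the order of the components in the string
d  <- mdy("12/5/2017")              # month-day-year, returns a Date
dt <- dmy_hms("5-12-2017 5:23:10")  # day-month-year hour:minute:second, returns POSIXct (UTC by default)

d
dt

# Extractor functions work on the results just as they do on base objects
hour(dt)
wday(d, label = TRUE)  # day of week as a labeled factor
```

These helpers trade the explicit format string for a function name, which can be easier to read when the layout of the input is simple and consistent.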
Another frequently used package is zoo, especially in cases of time series data.
library(zoo)
Again, we see masking - now as.Date is masked from base!
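When masking matters, one common way to be explicit about which version of a function you want is the :: operator. The following is a minimal sketch using only base R; the variable names are for illustration.

```r
# Call the base version of as.Date() explicitly, regardless of what is loaded
d <- base::as.Date("12/5/2017", format = "%m/%d/%Y")
d

# base::date() returns the current date and time as a character string
now_string <- base::date()
class(now_string)
```

The same pattern works in the other direction, for example zoo::as.Date() or lubridate::date(), once those packages are installed.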
Zoo is especially useful for working with monthly data.
# Use as.yearmon to get year and month data
as.yearmon(dat_use$dates_only)

## [1] "Dec 2017" "Dec 2017" "Jan 2018" "Feb 2018" "Feb 2018"
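A yearmon value can be converted back to a Date when you need a concrete day to stand in for the month. Here is a short sketch, assuming zoo is installed and using three dates from the lesson data; the frac argument of zoo's as.Date() method for yearmon objects picks a position within the month.

```r
library(zoo)

d  <- as.Date(c("2017-12-05", "2017-12-13", "2018-01-09"))
ym <- as.yearmon(d)

# Convert months back to dates
first_day <- as.Date(ym)            # first day of each month (frac = 0 is the default)
last_day  <- as.Date(ym, frac = 1)  # last day of each month

first_day
last_day
```

This is handy for labeling or plotting monthly summaries on a date axis.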
Other date/time packages include: chron, timeDate, and xts. What package you'll need depends on what kinds of analysis you're doing and how complex your data are. In most cases, base and lubridate/zoo are sufficient for data wrangling, while other packages are primarily aimed at facilitating time series modeling.
Additional resources
Overview of dates and times in R from Berkeley stats department
Walk-through of Date, POSIXct, and POSIXlt from R-bloggers
CRAN vignette about lubridate
RStudio lubridate cheat sheet (scroll to "Dates and Times Cheat Sheet")
Stack Overflow post about POSIXct vs POSIXlt
DataCamp xts cheat sheet
Post on time zone conversions