R functions and the Apply family

Feb 12, 2020 · 11 min read · apply functions lapply ·

By Vera Pfeiffer

Data for today

 1BadFungi<-read.csv("BadFun.csv", row.names=1)
 2head(BadFungi)
 3
 4##               Botrytis Cladosporium  Monilinia
 5## X10orgedge  0.02131367   0.15160858 0.04504021
 6## X10orgint   0.02654155   0.09718499 0.03900804
 7## X11convedge 0.04959786   0.24664879 0.02560322
 8## X11convint  0.03391421   0.32600536 0.04571046
 9## X15softedge 0.10442359   0.17694370 0.05201072
10## X15softint  0.09195710   0.21796247 0.01300268
11##             Mycosphaerella Penicillium
12## X10orgedge       0.3230563           0
13## X10orgint        0.1786863           0
14## X11convedge      0.6823056           0
15## X11convint       0.5143432           0
16## X15softedge      0.3900804           0
17## X15softint       0.4044236           0
18
19Envi<-read.csv("FunEnvi.csv", row.names=1)
20head(Envi)
21
22##                   PCNM1       PCNM2 Precip MinT VPDmin
23## X10orgedge  -0.11090592 -0.01442449   2.31 37.0   1.15
24## X10orgint   -0.11092923 -0.01445247   2.31 37.0   1.15
25## X11convedge -0.11065756 -0.01294471   2.31 37.0   1.15
26## X11convint  -0.08651768  0.69306694   2.31 37.0   1.15
27## X15softedge -0.11087064 -0.03984493   2.18 37.9   1.27
28## X15softint  -0.11088600 -0.04012614   2.18 37.9   1.27
29##             VPDmax  PropPear PropFruit   PropFor soft org
30## X10orgedge   10.20  7.219268 0.1145916 5.7295780    0   1
31## X10orgint    10.20  7.219268 0.1145916 5.7295780    0   1
32## X11convedge  10.20 66.119330 1.6042818 0.3437747    0   0
33## X11convint   10.20 66.119330 1.6042818 0.3437747    0   0
34## X15softedge  11.28 14.782311 1.4896903 0.0000000    1   0
35## X15softint   11.28 14.782311 1.4896903 0.0000000    1   0

R packages for today

install.packages("CCA"), install.packages("vegan")

R functions

Everything that exists in R is an object, and everything that is done (data transformations, plotting, operations) in R is a function.

There are many functions supplied by the base R software package, as well as libraries of code accessible through open access packages available at R Cran repositories.

You can also write your own functions, and activate them by running them in your script. Or alternatively, you can to use source() to load them from a separate file in your working directory.

Writing functions

Abstracting your code into many small functions is a great way to improve your organization, comfort, and efficiency in R. You can store your functions in .R documents and source them from a script if they are long and obscure the flow of your working version of code. Try to avoid making very long or complex functions that do too much at once.

FUNCTION.NAME = function(PARAMETER.LIST) {

BODY

}

A function call proceeds as follows:

Execution jumps to the first line of the function upon seeing the call FUNCTION.NAME(ARGUMENT.LIST)
Function's PARAMETER.LIST is copied from caller's ARGUMENT.LIST by name or position, and from defaults specified as PARAMETER.NAME=DEFAULT in PARAMETER.LIST
Assignment to function parameters and local variables doesn't affect caller's variables
Code in function is executed until return(EXPRESSION), or until function's closing }
Execution returns to caller; if caller assigned a variable to function it gets EXPRESSION from function's return() or last expression

 1a=4
 2b=5
 3square.a <- function(a = 1, b = 2) {
 4  b=100
 5  c=a*a
 6  return(c)
 7}
 8square.a
 9
10## function(a = 1, b = 2) {
11##   b=100
12##   c=a*a
13##   return(c)
14## }
15
16square.a()
17
18## [1] 1
19
20square.a(a=3)
21
22## [1] 9
23
24square.a(3)
25
26## [1] 9
27
28square.a(3,7)
29
30## [1] 9
31
32a
33
34## [1] 4
35
36b
37
38## [1] 5

You don't have to use a return statement...

 1sum.of.squares <- function(x,y) {
 2  x^2 +y^2
 3}
 4
 5sum.of.squares(5,6)
 6
 7## [1] 61
 8
 9x<-c(2:11)
10y<-c(3:12)
11sum.of.squares(x,y)
12
13##  [1]  13  25  41  61  85 113 145 181 221 265

Function goals:

short
performs one opperation
intuitive name

Functions should primarily help you improve the readability of your code, and secondarily improve your efficiency and organization. It also helps you maintain your code by storing it in separate easily accessible files, rather than in super, super, long whole-project document.

Bigger function example

  1badfungiCC<-CCorA(Envi,BadFungi, stand.X=TRUE,stand.Y=TRUE)
  2summary(badfungiCC)
  3
  4##              Length Class  Mode
  5## Pillai         1    -none- numeric
  6## Eigenvalues    5    -none- numeric
  7## CanCorr        5    -none- numeric
  8## Mat.ranks      2    -none- numeric
  9## RDA.Rsquares   2    -none- numeric
 10## RDA.adj.Rsq    2    -none- numeric
 11## nperm          1    -none- numeric
 12## p.Pillai       1    -none- numeric
 13## p.perm         1    -none- logical
 14## Cy           150    -none- numeric
 15## Cx           150    -none- numeric
 16## corr.Y.Cy     55    -none- numeric
 17## corr.X.Cx     25    -none- numeric
 18## corr.Y.Cx     55    -none- numeric
 19## corr.X.Cy     25    -none- numeric
 20## control       12    how    list
 21## call           5    -none- call
 22
 23badfungiCC$CanCorr
 24
 25## [1] 0.8717640 0.8035587 0.6797292 0.5107511 0.4393814
 26
 27#0.87 0.80
 28#Eigenvalues are canonical correlations squared
 29#Pillai's trace is the sum of squared Canonical correlations
 30print(sum(badfungiCC$CanCorr^2))
 31
 32## [1] 2.321634
 33
 34print(sum(badfungiCC$Eigenvalues))
 35
 36## [1] 2.321634
 37
 38print(badfungiCC$Pillai)
 39
 40## [1] 2.321634
 41
 42#RDA R squares
 43print(badfungiCC$RDA.Rsquares)
 44
 45## [1] 0.1611034 0.4759253
 46
 47# ...tell you how much variance in the *dependent* variables is explained by the independent *canonical variate* (and vice versa)
 48# i.e. X|Y = variance in X variables explained by the Y canonical variate
 49# i.e. Y|X = variance in Y variables explained by the X canonical variate
 50
 51# IMPORTANT
 52# corr.Y.Cy = Correlation of each variable in Y with its Canonical variate (similarly, corr.X.Cx)
 53# Also called Canonical loadings or canonical structure correlations, interpreted as factor loadings
 54# ...tells you how much do individual variables contribute to their own canonical variates
 55badfungiCC$corr.Y.Cy
 56
 57##              CanAxis1    CanAxis2    CanAxis3    CanAxis4
 58## PCNM1     -0.32081197 -0.33933054 -0.21230081 -0.27697487
 59## PCNM2      0.20369029  0.14892281 -0.22857132  0.48254574
 60## Precip    -0.18490434 -0.22563958 -0.08315862 -0.23894850
 61## MinT       0.03450369  0.15228199  0.23966211  0.26635597
 62## VPDmin    -0.13952698 -0.21987239  0.03910030  0.14645347
 63## VPDmax    -0.12043970  0.25484033 -0.05257090 -0.13428438
 64## PropPear   0.45021330  0.69746308 -0.10002690  0.25950088
 65## PropFruit  0.20965421  0.37011541  0.14175772  0.33301858
 66## PropFor   -0.20070175 -0.34619064 -0.19023253 -0.33176122
 67## soft       0.32626004  0.05835548  0.07913943 -0.59234743
 68## org       -0.21437285 -0.34322738 -0.06779914 -0.01384469
 69##              CanAxis5
 70## PCNM1      0.16866057
 71## PCNM2      0.12143280
 72## Precip     0.26614803
 73## MinT      -0.23270298
 74## VPDmin     0.13155815
 75## VPDmax    -0.41006701
 76## PropPear  -0.05107942
 77## PropFruit -0.07686641
 78## PropFor    0.26656009
 79## soft      -0.28857927
 80## org        0.15500380
 81
 82badfungiCC$corr.X.Cx
 83
 84##                   CanAxis1   CanAxis2    CanAxis3
 85## Botrytis       -0.58500580 0.49284541 -0.40984921
 86## Cladosporium   -0.04194937 0.70546350 -0.46226576
 87## Monilinia       0.66393123 0.18551419  0.44195394
 88## Mycosphaerella -0.28847351 0.81947263  0.09200482
 89## Penicillium     0.26637032 0.09997318 -0.14498754
 90##                  CanAxis4    CanAxis5
 91## Botrytis       -0.4107330  0.27963134
 92## Cladosporium    0.4506787  0.28941422
 93## Monilinia      -0.4621564  0.34039386
 94## Mycosphaerella  0.4837810  0.05233234
 95## Penicillium    -0.2441335 -0.91565805
 96
 97# So the Canonical axis 1 correlation is driven by the high influence of the first spatial distance vector and the proportion of pear in the surrounding area - on Botrytis and Monilinia
 98
 99# The second Canonical axis 2 correlation shows an even stronger influence of the proportion of pear, as well as more distributed effects of the proportion of fruit, the first spatial vector, organic management and precipitation on Cladosporium and Mycosphaerella
100
101# ...NOT THAT INFORMATIVE
102# corr.Y.Cx = Correlation of each variable in Y with each Canonical variate in X (similarly corr.X.Cy)
103# Also called canonical cross-loading. Correlation of each variabel with the opposite canonical variate
104# ...tells you how strongly are individual variables correlated with the canonical variates of the dependent/independent set
105
106badfungiCC$corr.Y.Cx
107
108##              CanAxis1    CanAxis2    CanAxis3     CanAxis4
109## PCNM1     -0.27967234 -0.27267200 -0.14430705 -0.141465227
110## PCNM2      0.17756987  0.11966822 -0.15536659  0.246460778
111## Precip    -0.16119295 -0.18131464 -0.05652534 -0.122043214
112## MinT       0.03007908  0.12236752  0.16290533  0.136041613
113## VPDmin    -0.12163460 -0.17668037  0.02657761  0.074801276
114## VPDmax    -0.10499500  0.20477916 -0.03573398 -0.068585896
115## PropPear   0.39247977  0.56045251 -0.06799120  0.132540364
116## PropFruit  0.18276900  0.29740945  0.09635686  0.170089613
117## PropFor   -0.17496457 -0.27818449 -0.12930660 -0.169447413
118## soft       0.28442177  0.04689206  0.05379338 -0.302542112
119## org       -0.18688254 -0.27580334 -0.04608505 -0.007071191
120##              CanAxis5
121## PCNM1      0.07410631
122## PCNM2      0.05335531
123## Precip     0.11694049
124## MinT      -0.10224536
125## VPDmin     0.05780420
126## VPDmax    -0.18017580
127## PropPear  -0.02244335
128## PropFruit -0.03377367
129## PropFor    0.11712154
130## soft      -0.12679636
131## org        0.06810578
132
133badfungiCC$corr.X.Cy
134
135##                   CanAxis1   CanAxis2    CanAxis3
136## Botrytis       -0.50998702 0.39603021 -0.27858646
137## Cladosporium   -0.03656995 0.56688132 -0.31421552
138## Monilinia       0.57879137 0.14907154  0.30040898
139## Mycosphaerella -0.25148083 0.65849435  0.06253836
140## Penicillium     0.23221207 0.08033432 -0.09855226
141##                  CanAxis4    CanAxis5
142## Botrytis       -0.2097823  0.12286480
143## Cladosporium    0.2301846  0.12716321
144## Monilinia      -0.2360469  0.14956272
145## Mycosphaerella  0.2470917  0.02299386
146## Penicillium    -0.1246915 -0.40232309
147
148biplot(badfungiCC)

...

Effect sizes and significant canonical variates for Bad Fungi

 1badfungiCCA<-cc(Envi,BadFungi)
 2#Effect sizes
 3badfungiCCA$ycoef
 4
 5##                      [,1]      [,2]       [,3]      [,4]
 6## Botrytis       -22.438664 11.750585  -8.474913 32.178310
 7## Cladosporium    12.718896  2.526191 -17.735366 -6.120606
 8## Monilinia        2.291911  1.647968   1.267286  1.787275
 9## Mycosphaerella  -3.954103  4.631983   7.749042 -1.847856
10## Penicillium      1.862247  2.207945  -1.680171  1.763883
11##                     [,5]
12## Botrytis        4.676756
13## Cladosporium    5.152196
14## Monilinia       1.122518
15## Mycosphaerella -2.364223
16## Penicillium    -5.048976
17
18badfungiCCA$xcoef
19
20##                   [,1]         [,2]         [,3]
21## PCNM1      20.20318698 -30.87925582  -3.54177300
22## PCNM2       1.22514085  -0.68045395  -2.74271617
23## Precip    -30.26201821  40.71110573  24.93730815
24## MinT        1.26711128  -2.85800379   4.65692142
25## VPDmin    -25.44970665  30.85066149 -11.54735182
26## VPDmax     -5.64031688   6.12719196  -1.63367262
27## PropPear   -0.01383938   0.07172806  -0.02568558
28## PropFruit   0.87891684  -0.85420311   0.48111233
29## PropFor     0.02406463  -0.04232783  -0.05020187
30## soft        2.06630668  -0.45726184   0.57663036
31## org         0.56344937   0.21743181   0.63884373
32##                    [,4]         [,5]
33## PCNM1     -16.952541694 -53.60023442
34## PCNM2      -1.837648383   2.73584459
35## Precip     19.380736253  38.64802445
36## MinT       -3.515535279  -9.39025411
37## VPDmin     18.076110329  38.23013274
38## VPDmax      3.280948570   6.08024432
39## PropPear    0.001137222  -0.06726919
40## PropFruit   0.006018099   0.43610758
41## PropFor     0.005374205   0.12673231
42## soft        1.806271076  -1.39622883
43## org         0.126172135  -2.55539272

FIND WHICH CANONICAL DIMENSIONS ARE SIGNIFICANT

 1ev=(1 - badfungiCCA$cor^2)
 2n=dim(Envi)[1]
 3p=length(Envi)
 4q=length(BadFungi)
 5k=min(p, q)
 6m=n - 3/2 - (p + q)/2
 7w=rev(cumprod(rev(ev)))
 8# initialize
 9d1=d2=f=vector("numeric", k)
10for (i in 1:k) {
11  s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
12  si=1/s
13  d1[i]=p * q
14  d2[i]=m * s - p * q/2 + 1
15  r=(1 - w[i]^si)/w[i]^si
16  f[i]=r * d2[i]/d1[i]
17  p=p - 1
18  q=q - 1
19}
20pv=pf(f, d1, d2, lower.tail = FALSE)
21dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
22
23print(dmat)
24
25##          WilksL         F df1      df2          p
26## [1,] 0.02728644 1.4637894  55 68.38983 0.06724978
27## [2,] 0.11368049 1.1369873  40 58.73361 0.32254775
28## [3,] 0.32086535 0.8348354  27 47.37057 0.68786933
29## [4,] 0.59643919 0.6265401  16 34.00000 0.83991561
30## [5,] 0.80694401 0.6151972   7 18.00000 0.73655072

As a function...

 1TestSignificance <- function(veganCCA,x,y) {
 2
 3ev=(1 - veganCCA$cor^2)
 4n=dim(x)[1]
 5p=length(x)
 6q=length(y)
 7k=min(p, q)
 8m=n - 3/2 - (p + q)/2
 9w=rev(cumprod(rev(ev)))
10# initialize
11d1=d2=f=vector("numeric", k)
12for (i in 1:k) {
13  s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
14  si=1/s
15  d1[i]=p * q
16  d2[i]=m * s - p * q/2 + 1
17  r=(1 - w[i]^si)/w[i]^si
18  f[i]=r * d2[i]/d1[i]
19  p=p - 1
20  q=q - 1
21}
22pv=pf(f, d1, d2, lower.tail = FALSE)
23dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
24
25print(dmat)
26
27}
28
29TestSignificance(badfungiCCA, Envi, BadFungi)
30
31##          WilksL         F df1      df2          p
32## [1,] 0.02728644 1.4637894  55 68.38983 0.06724978
33## [2,] 0.11368049 1.1369873  40 58.73361 0.32254775
34## [3,] 0.32086535 0.8348354  27 47.37057 0.68786933
35## [4,] 0.59643919 0.6265401  16 34.00000 0.83991561
36## [5,] 0.80694401 0.6151972   7 18.00000 0.73655072

Apply function family

Apply functions are implicit loops that iterate over the elements in an object like a dataframe, list, matrix, or array (3-d matrix) and execute some set of functions.

Apply functions are a more efficient way to write a loop - which can help to keep your code organized and efficient. They are also more computationally efficient.

Depending on your inputs and outputs - there is a whole family of apply functions. The first four apply a function over all the elements in a single object.

lapply(X, FUN, ...) list apply; returns a list of values obtatined by applying a function to a vector, list, or data frame.

 1lapply(Envi$Precip, sqrt)
 2
 3## [[1]]
 4## [1] 1.519868
 5##
 6## [[2]]
 7## [1] 1.519868
 8##
 9## [[3]]
10## [1] 1.519868
11##
12## [[4]]
13## [1] 1.519868
14##
15## [[5]]
16## [1] 1.476482
17##
18## [[6]]
19## [1] 1.476482
20##
21## [[7]]
22## [1] 1.476482
23##
24## [[8]]
25## [1] 1.476482
26##
27## [[9]]
28## [1] 1.486607
29##
30## [[10]]
31## [1] 1.486607
32##
33## [[11]]
34## [1] 1.486607
35##
36## [[12]]
37## [1] 1.486607
38##
39## [[13]]
40## [1] 1.593738
41##
42## [[14]]
43## [1] 1.593738
44##
45## [[15]]
46## [1] 1.593738
47##
48## [[16]]
49## [1] 1.593738
50##
51## [[17]]
52## [1] 1.593738
53##
54## [[18]]
55## [1] 1.593738
56##
57## [[19]]
58## [1] 1.479865
59##
60## [[20]]
61## [1] 1.479865
62##
63## [[21]]
64## [1] 1.479865
65##
66## [[22]]
67## [1] 1.479865
68##
69## [[23]]
70## [1] 1.519868
71##
72## [[24]]
73## [1] 1.519868
74##
75## [[25]]
76## [1] 1.584298
77##
78## [[26]]
79## [1] 1.584298
80##
81## [[27]]
82## [1] 1.486607
83##
84## [[28]]
85## [1] 1.486607
86##
87## [[29]]
88## [1] 1.486607
89##
90## [[30]]
91## [1] 1.486607

sapply(X, FUN, ...) Like lapply, but simplifies the output to the more simple data type... may return a vector instead of a list.

1sapply(Envi$Precip, sqrt)
2
3##  [1] 1.519868 1.519868 1.519868 1.519868 1.476482 1.476482
4##  [7] 1.476482 1.476482 1.486607 1.486607 1.486607 1.486607
5## [13] 1.593738 1.593738 1.593738 1.593738 1.593738 1.593738
6## [19] 1.479865 1.479865 1.479865 1.479865 1.519868 1.519868
7## [25] 1.584298 1.584298 1.486607 1.486607 1.486607 1.486607

apply(X, MARGIN, FUN, ...) returns a vector or array or list of values obtatined by applying a function to margins (1 = rows, 2 = columns) of a data frame, matrix or array.

1apply(Envi, MARGIN=2, mean)
2
3##         PCNM1         PCNM2        Precip          MinT
4## -5.086263e-18 -6.604124e-17  2.309333e+00  3.710000e+01
5##        VPDmin        VPDmax      PropPear     PropFruit
6##  1.216000e+00  1.051000e+01  2.711236e+01  1.443854e+00
7##       PropFor          soft           org
8##  9.862514e+00  3.333333e-01  3.333333e-01

tapply(X, INDEX, FUN = NULL) Table apply, this apply statement requires an index values (object of the same length as X), and allows you to apply a function to a subset of your data.

1tapply(Envi$MinT, Envi$Precip, mean)
2
3## 2.18 2.19 2.21 2.31 2.51 2.54
4## 37.9 37.9 37.3 37.0 35.8 36.3

... and...

The multiple arguments apply function allows you to apply a function across multiple objects... takes several vectors, and applies the function to the first object of each, then the second object of each... can be different lengths

mapply(FUN, ...) returns a vector or array or list of values obtatined by applying a function to margins of a matrix or array.

1x= 1:4
2y= 5:8
3z= 9:12
4mapply(sum, x, y, z)
5
6## [1] 15 18 21 24

MAP(FUN, ...) is apparently similar and often preferable version, that does not simplify data structures or break lazy argument loading norms.